Message ID | 20250309121324.29633-1-john.madieu.xa@bp.renesas.com |
---|---|
Series | thermal: Add CPU hotplug cooling driver |
Hi John,

Thanks for the patch.

> -----Original Message-----
> From: John Madieu <john.madieu.xa@bp.renesas.com>
> Sent: 09 March 2025 12:13
> Subject: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
>
> This patch series introduces a new thermal cooling driver that implements
> CPU hotplug-based thermal management. The driver dynamically takes CPUs
> offline during thermal excursions to reduce power consumption and prevent
> overheating, while maintaining system stability by keeping at least one
> CPU online.
>
> 1- Problem Statement
>
> Modern SoCs require robust thermal management to prevent overheating under
> heavy workloads. Existing cooling mechanisms like frequency scaling may not
> always provide sufficient thermal relief, especially in multi-core systems
> where per-core thermal contributions can be significant.
>
> 2- Solution Overview
>
> The driver:
>
> - Integrates with the Linux thermal framework as a cooling device
> - Registers per-CPU cooling devices that respond to thermal trip points
> - Uses CPU hotplug operations to reduce thermal load
> - Maintains system stability by never taking the boot CPU offline,
>   regardless of which CPUs are specified in the cooling device list
> - Implements proper state tracking and cleanup
>
> Key Features:
>
> - Dynamic CPU online/offline management based on thermal thresholds
> - Device tree-based configuration via thermal zones and trip points
> - Hysteresis support through thermal governor interactions
> - Safe handling of CPU state transitions during module load/unload
> - Compatibility with existing thermal management frameworks
>
> Testing
>
> - Verified on Renesas RZ/G3E platforms with multi-core CPU configurations
> - Validated thermal response using emulated temperatures (emul_temp)
> - Confirmed proper interaction with other cooling devices
> - Verified support for 'plug' type trace events
> - Tested with the step_wise governor
>
> As the 'hot' type is already used for user space notification, I've chosen
> 'plug' for this new type. Suggestions on this are welcome. Here is an
> example of a thermal zone that integrates the 'plug' type:
>
> ```
> thermal-zones {
>     cpu-thermal {
>         polling-delay = <1000>;
>         polling-delay-passive = <250>;
>         thermal-sensors = <&tsu>;
>
>         cooling-maps {
>             map0 {
>                 trip = <&target>;
>                 cooling-device = <&cpu0 0 3>, <&cpu3 0 3>;
>                 contribution = <1024>;
>             };

Is it not possible to also add cpu1 and cpu2 here for DVFS passive cooling?

>
>             map1 {
>                 trip = <&trip_emergency>;
>                 cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
>                 contribution = <1024>;
>             };
>
>         };

Is it not possible to also make cpu3 a hot-pluggable cooling device here?

Cheers,
Biju
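The cover letter describes one cooling device per CPU, with hotplug as the mitigation and the boot CPU always kept online. A minimal user-space model of that state handling is sketched below. It assumes a two-state device (0 = online, 1 = offline), which is how the `<&cpu1 0 1>` limits in map1 read, and assumes CPU 0 is the boot CPU. `struct hotplug_cdev` and the `hotplug_*` functions are hypothetical names that only loosely follow the `get_max_state`/`set_cur_state` callbacks of the kernel's `struct thermal_cooling_device_ops`; real onlining and offlining is reduced to a flag, so this is not the driver from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one per-CPU hotplug cooling device.
 * Assumption: state 0 = CPU online (no cooling), state 1 = CPU offline. */
#define HOTPLUG_MAX_STATE 1UL
#define BOOT_CPU 0U /* assumption: CPU 0 is the boot CPU */

struct hotplug_cdev {
	unsigned int cpu;        /* CPU this cooling device controls */
	unsigned long cur_state; /* last state requested by the governor */
	bool online;             /* stand-in for the real CPU hotplug state */
};

/* Modeled on get_max_state(): the device exposes states 0..1. */
static unsigned long hotplug_get_max_state(const struct hotplug_cdev *cdev)
{
	(void)cdev;
	return HOTPLUG_MAX_STATE;
}

/* Modeled on set_cur_state(): state 1 takes the CPU offline, state 0
 * brings it back. For the boot CPU the request is recorded, so the
 * governor's view stays consistent, but the CPU is never taken offline. */
static int hotplug_set_cur_state(struct hotplug_cdev *cdev, unsigned long state)
{
	assert(cdev);
	if (state > HOTPLUG_MAX_STATE)
		return -EINVAL;

	cdev->cur_state = state;
	if (cdev->cpu == BOOT_CPU) {
		cdev->online = true; /* boot CPU always stays online */
		return 0;
	}
	/* A real driver would perform the CPU hotplug operation here. */
	cdev->online = (state == 0);
	return 0;
}
```

In this model, a 'plug' trip that drives the map1 devices to state 1 takes cpu1 and cpu2 offline, while a device bound to cpu0 records the request but keeps the CPU online, matching the boot-CPU rule described in the cover letter.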
Hi Biju,

Thanks for your review.

> -----Original Message-----
> From: Biju Das <biju.das.jz@bp.renesas.com>
> Sent: Monday, March 10, 2025 11:18 AM
> To: John Madieu <john.madieu.xa@bp.renesas.com>; geert+renesas@glider.be;
> niklas.soderlund+renesas@ragnatech.se; conor+dt@kernel.org;
> krzk+dt@kernel.org; robh@kernel.org; rafael@kernel.org;
> daniel.lezcano@linaro.org
> Subject: RE: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
>
> [...]
>
> >         cooling-maps {
> >             map0 {
> >                 trip = <&target>;
> >                 cooling-device = <&cpu0 0 3>, <&cpu3 0 3>;
> >                 contribution = <1024>;
> >             };
>
> Is it not possible to also add cpu1 and cpu2 here for DVFS passive cooling?

From my tests, adding the same CPUs as cooling devices in both maps
generated warnings saying that the trip could not be bound to my ("plug")
cooling device. This is something I still need to investigate, and comments
from the maintainers would be welcome. Despite these warnings, however, I
saw no unexpected behavior, and the thermal trace events were also OK.

> >
> >             map1 {
> >                 trip = <&trip_emergency>;
> >                 cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
> >                 contribution = <1024>;
> >             };
> >
> >         };
>
> Is it not possible to also make cpu3 a hot-pluggable cooling device here?
>
> Cheers,
> Biju

Regards,
John
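Biju's two questions amount to listing cpu1 and cpu2 in map0 as well as map1, and cpu3 in map1 as well as map0. Since John reports bind warnings when the same CPUs appear in both maps, a small, hypothetical sanity check for that overlap is sketched below in user-space C. It does not reproduce the thermal core's binding logic; `struct cooling_map`, `shared_cpus()` and the 8-CPU limit are illustrative assumptions, and the CPU lists come from the DT example and the variant under discussion.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 8U /* assumption: enough for the example SoC */

/* Hypothetical description of one cooling map: which CPUs it references. */
struct cooling_map {
	const char *name;        /* e.g. "map0", useful for reporting */
	const unsigned int *cpus;
	size_t ncpus;
};

/*
 * Returns a bitmask of CPUs referenced by more than one map.
 * Bit n set => CPU n appears in at least two maps. A CPU listed twice
 * within the same map is not counted as shared.
 */
static unsigned int shared_cpus(const struct cooling_map *maps, size_t nmaps)
{
	unsigned int seen = 0, shared = 0;

	for (size_t m = 0; m < nmaps; m++) {
		unsigned int in_this_map = 0;

		for (size_t i = 0; i < maps[m].ncpus; i++) {
			unsigned int cpu = maps[m].cpus[i];

			assert(cpu < MAX_CPUS);
			in_this_map |= 1U << cpu;
		}
		/* CPUs already seen in an earlier map are shared. */
		shared |= seen & in_this_map;
		seen |= in_this_map;
	}
	return shared;
}
```

For the map0/map1 example as posted (cpu0 and cpu3 in map0, cpu1 and cpu2 in map1), the helper returns 0. For the variant Biju describes (cpu0 to cpu3 in map0, cpu1 to cpu3 in map1), it returns 0xE, that is cpu1, cpu2 and cpu3, the CPUs referenced from both maps.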