[RFC,0/3] thermal: Add CPU hotplug cooling driver

Message ID 20250309121324.29633-1-john.madieu.xa@bp.renesas.com (mailing list archive)

Message

John Madieu March 9, 2025, 12:13 p.m. UTC

This patch series introduces a new thermal cooling driver that implements CPU
hotplug-based thermal management. The driver dynamically takes CPUs offline
during thermal excursions to reduce power consumption and prevent overheating,
while maintaining system stability by keeping at least one CPU online. 

1- Problem Statement

Modern SoCs require robust thermal management to prevent overheating under heavy
workloads. Existing cooling mechanisms like frequency scaling may not always
provide sufficient thermal relief, especially in multi-core systems where
per-core thermal contributions can be significant. 

2- Solution Overview 

The driver:

 - Integrates with the Linux thermal framework as a cooling device  
 - Registers per-CPU cooling devices that respond to thermal trip points  
 - Uses CPU hotplug operations to reduce thermal load  
 - Maintains system stability by never taking the boot CPU offline,
   regardless of which CPUs are specified in the cooling device list
 - Implements proper state tracking and cleanup

Key Features:   

 - Dynamic CPU online/offline management based on thermal thresholds  
 - Device tree-based configuration via thermal zones and trip points  
 - Hysteresis support through thermal governor interactions  
 - Safe handling of CPU state transitions during module load/unload  
 - Compatibility with existing thermal management frameworks

Testing    

 - Verified on Renesas RZ/G3E platforms with multi-core CPU configurations  
 - Validated thermal response using emulated temperatures (emul_temp)
 - Confirmed proper interaction with other cooling devices
 - Verified support for 'plug' type trace events
 - Tested with step_wise governor

As the 'hot' type is already used for user space notification, I've chosen
'plug' for this new type. Suggestions on this are welcome. Here is an example
of a 'thermal-zones' node that integrates the 'plug' type:

```
thermal-zones {
	cpu-thermal {
		polling-delay = <1000>;
		polling-delay-passive = <250>;
		thermal-sensors = <&tsu>;

		cooling-maps {
			map0 {
				trip = <&target>;
				cooling-device = <&cpu0 0 3>, <&cpu3 0 3>;
				contribution = <1024>;
			};

			map1 {
				trip = <&trip_emergency>;
				cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
				contribution = <1024>;
			};

		};

		trips {
			target: trip-point {
				temperature = <95000>;
				hysteresis = <1000>;
				type = "passive";
			};

			trip_emergency: emergency {
				temperature = <110000>;
				hysteresis = <1000>;
				type = "plug";
			};

			sensor_crit: sensor-crit {
				temperature = <120000>;
				hysteresis = <1000>;
				type = "critical";
			};
		};
	};
};
```
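
The cooling-maps above reference each CPU with two argument cells (minimum
and maximum cooling state). For such references to parse, the referenced CPU
nodes need a '#cooling-cells' property of 2. A minimal sketch, assuming the
cpu1 and cpu2 labels from the example above and assuming the property is not
already set in the SoC dtsi:

```
/* Illustrative only: extends the CPU nodes through their labels. */
&cpu1 {
	#cooling-cells = <2>;	/* <min-state max-state> */
};

&cpu2 {
	#cooling-cells = <2>;	/* <min-state max-state> */
};
```

With this in place, <&cpu1 0 1> in map1 limits the cooling state of cpu1 to
the range 0..1, which presumably maps to online and offline.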

Dependencies    

 - Requires standard thermal framework components (CONFIG_THERMAL)  
 - Depends on CPU hotplug support (CONFIG_HOTPLUG_CPU)  
 - Assumes device tree contains appropriate thermal zone definitions

This series also depends on [1], specifically patch 6/7
("arm64: dts: renesas: r9a09g047: Add TSU node").


3- Notes for Reviewers

 - Focus areas: Thermal framework integration, CPU state management, and error handling  
 - Feedback on device tree binding requirements is particularly welcome  
 - Suggestions for improving the interaction with other governors are appreciated

I look forward to your feedback and guidance on this contribution.

[1] https://patchwork.kernel.org/project/linux-clk/cover/20250227122453.30480-1-john.madieu.xa@bp.renesas.com/

Regards,
John


John Madieu (3):
  thermal/cpuplog_cooling: Add CPU hotplug cooling driver
  tmon: Add support for THERMAL_TRIP_PLUG type
  arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point

 arch/arm64/boot/dts/renesas/r9a09g047.dtsi |  13 +
 drivers/thermal/Kconfig                    |  12 +
 drivers/thermal/Makefile                   |   1 +
 drivers/thermal/cpuplug_cooling.c          | 363 +++++++++++++++++++++
 drivers/thermal/thermal_of.c               |   1 +
 drivers/thermal/thermal_trace.h            |   2 +
 drivers/thermal/thermal_trip.c             |   1 +
 include/uapi/linux/thermal.h               |   1 +
 tools/thermal/tmon/tmon.h                  |   1 +
 tools/thermal/tmon/tui.c                   |   3 +-
 10 files changed, 397 insertions(+), 1 deletion(-)
 create mode 100644 drivers/thermal/cpuplug_cooling.c

Comments

Biju Das March 10, 2025, 10:17 a.m. UTC | #1
Hi John,

Thanks for the patch.

> -----Original Message-----
> From: John Madieu <john.madieu.xa@bp.renesas.com>
> Sent: 09 March 2025 12:13
> Subject: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
> 
> ```
> thermal-zones {
> 	cpu-thermal {
> 		polling-delay = <1000>;
> 		polling-delay-passive = <250>;
> 		thermal-sensors = <&tsu>;
> 
> 		cooling-maps {
> 			map0 {
> 				trip = <&target>;
> 				cooling-device = <&cpu0 0 3>, <&cpu3 0 3>;
> 				contribution = <1024>;
> 			};

Is it not possible to also use cpu1 and cpu2 for DVFS passive cooling here?

> 
> 			map1 {
> 				trip = <&trip_emergency>;
> 				cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
> 				contribution = <1024>;
> 			};
> 
> 		};

Is it not possible to also make cpu3 a hot-pluggable cooling device here?

Cheers,
Biju

John Madieu March 11, 2025, 11:33 a.m. UTC | #2
Hi Biju,

Thanks for your review.

> -----Original Message-----
> From: Biju Das <biju.das.jz@bp.renesas.com>
> Sent: Monday, March 10, 2025 11:18 AM
> To: John Madieu <john.madieu.xa@bp.renesas.com>; geert+renesas@glider.be;
> niklas.soderlund+renesas@ragnatech.se; conor+dt@kernel.org;
> krzk+dt@kernel.org; robh@kernel.org; rafael@kernel.org;
> daniel.lezcano@linaro.org
> Subject: RE: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
> 
> Hi John,
> 
> Thanks for the patch.
> 
> > -----Original Message-----
> > From: John Madieu <john.madieu.xa@bp.renesas.com>
> > Sent: 09 March 2025 12:13
> > Subject: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
> >
> > ```
> > thermal-zones {
> > 	cpu-thermal {
> > 		polling-delay = <1000>;
> > 		polling-delay-passive = <250>;
> > 		thermal-sensors = <&tsu>;
> >
> > 		cooling-maps {
> > 			map0 {
> > 				trip = <&target>;
> > 				cooling-device = <&cpu0 0 3>, <&cpu3 0 3>;
> > 				contribution = <1024>;
> > 			};
> 
> Is it not possible here to make cpu1 and cpu2 as well for DVFS passive
> cooling?

In my tests, adding the same CPUs as cooling devices in both maps
generated warnings saying that the trip could not be bound to my
("plug") cooling device.

I still need to investigate this point, and comments from maintainers
would be welcome. However, despite these warnings, I saw no unexpected
behavior, and even the thermal trace events were OK.
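
For illustration, the overlapping configuration under discussion would look
roughly like the sketch below. It reuses the labels from the cover letter's
example; the exact cooling-device lists are an assumption, not the DTS that
was tested:

```
cooling-maps {
	map0 {
		trip = <&target>;
		/* Assumption: all four CPUs take part in passive (DVFS) cooling. */
		cooling-device = <&cpu0 0 3>, <&cpu1 0 3>,
				 <&cpu2 0 3>, <&cpu3 0 3>;
		contribution = <1024>;
	};

	map1 {
		trip = <&trip_emergency>;
		/* cpu1 and cpu2 now appear in both maps. */
		cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
		contribution = <1024>;
	};
};
```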

> 
> >
> > 			map1 {
> > 				trip = <&trip_emergency>;
> > 				cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
> > 				contribution = <1024>;
> > 			};
> >
> > 		};
> 
> Is it not possible here to make cpu3 as well as hot pluggable device for
> cooling?
> 
> Cheers,
> Biju

Regards,
John