Message ID | 2692681.mvXUDI8C0e@kreacher (mailing list archive) |
---|---|
Headers | show |
Series | thermal: core/ACPI: Fix processor cooling device regression | expand |
Hi, Rafael, The only concern to me is that, in thermal_cooling_device_update(), we should handle the cases that the cooling device is current used by one/more thermal zone. say, something like list_for_each_entry(pos, &cdev->thermal_instances, cdev_node) { /* e.g. what to do if tz1 set it to state 1 previously */ } I have not got a clear idea what we should do here. But given that I have confirmed that this patch series fixes the original problem, and the ACPI passive cooling is unlikely to be triggered before CPUFREQ_CREATE_POLICY notification, probably we can address that problem later. Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> thanks, rui On Mon, 2023-03-13 at 15:24 +0100, Rafael J. Wysocki wrote: > Hi All, > > The first revision of this patch series was posted as > > https://lore.kernel.org/linux-pm/2148907.irdbgypaU6@kreacher/ > > As reported by Rui in this thread: > > Link: > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/ > > some recent changes in the thermal core cause the CPU cooling devices > registered by the ACPI processor driver to become unusable in some > cases > and somewhat crippled in general. > > The problem is that the ACPI processor driver changes its > ->get_max_state() > callback return value depending on whether or not cpufreq is > available and > there is a cpufreq policy for a given CPU. However, the thermal core > has > always assumed that the return value of that callback will not > change, which > in fact is relied on by the cooling device statistics code. In > particular, > when the ->get_max_state() grows, the memory buffer allocated for > storing the > statistics will be too small and corruption may ensue as a result. > > For this reason, the issue needs to be addressed in the ACPI > processor driver > and not in the thermal core, but the core needs to help somewhat > too. Namely, > it needs to provide a helper allowing an interested driver to update > the > max_state value for an already registered cooling device in certain > situations > which will also cause the statistics to be rebuilt. > > This series implements the above and for details please refer to the > individual > patch chagelogs. > > Thanks! > > >
On Mon, Mar 13, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com> wrote: > > Hi, Rafael, > > The only concern to me is that, in thermal_cooling_device_update(), we > should handle the cases that the cooling device is current used by > one/more thermal zone. say, something like > > list_for_each_entry(pos, &cdev->thermal_instances, cdev_node) { > /* e.g. what to do if tz1 set it to state 1 previously */ > } > I have not got a clear idea what we should do here. For each instance, set upper to max_state if above it and set target to upper if above it I'd say. I guess otherwise there may be some confusion in principle and I have missed that piece, so thanks for pointing it out! > But given that I have confirmed that this patch series fixes the > original problem, and the ACPI passive cooling is unlikely to be > triggered before CPUFREQ_CREATE_POLICY notification, probably we can > address that problem later. > > Tested-by: Zhang Rui <rui.zhang@intel.com> > Reviewed-by: Zhang Rui <rui.zhang@intel.com> Thank you!
On Mon, 2023-03-13 at 19:02 +0100, Rafael J. Wysocki wrote: > On Mon, Mar 13, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com> > wrote: > > Hi, Rafael, > > > > The only concern to me is that, in thermal_cooling_device_update(), > > we > > should handle the cases that the cooling device is current used by > > one/more thermal zone. say, something like > > > > list_for_each_entry(pos, &cdev->thermal_instances, cdev_node) { > > /* e.g. what to do if tz1 set it to state 1 previously */ > > } > > I have not got a clear idea what we should do here. > > For each instance, set upper to max_state if above it and set target > to upper if above it I'd say. > Say, before update, max_state: 3 target: 1 upper is set to 3 because upper == THERMAL_NO_LIMIT during binding then, after update max_state: 7 target: ? upper: ? Maybe we should do unbind and rebind, and then set target to THERMAL_NO_TARGET? it is really the governor that should set the target. > I guess otherwise there may be some confusion in principle and I have > missed that piece, so thanks for pointing it out! > > > But given that I have confirmed that this patch series fixes the > > original problem, and the ACPI passive cooling is unlikely to be > > triggered before CPUFREQ_CREATE_POLICY notification, probably we > > can > > address that problem later. > > > > Tested-by: Zhang Rui <rui.zhang@intel.com> > > Reviewed-by: Zhang Rui <rui.zhang@intel.com> > > I recalled that patchwork used to catch these tags here and apply them to every patches in the series, so the tags are appended automatically when applying the patches. But it apparently does not work now. Let me reply to the patches one by one. thanks, rui