diff mbox series

cpufreq: Don't unregister cpufreq cooling on CPU hotplug

Message ID 1333a397b93e0e15cb7cb358e21a289bc7d71a63.1709193295.git.viresh.kumar@linaro.org (mailing list archive)
State In Next
Delegated to: Rafael Wysocki
Headers show
Series cpufreq: Don't unregister cpufreq cooling on CPU hotplug | expand

Commit Message

Viresh Kumar Feb. 29, 2024, 8:12 a.m. UTC
Offlining a CPU and bringing it back online is a common operation and it
happens frequently during system suspend/resume, where the non-boot CPUs
are hotplugged out during suspend and brought back at resume.

The cpufreq core already tries to make this path as fast as possible as
the changes are only temporary in nature and full cleanup of resources
isn't required in this case. For example the drivers can implement
online()/offline() callbacks to avoid a lot of tear down of resources.

On similar lines, there is no need to unregister the cpufreq cooling
device during suspend / resume, but only while the policy is getting
removed.

Moreover, unregistering the cpufreq cooling device is resulting in an
unwanted outcome, where the system suspend is eventually aborted in the
process.  Currently, during system suspend the cpufreq core unregisters
the cooling device, which in turn removes a kobject using device_del()
and that generates a notification to the userspace via uevent broadcast.
This causes system suspend to abort in some setups.

This was also earlier reported (indirectly) by Roman [1]. Maybe there is
another way around to fixing that problem properly, but this change
makes sense anyways.

Move the registering and unregistering of the cooling device to policy
creation and removal times onlyy.

Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20220710164026.541466-1-r.stratiienko@gmail.com/ [1]
Tested-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

Comments

Mukesh Ojha Feb. 29, 2024, 8:53 a.m. UTC | #1
On 2/29/2024 1:42 PM, Viresh Kumar wrote:
> Offlining a CPU and bringing it back online is a common operation and it
> happens frequently during system suspend/resume, where the non-boot CPUs
> are hotplugged out during suspend and brought back at resume.
> 
> The cpufreq core already tries to make this path as fast as possible as
> the changes are only temporary in nature and full cleanup of resources
> isn't required in this case. For example the drivers can implement
> online()/offline() callbacks to avoid a lot of tear down of resources.
> 
> On similar lines, there is no need to unregister the cpufreq cooling
> device during suspend / resume, but only while the policy is getting
> removed.
> 
> Moreover, unregistering the cpufreq cooling device is resulting in an
> unwanted outcome, where the system suspend is eventually aborted in the
> process.  Currently, during system suspend the cpufreq core unregisters
> the cooling device, which in turn removes a kobject using device_del()
> and that generates a notification to the userspace via uevent broadcast.
> This causes system suspend to abort in some setups.
> 
> This was also earlier reported (indirectly) by Roman [1]. Maybe there is
> another way around to fixing that problem properly, but this change
> makes sense anyways.
> 
> Move the registering and unregistering of the cooling device to policy
> creation and removal times onlyy.
> 
> Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
> Link: https://patchwork.kernel.org/project/linux-pm/patch/20220710164026.541466-1-r.stratiienko@gmail.com/ [1]
> Tested-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>   drivers/cpufreq/cpufreq.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 44db4f59c4cc..4133c606dacb 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1571,7 +1571,8 @@ static int cpufreq_online(unsigned int cpu)
>   	if (cpufreq_driver->ready)
>   		cpufreq_driver->ready(policy);
>   
> -	if (cpufreq_thermal_control_enabled(cpufreq_driver))
> +	/* Register cpufreq cooling only for a new policy */
> +	if (new_policy && cpufreq_thermal_control_enabled(cpufreq_driver))
>   		policy->cdev = of_cpufreq_cooling_register(policy);
>   
>   	pr_debug("initialization complete\n");
> @@ -1655,11 +1656,6 @@ static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy)
>   	else
>   		policy->last_policy = policy->policy;
>   
> -	if (cpufreq_thermal_control_enabled(cpufreq_driver)) {
> -		cpufreq_cooling_unregister(policy->cdev);
> -		policy->cdev = NULL;
> -	}
> -
>   	if (has_target())
>   		cpufreq_exit_governor(policy);
>   
> @@ -1720,6 +1716,15 @@ static void cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
>   		return;
>   	}
>   
> +	/*
> +	 * Unregister cpufreq cooling once all the CPUs of the policy are
> +	 * removed.
> +	 */
> +	if (cpufreq_thermal_control_enabled(cpufreq_driver)) {
> +		cpufreq_cooling_unregister(policy->cdev);
> +		policy->cdev = NULL;
> +	}
> +

Looks fine than other solution..

-Mukesh
>   	/* We did light-weight exit earlier, do full tear down now */
>   	if (cpufreq_driver->offline)
>   		cpufreq_driver->exit(policy);
Daniel Lezcano Feb. 29, 2024, 8:58 a.m. UTC | #2
On 29/02/2024 09:12, Viresh Kumar wrote:
> Offlining a CPU and bringing it back online is a common operation and it
> happens frequently during system suspend/resume, where the non-boot CPUs
> are hotplugged out during suspend and brought back at resume.
> 
> The cpufreq core already tries to make this path as fast as possible as
> the changes are only temporary in nature and full cleanup of resources
> isn't required in this case. For example the drivers can implement
> online()/offline() callbacks to avoid a lot of tear down of resources.
> 
> On similar lines, there is no need to unregister the cpufreq cooling
> device during suspend / resume, but only while the policy is getting
> removed.
> 
> Moreover, unregistering the cpufreq cooling device is resulting in an
> unwanted outcome, where the system suspend is eventually aborted in the
> process.  Currently, during system suspend the cpufreq core unregisters
> the cooling device, which in turn removes a kobject using device_del()
> and that generates a notification to the userspace via uevent broadcast.
> This causes system suspend to abort in some setups.
> 
> This was also earlier reported (indirectly) by Roman [1]. Maybe there is
> another way around to fixing that problem properly, but this change
> makes sense anyways.
> 
> Move the registering and unregistering of the cooling device to policy
> creation and removal times onlyy.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218521

> Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
> Link: https://patchwork.kernel.org/project/linux-pm/patch/20220710164026.541466-1-r.stratiienko@gmail.com/ [1]
> Tested-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>   drivers/cpufreq/cpufreq.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 44db4f59c4cc..4133c606dacb 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1571,7 +1571,8 @@ static int cpufreq_online(unsigned int cpu)
>   	if (cpufreq_driver->ready)
>   		cpufreq_driver->ready(policy);
>   
> -	if (cpufreq_thermal_control_enabled(cpufreq_driver))
> +	/* Register cpufreq cooling only for a new policy */
> +	if (new_policy && cpufreq_thermal_control_enabled(cpufreq_driver))
>   		policy->cdev = of_cpufreq_cooling_register(policy);
>   
>   	pr_debug("initialization complete\n");
> @@ -1655,11 +1656,6 @@ static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy)
>   	else
>   		policy->last_policy = policy->policy;
>   
> -	if (cpufreq_thermal_control_enabled(cpufreq_driver)) {
> -		cpufreq_cooling_unregister(policy->cdev);
> -		policy->cdev = NULL;
> -	}
> -
>   	if (has_target())
>   		cpufreq_exit_governor(policy);
>   
> @@ -1720,6 +1716,15 @@ static void cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
>   		return;
>   	}
>   
> +	/*
> +	 * Unregister cpufreq cooling once all the CPUs of the policy are
> +	 * removed.
> +	 */
> +	if (cpufreq_thermal_control_enabled(cpufreq_driver)) {
> +		cpufreq_cooling_unregister(policy->cdev);
> +		policy->cdev = NULL;
> +	}
> +
>   	/* We did light-weight exit earlier, do full tear down now */
>   	if (cpufreq_driver->offline)
>   		cpufreq_driver->exit(policy);
Dhruva Gole March 1, 2024, 5:50 a.m. UTC | #3
Hi,

On Feb 29, 2024 at 13:42:07 +0530, Viresh Kumar wrote:
> Offlining a CPU and bringing it back online is a common operation and it
> happens frequently during system suspend/resume, where the non-boot CPUs
> are hotplugged out during suspend and brought back at resume.
> 
> The cpufreq core already tries to make this path as fast as possible as
> the changes are only temporary in nature and full cleanup of resources
> isn't required in this case. For example the drivers can implement
> online()/offline() callbacks to avoid a lot of tear down of resources.
> 
> On similar lines, there is no need to unregister the cpufreq cooling
> device during suspend / resume, but only while the policy is getting
> removed.
> 
> Moreover, unregistering the cpufreq cooling device is resulting in an
> unwanted outcome, where the system suspend is eventually aborted in the
> process.  Currently, during system suspend the cpufreq core unregisters
> the cooling device, which in turn removes a kobject using device_del()
> and that generates a notification to the userspace via uevent broadcast.
> This causes system suspend to abort in some setups.
> 
> This was also earlier reported (indirectly) by Roman [1]. Maybe there is
> another way around to fixing that problem properly, but this change
> makes sense anyways.
> 
> Move the registering and unregistering of the cooling device to policy
> creation and removal times onlyy.
> 
> Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
> Link: https://patchwork.kernel.org/project/linux-pm/patch/20220710164026.541466-1-r.stratiienko@gmail.com/ [1]
> Tested-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---

Makes sense to me,

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Rafael J. Wysocki March 1, 2024, 8:37 p.m. UTC | #4
On Fri, Mar 1, 2024 at 6:50 AM Dhruva Gole <d-gole@ti.com> wrote:
>
> Hi,
>
> On Feb 29, 2024 at 13:42:07 +0530, Viresh Kumar wrote:
> > Offlining a CPU and bringing it back online is a common operation and it
> > happens frequently during system suspend/resume, where the non-boot CPUs
> > are hotplugged out during suspend and brought back at resume.
> >
> > The cpufreq core already tries to make this path as fast as possible as
> > the changes are only temporary in nature and full cleanup of resources
> > isn't required in this case. For example the drivers can implement
> > online()/offline() callbacks to avoid a lot of tear down of resources.
> >
> > On similar lines, there is no need to unregister the cpufreq cooling
> > device during suspend / resume, but only while the policy is getting
> > removed.
> >
> > Moreover, unregistering the cpufreq cooling device is resulting in an
> > unwanted outcome, where the system suspend is eventually aborted in the
> > process.  Currently, during system suspend the cpufreq core unregisters
> > the cooling device, which in turn removes a kobject using device_del()
> > and that generates a notification to the userspace via uevent broadcast.
> > This causes system suspend to abort in some setups.
> >
> > This was also earlier reported (indirectly) by Roman [1]. Maybe there is
> > another way around to fixing that problem properly, but this change
> > makes sense anyways.
> >
> > Move the registering and unregistering of the cooling device to policy
> > creation and removal times onlyy.
> >
> > Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> > Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
> > Link: https://patchwork.kernel.org/project/linux-pm/patch/20220710164026.541466-1-r.stratiienko@gmail.com/ [1]
> > Tested-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
>
> Makes sense to me,
>
> Reviewed-by: Dhruva Gole <d-gole@ti.com>

Applied (for 6.9), added a Closes: tag as suggested by Daniel.

Thanks!
diff mbox series

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 44db4f59c4cc..4133c606dacb 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1571,7 +1571,8 @@  static int cpufreq_online(unsigned int cpu)
 	if (cpufreq_driver->ready)
 		cpufreq_driver->ready(policy);
 
-	if (cpufreq_thermal_control_enabled(cpufreq_driver))
+	/* Register cpufreq cooling only for a new policy */
+	if (new_policy && cpufreq_thermal_control_enabled(cpufreq_driver))
 		policy->cdev = of_cpufreq_cooling_register(policy);
 
 	pr_debug("initialization complete\n");
@@ -1655,11 +1656,6 @@  static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy)
 	else
 		policy->last_policy = policy->policy;
 
-	if (cpufreq_thermal_control_enabled(cpufreq_driver)) {
-		cpufreq_cooling_unregister(policy->cdev);
-		policy->cdev = NULL;
-	}
-
 	if (has_target())
 		cpufreq_exit_governor(policy);
 
@@ -1720,6 +1716,15 @@  static void cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
 		return;
 	}
 
+	/*
+	 * Unregister cpufreq cooling once all the CPUs of the policy are
+	 * removed.
+	 */
+	if (cpufreq_thermal_control_enabled(cpufreq_driver)) {
+		cpufreq_cooling_unregister(policy->cdev);
+		policy->cdev = NULL;
+	}
+
 	/* We did light-weight exit earlier, do full tear down now */
 	if (cpufreq_driver->offline)
 		cpufreq_driver->exit(policy);