diff mbox series

[14/18] drivers: firmware: psci: Manage runtime PM in the idle path for CPUs

Message ID 20190513192300.653-15-ulf.hansson@linaro.org (mailing list archive)
State Not Applicable, archived
Headers show
Series ARM/ARM64: Support hierarchical CPU arrangement for PSCI | expand

Commit Message

Ulf Hansson May 13, 2019, 7:22 p.m. UTC
When the hierarchical CPU topology layout is used in DT, let's allow the
CPU to be power managed through its PM domain, via deploying runtime PM
support.

To know for which idle states runtime PM reference counting is needed,
let's store the index of deepest idle state for the CPU, in a per CPU
variable. This allows psci_cpu_suspend_enter() to compare this index with
the requested idle state index and then act accordingly.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes:
	- Simplify the code by using the new per CPU struct, that stores the
	  needed struct device*.

---
 drivers/firmware/psci/psci.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

Comments

Lorenzo Pieralisi July 16, 2019, 3:53 p.m. UTC | #1
On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> When the hierarchical CPU topology layout is used in DT, let's allow the
> CPU to be power managed through its PM domain, via deploying runtime PM
> support.
> 
> To know for which idle states runtime PM reference counting is needed,
> let's store the index of deepest idle state for the CPU, in a per CPU
> variable. This allows psci_cpu_suspend_enter() to compare this index with
> the requested idle state index and then act accordingly.

I do not see why a system with two CPU CPUidle states, say CPU retention
and CPU shutdown, should not be calling runtime PM on CPU retention
entry.

The question then is what cluster/package/system states
are allowed for a given CPU idle state, to understand
what idle states can be actually entered at any hierarchy
level given the choice made for the CPU idle state.

In the case above, a CPU entering retention state should prevent
runtime PM selecting a cluster shutdown state; most likely firmware
would demote the request to cluster retention but still, we should
find a way to describe these dependencies.

Thanks,
Lorenzo

> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
> 
> Changes:
> 	- Simplify the code by using the new per CPU struct, that stores the
> 	  needed struct device*.
> 
> ---
>  drivers/firmware/psci/psci.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 54e23d4ed0ea..2c4157d3a616 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -20,6 +20,7 @@
>  #include <linux/linkage.h>
>  #include <linux/of.h>
>  #include <linux/pm.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/printk.h>
>  #include <linux/psci.h>
>  #include <linux/reboot.h>
> @@ -298,6 +299,7 @@ static int __init psci_features(u32 psci_func_id)
>  
>  struct psci_cpuidle_data {
>  	u32 *psci_states;
> +	u32 rpm_state_id;
>  	struct device *dev;
>  };
>  
> @@ -385,6 +387,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
>  			goto free_mem;
>  
>  		data->dev = dev;
> +		data->rpm_state_id = drv->state_count - 1;
>  	}
>  
>  	/* Idle states parsed correctly, store them in the per-cpu struct. */
> @@ -481,8 +484,11 @@ static int psci_suspend_finisher(unsigned long index)
>  int psci_cpu_suspend_enter(unsigned long index)
>  {
>  	int ret;
> -	u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
> -	u32 composite_state = state[index - 1] | psci_get_domain_state();
> +	struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> +	u32 *states = data->psci_states;
> +	struct device *dev = data->dev;
> +	bool runtime_pm = (dev && data->rpm_state_id == index);
> +	u32 composite_state;
>  
>  	/*
>  	 * idle state index 0 corresponds to wfi, should never be called
> @@ -491,11 +497,23 @@ int psci_cpu_suspend_enter(unsigned long index)
>  	if (WARN_ON_ONCE(!index))
>  		return -EINVAL;
>  
> +	/*
> +	 * Do runtime PM if we are using the hierarchical CPU toplogy, but only
> +	 * when cpuidle have selected the deepest idle state for the CPU.
> +	 */
> +	if (runtime_pm)
> +		pm_runtime_put_sync_suspend(dev);
> +
> +	composite_state = states[index - 1] | psci_get_domain_state();
> +
>  	if (!psci_power_state_loses_context(composite_state))
>  		ret = psci_ops.cpu_suspend(composite_state, 0);
>  	else
>  		ret = cpu_suspend(index, psci_suspend_finisher);
>  
> +	if (runtime_pm)
> +		pm_runtime_get_sync(dev);
> +
>  	/* Clear the domain state to start fresh when back from idle. */
>  	psci_set_domain_state(0);
>  
> -- 
> 2.17.1
>
Ulf Hansson July 18, 2019, 10:35 a.m. UTC | #2
On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
>
> On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > When the hierarchical CPU topology layout is used in DT, let's allow the
> > CPU to be power managed through its PM domain, via deploying runtime PM
> > support.
> >
> > To know for which idle states runtime PM reference counting is needed,
> > let's store the index of deepest idle state for the CPU, in a per CPU
> > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > the requested idle state index and then act accordingly.
>
> I do not see why a system with two CPU CPUidle states, say CPU retention
> and CPU shutdown, should not be calling runtime PM on CPU retention
> entry.

If the CPU idle governor did select the CPU retention for the CPU, it
was probably because the target residency for the CPU shutdown state
could not be met.

In this case, there is no point in allowing any other deeper idle
states for cluster/package/system, since those have even greater
residencies, hence calling runtime PM doesn't make sense.

>
> The question then is what cluster/package/system states
> are allowed for a given CPU idle state, to understand
> what idle states can be actually entered at any hierarchy
> level given the choice made for the CPU idle state.
>
> In the case above, a CPU entering retention state should prevent
> runtime PM selecting a cluster shutdown state; most likely firmware
> would demote the request to cluster retention but still, we should
> find a way to describe these dependencies.

See above.

[...]

Kind regards
Uffe
Lorenzo Pieralisi July 18, 2019, 1:30 p.m. UTC | #3
On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
> >
> > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > > CPU to be power managed through its PM domain, via deploying runtime PM
> > > support.
> > >
> > > To know for which idle states runtime PM reference counting is needed,
> > > let's store the index of deepest idle state for the CPU, in a per CPU
> > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > > the requested idle state index and then act accordingly.
> >
> > I do not see why a system with two CPU CPUidle states, say CPU retention
> > and CPU shutdown, should not be calling runtime PM on CPU retention
> > entry.
> 
> If the CPU idle governor did select the CPU retention for the CPU, it
> was probably because the target residency for the CPU shutdown state
> could not be met.

The kernel does not know what those cpu states represent, so, this is an
assumption you are making and it must be made clear that this code works
as long as your assumption is valid.

If eg a "cluster" retention state has lower target_residency than
the deepest CPU idle state this assumption is wrong.

And CPUidle and genPD governor decisions are not synced anyway so,
again, this is an assumption, not a certainty.

> In this case, there is no point in allowing any other deeper idle
> states for cluster/package/system, since those have even greater
> residencies, hence calling runtime PM doesn't make sense.

On the systems you are testing on.

Lorenzo

> > The question then is what cluster/package/system states
> > are allowed for a given CPU idle state, to understand
> > what idle states can be actually entered at any hierarchy
> > level given the choice made for the CPU idle state.
> >
> > In the case above, a CPU entering retention state should prevent
> > runtime PM selecting a cluster shutdown state; most likely firmware
> > would demote the request to cluster retention but still, we should
> > find a way to describe these dependencies.
> 
> See above.
> 
> [...]
> 
> Kind regards
> Uffe
Ulf Hansson July 18, 2019, 4:54 p.m. UTC | #4
On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
>
> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> > <lorenzo.pieralisi@arm.com> wrote:
> > >
> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> > > > support.
> > > >
> > > > To know for which idle states runtime PM reference counting is needed,
> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > > > the requested idle state index and then act accordingly.
> > >
> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> > > entry.
> >
> > If the CPU idle governor did select the CPU retention for the CPU, it
> > was probably because the target residency for the CPU shutdown state
> > could not be met.
>
> The kernel does not know what those cpu states represent, so, this is an
> assumption you are making and it must be made clear that this code works
> as long as your assumption is valid.
>
> If eg a "cluster" retention state has lower target_residency than
> the deepest CPU idle state this assumption is wrong.

Good point, you are right. I try to find a place to document this assumption.

>
> And CPUidle and genPD governor decisions are not synced anyway so,
> again, this is an assumption, not a certainty.
>
> > In this case, there is no point in allowing any other deeper idle
> > states for cluster/package/system, since those have even greater
> > residencies, hence calling runtime PM doesn't make sense.
>
> On the systems you are testing on.

So what you are saying typically means, that if all CPUs in the same
cluster have entered the CPU retention state, on some system the
cluster may also put into a cluster retention state (assuming the
target residency is met)?

Do you know of any systems that has these characteristics?

[...]

Kind regards
Uffe
Lina Iyer July 18, 2019, 5:41 p.m. UTC | #5
On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
>On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
><lorenzo.pieralisi@arm.com> wrote:
>>
>> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
>> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
>> > <lorenzo.pieralisi@arm.com> wrote:
>> > >
>> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
>> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
>> > > > CPU to be power managed through its PM domain, via deploying runtime PM
>> > > > support.
>> > > >
>> > > > To know for which idle states runtime PM reference counting is needed,
>> > > > let's store the index of deepest idle state for the CPU, in a per CPU
>> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
>> > > > the requested idle state index and then act accordingly.
>> > >
>> > > I do not see why a system with two CPU CPUidle states, say CPU retention
>> > > and CPU shutdown, should not be calling runtime PM on CPU retention
>> > > entry.
>> >
>> > If the CPU idle governor did select the CPU retention for the CPU, it
>> > was probably because the target residency for the CPU shutdown state
>> > could not be met.
>>
>> The kernel does not know what those cpu states represent, so, this is an
>> assumption you are making and it must be made clear that this code works
>> as long as your assumption is valid.
>>
>> If eg a "cluster" retention state has lower target_residency than
>> the deepest CPU idle state this assumption is wrong.
>
>Good point, you are right. I try to find a place to document this assumption.
>
>>
>> And CPUidle and genPD governor decisions are not synced anyway so,
>> again, this is an assumption, not a certainty.
>>
>> > In this case, there is no point in allowing any other deeper idle
>> > states for cluster/package/system, since those have even greater
>> > residencies, hence calling runtime PM doesn't make sense.
>>
>> On the systems you are testing on.
>
>So what you are saying typically means, that if all CPUs in the same
>cluster have entered the CPU retention state, on some system the
>cluster may also put into a cluster retention state (assuming the
>target residency is met)?
>
>Do you know of any systems that has these characteristics?
>
Many QCOM SoCs can do that. But with the hardware improving, the
power-performance benefits skew the results in favor of powering off
the cluster than keeping the CPU and cluster in retention.

Kevin H and I thought of this problem earlier on. But that is a second
level problem to solve and definitely to be thought of after we have the
support for the deepest states in the kernel. We left that out for a
later date. The idea would have been to setup the allowable state(s) in
the DT for CPU and cluster state definitions and have the genpd take
that into consideration when deciding the idle state for the domain.

Thanks,
Lina
Ulf Hansson July 18, 2019, 9:49 p.m. UTC | #6
On Thu, 18 Jul 2019 at 19:41, Lina Iyer <ilina@codeaurora.org> wrote:
>
> On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
> >On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
> ><lorenzo.pieralisi@arm.com> wrote:
> >>
> >> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> >> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> >> > <lorenzo.pieralisi@arm.com> wrote:
> >> > >
> >> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> >> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> >> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> >> > > > support.
> >> > > >
> >> > > > To know for which idle states runtime PM reference counting is needed,
> >> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> >> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> >> > > > the requested idle state index and then act accordingly.
> >> > >
> >> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> >> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> >> > > entry.
> >> >
> >> > If the CPU idle governor did select the CPU retention for the CPU, it
> >> > was probably because the target residency for the CPU shutdown state
> >> > could not be met.
> >>
> >> The kernel does not know what those cpu states represent, so, this is an
> >> assumption you are making and it must be made clear that this code works
> >> as long as your assumption is valid.
> >>
> >> If eg a "cluster" retention state has lower target_residency than
> >> the deepest CPU idle state this assumption is wrong.
> >
> >Good point, you are right. I try to find a place to document this assumption.
> >
> >>
> >> And CPUidle and genPD governor decisions are not synced anyway so,
> >> again, this is an assumption, not a certainty.
> >>
> >> > In this case, there is no point in allowing any other deeper idle
> >> > states for cluster/package/system, since those have even greater
> >> > residencies, hence calling runtime PM doesn't make sense.
> >>
> >> On the systems you are testing on.
> >
> >So what you are saying typically means, that if all CPUs in the same
> >cluster have entered the CPU retention state, on some system the
> >cluster may also put into a cluster retention state (assuming the
> >target residency is met)?
> >
> >Do you know of any systems that has these characteristics?
> >
> Many QCOM SoCs can do that. But with the hardware improving, the
> power-performance benefits skew the results in favor of powering off
> the cluster than keeping the CPU and cluster in retention.
>
> Kevin H and I thought of this problem earlier on. But that is a second
> level problem to solve and definitely to be thought of after we have the
> support for the deepest states in the kernel. We left that out for a
> later date. The idea would have been to setup the allowable state(s) in
> the DT for CPU and cluster state definitions and have the genpd take
> that into consideration when deciding the idle state for the domain.

Thanks for confirming.

This more or less means we need to improve the hierarchical support in
genpd to support more levels, such that it makes sense to have a genpd
governor assigned at more than one level. This doesn't work well
today. As I also have stated, this is on my todo list for genpd.

However, I also agree with your standpoint, that let's start simple to
enable the deepest state as a start with, then we can improve things
on top.

Kind regards
Uffe
Lorenzo Pieralisi July 19, 2019, 10:02 a.m. UTC | #7
On Thu, Jul 18, 2019 at 11:49:11PM +0200, Ulf Hansson wrote:
> On Thu, 18 Jul 2019 at 19:41, Lina Iyer <ilina@codeaurora.org> wrote:
> >
> > On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
> > >On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
> > ><lorenzo.pieralisi@arm.com> wrote:
> > >>
> > >> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> > >> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> > >> > <lorenzo.pieralisi@arm.com> wrote:
> > >> > >
> > >> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > >> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > >> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> > >> > > > support.
> > >> > > >
> > >> > > > To know for which idle states runtime PM reference counting is needed,
> > >> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> > >> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > >> > > > the requested idle state index and then act accordingly.
> > >> > >
> > >> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> > >> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> > >> > > entry.
> > >> >
> > >> > If the CPU idle governor did select the CPU retention for the CPU, it
> > >> > was probably because the target residency for the CPU shutdown state
> > >> > could not be met.
> > >>
> > >> The kernel does not know what those cpu states represent, so, this is an
> > >> assumption you are making and it must be made clear that this code works
> > >> as long as your assumption is valid.
> > >>
> > >> If eg a "cluster" retention state has lower target_residency than
> > >> the deepest CPU idle state this assumption is wrong.
> > >
> > >Good point, you are right. I try to find a place to document this assumption.
> > >
> > >>
> > >> And CPUidle and genPD governor decisions are not synced anyway so,
> > >> again, this is an assumption, not a certainty.
> > >>
> > >> > In this case, there is no point in allowing any other deeper idle
> > >> > states for cluster/package/system, since those have even greater
> > >> > residencies, hence calling runtime PM doesn't make sense.
> > >>
> > >> On the systems you are testing on.
> > >
> > >So what you are saying typically means, that if all CPUs in the same
> > >cluster have entered the CPU retention state, on some system the
> > >cluster may also put into a cluster retention state (assuming the
> > >target residency is met)?
> > >
> > >Do you know of any systems that has these characteristics?
> > >
> > Many QCOM SoCs can do that. But with the hardware improving, the
> > power-performance benefits skew the results in favor of powering off
> > the cluster than keeping the CPU and cluster in retention.
> >
> > Kevin H and I thought of this problem earlier on. But that is a second
> > level problem to solve and definitely to be thought of after we have the
> > support for the deepest states in the kernel. We left that out for a
> > later date. The idea would have been to setup the allowable state(s) in
> > the DT for CPU and cluster state definitions and have the genpd take
> > that into consideration when deciding the idle state for the domain.
> 
> Thanks for confirming.
> 
> This more or less means we need to improve the hierarchical support in
> genpd to support more levels, such that it makes sense to have a genpd
> governor assigned at more than one level. This doesn't work well
> today. As I also have stated, this is on my todo list for genpd.
> 
> However, I also agree with your standpoint, that let's start simple to
> enable the deepest state as a start with, then we can improve things
> on top.

How to solve this in the kernel I don't know but please do make sure
that the DT bindings allow you to describe what's needed, once they are
merged you won't be able to change them and I won't bodge the code to
make things fit, so if anything let's focus on getting them right as a
matter of priority to get this done please.

Thanks,
Lorenzo
diff mbox series

Patch

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 54e23d4ed0ea..2c4157d3a616 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -20,6 +20,7 @@ 
 #include <linux/linkage.h>
 #include <linux/of.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/printk.h>
 #include <linux/psci.h>
 #include <linux/reboot.h>
@@ -298,6 +299,7 @@  static int __init psci_features(u32 psci_func_id)
 
 struct psci_cpuidle_data {
 	u32 *psci_states;
+	u32 rpm_state_id;
 	struct device *dev;
 };
 
@@ -385,6 +387,7 @@  static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
 			goto free_mem;
 
 		data->dev = dev;
+		data->rpm_state_id = drv->state_count - 1;
 	}
 
 	/* Idle states parsed correctly, store them in the per-cpu struct. */
@@ -481,8 +484,11 @@  static int psci_suspend_finisher(unsigned long index)
 int psci_cpu_suspend_enter(unsigned long index)
 {
 	int ret;
-	u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
-	u32 composite_state = state[index - 1] | psci_get_domain_state();
+	struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
+	u32 *states = data->psci_states;
+	struct device *dev = data->dev;
+	bool runtime_pm = (dev && data->rpm_state_id == index);
+	u32 composite_state;
 
 	/*
 	 * idle state index 0 corresponds to wfi, should never be called
@@ -491,11 +497,23 @@  int psci_cpu_suspend_enter(unsigned long index)
 	if (WARN_ON_ONCE(!index))
 		return -EINVAL;
 
+	/*
+	 * Do runtime PM if we are using the hierarchical CPU toplogy, but only
+	 * when cpuidle have selected the deepest idle state for the CPU.
+	 */
+	if (runtime_pm)
+		pm_runtime_put_sync_suspend(dev);
+
+	composite_state = states[index - 1] | psci_get_domain_state();
+
 	if (!psci_power_state_loses_context(composite_state))
 		ret = psci_ops.cpu_suspend(composite_state, 0);
 	else
 		ret = cpu_suspend(index, psci_suspend_finisher);
 
+	if (runtime_pm)
+		pm_runtime_get_sync(dev);
+
 	/* Clear the domain state to start fresh when back from idle. */
 	psci_set_domain_state(0);