[v8,09/26] kernel/cpu_pm: Manage runtime PM in the idle path for CPUs
diff mbox

Message ID 20180620172226.15012-10-ulf.hansson@linaro.org
State Changes Requested, archived
Headers show

Commit Message

Ulf Hansson June 20, 2018, 5:22 p.m. UTC
To allow CPUs being power managed by PM domains, let's deploy support for
runtime PM for the CPU's corresponding struct device.

More precisely, at the point when the CPU is about to enter an idle state,
decrease the runtime PM usage count for its corresponding struct device,
via calling pm_runtime_put_sync_suspend(). Then, at the point when the CPU
resumes from idle, let's increase the runtime PM usage count, via calling
pm_runtime_get_sync().

Cc: Lina Iyer <ilina@codeaurora.org>
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 kernel/cpu_pm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Rafael J. Wysocki July 18, 2018, 10:11 a.m. UTC | #1
On Wednesday, June 20, 2018 7:22:09 PM CEST Ulf Hansson wrote:
> To allow CPUs being power managed by PM domains, let's deploy support for
> runtime PM for the CPU's corresponding struct device.
> 
> More precisely, at the point when the CPU is about to enter an idle state,
> decrease the runtime PM usage count for its corresponding struct device,
> via calling pm_runtime_put_sync_suspend(). Then, at the point when the CPU
> resumes from idle, let's increase the runtime PM usage count, via calling
> pm_runtime_get_sync().
> 
> Cc: Lina Iyer <ilina@codeaurora.org>
> Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

I finally got to this one, sorry for the huge delay.

Let me confirm that I understand the code flow correctly.

> ---
>  kernel/cpu_pm.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
> index 67b02e138a47..492d4a83dca0 100644
> --- a/kernel/cpu_pm.c
> +++ b/kernel/cpu_pm.c
> @@ -16,9 +16,11 @@
>   */
>  
>  #include <linux/kernel.h>
> +#include <linux/cpu.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/module.h>
>  #include <linux/notifier.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/spinlock.h>
>  #include <linux/syscore_ops.h>
>  
> @@ -91,6 +93,7 @@ int cpu_pm_enter(void)

This is called from a cpuidle driver's ->enter callback for the target state
selected by the idle governor ->

>  {
>  	int nr_calls;
>  	int ret = 0;
> +	struct device *dev = get_cpu_device(smp_processor_id());
>  
>  	ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
>  	if (ret)
> @@ -100,6 +103,9 @@ int cpu_pm_enter(void)
>  		 */
>  		cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
>  
> +	if (!ret && dev && dev->pm_domain)
> +		pm_runtime_put_sync_suspend(dev);

-> so this is going to invoke genpd_runtime_suspend() if the usage
counter of dev is 0.

That will cause cpu_power_down_ok() to be called (because this is
a CPU domain) and that will walk the domain cpumask and compute the
estimated idle duration as the minimum of tick_nohz_get_next_wakeup()
values over the CPUs in that cpumask.  [Note that the weight of the
cpumask must be seriously limited for that to actually work, as this
happens in the idle path.]  Next, it will return "true" if it can
find a domain state with residency within the estimated idle
duration.  [Note that this sort of overlaps with the idle governor's
job.]

Next, __genpd_runtime_suspend() will be invoked to run the device-specific
callback if any [Note that this has to be suitable for the idle path if
present.] and genpd_stop_dev() runs (which, again, may invoke a callback)
and genpd_power_off() runs under the domain lock (which must be a spinlock
then).

> +
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(cpu_pm_enter);
> @@ -118,6 +124,11 @@ EXPORT_SYMBOL_GPL(cpu_pm_enter);
>   */
>  int cpu_pm_exit(void)
>  {
> +	struct device *dev = get_cpu_device(smp_processor_id());
> +
> +	if (dev && dev->pm_domain)
> +		pm_runtime_get_sync(dev);
> +
>  	return cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
>  }
>  EXPORT_SYMBOL_GPL(cpu_pm_exit);
> 

And this is called on wakeup when the cpuidle driver's ->enter callback
is about to return and it reverses the suspend flow (except that the
governor doesn't need to be called now).

Have I got that right?
Rafael J. Wysocki July 19, 2018, 10:12 a.m. UTC | #2
On Wednesday, July 18, 2018 12:11:06 PM CEST Rafael J. Wysocki wrote:
> On Wednesday, June 20, 2018 7:22:09 PM CEST Ulf Hansson wrote:
> > To allow CPUs being power managed by PM domains, let's deploy support for
> > runtime PM for the CPU's corresponding struct device.
> > 
> > More precisely, at the point when the CPU is about to enter an idle state,
> > decrease the runtime PM usage count for its corresponding struct device,
> > via calling pm_runtime_put_sync_suspend(). Then, at the point when the CPU
> > resumes from idle, let's increase the runtime PM usage count, via calling
> > pm_runtime_get_sync().
> > 
> > Cc: Lina Iyer <ilina@codeaurora.org>
> > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> 
> I finally got to this one, sorry for the huge delay.
> 
> Let me confirm that I understand the code flow correctly.
> 
> > ---
> >  kernel/cpu_pm.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
> > index 67b02e138a47..492d4a83dca0 100644
> > --- a/kernel/cpu_pm.c
> > +++ b/kernel/cpu_pm.c
> > @@ -16,9 +16,11 @@
> >   */
> >  
> >  #include <linux/kernel.h>
> > +#include <linux/cpu.h>
> >  #include <linux/cpu_pm.h>
> >  #include <linux/module.h>
> >  #include <linux/notifier.h>
> > +#include <linux/pm_runtime.h>
> >  #include <linux/spinlock.h>
> >  #include <linux/syscore_ops.h>
> >  
> > @@ -91,6 +93,7 @@ int cpu_pm_enter(void)
> 
> This is called from a cpuidle driver's ->enter callback for the target state
> selected by the idle governor ->
> 
> >  {
> >  	int nr_calls;
> >  	int ret = 0;
> > +	struct device *dev = get_cpu_device(smp_processor_id());
> >  
> >  	ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
> >  	if (ret)
> > @@ -100,6 +103,9 @@ int cpu_pm_enter(void)
> >  		 */
> >  		cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
> >  
> > +	if (!ret && dev && dev->pm_domain)
> > +		pm_runtime_put_sync_suspend(dev);
> 
> -> so this is going to invoke genpd_runtime_suspend() if the usage
> counter of dev is 0.
> 
> That will cause cpu_power_down_ok() to be called (because this is
> a CPU domain) and that will walk the domain cpumask and compute the
> estimated idle duration as the minimum of tick_nohz_get_next_wakeup()
> values over the CPUs in that cpumask.  [Note that the weight of the
> cpumask must be seriously limited for that to actually work, as this
> happens in the idle path.]  Next, it will return "true" if it can
> find a domain state with residency within the estimated idle
> duration.  [Note that this sort of overlaps with the idle governor's
> job.]
> 
> Next, __genpd_runtime_suspend() will be invoked to run the device-specific
> callback if any [Note that this has to be suitable for the idle path if
> present.] and genpd_stop_dev() runs (which, again, may invoke a callback)
> and genpd_power_off() runs under the domain lock (which must be a spinlock
> then).
> 
> > +
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(cpu_pm_enter);
> > @@ -118,6 +124,11 @@ EXPORT_SYMBOL_GPL(cpu_pm_enter);
> >   */
> >  int cpu_pm_exit(void)
> >  {
> > +	struct device *dev = get_cpu_device(smp_processor_id());
> > +
> > +	if (dev && dev->pm_domain)
> > +		pm_runtime_get_sync(dev);
> > +
> >  	return cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(cpu_pm_exit);
> > 
> 
> And this is called on wakeup when the cpuidle driver's ->enter callback
> is about to return and it reverses the suspend flow (except that the
> governor doesn't need to be called now).
> 
> Have I got that right?

Assuming that I have got that right, there are concerns, mostly regarding
patch [07/26], but I will reply to that directly.

The $subject patch is fine by me by itself, but it obviously depends on the
previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
particularly useful without the rest of the series.

As far as patches [10-26/26] go, I'd like to see some review comments and/or
tags from the people with vested interest in there, in particular from Daniel
on patch [12/26] and from Sudeep on the PSCI ones.

Thanks,
Rafael
Rafael J. Wysocki July 19, 2018, 10:39 a.m. UTC | #3
On Thursday, July 19, 2018 12:12:55 PM CEST Rafael J. Wysocki wrote:
> On Wednesday, July 18, 2018 12:11:06 PM CEST Rafael J. Wysocki wrote:
> > On Wednesday, June 20, 2018 7:22:09 PM CEST Ulf Hansson wrote:
> > > To allow CPUs being power managed by PM domains, let's deploy support for
> > > runtime PM for the CPU's corresponding struct device.
> > > 
> > > More precisely, at the point when the CPU is about to enter an idle state,
> > > decrease the runtime PM usage count for its corresponding struct device,
> > > via calling pm_runtime_put_sync_suspend(). Then, at the point when the CPU
> > > resumes from idle, let's increase the runtime PM usage count, via calling
> > > pm_runtime_get_sync().
> > > 
> > > Cc: Lina Iyer <ilina@codeaurora.org>
> > > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > 
> > I finally got to this one, sorry for the huge delay.
> > 
> > Let me confirm that I understand the code flow correctly.
> > 
> > > ---
> > >  kernel/cpu_pm.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
> > > index 67b02e138a47..492d4a83dca0 100644
> > > --- a/kernel/cpu_pm.c
> > > +++ b/kernel/cpu_pm.c
> > > @@ -16,9 +16,11 @@
> > >   */
> > >  
> > >  #include <linux/kernel.h>
> > > +#include <linux/cpu.h>
> > >  #include <linux/cpu_pm.h>
> > >  #include <linux/module.h>
> > >  #include <linux/notifier.h>
> > > +#include <linux/pm_runtime.h>
> > >  #include <linux/spinlock.h>
> > >  #include <linux/syscore_ops.h>
> > >  
> > > @@ -91,6 +93,7 @@ int cpu_pm_enter(void)
> > 
> > This is called from a cpuidle driver's ->enter callback for the target state
> > selected by the idle governor ->
> > 
> > >  {
> > >  	int nr_calls;
> > >  	int ret = 0;
> > > +	struct device *dev = get_cpu_device(smp_processor_id());
> > >  
> > >  	ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
> > >  	if (ret)
> > > @@ -100,6 +103,9 @@ int cpu_pm_enter(void)
> > >  		 */
> > >  		cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
> > >  
> > > +	if (!ret && dev && dev->pm_domain)
> > > +		pm_runtime_put_sync_suspend(dev);
> > 
> > -> so this is going to invoke genpd_runtime_suspend() if the usage
> > counter of dev is 0.
> > 
> > That will cause cpu_power_down_ok() to be called (because this is
> > a CPU domain) and that will walk the domain cpumask and compute the
> > estimated idle duration as the minimum of tick_nohz_get_next_wakeup()
> > values over the CPUs in that cpumask.  [Note that the weight of the
> > cpumask must be seriously limited for that to actually work, as this
> > happens in the idle path.]  Next, it will return "true" if it can
> > find a domain state with residency within the estimated idle
> > duration.  [Note that this sort of overlaps with the idle governor's
> > job.]
> > 
> > Next, __genpd_runtime_suspend() will be invoked to run the device-specific
> > callback if any [Note that this has to be suitable for the idle path if
> > present.] and genpd_stop_dev() runs (which, again, may invoke a callback)
> > and genpd_power_off() runs under the domain lock (which must be a spinlock
> > then).
> > 
> > > +
> > >  	return ret;
> > >  }
> > >  EXPORT_SYMBOL_GPL(cpu_pm_enter);
> > > @@ -118,6 +124,11 @@ EXPORT_SYMBOL_GPL(cpu_pm_enter);
> > >   */
> > >  int cpu_pm_exit(void)
> > >  {
> > > +	struct device *dev = get_cpu_device(smp_processor_id());
> > > +
> > > +	if (dev && dev->pm_domain)
> > > +		pm_runtime_get_sync(dev);
> > > +
> > >  	return cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
> > >  }
> > >  EXPORT_SYMBOL_GPL(cpu_pm_exit);
> > > 
> > 
> > And this is called on wakeup when the cpuidle driver's ->enter callback
> > is about to return and it reverses the suspend flow (except that the
> > governor doesn't need to be called now).
> > 
> > Have I got that right?
> 
> Assuming that I have got that right, there are concerns, mostly regarding
> patch [07/26], but I will reply to that directly.

Well, I haven't got that right, so never mind.

There are a few minor things to address, but apart from that the general
genpd patches look ready.

> The $subject patch is fine by me by itself, but it obviously depends on the
> previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
> particularly useful without the rest of the series.
> 
> As far as patches [10-26/26] go, I'd like to see some review comments and/or
> tags from the people with vested interest in there, in particular from Daniel
> on patch [12/26] and from Sudeep on the PSCI ones.

But this still holds.

Thanks,
Rafael
Ulf Hansson Aug. 3, 2018, 11:42 a.m. UTC | #4
[...]

>>
>> Assuming that I have got that right, there are concerns, mostly regarding
>> patch [07/26], but I will reply to that directly.
>
> Well, I haven't got that right, so never mind.
>
> There are a few minor things to address, but apart from that the general
> genpd patches look ready.

Alright, thanks!

I will re-spin the series and post a new version once 4.19 rc1 is out.
Hopefully we can queue it up early in next cycle to get it tested in
next for a while.

>
>> The $subject patch is fine by me by itself, but it obviously depends on the
>> previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
>> particularly useful without the rest of the series.
>>
>> As far as patches [10-26/26] go, I'd like to see some review comments and/or
>> tags from the people with vested interest in there, in particular from Daniel
>> on patch [12/26] and from Sudeep on the PSCI ones.
>
> But this still holds.

Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
on patch 12.

In regards to the rest of the series, some of the PSCI/ARM changes
have been reviewed by Mark Rutland, however several changes have not
been acked.

On the other hand, one can also interpret the long silence in regards
to PSCI/ARM changes as they are good to go. :-)

Kind regards
Uffe
Rafael J. Wysocki Aug. 6, 2018, 9:37 a.m. UTC | #5
On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> [...]
>
>>>
>>> Assuming that I have got that right, there are concerns, mostly regarding
>>> patch [07/26], but I will reply to that directly.
>>
>> Well, I haven't got that right, so never mind.
>>
>> There are a few minor things to address, but apart from that the general
>> genpd patches look ready.
>
> Alright, thanks!
>
> I will re-spin the series and post a new version once 4.19 rc1 is out.
> Hopefully we can queue it up early in next cycle to get it tested in
> next for a while.
>
>>
>>> The $subject patch is fine by me by itself, but it obviously depends on the
>>> previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
>>> particularly useful without the rest of the series.
>>>
>>> As far as patches [10-26/26] go, I'd like to see some review comments and/or
>>> tags from the people with vested interest in there, in particular from Daniel
>>> on patch [12/26] and from Sudeep on the PSCI ones.
>>
>> But this still holds.
>
> Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
> on patch 12.
>
> In regards to the rest of the series, some of the PSCI/ARM changes
> have been reviewed by Mark Rutland, however several changes have not
> been acked.
>
> On the other hand, one can also interpret the long silence in regards
> to PSCI/ARM changes as they are good to go. :-)

Well, in that case giving an ACK to them should not be an issue for
the people with a vested interest I suppose.
Lorenzo Pieralisi Aug. 8, 2018, 10:56 a.m. UTC | #6
On Mon, Aug 06, 2018 at 11:37:55AM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > [...]
> >
> >>>
> >>> Assuming that I have got that right, there are concerns, mostly regarding
> >>> patch [07/26], but I will reply to that directly.
> >>
> >> Well, I haven't got that right, so never mind.
> >>
> >> There are a few minor things to address, but apart from that the general
> >> genpd patches look ready.
> >
> > Alright, thanks!
> >
> > I will re-spin the series and post a new version once 4.19 rc1 is out.
> > Hopefully we can queue it up early in next cycle to get it tested in
> > next for a while.
> >
> >>
> >>> The $subject patch is fine by me by itself, but it obviously depends on the
> >>> previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
> >>> particularly useful without the rest of the series.
> >>>
> >>> As far as patches [10-26/26] go, I'd like to see some review comments and/or
> >>> tags from the people with vested interest in there, in particular from Daniel
> >>> on patch [12/26] and from Sudeep on the PSCI ones.
> >>
> >> But this still holds.
> >
> > Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
> > on patch 12.
> >
> > In regards to the rest of the series, some of the PSCI/ARM changes
> > have been reviewed by Mark Rutland, however several changes have not
> > been acked.
> >
> > On the other hand, one can also interpret the long silence in regards
> > to PSCI/ARM changes as they are good to go. :-)
> 
> Well, in that case giving an ACK to them should not be an issue for
> the people with a vested interest I suppose.

Apologies to everyone for the delay in replying.

Side note: cpu_pm_enter()/exit() are also called through syscore ops in
s2RAM/IDLE, you know that but I just wanted to mention it to compound
the discussion.

As for PSCI patches I do not personally think PSCI OSI enablement is
beneficial (and my position has always been the same since PSCI OSI was
added to the specification, I am not even talking about this patchset)
and Arm Trusted Firmware does not currently support it for the same
reason.

We (if Mark and Sudeep agree) will enable PSCI OSI if and when we have a
definitive and constructive answer to *why* we have to do that that is
not a dogmatic "the kernel knows better" but rather a comprehensive
power benchmark evaluation - I thought that was the agreement reached
at OSPM but apparently I was mistaken.

As a reminder - PSCI firmware implementation has to have state machines
and locking to guarantee safe power down operations (and to flush caches
only if necessary - which requires cpu masks for power domains) and
that's true whether we enable PSCI OSI or not, the coordination logic
must be in firmware/hardware _already_ - the cpumasks, the power domain
topology, etc.

I agree with the power-domains representation of idle-states (since
that's the correct HW description) and I thought and hoped that runtime
PM could help _remove_ the CPU PM notifiers (by making the notifiers
callbacks a runtime PM one) even though I have to say that's quite
complex, given that only few (ie one instance :)) CPU PM notifiers
callbacks are backed by a struct device (eg an ARM PMU is a device but
for instance the GIC is not a device so its save/restore code I am not
sure it can be implemented with runtime PM callbacks).

Lorenzo
Lina Iyer Aug. 8, 2018, 6:02 p.m. UTC | #7
On Wed, Aug 08 2018 at 04:56 -0600, Lorenzo Pieralisi wrote:
>On Mon, Aug 06, 2018 at 11:37:55AM +0200, Rafael J. Wysocki wrote:
>> On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> > [...]
>> >
>> >>>
>> >>> Assuming that I have got that right, there are concerns, mostly regarding
>> >>> patch [07/26], but I will reply to that directly.
>> >>
>> >> Well, I haven't got that right, so never mind.
>> >>
>> >> There are a few minor things to address, but apart from that the general
>> >> genpd patches look ready.
>> >
>> > Alright, thanks!
>> >
>> > I will re-spin the series and post a new version once 4.19 rc1 is out.
>> > Hopefully we can queue it up early in next cycle to get it tested in
>> > next for a while.
>> >
>> >>
>> >>> The $subject patch is fine by me by itself, but it obviously depends on the
>> >>> previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
>> >>> particularly useful without the rest of the series.
>> >>>
>> >>> As far as patches [10-26/26] go, I'd like to see some review comments and/or
>> >>> tags from the people with vested interest in there, in particular from Daniel
>> >>> on patch [12/26] and from Sudeep on the PSCI ones.
>> >>
>> >> But this still holds.
>> >
>> > Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
>> > on patch 12.
>> >
>> > In regards to the rest of the series, some of the PSCI/ARM changes
>> > have been reviewed by Mark Rutland, however several changes have not
>> > been acked.
>> >
>> > On the other hand, one can also interpret the long silence in regards
>> > to PSCI/ARM changes as they are good to go. :-)
>>
>> Well, in that case giving an ACK to them should not be an issue for
>> the people with a vested interest I suppose.
>
>Apologies to everyone for the delay in replying.
>
>Side note: cpu_pm_enter()/exit() are also called through syscore ops in
>s2RAM/IDLE, you know that but I just wanted to mention it to compound
>the discussion.
>
>As for PSCI patches I do not personally think PSCI OSI enablement is
>beneficial (and my position has always been the same since PSCI OSI was
>added to the specification, I am not even talking about this patchset)
>and Arm Trusted Firmware does not currently support it for the same
>reason.
>
>We (if Mark and Sudeep agree) will enable PSCI OSI if and when we have a
>definitive and constructive answer to *why* we have to do that that is
>not a dogmatic "the kernel knows better" but rather a comprehensive
>power benchmark evaluation - I thought that was the agreement reached
>at OSPM but apparently I was mistaken.
>
I will not speak to any comparison of benchmarks between OSI and PC.
AFAIK, there are no platforms supporting both.

But, the OSI feature is critical for QCOM mobile platforms. The
last man activities during cpuidle save quite a lot of power. Powering
off the clocks, busses, regulators and even the oscillator is very
important to have a reasonable battery life when using the phone.
Platform coordinated approach falls quite short of the needs of a
powerful processor with a desired battery efficiency.

-- Lina

>As a reminder - PSCI firmware implementation has to have state machines
>and locking to guarantee safe power down operations (and to flush caches
>only if necessary - which requires cpu masks for power domains) and
>that's true whether we enable PSCI OSI or not, the coordination logic
>must be in firmware/hardware _already_ - the cpumasks, the power domain
>topology, etc.
>
>I agree with the power-domains representation of idle-states (since
>that's the correct HW description) and I thought and hoped that runtime
>PM could help _remove_ the CPU PM notifiers (by making the notifiers
>callbacks a runtime PM one) even though I have to say that's quite
>complex, given that only few (ie one instance :)) CPU PM notifiers
>callbacks are backed by a struct device (eg an ARM PMU is a device but
>for instance the GIC is not a device so its save/restore code I am not
>sure it can be implemented with runtime PM callbacks).
>
>Lorenzo
Rafael J. Wysocki Aug. 9, 2018, 8:16 a.m. UTC | #8
On Wed, Aug 8, 2018 at 8:02 PM, Lina Iyer <ilina@codeaurora.org> wrote:
> On Wed, Aug 08 2018 at 04:56 -0600, Lorenzo Pieralisi wrote:
>>
>> On Mon, Aug 06, 2018 at 11:37:55AM +0200, Rafael J. Wysocki wrote:
>>>
>>> On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org>
>>> wrote:
>>> > [...]
>>> >
>>> >>>
>>> >>> Assuming that I have got that right, there are concerns, mostly
>>> >>> regarding
>>> >>> patch [07/26], but I will reply to that directly.
>>> >>
>>> >> Well, I haven't got that right, so never mind.
>>> >>
>>> >> There are a few minor things to address, but apart from that the
>>> >> general
>>> >> genpd patches look ready.
>>> >
>>> > Alright, thanks!
>>> >
>>> > I will re-spin the series and post a new version once 4.19 rc1 is out.
>>> > Hopefully we can queue it up early in next cycle to get it tested in
>>> > next for a while.
>>> >
>>> >>
>>> >>> The $subject patch is fine by me by itself, but it obviously depends
>>> >>> on the
>>> >>> previous ones.  Patches [01-02/26] are fine too, but they don't seem
>>> >>> to be
>>> >>> particularly useful without the rest of the series.
>>> >>>
>>> >>> As far as patches [10-26/26] go, I'd like to see some review comments
>>> >>> and/or
>>> >>> tags from the people with vested interest in there, in particular
>>> >>> from Daniel
>>> >>> on patch [12/26] and from Sudeep on the PSCI ones.
>>> >>
>>> >> But this still holds.
>>> >
>>> > Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
>>> > on patch 12.
>>> >
>>> > In regards to the rest of the series, some of the PSCI/ARM changes
>>> > have been reviewed by Mark Rutland, however several changes have not
>>> > been acked.
>>> >
>>> > On the other hand, one can also interpret the long silence in regards
>>> > to PSCI/ARM changes as they are good to go. :-)
>>>
>>> Well, in that case giving an ACK to them should not be an issue for
>>> the people with a vested interest I suppose.
>>
>>
>> Apologies to everyone for the delay in replying.
>>
>> Side note: cpu_pm_enter()/exit() are also called through syscore ops in
>> s2RAM/IDLE, you know that but I just wanted to mention it to compound
>> the discussion.
>>
>> As for PSCI patches I do not personally think PSCI OSI enablement is
>> beneficial (and my position has always been the same since PSCI OSI was
>> added to the specification, I am not even talking about this patchset)
>> and Arm Trusted Firmware does not currently support it for the same
>> reason.
>>
>> We (if Mark and Sudeep agree) will enable PSCI OSI if and when we have a
>> definitive and constructive answer to *why* we have to do that that is
>> not a dogmatic "the kernel knows better" but rather a comprehensive
>> power benchmark evaluation - I thought that was the agreement reached
>> at OSPM but apparently I was mistaken.
>>
> I will not speak to any comparison of benchmarks between OSI and PC.
> AFAIK, there are no platforms supporting both.
>
> But, the OSI feature is critical for QCOM mobile platforms. The
> last man activities during cpuidle save quite a lot of power. Powering
> off the clocks, busses, regulators and even the oscillator is very
> important to have a reasonable battery life when using the phone.
> Platform coordinated approach falls quite short of the needs of a
> powerful processor with a desired battery efficiency.

Even so, you still need firmware (or hardware) to do the right thing
in the concurrent wakeup via an edge-triggered interrupt case AFAICS.
That is, you need the domain to be prevented from being turned off if
one of the CPUs in it has just been woken up and the interrupt is
still pending.
Sudeep Holla Aug. 9, 2018, 9:58 a.m. UTC | #9
On Wed, Aug 08, 2018 at 12:02:48PM -0600, Lina Iyer wrote:

[...]

> I will not speak to any comparison of benchmarks between OSI and PC.
> AFAIK, there are no platforms supporting both.
>

That's the fundamental issue here. So we have never ever done a proper
comparison.

> But, the OSI feature is critical for QCOM mobile platforms. The
> last man activities during cpuidle save quite a lot of power. Powering
> off the clocks, busses, regulators and even the oscillator is very
> important to have a reasonable battery life when using the phone.
> Platform coordinated approach falls quite short of the needs of a
> powerful processor with a desired battery efficiency.
>

As mentioned above, without the actual comparison it's hard to justify
that. While there are corner cases where OSI is able to make better
judgement, may be we can add ways to deal with that in the firmware
with PC mode, have we explored that before adding complexity to the OSPM ?
Since the firmware complexity with OSI remains same as PC mode, isn't it
worth checking if the corner case we are talking here can be handled in
the firmware.

--
Regards,
Sudeep
Lorenzo Pieralisi Aug. 9, 2018, 10:25 a.m. UTC | #10
On Wed, Aug 08, 2018 at 12:02:48PM -0600, Lina Iyer wrote:
> On Wed, Aug 08 2018 at 04:56 -0600, Lorenzo Pieralisi wrote:
> >On Mon, Aug 06, 2018 at 11:37:55AM +0200, Rafael J. Wysocki wrote:
> >>On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >>> [...]
> >>>
> >>>>>
> >>>>> Assuming that I have got that right, there are concerns, mostly regarding
> >>>>> patch [07/26], but I will reply to that directly.
> >>>>
> >>>> Well, I haven't got that right, so never mind.
> >>>>
> >>>> There are a few minor things to address, but apart from that the general
> >>>> genpd patches look ready.
> >>>
> >>> Alright, thanks!
> >>>
> >>> I will re-spin the series and post a new version once 4.19 rc1 is out.
> >>> Hopefully we can queue it up early in next cycle to get it tested in
> >>> next for a while.
> >>>
> >>>>
> >>>>> The $subject patch is fine by me by itself, but it obviously depends on the
> >>>>> previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
> >>>>> particularly useful without the rest of the series.
> >>>>>
> >>>>> As far as patches [10-26/26] go, I'd like to see some review comments and/or
> >>>>> tags from the people with vested interest in there, in particular from Daniel
> >>>>> on patch [12/26] and from Sudeep on the PSCI ones.
> >>>>
> >>>> But this still holds.
> >>>
> >>> Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
> >>> on patch 12.
> >>>
> >>> In regards to the rest of the series, some of the PSCI/ARM changes
> >>> have been reviewed by Mark Rutland, however several changes have not
> >>> been acked.
> >>>
> >>> On the other hand, one can also interpret the long silence in regards
> >>> to PSCI/ARM changes as they are good to go. :-)
> >>
> >>Well, in that case giving an ACK to them should not be an issue for
> >>the people with a vested interest I suppose.
> >
> >Apologies to everyone for the delay in replying.
> >
> >Side note: cpu_pm_enter()/exit() are also called through syscore ops in
> >s2RAM/IDLE, you know that but I just wanted to mention it to compound
> >the discussion.
> >
> >As for PSCI patches I do not personally think PSCI OSI enablement is
> >beneficial (and my position has always been the same since PSCI OSI was
> >added to the specification, I am not even talking about this patchset)
> >and Arm Trusted Firmware does not currently support it for the same
> >reason.
> >
> >We (if Mark and Sudeep agree) will enable PSCI OSI if and when we have a
> >definitive and constructive answer to *why* we have to do that that is
> >not a dogmatic "the kernel knows better" but rather a comprehensive
> >power benchmark evaluation - I thought that was the agreement reached
> >at OSPM but apparently I was mistaken.
> >
> I will not speak to any comparison of benchmarks between OSI and PC.
> AFAIK, there are no platforms supporting both.

PSCI specifications, 5.20.1:

"The platform will boot in platform-coordinated mode."

So all platforms implementing OSI have to support both.

> But, the OSI feature is critical for QCOM mobile platforms. The
> last man activities during cpuidle save quite a lot of power.

What I expressed above was that, in PSCI based systems (OSI or PC
alike), it is up to firmware/hardware to detect "the last man" not
the kernel.

I need to understand what you mean by "last man activities" to
provide feedback here.

> Powering off the clocks, busses, regulators and even the oscillator is
> very important to have a reasonable battery life when using the phone.
> Platform coordinated approach falls quite short of the needs of a
> powerful processor with a desired battery efficiency.

I am sorry but if you want us to merge PSCI patches in this series you
will have to back the claim above with a detailed technical explanation
of *why* platform-coordination falls short of QCOM (or whoever else)
needs wrt PSCI OSI.

Thanks,
Lorenzo
Lina Iyer Aug. 10, 2018, 8:18 p.m. UTC | #11
On Thu, Aug 09 2018 at 04:25 -0600, Lorenzo Pieralisi wrote:
>On Wed, Aug 08, 2018 at 12:02:48PM -0600, Lina Iyer wrote:
>> On Wed, Aug 08 2018 at 04:56 -0600, Lorenzo Pieralisi wrote:
>> >On Mon, Aug 06, 2018 at 11:37:55AM +0200, Rafael J. Wysocki wrote:
>> >>On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> >>> [...]
>> >>>
>> >>>>>
>> >>>>> Assuming that I have got that right, there are concerns, mostly regarding
>> >>>>> patch [07/26], but I will reply to that directly.
>> >>>>
>> >>>> Well, I haven't got that right, so never mind.
>> >>>>
>> >>>> There are a few minor things to address, but apart from that the general
>> >>>> genpd patches look ready.
>> >>>
>> >>> Alright, thanks!
>> >>>
>> >>> I will re-spin the series and post a new version once 4.19 rc1 is out.
>> >>> Hopefully we can queue it up early in next cycle to get it tested in
>> >>> next for a while.
>> >>>
>> >>>>
>> >>>>> The $subject patch is fine by me by itself, but it obviously depends on the
>> >>>>> previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
>> >>>>> particularly useful without the rest of the series.
>> >>>>>
>> >>>>> As far as patches [10-26/26] go, I'd like to see some review comments and/or
>> >>>>> tags from the people with vested interest in there, in particular from Daniel
>> >>>>> on patch [12/26] and from Sudeep on the PSCI ones.
>> >>>>
>> >>>> But this still holds.
>> >>>
>> >>> Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
>> >>> on patch 12.
>> >>>
>> >>> In regards to the rest of the series, some of the PSCI/ARM changes
>> >>> have been reviewed by Mark Rutland, however several changes have not
>> >>> been acked.
>> >>>
>> >>> On the other hand, one can also interpret the long silence in regards
>> >>> to PSCI/ARM changes as they are good to go. :-)
>> >>
>> >>Well, in that case giving an ACK to them should not be an issue for
>> >>the people with a vested interest I suppose.
>> >
>> >Apologies to everyone for the delay in replying.
>> >
>> >Side note: cpu_pm_enter()/exit() are also called through syscore ops in
>> >s2RAM/IDLE, you know that but I just wanted to mention it to compound
>> >the discussion.
>> >
>> >As for PSCI patches I do not personally think PSCI OSI enablement is
>> >beneficial (and my position has always been the same since PSCI OSI was
>> >added to the specification, I am not even talking about this patchset)
>> >and Arm Trusted Firmware does not currently support it for the same
>> >reason.
>> >
>> >We (if Mark and Sudeep agree) will enable PSCI OSI if and when we have a
>> >definitive and constructive answer to *why* we have to do that that is
>> >not a dogmatic "the kernel knows better" but rather a comprehensive
>> >power benchmark evaluation - I thought that was the agreement reached
>> >at OSPM but apparently I was mistaken.
>> >
>> I will not speak to any comparison of benchmarks between OSI and PC.
>> AFAIK, there are no platforms supporting both.
>
>PSCI specifications, 5.20.1:
>
>"The platform will boot in platform-coordinated mode."
>
>So all platforms implementing OSI have to support both.
>
I understand. But there are no actual platforms out there that support
both. QC platforms do not support Platform coordinated in the firmware.
The primary reason for not doing PC is that it did not fit the
requirements for all high level OSes running on the AP. Also, having
dead code in a firmware that also does secure aspects was not desirable
for QC platforms. That said, the decision to not do PC is beyond my pay
grade.

>> But, the OSI feature is critical for QCOM mobile platforms. The
>> last man activities during cpuidle save quite a lot of power.
>
>What I expressed above was that, in PSCI based systems (OSI or PC
>alike), it is up to firmware/hardware to detect "the last man" not
>the kernel.
>
>I need to understand what you mean by "last man activities" to
>provide feedback here.
>
When the last CPU goes down during deep sleep, the following would be
done
- Lower resource requirements for shared resources such as clocks,
  busses and regulators that were used by drivers in AP. These shared
  resources when not used by other processors in the SoC may be turned
  off and put in low power state by a remote processor. [1][2]
- Enable and setup wakeup capable interrupts on an always-on interrupt
  controller, so the GIC and the GPIO controllers may be put in low
  power state. [3][4]
- Write next known wakeup value to the timer, so the blocks that were
  powered off, may be brought back into operational before the wakeup.
  [4][5]

These are commonly done during suspend, but to achieve a good power
efficiency, we have to do this when all the CPUs are just executing CPU
idle. Also, they cannot be done from the firmware (because the data
required for all this is part of Linux). OSI plays a crucial role in
determining when to do all this.

>> Powering off the clocks, busses, regulators and even the oscillator is
>> very important to have a reasonable battery life when using the phone.
>> Platform coordinated approach falls quite short of the needs of a
>> powerful processor with a desired battery efficiency.
>
>I am sorry but if you want us to merge PSCI patches in this series you
>will have to back the claim above with a detailed technical explanation
>of *why* platform-coordination falls short of QCOM (or whoever else)
>needs wrt PSCI OSI.
These items above add much value to reduce latency in wakeup idle,
increase cache performance and increase days of use. Even if we had a
platform to test for platform coordinated, it would be hard to quantify
because of the inability to do low power state for resources and setting
up wakeup is not easily possible with platform coordinated. Not doing
it, would leave a lot of power efficiency and performance on the table.

Thanks,
Lina

[1]. https://lkml.org/lkml/2018/6/11/546
[2]. For older production code -
https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/soc/qcom/rpm-smd.c?h=LA.HB.1.1.5.c1
Line 1764.
[3]. https://lkml.org/lkml/2018/8/10/437
[4]. Older production code -
https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/soc/qcom/mpm-of.c?h=LA.HB.1.1.5.c1
[5]. https://lkml.org/lkml/2018/7/19/218
Lina Iyer Aug. 10, 2018, 8:36 p.m. UTC | #12
On Thu, Aug 09 2018 at 02:16 -0600, Rafael J. Wysocki wrote:
>On Wed, Aug 8, 2018 at 8:02 PM, Lina Iyer <ilina@codeaurora.org> wrote:
>> On Wed, Aug 08 2018 at 04:56 -0600, Lorenzo Pieralisi wrote:
>>>
>>> On Mon, Aug 06, 2018 at 11:37:55AM +0200, Rafael J. Wysocki wrote:
>>>>
>>>> On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org>
>>>> wrote:
>>>> > [...]
>>>> >
>>>> >>>
>>>> >>> Assuming that I have got that right, there are concerns, mostly
>>>> >>> regarding
>>>> >>> patch [07/26], but I will reply to that directly.
>>>> >>
>>>> >> Well, I haven't got that right, so never mind.
>>>> >>
>>>> >> There are a few minor things to address, but apart from that the
>>>> >> general
>>>> >> genpd patches look ready.
>>>> >
>>>> > Alright, thanks!
>>>> >
>>>> > I will re-spin the series and post a new version once 4.19 rc1 is out.
>>>> > Hopefully we can queue it up early in next cycle to get it tested in
>>>> > next for a while.
>>>> >
>>>> >>
>>>> >>> The $subject patch is fine by me by itself, but it obviously depends
>>>> >>> on the
>>>> >>> previous ones.  Patches [01-02/26] are fine too, but they don't seem
>>>> >>> to be
>>>> >>> particularly useful without the rest of the series.
>>>> >>>
>>>> >>> As far as patches [10-26/26] go, I'd like to see some review comments
>>>> >>> and/or
>>>> >>> tags from the people with vested interest in there, in particular
>>>> >>> from Daniel
>>>> >>> on patch [12/26] and from Sudeep on the PSCI ones.
>>>> >>
>>>> >> But this still holds.
>>>> >
>>>> > Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
>>>> > on patch 12.
>>>> >
>>>> > In regards to the rest of the series, some of the PSCI/ARM changes
>>>> > have been reviewed by Mark Rutland, however several changes have not
>>>> > been acked.
>>>> >
>>>> > On the other hand, one can also interpret the long silence in regards
>>>> > to PSCI/ARM changes as they are good to go. :-)
>>>>
>>>> Well, in that case giving an ACK to them should not be an issue for
>>>> the people with a vested interest I suppose.
>>>
>>>
>>> Apologies to everyone for the delay in replying.
>>>
>>> Side note: cpu_pm_enter()/exit() are also called through syscore ops in
>>> s2RAM/IDLE, you know that but I just wanted to mention it to compound
>>> the discussion.
>>>
>>> As for PSCI patches I do not personally think PSCI OSI enablement is
>>> beneficial (and my position has always been the same since PSCI OSI was
>>> added to the specification, I am not even talking about this patchset)
>>> and Arm Trusted Firmware does not currently support it for the same
>>> reason.
>>>
>>> We (if Mark and Sudeep agree) will enable PSCI OSI if and when we have a
>>> definitive and constructive answer to *why* we have to do that that is
>>> not a dogmatic "the kernel knows better" but rather a comprehensive
>>> power benchmark evaluation - I thought that was the agreement reached
>>> at OSPM but apparently I was mistaken.
>>>
>> I will not speak to any comparison of benchmarks between OSI and PC.
>> AFAIK, there are no platforms supporting both.
>>
>> But, the OSI feature is critical for QCOM mobile platforms. The
>> last man activities during cpuidle save quite a lot of power. Powering
>> off the clocks, busses, regulators and even the oscillator is very
>> important to have a reasonable battery life when using the phone.
>> Platform coordinated approach falls quite short of the needs of a
>> powerful processor with a desired battery efficiency.
>
>Even so, you still need firmware (or hardware) to do the right thing
>in the concurrent wakeup via an edge-triggered interrupt case AFAICS.
>That is, you need the domain to be prevented from being turned off if
>one of the CPUs in it has just been woken up and the interrupt is
>still pending.
Yes, that is true and we have been doing this on pretty much every QC
SoC there is, for CPU domains. Generally, there is a handshake of sorts
with the power domain controller when the core executes WFI. It
decrements the reference on the controller when going down and
increments when coming up. The controller is only turned off when the
reference count is 0 and is turned back on before the CPU is ready to
exit the WFI.

What we are doing here is hand the domain's ->power_off and ->power_on
over to the platform firmware, which needs to make sure the races are
handled correctly either in h/w or through mechanisms like MCPM or in
the firmware. I would consider what happens during the power on/off of
the domains beyond the realm of the genpd at least for CPU specific PM
domains.

Thanks,
Lina
Rafael J. Wysocki Aug. 12, 2018, 9:53 a.m. UTC | #13
On Fri, Aug 10, 2018 at 10:36 PM Lina Iyer <ilina@codeaurora.org> wrote:
>
> On Thu, Aug 09 2018 at 02:16 -0600, Rafael J. Wysocki wrote:
> >On Wed, Aug 8, 2018 at 8:02 PM, Lina Iyer <ilina@codeaurora.org> wrote:
> >> On Wed, Aug 08 2018 at 04:56 -0600, Lorenzo Pieralisi wrote:
> >>>
> >>> On Mon, Aug 06, 2018 at 11:37:55AM +0200, Rafael J. Wysocki wrote:
> >>>>
> >>>> On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org>
> >>>> wrote:
> >>>> > [...]
> >>>> >
> >>>> >>>
> >>>> >>> Assuming that I have got that right, there are concerns, mostly
> >>>> >>> regarding
> >>>> >>> patch [07/26], but I will reply to that directly.
> >>>> >>
> >>>> >> Well, I haven't got that right, so never mind.
> >>>> >>
> >>>> >> There are a few minor things to address, but apart from that the
> >>>> >> general
> >>>> >> genpd patches look ready.
> >>>> >
> >>>> > Alright, thanks!
> >>>> >
> >>>> > I will re-spin the series and post a new version once 4.19 rc1 is out.
> >>>> > Hopefully we can queue it up early in next cycle to get it tested in
> >>>> > next for a while.
> >>>> >
> >>>> >>
> >>>> >>> The $subject patch is fine by me by itself, but it obviously depends
> >>>> >>> on the
> >>>> >>> previous ones.  Patches [01-02/26] are fine too, but they don't seem
> >>>> >>> to be
> >>>> >>> particularly useful without the rest of the series.
> >>>> >>>
> >>>> >>> As far as patches [10-26/26] go, I'd like to see some review comments
> >>>> >>> and/or
> >>>> >>> tags from the people with vested interest in there, in particular
> >>>> >>> from Daniel
> >>>> >>> on patch [12/26] and from Sudeep on the PSCI ones.
> >>>> >>
> >>>> >> But this still holds.
> >>>> >
> >>>> > Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
> >>>> > on patch 12.
> >>>> >
> >>>> > In regards to the rest of the series, some of the PSCI/ARM changes
> >>>> > have been reviewed by Mark Rutland, however several changes have not
> >>>> > been acked.
> >>>> >
> >>>> > On the other hand, one can also interpret the long silence in regards
> >>>> > to PSCI/ARM changes as they are good to go. :-)
> >>>>
> >>>> Well, in that case giving an ACK to them should not be an issue for
> >>>> the people with a vested interest I suppose.
> >>>
> >>>
> >>> Apologies to everyone for the delay in replying.
> >>>
> >>> Side note: cpu_pm_enter()/exit() are also called through syscore ops in
> >>> s2RAM/IDLE, you know that but I just wanted to mention it to compound
> >>> the discussion.
> >>>
> >>> As for PSCI patches I do not personally think PSCI OSI enablement is
> >>> beneficial (and my position has always been the same since PSCI OSI was
> >>> added to the specification, I am not even talking about this patchset)
> >>> and Arm Trusted Firmware does not currently support it for the same
> >>> reason.
> >>>
> >>> We (if Mark and Sudeep agree) will enable PSCI OSI if and when we have a
> >>> definitive and constructive answer to *why* we have to do that that is
> >>> not a dogmatic "the kernel knows better" but rather a comprehensive
> >>> power benchmark evaluation - I thought that was the agreement reached
> >>> at OSPM but apparently I was mistaken.
> >>>
> >> I will not speak to any comparison of benchmarks between OSI and PC.
> >> AFAIK, there are no platforms supporting both.
> >>
> >> But, the OSI feature is critical for QCOM mobile platforms. The
> >> last man activities during cpuidle save quite a lot of power. Powering
> >> off the clocks, busses, regulators and even the oscillator is very
> >> important to have a reasonable battery life when using the phone.
> >> Platform coordinated approach falls quite short of the needs of a
> >> powerful processor with a desired battery efficiency.
> >
> >Even so, you still need firmware (or hardware) to do the right thing
> >in the concurrent wakeup via an edge-triggered interrupt case AFAICS.
> >That is, you need the domain to be prevented from being turned off if
> >one of the CPUs in it has just been woken up and the interrupt is
> >still pending.
> Yes, that is true and we have been doing this on pretty much every QC
> SoC there is, for CPU domains. Generally, there is a handshake of sorts
> with the power domain controller when the core executes WFI. It
> decrements the reference on the controller when going down and
> increments when coming up. The controller is only turned off when the
> reference count is 0 and is turned back on before the CPU is ready to
> exit the WFI.
>
> What we are doing here is hand the domain's ->power_off and ->power_on
> over to the platform firmware, which needs to make sure the races are
> handled correctly either in h/w or through mechanisms like MCPM or in
> the firmware. I would consider what happens during the power on/off of
> the domains beyond the realm of the genpd at least for CPU specific PM
> domains.

I see.

The dependency on this FW/HW behavior should be clearly documented,
though, preferably next to the code in question, or people will try to
use it on systems where this requirement is not met and will be
wondering what's wrong and/or complaining.
Lorenzo Pieralisi Aug. 15, 2018, 10:44 a.m. UTC | #14
On Fri, Aug 10, 2018 at 02:18:15PM -0600, Lina Iyer wrote:

[...]

> >>But, the OSI feature is critical for QCOM mobile platforms. The
> >>last man activities during cpuidle save quite a lot of power.
> >
> >What I expressed above was that, in PSCI based systems (OSI or PC
> >alike), it is up to firmware/hardware to detect "the last man" not
> >the kernel.
> >
> >I need to understand what you mean by "last man activities" to
> >provide feedback here.
> >
> When the last CPU goes down during deep sleep, the following would be
> done
> - Lower resource requirements for shared resources such as clocks,
>  busses and regulators that were used by drivers in AP. These shared
>  resources when not used by other processors in the SoC may be turned
>  off and put in low power state by a remote processor. [1][2]
> - Enable and setup wakeup capable interrupts on an always-on interrupt
>  controller, so the GIC and the GPIO controllers may be put in low
>  power state. [3][4]
> - Write next known wakeup value to the timer, so the blocks that were
>  powered off, may be brought back into operational before the wakeup.
>  [4][5]
> 
> These are commonly done during suspend, but to achieve a good power
> efficiency, we have to do this when all the CPUs are just executing CPU
> idle. Also, they cannot be done from the firmware (because the data
> required for all this is part of Linux). OSI plays a crucial role in
> determining when to do all this.

No it does not. It is the power domain cpumasks that allow this code to
make an educated guess on the last cpu running (the kernel), PSCI OSI is
not crucial at all (it is crucial in QC platforms because that's the
only mode supported but that's not a reason I accept as valid since it
does not comply with the PSCI specifications).

As I mentioned in another thread[1] the generic part of this
series may be applicable in a platform agnostic way to the
CPUidle framework, whether that's beneficial it has to be proven
and it is benchmark specific anyway.

Lorenzo

[1]: https://marc.info/?l=linux-pm&m=153382916513032&w=2
Ulf Hansson Aug. 24, 2018, 12:24 p.m. UTC | #15
Lorenzo, Sudeep, Mark

On 15 August 2018 at 12:44, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> On Fri, Aug 10, 2018 at 02:18:15PM -0600, Lina Iyer wrote:
>
> [...]
>
>> >>But, the OSI feature is critical for QCOM mobile platforms. The
>> >>last man activities during cpuidle save quite a lot of power.
>> >
>> >What I expressed above was that, in PSCI based systems (OSI or PC
>> >alike), it is up to firmware/hardware to detect "the last man" not
>> >the kernel.
>> >
>> >I need to understand what you mean by "last man activities" to
>> >provide feedback here.
>> >
>> When the last CPU goes down during deep sleep, the following would be
>> done
>> - Lower resource requirements for shared resources such as clocks,
>>  busses and regulators that were used by drivers in AP. These shared
>>  resources when not used by other processors in the SoC may be turned
>>  off and put in low power state by a remote processor. [1][2]
>> - Enable and setup wakeup capable interrupts on an always-on interrupt
>>  controller, so the GIC and the GPIO controllers may be put in low
>>  power state. [3][4]
>> - Write next known wakeup value to the timer, so the blocks that were
>>  powered off, may be brought back into operational before the wakeup.
>>  [4][5]
>>
>> These are commonly done during suspend, but to achieve a good power
>> efficiency, we have to do this when all the CPUs are just executing CPU
>> idle. Also, they cannot be done from the firmware (because the data
>> required for all this is part of Linux). OSI plays a crucial role in
>> determining when to do all this.
>
> No it does not. It is the power domain cpumasks that allow this code to
> make an educated guess on the last cpu running (the kernel), PSCI OSI is
> not crucial at all (it is crucial in QC platforms because that's the
> only mode supported but that's not a reason I accept as valid since it
> does not comply with the PSCI specifications).

We can keep argue on this back and forward, but it seems to lead
nowhere. As a matter of fact I am also surprised that this kind of
discussion pops up, again. I thought we had sorted this out,
especially since we have also met face to face, discussing this in
detail, several times by now. Well, well, let's try again. :-)

First, in regards to complying with the PSCI spec, to me, that sounds
like nonsense, sorry! Is the spec stating that the PSCI FW needs to
support all the idle states in PC mode, when the optional OSI mode
also is supported? To me, it looks like the QCOM PSCI FW supports PC
mode, but in that mode only a subset of the idle states can be
reached, so that should be fine, no?

Moving forward, I am wondering if a more detailed technical
description, comparing the benefits from OSI mode vs the benefits from
PC mode could help? Or is just a waste of everybody time, as you all
already know this? Anyway I am willing to try, just tell me and I
provide you with the best details I can give, about why OSI is better
suited for these kind of QCOM SoCs. I trust Lina to help to fill in,
if/when needed.

Why? Simply because I doubt we ever see the QCOM FW for the battery
driven embedded devices to support all idles states in the PC mode, so
doing a comparison on for example the 410c platform, just doesn't
seems to be possible, sorry!

I also have another, quite important, concern. That is, ARM decided to
put the OSI mode into the PSCI spec, I assume there were reasons for
it. Then, when the ARM community wants to implement support for OSI
mode, you are now requiring us to proof the justification of it in the
spec. To me, that is, nicely stated, weird. :-) But it also worries
me, ARM vendors observes the behavior.

That said, in the end we are discussing a quite limited amount of
lines of code to support PSCI OSI (some which may not even be
considered as OSI specific). It's ~200 lines of code, where most of
the code lives in a separate new c-file (psci_pm_domain.c).
Additionally, existing PC mode only platforms should still work as
before, without drawbacks.

Really, why are we arguing about this at all?

>
> As I mentioned in another thread[1] the generic part of this
> series may be applicable in a platform agnostic way to the
> CPUidle framework, whether that's beneficial it has to be proven
> and it is benchmark specific anyway.

I don't think this can be made fully platform agnostic. Or maybe you
are suggesting another helper layer above the new genpd
infrastructure?

Anyway, my point is, the genpd backend driver requires knowledge about
the FW and the last man standing algorithm, hence a platform agnostic
backend doesn't sound feasible to me.

>
> Lorenzo
>
> [1]: https://marc.info/?l=linux-pm&m=153382916513032&w=2

Kind regards
Uffe

Patch
diff mbox

diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index 67b02e138a47..492d4a83dca0 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -16,9 +16,11 @@ 
  */
 
 #include <linux/kernel.h>
+#include <linux/cpu.h>
 #include <linux/cpu_pm.h>
 #include <linux/module.h>
 #include <linux/notifier.h>
+#include <linux/pm_runtime.h>
 #include <linux/spinlock.h>
 #include <linux/syscore_ops.h>
 
@@ -91,6 +93,7 @@  int cpu_pm_enter(void)
 {
 	int nr_calls;
 	int ret = 0;
+	struct device *dev = get_cpu_device(smp_processor_id());
 
 	ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
 	if (ret)
@@ -100,6 +103,9 @@  int cpu_pm_enter(void)
 		 */
 		cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
 
+	if (!ret && dev && dev->pm_domain)
+		pm_runtime_put_sync_suspend(dev);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(cpu_pm_enter);
@@ -118,6 +124,11 @@  EXPORT_SYMBOL_GPL(cpu_pm_enter);
  */
 int cpu_pm_exit(void)
 {
+	struct device *dev = get_cpu_device(smp_processor_id());
+
+	if (dev && dev->pm_domain)
+		pm_runtime_get_sync(dev);
+
 	return cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
 }
 EXPORT_SYMBOL_GPL(cpu_pm_exit);