[RFC,v1,7/8] drivers: qcom: cpu_pd: Handle cpu hotplug in the domain

Message ID 1539206455-29342-8-git-send-email-rplsssn@codeaurora.org (mailing list archive)
State RFC
Delegated to: Andy Gross
Series: drivers: qcom: Add cpu power domain for SDM845

Commit Message

Raju P.L.S.S.S.N Oct. 10, 2018, 9:20 p.m. UTC
Use cpu hotplug callback mechanism to attach/detach the cpu in
the cpu power domain. During cpu hotplug callback registration,
the starting callback is invoked on all online cpus. So there is
no need to attach from device probe.

Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
---
 drivers/soc/qcom/cpu_pd.c  | 33 +++++++++++++++++++++++++--------
 include/linux/cpuhotplug.h |  1 +
 2 files changed, 26 insertions(+), 8 deletions(-)

Comments

Sudeep Holla Oct. 11, 2018, 11:20 a.m. UTC | #1
On Thu, Oct 11, 2018 at 02:50:54AM +0530, Raju P.L.S.S.S.N wrote:
> Use cpu hotplug callback mechanism to attach/detach the cpu in
> the cpu power domain. During cpu hotplug callback registration,
> the starting callback is invoked on all online cpus. So there is
> no need to attach from device probe.
>

To be more explicit, NACK to this series (patches 4-8, to be more specific)
unless you provide details on:

1. Why is this not in the PSCI implementation ?
2. If PSCI is used on this platform, how does it work/co-exist ?
3. Is this a shortcut approach taken to bypass the CPU genpd attempts
   from Lina/Ulf ?

--
Regards,
Sudeep
Lina Iyer Oct. 11, 2018, 4 p.m. UTC | #2
Sudeep,

This idea is based on GenPD/PM Domains, but solves for the CPU domain
activities that need to be done from Linux when the CPU domain could be
powered off. In that, it shares the same ideas as the series posted by
Ulf. But this has no bearing on PSCI. The 845 SoC that Raju is working
on uses platform-coordinated mode, and so, from Linux, it falls well
short of the power savings achievable on this SoC.

If you have looked at the patches, you probably know by now that there are
two things this patch does when the CPUs are powered off -
- Flush RPMH requests to the controller (the shared resource state
  requests apply after the CPUs are powered off, and the resources
  resume their current state before the CPUs wake up).
- Setup the wakeup time for the CPUs in the hardware registers such that
  the hardware blocks are awake before a CPU is powered back on. This is
  like a back-off time for handling timer wakeups faster.
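
To make that concrete, the domain's power-off hook has roughly this shape
(a simplified sketch with placeholder helper names, not the exact driver
code):

/*
 * Sketch only: flush_sleep_votes_to_rpmh(), get_next_wakeup() and
 * program_hw_wakeup_timer() are illustrative placeholders.
 * Genpd calls this when the last CPU in the domain is idling down.
 */
static int cpu_pd_power_off(struct generic_pm_domain *domain)
{
        /* Flush the staged 'sleep' resource requests to the RPMH controller */
        if (flush_sleep_votes_to_rpmh())
                return -EBUSY;  /* genpd keeps the domain on */

        /*
         * Arm a hardware wakeup a little ahead of the next timer expiry,
         * so the shared blocks are awake before the first CPU resumes.
         */
        program_hw_wakeup_timer(get_next_wakeup());

        return 0;
}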

The CPU PD does not power off the domain from Linux. That is done from
PSCI firmware (ATF). These patches are doing the part that Linux has to do,
when powering off the CPUs, to achieve a low standby power consumption.

On Thu, Oct 11 2018 at 05:20 -0600, Sudeep Holla wrote:
>On Thu, Oct 11, 2018 at 02:50:54AM +0530, Raju P.L.S.S.S.N wrote:
>> Use cpu hotplug callback mechanism to attach/detach the cpu in
>> the cpu power domain. During cpu hotplug callback registration,
>> the starting callback is invoked on all online cpus. So there is
>> no need to attach from device probe.
>>
>
>To be more explicit, NACK to this series (patches 4-8, to be more specific)
>unless you provide details on:
>
>1. Why is this not in the PSCI implementation ?
This is a Linux activity and there is no provision in ATF or QC firmware
to do this activity. The task of physically powering down the domain
is still a firmware decision and is done through PSCI platform
coordination in the firmware.

>2. If PSCI is used on this platform, how does it work/co-exist ?
Yes PSCI is used in this platform. ATF is the firmware and that supports
only Platform Coordinated. This SoC uses cpuidle and the regular PSCI
methods to power off the domain from the firmware. However, Linux has
responsibilities that it needs to complete before the power down can be
beneficial.

>3. Is this a shortcut approach taken to bypass the CPU genpd attempts
>   from Lina/Ulf ?
>
This is an incorrect interpretation and an unwarranted accusation. There
is no need to bypass the CPU genpd attempts. That series stands on its own
merits. (Thank you, Ulf). Raju has mentioned explicitly in the cover
letter that Ulf's patches are posted here for completeness of the
concept, as that series is a dependency. But it will be merged as part
of Ulf's efforts.

Isn't sharing ideas a key aspect of working with the community? This
series just goes to say that the idea of CPU PM domains is useful,
whether PSCI uses it or not. If you still need clarifications, Raju and
I will be happy to set up a meeting and go over the idea.

Thanks,
Lina
Sudeep Holla Oct. 11, 2018, 4:19 p.m. UTC | #3
On Thu, Oct 11, 2018 at 10:00:53AM -0600, Lina Iyer wrote:
> Sudeep,
>
> This idea is based on GenPD/PM Domains, but solves for the CPU domain
> activities that need to be done from Linux when the CPU domain could be
> powered off. In that, it shares the same ideas as the series posted by
> Ulf. But this has no bearing on PSCI. The 845 SoC that Raju is working
> on uses platform-coordinated mode, and so, from Linux, it falls well
> short of the power savings achievable on this SoC.
>

Interesting to know we have some QCOM part with PC mode.

> If you have looked at the patches, you probably know by now that there are
> two things this patch does when the CPUs are powered off -
> - Flush RPMH requests to the controller (the shared resource state
>  requests apply after the CPUs are powered off, and the resources
>  resume their current state before the CPUs wake up).
> - Setup the wakeup time for the CPUs in the hardware registers such that
>  the hardware blocks are awake before a CPU is powered back on. This is
>  like a back-off time for handling timer wakeups faster.
>

Yes I understand these.

> The CPU PD does not power off the domain from Linux. That is done from
> PSCI firmware (ATF). These patches are doing the part that Linux has to do,
> when powering off the CPUs, to achieve a low standby power consumption.
>

I don't understand why Linux *has to do* this part as PSCI manages CPU PM.

> On Thu, Oct 11 2018 at 05:20 -0600, Sudeep Holla wrote:
> > On Thu, Oct 11, 2018 at 02:50:54AM +0530, Raju P.L.S.S.S.N wrote:
> > > Use cpu hotplug callback mechanism to attach/detach the cpu in
> > > the cpu power domain. During cpu hotplug callback registration,
> > > the starting callback is invoked on all online cpus. So there is
> > > no need to attach from device probe.
> > >
> >
> > To be more explicit, NACK to this series (patches 4-8, to be more specific)
> > unless you provide details on:
> >
> > 1. Why is this not in the PSCI implementation ?
> This is a Linux activity and there is no provision in ATF or QC firmware
> to do this activity. The task of physically powering down the domain
> is still a firmware decision and is done through PSCI platform
> coordination in the firmware.
>

Yes, that was my understanding. So the add-on question here is: if PSCI
decides to abort entering the idle state, then Linux doing the RPMH
request is of no use, which could be prevented if it were done once the
PSCI f/w is about to enter the state. I know it may be a corner case, but
we have the whole OSI mode based on such corner cases.

> > 2. If PSCI is used on this platform, how does it work/co-exist ?
> Yes PSCI is used in this platform. ATF is the firmware and that supports
> only Platform Coordinated. This SoC uses cpuidle and the regular PSCI
> methods to power off the domain from the firmware. However, Linux has
> responsibilities that it needs to complete before the power down can be
> beneficial.
>

I understand the need to inform RPMH. So I take it the only reason to do
that in Linux is that TF-A doesn't have any support to talk to RPMH ?

> > 3. Is this a shortcut approach taken to bypass the CPU genpd attempts
> >   from Lina/Ulf ?
> >
> This is an incorrect interpretation and an unwarranted accusation. There
> is no need to bypass the CPU genpd attempts. That series stands on its own
> merits. (Thank you, Ulf). Raju has mentioned explicitly in the cover
> letter that Ulf's patches are posted here for completeness of the
> concept, as that series is a dependency. But it will be merged as part
> of Ulf's efforts.
>

Sorry if it sounded like an accusation. I didn't mean it to. I was checking
if this was an alternative solution. I do understand the need to deal with
product pressures.

Now that we have some platform with PC, it's good to compare PC vs OSI,
which we have always lacked. Thanks for letting us know this platform is PC
based.

> Isn't sharing ideas a key aspect of working with the community? This
> series just goes to say that the idea of CPU PM domains is useful,
> whether PSCI uses it or not. If you still need clarifications, Raju and
> I will be happy to set up a meeting and go over the idea.
>
Ah OK, so this platform will have a flattened cpu-idle-states list ? That's
absolutely fine. But what if we want to represent hierarchical PSCI-based
PM domains and this power domain on some platform ? That's the main
concern I raised. For me, the power-domains in DT introduced in this
series are just there to deal with RPMH, though the actual work is done
by PSCI. That just doesn't look that good to me.

--
Regards,
Sudeep
Lina Iyer Oct. 11, 2018, 4:58 p.m. UTC | #4
On Thu, Oct 11 2018 at 10:19 -0600, Sudeep Holla wrote:
>On Thu, Oct 11, 2018 at 10:00:53AM -0600, Lina Iyer wrote:
>> Sudeep,
>>
>> The CPU PD does not power off the domain from Linux. That is done from
>> PSCI firmware (ATF). These patches are doing the part that Linux has to do,
>> when powering off the CPUs, to achieve a low standby power consumption.
>>
>
>I don't understand why Linux *has to do* this part as PSCI manages CPU PM.
>
If we don't do this, then we leave a lot of power saving on the table
when the CPUs are powered off. Why should the DDR and the shared busses and
clocks be idling at high power, when not needed ? PSCI has no clue what
resource requests were made by Linux, and it is Linux's responsibility
to relinquish them when not needed. Therefore it has to be done from Linux.

>> On Thu, Oct 11 2018 at 05:20 -0600, Sudeep Holla wrote:
>> > On Thu, Oct 11, 2018 at 02:50:54AM +0530, Raju P.L.S.S.S.N wrote:
>> > > Use cpu hotplug callback mechanism to attach/detach the cpu in
>> > > the cpu power domain. During cpu hotplug callback registration,
>> > > the starting callback is invoked on all online cpus. So there is
>> > > no need to attach from device probe.
>> > >
>> >
>> > To be more explicit, NACK to this series (patches 4-8, to be more specific)
>> > unless you provide details on:
>> >
>> > 1. Why is this not in the PSCI implementation ?
>> This is a Linux activity and there is no provision in ATF or QC firmware
>> to do this activity. The task of physically powering down the domain
>> is still a firmware decision and is done through PSCI platform
>> coordination in the firmware.
>>
>
>Yes, that was my understanding. So the add-on question here is: if PSCI
>decides to abort entering the idle state, then Linux doing the RPMH
>request is of no use, which could be prevented if it were done once the
>PSCI f/w is about to enter the state. I know it may be a corner case, but
>we have the whole OSI mode based on such corner cases.
>
Yes, it is a possibility and worth the chance taken. On an SoC, there
are other processors that may vote against powering down the shared
resources even when Linux has shut down, and that is a very likely
possibility. E.g., when you are on a phone call, the CPU subsystem could
be powered off and the flush of RPMH requests is a 'waste', but it is a
chance we need to take. The alternative is a synchronization nightmare.

Even with all the unnecessary flushing it is totally worth it. OSI helps
alleviate this a bit because it embodies the same CPU PD concepts at
its core. Imagine if you didn't have the CPU PM domain: then every CPU
would be flushing RPMH requests whenever it powers down, because you never
know when all CPUs are going to be powered down at the same time. That is
the biggest benefit of OSI over PC mode in PSCI.

>> > 2. If PSCI is used on this platform, how does it work/co-exist ?
>> Yes PSCI is used in this platform. ATF is the firmware and that supports
>> only Platform Coordinated. This SoC uses cpuidle and the regular PSCI
>> methods to power off the domain from the firmware. However, Linux has
>> responsibilities that it needs to complete before the power down can be
>> beneficial.
>>
>
>I understand the need to inform RPMH. So I take it the only reason to do
>that in Linux is that TF-A doesn't have any support to talk to RPMH ?
>
It may or may not, depending on which firmware you talk to. But consider
this: if the firmware votes for lowering a resource state, it is doing it
for its own exception level and not for Linux. So Linux has to take care
of its own votes.

>Now that we have some platform with PC, it's good to compare PC vs OSI,
>which we have always lacked. Thanks for letting us know this platform is PC
>based.
>
Can I take it that you are okay with the OSI idea and this one, then :)
Yes, we are close to having a platform that has both, possibly.

>> Isn't sharing ideas a key aspect of working with the community? This
>> series just goes to say that the idea of CPU PM domains is useful,
>> whether PSCI uses it or not. If you still need clarifications, Raju and
>> I will be happy to set up a meeting and go over the idea.
>>
>Ah OK, so this platform will have a flattened cpu-idle-states list ? That's
>absolutely fine. But what if we want to represent hierarchical PSCI-based
>PM domains and this power domain on some platform ? That's the main
>concern I raised. For me, the power-domains in DT introduced in this
>series are just there to deal with RPMH, though the actual work is done
>by PSCI. That just doesn't look that good to me.
>
What problem do you see with that? It would help if you could clarify
your concern.

Thanks,
Lina
Sudeep Holla Oct. 11, 2018, 5:37 p.m. UTC | #5
On Thu, Oct 11, 2018 at 10:58:22AM -0600, Lina Iyer wrote:
> On Thu, Oct 11 2018 at 10:19 -0600, Sudeep Holla wrote:
> > On Thu, Oct 11, 2018 at 10:00:53AM -0600, Lina Iyer wrote:
> > > Sudeep,
> > >
> > > The CPU PD does not power off the domain from Linux. That is done from
> > > PSCI firmware (ATF). These patches are doing the part that Linux has to do,
> > > when powering off the CPUs, to achieve a low standby power consumption.
> > >
> >
> > I don't understand why Linux *has to do* this part as PSCI manages CPU PM.
> >
> If we don't do this, then we leave a lot of power saving on the table
> when the CPUs are powered off. Why should the DDR and the shared busses and
> clocks be idling at high power, when not needed ? PSCI has no clue what
> resource requests were made by Linux, and it is Linux's responsibility
> to relinquish them when not needed. Therefore it has to be done from Linux.
>

Is DDR managed by Linux ? I assumed it was handled by higher exception
levels. Can you give examples of resources used by the CPU in this context ?
When a CPU can be powered on or woken up without Linux intervention, the
same holds true for CPU power-down or sleep states. I still see no reason
other than that the firmware has no support to talk to RPMH.

> > > On Thu, Oct 11 2018 at 05:20 -0600, Sudeep Holla wrote:
> > > > On Thu, Oct 11, 2018 at 02:50:54AM +0530, Raju P.L.S.S.S.N wrote:
> > > > > Use cpu hotplug callback mechanism to attach/detach the cpu in
> > > > > the cpu power domain. During cpu hotplug callback registration,
> > > > > the starting callback is invoked on all online cpus. So there is
> > > > > no need to attach from device probe.
> > > > >
> > > >
> > > > To be more explicit, NACK to this series (patches 4-8, to be more specific)
> > > > unless you provide details on:
> > > >
> > > > 1. Why is this not in the PSCI implementation ?
> > > This is a Linux activity and there is no provision in ATF or QC firmware
> > > to do this activity. The task of physically powering down the domain
> > > is still a firmware decision and is done through PSCI platform
> > > coordination in the firmware.
> > >
> >
> > Yes, that was my understanding. So the add-on question here is: if PSCI
> > decides to abort entering the idle state, then Linux doing the RPMH
> > request is of no use, which could be prevented if it were done once the
> > PSCI f/w is about to enter the state. I know it may be a corner case, but
> > we have the whole OSI mode based on such corner cases.
> >
> Yes, it is a possibility and worth the chance taken. On an SoC, there
> are other processors that may vote against powering down the shared
> resources even when Linux has shut down, and that is a very likely
> possibility. E.g., when you are on a phone call, the CPU subsystem could
> be powered off and the flush of RPMH requests is a 'waste', but it is a
> chance we need to take. The alternative is a synchronization nightmare.
>

I am not against voting here. I am saying it need not be done in Linux. The
last piece of software running before powering down is EL3 and it should
do the voting. I can understand it for devices controlled in/by Linux. EL3
firmware controls the CPU PM, so that needs to vote, and by that it's
assumed nothing is running in a lower EL in that path.

> Even with all the unnecessary flushing it is totally worth it. OSI helps
> alleviate this a bit because it embodies the same CPU PD concepts at
> its core. Imagine if you didn't have the CPU PM domain: then every CPU
> would be flushing RPMH requests whenever it powers down, because you never
> know when all CPUs are going to be powered down at the same time. That is
> the biggest benefit of OSI over PC mode in PSCI.
>

I am not saying every CPU needs to do that, the last CPU can do that in PSCI.

> > > > 2. If PSCI is used on this platform, how does it work/co-exist ?
> > > Yes PSCI is used in this platform. ATF is the firmware and that supports
> > > only Platform Coordinated. This SoC uses cpuidle and the regular PSCI
> > > methods to power off the domain from the firmware. However, Linux has
> > > responsibilities that it needs to complete before the power down can be
> > > beneficial.
> > >
> >
> > I understand the need to inform RPMH. So I take it the only reason to do
> > that in Linux is that TF-A doesn't have any support to talk to RPMH ?
> >
> It may or may not, depending on which firmware you talk to. But consider
> this: if the firmware votes for lowering a resource state, it is doing it
> for its own exception level and not for Linux. So Linux has to take care
> of its own votes.
>

Oh interesting, I wasn't aware RPMH really needs to care about exception
levels. For me, we know the CPU is powering down, so it needs to release all
its resources. RPMH needs to know that, and any exception level can let
RPMH know that. Sorry, maybe I don't have enough knowledge of the SDM SoC.

But IMO Linux shouldn't contribute to any votes around CPU PM, as EL3/PSCI
is the one managing it and not Linux. If the CPU can be powered up without
Linux voting for anything, why is it required for going down ? I simply
fail to understand and get completely lost :(

> > Now that we have some platform with PC, it's good to compare PC vs OSI,
> > which we have always lacked. Thanks for letting us know this platform is PC
> > based.
> >
> Can I take it that you are okay with the OSI idea and this one, then :)
> Yes, we are close to having a platform that has both, possibly.
>

Comparison numbers please :)

> > > Isn't sharing ideas a key aspect of working with the community? This
> > > series just goes to say that the idea of CPU PM domains is useful,
> > > whether PSCI uses it or not. If you still need clarifications, Raju and
> > > I will be happy to set up a meeting and go over the idea.
> > >
> > Ah OK, so this platform will have a flattened cpu-idle-states list ? That's
> > absolutely fine. But what if we want to represent hierarchical PSCI-based
> > PM domains and this power domain on some platform ? That's the main
> > concern I raised. For me, the power-domains in DT introduced in this
> > series are just there to deal with RPMH, though the actual work is done
> > by PSCI. That just doesn't look that good to me.
> >
> What problem do you see with that? It would help if you could clarify
> your concern.
>
Having to adapt the DT to the firmware even though the feature is fully
discoverable is not at all good IMO. So the DT in this series *should work*
with OSI mode if the firmware has the support for it, it's as simple as that.

--
Regards,
Sudeep
Lina Iyer Oct. 11, 2018, 9:06 p.m. UTC | #6
On Thu, Oct 11 2018 at 11:37 -0600, Sudeep Holla wrote:
>On Thu, Oct 11, 2018 at 10:58:22AM -0600, Lina Iyer wrote:
>> On Thu, Oct 11 2018 at 10:19 -0600, Sudeep Holla wrote:
>> > On Thu, Oct 11, 2018 at 10:00:53AM -0600, Lina Iyer wrote:
>> > > Sudeep,
>> > >
>> > > The CPU PD does not power off the domain from Linux. That is done from
>> > > PSCI firmware (ATF). These patches are doing the part that Linux has to do,
>> > > when powering off the CPUs, to achieve a low standby power consumption.
>> > >
>> >
>> > I don't understand why Linux *has to do* this part as PSCI manages CPU PM.
>> >
>> If we don't do this, then we leave a lot of power saving on the table
>> when the CPUs are powered off. Why should the DDR and the shared busses and
>> clocks be idling at high power, when not needed ? PSCI has no clue what
>> resource requests were made by Linux, and it is Linux's responsibility
>> to relinquish them when not needed. Therefore it has to be done from Linux.
>>
>
>Is DDR managed by Linux ? I assumed it was handled by higher exception
>levels. Can you give examples of resources used by the CPU in this context ?
>When a CPU can be powered on or woken up without Linux intervention, the
>same holds true for CPU power-down or sleep states. I still see no reason
>other than that the firmware has no support to talk to RPMH.
>
DDR, shared clocks, regulators, etc. Imagine you are running something on
the screen and the CPUs enter a low power mode. While the CPUs were active,
there was a need for a bunch of display resources, and the app may
have requested resources too; when the CPUs power down, those requests may
not be needed to the full extent they were while the CPUs were running, so
they can be voted down to a lower state or, in some cases, the resources
turned off completely. What a driver voted for depends on the runtime state
and the use case currently active. The 'sleep' state value is also
determined by the driver/framework.

>> Even with all the unnecessary flushing it is totally worth it. OSI helps
>> alleviate this a bit because it embodies the same CPU PD concepts at
>> its core. Imagine if you didn't have the CPU PM domain: then every CPU
>> would be flushing RPMH requests whenever it powers down, because you never
>> know when all CPUs are going to be powered down at the same time. That is
>> the biggest benefit of OSI over PC mode in PSCI.
>>
>
>I am not saying every CPU needs to do that, the last CPU can do that in PSCI.
>
Yes, the last CPU is what we are getting to with this series. Now this
can't be done from PSCI. I will explain below.

>Oh interesting, I wasn't aware RPMH really needs to care about exception
>levels. For me, we know the CPU is powering down, so it needs to release all
>its resources. RPMH needs to know that, and any exception level can let
>RPMH know that. Sorry, maybe I don't have enough knowledge of the SDM SoC.
>
Some resources are secure resources used in secure environments. They
cannot be requested from non-secure. Hence secure levels are voters of
their own accord.

Now, since we are considering Linux and secure (in fact Linux, hyp, secure)
as separate voters, they have to each request their votes and release
their votes separately. PSCI cannot release a request made from Linux.
This is how the SoC is designed. All exception levels will abide by
that.

>> Yes, we are close to having a platform that has both, possibly.
>>
>
>Comparison numbers please :)
>
We are far from that, at least for now. But we will get there.

>Having to adapt the DT to the firmware even though the feature is fully
>discoverable is not at all good IMO. So the DT in this series *should work*
>with OSI mode if the firmware has the support for it, it's as simple as that.
>
The firmware is ATF and does not support OSI.

Thanks,
Lina
Sudeep Holla Oct. 12, 2018, 2:25 p.m. UTC | #7
On Thu, Oct 11, 2018 at 02:50:54AM +0530, Raju P.L.S.S.S.N wrote:
> Use cpu hotplug callback mechanism to attach/detach the cpu in
> the cpu power domain. During cpu hotplug callback registration,
> the starting callback is invoked on all online cpus. So there is
> no need to attach from device probe.
> 
> Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
> ---
>  drivers/soc/qcom/cpu_pd.c  | 33 +++++++++++++++++++++++++--------
>  include/linux/cpuhotplug.h |  1 +
>  2 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/soc/qcom/cpu_pd.c b/drivers/soc/qcom/cpu_pd.c
> index 242eced..bf2d95d 100644
> --- a/drivers/soc/qcom/cpu_pd.c
> +++ b/drivers/soc/qcom/cpu_pd.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2018, The Linux Foundation. All rights reserved.
>   */
>  
> +#include <linux/cpuhotplug.h>
>  #include <linux/ktime.h>
>  #include <linux/of_platform.h>
>  #include <linux/pm_domain.h>
> @@ -85,12 +86,26 @@ static int cpu_pd_power_off(struct generic_pm_domain *domain)
>  	return 0;
>  }
>  
> +static int cpu_pd_starting(unsigned int cpu)
> +{
> +	if (!suspend && of_genpd_attach_cpu(cpu))
> +		pr_err("%s: genpd_attach_cpu fail\n", __func__);

If it's that serious, return a proper error code and drop the message, which
could be annoying if you are running some hotplug test loop.
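
I.e. something along these lines (an untested sketch, using the series'
own of_genpd_attach_cpu()):

/* Sketch of the suggested change: propagate the error, drop the print */
static int cpu_pd_starting(unsigned int cpu)
{
        if (!suspend)
                return of_genpd_attach_cpu(cpu);
        return 0;
}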

--
Regards,
Sudeep
Sudeep Holla Oct. 12, 2018, 3:04 p.m. UTC | #8
On Thu, Oct 11, 2018 at 03:06:09PM -0600, Lina Iyer wrote:
> On Thu, Oct 11 2018 at 11:37 -0600, Sudeep Holla wrote:
[...]

> >
> > Is DDR managed by Linux ? I assumed it was handled by higher exception
> > levels. Can you give examples of resources used by the CPU in this context ?
> > When a CPU can be powered on or woken up without Linux intervention, the
> > same holds true for CPU power-down or sleep states. I still see no reason
> > other than that the firmware has no support to talk to RPMH.
> >
> DDR, shared clocks, regulators, etc. Imagine you are running something on
> the screen and the CPUs enter a low power mode. While the CPUs were active,
> there was a need for a bunch of display resources, and the app may
> have requested resources too; when the CPUs power down, those requests may
> not be needed to the full extent they were while the CPUs were running, so
> they can be voted down to a lower state or, in some cases, the resources
> turned off completely. What a driver voted for depends on the runtime state
> and the use case currently active. The 'sleep' state value is also
> determined by the driver/framework.
>

Why does a CPU going down say that another (screen - supposedly shared)
resource needs to be relinquished ? Shouldn't the display decide that on its
own ? I have no idea why the screen/display is brought into this discussion.
The CPU can just say: hey, I am going down and I don't need my resource.
How can it say: hey, I am going down and the display or screen also doesn't
need the resource ? On a multi-cluster system, how will the last CPU on one
know that it needs to act on behalf of the shared resource instead of
another cluster ?

I think we are mixing the system sleep states with CPU idle here.
If it's system sleep states, then we need to deal with it in some system ops
when it's the last CPU in the system and not the cluster/power domain.

[...]

> > Oh interesting, I wasn't aware RPMH really needs to care about exception
> > levels. For me, we know the CPU is powering down, so it needs to release all
> > its resources. RPMH needs to know that, and any exception level can let
> > RPMH know that. Sorry, maybe I don't have enough knowledge of the SDM SoC.
> >
> Some resources are secure resources used in secure environments. They
> cannot be requested from non-secure. Hence secure levels are voters of
> their own accord.
>

I still don't think RPMH can do more than refcounting, and all I am saying
is that for CPUs, EL3 can manage that refcount on behalf of all ELx.

> Now, since we are considering Linux and secure (in fact Linux, hyp, secure)
> as separate voters, they have to each request their votes and release
> their votes separately. PSCI cannot release a request made from Linux.

Why should Linux make that request in the first instance, as the CPU is
already up and running ?

> This is how the SoC is designed. All exception levels will abide by
> that.
>
> > > Yes, we are close to having a platform that has both, possibly.
> > >
> >
> > Comparison numbers please :)
> >
> We are far from that, at least for now. But we will get there.
>

Hopefully one day. We are waiting for that day for a few years now :)

> > Having to adapt the DT to the firmware even though the feature is fully
> > discoverable is not at all good IMO. So the DT in this series *should work*
> > with OSI mode if the firmware has the support for it, it's as simple as that.
> >
> The firmware is ATF and does not support OSI.
>

OK, to keep it simple: if a platform with PC mode only replaces the firmware
with one that has OSI mode, we *shouldn't need* to change the DT to suit it.
I think I asked Ulf to add something similar in the DT bindings.

--
Regards,
Sudeep
Ulf Hansson Oct. 12, 2018, 3:46 p.m. UTC | #9
On 12 October 2018 at 17:04, Sudeep Holla <sudeep.holla@arm.com> wrote:
> On Thu, Oct 11, 2018 at 03:06:09PM -0600, Lina Iyer wrote:
>> On Thu, Oct 11 2018 at 11:37 -0600, Sudeep Holla wrote:
> [...]
>
>> >
>> > Is DDR managed by Linux ? I assumed it was handled by higher exception
>> > levels. Can you give examples of resources used by the CPU in this context ?
>> > When a CPU can be powered on or woken up without Linux intervention, the
>> > same holds true for CPU power-down or sleep states. I still see no reason
>> > other than that the firmware has no support to talk to RPMH.
>> >
>> DDR, shared clocks, regulators, etc. Imagine you are running something on
>> the screen and the CPUs enter a low power mode. While the CPUs were active,
>> there was a need for a bunch of display resources, and the app may
>> have requested resources too; when the CPUs power down, those requests may
>> not be needed to the full extent they were while the CPUs were running, so
>> they can be voted down to a lower state or, in some cases, the resources
>> turned off completely. What a driver voted for depends on the runtime state
>> and the use case currently active. The 'sleep' state value is also
>> determined by the driver/framework.
>>
>
> Why does a CPU going down say that another (screen - supposedly shared)
> resource needs to be relinquished ? Shouldn't the display decide that on its
> own ? I have no idea why the screen/display is brought into this discussion.
> The CPU can just say: hey, I am going down and I don't need my resource.
> How can it say: hey, I am going down and the display or screen also doesn't
> need the resource ? On a multi-cluster system, how will the last CPU on one
> know that it needs to act on behalf of the shared resource instead of
> another cluster ?

Apologies for sidetracking the discussion, I just want to fold in a few comments.

This is becoming a complicated story. May I suggest we pick the GIC as
an example instead?

Let's assume the simple case, we have one cluster and when the cluster
becomes powered off, the GIC needs to be re-configured and wakeups
must be routed through some "always on" external logic.

The PSCI spec mentions nothing about how to manage this, nor the
rest of the SoC topology for that matter. Hence if the GIC is managed
by Linux - then Linux also needs to take actions before cluster power
down and after cluster power up. So, if the PSCI FW can't deal with the GIC,
how would Linux manage it?
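
For reference, this Linux-side work is hooked up via CPU PM notifiers
today - e.g. the GIC driver saves/restores its state roughly like this
(a trimmed-down sketch of the existing mechanism, not new code):

#include <linux/cpu_pm.h>

static int gic_like_pm_notifier(struct notifier_block *nb,
                                unsigned long cmd, void *v)
{
        switch (cmd) {
        case CPU_CLUSTER_PM_ENTER:
                /* save GIC distributor state, reroute wakeups */
                break;
        case CPU_CLUSTER_PM_EXIT:
                /* restore GIC distributor state */
                break;
        }
        return NOTIFY_OK;
}

static struct notifier_block gic_like_nb = {
        .notifier_call = gic_like_pm_notifier,
};

/* somewhere in init code: cpu_pm_register_notifier(&gic_like_nb); */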

>
> I think we are mixing the system sleep states with CPU idle here.
> If it's system sleep states, then we need to deal with it in some system ops
> when it's the last CPU in the system and not the cluster/power domain.

What is really a system sleep state? One could consider it just being
another idle state, having heavier residency targets and greater
enter/exit latency values, couldn't you?

In the end, there is no reason to keep things powered on unless they
are in use (or soon to be used), that is the main point.

We are also working on S2I at Linaro. We strive towards being able to
show the same power numbers as for S2R, but then we need to get these
cluster-idle things right.

[...]

Have a nice weekend!

Kind regards
Uffe
Lina Iyer Oct. 12, 2018, 4:04 p.m. UTC | #10
On Fri, Oct 12 2018 at 09:04 -0600, Sudeep Holla wrote:
>On Thu, Oct 11, 2018 at 03:06:09PM -0600, Lina Iyer wrote:
>> On Thu, Oct 11 2018 at 11:37 -0600, Sudeep Holla wrote:
>[...]
>
>> >
>> > Is DDR managed by Linux ? I assumed it was handled by higher exception
>> > levels. Can you give examples of resources used by the CPU in this context ?
>> > When a CPU can be powered on or woken up without Linux intervention, the
>> > same holds true for CPU power-down or sleep states. I still see no reason
>> > other than that the firmware has no support to talk to RPMH.
>> >
>> DDR, shared clocks, regulators, etc. Imagine you are running something on
>> the screen and the CPUs enter a low power mode. While the CPUs were active,
>> there was a need for a bunch of display resources, and the app may
>> have requested resources too; when the CPUs power down, those requests may
>> not be needed to the full extent they were while the CPUs were running, so
>> they can be voted down to a lower state or, in some cases, the resources
>> turned off completely. What a driver voted for depends on the runtime state
>> and the use case currently active. The 'sleep' state value is also
>> determined by the driver/framework.
>>
>
>Why does a CPU going down say that another (screen - supposedly shared)
>resource needs to be relinquished ? Shouldn't the display decide that on its
>own ? I have no idea why the screen/display is brought into this discussion.

>The CPU can just say: hey, I am going down and I don't need my resource.
>How can it say: hey, I am going down and the display or screen also doesn't
>need the resource ? On a multi-cluster system, how will the last CPU on one
>know that it needs to act on behalf of the shared resource instead of
>another cluster ?
>
Fair questions. Now, how would a driver know that the CPUs have powered
down, so as to say: if you are not active, then you can put these resources
in a low power state?
Well, they don't, because sending out CPU power down notifications for
all CPUs and the cluster is expensive and can add a lot of latency.
Instead, the drivers let the RPMH driver know that if and when the CPUs
power down, these resources may be requested to be in that low
power state. The CPU PD power-off callbacks trigger the RPMH driver to
flush and request a low power state on behalf of all the drivers.

Drivers make known, in advance, both their active-state request for a
resource and their CPU-powered-down state request. The
'active' request is made immediately, while the 'sleep' request is
staged. When the CPUs are to be powered off, the staged request is written
into hardware registers. The CPU PM domain controller, after powering
down, will make these state requests in hardware, thereby lowering the
standby power. The resource state is brought back to its 'active'
value before powering on the first CPU.
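
In terms of the mainline RPMH API, a driver's pair of votes looks roughly
like this (a sketch only - the resource address and data values are made
up):

#include <soc/qcom/rpmh.h>      /* rpmh_write(), RPMH_*_STATE */
#include <soc/qcom/tcs.h>       /* struct tcs_cmd */

static int example_resource_votes(struct device *dev)
{
        struct tcs_cmd active = { .addr = 0x1234, .data = 0x3 }; /* made up */
        struct tcs_cmd sleep  = { .addr = 0x1234, .data = 0x0 }; /* made up */
        int ret;

        /* 'Active' vote: applied in hardware immediately */
        ret = rpmh_write(dev, RPMH_ACTIVE_ONLY_STATE, &active, 1);
        if (ret)
                return ret;

        /* 'Sleep' vote: only staged now; flushed to the hardware when the
         * last CPU powers down, and applied once the CPUs are off */
        return rpmh_write(dev, RPMH_SLEEP_STATE, &sleep, 1);
}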

>I think we are mixing the system sleep states with CPU idle here.
>If it's system sleep states, then we need to deal with it in some system ops
>when it's the last CPU in the system and not the cluster/power domain.
>
I think the confusion for you is system sleep vs suspend. System sleep
here (probably more of a QC terminology), refers to powering down the
entire SoC for very small durations, while not actually suspended. The
drivers are unaware that this is happening. No hotplug happens and the
interrupts are not migrated during system sleep. When all the CPUs go
into cpuidle, the system sleep state is activated and the resource
requirements are lowered. The resources are brought back to their
previous active values before we exit cpuidle on any CPU. The drivers
have no idea that this happened. We have been doing this on QCOM SoCs
for a decade, so this is not something new for this SoC. Every QCOM SoC
has been doing this, albeit differently because of their architecture.
The newer ones do most of these transitions in hardware as opposed to a
remote CPU. But this is the first time we are upstreaming this :)

Suspend is altogether another idle state where drivers are notified
and relinquish their resources before the CPU powers down. Similar
things happen there as well, but at a much deeper level. Resources may
be turned off completely instead of just lowering to a low power state.

For example, suspend happens when the screen times out on a phone.
System sleep happens a few hundred times when you are actively reading
something on the phone.

>> > Having to adapt the DT to the firmware even though the feature is fully
>> > discoverable is not at all good IMO. So the DT in this series *should work*
>> > with OSI mode if the firmware has the support for it, it's as simple as that.
>> >
>> The firmware is ATF and does not support OSI.
>>
>
>OK, to keep it simple: if a platform with PC mode only replaces the firmware
>with one that has OSI mode, we *shouldn't need* to change the DT to suit it.
>I think I asked Ulf to add something similar in the DT bindings.
>
Fair point, and that is what this RFC intends to show: that PM domains
are useful not just for PSCI, but also for Linux PM drivers such as this
one. We will discuss more how we can fold in platform-specific
activities along with PSCI OSI state determination when
domain->power_off is called. I have some ideas on that. I was hoping to
get to that after the initial idea is conveyed.

Thanks for your time.

Lina
Lina Iyer Oct. 12, 2018, 4:16 p.m. UTC | #11
On Fri, Oct 12 2018 at 09:46 -0600, Ulf Hansson wrote:
>On 12 October 2018 at 17:04, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> On Thu, Oct 11, 2018 at 03:06:09PM -0600, Lina Iyer wrote:
>>> On Thu, Oct 11 2018 at 11:37 -0600, Sudeep Holla wrote:
>> [...]
>>
>>> >
>>> > Is DDR managed by Linux ? I assumed it was handled by higher exception
>>> > levels. Can you give examples of resources used by the CPU in this context ?
>>> > When a CPU can be powered on or woken up without Linux intervention, the
>>> > same holds true for CPU power-down or sleep states. I still see no reason
>>> > other than that the firmware has no support to talk to RPMH.
>>> >
>>> DDR, shared clocks, regulators, etc. Imagine you are running something on
>>> the screen and the CPUs enter a low power mode. While the CPUs were active,
>>> there was a need for a bunch of display resources, and the app may
>>> have requested resources too; when the CPUs power down, those requests may
>>> not be needed to the full extent they were while the CPUs were running, so
>>> they can be voted down to a lower state or, in some cases, the resources
>>> turned off completely. What a driver voted for depends on the runtime state
>>> and the use case currently active. The 'sleep' state value is also
>>> determined by the driver/framework.
>>>
>>
>> Why does a CPU going down say that another (screen - supposedly shared)
>> resource needs to be relinquished ? Shouldn't the display decide that on its
>> own ? I have no idea why the screen/display is brought into this discussion.
>> The CPU can just say: hey, I am going down and I don't need my resource.
>> How can it say: hey, I am going down and the display or screen also doesn't
>> need the resource ? On a multi-cluster system, how will the last CPU on one
>> know that it needs to act on behalf of the shared resource instead of
>> another cluster ?
>
>Apologies for sidetracking the discussion, I just want to fold in a few comments.
>
No, this is perfect to wrap the whole thing up. Thanks, Ulf.

>This is becoming a complicated story. May I suggest we pick the GIC as
>an example instead?
>
>Let's assume the simple case, we have one cluster and when the cluster
>becomes powered off, the GIC needs to be re-configured and wakeups
>must be routed through some "always on" external logic.
>
>The PSCI spec mentions nothing about how to manage this, nor the
>rest of the SoC topology for that matter. Hence if the GIC is managed
>by Linux - then Linux also needs to take actions before cluster power
>down and after cluster power up. So, if the PSCI FW can't deal with the GIC,
>how would Linux manage it?
>
>>
>> I think we are mixing the system sleep states with CPU idle here.
>> If it's system sleep states, then we need to deal with it in some system ops
>> when it's the last CPU in the system and not the cluster/power domain.
>
>What is really a system sleep state? One could consider it just being
>another idle state, having heavier residency targets and greater
>enter/exit latency values, couldn't you?
>
While I explained the driver's side of the story, for the CPUs, system
sleep is a deeper low power mode that ties in with OSI.

Thanks,
Lina

>In the end, there is no reason to keep things powered on unless they
>are in use (or soon to be used), that is the main point.
>
>We are also working on S2I at Linaro. We strive towards being able to
>show the same power numbers as for S2R, but then we need to get these
>cluster-idle things right.
>
>[...]
>
>Have a nice weekend!
>
>Kind regards
>Uffe
Sudeep Holla Oct. 12, 2018, 4:33 p.m. UTC | #12
On Fri, Oct 12, 2018 at 05:46:13PM +0200, Ulf Hansson wrote:
[...]

>
> Apologies for sidetracking the discussion, I just want to fold in a few comments.
>
No need to apologize, you are just contributing to a better
understanding of the system.

> This is becoming a complicated story. May I suggest we pick the GIC as
> an example instead?
>

Sure.

> Let's assume the simple case, we have one cluster and when the cluster
> becomes powered off, the GIC needs to be re-configured and wakeups
> must be routed through some "always on" external logic.
>

OK, but is the cluster powered off with any wakeup configured (idling)
or with selected wakeups (system suspend/idle) ?

> The PSCI spec mentions nothing about how to manage this, nor the
> rest of the SoC topology for that matter. Hence if the GIC is managed
> by Linux - then Linux also needs to take actions before cluster power
> down and after cluster power up. So, if the PSCI FW can't deal with the GIC,
> how would Linux manage it?
>

To add to the complications, some of the configuration of the GIC can be
done only at a higher exception level. So we expect PSCI to power down
the GIC if possible and move the wakeup source accordingly based on the
platform. So a PSCI f/w that is unable to deal with the GIC is simply
impossible. It configures Group 0/1 interrupts, and generally Linux deals
with Group 1. E.g., the first 8 SGIs are put in Group 0, which is for
secure interrupts.

So by default, the GIC driver in Linux sets MASK_ON_SUSPEND, which ensures
only wake-up sources are kept enabled before entering suspend. If the
GIC is being powered down, the secure side has to do its book-keeping, if
any, and transfer the wakeups to any external always-on wakeup controller.
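
For illustration, a driver opts its IRQ into that wake-up path along these
lines (a sketch; the handler and device are assumed to exist elsewhere):

#include <linux/interrupt.h>

static irqreturn_t my_irq_handler(int irq, void *data); /* assumed elsewhere */

static int my_request_wake_irq(struct device *dev, int irq, void *priv)
{
        int ret = devm_request_irq(dev, irq, my_irq_handler, 0, "my-dev", priv);

        /* With the GIC's mask-on-suspend behaviour, only IRQs marked as
         * wake sources like this stay armed across suspend entry. */
        if (!ret)
                ret = enable_irq_wake(irq);
        return ret;
}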

> >
> > I think we are mixing the system sleep states with CPU idle here.
> > If it's system sleep states, then we need to deal with it in some system ops
> > when it's the last CPU in the system and not the cluster/power domain.
>
> What is really a system sleep state? One could consider it just being
> another idle state, having heavier residency targets and greater
> enter/exit latency values, couldn't you?
>

Yes, but these are user-triggered states where the system resources are
informed before entering them. By that I am referring to system_suspend_ops.

> In the end, there is no reason to keep things powered on unless they
> are in use (or soon to be used), that is the main point.
>

I assume by "they", you refer to the GIC, for example.

> We are also working on S2I at Linaro. We strive towards being able to
> show the same power numbers as for S2R, but then we need to get these
> cluster-idle things right.
>

I don't think anything prevents it. You may need to check how to execute
cpu_pm_suspend, which in the S2R case gets executed via the syscore suspend
ops. We can't assume that any idle state will take the GIC down and do that
always. At the same time, representing this in DT is equally challenging, as
we can't assume single-idle-state power domains. So the platform-specific
firmware needs to handle this transparently for OSPM.

> [...]
>
> Have a nice weekend!
>

You too have a nice one.

--
Regards,
Sudeep
Sudeep Holla Oct. 12, 2018, 5 p.m. UTC | #13
On Fri, Oct 12, 2018 at 10:04:27AM -0600, Lina Iyer wrote:
> On Fri, Oct 12 2018 at 09:04 -0600, Sudeep Holla wrote:

[...]

> > Why does a CPU going down say that another (screen - supposedly shared)
> > resource needs to be relinquished ? Shouldn't the display decide that on its
> > own ? I have no idea why the screen/display is brought into this discussion.
>
> > The CPU can just say: hey, I am going down and I don't need my resource.
> > How can it say: hey, I am going down and the display or screen also doesn't
> > need the resource ? On a multi-cluster system, how will the last CPU on one
> > know that it needs to act on behalf of the shared resource instead of
> > another cluster ?
> >
> Fair questions. Now, how would a driver know that the CPUs have powered
> down, so as to say: if you are not active, then you can put these resources
> in a low power state?
> Well, they don't, because sending out CPU power down notifications for
> all CPUs and the cluster is expensive and can add a lot of latency.
> Instead, the drivers let the RPMH driver know that if and when the CPUs
> power down, these resources may be requested to be in that low
> power state. The CPU PD power-off callbacks trigger the RPMH driver to
> flush and request a low power state on behalf of all the drivers.
>
> Drivers make known, in advance, both their active-state request for a
> resource and their CPU-powered-down state request. The
> 'active' request is made immediately, while the 'sleep' request is
> staged. When the CPUs are to be powered off, the staged request is written
> into hardware registers. The CPU PM domain controller, after powering
> down, will make these state requests in hardware, thereby lowering the
> standby power. The resource state is brought back to its 'active'
> value before powering on the first CPU.
>

My understanding was in sync with most of the above except the staging
part in advance. So thanks for the detailed explanation.

Yes, all these are fine, but with multiple power domains/clusters, it's
hard to determine the first CPU. You may be able to identify it within
the power domain but not system-wide. So this doesn't scale with large
systems (e.g. 4-8 clusters with 16 CPUs).

> > I think we are mixing the system sleep states with CPU idle here.
> > If it's system sleep states, then we need to deal with it in some system ops
> > when it's the last CPU in the system and not the cluster/power domain.
> >
> I think the confusion for you is system sleep vs suspend. System sleep
> here (probably more of a QC terminology), refers to powering down the
> entire SoC for very small durations, while not actually suspended. The
> drivers are unaware that this is happening. No hotplug happens and the
> interrupts are not migrated during system sleep. When all the CPUs go
> into cpuidle, the system sleep state is activated and the resource
> requirements are lowered. The resources are brought back to their
> previous active values before we exit cpuidle on any CPU. The drivers
> have no idea that this happened. We have been doing this on QCOM SoCs
> for a decade, so this is not something new for this SoC. Every QCOM SoC
> has been doing this, albeit differently because of their architecture.
> The newer ones do most of these transitions in hardware as opposed to a
> remote CPU. But this is the first time we are upstreaming this :)
>

Indeed, I know mobile platforms do such optimisations and I agree it may
save power. As I mentioned above, it doesn't scale well with large systems,
and also even with single power domains having multiple idle states where
only one state can do this system-level idle but not all. As I mentioned
in the other email to Ulf, it's hard to generalise this even with DT.
So it's better to have this dealt with transparently in the firmware.

> Suspend is altogether another idle state where drivers are notified
> and relinquish their resources before the CPU powers down. Similar
> things happen there as well, but at a much deeper level. Resources may
> be turned off completely instead of just lowering to a low power state.
>

Yes I understand the difference.

> For example, suspend happens when the screen times out on a phone.
> System sleep happens a few hundred times when you are actively reading
> something on the phone.
>

Sure

> > > > Having to adapt the DT to the firmware even though the feature is fully
> > > > discoverable is not at all good IMO. So the DT in this series *should work*
> > > > with OSI mode if the firmware has the support for it, it's as simple as that.
> > > >
> > > The firmware is ATF and does not support OSI.
> > >
> >
> > OK, to keep it simple: if a platform with PC mode only replaces the firmware
> > with one that has OSI mode, we *shouldn't need* to change the DT to suit it.
> > I think I asked Ulf to add something similar in the DT bindings.
> >
> Fair point, and that is what this RFC intends to show: that PM domains
> are useful not just for PSCI, but also for Linux PM drivers such as this
> one. We will discuss more how we can fold in platform-specific
> activities along with PSCI OSI state determination when
> domain->power_off is called. I have some ideas on that. I was hoping to
> get to that after the initial idea is conveyed.
>

Got it. This is not a new discussion; I am sure it has been discussed
several times in the past. We have so much platform-dependent code that
coming up with a generic solution with DT is challenging. I have mentioned
just a few of the issues, and I am sure the list is much bigger. Hence the
suggestion is always to go with a firmware-based solution, which is best
suited for upstream and proven to work (e.g. on x86).

Have a nice weekend!

--
Regards,
Sudeep
Lina Iyer Oct. 12, 2018, 5:19 p.m. UTC | #14
On Fri, Oct 12 2018 at 11:01 -0600, Sudeep Holla wrote:
>On Fri, Oct 12, 2018 at 10:04:27AM -0600, Lina Iyer wrote:
>> On Fri, Oct 12 2018 at 09:04 -0600, Sudeep Holla wrote:
>
>[...]
>
>Yes, all these are fine, but with multiple power domains/clusters, it's
>hard to determine the first CPU. You may be able to identify it within
>the power domain but not system-wide. So this doesn't scale with large
>systems (e.g. 4-8 clusters with 16 CPUs).
>
We would probably not worry too much about power savings on a msec
scale if we had that big a system. The driver is a platform specific
driver, primarily intended for a mobile-class CPU and usage. In fact, we
haven't done this for QC's server-class CPUs.

>> > I think we are mixing the system sleep states with CPU idle here.
>> > If it's system sleep states, then we need to deal with it in some system ops
>> > when it's the last CPU in the system and not the cluster/power domain.
>> >
>> I think the confusion for you is system sleep vs suspend. System sleep
>> here (probably more of a QC terminology), refers to powering down the
>> entire SoC for very small durations, while not actually suspended. The
>> drivers are unaware that this is happening. No hotplug happens and the
>> interrupts are not migrated during system sleep. When all the CPUs go
>> into cpuidle, the system sleep state is activated and the resource
>> requirements are lowered. The resources are brought back to their
>> previous active values before we exit cpuidle on any CPU. The drivers
>> have no idea that this happened. We have been doing this on QCOM SoCs
>> for a decade, so this is not something new for this SoC. Every QCOM SoC
>> has been doing this, albeit differently because of their architecture.
>> The newer ones do most of these transitions in hardware as opposed to a
>> remote CPU. But this is the first time we are upstreaming this :)
>>
>
>Indeed, I know mobile platforms do such optimisations and I agree it may
>save power. As I mentioned above, it doesn't scale well with large systems,
>and also even with single power domains having multiple idle states where
>only one state can do this system-level idle but not all. As I mentioned
>in the other email to Ulf, it's hard to generalise this even with DT.
>So it's better to have this dealt with transparently in the firmware.
>
Good, then we are in agreement here.
But this is how this platform is. It cannot be done in firmware, and what
we are doing here is a Linux platform driver that cleans up nicely without
having to piggyback on an external dependency.

Thanks,
Lina
Sudeep Holla Oct. 12, 2018, 5:25 p.m. UTC | #15
On Fri, Oct 12, 2018 at 11:19:10AM -0600, Lina Iyer wrote:
> On Fri, Oct 12 2018 at 11:01 -0600, Sudeep Holla wrote:
> > On Fri, Oct 12, 2018 at 10:04:27AM -0600, Lina Iyer wrote:
> > > On Fri, Oct 12 2018 at 09:04 -0600, Sudeep Holla wrote:
> >
> > [...]
> >
> > Yes, all these are fine, but with multiple power domains/clusters, it's
> > hard to determine the first CPU. You may be able to identify it within
> > the power domain but not system-wide. So this doesn't scale with large
> > systems (e.g. 4-8 clusters with 16 CPUs).
> >
> We would probably not worry too much about power savings on a msec
> scale if we had that big a system. The driver is a platform specific
> driver, primarily intended for a mobile-class CPU and usage. In fact, we
> haven't done this for QC's server-class CPUs.
>

OK, as long as there's no attempt to make it generic and it's kept platform
specific, I am not that bothered.

> > > > I think we are mixing the system sleep states with CPU idle here.
> > > > If it's system sleep states, then we need to deal with it in some system ops
> > > > when it's the last CPU in the system and not the cluster/power domain.
> > > >
> > > I think the confusion for you is system sleep vs suspend. System sleep
> > > here (probably more of a QC terminology), refers to powering down the
> > > entire SoC for very small durations, while not actually suspended. The
> > > drivers are unaware that this is happening. No hotplug happens and the
> > > interrupts are not migrated during system sleep. When all the CPUs go
> > > into cpuidle, the system sleep state is activated and the resource
> > > requirements are lowered. The resources are brought back to their
> > > previous active values before we exit cpuidle on any CPU. The drivers
> > > have no idea that this happened. We have been doing this on QCOM SoCs
> > > for a decade, so this is not something new for this SoC. Every QCOM SoC
> > > has been doing this, albeit differently because of their architecture.
> > > The newer ones do most of these transitions in hardware as opposed to a
> > > remote CPU. But this is the first time we are upstreaming this :)
> > >
> >
> > Indeed, I know mobile platforms do such optimisations and I agree it may
> > save power. As I mentioned above, it doesn't scale well with large systems,
> > and also even with single power domains having multiple idle states where
> > only one state can do this system-level idle but not all. As I mentioned
> > in the other email to Ulf, it's hard to generalise this even with DT.
> > So it's better to have this dealt with transparently in the firmware.
> >
> Good, then we are in agreement here.

No worries.

> But this is how this platform is. It cannot be done in firmware, and what
> we are doing here is a Linux platform driver that cleans up nicely without
> having to piggyback on an external dependency.
>
Yes, Qcom always says it can't be done in firmware. Even PSCI was adopted
after a couple of years of pushback.

--
Regards,
Sudeep
Raju P.L.S.S.S.N Oct. 12, 2018, 6:10 p.m. UTC | #16
On 10/12/2018 7:55 PM, Sudeep Holla wrote:
> On Thu, Oct 11, 2018 at 02:50:54AM +0530, Raju P.L.S.S.S.N wrote:
>> Use cpu hotplug callback mechanism to attach/detach the cpu in
>> the cpu power domain. During cpu hotplug callback registration,
>> the starting callback is invoked on all online cpus. So there is
>> no need to attach from device probe.
>>
>> Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
>> ---
>>   drivers/soc/qcom/cpu_pd.c  | 33 +++++++++++++++++++++++++--------
>>   include/linux/cpuhotplug.h |  1 +
>>   2 files changed, 26 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/soc/qcom/cpu_pd.c b/drivers/soc/qcom/cpu_pd.c
>> index 242eced..bf2d95d 100644
>> --- a/drivers/soc/qcom/cpu_pd.c
>> +++ b/drivers/soc/qcom/cpu_pd.c
>> @@ -3,6 +3,7 @@
>>    * Copyright (c) 2018, The Linux Foundation. All rights reserved.
>>    */
>>   
>> +#include <linux/cpuhotplug.h>
>>   #include <linux/ktime.h>
>>   #include <linux/of_platform.h>
>>   #include <linux/pm_domain.h>
>> @@ -85,12 +86,26 @@ static int cpu_pd_power_off(struct generic_pm_domain *domain)
>>   	return 0;
>>   }
>>   
>> +static int cpu_pd_starting(unsigned int cpu)
>> +{
>> +	if (!suspend && of_genpd_attach_cpu(cpu))
>> +		pr_err("%s: genpd_attach_cpu fail\n", __func__);
> 
> If it's that serious, return a proper error code and drop the message, which
> could be annoying if you are running some hotplug test loop.

Sure will do that.

Thanks Sudeep.

- Raju

> 
> --
> Regards,
> Sudeep
>
Lina Iyer Oct. 22, 2018, 7:50 p.m. UTC | #17
On Fri, Oct 12 2018 at 11:25 -0600, Sudeep Holla wrote:
>On Fri, Oct 12, 2018 at 11:19:10AM -0600, Lina Iyer wrote:
>> On Fri, Oct 12 2018 at 11:01 -0600, Sudeep Holla wrote:
>> > On Fri, Oct 12, 2018 at 10:04:27AM -0600, Lina Iyer wrote:
>> > > On Fri, Oct 12 2018 at 09:04 -0600, Sudeep Holla wrote:
>> >
>> > [...]
>> >
>> > Yes, all these are fine, but with multiple power domains/clusters, it's
>> > hard to determine the first CPU. You may be able to identify it within
>> > the power domain but not system-wide. So this doesn't scale with large
>> > systems (e.g. 4-8 clusters with 16 CPUs).
>> >
>> We would probably not worry too much about power savings on a msec
>> scale if we had that big a system. The driver is a platform specific
>> driver, primarily intended for a mobile-class CPU and usage. In fact, we
>> haven't done this for QC's server-class CPUs.
>>
>
>OK, as long as there's no attempt to make it generic and it's kept platform
>specific, I am not that bothered.
>
>> > > > I think we are mixing the system sleep states with CPU idle here.
>> > > > If it's system sleep states, then we need to deal with it in some system ops
>> > > > when it's the last CPU in the system and not the cluster/power domain.
>> > > >
>> > > I think the confusion for you is system sleep vs suspend. System sleep
>> > > here (probably more of a QC terminology), refers to powering down the
>> > > entire SoC for very small durations, while not actually suspended. The
>> > > drivers are unaware that this is happening. No hotplug happens and the
>> > > interrupts are not migrated during system sleep. When all the CPUs go
>> > > into cpuidle, the system sleep state is activated and the resource
>> > > requirements are lowered. The resources are brought back to their
>> > > previous active values before we exit cpuidle on any CPU. The drivers
>> > > have no idea that this happened. We have been doing this on QCOM SoCs
>> > > for a decade, so this is not something new for this SoC. Every QCOM SoC
>> > > has been doing this, albeit differently because of their architecture.
>> > > The newer ones do most of these transitions in hardware as opposed to a
>> > > remote CPU. But this is the first time we are upstreaming this :)
>> > >
>> >
>> > Indeed, I know mobile platforms do such optimisations and I agree it may
>> > save power. As I mentioned above, it doesn't scale well with large systems,
>> > and also even with single power domains having multiple idle states where
>> > only one state can do this system-level idle but not all. As I mentioned
>> > in the other email to Ulf, it's hard to generalise this even with DT.
>> > So it's better to have this dealt with transparently in the firmware.
>> >
>> Good, then we are in agreement here.
>
It was brought to my attention that there may be some misunderstanding
here. I still believe we need to do this for small systems like the
mobile platforms and the solution may not scale well to servers. We
don't plan to extend the solution to anything other than the mobile SoC.

>No worries.
>
Thanks,
Lina

Patch

diff --git a/drivers/soc/qcom/cpu_pd.c b/drivers/soc/qcom/cpu_pd.c
index 242eced..bf2d95d 100644
--- a/drivers/soc/qcom/cpu_pd.c
+++ b/drivers/soc/qcom/cpu_pd.c
@@ -3,6 +3,7 @@ 
  * Copyright (c) 2018, The Linux Foundation. All rights reserved.
  */
 
+#include <linux/cpuhotplug.h>
 #include <linux/ktime.h>
 #include <linux/of_platform.h>
 #include <linux/pm_domain.h>
@@ -85,12 +86,26 @@  static int cpu_pd_power_off(struct generic_pm_domain *domain)
 	return 0;
 }
 
+static int cpu_pd_starting(unsigned int cpu)
+{
+	if (!suspend && of_genpd_attach_cpu(cpu))
+		pr_err("%s: genpd_attach_cpu fail\n", __func__);
+	return 0;
+}
+
+static int cpu_pd_dying(unsigned int cpu)
+{
+	if (!suspend)
+		of_genpd_detach_cpu(cpu);
+	return 0;
+}
+
 static int cpu_pm_domain_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct device_node *np = dev->of_node;
 	struct generic_pm_domain *cpu_pd;
-	int ret = -EINVAL, cpu;
+	int ret = -EINVAL;
 
 	if (!np) {
 		dev_err(dev, "device tree node not found\n");
@@ -127,15 +142,17 @@  static int cpu_pm_domain_probe(struct platform_device *pdev)
 
 	pr_info("init PM domain %s\n", cpu_pd->name);
 
-	for_each_present_cpu(cpu) {
-		ret = of_genpd_attach_cpu(cpu);
-		if (ret)
-			goto detach_cpu;
-	}
+	/* Install hotplug callbacks */
+	ret = cpuhp_setup_state(CPUHP_AP_QCOM_SYS_PM_DOMAIN_STARTING,
+				"AP_QCOM_SYS_PM_DOMAIN_STARTING",
+				cpu_pd_starting, cpu_pd_dying);
+	if (ret)
+		goto remove_hotplug;
+
 	return 0;
 
-detach_cpu:
-	of_genpd_detach_cpu(cpu);
+remove_hotplug:
+	cpuhp_remove_state(CPUHP_AP_QCOM_SYS_PM_DOMAIN_STARTING);
 
 remove_pd:
 	pm_genpd_remove(cpu_pd);
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index caf40ad..4bfe8e2 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -139,6 +139,7 @@  enum cpuhp_state {
 	CPUHP_AP_X86_TBOOT_DYING,
 	CPUHP_AP_ARM_CACHE_B15_RAC_DYING,
 	CPUHP_AP_ONLINE,
+	CPUHP_AP_QCOM_SYS_PM_DOMAIN_STARTING,
 	CPUHP_TEARDOWN_CPU,
 	CPUHP_AP_ONLINE_IDLE,
 	CPUHP_AP_SMPBOOT_THREADS,