[v4,10/14] cpuidle: psci: Prepare to use OS initiated suspend mode via PM domains

Message ID 20191211154343.29765-11-ulf.hansson@linaro.org (mailing list archive)
State Not Applicable, archived
Series cpuidle: psci: Support hierarchical CPU arrangement

Commit Message

Ulf Hansson Dec. 11, 2019, 3:43 p.m. UTC
The per CPU variable psci_power_state contains an array of fixed values,
which reflects the corresponding arm,psci-suspend-param parsed from DT, for
each of the available CPU idle states.

This isn't sufficient when using the hierarchical CPU topology in DT, in
combination with having PSCI OS initiated (OSI) mode enabled. More
precisely, in OSI mode, Linux is responsible for telling the PSCI FW what
idle state the cluster (a group of CPUs) should enter, while in PSCI
Platform Coordinated (PC) mode, each CPU independently votes for an idle
state of the cluster.

For this reason, introduce a per CPU variable called domain_state and
implement two helper functions to read/write its value. Then let the
domain_state take precedence over the regular selected state, when entering
an idle state.

To avoid executing the above OSI specific code in the ->enter() callback,
while operating in the default PSCI Platform Coordinated mode, let's also
add a new enter-function and use it for OSI.

Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v4:
	- Rebased on top of earlier changes.
	- Add comment about using the deepest cpuidle state for the domain state
	selection.

---
 drivers/cpuidle/cpuidle-psci.c | 56 ++++++++++++++++++++++++++++++----
 1 file changed, 50 insertions(+), 6 deletions(-)
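
The precedence rule described in the commit message can be modelled in plain
userspace C. This is only a sketch under stated assumptions, not the kernel
code: the per CPU variable is replaced by an array indexed by a CPU number,
all model_* names are hypothetical, and the call into the PSCI firmware is
left out.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for this model */

/* Stand-in for the per CPU variable: one slot per CPU, 0 means "not set". */
static uint32_t model_domain_state[MODEL_NR_CPUS];

static void model_set_domain_state(unsigned int cpu, uint32_t state)
{
	assert(cpu < MODEL_NR_CPUS);
	model_domain_state[cpu] = state;
}

static uint32_t model_get_domain_state(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return model_domain_state[cpu];
}

/*
 * Return the suspend parameter that would be handed to the firmware:
 * a pending domain state wins, otherwise the fixed value for the
 * selected idle state is used. The domain state is cleared afterwards
 * so the next idle entry starts fresh.
 */
static uint32_t model_enter_domain_idle_state(unsigned int cpu,
					      const uint32_t *states, int idx)
{
	uint32_t state = model_get_domain_state(cpu);

	if (!state)
		state = states[idx];

	/* The real driver would enter the idle state here. */

	model_set_domain_state(cpu, 0);
	return state;
}
```

With no domain state pending, the model returns states[idx]. After a call to
model_set_domain_state(), the next entry returns that value once and then
falls back to the fixed per-state value again.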

Comments

Sudeep Holla Dec. 19, 2019, 2:31 p.m. UTC | #1
On Wed, Dec 11, 2019 at 04:43:39PM +0100, Ulf Hansson wrote:
> The per CPU variable psci_power_state, contains an array of fixed values,
> which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> each of the available CPU idle states.
>
> This isn't sufficient when using the hierarchical CPU topology in DT, in
> combination with having PSCI OS initiated (OSI) mode enabled. More
> precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> idle state the cluster (a group of CPUs) should enter, while in PSCI
> Platform Coordinated (PC) mode, each CPU independently votes for an idle
> state of the cluster.
>
> For this reason, introduce a per CPU variable called domain_state and
> implement two helper functions to read/write its value. Then let the
> domain_state take precedence over the regular selected state, when entering
> and idle state.
>
> To avoid executing the above OSI specific code in the ->enter() callback,
> while operating in the default PSCI Platform Coordinated mode, let's also
> add a new enter-function and use it for OSI.
>
> Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>
> Changes in v4:
> 	- Rebased on top of earlier changes.
> 	- Add comment about using the deepest cpuidle state for the domain state
> 	selection.
>
> ---
>  drivers/cpuidle/cpuidle-psci.c | 56 ++++++++++++++++++++++++++++++----
>  1 file changed, 50 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> index 6a87848be3c3..9600fe674a89 100644
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -29,14 +29,47 @@ struct psci_cpuidle_data {
>  };
>
>  static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
> +static DEFINE_PER_CPU(u32, domain_state);
> +

[...]

> +static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> +					struct cpuidle_driver *drv, int idx)
> +{
> +	struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> +	u32 *states = data->psci_states;

Why can't the above be like this for consistency (see below in
psci_enter_idle_state)?

 	u32 *states = __this_cpu_read(psci_cpuidle_data.psci_states);

> +	u32 state = psci_get_domain_state();
> +	int ret;
> +
> +	if (!state)
> +		state = states[idx];
> +
> +	ret = psci_enter_state(idx, state);
> +
> +	/* Clear the domain state to start fresh when back from idle. */
> +	psci_set_domain_state(0);
> +	return ret;
> +}
>

[...]

> @@ -118,6 +152,15 @@ static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
>  			ret = PTR_ERR(data->dev);
>  			goto free_mem;
>  		}
> +
> +		/*
> +		 * Using the deepest state for the CPU to trigger a potential
> +		 * selection of a shared state for the domain, assumes the
> +		 * domain states are all deeper states.
> +		 */
> +		if (data->dev)

You can drop this check, as we return on error above.

> +			drv->states[state_count - 1].enter =
> +				psci_enter_domain_idle_state;

I see the comment above, but this potentially blocks retention mode at
cluster level when all CPUs enter retention at CPU level. I don't like
this assumption, but I don't have any better suggestion. Please add a
note that we can't enter the RETENTION state at cluster/domain level when
all CPUs enter it at CPU level.

As I wrote the above, I got another doubt. What if the platform specifies
just a RETENTION state at CPU as well as cluster/domain level? I think it
should be fine, just asking out loud.

--
Regards,
Sudeep

Ulf Hansson Dec. 19, 2019, 3:48 p.m. UTC | #2
On Thu, 19 Dec 2019 at 15:32, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Wed, Dec 11, 2019 at 04:43:39PM +0100, Ulf Hansson wrote:
> > The per CPU variable psci_power_state, contains an array of fixed values,
> > which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> > each of the available CPU idle states.
> >
> > This isn't sufficient when using the hierarchical CPU topology in DT, in
> > combination with having PSCI OS initiated (OSI) mode enabled. More
> > precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> > idle state the cluster (a group of CPUs) should enter, while in PSCI
> > Platform Coordinated (PC) mode, each CPU independently votes for an idle
> > state of the cluster.
> >
> > For this reason, introduce a per CPU variable called domain_state and
> > implement two helper functions to read/write its value. Then let the
> > domain_state take precedence over the regular selected state, when entering
> > and idle state.
> >
> > To avoid executing the above OSI specific code in the ->enter() callback,
> > while operating in the default PSCI Platform Coordinated mode, let's also
> > add a new enter-function and use it for OSI.
> >
> > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> > Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >
> > Changes in v4:
> >       - Rebased on top of earlier changes.
> >       - Add comment about using the deepest cpuidle state for the domain state
> >       selection.
> >
> > ---
> >  drivers/cpuidle/cpuidle-psci.c | 56 ++++++++++++++++++++++++++++++----
> >  1 file changed, 50 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > index 6a87848be3c3..9600fe674a89 100644
> > --- a/drivers/cpuidle/cpuidle-psci.c
> > +++ b/drivers/cpuidle/cpuidle-psci.c
> > @@ -29,14 +29,47 @@ struct psci_cpuidle_data {
> >  };
> >
> >  static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
> > +static DEFINE_PER_CPU(u32, domain_state);
> > +
>
> [...]
>
> > +static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> > +                                     struct cpuidle_driver *drv, int idx)
> > +{
> > +     struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> > +     u32 *states = data->psci_states;
>
> Why can't the above be like this for consistency(see below in
> psci_enter_idle_state) ?

You have a point; however, in patch 11 I am adding the line below.

struct device *pd_dev = data->dev;

So I don't think it matters much, agree?

>
>         u32 *states = __this_cpu_read(psci_cpuidle_data.psci_states);
>
> > +     u32 state = psci_get_domain_state();
> > +     int ret;
> > +
> > +     if (!state)
> > +             state = states[idx];
> > +
> > +     ret = psci_enter_state(idx, state);
> > +
> > +     /* Clear the domain state to start fresh when back from idle. */
> > +     psci_set_domain_state(0);
> > +     return ret;
> > +}
> >
>
> [...]
>
> > @@ -118,6 +152,15 @@ static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
> >                       ret = PTR_ERR(data->dev);
> >                       goto free_mem;
> >               }
> > +
> > +             /*
> > +              * Using the deepest state for the CPU to trigger a potential
> > +              * selection of a shared state for the domain, assumes the
> > +              * domain states are all deeper states.
> > +              */
> > +             if (data->dev)
>
> You can drop this check as return on error above.

Actually not, because if OSI is supported, there is still a
possibility that the PM domain topology isn't used.

This means data->dev is NULL.
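
To spell out the three outcomes being discussed here (error pointer, NULL,
valid device), a minimal userspace sketch follows. It assumes the kernel's
usual convention of encoding errors in the top 4095 pointer values. The
model_* names are hypothetical and the real attach helper is not shown in
this excerpt, so this is an illustration rather than the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095	/* assumed to mirror the kernel convention */

/* Encode a negative errno value as a pointer, as ERR_PTR() does. */
static inline void *model_err_ptr(long err)
{
	return (void *)err;
}

/* True if the pointer carries an encoded error, as IS_ERR() reports. */
static inline bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

enum model_enter {
	MODEL_INIT_FAILED,	/* attach failed: init bails out */
	MODEL_ENTER_REGULAR,	/* no PM domain topology: keep default callback */
	MODEL_ENTER_DOMAIN,	/* PM domain attached: use the domain callback */
};

static enum model_enter model_pick_enter(const void *dev)
{
	if (model_is_err(dev))
		return MODEL_INIT_FAILED;
	if (dev)
		return MODEL_ENTER_DOMAIN;
	return MODEL_ENTER_REGULAR;
}
```

The NULL case is the one the extra check covers: it is not an error, so the
earlier error return does not catch it.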

>
> > +                     drv->states[state_count - 1].enter =
> > +                             psci_enter_domain_idle_state;
>
> I see the comment above but this potential blocks retention mode at
> cluster level when all cpu enter retention at CPU level. I don't like
> this assumption, but I don't have any better suggestion. Please add the
> note that we can't enter RETENTION state at cluster/domain level when
> all CPUs enter at CPU level.

You are correct, but I think the comment a few lines above (agreed to
be added by Lorenzo in the previous version) should be enough to
explain that. No?

The point is, this is only a problem if cluster RETENTION is
considered to be a shallower state than CPU power off, for example.

>
> As I wrote above I got another doubt. What if platform specifies just
> RETENTION state at CPU as well as Cluster/domain ? I think it should be
> fine, just asking it out loud.

It's fine.

However, I am looking at what future improvements can be made.
This is one of them, but let's discuss that later on.

Kind regards
Uffe

Sudeep Holla Dec. 19, 2019, 6:01 p.m. UTC | #3
On Thu, Dec 19, 2019 at 04:48:13PM +0100, Ulf Hansson wrote:
> On Thu, 19 Dec 2019 at 15:32, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > On Wed, Dec 11, 2019 at 04:43:39PM +0100, Ulf Hansson wrote:
> > > The per CPU variable psci_power_state, contains an array of fixed values,
> > > which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> > > each of the available CPU idle states.
> > >
> > > This isn't sufficient when using the hierarchical CPU topology in DT, in
> > > combination with having PSCI OS initiated (OSI) mode enabled. More
> > > precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> > > idle state the cluster (a group of CPUs) should enter, while in PSCI
> > > Platform Coordinated (PC) mode, each CPU independently votes for an idle
> > > state of the cluster.
> > >
> > > For this reason, introduce a per CPU variable called domain_state and
> > > implement two helper functions to read/write its value. Then let the
> > > domain_state take precedence over the regular selected state, when entering
> > > and idle state.
> > >
> > > To avoid executing the above OSI specific code in the ->enter() callback,
> > > while operating in the default PSCI Platform Coordinated mode, let's also
> > > add a new enter-function and use it for OSI.
> > >
> > > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> > > Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > ---
> > >
> > > Changes in v4:
> > >       - Rebased on top of earlier changes.
> > >       - Add comment about using the deepest cpuidle state for the domain state
> > >       selection.
> > >
> > > ---
> > >  drivers/cpuidle/cpuidle-psci.c | 56 ++++++++++++++++++++++++++++++----
> > >  1 file changed, 50 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > > index 6a87848be3c3..9600fe674a89 100644
> > > --- a/drivers/cpuidle/cpuidle-psci.c
> > > +++ b/drivers/cpuidle/cpuidle-psci.c
> > > @@ -29,14 +29,47 @@ struct psci_cpuidle_data {
> > >  };
> > >
> > >  static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
> > > +static DEFINE_PER_CPU(u32, domain_state);
> > > +
> >
> > [...]
> >
> > > +static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> > > +                                     struct cpuidle_driver *drv, int idx)
> > > +{
> > > +     struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> > > +     u32 *states = data->psci_states;
> >
> > Why can't the above be like this for consistency(see below in
> > psci_enter_idle_state) ?
>
> You have a point, however in patch11 I am adding this line below.
>
> struct device *pd_dev = data->dev;
>
> So I don't think it matters much, agree?
>

Ah OK, it looked odd as part of this patch; maybe you could have moved
this change into that patch. Anyway, fine as is.

> >
> >         u32 *states = __this_cpu_read(psci_cpuidle_data.psci_states);
> >
> > > +     u32 state = psci_get_domain_state();
> > > +     int ret;
> > > +
> > > +     if (!state)
> > > +             state = states[idx];
> > > +
> > > +     ret = psci_enter_state(idx, state);
> > > +
> > > +     /* Clear the domain state to start fresh when back from idle. */
> > > +     psci_set_domain_state(0);
> > > +     return ret;
> > > +}
> > >
> >
> > [...]
> >
> > > @@ -118,6 +152,15 @@ static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
> > >                       ret = PTR_ERR(data->dev);
> > >                       goto free_mem;
> > >               }
> > > +
> > > +             /*
> > > +              * Using the deepest state for the CPU to trigger a potential
> > > +              * selection of a shared state for the domain, assumes the
> > > +              * domain states are all deeper states.
> > > +              */
> > > +             if (data->dev)
> >
> > You can drop this check as return on error above.
>
> Actually not, because if OSI is supported, there is still a
> possibility that the PM domain topology isn't used.
>

And how do we support that? I am missing something here.

> This means ->data->dev is NULL.
>

I don't get that.

> >
> > > +                     drv->states[state_count - 1].enter =
> > > +                             psci_enter_domain_idle_state;
> >
> > I see the comment above but this potential blocks retention mode at
> > cluster level when all cpu enter retention at CPU level. I don't like
> > this assumption, but I don't have any better suggestion. Please add the
> > note that we can't enter RETENTION state at cluster/domain level when
> > all CPUs enter at CPU level.
>
> You are correct, but I think the comment a few lines above (agreed to
> be added by Lorenzo in the previous version) should be enough to
> explain that. No?
>
> The point is, this is only a problem if cluster RETENTION is
> considered to be a shallower state that CPU power off, for example.
>

Yes, but giving examples makes it better and helps people who may be
wondering why the cluster retention state is not being entered. You can
just add to the above comment:

"e.g. If CPU Retention is one of the shallower states, then we can't enter
any of the allowed domain states."

> >
> > As I wrote above I got another doubt. What if platform specifies just
> > RETENTION state at CPU as well as Cluster/domain ? I think it should be
> > fine, just asking it out loud.
>
> It's fine.
>
> However, I am looking at what future improvements that can be made.
> This is one of them, but let's discuss that later on.
>

OK

--
Regards,
Sudeep

Ulf Hansson Dec. 19, 2019, 9:33 p.m. UTC | #4
On Thu, 19 Dec 2019 at 19:01, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Thu, Dec 19, 2019 at 04:48:13PM +0100, Ulf Hansson wrote:
> > On Thu, 19 Dec 2019 at 15:32, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > >
> > > On Wed, Dec 11, 2019 at 04:43:39PM +0100, Ulf Hansson wrote:
> > > > The per CPU variable psci_power_state, contains an array of fixed values,
> > > > which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> > > > each of the available CPU idle states.
> > > >
> > > > This isn't sufficient when using the hierarchical CPU topology in DT, in
> > > > combination with having PSCI OS initiated (OSI) mode enabled. More
> > > > precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> > > > idle state the cluster (a group of CPUs) should enter, while in PSCI
> > > > Platform Coordinated (PC) mode, each CPU independently votes for an idle
> > > > state of the cluster.
> > > >
> > > > For this reason, introduce a per CPU variable called domain_state and
> > > > implement two helper functions to read/write its value. Then let the
> > > > domain_state take precedence over the regular selected state, when entering
> > > > and idle state.
> > > >
> > > > To avoid executing the above OSI specific code in the ->enter() callback,
> > > > while operating in the default PSCI Platform Coordinated mode, let's also
> > > > add a new enter-function and use it for OSI.
> > > >
> > > > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> > > > Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > ---
> > > >
> > > > Changes in v4:
> > > >       - Rebased on top of earlier changes.
> > > >       - Add comment about using the deepest cpuidle state for the domain state
> > > >       selection.
> > > >
> > > > ---
> > > >  drivers/cpuidle/cpuidle-psci.c | 56 ++++++++++++++++++++++++++++++----
> > > >  1 file changed, 50 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > > > index 6a87848be3c3..9600fe674a89 100644
> > > > --- a/drivers/cpuidle/cpuidle-psci.c
> > > > +++ b/drivers/cpuidle/cpuidle-psci.c
> > > > @@ -29,14 +29,47 @@ struct psci_cpuidle_data {
> > > >  };
> > > >
> > > >  static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
> > > > +static DEFINE_PER_CPU(u32, domain_state);
> > > > +
> > >
> > > [...]
> > >
> > > > +static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> > > > +                                     struct cpuidle_driver *drv, int idx)
> > > > +{
> > > > +     struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> > > > +     u32 *states = data->psci_states;
> > >
> > > Why can't the above be like this for consistency(see below in
> > > psci_enter_idle_state) ?
> >
> > You have a point, however in patch11 I am adding this line below.
> >
> > struct device *pd_dev = data->dev;
> >
> > So I don't think it matters much, agree?
> >
>
> Ah OK, looked odd as part of this patch, may be you could have moved
> this change into that patch. Anyways fine as is.

Okay, then I'd rather just keep it.

>
> > >
> > >         u32 *states = __this_cpu_read(psci_cpuidle_data.psci_states);
> > >
> > > > +     u32 state = psci_get_domain_state();
> > > > +     int ret;
> > > > +
> > > > +     if (!state)
> > > > +             state = states[idx];
> > > > +
> > > > +     ret = psci_enter_state(idx, state);
> > > > +
> > > > +     /* Clear the domain state to start fresh when back from idle. */
> > > > +     psci_set_domain_state(0);
> > > > +     return ret;
> > > > +}
> > > >
> > >
> > > [...]
> > >
> > > > @@ -118,6 +152,15 @@ static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
> > > >                       ret = PTR_ERR(data->dev);
> > > >                       goto free_mem;
> > > >               }
> > > > +
> > > > +             /*
> > > > +              * Using the deepest state for the CPU to trigger a potential
> > > > +              * selection of a shared state for the domain, assumes the
> > > > +              * domain states are all deeper states.
> > > > +              */
> > > > +             if (data->dev)
> > >
> > > You can drop this check as return on error above.
> >
> > Actually not, because if OSI is supported, there is still a
> > possibility that the PM domain topology isn't used.
> >
>
> And how do we support that ? I am missing something here.
>
> > This means ->data->dev is NULL.
> >
>
> I don't get that.

This is quite similar to the existing limited support we have for OSI today.

We are using the idle states for the CPU, but ignoring the idle states
for the cluster. If you just skip applying the DTS patch14, this is
what happens.

>
> > >
> > > > +                     drv->states[state_count - 1].enter =
> > > > +                             psci_enter_domain_idle_state;
> > >
> > > I see the comment above but this potential blocks retention mode at
> > > cluster level when all cpu enter retention at CPU level. I don't like
> > > this assumption, but I don't have any better suggestion. Please add the
> > > note that we can't enter RETENTION state at cluster/domain level when
> > > all CPUs enter at CPU level.
> >
> > You are correct, but I think the comment a few lines above (agreed to
> > be added by Lorenzo in the previous version) should be enough to
> > explain that. No?
> >
> > The point is, this is only a problem if cluster RETENTION is
> > considered to be a shallower state that CPU power off, for example.
> >
>
> Yes, but give examples makes it better and helps people who may be
> wondering why cluster retention state is not being entered. You can just
> add to the above comment:
>
> "e.g. If CPU Retention is one of the shallower state, then we can't enter
> any of the allowed domain states."

Hmm, that's not a correct statement I think, let me elaborate.

The problem is that, in case the CPU has both RETENTION and POWER OFF
(deepest CPU state), we would only be able to reach a cluster state
(RETENTION or POWER OFF) when the CPUs are in CPU POWER OFF (as that's
the deepest).

This is okay, as long as a cluster RETENTION state is considered to be
"deeper" than the CPU POWER OFF state. However, if that isn't the
case, it means the cluster RETENTION state is not considered in the
correct order, but it's still possible to reach as a "domain state".

I think this all is kind of summarized in the comment I agreed upon
with Lorenzo, but if you still think there is some clarification
needed, I am happy to add it.

Makes sense?

[...]

Kind regards
Uffe

Sudeep Holla Dec. 20, 2019, 10:01 a.m. UTC | #5
On Thu, Dec 19, 2019 at 10:33:34PM +0100, Ulf Hansson wrote:
> On Thu, 19 Dec 2019 at 19:01, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > On Thu, Dec 19, 2019 at 04:48:13PM +0100, Ulf Hansson wrote:
> > > On Thu, 19 Dec 2019 at 15:32, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > >
> > > > On Wed, Dec 11, 2019 at 04:43:39PM +0100, Ulf Hansson wrote:
> > > > > The per CPU variable psci_power_state, contains an array of fixed values,
> > > > > which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> > > > > each of the available CPU idle states.
> > > > >
> > > > > This isn't sufficient when using the hierarchical CPU topology in DT, in
> > > > > combination with having PSCI OS initiated (OSI) mode enabled. More
> > > > > precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> > > > > idle state the cluster (a group of CPUs) should enter, while in PSCI
> > > > > Platform Coordinated (PC) mode, each CPU independently votes for an idle
> > > > > state of the cluster.
> > > > >
> > > > > For this reason, introduce a per CPU variable called domain_state and
> > > > > implement two helper functions to read/write its value. Then let the
> > > > > domain_state take precedence over the regular selected state, when entering
> > > > > and idle state.
> > > > >
> > > > > To avoid executing the above OSI specific code in the ->enter() callback,
> > > > > while operating in the default PSCI Platform Coordinated mode, let's also
> > > > > add a new enter-function and use it for OSI.
> > > > >
> > > > > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> > > > > Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> > > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > ---
> > > > >
> > > > > Changes in v4:
> > > > >       - Rebased on top of earlier changes.
> > > > >       - Add comment about using the deepest cpuidle state for the domain state
> > > > >       selection.
> > > > >
> > > > > ---
> > > > >  drivers/cpuidle/cpuidle-psci.c | 56 ++++++++++++++++++++++++++++++----
> > > > >  1 file changed, 50 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > > > > index 6a87848be3c3..9600fe674a89 100644
> > > > > --- a/drivers/cpuidle/cpuidle-psci.c
> > > > > +++ b/drivers/cpuidle/cpuidle-psci.c
> > > > > @@ -29,14 +29,47 @@ struct psci_cpuidle_data {
> > > > >  };
> > > > >
> > > > >  static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
> > > > > +static DEFINE_PER_CPU(u32, domain_state);
> > > > > +
> > > >
> > > > [...]
> > > >
> > > > > +static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> > > > > +                                     struct cpuidle_driver *drv, int idx)
> > > > > +{
> > > > > +     struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> > > > > +     u32 *states = data->psci_states;
> > > >
> > > > Why can't the above be like this for consistency(see below in
> > > > psci_enter_idle_state) ?
> > >
> > > You have a point, however in patch11 I am adding this line below.
> > >
> > > struct device *pd_dev = data->dev;
> > >
> > > So I don't think it matters much, agree?
> > >
> >
> > Ah OK, looked odd as part of this patch, may be you could have moved
> > this change into that patch. Anyways fine as is.
>
> Okay, then I rather just keep it.
>
> >
> > > >
> > > >         u32 *states = __this_cpu_read(psci_cpuidle_data.psci_states);
> > > >
> > > > > +     u32 state = psci_get_domain_state();
> > > > > +     int ret;
> > > > > +
> > > > > +     if (!state)
> > > > > +             state = states[idx];
> > > > > +
> > > > > +     ret = psci_enter_state(idx, state);
> > > > > +
> > > > > +     /* Clear the domain state to start fresh when back from idle. */
> > > > > +     psci_set_domain_state(0);
> > > > > +     return ret;
> > > > > +}
> > > > >
> > > >
> > > > [...]
> > > >
> > > > > @@ -118,6 +152,15 @@ static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
> > > > >                       ret = PTR_ERR(data->dev);
> > > > >                       goto free_mem;
> > > > >               }
> > > > > +
> > > > > +             /*
> > > > > +              * Using the deepest state for the CPU to trigger a potential
> > > > > +              * selection of a shared state for the domain, assumes the
> > > > > +              * domain states are all deeper states.
> > > > > +              */
> > > > > +             if (data->dev)
> > > >
> > > > You can drop this check as return on error above.
> > >
> > > Actually not, because if OSI is supported, there is still a
> > > possibility that the PM domain topology isn't used.
> > >
> >
> > And how do we support that ? I am missing something here.
> >
> > > This means ->data->dev is NULL.
> > >
> >
> > I don't get that.
>
> This is quite similar to the existing limited support we have for OSI today.
>
> We are using the idle states for the CPU, but ignoring the idle states
> for the cluster. If you just skip applying the DTS patch14, this is
> what happens.
>

No, if psci_set_osi fails, we shouldn't create a genpd domain, as we don't
enter any cluster state. The default mode (same as PC) should work, and it
doesn't need any genpd domains. Adding one which is unused is just confusing.
Please avoid that.

> >
> > > >
> > > > > +                     drv->states[state_count - 1].enter =
> > > > > +                             psci_enter_domain_idle_state;
> > > >
> > > > I see the comment above but this potential blocks retention mode at
> > > > cluster level when all cpu enter retention at CPU level. I don't like
> > > > this assumption, but I don't have any better suggestion. Please add the
> > > > note that we can't enter RETENTION state at cluster/domain level when
> > > > all CPUs enter at CPU level.
> > >
> > > You are correct, but I think the comment a few lines above (agreed to
> > > be added by Lorenzo in the previous version) should be enough to
> > > explain that. No?
> > >
> > > The point is, this is only a problem if cluster RETENTION is
> > > considered to be a shallower state that CPU power off, for example.
> > >
> >
> > Yes, but give examples makes it better and helps people who may be
> > wondering why cluster retention state is not being entered. You can just
> > add to the above comment:
> >
> > "e.g. If CPU Retention is one of the shallower state, then we can't enter
> > any of the allowed domain states."
>
> Hmm, that it's not a correct statement I think, let me elaborate.
>
> The problem is, that in case the CPU has both RETENTION and POWER OFF
> (deepest CPU state), we would only be able to reach a cluster state
> (RETENTION or POWER OFF) when the CPUs are in CPU POWER OFF (as that's
> the deepest).
>

Sorry for the poor choice of words. What I meant is that only one state
can be the deepest, and it will be CPU POWER OFF if it exists at the CPU
level. RETENTION (again, if it exists) is shallower (or rather, deeper but
not the deepest state).

> This is okay, as long as a cluster RETENTION state is considered being
> "deeper" than the CPU POWER OFF state. However, if that isn't the
> case, it means  the cluster RETENTION state is not considered in the
> correct order, but it's still possible to reach as a "domain state".
>

Again, sorry for not being clear, I was referring to CPU RET + CLUSTER RET.

> I think this all is kind of summarized in the comment I agreed upon
> with Lorenzo, but if you still think there is some clarification
> needed I happy to add it.
>
> Makes sense?
>

OK, if you are happy, that's fine. I just wanted to clearly state that
CPU RET + CLUSTER RET is not possible with the implementation.
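
A small userspace model of that limitation, for anyone reading along. It
assumes a CPU with two idle states (retention at index 0, power off at
index 1) and that only the last index uses the domain-aware enter callback,
as in the hunk above. The model_* names and the parameter values are made
up for illustration and do not come from the patch or any real DT.

```c
#include <stdint.h>

#define MODEL_STATE_COUNT 2

/* Hypothetical suspend parameters, for illustration only. */
static const uint32_t model_cpu_states[MODEL_STATE_COUNT] = {
	0x00000001,	/* idx 0: CPU retention */
	0x40000002,	/* idx 1: CPU power off (deepest CPU state) */
};

/*
 * Return the parameter that would be used for idle state idx, given a
 * pending domain state (0 means none). Only the deepest CPU state goes
 * through the domain-aware path, so a pending domain state is ignored
 * for every shallower CPU state.
 */
static uint32_t model_param_for(int idx, uint32_t pending_domain_state)
{
	if (idx == MODEL_STATE_COUNT - 1 && pending_domain_state)
		return pending_domain_state;

	return model_cpu_states[idx];
}
```

In this model, model_param_for(0, x) always returns the CPU retention
parameter regardless of x, which is the CPU RET + CLUSTER RET case that
cannot be reached.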

--
Regards,
Sudeep

Ulf Hansson Dec. 20, 2019, 11:33 a.m. UTC | #6
On Fri, 20 Dec 2019 at 11:01, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Thu, Dec 19, 2019 at 10:33:34PM +0100, Ulf Hansson wrote:
> > On Thu, 19 Dec 2019 at 19:01, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > >
> > > On Thu, Dec 19, 2019 at 04:48:13PM +0100, Ulf Hansson wrote:
> > > > On Thu, 19 Dec 2019 at 15:32, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > > >
> > > > > On Wed, Dec 11, 2019 at 04:43:39PM +0100, Ulf Hansson wrote:
> > > > > > The per CPU variable psci_power_state, contains an array of fixed values,
> > > > > > which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> > > > > > each of the available CPU idle states.
> > > > > >
> > > > > > This isn't sufficient when using the hierarchical CPU topology in DT, in
> > > > > > combination with having PSCI OS initiated (OSI) mode enabled. More
> > > > > > precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> > > > > > idle state the cluster (a group of CPUs) should enter, while in PSCI
> > > > > > Platform Coordinated (PC) mode, each CPU independently votes for an idle
> > > > > > state of the cluster.
> > > > > >
> > > > > > For this reason, introduce a per CPU variable called domain_state and
> > > > > > implement two helper functions to read/write its value. Then let the
> > > > > > domain_state take precedence over the regular selected state, when entering
> > > > > > and idle state.
> > > > > >
> > > > > > To avoid executing the above OSI specific code in the ->enter() callback,
> > > > > > while operating in the default PSCI Platform Coordinated mode, let's also
> > > > > > add a new enter-function and use it for OSI.
> > > > > >
> > > > > > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> > > > > > Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> > > > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > > ---
> > > > > >
> > > > > > Changes in v4:
> > > > > >       - Rebased on top of earlier changes.
> > > > > >       - Add comment about using the deepest cpuidle state for the domain state
> > > > > >       selection.
> > > > > >
> > > > > > ---
> > > > > >  drivers/cpuidle/cpuidle-psci.c | 56 ++++++++++++++++++++++++++++++----
> > > > > >  1 file changed, 50 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > > > > > index 6a87848be3c3..9600fe674a89 100644
> > > > > > --- a/drivers/cpuidle/cpuidle-psci.c
> > > > > > +++ b/drivers/cpuidle/cpuidle-psci.c
> > > > > > @@ -29,14 +29,47 @@ struct psci_cpuidle_data {
> > > > > >  };
> > > > > >
> > > > > >  static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
> > > > > > +static DEFINE_PER_CPU(u32, domain_state);
> > > > > > +
> > > > >
> > > > > [...]
> > > > >
> > > > > > +static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> > > > > > +                                     struct cpuidle_driver *drv, int idx)
> > > > > > +{
> > > > > > +     struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> > > > > > +     u32 *states = data->psci_states;
> > > > >
> > > > > > Why can't the above be like this for consistency (see below in
> > > > > > psci_enter_idle_state)?
> > > >
> > > > You have a point, however in patch11 I am adding this line below.
> > > >
> > > > struct device *pd_dev = data->dev;
> > > >
> > > > So I don't think it matters much, agree?
> > > >
> > >
> > > Ah OK, it looked odd as part of this patch; maybe you could have moved
> > > this change into that patch. Anyway, fine as is.
> >
> > Okay, then I'd rather just keep it.
> >
> > >
> > > > >
> > > > >         u32 *states = __this_cpu_read(psci_cpuidle_data.psci_states);
> > > > >
> > > > > > +     u32 state = psci_get_domain_state();
> > > > > > +     int ret;
> > > > > > +
> > > > > > +     if (!state)
> > > > > > +             state = states[idx];
> > > > > > +
> > > > > > +     ret = psci_enter_state(idx, state);
> > > > > > +
> > > > > > +     /* Clear the domain state to start fresh when back from idle. */
> > > > > > +     psci_set_domain_state(0);
> > > > > > +     return ret;
> > > > > > +}
> > > > > >
> > > > >
> > > > > [...]
> > > > >
> > > > > > @@ -118,6 +152,15 @@ static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
> > > > > >                       ret = PTR_ERR(data->dev);
> > > > > >                       goto free_mem;
> > > > > >               }
> > > > > > +
> > > > > > +             /*
> > > > > > +              * Using the deepest state for the CPU to trigger a potential
> > > > > > +              * selection of a shared state for the domain, assumes the
> > > > > > +              * domain states are all deeper states.
> > > > > > +              */
> > > > > > +             if (data->dev)
> > > > >
> > > > > > You can drop this check, as we return on error above.
> > > >
> > > > Actually not, because if OSI is supported, there is still a
> > > > possibility that the PM domain topology isn't used.
> > > >
> > >
> > > And how do we support that? I am missing something here.
> > >
> > > > This means ->data->dev is NULL.
> > > >
> > >
> > > I don't get that.
> >
> > This is quite similar to the existing limited support we have for OSI today.
> >
> > We are using the idle states for the CPU, but ignoring the idle states
> > for the cluster. If you just skip applying the DTS patch14, this is
> > what happens.
> >
>
> No, if psci_set_osi fails, we shouldn't create a genpd domain, as we don't
> enter any cluster state. The default mode (same as PC) should work, which
> doesn't need any genpd domains. Adding one which is unused is just confusing.
> Please avoid that.

I am deferring to the other thread to continue this discussion.

>
> > >
> > > > >
> > > > > > +                     drv->states[state_count - 1].enter =
> > > > > > +                             psci_enter_domain_idle_state;
> > > > >
> > > > > I see the comment above, but this potentially blocks retention mode at
> > > > > cluster level when all CPUs enter retention at CPU level. I don't like
> > > > > this assumption, but I don't have any better suggestion. Please add the
> > > > > note that we can't enter RETENTION state at cluster/domain level when
> > > > > all CPUs enter at CPU level.
> > > >
> > > > You are correct, but I think the comment a few lines above (agreed to
> > > > be added by Lorenzo in the previous version) should be enough to
> > > > explain that. No?
> > > >
> > > > The point is, this is only a problem if cluster RETENTION is
> > > > considered to be a shallower state than CPU power off, for example.
> > > >
> > >
> > > Yes, but giving examples makes it better and helps people who may be
> > > wondering why cluster retention state is not being entered. You can just
> > > add to the above comment:
> > >
> > > "e.g. If CPU Retention is one of the shallower states, then we can't enter
> > > any of the allowed domain states."
> >
> > Hmm, that's not a correct statement I think, let me elaborate.
> >
> > The problem is, that in case the CPU has both RETENTION and POWER OFF
> > (deepest CPU state), we would only be able to reach a cluster state
> > (RETENTION or POWER OFF) when the CPUs are in CPU POWER OFF (as that's
> > the deepest).
> >
>
> Sorry for the poor choice of words. What I meant is only one can be
> deepest and it will be CPU POWER OFF if it exists at the CPU level.
> RETENTION (again, if it exists) is shallower (rather, deeper but not the
> deepest state).
>
> > This is okay, as long as a cluster RETENTION state is considered being
> > "deeper" than the CPU POWER OFF state. However, if that isn't the
> > case, it means the cluster RETENTION state is not considered in the
> > correct order, but it's still possible to reach as a "domain state".
> >
>
> Again, sorry for not being clear; I was referring to CPU RET + CLUSTER RET.
>
> > I think this all is kind of summarized in the comment I agreed upon
> > with Lorenzo, but if you still think there is some clarification
> > needed, I am happy to add it.
> >
> > Makes sense?
> >
>
> OK, if you are happy, that's fine. I just wanted to clearly state that
> CPU RET + CLUSTER RET is not possible with this implementation.

Okay!

I will then leave this as is. When/if you find a better wording of the
comment, you can always send a patch on top.

Kind regards
Uffe
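
For reference, the precedence rule the patch gives domain_state can be modelled outside the kernel roughly as below. This is a sketch under assumptions: a single CPU, a plain global instead of a per-CPU variable, and a stub in place of the firmware call that returns idx as if the entry succeeded. All model_* names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-CPU stand-in for the per-CPU domain_state variable. */
static uint32_t model_domain_state;

/* Records the last parameter handed to the pretend firmware call. */
static uint32_t model_last_suspend_param;

/* Stub for the suspend call; assumed to succeed and return the state index. */
static int model_enter_state(int idx, uint32_t state)
{
	model_last_suspend_param = state;
	return idx;
}

/*
 * Same shape as psci_enter_domain_idle_state() in the patch: a non-zero
 * domain state wins over the per-state value, and the domain state is
 * cleared once the CPU is back from idle.
 */
static int model_enter_domain_idle_state(const uint32_t *states, int idx)
{
	uint32_t state = model_domain_state;
	int ret;

	assert(idx >= 0);

	if (!state)
		state = states[idx];

	ret = model_enter_state(idx, state);

	/* Start fresh on the next idle entry. */
	model_domain_state = 0;
	return ret;
}
```

Setting model_domain_state to a non-zero value before one call makes that value the suspend parameter for that entry only; the next call falls back to states[idx] because the domain state was cleared.
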

Patch

diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index 6a87848be3c3..9600fe674a89 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -29,14 +29,47 @@  struct psci_cpuidle_data {
 };
 
 static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
+static DEFINE_PER_CPU(u32, domain_state);
+
+static inline void psci_set_domain_state(u32 state)
+{
+	__this_cpu_write(domain_state, state);
+}
+
+static inline u32 psci_get_domain_state(void)
+{
+	return __this_cpu_read(domain_state);
+}
+
+static inline int psci_enter_state(int idx, u32 state)
+{
+	return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
+}
+
+static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
+					struct cpuidle_driver *drv, int idx)
+{
+	struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
+	u32 *states = data->psci_states;
+	u32 state = psci_get_domain_state();
+	int ret;
+
+	if (!state)
+		state = states[idx];
+
+	ret = psci_enter_state(idx, state);
+
+	/* Clear the domain state to start fresh when back from idle. */
+	psci_set_domain_state(0);
+	return ret;
+}
 
 static int psci_enter_idle_state(struct cpuidle_device *dev,
 				struct cpuidle_driver *drv, int idx)
 {
 	u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
 
-	return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter,
-					   idx, state[idx]);
+	return psci_enter_state(idx, state[idx]);
 }
 
 static struct cpuidle_driver psci_idle_driver __initdata = {
@@ -79,7 +112,8 @@  static int __init psci_dt_parse_state_node(struct device_node *np, u32 *state)
 	return 0;
 }
 
-static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
+static int __init psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
+					struct device_node *cpu_node,
 					unsigned int state_count, int cpu)
 {
 	int i, ret = 0;
@@ -118,6 +152,15 @@  static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
 			ret = PTR_ERR(data->dev);
 			goto free_mem;
 		}
+
+		/*
+		 * Using the deepest state for the CPU to trigger a potential
+		 * selection of a shared state for the domain, assumes the
+		 * domain states are all deeper states.
+		 */
+		if (data->dev)
+			drv->states[state_count - 1].enter =
+				psci_enter_domain_idle_state;
 	}
 
 	/* Idle states parsed correctly, store them in the per-cpu struct. */
@@ -129,7 +172,8 @@  static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
 	return ret;
 }
 
-static __init int psci_cpu_init_idle(unsigned int cpu, unsigned int state_count)
+static __init int psci_cpu_init_idle(struct cpuidle_driver *drv,
+				     unsigned int cpu, unsigned int state_count)
 {
 	struct device_node *cpu_node;
 	int ret;
@@ -145,7 +189,7 @@  static __init int psci_cpu_init_idle(unsigned int cpu, unsigned int state_count)
 	if (!cpu_node)
 		return -ENODEV;
 
-	ret = psci_dt_cpu_init_idle(cpu_node, state_count, cpu);
+	ret = psci_dt_cpu_init_idle(drv, cpu_node, state_count, cpu);
 
 	of_node_put(cpu_node);
 
@@ -201,7 +245,7 @@  static int __init psci_idle_init_cpu(int cpu)
 	/*
 	 * Initialize PSCI idle states.
 	 */
-	ret = psci_cpu_init_idle(cpu, ret);
+	ret = psci_cpu_init_idle(drv, cpu, ret);
 	if (ret) {
 		pr_err("CPU %d failed to PSCI idle\n", cpu);
 		goto out_kfree_drv;