[RFC] PM: domains: Reverse the order of performance and enabling ops

Message ID 20220720110246.762939-1-abel.vesa@linaro.org (mailing list archive)
State RFC, archived
Series [RFC] PM: domains: Reverse the order of performance and enabling ops

Commit Message

Abel Vesa July 20, 2022, 11:02 a.m. UTC
Rather than enabling and then setting the performance state, which usually
translates into two different levels (voltages) in order to get to the
one required by the consumer, we could give a chance to the providers to
cache the performance state needed by the consumer and then, when powering
on the power domain, the provider could use the cached level instead.
Also the drop_performance and power_off have to be reversed so that
when the last active consumer suspends, the level doesn't actually drop
until the pd is disabled.

For the power domains that do not provide the set_performance, things
remain unchanged, as does for the power domains that only provide the
set_performance but do not provide the power_on/off.

Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
---
 drivers/base/power/domain.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

--
2.34.3

Comments

Ulf Hansson July 21, 2022, 4:48 p.m. UTC | #1
On Wed, 20 Jul 2022 at 13:03, Abel Vesa <abel.vesa@linaro.org> wrote:
>
> Rather than enabling and then setting the performance state, which usually
> translates into two different levels (voltages) in order to get to the
> one required by the consumer, we could give a chance to the providers to
> cache the performance state needed by the consumer and then, when powering
> on the power domain, the provider could use the cached level instead.

I don't think it's really clear what you want to do here. Let's see
what the discussion below brings us to, but for the next version
please elaborate a bit more in the commit message.

Although, if I understand correctly (also from our offlist
discussions), you want to make it possible to move from two calls,
into one call into the FW from the genpd provider. So it's basically
an optimization, which to me, certainly sounds worth doing.

Furthermore, to get the complete picture, in the Qcom case, we set a
"default" low performance level from the genpd's ->power_on()
callback, which is needed to enable basic functionality for some
consumers.

The second call that I refer to is made when genpd calls the
->set_performance() callback (from genpd_runtime_suspend()), which is
done by genpd to potentially set a new value for an aggregated
performance state of the PM domain. In case when there actually is a
new performance state set in this path, we end up calling the FW twice
for the Qcom case, where this first one is unnecessary.
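
Roughly, in provider terms (just a sketch, not the actual Qcom code;
fw_set_level() is a made-up stand-in for the firmware vote, not a real API):

#include <linux/pm_domain.h>

/* Hypothetical firmware vote, e.g. an RPMh-style request. */
static int fw_set_level(struct generic_pm_domain *pd, unsigned int level)
{
        return 0;
}

/* ->power_on(): vote an arbitrary "default" low level so the domain works. */
static int example_pd_power_on(struct generic_pm_domain *pd)
{
        return fw_set_level(pd, 1);             /* FW call #1 */
}

/* ->set_performance_state(): invoked afterwards if the aggregate changed. */
static int example_pd_set_performance_state(struct generic_pm_domain *pd,
                                            unsigned int state)
{
        return fw_set_level(pd, state);         /* FW call #2 */
}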

Did I get that right?

> Also the drop_performance and power_off have to be reversed so that
> when the last active consumer suspends, the level doesn't actually drop
> until the pd is disabled.

I don't quite get what this part helps with, is it really needed to
improve the behaviour?

>
> For the power domains that do not provide the set_performance, things
> remain unchanged, as does for the power domains that only provide the
> set_performance but do not provide the power_on/off.

Right, good points!

I'll get back to review the code soon, just wanted to make sure I have
the complete picture first.

Kind regards
Uffe

>
> Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> ---
>  drivers/base/power/domain.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 5a2e0232862e..38647c304b73 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -939,8 +939,8 @@ static int genpd_runtime_suspend(struct device *dev)
>                 return 0;
>
>         genpd_lock(genpd);
> -       gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
>         genpd_power_off(genpd, true, 0);
> +       gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
>         genpd_unlock(genpd);
>
>         return 0;
> @@ -978,9 +978,8 @@ static int genpd_runtime_resume(struct device *dev)
>                 goto out;
>
>         genpd_lock(genpd);
> +       genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
>         ret = genpd_power_on(genpd, 0);
> -       if (!ret)
> -               genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
>         genpd_unlock(genpd);
>
>         if (ret)
> @@ -1018,8 +1017,8 @@ static int genpd_runtime_resume(struct device *dev)
>  err_poweroff:
>         if (!pm_runtime_is_irq_safe(dev) || genpd_is_irq_safe(genpd)) {
>                 genpd_lock(genpd);
> -               gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
>                 genpd_power_off(genpd, true, 0);
> +               gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
>                 genpd_unlock(genpd);
>         }
>
> @@ -2747,17 +2746,6 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
>         dev->pm_domain->detach = genpd_dev_pm_detach;
>         dev->pm_domain->sync = genpd_dev_pm_sync;
>
> -       if (power_on) {
> -               genpd_lock(pd);
> -               ret = genpd_power_on(pd, 0);
> -               genpd_unlock(pd);
> -       }
> -
> -       if (ret) {
> -               genpd_remove_device(pd, dev);
> -               return -EPROBE_DEFER;
> -       }
> -
>         /* Set the default performance state */
>         pstate = of_get_required_opp_performance_state(dev->of_node, index);
>         if (pstate < 0 && pstate != -ENODEV && pstate != -EOPNOTSUPP) {
> @@ -2769,6 +2757,18 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
>                         goto err;
>                 dev_gpd_data(dev)->default_pstate = pstate;
>         }
> +
> +       if (power_on) {
> +               genpd_lock(pd);
> +               ret = genpd_power_on(pd, 0);
> +               genpd_unlock(pd);
> +       }
> +
> +       if (ret) {
> +               genpd_remove_device(pd, dev);
> +               return -EPROBE_DEFER;
> +       }
> +
>         return 1;
>
>  err:
> --
> 2.34.3
>
Abel Vesa July 26, 2022, 6:38 p.m. UTC | #2
On 22-07-21 18:48:10, Ulf Hansson wrote:
> On Wed, 20 Jul 2022 at 13:03, Abel Vesa <abel.vesa@linaro.org> wrote:
> >
> > Rather than enabling and then setting the performance state, which usually
> > translates into two different levels (voltages) in order to get to the
> > one required by the consumer, we could give a chance to the providers to
> > cache the performance state needed by the consumer and then, when powering
> > on the power domain, the provider could use the cached level instead.
>
> I don't think it's really clear what you want to do here. Let's see
> what the discussion below brings us to, but for the next version
> please elaborate a bit more in the commit message.

Sorry about that. Will give more details in the next version.

>
> Although, if I understand correctly (also from our offlist
> discussions), you want to make it possible to move from two calls,
> into one call into the FW from the genpd provider. So it's basically
> an optimization, which to me, certainly sounds worth doing.
>
> Furthermore, to get the complete picture, in the Qcom case, we set a
> "default" low performance level from the genpd's ->power_on()
> callback, which is needed to enable basic functionality for some
> consumers.
>
> The second call that I refer to is made when genpd calls the
> ->set_performance() callback (from genpd_runtime_suspend()), which is
> done by genpd to potentially set a new value for an aggregated
> performance state of the PM domain. In case when there actually is a
> new performance state set in this path, we end up calling the FW twice
> for the Qcom case, where this first one is unnecessary.
>
> Did I get that right?

Actually, for every ->power_on, there is a ->set_performance right after.

For example, on genpd_runtime_suspend, this is done:

	genpd_lock(genpd);
	ret = genpd_power_on(genpd, 0);
	if (!ret)
        	genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
	genpd_unlock(genpd);

And same thing on __genpd_dev_pm_attach.

Now, TBH, I can't think of any scenario where a consumer would want its PD powered,
(which implies a non-zero voltage level) and then changed to a higher performance
level (higher voltage).

In most scenarios, though, the consumer needs the PD powered on to a specific voltage
level.

Based on the two statements above, we need ->set_performance to actually act as
a way to tell the provider to which voltage level to power on the power domain
when the ->power_on will be called.

So my suggestion with this patch is to reverse the order, do ->set_performance first
and then ->power_on, this way the provider receives the voltage level required by
a consumer before the request to power on the PD. Then a provider might use that
info when powering on/off that PD.

>
> > Also the drop_performance and power_off have to be reversed so that
> > when the last active consumer suspends, the level doesn't actually drop
> > until the pd is disabled.
>
> I don't quite get what this part helps with, is it really needed to
> improve the behaviour?

Again, why would a consumer need its PD voltage dropped before being powered off?

I think it makes more sense for the ->set_performance in this case to act as a
way to tell the provider that a specific device has yielded its voltage level
request. That way the provider can drop the voltage to the minimum requested by
the active consumers of that PD.
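
To make that concrete, a provider could implement it roughly like this
(a minimal sketch assuming the reversed ordering; fw_set_level() is a
hypothetical stand-in for the actual call into the FW, and level 0 is
assumed to mean "off"):

#include <linux/pm_domain.h>

struct example_pd {
        struct generic_pm_domain genpd;
        unsigned int level;     /* level last passed to ->set_performance_state() */
        bool enabled;
};

#define to_example_pd(gpd) container_of(gpd, struct example_pd, genpd)

static int fw_set_level(struct example_pd *pd, unsigned int level)
{
        return 0;       /* pretend firmware call */
}

static int example_set_performance_state(struct generic_pm_domain *domain,
                                         unsigned int state)
{
        struct example_pd *pd = to_example_pd(domain);

        pd->level = state;

        /* PD off: only cache the level, ->power_on() will apply it. */
        if (!pd->enabled)
                return 0;

        /* PD on: apply the new aggregated level right away. */
        return fw_set_level(pd, state);
}

static int example_power_on(struct generic_pm_domain *domain)
{
        struct example_pd *pd = to_example_pd(domain);
        int ret;

        /* Power on directly at the cached level: a single FW call. */
        ret = fw_set_level(pd, pd->level);
        if (!ret)
                pd->enabled = true;

        return ret;
}

static int example_power_off(struct generic_pm_domain *domain)
{
        struct example_pd *pd = to_example_pd(domain);

        /*
         * With the reversed order, the last consumer's level is dropped
         * after this point, so that drop only updates the cache above.
         */
        pd->enabled = false;

        return fw_set_level(pd, 0);
}

These would then be hooked up as the genpd's ->power_on(), ->power_off()
and ->set_performance_state() callbacks.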

>
> >
> > For the power domains that do not provide the set_performance, things
> > remain unchanged, as does for the power domains that only provide the
> > set_performance but do not provide the power_on/off.
>
> Right, good points!
>
> I get back to review the code soon, just wanted to make sure I have
> the complete picture first.
>
> Kind regards
> Uffe
>
> >
> > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > ---
> >  drivers/base/power/domain.c | 30 +++++++++++++++---------------
> >  1 file changed, 15 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 5a2e0232862e..38647c304b73 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -939,8 +939,8 @@ static int genpd_runtime_suspend(struct device *dev)
> >                 return 0;
> >
> >         genpd_lock(genpd);
> > -       gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
> >         genpd_power_off(genpd, true, 0);
> > +       gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
> >         genpd_unlock(genpd);
> >
> >         return 0;
> > @@ -978,9 +978,8 @@ static int genpd_runtime_resume(struct device *dev)
> >                 goto out;
> >
> >         genpd_lock(genpd);
> > +       genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
> >         ret = genpd_power_on(genpd, 0);
> > -       if (!ret)
> > -               genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
> >         genpd_unlock(genpd);
> >
> >         if (ret)
> > @@ -1018,8 +1017,8 @@ static int genpd_runtime_resume(struct device *dev)
> >  err_poweroff:
> >         if (!pm_runtime_is_irq_safe(dev) || genpd_is_irq_safe(genpd)) {
> >                 genpd_lock(genpd);
> > -               gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
> >                 genpd_power_off(genpd, true, 0);
> > +               gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
> >                 genpd_unlock(genpd);
> >         }
> >
> > @@ -2747,17 +2746,6 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> >         dev->pm_domain->detach = genpd_dev_pm_detach;
> >         dev->pm_domain->sync = genpd_dev_pm_sync;
> >
> > -       if (power_on) {
> > -               genpd_lock(pd);
> > -               ret = genpd_power_on(pd, 0);
> > -               genpd_unlock(pd);
> > -       }
> > -
> > -       if (ret) {
> > -               genpd_remove_device(pd, dev);
> > -               return -EPROBE_DEFER;
> > -       }
> > -
> >         /* Set the default performance state */
> >         pstate = of_get_required_opp_performance_state(dev->of_node, index);
> >         if (pstate < 0 && pstate != -ENODEV && pstate != -EOPNOTSUPP) {
> > @@ -2769,6 +2757,18 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> >                         goto err;
> >                 dev_gpd_data(dev)->default_pstate = pstate;
> >         }
> > +
> > +       if (power_on) {
> > +               genpd_lock(pd);
> > +               ret = genpd_power_on(pd, 0);
> > +               genpd_unlock(pd);
> > +       }
> > +
> > +       if (ret) {
> > +               genpd_remove_device(pd, dev);
> > +               return -EPROBE_DEFER;
> > +       }
> > +
> >         return 1;
> >
> >  err:
> > --
> > 2.34.3
> >
>
Ulf Hansson July 28, 2022, 11:37 a.m. UTC | #3
+ Dmitry, Thierry

On Tue, 26 Jul 2022 at 20:38, Abel Vesa <abel.vesa@linaro.org> wrote:
>
> On 22-07-21 18:48:10, Ulf Hansson wrote:
> > On Wed, 20 Jul 2022 at 13:03, Abel Vesa <abel.vesa@linaro.org> wrote:
> > >
> > > Rather than enabling and then setting the performance state, which usually
> > > translates into two different levels (voltages) in order to get to the
> > > one required by the consumer, we could give a chance to the providers to
> > > cache the performance state needed by the consumer and then, when powering
> > > on the power domain, the provider could use the cached level instead.
> >
> > I don't think it's really clear what you want to do here. Let's see
> > what the discussion below brings us to, but for the next version
> > please elaborate a bit more in the commit message.
>
> Sorry about that. Will give more details in the next version.
>
> >
> > Although, if I understand correctly (also from our offlist
> > discussions), you want to make it possible to move from two calls,
> > into one call into the FW from the genpd provider. So it's basically
> > an optimization, which to me, certainly sounds worth doing.
> >
> > Furthermore, to get the complete picture, in the Qcom case, we set a
> > "default" low performance level from the genpd's ->power_on()
> > callback, which is needed to enable basic functionality for some
> > consumers.
> >
> > The second call that I refer to is made when genpd calls the
> > ->set_performance() callback (from genpd_runtime_suspend()), which is
> > done by genpd to potentially set a new value for an aggregated
> > performance state of the PM domain. In case when there actually is a
> > new performance state set in this path, we end up calling the FW twice
> > for the Qcom case, where this first one is unnecessary.
> >
> > Did I get that right?
>
> Actually, for every ->power_on, there is a ->set_performance right after.
>
> For example, on genpd_runtime_suspend, this is done:
>
>         genpd_lock(genpd);
>         ret = genpd_power_on(genpd, 0);
>         if (!ret)
>                 genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
>         genpd_unlock(genpd);
>
> And same thing on __genpd_dev_pm_attach.

I guess you refer to genpd_runtime_resume(), but never mind, I get it.

Note that, I only wanted to highlight  that genpd doesn't necessarily
invoke the ->set_performance() callback at genpd_runtime_resume(). It
depends on whether there is a new performance level to be set for the
device (which also needs to aggregate to a new level for the genpd in
question).

>
> Now, TBH, I can't think of any scenario where a consumer would want its PD powered,
> (which implies a non-zero voltage level) and then changed to a higher performance
> level (higher voltage).
>
> In most scenarios, though, the consumer needs the PD powered on to a specific voltage
> level.

Yes, this sounds reasonable to me too.

However, in some cases there are certain orders of how things need to
be "powered on", there may be clocks etc that need to be managed
carefully to not break HW or cause glitches, for example.

I have looped in Dmitry and Thierry to see if they think the change
should be fine for Tegra platforms too.

>
> Based on the two statements above, we need ->set_performance to actually act as
> a way to tell the provider to which voltage level to power on the power domain
> when the ->power_on will be called.
>
> So my suggestion with this patch is to reverse the order, do ->set_performance first
> and then ->power_on, this way the provider receives the voltage level required by
> a consumer before the request to power on the PD. Then a provider might use that
> info when powering on/off that PD.

Yes, that should work fine. At least for the power on scenario. Let's
discuss more about the power off scenario below.

>
> >
> > > Also the drop_performance and power_off have to be reversed so that
> > > when the last active consumer suspends, the level doesn't actually drop
> > > until the pd is disabled.
> >
> > I don't quite get what this part helps with, is it really needed to
> > improve the behaviour?
>
> Again, why would a consumer need its PD voltage dropped before being powered off?

Similar to powering on a device/PM domain, we need to be careful about
the order of how we turn things off. Other than that, you are probably
right.

However, I don't quite understand how reversing the order of calling
the ->power_off() callback and the ->set_performance_state() would
help the provider to implement this. See more below.

>
> I think it makes more sense for the ->set_performance in this case to act as a
> way to tell the provider that a specific device has yeilded its voltage level
> request. That way the provider can drop the voltage to the minimum requested by
> the active consumers of that PD.

The genpd provider can know if the PM domain is powered on or off,
when the ->set_performance_state() callback is invoked. If it's
powered off, it may then decide to "cache" the performance level
request, until it gets powered on.

Although, I don't see how a genpd provider should be able to cache a
performance state request, when the PM domain is already powered on
(which is what you propose, if I understand correctly), that simply
doesn't work for the other scenarios.

>
> >
> > >
> > > For the power domains that do not provide the set_performance, things
> > > remain unchanged, as does for the power domains that only provide the
> > > set_performance but do not provide the power_on/off.
> >
> > Right, good points!
> >
> > I get back to review the code soon, just wanted to make sure I have
> > the complete picture first.
> >
> > Kind regards
> > Uffe
> >
> > >
> > > [...]
Abel Vesa July 29, 2022, 9:46 a.m. UTC | #4
On 22-07-28 13:37:38, Ulf Hansson wrote:
> + Dmitry, Thierry
>
> On Tue, 26 Jul 2022 at 20:38, Abel Vesa <abel.vesa@linaro.org> wrote:
> >
> > On 22-07-21 18:48:10, Ulf Hansson wrote:
> > > On Wed, 20 Jul 2022 at 13:03, Abel Vesa <abel.vesa@linaro.org> wrote:
> > > >
> > > > Rather than enabling and then setting the performance state, which usually
> > > > translates into two different levels (voltages) in order to get to the
> > > > one required by the consumer, we could give a chance to the providers to
> > > > cache the performance state needed by the consumer and then, when powering
> > > > on the power domain, the provider could use the cached level instead.
> > >
> > > I don't think it's really clear what you want to do here. Let's see
> > > what the discussion below brings us to, but for the next version
> > > please elaborate a bit more in the commit message.
> >
> > Sorry about that. Will give more details in the next version.
> >
> > >
> > > Although, if I understand correctly (also from our offlist
> > > discussions), you want to make it possible to move from two calls,
> > > into one call into the FW from the genpd provider. So it's basically
> > > an optimization, which to me, certainly sounds worth doing.
> > >
> > > Furthermore, to get the complete picture, in the Qcom case, we set a
> > > "default" low performance level from the genpd's ->power_on()
> > > callback, which is needed to enable basic functionality for some
> > > consumers.
> > >
> > > The second call that I refer to is made when genpd calls the
> > > ->set_performance() callback (from genpd_runtime_suspend()), which is
> > > done by genpd to potentially set a new value for an aggregated
> > > performance state of the PM domain. In case when there actually is a
> > > new performance state set in this path, we end up calling the FW twice
> > > for the Qcom case, where this first one is unnecessary.
> > >
> > > Did I get that right?
> >
> > Actually, for every ->power_on, there is a ->set_performance right after.
> >
> > For example, on genpd_runtime_suspend, this is done:
> >
> >         genpd_lock(genpd);
> >         ret = genpd_power_on(genpd, 0);
> >         if (!ret)
> >                 genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
> >         genpd_unlock(genpd);
> >
> > And same thing on __genpd_dev_pm_attach.
>
> I guess you refer to genpd_runtime_resume(), but never mind, I get it.

Yes, I meant _resume.

>
> Note that, I only wanted to highlight  that genpd doesn't necessarily
> invoke the ->set_performance() callback at genpd_runtime_resume(). It
> depends on whether there is a new performance level to be set for the
> device (which also needs to aggregate to a new level for the genpd in
> question).

Yes, I get that. But if there is no new performance level to be set, the
order doesn't matter. Right?

>
> >
> > Now, TBH, I can't think of any scenario where a consumer would want its PD powered,
> > (which implies a non-zero voltage level) and then changed to a higher performance
> > level (higher voltage).
> >
> > In most scenarios, though, the consumer needs the PD powered on to a specific voltage
> > level.
>
> Yes, this sounds reasonable to me too.
>
> However, in some cases there are certain orders of how things need to
> be "powered on", there may be clocks etc that need to be managed
> carefully to not break HW or cause glitches, for example.

Totally agree. Maybe this would be another reason for the
->set_performance to be called before the ->power_on. Think about it
as the equivalent of clk_prepare in the CCF. Makes sense to have an
initial preparation before actually enabling the PD. And this
preparation should include the performance level as information passed
to the provider.
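
As a rough consumer-side illustration of that "prepare, then enable"
order (hypothetical driver code; in practice the level usually comes
from required-opps rather than an explicit call):

#include <linux/pm_domain.h>
#include <linux/pm_runtime.h>

static int example_consumer_start(struct device *dev, unsigned int level)
{
        int ret;

        /* "prepare": tell the provider which performance level is needed */
        ret = dev_pm_genpd_set_performance_state(dev, level);
        if (ret)
                return ret;

        /*
         * "enable": with the reversed order, the provider can power on
         * the PD directly at 'level' when the domain gets enabled here.
         */
        return pm_runtime_resume_and_get(dev);
}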

>
> I have looped in Dmitry and Thierry to see if they think the change
> should be fine for Tegra platforms too.
>

Good. But the Tegra use case uses only the ->set_performance and does not
use ->power_on and ->power_off for that specific PD. So I don't think
their use case will be affected by the order reversal.

> >
> > Based on the two statements above, we need ->set_performance to actually act as
> > a way to tell the provider to which voltage level to power on the power domain
> > when the ->power_on will be called.
> >
> > So my suggestion with this patch is to reverse the order, do ->set_performance first
> > and then ->power_on, this way the provider receives the voltage level required by
> > a consumer before the request to power on the PD. Then a provider might use that
> > info when powering on/off that PD.
>
> Yes, that should work fine. At least for the power on scenario. Let's
> discuss more about the power off scenario below.
>
> >
> > >
> > > > Also the drop_performance and power_off have to be reversed so that
> > > > when the last active consumer suspends, the level doesn't actually drop
> > > > until the pd is disabled.
> > >
> > > I don't quite get what this part helps with, is it really needed to
> > > improve the behaviour?
> >
> > Again, why would a consumer need its PD voltage dropped before being powered off?
>
> Similar to powering on a device/PM domain, we need to be careful about
> the order of how we turn things off. Other than that, you are probably
> right.

Again, there are multiple Qcom providers that use these and all of them
need the order reversed. Other than that, there is the Tegra single-PD
case, which would not be impacted, as I explained above.

>
> However, I don't quite understand how reversing the order of calling
> the ->power_off() callback and the ->set_performance_state() would
> help the provider to implement this. See more below.
>

Please correct me where I'm wrong, but here is the most popular scenario as
I see it.

Device A needs the PD enabled at the lowest performance level, so
->set_performance passes on that information to the provider and then,
on ->power_on, the provider uses that performance level to power on that
PD. This can be done by a single call to FW on power_on.

Now lets assume there is another device (B) that needs same PD at the
highest performance level, so genpd calls into the provider performance
call, which checks if the PD is on and if so, it writes the new value to
FW.

On power off, let's assume device B runtime suspends first, so the
->set_performance will 'release' that higher performance level, so the
provider will do another call to FW (since the PD is enabled) with the
lowest performance level, which is still needed by device B. In this case,
the ->power_off will not even be called.

When the last active consumer suspends (in our case here, device A), ->power_off
will be called first disabling the PD, then the ->set_performance will
'release' that lowest perf level the device A requested but will not
call to FW since the PD is already disabled. This would make
sure there are not two calls with two different levels to the FW.
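
Summing that up as a timeline (fw_set_level() again being a hypothetical
stand-in for the FW call, LOW/HIGH the two requested levels):

  A resumes:   ->set_performance_state(LOW)    PD off, provider caches LOW
               ->power_on()                    fw_set_level(LOW)      1 FW call
  B resumes:   ->set_performance_state(HIGH)   fw_set_level(HIGH)     1 FW call
  B suspends:  (provider ->power_off() not invoked, A still active)
               ->set_performance_state(LOW)    fw_set_level(LOW)      1 FW call
  A suspends:  ->power_off()                   fw_set_level(0)        1 FW call
               ->set_performance_state(0)      PD off, cache only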

Now, most of this depends on the provider's way of doing things.
But in order to allow the provider to do what is described above, it
needs to know about the perf level before it is asked to power on a PD.
Same applies to powering off.

> >
> > I think it makes more sense for the ->set_performance in this case to act as a
> > way to tell the provider that a specific device has yeilded its voltage level
> > request. That way the provider can drop the voltage to the minimum requested by
> > the active consumers of that PD.
>
> The genpd provider can know if the PM domain is powered on or off,
> when the ->set_performance_state() callback is invoked. If it's
> powered off, it may then decide to "cache" the request for the
> performance level request, until it gets powered on.

But the ->set_performance is called only after ->power_on, so the PD
will always be on when ->set_performance checks. And this is what my
patch is trying to change actually.

>
> Although, I don't see how a genpd provider should be able to cache a
> performance state request, when the PM domain is already powered on
> (which is what you propose, if I understand correctly), that simply
> doesn't work for the other scenarios.

I explained this above. The provider will need to check if the PD is on
and only write to FW if it is. Otherwise it will cache the value for
when the power_on is called.

>
> >
> > >
> > > >
> > > > For the power domains that do not provide the set_performance, things
> > > > remain unchanged, as does for the power domains that only provide the
> > > > set_performance but do not provide the power_on/off.
> > >
> > > Right, good points!
> > >
> > > I get back to review the code soon, just wanted to make sure I have
> > > the complete picture first.
> > >
> > > Kind regards
> > > Uffe
> > >
> > > >
> > > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > > ---
> > > >  drivers/base/power/domain.c | 30 +++++++++++++++---------------
> > > >  1 file changed, 15 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > > > index 5a2e0232862e..38647c304b73 100644
> > > > --- a/drivers/base/power/domain.c
> > > > +++ b/drivers/base/power/domain.c
> > > > @@ -939,8 +939,8 @@ static int genpd_runtime_suspend(struct device *dev)
> > > >                 return 0;
> > > >
> > > >         genpd_lock(genpd);
> > > > -       gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
> > > >         genpd_power_off(genpd, true, 0);
> > > > +       gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
> > > >         genpd_unlock(genpd);
> > > >
> > > >         return 0;
> > > > @@ -978,9 +978,8 @@ static int genpd_runtime_resume(struct device *dev)
> > > >                 goto out;
> > > >
> > > >         genpd_lock(genpd);
> > > > +       genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
> > > >         ret = genpd_power_on(genpd, 0);
> > > > -       if (!ret)
> > > > -               genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
> > > >         genpd_unlock(genpd);
> > > >
> > > >         if (ret)
> > > > @@ -1018,8 +1017,8 @@ static int genpd_runtime_resume(struct device *dev)
> > > >  err_poweroff:
> > > >         if (!pm_runtime_is_irq_safe(dev) || genpd_is_irq_safe(genpd)) {
> > > >                 genpd_lock(genpd);
> > > > -               gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
> > > >                 genpd_power_off(genpd, true, 0);
> > > > +               gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
> > > >                 genpd_unlock(genpd);
> > > >         }
> > > >
> > > > @@ -2747,17 +2746,6 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> > > >         dev->pm_domain->detach = genpd_dev_pm_detach;
> > > >         dev->pm_domain->sync = genpd_dev_pm_sync;
> > > >
> > > > -       if (power_on) {
> > > > -               genpd_lock(pd);
> > > > -               ret = genpd_power_on(pd, 0);
> > > > -               genpd_unlock(pd);
> > > > -       }
> > > > -
> > > > -       if (ret) {
> > > > -               genpd_remove_device(pd, dev);
> > > > -               return -EPROBE_DEFER;
> > > > -       }
> > > > -
> > > >         /* Set the default performance state */
> > > >         pstate = of_get_required_opp_performance_state(dev->of_node, index);
> > > >         if (pstate < 0 && pstate != -ENODEV && pstate != -EOPNOTSUPP) {
> > > > @@ -2769,6 +2757,18 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> > > >                         goto err;
> > > >                 dev_gpd_data(dev)->default_pstate = pstate;
> > > >         }
> > > > +
> > > > +       if (power_on) {
> > > > +               genpd_lock(pd);
> > > > +               ret = genpd_power_on(pd, 0);
> > > > +               genpd_unlock(pd);
> > > > +       }
> > > > +
> > > > +       if (ret) {
> > > > +               genpd_remove_device(pd, dev);
> > > > +               return -EPROBE_DEFER;
> > > > +       }
> > > > +
> > > >         return 1;
> > > >
> > > >  err:
> > > > --
> > > > 2.34.3
> > > >
> > >
Dmitry Osipenko Aug. 4, 2022, 8:58 p.m. UTC | #5
29.07.2022 12:46, Abel Vesa wrote:
>> I have looped in Dmitry and Thierry to see if they think the change
>> should be fine for Tegra platforms too.
>>
> Good. But the tegra usecase uses only the ->set_performance and does not
> use ->power_on and ->power_off for that specific PD. So I don't think
> their usecase will be affected by the order reverse.
> 

For Tegra it indeed shouldn't change anything.
Ulf Hansson Aug. 12, 2022, 1:14 p.m. UTC | #6
On Fri, 29 Jul 2022 at 11:46, Abel Vesa <abel.vesa@linaro.org> wrote:
>
> [...]
>
> Please correct me where I'm wrong, but here is the most popular scenario as
> I see it.
>
> Device A needs the PD enabled at the lowest performance level, so
> ->set_performance passes on that information to the provider and then,
> on ->power_on, the provider uses that performance level to power on that
> PD. This can be done by a single call to FW on power_on.
>
> Now lets assume there is another device (B) that needs same PD at the
> highest performance level, so genpd calls into the provider performance
> call, which checks if the PD is on and if so, it writes the new value to
> FW.
>
> On power off, lets assume device B runtime suspends first, so the
> ->set_performance will 'release' that higher performance level, so the
> provider will do another call to FW (since the PD is enabled) with the
> lowest performance level, which is still needed by device B. In this case,

/s/device B/device A (I assume)

> the ->power_off will not even be called.

Right.

>
> When the last active consumer suspends (in our case here, device A), ->power_off
> will be called first disabling the PD, then the ->set_performance will
> 'release' that lowest perf level the device A requested but will not
> call to FW since the PD is already disabled. This would make
> sure there are not two calls with two different levels to the FW.

I understand what you want to achieve, but I think the ->power_off()
scenario may be a bit more tricky.

For example, it would be perfectly fine for genpd to keep the PM
domain powered-on, even when the device A gets runtime suspended (a
genpd governor may prevent it). In other words, we may end up not
getting the ->power_off() callback invoked at all, even if there are
no runtime resumed devices in the PM domain.

Could this lead to problems on the provider side, when trying to take
into account the different combinations of sequences?

>
> Now, most of this depends on the provider's way of doing things.
> But in order to allow the provider to do what is described above, it
> needs to know about the perf level before it is asked to power on a PD.
> Same applies to powering off.
>
> > >
> > > I think it makes more sense for the ->set_performance in this case to act as a
> > > way to tell the provider that a specific device has yeilded its voltage level
> > > request. That way the provider can drop the voltage to the minimum requested by
> > > the active consumers of that PD.
> >
> > The genpd provider can know if the PM domain is powered on or off,
> > when the ->set_performance_state() callback is invoked. If it's
> > powered off, it may then decide to "cache" the request for the
> > performance level request, until it gets powered on.
>
> But the ->set_performance is called only after ->power_on, so the PD
> will always be on when ->set_performance checks. And this is what my
> patch is trying to change actually.
>
> >
> > Although, I don't see how a genpd provider should be able to cache a
> > performance state request, when the PM domain is already powered on
> > (which is what you propose, if I understand correctly), that simply
> > doesn't work for the other scenarios.
>
> I explained this above. The provider will need to check if the PD is on
> and only write to FW if it is. Otherwise it will cache the value for
> when the power_on is called.

As indicated above, it looks to me that you may need to check a
combination of things at the provider side. Is relying on whether
genpd is on/off to decide when to apply or cache a performance state,
really sufficient? I could certainly be wrong though.

Perhaps if you can provide a corresponding patch for the genpd
provider side too, that can help to convince me.

[...]

Kind regards
Uffe
Abel Vesa Aug. 12, 2022, 3:46 p.m. UTC | #7
On 22-08-12 15:14:30, Ulf Hansson wrote:
> On Fri, 29 Jul 2022 at 11:46, Abel Vesa <abel.vesa@linaro.org> wrote:
> > [...]
> > Please correct me where I'm wrong, but here is the most popular scenario as
> > I see it.
> >
> > Device A needs the PD enabled at the lowest performance level, so
> > ->set_performance passes on that information to the provider and then,
> > on ->power_on, the provider uses that performance level to power on that
> > PD. This can be done by a single call to FW on power_on.
> >
> > Now lets assume there is another device (B) that needs same PD at the
> > highest performance level, so genpd calls into the provider performance
> > call, which checks if the PD is on and if so, it writes the new value to
> > FW.
> >
> > On power off, lets assume device B runtime suspends first, so the
> > ->set_performance will 'release' that higher performance level, so the
> > provider will do another call to FW (since the PD is enabled) with the
> > lowest performance level, which is still needed by device B. In this case,
> 
> /s/device B/device A (I assume)
> 
> > the ->power_off will not even be called.
> 
> Right.
> 
> >
> > When the last active consumer suspends (in our case here, device A), ->power_off
> > will be called first disabling the PD, then the ->set_performance will
> > 'release' that lowest perf level the device A requested but will not
> > call to FW since the PD is already disabled. This would make
> > sure there are not two calls with two different levels to the FW.
> 
> I understand what you want to achieve, but I think the ->power_off()
> scenario may be a bit more tricky.
> 
> For example, it would be perfectly fine for genpd to keep the PM
> domain powered-on, even when the device A gets runtime suspended (a
> genpd governor may prevent it). In other words, we may end up not
> getting the ->power_off() callback invoked at all, even if there are
> no runtime resumed devices in the PM domain.
> 
> Could this lead to problems on the provider side, when trying to take
> into account the different combinations of sequences?

Correct me if I'm wrong, but even if a genpd governor would prevent
the power_off from being called, with the reversal in place, since the
power domain is not off, the provider would lower the performance state
and that's it. The responsibility falls on the provider, but so it does
with the current order of the calls.

So I don't see how this could lead to problems compared to the current
order of the calls.

Maybe I misunderstood your point, so please correct me if I'm getting
this wrong.

> 
> >
> > Now, most of this depends on the provider's way of doing things.
> > But in order to allow the provider to do what is described above, it
> > needs to know about the perf level before it is asked to power on a PD.
> > Same applies to powering off.
> >
> > > >
> > > > I think it makes more sense for the ->set_performance in this case to act as a
> > > > way to tell the provider that a specific device has yeilded its voltage level
> > > > request. That way the provider can drop the voltage to the minimum requested by
> > > > the active consumers of that PD.
> > >
> > > The genpd provider can know if the PM domain is powered on or off,
> > > when the ->set_performance_state() callback is invoked. If it's
> > > powered off, it may then decide to "cache" the request for the
> > > performance level request, until it gets powered on.
> >
> > But the ->set_performance is called only after ->power_on, so the PD
> > will always be on when ->set_performance checks. And this is what my
> > patch is trying to change actually.
> >
> > >
> > > Although, I don't see how a genpd provider should be able to cache a
> > > performance state request, when the PM domain is already powered on
> > > (which is what you propose, if I understand correctly), that simply
> > > doesn't work for the other scenarios.
> >
> > I explained this above. The provider will need to check if the PD is on
> > and only write to FW if it is. Otherwise it will cache the value for
> > when the power_on is called.
> 
> As indicated above, it looks to me that you may need to check a
> combination of things at the provider side. Is relying on whether
> genpd is on/off to decide when to apply or cache a performance state,
> really sufficient? I could certainly be wrong though.

I don't think there is any change from this point of view, when compared
to the current order. Even with the current order, the provider would
either cache the performance state if the power domain is off, or would
apply it if the power domain is on.

> 
> Perhaps if you can provide a corresponding patch for the genpd
> provider side too, that can help to convince me.

The qcom-rpmhpd actually does that even now. On set_performance, it caches
the performance state (corner) if the power domain is disabled, and it
applies (aggregates the corner) if the power domain is enabled.

> 
> [...]
> 
> Kind regards
> Uffe
Abel Vesa Aug. 12, 2022, 3:47 p.m. UTC | #8
On 22-08-04 23:58:34, Dmitry Osipenko wrote:
> 29.07.2022 12:46, Abel Vesa wrote:
> >> I have looped in Dmitry and Thierry to see if they think the change
> >> should be fine for Tegra platforms too.
> >>
> > Good. But the tegra usecase uses only the ->set_performance and does not
> > use ->power_on and ->power_off for that specific PD. So I don't think
> > their usecase will be affected by the order reverse.
> > 
> 
> For Tegra it indeed shouldn't change anything.

Thanks Dmitry for confirming.
Ulf Hansson Aug. 16, 2022, 10:48 a.m. UTC | #9
[...]

> > >
> > > When the last active consumer suspends (in our case here, device A), ->power_off
> > > will be called first disabling the PD, then the ->set_performance will
> > > 'release' that lowest perf level the device A requested but will not
> > > call to FW since the PD is already disabled. This would make
> > > sure there are not two calls with two different levels to the FW.
> >
> > I understand what you want to achieve, but I think the ->power_off()
> > scenario may be a bit more tricky.
> >
> > For example, it would be perfectly fine for genpd to keep the PM
> > domain powered-on, even when the device A gets runtime suspended (a
> > genpd governor may prevent it). In other words, we may end up not
> > getting the ->power_off() callback invoked at all, even if there are
> > no runtime resumed devices in the PM domain.
> >
> > Could this lead to problems on the provider side, when trying to take
> > into account the different combinations of sequences?
>
> Correct me if I'm wrong, but even if a genpd governor would prevent
> the power_off from being called, if we do the reversal, since the power
> domain is not off, the provider would lower the performance state and
> that's it. The responsibility falls on the provider, but so does with
> the current order of the calls.
>
> So I don't see how this could lead to problems compared to the current
> order of the calls.

Alright, I agree, it shouldn't really matter then.

>
> Maybe I misunderstood your point, so please correct me if I'm getting
> this wrong.
>
> >
> > >
> > > Now, most of this depends on the provider's way of doing things.
> > > But in order to allow the provider to do what is described above, it
> > > needs to know about the perf level before it is asked to power on a PD.
> > > Same applies to powering off.
> > >
> > > > >
> > > > > I think it makes more sense for the ->set_performance in this case to act as a
> > > > > way to tell the provider that a specific device has yielded its voltage level
> > > > > request. That way the provider can drop the voltage to the minimum requested by
> > > > > the active consumers of that PD.
> > > >
> > > > The genpd provider can know if the PM domain is powered on or off,
> > > > when the ->set_performance_state() callback is invoked. If it's
> > > > powered off, it may then decide to "cache" the request for the
> > > > performance level request, until it gets powered on.
> > >
> > > But the ->set_performance is called only after ->power_on, so the PD
> > > will always be on when ->set_performance checks. And this is what my
> > > patch is trying to change actually.
> > >
> > > >
> > > > Although, I don't see how a genpd provider should be able to cache a
> > > > performance state request, when the PM domain is already powered on
> > > > (which is what you propose, if I understand correctly), that simply
> > > > doesn't work for the other scenarios.
> > >
> > > I explained this above. The provider will need to check if the PD is on
> > > and only write to FW if it is. Otherwise it will cache the value for
> > > when the power_on is called.
> >
> > As indicated above, it looks to me that you may need to check a
> > combination of things at the provider side. Is relying on whether
> > genpd is on/off to decide when to apply or cache a performance state,
> > really sufficient? I could certainly be wrong though.
>
> I don't think there is any change from this point of view, when compared
> to the current order. Even with the current order, the provider would
> either cache the performance state if the power domain is off, or would
> apply it if the power domain is on.

For the Qcom case, I don't think it's that simple on the genpd provider side.

With the changes you propose in the $subject patch, I think there are
two specific scenarios when the genpd can be powered off and when the
->set_performance_state() callback can get called. In these scenarios,
the provider can't just rely on whether the genpd is powered off or not
to make the best decision. See more below.

*) In genpd_runtime_resume() we make sure to *raise* the performance
state prior to powering on the PM domain, if the PM domain is powered
off, of course. In this way the ->set_performance_state() callback may
be invoked when the genpd is powered off, to *raise* the performance
state.

**) In genpd_runtime_suspend() we may power off the PM domain, before
invoking the ->set_performance_state() callback to *lower* the
performance state.

In other words, just checking whether the genpd is powered off, to
decide to cache/postpone the call to the FW to set a new performance
state, would mean that we may end up running in a higher performance
state than actually needed, right?

Perhaps if we would check if the performance state is lowered (or set
to zero) too, that should improve the situation, right?
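
Something along those lines could look like the sketch below on the
provider side, reusing the struct pd_model / fw_set_level() stand-ins
from the model earlier in the thread (again just an illustration under
that assumption, not a patch for any real driver):

static int pd_set_performance_state(struct pd_model *pd, unsigned int corner)
{
	bool lowering = corner < pd->corner;

	pd->corner = corner;

	/*
	 * Domain on, or off but lowering the level: tell the FW right
	 * away, so we never keep running higher than needed.
	 * Off and raising: cache only, ->power_on() will apply it.
	 */
	if (pd->enabled || lowering)
		fw_set_level("set_performance_state", corner);

	return 0;
}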

>
> >
> > Perhaps if you can provide a corresponding patch for the genpd
> > provider side too, that can help to convince me.
>
> The qcom-rpmhpd actually does that even now. On set_performance, it caches
> the performance state (corner) if the power domain is disabled, and it
> applies (aggregates the corner) if the power domain is enabled.

Okay, good!

As stated above, this sounds like it can be improved then, right?

Kind regards
Uffe
Abel Vesa Aug. 16, 2022, 12:23 p.m. UTC | #10
On 22-08-16 12:48:20, Ulf Hansson wrote:
> [...]
> 
> > > >
> > > > When the last active consumer suspends (in our case here, device A), ->power_off
> > > > will be called first disabling the PD, then the ->set_performance will
> > > > 'release' that lowest perf level the device A requested but will not
> > > > call to FW since the PD is already disabled. This would make
> > > > sure there are not two calls with two different levels to the FW.
> > >
> > > I understand what you want to achieve, but I think the ->power_off()
> > > scenario may be a bit more tricky.
> > >
> > > For example, it would be perfectly fine for genpd to keep the PM
> > > domain powered-on, even when the device A gets runtime suspended (a
> > > genpd governor may prevent it). In other words, we may end up not
> > > getting the ->power_off() callback invoked at all, even if there are
> > > no runtime resumed devices in the PM domain.
> > >
> > > Could this lead to problems on the provider side, when trying to take
> > > into account the different combinations of sequences?
> >
> > Correct me if I'm wrong, but even if a genpd governor would prevent
> > the power_off from being called, if we do the reversal, since the power
> > domain is not off, the provider would lower the performance state and
> > that's it. The responsibility falls on the provider, but so does with
> > the current order of the calls.
> >
> > So I don't see how this could lead to problems compared to the current
> > order of the calls.
> 
> Alright, I agree, it shouldn't really matter then.
> 
> >
> > Maybe I misunderstood your point, so please correct me if I'm getting
> > this wrong.
> >
> > >
> > > >
> > > > Now, most of this depends on the provider's way of doing things.
> > > > But in order to allow the provider to do what is described above, it
> > > > needs to know about the perf level before it is asked to power on a PD.
> > > > Same applies to powering off.
> > > >
> > > > > >
> > > > > > I think it makes more sense for the ->set_performance in this case to act as a
> > > > > > way to tell the provider that a specific device has yielded its voltage level
> > > > > > request. That way the provider can drop the voltage to the minimum requested by
> > > > > > the active consumers of that PD.
> > > > >
> > > > > The genpd provider can know if the PM domain is powered on or off,
> > > > > when the ->set_performance_state() callback is invoked. If it's
> > > > > powered off, it may then decide to "cache" the request for the
> > > > > performance level request, until it gets powered on.
> > > >
> > > > But the ->set_performance is called only after ->power_on, so the PD
> > > > will always be on when ->set_performance checks. And this is what my
> > > > patch is trying to change actually.
> > > >
> > > > >
> > > > > Although, I don't see how a genpd provider should be able to cache a
> > > > > performance state request, when the PM domain is already powered on
> > > > > (which is what you propose, if I understand correctly), that simply
> > > > > doesn't work for the other scenarios.
> > > >
> > > > I explained this above. The provider will need to check if the PD is on
> > > > and only write to FW if it is. Otherwise it will cache the value for
> > > > when the power_on is called.
> > >
> > > As indicated above, it looks to me that you may need to check a
> > > combination of things at the provider side. Is relying on whether
> > > genpd is on/off to decide when to apply or cache a performance state,
> > > really sufficient? I could certainly be wrong though.
> >
> > I don't think there is any change from this point of view, when compared
> > to the current order. Even with the current order, the provider would
> > either cache the performance state if the power domain is off, or would
> > apply it if the power domain is on.
> 
> For the Qcom case, I don't think it's that simple on the genpd provider side.
> 
> With the changes you propose in the $subject patch, I think there are
> two specific scenarios when the genpd can be powered off and when the
> ->set_performance_state() callback can get called. In these scenarios,
> the provider can't just rely on whether the genpd is powered off or not
> to make the best decision. See more below.
> 
> *) In genpd_runtime_resume() we make sure to *raise* the performance
> state prior to powering on the PM domain, if the PM domain is powered
> off, of course. In this way the ->set_performance_state() callback may
> be invoked when the genpd is powered off, to *raise* the performance
> state.

I'm not sure I understand the issue with this one. Please note that the
genpd will not decide whether to call the ->set_performance_state() or
not. The change I propose here is to _always_ call ->set_performance_state()
before calling ->power_on(). There is no condition there.

Since the provider will always get the call to ->set_performance_state(),
and based on the state of the genpd (powered on or not), it will either
call to FW or will cache the value for when the next ->power_on() call is
done.

> 
> **) In genpd_runtime_suspend() we may power off the PM domain, before
> invoking the ->set_performance_state() callback to *lower* the
> performance state.

Yeah, this is so that it would not undervolt the consumer.
In some cases, an undervolt, on some platforms, could actually result in a
consumer's HW FSM hang.

And it really doesn't make sense to drop the voltage right before powering
off the genpd. Instead, it makes sense to let the provider know that a
specific perf state is not voted for by a consumer anymore, only after the genpd
is powered off.

Now, that being said, there is the case where a consumer drops its vote
for a specific perf state, but since there is another consumer using
that genpd, it doesn't power down. So that could result in an undervolt
too, but if such an issue is known on a platform, the responsibility
to handle it would fall on the provider, and there are ways to do
that.

I hope I'm not complicating the problem we're trying to solve here even
more by adding more scenarios.

> 
> In other words, just checking whether the genpd is powered off, to
> decide to cache/postpone the call to the FW to set a new performance
> state, would mean that we may end up running in a higher performance
> state than actually needed, right?

Assuming I understood your comment right, in the first scenario you mentioned,
the point when a specific performance state actually takes effect is on
->power_on(), not on ->set_performance(). As you said, the genpd is off,
so between ->set_performance() and the ->power_on() calls, the
performance state is still 0 (genpd disabled).

> 
> Perhaps if we would check if the performance state is lowered (or set
> to zero) too, that should improve the situation, right?
> 

Unless I am wrong in the statements I made above, I don't see a need for such a
check.

Or maybe I missed your point.

> >
> > >
> > > Perhaps if you can provide a corresponding patch for the genpd
> > > provider side too, that can help to convince me.
> >
> > The qcom-rpmhpd actually does that even now. On set_performance, it caches
> > the performance state (corner) if the power domain is disabled, and it
> > applies (aggregates the corner) if the power domain is enabled.
> 
> Okay, good!
> 
> As stated above, this sounds like it can be improved then, right?
> 

I would certainly say so.

> Kind regards
> Uffe
Ulf Hansson Aug. 17, 2022, 11:04 a.m. UTC | #11
On Tue, 16 Aug 2022 at 14:23, Abel Vesa <abel.vesa@linaro.org> wrote:
>
> On 22-08-16 12:48:20, Ulf Hansson wrote:
> > [...]
> >
> > > > >
> > > > > When the last active consumer suspends (in our case here, device A), ->power_off
> > > > > will be called first disabling the PD, then the ->set_performance will
> > > > > 'release' that lowest perf level the device A requested but will not
> > > > > call to FW since the PD is already disabled. This would make
> > > > > sure there are not two calls with two different levels to the FW.
> > > >
> > > > I understand what you want to achieve, but I think the ->power_off()
> > > > scenario may be a bit more tricky.
> > > >
> > > > For example, it would be perfectly fine for genpd to keep the PM
> > > > domain powered-on, even when the device A gets runtime suspended (a
> > > > genpd governor may prevent it). In other words, we may end up not
> > > > getting the ->power_off() callback invoked at all, even if there are
> > > > no runtime resumed devices in the PM domain.
> > > >
> > > > Could this lead to problems on the provider side, when trying to take
> > > > into account the different combinations of sequences?
> > >
> > > Correct me if I'm wrong, but even if a genpd governor would prevent
> > > the power_off from being called, if we do the reversal, since the power
> > > domain is not off, the provider would lower the performance state and
> > > that's it. The responsibility falls on the provider, but so does with
> > > the current order of the calls.
> > >
> > > So I don't see how this could lead to problems compared to the current
> > > order of the calls.
> >
> > Alright, I agree, it shouldn't really matter then.
> >
> > >
> > > Maybe I misunderstood your point, so please correct me if I'm getting
> > > this wrong.
> > >
> > > >
> > > > >
> > > > > Now, most of this depends on the provider's way of doing things.
> > > > > But in order to allow the provider to do what is described above, it
> > > > > needs to know about the perf level before it is asked to power on a PD.
> > > > > Same applies to powering off.
> > > > >
> > > > > > >
> > > > > > > I think it makes more sense for the ->set_performance in this case to act as a
> > > > > > > way to tell the provider that a specific device has yielded its voltage level
> > > > > > > request. That way the provider can drop the voltage to the minimum requested by
> > > > > > > the active consumers of that PD.
> > > > > >
> > > > > > The genpd provider can know if the PM domain is powered on or off,
> > > > > > when the ->set_performance_state() callback is invoked. If it's
> > > > > > powered off, it may then decide to "cache" the request for the
> > > > > > performance level request, until it gets powered on.
> > > > >
> > > > > But the ->set_performance is called only after ->power_on, so the PD
> > > > > will always be on when ->set_performance checks. And this is what my
> > > > > patch is trying to change actually.
> > > > >
> > > > > >
> > > > > > Although, I don't see how a genpd provider should be able to cache a
> > > > > > performance state request, when the PM domain is already powered on
> > > > > > (which is what you propose, if I understand correctly), that simply
> > > > > > doesn't work for the other scenarios.
> > > > >
> > > > > I explained this above. The provider will need to check if the PD is on
> > > > > and only write to FW if it is. Otherwise it will cache the value for
> > > > > when the power_on is called.
> > > >
> > > > As indicated above, it looks to me that you may need to check a
> > > > combination of things at the provider side. Is relying on whether
> > > > genpd is on/off to decide when to apply or cache a performance state,
> > > > really sufficient? I could certainly be wrong though.
> > >
> > > I don't think there is any change from this point of view, when compared
> > > to the current order. Even with the current order, the provider would
> > > either cache the performance state if the power domain is off, or would
> > > apply it if the power domain is on.
> >
> > For the Qcom case, I don't think it's that simple on the genpd provider side.
> >
> > With the changes you propose in the $subject patch, I think there are
> > two specific scenarios when the genpd can be powered off and when the
> > ->set_performance_state() callback can get called. In these scenarios,
> > the provider can't just rely on whether the genpd is powered off or not
> > to make the best decision. See more below.
> >
> > *) In genpd_runtime_resume() we make sure to *raise* the performance
> > state prior to powering on the PM domain, if the PM domain is powered
> > off, of course. In this way the ->set_performance_state() callback may
> > be invoked when the genpd is powered off, to *raise* the performance
> > state.
>
> I'm not sure I understand the issue with this one. Please note that the
> genpd will not decide whether to call the ->set_performance_state() or
> not. The change I propose here is to _always_ call ->set_performance_state()
> before calling ->power_on(). There is no condition there.

The idea was to describe two scenarios in which
->set_performance_state() is invoked while the genpd is *powered off*.
In these two scenarios, it looks to me that we may want to make two
different decisions.

In the first scenario above, we want to cache/postpone the call to the
FW to set a new performance state, until the next time the PM domain
gets powered on.

In the second scenario below, we want to make the call to the FW right
away, to avoid running at a higher performance level than really needed.

>
> Since the provider will always get the call to ->set_performance_state(),
> and based on the state of the genpd (powered on or not), it will either
> call to FW or will cache the value for when the next ->power_on() call is
> done.

Yes, that makes sense to me too.

>
> >
> > **) In genpd_runtime_suspend() we may power off the PM domain, before
> > invoking the ->set_performance_state() callback to *lower* the
> > performance state.
>
> Yeah, this is so that it would not undervolt the consumer.
> In some cases, an undervolt, on some platforms, could actually result in a
> consumer's HW FSM hang.
>
> And it really doesn't make sense to drop the voltage right before powering
> off the genpd. Instead, it makes sense to let the provider know that a
> specific perf state is not voted for by a consumer anymore, only after the genpd
> is powered off.

I agree, it seems like a reasonable change to make to genpd.

>
> Now, that being said, there is the case where a consumer drops its vote
> for a specific perf state, but since there is another consumer using
> that genpd, it doesn't power down. So that could result in an undervolt
> too, but if such an issue is known on a platform, the responsibility
> to handle it would fall on the provider, and there are ways to do
> that.

I believe I understand this use case too, but please correct me if I am wrong.

The "last" consumer device that keeps the PM domain powered on, may
need some basic performance level to be functional (but without having
to put a vote on a performance state from genpd point of view).

That's the main reason why we need to make use of the ->power_on|off()
callbacks in combination with the ->set_performance_state() callback
for the Qcom platforms. Otherwise, we could have settled with just
using the ->set_performance_state() callback, like in the Tegra case.
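
Purely as an illustration of that combination (again reusing the made-up
stand-ins from the earlier sketch, plus a hypothetical default_corner
field that is not part of any real driver), ->power_on() could then
apply whichever is higher of that basic level and the consumers' cached
vote:

static int pd_power_on(struct pd_model *pd)
{
	/*
	 * default_corner: hypothetical field standing in for the "basic
	 * functionality" level the last consumer may need even without
	 * an explicit performance-state vote.
	 */
	unsigned int level = pd->corner > pd->default_corner ?
			     pd->corner : pd->default_corner;

	pd->enabled = true;
	fw_set_level("power_on", level);

	return 0;
}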

>
> I hope I'm not complicating the problem we're trying to solve here even
> more by adding more scenarios.

No worries, it's fine.

>
> >
> > In other words, just checking whether the genpd is powered off, to
> > decide to cache/postpone the call to the FW to set a new performance
> > state, would mean that we may end up running in a higher performance
> > state than actually needed, right?
>
> Assuming I understood your comment right, in the first scenario you mentioned,
> the point when a specific performance state actually takes effect is on
> ->power_on(), not on ->set_performance(). As you said, the genpd is off,
> so between ->set_performance() and the ->power_on() calls, the
> performance state is still 0 (genpd disabled).
>
> >
> > Perhaps if we would check if the performance state is lowered (or set
> > to zero) too, that should improve the situation, right?
> >
>
> Unless I am wrong in the statements I made above, I don't see a need for such a
> check.
>
> Or maybe I missed your point.

Yes, unfortunately I think you did.

On the other hand, it doesn't really matter much. You have convinced
me that the $subject patch makes sense. May I suggest you make a new
submission, skipping the RFC tag, and try to clarify the problem for Qcom
specifically a bit more in the commit message. Then I will happily give
it my blessing.

[...]

Kind regards
Uffe
diff mbox series

Patch

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 5a2e0232862e..38647c304b73 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -939,8 +939,8 @@  static int genpd_runtime_suspend(struct device *dev)
 		return 0;

 	genpd_lock(genpd);
-	gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 	genpd_power_off(genpd, true, 0);
+	gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 	genpd_unlock(genpd);

 	return 0;
@@ -978,9 +978,8 @@  static int genpd_runtime_resume(struct device *dev)
 		goto out;

 	genpd_lock(genpd);
+	genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
 	ret = genpd_power_on(genpd, 0);
-	if (!ret)
-		genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
 	genpd_unlock(genpd);

 	if (ret)
@@ -1018,8 +1017,8 @@  static int genpd_runtime_resume(struct device *dev)
 err_poweroff:
 	if (!pm_runtime_is_irq_safe(dev) || genpd_is_irq_safe(genpd)) {
 		genpd_lock(genpd);
-		gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 		genpd_power_off(genpd, true, 0);
+		gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 		genpd_unlock(genpd);
 	}

@@ -2747,17 +2746,6 @@  static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
 	dev->pm_domain->detach = genpd_dev_pm_detach;
 	dev->pm_domain->sync = genpd_dev_pm_sync;

-	if (power_on) {
-		genpd_lock(pd);
-		ret = genpd_power_on(pd, 0);
-		genpd_unlock(pd);
-	}
-
-	if (ret) {
-		genpd_remove_device(pd, dev);
-		return -EPROBE_DEFER;
-	}
-
 	/* Set the default performance state */
 	pstate = of_get_required_opp_performance_state(dev->of_node, index);
 	if (pstate < 0 && pstate != -ENODEV && pstate != -EOPNOTSUPP) {
@@ -2769,6 +2757,18 @@  static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
 			goto err;
 		dev_gpd_data(dev)->default_pstate = pstate;
 	}
+
+	if (power_on) {
+		genpd_lock(pd);
+		ret = genpd_power_on(pd, 0);
+		genpd_unlock(pd);
+	}
+
+	if (ret) {
+		genpd_remove_device(pd, dev);
+		return -EPROBE_DEFER;
+	}
+
 	return 1;

 err: