diff mbox

[0/12] PM / sleep: Driver flags for system suspend/resume

Message ID 1591167.zlNa3zLfmM@aspire.rjw.lan (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Rafael J. Wysocki Oct. 18, 2017, 12:39 a.m. UTC
On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote:

[cut]

> >
> >> deploying this and from a middle layer point of view, all the trivial
> >> cases supports this.
> >
> > These functions are wrong, however, because they attempt to reuse the
> > whole callback *path* instead of just reusing driver callbacks.  The
> > *only* reason why it all "works" is because there are no middle layer
> > callbacks involved in that now.
> >
> > If you changed them to reuse driver callbacks only today, nothing would break
> > AFAICS.
> 
> Yes, it would.
> 
> First, for example, the amba bus is responsible for the amba bus
> clock, but relies on drivers to gate/ungate it during system sleep. In
> case the amba drivers don't use the pm_runtime_force_suspend|resume(),
> it will explicitly have to start manage the clock during system sleep
> themselves. Leading to open coding.

Well, I suspected that something like this would surface. ;-)

Are there any major reasons why the appended patch (obviously untested) won't
work, then?

> Second, it will introduce a regression in behavior for all users of
> pm_runtime_force_suspend|resume(), especially during system resume as
> the driver may then end up resuming the device even in case it isn't
> needed.

How so?

I'm talking about a change like in the appended patch, where
pm_runtime_force_* simply invoke driver callbacks directly.  What is
skipped there is middle-layer stuff which is empty anyway in all cases
except for AMBA (if that's all what is lurking below the surface), so
I don't quite see how the failure will happen.

> I believe I have explained why, also several times by now -
> and that's also how far you could take the i2c designware driver at
> this point.
> 
> That said, I assume the second part may be addressed in this series,
> if these drivers convert to use the "driver PM flags", right?
> 
> However, what about the first case? Is some open coding needed or your
> think the amba driver can instruct the amba bus via the "driver PM
> flags"?

With the appended patch applied things should work for AMBA like for
any other bus type implementing PM, so I don't see why not.

> >
> >> Like the spi bus, i2c bus, amba bus, platform
> >> bus, genpd, etc. There are no changes needed to continue to support
> >> this option, if you see what I mean.
> >
> > For the time being, nothing changes in that respect, but eventually I'd
> > prefer the pm_runtime_force_* things to go away, frankly.
> 
> Okay, thanks for that clear statement!
> 
> >
> >> So, when you say that re-using runtime PM callbacks for system-wide PM
> >> isn't going to happen, can you please elaborate what you mean?
> >
> > I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but
> > reusing *middle-layer* runtime PM callbacks for system-wide PM.  That is the
> > bogus part.
> 
> I think we have discussed this several times, but the arguments you
> have put forward, explaining *why* haven't yet convinced me.

Well, sorry about that.  I would like to be able to explain my point to you so
that you understand my perspective, but if that's not working, that's not a
sufficient reason for me to give up.

I'm just refusing to maintain code that I don't agree with in the long run.

> In principle what you have been saying is that it's a "layering
> violation" to use pm_runtime_force_suspend|resume() from driver's
> system sleep callbacks, but on the other hand you think using
> pm_runtime_get*  and friends is okay!?

Not unconditionally, which would be fair to mention.

Only if it is called in ->prepare or as the first thing in a ->suspend
callback.  Later than that is broken too in principle.

> That makes little sense to me, because it's the same "layering
> violation" that is done for both cases.

The "layering violation" is all about things possibly occurring in a
wrong order.  For example, say a middle-layer ->runtime_suspend is
called via pm_runtime_force_suspend() which in turn is called from
middle-layer ->suspend_late as a driver callback.  If the ->runtime_suspend
does anything significat to the device, then executing the remaining part of
->suspend_late will almost cetainly break things, more or less.

That is not a concern with a middle-layer ->runtime_resume running
*before* a middle-layer ->suspend (or any subsequent callbacks) does
anything significant to the device.

Is there anything in the above which is not clear enough?

> Moreover, you have been explaining that re-using runtime PM callbacks
> for PCI doesn't work. Then my question is, why should a limitation of
> the PCI subsystem put constraints on the behavior for all other
> subsystems/middle-layers?

Because they aren't just PCI subsystem limitations only.  The need to handle
wakeup setup differently for runtime PM and system sleep is not PCI-specific.
The need to handle suspend and hibernation differently isn't too.

Those things may be more obvious in PCI, but they are generic rather than
special.

Also, quite so often other middle layers interact with PCI directly or
indirectly (eg. a platform device may be a child or a consumer of a PCI
device) and some optimizations need to take that into account (eg. parents
generally need to be accessible when their childres are resumed and so on).

Moreover, the majority of the "other subsystems/middle-layers" you've talked
about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
so question is how representative they really are.

> >
> > Quoting again:
> >
> > "If you are a middle layer, your role is basically to do PM for a certain
> > group of devices.  Thus you cannot really do the same in ->suspend or
> > ->suspend_early and in ->runtime_suspend (because the former generally need to
> > take device_may_wakeup() into account and the latter doesn't) and you shouldn't
> > really do the same in ->suspend and ->freeze (becuase the latter shouldn't
> > change the device's power state) and so on."
> >
> > I have said for multiple times that re-using *driver* callbacks actually makes
> > sense and the series is for doing that easier in general among other things.
> >
> >> I assume you mean that the PM core won't be involved to support this,
> >> but is that it?
> >>
> >> Do you also mean that *all* users of pm_runtime_force_suspend|resume()
> >> must convert to this new thing, using "driver PM flags", so in the end
> >> you want to remove pm_runtime_force_suspend|resume()?
> >>  - Then if so, you must of course consider all cases for how
> >> pm_runtime_force_suspend|resume() are being deployed currently, else
> >> existing users can't convert to the "driver PM flags" thing. Have you
> >> done that in this series?
> >
> > Let me turn this around.
> >
> > The majority of cases in which pm_runtime_force_* are used *should* be
> > addressable using the flags introduced here.  Some case in which
> > pm_runtime_force_* cannot be used should be addressable by these flags
> > as well.
> 
> That's sounds really great!
> 
> >
> > There may be some cases in which pm_runtime_force_* are used that may
> > require something more, but I'm not going to worry about that right now.
> 
> This approach concerns me, because if we in the end realizes that
> pm_runtime_force_suspend|resume() will be too hard to get rid of, then
> this series just add yet another generic way of trying to optimize the
> system sleep path for runtime PM enabled devices.

Which also works for PCI and the ACPI PM domain and that's sort of valuable
anyway, isn't it?

For the record, I don't think it will be too hard to get rid of
pm_runtime_force_suspend|resume(), although that may take quite some time.

> So then we would end up having to support the "direct_complete" path,
> the "driver PM flags" and cases where
> pm_runtime_force_suspend|resume() is used. No, that just isn't good
> enough to me. That will just lead to similar scenarios as we had in
> the i2c designware driver.

Frankly, this sounds like staging for indefinite blocking of changes in
this area on non-technical grounds.  I hope that it isn't the case ...

> If we decide to go with these new "driver PM flags", I want to make
> sure, as long as possible, that we can remove both the
> "direct_complete" path support from the PM core as well as removing
> the pm_runtime_force_suspend|resume() helpers.

We'll see.

> >
> > I'll take care of that when I'll be removing pm_runtime_force_*, which I'm
> > not doing here.
> 
> Of course I am fine with that we postpone doing the actual converting
> of drivers etc from this series, although as stated above, let's sure
> we *can* do it by using the "driver PM flags".

There clearly are use cases that benefit from this series and I don't see
any alternatives covering them, including both direct-complete and the
pm_runtime_force* approach, so I'm not buying this "let's make sure
it can cover all possible use cases that exist" argumentation.

Thanks,
Rafael


---
 drivers/amba/bus.c           |   79 ++++++++++++++++++++++++++++---------------
 drivers/base/power/runtime.c |   10 +++--
 2 files changed, 58 insertions(+), 31 deletions(-)

Comments

Rafael J. Wysocki Oct. 18, 2017, 10:24 a.m. UTC | #1
On Wednesday, October 18, 2017 2:39:24 AM CEST Rafael J. Wysocki wrote:
> On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote:
> 
> [cut]
> 
> > >
> > >> deploying this and from a middle layer point of view, all the trivial
> > >> cases supports this.
> > >
> > > These functions are wrong, however, because they attempt to reuse the
> > > whole callback *path* instead of just reusing driver callbacks.  The
> > > *only* reason why it all "works" is because there are no middle layer
> > > callbacks involved in that now.
> > >
> > > If you changed them to reuse driver callbacks only today, nothing would break
> > > AFAICS.
> > 
> > Yes, it would.
> > 
> > First, for example, the amba bus is responsible for the amba bus
> > clock, but relies on drivers to gate/ungate it during system sleep. In
> > case the amba drivers don't use the pm_runtime_force_suspend|resume(),
> > it will explicitly have to start manage the clock during system sleep
> > themselves. Leading to open coding.
> 
> Well, I suspected that something like this would surface. ;-)
> 
> Are there any major reasons why the appended patch (obviously untested) won't
> work, then?

OK, there is a reason, which is the optimizations bundled into
pm_runtime_force_*, because (a) the device may be left in runtime suspend
by them (in which case amba_pm_suspend_early() in my patch should not run)
and (b) pm_runtime_force_resume() may decide to leave it suspended (in which
case amba_pm_suspend_late() in my patch should not run).

[BTW, the "leave the device suspended" optimization in pm_runtime_force_*
is potentially problematic too, because it requires the children to do
the right thing, which effectively means that their drivers need to use
pm_runtime_force_* too, but what if they don't want to reuse their
runtime PM callbacks for system-wide PM?]

Honestly, I don't like the way this is designed.  IMO, it would be better
to do the optimizations and all in the bus type middle-layer code instead
of expecting drivers to use pm_runtime_force_* as their system-wide PM
callbacks (and that expectation should at least be documented, which I'm
not sure is the case now).  But whatever.

It all should work the way it does now without pm_runtime_force_* if (a) the
bus type's PM callbacks are changed like in the last patch and the drivers
(b) point their system suspend callbacks to the runtime PM callback routines
and (c) set DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED for the
devices (if they need to do the PM in ->suspend and ->resume, they may set
DPM_FLAG_AVOID_RPM too).

And if you see a reason why that won't work, please let me know.

Thanks,
Rafael
Ulf Hansson Oct. 18, 2017, 11:57 a.m. UTC | #2
On 18 October 2017 at 02:39, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote:
>
> [cut]
>
>> >
>> >> deploying this and from a middle layer point of view, all the trivial
>> >> cases supports this.
>> >
>> > These functions are wrong, however, because they attempt to reuse the
>> > whole callback *path* instead of just reusing driver callbacks.  The
>> > *only* reason why it all "works" is because there are no middle layer
>> > callbacks involved in that now.
>> >
>> > If you changed them to reuse driver callbacks only today, nothing would break
>> > AFAICS.
>>
>> Yes, it would.
>>
>> First, for example, the amba bus is responsible for the amba bus
>> clock, but relies on drivers to gate/ungate it during system sleep. In
>> case the amba drivers don't use the pm_runtime_force_suspend|resume(),
>> it will explicitly have to start manage the clock during system sleep
>> themselves. Leading to open coding.
>
> Well, I suspected that something like this would surface. ;-)
>
> Are there any major reasons why the appended patch (obviously untested) won't
> work, then?

Let me comment on the code, instead of here...

...just realized your second reply, so let me reply to that instead
regarding the patch.

>
>> Second, it will introduce a regression in behavior for all users of
>> pm_runtime_force_suspend|resume(), especially during system resume as
>> the driver may then end up resuming the device even in case it isn't
>> needed.
>
> How so?
>
> I'm talking about a change like in the appended patch, where
> pm_runtime_force_* simply invoke driver callbacks directly.  What is
> skipped there is middle-layer stuff which is empty anyway in all cases
> except for AMBA (if that's all what is lurking below the surface), so
> I don't quite see how the failure will happen.

I am afraid changing pm_runtime_force* to only call driver callbacks
may become fragile. Let me elaborate.

The reason why pm_runtime_force_* needs to respects the hierarchy of
the RPM callbacks, is because otherwise it can't safely update the
runtime PM status of the device. And updating the runtime PM status of
the device is required to manage the optimized behavior during system
resume (avoiding to unnecessary resume devices).

Besides the AMBA case, I also realized that we are dealing with PM
clocks in the genpd case. For this, genpd relies on the that runtime
PM status of the device properly reflects the state of the HW, during
system-wide PM.

In other words, if the driver would change the runtime PM status of
the device, without respecting the hierarchy of the runtime PM
callbacks, it would lead to that genpd starts taking wrong decisions
while managing the PM clocks during system-wide PM. So in case you
intend to change pm_runtime_force_* this needs to be addressed too.

>
>> I believe I have explained why, also several times by now -
>> and that's also how far you could take the i2c designware driver at
>> this point.
>>
>> That said, I assume the second part may be addressed in this series,
>> if these drivers convert to use the "driver PM flags", right?
>>
>> However, what about the first case? Is some open coding needed or your
>> think the amba driver can instruct the amba bus via the "driver PM
>> flags"?
>
> With the appended patch applied things should work for AMBA like for
> any other bus type implementing PM, so I don't see why not.
>
>> >
>> >> Like the spi bus, i2c bus, amba bus, platform
>> >> bus, genpd, etc. There are no changes needed to continue to support
>> >> this option, if you see what I mean.
>> >
>> > For the time being, nothing changes in that respect, but eventually I'd
>> > prefer the pm_runtime_force_* things to go away, frankly.
>>
>> Okay, thanks for that clear statement!
>>
>> >
>> >> So, when you say that re-using runtime PM callbacks for system-wide PM
>> >> isn't going to happen, can you please elaborate what you mean?
>> >
>> > I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but
>> > reusing *middle-layer* runtime PM callbacks for system-wide PM.  That is the
>> > bogus part.
>>
>> I think we have discussed this several times, but the arguments you
>> have put forward, explaining *why* haven't yet convinced me.
>
> Well, sorry about that.  I would like to be able to explain my point to you so
> that you understand my perspective, but if that's not working, that's not a
> sufficient reason for me to give up.
>
> I'm just refusing to maintain code that I don't agree with in the long run.
>
>> In principle what you have been saying is that it's a "layering
>> violation" to use pm_runtime_force_suspend|resume() from driver's
>> system sleep callbacks, but on the other hand you think using
>> pm_runtime_get*  and friends is okay!?
>
> Not unconditionally, which would be fair to mention.
>
> Only if it is called in ->prepare or as the first thing in a ->suspend
> callback.  Later than that is broken too in principle.
>
>> That makes little sense to me, because it's the same "layering
>> violation" that is done for both cases.
>
> The "layering violation" is all about things possibly occurring in a
> wrong order.  For example, say a middle-layer ->runtime_suspend is
> called via pm_runtime_force_suspend() which in turn is called from
> middle-layer ->suspend_late as a driver callback.  If the ->runtime_suspend
> does anything significat to the device, then executing the remaining part of
> ->suspend_late will almost cetainly break things, more or less.
>
> That is not a concern with a middle-layer ->runtime_resume running
> *before* a middle-layer ->suspend (or any subsequent callbacks) does
> anything significant to the device.
>
> Is there anything in the above which is not clear enough?
>
>> Moreover, you have been explaining that re-using runtime PM callbacks
>> for PCI doesn't work. Then my question is, why should a limitation of
>> the PCI subsystem put constraints on the behavior for all other
>> subsystems/middle-layers?
>
> Because they aren't just PCI subsystem limitations only.  The need to handle
> wakeup setup differently for runtime PM and system sleep is not PCI-specific.
> The need to handle suspend and hibernation differently isn't too.
>
> Those things may be more obvious in PCI, but they are generic rather than
> special.

Absolutely agree about the different wake-up settings. However, these
issues can be addressed also when using pm_runtime_force_*, at least
in general, but then not for PCI.

Regarding hibernation, honestly that's not really my area of
expertise. Although, I assume the middle-layer and driver can treat
that as a separate case, so if it's not suitable to use
pm_runtime_force* for that case, then they shouldn't do it.

>
> Also, quite so often other middle layers interact with PCI directly or
> indirectly (eg. a platform device may be a child or a consumer of a PCI
> device) and some optimizations need to take that into account (eg. parents
> generally need to be accessible when their childres are resumed and so on).

A device's parent becomes informed when changing the runtime PM status
of the device via pm_runtime_force_suspend|resume(), as those calls
pm_runtime_set_suspended|active(). In case that isn't that sufficient,
what else is needed? Perhaps you can point me to an example so I can
understand better?

For a PCI consumer device those will of course have to play by the rules of PCI.

>
> Moreover, the majority of the "other subsystems/middle-layers" you've talked
> about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
> so question is how representative they really are.

That's the point. We know pm_runtime_force_* works nicely for the
trivial middle-layer cases. For the more complex cases, we need
something additional/different.

>
>> >
>> > Quoting again:
>> >
>> > "If you are a middle layer, your role is basically to do PM for a certain
>> > group of devices.  Thus you cannot really do the same in ->suspend or
>> > ->suspend_early and in ->runtime_suspend (because the former generally need to
>> > take device_may_wakeup() into account and the latter doesn't) and you shouldn't
>> > really do the same in ->suspend and ->freeze (becuase the latter shouldn't
>> > change the device's power state) and so on."
>> >
>> > I have said for multiple times that re-using *driver* callbacks actually makes
>> > sense and the series is for doing that easier in general among other things.
>> >
>> >> I assume you mean that the PM core won't be involved to support this,
>> >> but is that it?
>> >>
>> >> Do you also mean that *all* users of pm_runtime_force_suspend|resume()
>> >> must convert to this new thing, using "driver PM flags", so in the end
>> >> you want to remove pm_runtime_force_suspend|resume()?
>> >>  - Then if so, you must of course consider all cases for how
>> >> pm_runtime_force_suspend|resume() are being deployed currently, else
>> >> existing users can't convert to the "driver PM flags" thing. Have you
>> >> done that in this series?
>> >
>> > Let me turn this around.
>> >
>> > The majority of cases in which pm_runtime_force_* are used *should* be
>> > addressable using the flags introduced here.  Some case in which
>> > pm_runtime_force_* cannot be used should be addressable by these flags
>> > as well.
>>
>> That's sounds really great!
>>
>> >
>> > There may be some cases in which pm_runtime_force_* are used that may
>> > require something more, but I'm not going to worry about that right now.
>>
>> This approach concerns me, because if we in the end realizes that
>> pm_runtime_force_suspend|resume() will be too hard to get rid of, then
>> this series just add yet another generic way of trying to optimize the
>> system sleep path for runtime PM enabled devices.
>
> Which also works for PCI and the ACPI PM domain and that's sort of valuable
> anyway, isn't it?

Indeed it is! I am definitely open to improve the situation for ACPI and PCI.

Seems like I may have given the wrong impression about that.

>
> For the record, I don't think it will be too hard to get rid of
> pm_runtime_force_suspend|resume(), although that may take quite some time.
>
>> So then we would end up having to support the "direct_complete" path,
>> the "driver PM flags" and cases where
>> pm_runtime_force_suspend|resume() is used. No, that just isn't good
>> enough to me. That will just lead to similar scenarios as we had in
>> the i2c designware driver.
>
> Frankly, this sounds like staging for indefinite blocking of changes in
> this area on non-technical grounds.  I hope that it isn't the case ...
>
>> If we decide to go with these new "driver PM flags", I want to make
>> sure, as long as possible, that we can remove both the
>> "direct_complete" path support from the PM core as well as removing
>> the pm_runtime_force_suspend|resume() helpers.
>
> We'll see.
>
>> >
>> > I'll take care of that when I'll be removing pm_runtime_force_*, which I'm
>> > not doing here.
>>
>> Of course I am fine with that we postpone doing the actual converting
>> of drivers etc from this series, although as stated above, let's sure
>> we *can* do it by using the "driver PM flags".
>
> There clearly are use cases that benefit from this series and I don't see
> any alternatives covering them, including both direct-complete and the
> pm_runtime_force* approach, so I'm not buying this "let's make sure
> it can cover all possible use cases that exist" argumentation.

Alright, let me re-phrase my take on this.

Because you stated that you plan to remove pm_runtime_force_*
eventually, then I think you need to put up some valid reasons of why
(I consider that done), but more importantly, you need to offer an
alternative solution that can replace it. Else such that statement can
easily become wrong interpreted. My point is, the "driver PM flags" do
*not* offers a full alternative solution, it may do in the future or
it may not.

So, to conclude from my side, I don't have any major objections to
going forward with the "driver PM flags", especially with the goal of
improving the situation for PCI and ACPI. Down the road, we can then
*try* to make it replace pm_runtime_force_* and the "direct_complete
path".

Hopefully that makes it more clear.

[...]

Kind regards
Uffe
Ulf Hansson Oct. 18, 2017, 12:34 p.m. UTC | #3
[...]

>> Are there any major reasons why the appended patch (obviously untested) won't
>> work, then?
>
> OK, there is a reason, which is the optimizations bundled into
> pm_runtime_force_*, because (a) the device may be left in runtime suspend
> by them (in which case amba_pm_suspend_early() in my patch should not run)
> and (b) pm_runtime_force_resume() may decide to leave it suspended (in which
> case amba_pm_suspend_late() in my patch should not run).

Exactly.

>
> [BTW, the "leave the device suspended" optimization in pm_runtime_force_*
> is potentially problematic too, because it requires the children to do
> the right thing, which effectively means that their drivers need to use
> pm_runtime_force_* too, but what if they don't want to reuse their
> runtime PM callbacks for system-wide PM?]

Deployment of pm_runtime_force_suspend() should generally be done for
children devices first.

If some reason that isn't the case, it's expected that the call to
pm_runtime_set_suspended() invoked from pm_runtime_force_suspend(),
for the parent, should fail and thus abort system suspend.

>
> Honestly, I don't like the way this is designed.  IMO, it would be better
> to do the optimizations and all in the bus type middle-layer code instead
> of expecting drivers to use pm_runtime_force_* as their system-wide PM
> callbacks (and that expectation should at least be documented, which I'm
> not sure is the case now).  But whatever.
>
> It all should work the way it does now without pm_runtime_force_* if (a) the
> bus type's PM callbacks are changed like in the last patch and the drivers
> (b) point their system suspend callbacks to the runtime PM callback routines
> and (c) set DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED for the
> devices (if they need to do the PM in ->suspend and ->resume, they may set
> DPM_FLAG_AVOID_RPM too).
>
> And if you see a reason why that won't work, please let me know.

I will have look and try out the series by using my local "runtime PM
test driver".

I get back to you with an update on this.

Kind regards
Uffe
Rafael J. Wysocki Oct. 18, 2017, 1 p.m. UTC | #4
On Wednesday, October 18, 2017 1:57:52 PM CEST Ulf Hansson wrote:
> On 18 October 2017 at 02:39, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote:
> >
> > [cut]
> >
> >> >
> >> >> deploying this and from a middle layer point of view, all the trivial
> >> >> cases supports this.
> >> >
> >> > These functions are wrong, however, because they attempt to reuse the
> >> > whole callback *path* instead of just reusing driver callbacks.  The
> >> > *only* reason why it all "works" is because there are no middle layer
> >> > callbacks involved in that now.
> >> >
> >> > If you changed them to reuse driver callbacks only today, nothing would break
> >> > AFAICS.
> >>
> >> Yes, it would.
> >>
> >> First, for example, the amba bus is responsible for the amba bus
> >> clock, but relies on drivers to gate/ungate it during system sleep. In
> >> case the amba drivers don't use the pm_runtime_force_suspend|resume(),
> >> it will explicitly have to start manage the clock during system sleep
> >> themselves. Leading to open coding.
> >
> > Well, I suspected that something like this would surface. ;-)
> >
> > Are there any major reasons why the appended patch (obviously untested) won't
> > work, then?
> 
> Let me comment on the code, instead of here...
> 
> ...just realized your second reply, so let me reply to that instead
> regarding the patch.
> 
> >
> >> Second, it will introduce a regression in behavior for all users of
> >> pm_runtime_force_suspend|resume(), especially during system resume as
> >> the driver may then end up resuming the device even in case it isn't
> >> needed.
> >
> > How so?
> >
> > I'm talking about a change like in the appended patch, where
> > pm_runtime_force_* simply invoke driver callbacks directly.  What is
> > skipped there is middle-layer stuff which is empty anyway in all cases
> > except for AMBA (if that's all what is lurking below the surface), so
> > I don't quite see how the failure will happen.
> 
> I am afraid changing pm_runtime_force* to only call driver callbacks
> may become fragile. Let me elaborate.
> 
> The reason why pm_runtime_force_* needs to respects the hierarchy of
> the RPM callbacks, is because otherwise it can't safely update the
> runtime PM status of the device.

I'm not sure I follow this requirement.  Why is that so?

> And updating the runtime PM status of
> the device is required to manage the optimized behavior during system
> resume (avoiding to unnecessary resume devices).

Well, OK.  The runtime PM status of the device after system resume should
better reflect its physical state.

[The physical state of the device may not be under the control of the
kernel in some cases, like in S3 resume on some systems that reset
devices in the firmware and so on, but let's set that aside.]

However, for the runtime PM status of the device may still reflect its state
if, say, a ->resume_early of the middle layer is called during resume along
with a driver's ->runtime_resume.  That still can produce the right state
of the device and all depends on the middle layer.

On the other hand, as I said before, using a middle-layer ->runtime_suspend
during a system sleep transition may be outright incorrect, say if device
wakeup settings need to be adjusted by the middle layer (which is the
case for some of them).

Of course, if the middle layer expects the driver to point its
system-wide PM callbacks to pm_runtime_force_*, then that's how it goes,
but the drivers working with this particular middle layer generally
won't work with other middle layers and may interact incorrectly
with parents and/or children using the other middle layers.

I guess the problem boils down to having a common set of expectations
on the driver side and on the middle layer side allowing different
combinations of these to work together.

> Besides the AMBA case, I also realized that we are dealing with PM
> clocks in the genpd case. For this, genpd relies on the that runtime
> PM status of the device properly reflects the state of the HW, during
> system-wide PM.
> 
> In other words, if the driver would change the runtime PM status of
> the device, without respecting the hierarchy of the runtime PM
> callbacks, it would lead to that genpd starts taking wrong decisions
> while managing the PM clocks during system-wide PM. So in case you
> intend to change pm_runtime_force_* this needs to be addressed too.

I've just looked at the genpd code and quite frankly I'm not sure how this
works, but I'll figure this out. :-)

> >
> >> I believe I have explained why, also several times by now -
> >> and that's also how far you could take the i2c designware driver at
> >> this point.
> >>
> >> That said, I assume the second part may be addressed in this series,
> >> if these drivers convert to use the "driver PM flags", right?
> >>
> >> However, what about the first case? Is some open coding needed or your
> >> think the amba driver can instruct the amba bus via the "driver PM
> >> flags"?
> >
> > With the appended patch applied things should work for AMBA like for
> > any other bus type implementing PM, so I don't see why not.
> >
> >> >
> >> >> Like the spi bus, i2c bus, amba bus, platform
> >> >> bus, genpd, etc. There are no changes needed to continue to support
> >> >> this option, if you see what I mean.
> >> >
> >> > For the time being, nothing changes in that respect, but eventually I'd
> >> > prefer the pm_runtime_force_* things to go away, frankly.
> >>
> >> Okay, thanks for that clear statement!
> >>
> >> >
> >> >> So, when you say that re-using runtime PM callbacks for system-wide PM
> >> >> isn't going to happen, can you please elaborate what you mean?
> >> >
> >> > I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but
> >> > reusing *middle-layer* runtime PM callbacks for system-wide PM.  That is the
> >> > bogus part.
> >>
> >> I think we have discussed this several times, but the arguments you
> >> have put forward, explaining *why* haven't yet convinced me.
> >
> > Well, sorry about that.  I would like to be able to explain my point to you so
> > that you understand my perspective, but if that's not working, that's not a
> > sufficient reason for me to give up.
> >
> > I'm just refusing to maintain code that I don't agree with in the long run.
> >
> >> In principle what you have been saying is that it's a "layering
> >> violation" to use pm_runtime_force_suspend|resume() from driver's
> >> system sleep callbacks, but on the other hand you think using
> >> pm_runtime_get*  and friends is okay!?
> >
> > Not unconditionally, which would be fair to mention.
> >
> > Only if it is called in ->prepare or as the first thing in a ->suspend
> > callback.  Later than that is broken too in principle.
> >
> >> That makes little sense to me, because it's the same "layering
> >> violation" that is done for both cases.
> >
> > The "layering violation" is all about things possibly occurring in a
> > wrong order.  For example, say a middle-layer ->runtime_suspend is
> > called via pm_runtime_force_suspend() which in turn is called from
> > middle-layer ->suspend_late as a driver callback.  If the ->runtime_suspend
> > does anything significat to the device, then executing the remaining part of
> > ->suspend_late will almost cetainly break things, more or less.
> >
> > That is not a concern with a middle-layer ->runtime_resume running
> > *before* a middle-layer ->suspend (or any subsequent callbacks) does
> > anything significant to the device.
> >
> > Is there anything in the above which is not clear enough?
> >
> >> Moreover, you have been explaining that re-using runtime PM callbacks
> >> for PCI doesn't work. Then my question is, why should a limitation of
> >> the PCI subsystem put constraints on the behavior for all other
> >> subsystems/middle-layers?
> >
> > Because they aren't just PCI subsystem limitations only.  The need to handle
> > wakeup setup differently for runtime PM and system sleep is not PCI-specific.
> > The need to handle suspend and hibernation differently isn't too.
> >
> > Those things may be more obvious in PCI, but they are generic rather than
> > special.
> 
> Absolutely agree about the different wake-up settings. However, these
> issues can be addressed also when using pm_runtime_force_*, at least
> in general, but then not for PCI.

Well, not for the ACPI PM domain too.

In general, not if the wakeup settings are adjusted by the middle layer.

> Regarding hibernation, honestly that's not really my area of
> expertise. Although, I assume the middle-layer and driver can treat
> that as a separate case, so if it's not suitable to use
> pm_runtime_force* for that case, then they shouldn't do it.

Well, agreed.

In some simple cases, though, driver callbacks can be reused for hibernation
too, so it would be good to have a common way to do that too, IMO.

> >
> > Also, quite so often other middle layers interact with PCI directly or
> > indirectly (eg. a platform device may be a child or a consumer of a PCI
> > device) and some optimizations need to take that into account (eg. parents
> > generally need to be accessible when their childres are resumed and so on).
> 
> A device's parent becomes informed when changing the runtime PM status
> of the device via pm_runtime_force_suspend|resume(), as those calls
> pm_runtime_set_suspended|active().

This requires the parent driver or middle layer to look at the reference
counter and understand it the same way as pm_runtime_force_*.

> In case that isn't that sufficient, what else is needed? Perhaps you can
> point me to an example so I can understand better?

Say you want to leave the parent suspended after system resume, but the
child drivers use pm_runtime_force_suspend|resume().  The parent would then
need to use pm_runtime_force_suspend|resume() too, no?
 
> For a PCI consumer device those will of course have to play by the rules of PCI.
> 
> >
> > Moreover, the majority of the "other subsystems/middle-layers" you've talked
> > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
> > so question is how representative they really are.
> 
> That's the point. We know pm_runtime_force_* works nicely for the
> trivial middle-layer cases.

In which cases the middle-layer callbacks don't exist, so it's just like
reusing driver callbacks directly. :-)

> For the more complex cases, we need something additional/different.

Something different.

But overall, as I said, this is about common expectations.

Today, some middle layers expect drivers to point their callback pointers
to the same routine in order to resue it (PCI, ACPI bus type), some of them
expect pm_runtime_force_suspend|resume() to be used (AMBA, maybe genpd),
and some of them have no expectations at all.

There needs to be a common ground in that area for drivers to be able to
work with different middle layers.

> >
> >> >
> >> > Quoting again:
> >> >
> >> > "If you are a middle layer, your role is basically to do PM for a certain
> >> > group of devices.  Thus you cannot really do the same in ->suspend or
> >> > ->suspend_early and in ->runtime_suspend (because the former generally need to
> >> > take device_may_wakeup() into account and the latter doesn't) and you shouldn't
> >> > really do the same in ->suspend and ->freeze (becuase the latter shouldn't
> >> > change the device's power state) and so on."
> >> >
> >> > I have said for multiple times that re-using *driver* callbacks actually makes
> >> > sense and the series is for doing that easier in general among other things.
> >> >
> >> >> I assume you mean that the PM core won't be involved to support this,
> >> >> but is that it?
> >> >>
> >> >> Do you also mean that *all* users of pm_runtime_force_suspend|resume()
> >> >> must convert to this new thing, using "driver PM flags", so in the end
> >> >> you want to remove pm_runtime_force_suspend|resume()?
> >> >>  - Then if so, you must of course consider all cases for how
> >> >> pm_runtime_force_suspend|resume() are being deployed currently, else
> >> >> existing users can't convert to the "driver PM flags" thing. Have you
> >> >> done that in this series?
> >> >
> >> > Let me turn this around.
> >> >
> >> > The majority of cases in which pm_runtime_force_* are used *should* be
> >> > addressable using the flags introduced here.  Some case in which
> >> > pm_runtime_force_* cannot be used should be addressable by these flags
> >> > as well.
> >>
> >> That's sounds really great!
> >>
> >> >
> >> > There may be some cases in which pm_runtime_force_* are used that may
> >> > require something more, but I'm not going to worry about that right now.
> >>
> >> This approach concerns me, because if we in the end realizes that
> >> pm_runtime_force_suspend|resume() will be too hard to get rid of, then
> >> this series just add yet another generic way of trying to optimize the
> >> system sleep path for runtime PM enabled devices.
> >
> > Which also works for PCI and the ACPI PM domain and that's sort of valuable
> > anyway, isn't it?
> 
> Indeed it is! I am definitely open to improve the situation for ACPI and PCI.
> 
> Seems like I may have given the wrong impression about that.
> 
> >
> > For the record, I don't think it will be too hard to get rid of
> > pm_runtime_force_suspend|resume(), although that may take quite some time.
> >
> >> So then we would end up having to support the "direct_complete" path,
> >> the "driver PM flags" and cases where
> >> pm_runtime_force_suspend|resume() is used. No, that just isn't good
> >> enough to me. That will just lead to similar scenarios as we had in
> >> the i2c designware driver.
> >
> > Frankly, this sounds like staging for indefinite blocking of changes in
> > this area on non-technical grounds.  I hope that it isn't the case ...
> >
> >> If we decide to go with these new "driver PM flags", I want to make
> >> sure, as long as possible, that we can remove both the
> >> "direct_complete" path support from the PM core as well as removing
> >> the pm_runtime_force_suspend|resume() helpers.
> >
> > We'll see.
> >
> >> >
> >> > I'll take care of that when I'll be removing pm_runtime_force_*, which I'm
> >> > not doing here.
> >>
> >> Of course I am fine with that we postpone doing the actual converting
> >> of drivers etc from this series, although as stated above, let's sure
> >> we *can* do it by using the "driver PM flags".
> >
> > There clearly are use cases that benefit from this series and I don't see
> > any alternatives covering them, including both direct-complete and the
> > pm_runtime_force* approach, so I'm not buying this "let's make sure
> > it can cover all possible use cases that exist" argumentation.
> 
> Alright, let me re-phrase my take on this.
> 
> Because you stated that you plan to remove pm_runtime_force_*
> eventually, then I think you need to put up some valid reasons of why
> (I consider that done), but more importantly, you need to offer an
> alternative solution that can replace it. Else such that statement can
> easily become wrong interpreted. My point is, the "driver PM flags" do
> *not* offers a full alternative solution, it may do in the future or
> it may not.
> 
> So, to conclude from my side, I don't have any major objections to
> going forward with the "driver PM flags", especially with the goal of
> improving the situation for PCI and ACPI. Down the road, we can then
> *try* to make it replace pm_runtime_force_* and the "direct_complete
> path".
> 
> Hopefully that makes it more clear.

Yes, it does, thank you!

Rafael
Ulf Hansson Oct. 18, 2017, 2:11 p.m. UTC | #5
[...]

>>
>> The reason why pm_runtime_force_* needs to respects the hierarchy of
>> the RPM callbacks, is because otherwise it can't safely update the
>> runtime PM status of the device.
>
> I'm not sure I follow this requirement.  Why is that so?

If the PM domain controls some resources for the device in its RPM
callbacks and the driver controls some other resources in its RPM
callbacks - then these resources needs to be managed together.

This follows the behavior of when a regular call to
pm_runtime_get|put(), triggers the RPM callbacks to be invoked.

>
>> And updating the runtime PM status of
>> the device is required to manage the optimized behavior during system
>> resume (avoiding to unnecessary resume devices).
>
> Well, OK.  The runtime PM status of the device after system resume should
> better reflect its physical state.
>
> [The physical state of the device may not be under the control of the
> kernel in some cases, like in S3 resume on some systems that reset
> devices in the firmware and so on, but let's set that aside.]
>
> However, for the runtime PM status of the device may still reflect its state
> if, say, a ->resume_early of the middle layer is called during resume along
> with a driver's ->runtime_resume.  That still can produce the right state
> of the device and all depends on the middle layer.
>
> On the other hand, as I said before, using a middle-layer ->runtime_suspend
> during a system sleep transition may be outright incorrect, say if device
> wakeup settings need to be adjusted by the middle layer (which is the
> case for some of them).
>
> Of course, if the middle layer expects the driver to point its
> system-wide PM callbacks to pm_runtime_force_*, then that's how it goes,
> but the drivers working with this particular middle layer generally
> won't work with other middle layers and may interact incorrectly
> with parents and/or children using the other middle layers.
>
> I guess the problem boils down to having a common set of expectations
> on the driver side and on the middle layer side allowing different
> combinations of these to work together.

Yes!

>
>> Besides the AMBA case, I also realized that we are dealing with PM
>> clocks in the genpd case. For this, genpd relies on the that runtime
>> PM status of the device properly reflects the state of the HW, during
>> system-wide PM.
>>
>> In other words, if the driver would change the runtime PM status of
>> the device, without respecting the hierarchy of the runtime PM
>> callbacks, it would lead to that genpd starts taking wrong decisions
>> while managing the PM clocks during system-wide PM. So in case you
>> intend to change pm_runtime_force_* this needs to be addressed too.
>
> I've just looked at the genpd code and quite frankly I'm not sure how this
> works, but I'll figure this out. :-)

You may think of it as genpd's RPM callback controls some device
clocks, while the driver control some other device resources (pinctrl
for example) from its RPM callback.

These resources needs to managed together, similar to as I described above.

[...]

>> Absolutely agree about the different wake-up settings. However, these
>> issues can be addressed also when using pm_runtime_force_*, at least
>> in general, but then not for PCI.
>
> Well, not for the ACPI PM domain too.
>
> In general, not if the wakeup settings are adjusted by the middle layer.

Correct!

To use pm_runtime_force* for these cases, one would need some
additional information exchange between the driver and the
middle-layer.

>
>> Regarding hibernation, honestly that's not really my area of
>> expertise. Although, I assume the middle-layer and driver can treat
>> that as a separate case, so if it's not suitable to use
>> pm_runtime_force* for that case, then they shouldn't do it.
>
> Well, agreed.
>
> In some simple cases, though, driver callbacks can be reused for hibernation
> too, so it would be good to have a common way to do that too, IMO.

Okay, that makes sense!

>
>> >
>> > Also, quite so often other middle layers interact with PCI directly or
>> > indirectly (eg. a platform device may be a child or a consumer of a PCI
>> > device) and some optimizations need to take that into account (eg. parents
>> > generally need to be accessible when their childres are resumed and so on).
>>
>> A device's parent becomes informed when changing the runtime PM status
>> of the device via pm_runtime_force_suspend|resume(), as those calls
>> pm_runtime_set_suspended|active().
>
> This requires the parent driver or middle layer to look at the reference
> counter and understand it the same way as pm_runtime_force_*.
>
>> In case that isn't that sufficient, what else is needed? Perhaps you can
>> point me to an example so I can understand better?
>
> Say you want to leave the parent suspended after system resume, but the
> child drivers use pm_runtime_force_suspend|resume().  The parent would then
> need to use pm_runtime_force_suspend|resume() too, no?

Actually no.

Currently the other options of "deferring resume" (not using
pm_runtime_force_*), is either using the "direct_complete" path or
similar to the approach you took for the i2c designware driver.

Both cases should play nicely in combination of a child being managed
by pm_runtime_force_*. That's because only when the parent device is
kept runtime suspended during system suspend, resuming can be
deferred.

That means, if the resume of the parent is deferred, so will the also
the resume of the child.

>
>> For a PCI consumer device those will of course have to play by the rules of PCI.
>>
>> >
>> > Moreover, the majority of the "other subsystems/middle-layers" you've talked
>> > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
>> > so question is how representative they really are.
>>
>> That's the point. We know pm_runtime_force_* works nicely for the
>> trivial middle-layer cases.
>
> In which cases the middle-layer callbacks don't exist, so it's just like
> reusing driver callbacks directly. :-)
>
>> For the more complex cases, we need something additional/different.
>
> Something different.
>
> But overall, as I said, this is about common expectations.
>
> Today, some middle layers expect drivers to point their callback pointers
> to the same routine in order to resue it (PCI, ACPI bus type), some of them
> expect pm_runtime_force_suspend|resume() to be used (AMBA, maybe genpd),
> and some of them have no expectations at all.
>
> There needs to be a common ground in that area for drivers to be able to
> work with different middle layers.

Yes, reaching that point would be great, we should definitively aim for that!

[...]

Kind regards
Uffe
Grygorii Strashko Oct. 18, 2017, 7:45 p.m. UTC | #6
On 10/18/2017 09:11 AM, Ulf Hansson wrote:
> [...]
> 
>>>
>>> The reason why pm_runtime_force_* needs to respects the hierarchy of
>>> the RPM callbacks, is because otherwise it can't safely update the
>>> runtime PM status of the device.
>>
>> I'm not sure I follow this requirement.  Why is that so?
> 
> If the PM domain controls some resources for the device in its RPM
> callbacks and the driver controls some other resources in its RPM
> callbacks - then these resources needs to be managed together.
> 
> This follows the behavior of when a regular call to
> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.
> 
>>
>>> And updating the runtime PM status of
>>> the device is required to manage the optimized behavior during system
>>> resume (avoiding to unnecessary resume devices).
>>
>> Well, OK.  The runtime PM status of the device after system resume should
>> better reflect its physical state.
>>
>> [The physical state of the device may not be under the control of the
>> kernel in some cases, like in S3 resume on some systems that reset
>> devices in the firmware and so on, but let's set that aside.]
>>
>> However, for the runtime PM status of the device may still reflect its state
>> if, say, a ->resume_early of the middle layer is called during resume along
>> with a driver's ->runtime_resume.  That still can produce the right state
>> of the device and all depends on the middle layer.
>>
>> On the other hand, as I said before, using a middle-layer ->runtime_suspend
>> during a system sleep transition may be outright incorrect, say if device
>> wakeup settings need to be adjusted by the middle layer (which is the
>> case for some of them).
>>
>> Of course, if the middle layer expects the driver to point its
>> system-wide PM callbacks to pm_runtime_force_*, then that's how it goes,
>> but the drivers working with this particular middle layer generally
>> won't work with other middle layers and may interact incorrectly
>> with parents and/or children using the other middle layers.
>>
>> I guess the problem boils down to having a common set of expectations
>> on the driver side and on the middle layer side allowing different
>> combinations of these to work together.
> 
> Yes!
> 
>>
>>> Besides the AMBA case, I also realized that we are dealing with PM
>>> clocks in the genpd case. For this, genpd relies on the that runtime
>>> PM status of the device properly reflects the state of the HW, during
>>> system-wide PM.
>>>
>>> In other words, if the driver would change the runtime PM status of
>>> the device, without respecting the hierarchy of the runtime PM
>>> callbacks, it would lead to that genpd starts taking wrong decisions
>>> while managing the PM clocks during system-wide PM. So in case you
>>> intend to change pm_runtime_force_* this needs to be addressed too.
>>
>> I've just looked at the genpd code and quite frankly I'm not sure how this
>> works, but I'll figure this out. :-)
> 
> You may think of it as genpd's RPM callback controls some device
> clocks, while the driver control some other device resources (pinctrl
> for example) from its RPM callback.
> 
> These resources needs to managed together, similar to as I described above.
> 
> [...]
> 
>>> Absolutely agree about the different wake-up settings. However, these
>>> issues can be addressed also when using pm_runtime_force_*, at least
>>> in general, but then not for PCI.
>>
>> Well, not for the ACPI PM domain too.
>>
>> In general, not if the wakeup settings are adjusted by the middle layer.
> 
> Correct!
> 
> To use pm_runtime_force* for these cases, one would need some
> additional information exchange between the driver and the
> middle-layer.
> 
>>
>>> Regarding hibernation, honestly that's not really my area of
>>> expertise. Although, I assume the middle-layer and driver can treat
>>> that as a separate case, so if it's not suitable to use
>>> pm_runtime_force* for that case, then they shouldn't do it.
>>
>> Well, agreed.
>>
>> In some simple cases, though, driver callbacks can be reused for hibernation
>> too, so it would be good to have a common way to do that too, IMO.
> 
> Okay, that makes sense!
> 
>>
>>>>
>>>> Also, quite so often other middle layers interact with PCI directly or
>>>> indirectly (eg. a platform device may be a child or a consumer of a PCI
>>>> device) and some optimizations need to take that into account (eg. parents
>>>> generally need to be accessible when their childres are resumed and so on).
>>>
>>> A device's parent becomes informed when changing the runtime PM status
>>> of the device via pm_runtime_force_suspend|resume(), as those calls
>>> pm_runtime_set_suspended|active().
>>
>> This requires the parent driver or middle layer to look at the reference
>> counter and understand it the same way as pm_runtime_force_*.
>>
>>> In case that isn't that sufficient, what else is needed? Perhaps you can
>>> point me to an example so I can understand better?
>>
>> Say you want to leave the parent suspended after system resume, but the
>> child drivers use pm_runtime_force_suspend|resume().  The parent would then
>> need to use pm_runtime_force_suspend|resume() too, no?
> 
> Actually no.
> 
> Currently the other options of "deferring resume" (not using
> pm_runtime_force_*), is either using the "direct_complete" path or
> similar to the approach you took for the i2c designware driver.
> 
> Both cases should play nicely in combination of a child being managed
> by pm_runtime_force_*. That's because only when the parent device is
> kept runtime suspended during system suspend, resuming can be
> deferred.
> 
> That means, if the resume of the parent is deferred, so will the also
> the resume of the child.
> 
>>
>>> For a PCI consumer device those will of course have to play by the rules of PCI.
>>>
>>>>
>>>> Moreover, the majority of the "other subsystems/middle-layers" you've talked
>>>> about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
>>>> so question is how representative they really are.
>>>
>>> That's the point. We know pm_runtime_force_* works nicely for the
>>> trivial middle-layer cases.
>>
>> In which cases the middle-layer callbacks don't exist, so it's just like
>> reusing driver callbacks directly. :-)

I'd like to ask you clarify one point here and provide some info which I hope can be useful - 
what's exactly means  "trivial middle-layer cases"?

Is it when systems use "drivers/base/power/clock_ops.c - Generic clock manipulation PM callbacks"
as dev_pm_domain (arm davinci/keystone), or OMAP device framework struct dev_pm_domain omap_device_pm_domain
 (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops tegra_aconnect_pm_ops?

if yes all above have PM runtime callbacks.
Rafael J. Wysocki Oct. 18, 2017, 9:48 p.m. UTC | #7
On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
> 
> On 10/18/2017 09:11 AM, Ulf Hansson wrote:

[...]

> >>> That's the point. We know pm_runtime_force_* works nicely for the
> >>> trivial middle-layer cases.
> >>
> >> In which cases the middle-layer callbacks don't exist, so it's just like
> >> reusing driver callbacks directly. :-)
> 
> I'd like to ask you clarify one point here and provide some info which I hope can be useful - 
> what's exactly means  "trivial middle-layer cases"?
> 
> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
> device framework struct dev_pm_domain omap_device_pm_domain
> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
> tegra_aconnect_pm_ops?
> 
> if yes all above have PM runtime callbacks.

Trivial ones don't actually do anything meaningful in their PM callbacks.

Things like the platform bus type, spi bus type, i2c bus type and similar.

If the middle-layer callbacks manipulate devices in a significant way, then
they aren't trivial.

Thanks,
Rafael
Rafael J. Wysocki Oct. 18, 2017, 9:54 p.m. UTC | #8
On Wednesday, October 18, 2017 2:34:10 PM CEST Ulf Hansson wrote:
> [...]
> 
> >> Are there any major reasons why the appended patch (obviously untested) won't
> >> work, then?
> >
> > OK, there is a reason, which is the optimizations bundled into
> > pm_runtime_force_*, because (a) the device may be left in runtime suspend
> > by them (in which case amba_pm_suspend_early() in my patch should not run)
> > and (b) pm_runtime_force_resume() may decide to leave it suspended (in which
> > case amba_pm_suspend_late() in my patch should not run).
> 
> Exactly.
> 
> >
> > [BTW, the "leave the device suspended" optimization in pm_runtime_force_*
> > is potentially problematic too, because it requires the children to do
> > the right thing, which effectively means that their drivers need to use
> > pm_runtime_force_* too, but what if they don't want to reuse their
> > runtime PM callbacks for system-wide PM?]
> 
> Deployment of pm_runtime_force_suspend() should generally be done for
> children devices first.
> 
> If some reason that isn't the case, it's expected that the call to
> pm_runtime_set_suspended() invoked from pm_runtime_force_suspend(),
> for the parent, should fail and thus abort system suspend.

Well, generally what about drivers that need to do something significantly
different for system suspend and runtime PM?  The whole picture seems to be
falling apart if one of these is involved.

> >
> > Honestly, I don't like the way this is designed.  IMO, it would be better
> > to do the optimizations and all in the bus type middle-layer code instead
> > of expecting drivers to use pm_runtime_force_* as their system-wide PM
> > callbacks (and that expectation should at least be documented, which I'm
> > not sure is the case now).  But whatever.
> >
> > It all should work the way it does now without pm_runtime_force_* if (a) the
> > bus type's PM callbacks are changed like in the last patch and the drivers
> > (b) point their system suspend callbacks to the runtime PM callback routines
> > and (c) set DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED for the
> > devices (if they need to do the PM in ->suspend and ->resume, they may set
> > DPM_FLAG_AVOID_RPM too).
> >
> > And if you see a reason why that won't work, please let me know.
> 
> I will have look and try out the series by using my local "runtime PM
> test driver".
> 
> I get back to you with an update on this.

OK, thanks!
Rafael J. Wysocki Oct. 18, 2017, 10:12 p.m. UTC | #9
On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote:
> [...]
> 
> >>
> >> The reason why pm_runtime_force_* needs to respects the hierarchy of
> >> the RPM callbacks, is because otherwise it can't safely update the
> >> runtime PM status of the device.
> >
> > I'm not sure I follow this requirement.  Why is that so?
> 
> If the PM domain controls some resources for the device in its RPM
> callbacks and the driver controls some other resources in its RPM
> callbacks - then these resources needs to be managed together.

Right, but that doesn't automatically make it necessary to use runtime PM
callbacks in the middle layer.  Its system-wide PM callbacks may be
suitable for that just fine.

That is, at least in some cases, you can combine ->runtime_suspend from a
driver and ->suspend_late from a middle layer with no problems, for example.

That's why some middle layers allow drivers to point ->suspend_late and
->runtime_suspend to the same routine if they want to reuse that code.

> This follows the behavior of when a regular call to
> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.

But (a) it doesn't have to follow it and (b) in some cases it should not
follow it.
 
> >
> >> And updating the runtime PM status of
> >> the device is required to manage the optimized behavior during system
> >> resume (avoiding to unnecessary resume devices).
> >
> > Well, OK.  The runtime PM status of the device after system resume should
> > better reflect its physical state.
> >
> > [The physical state of the device may not be under the control of the
> > kernel in some cases, like in S3 resume on some systems that reset
> > devices in the firmware and so on, but let's set that aside.]
> >
> > However, for the runtime PM status of the device may still reflect its state
> > if, say, a ->resume_early of the middle layer is called during resume along
> > with a driver's ->runtime_resume.  That still can produce the right state
> > of the device and all depends on the middle layer.
> >
> > On the other hand, as I said before, using a middle-layer ->runtime_suspend
> > during a system sleep transition may be outright incorrect, say if device
> > wakeup settings need to be adjusted by the middle layer (which is the
> > case for some of them).
> >
> > Of course, if the middle layer expects the driver to point its
> > system-wide PM callbacks to pm_runtime_force_*, then that's how it goes,
> > but the drivers working with this particular middle layer generally
> > won't work with other middle layers and may interact incorrectly
> > with parents and/or children using the other middle layers.
> >
> > I guess the problem boils down to having a common set of expectations
> > on the driver side and on the middle layer side allowing different
> > combinations of these to work together.
> 
> Yes!
> 
> >
> >> Besides the AMBA case, I also realized that we are dealing with PM
> >> clocks in the genpd case. For this, genpd relies on the that runtime
> >> PM status of the device properly reflects the state of the HW, during
> >> system-wide PM.
> >>
> >> In other words, if the driver would change the runtime PM status of
> >> the device, without respecting the hierarchy of the runtime PM
> >> callbacks, it would lead to that genpd starts taking wrong decisions
> >> while managing the PM clocks during system-wide PM. So in case you
> >> intend to change pm_runtime_force_* this needs to be addressed too.
> >
> > I've just looked at the genpd code and quite frankly I'm not sure how this
> > works, but I'll figure this out. :-)
> 
> You may think of it as genpd's RPM callback controls some device
> clocks, while the driver control some other device resources (pinctrl
> for example) from its RPM callback.
> 
> These resources needs to managed together, similar to as I described above.

Which, again, doesn't mean that runtime PM callbacks from the middle layer
have to be used for that.

> [...]
> 
> >> Absolutely agree about the different wake-up settings. However, these
> >> issues can be addressed also when using pm_runtime_force_*, at least
> >> in general, but then not for PCI.
> >
> > Well, not for the ACPI PM domain too.
> >
> > In general, not if the wakeup settings are adjusted by the middle layer.
> 
> Correct!
> 
> To use pm_runtime_force* for these cases, one would need some
> additional information exchange between the driver and the
> middle-layer.

Which pretty much defeats the purpose of the wrappers, doesn't it?

> >
> >> Regarding hibernation, honestly that's not really my area of
> >> expertise. Although, I assume the middle-layer and driver can treat
> >> that as a separate case, so if it's not suitable to use
> >> pm_runtime_force* for that case, then they shouldn't do it.
> >
> > Well, agreed.
> >
> > In some simple cases, though, driver callbacks can be reused for hibernation
> > too, so it would be good to have a common way to do that too, IMO.
> 
> Okay, that makes sense!
> 
> >
> >> >
> >> > Also, quite so often other middle layers interact with PCI directly or
> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI
> >> > device) and some optimizations need to take that into account (eg. parents
> >> > generally need to be accessible when their childres are resumed and so on).
> >>
> >> A device's parent becomes informed when changing the runtime PM status
> >> of the device via pm_runtime_force_suspend|resume(), as those calls
> >> pm_runtime_set_suspended|active().
> >
> > This requires the parent driver or middle layer to look at the reference
> > counter and understand it the same way as pm_runtime_force_*.
> >
> >> In case that isn't that sufficient, what else is needed? Perhaps you can
> >> point me to an example so I can understand better?
> >
> > Say you want to leave the parent suspended after system resume, but the
> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
> > need to use pm_runtime_force_suspend|resume() too, no?
> 
> Actually no.
> 
> Currently the other options of "deferring resume" (not using
> pm_runtime_force_*), is either using the "direct_complete" path or
> similar to the approach you took for the i2c designware driver.
>
> Both cases should play nicely in combination of a child being managed
> by pm_runtime_force_*. That's because only when the parent device is
> kept runtime suspended during system suspend, resuming can be
> deferred.

And because the parent remains in runtime suspend late enough in the
system suspend path, its children also are guaranteed to be suspended.

But then all of them need to be left in runtime suspend during system
resume too, which is somewhat restrictive, because some drivers may
want their devices to be resumed then.

[BTW, our current documentation recommends resuming devices during
system resume, actually, and gives a list of reasons why. :-)]

> That means, if the resume of the parent is deferred, so will the also
> the resume of the child.
> 
> >
> >> For a PCI consumer device those will of course have to play by the rules of PCI.
> >>
> >> >
> >> > Moreover, the majority of the "other subsystems/middle-layers" you've talked
> >> > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
> >> > so question is how representative they really are.
> >>
> >> That's the point. We know pm_runtime_force_* works nicely for the
> >> trivial middle-layer cases.
> >
> > In which cases the middle-layer callbacks don't exist, so it's just like
> > reusing driver callbacks directly. :-)
> >
> >> For the more complex cases, we need something additional/different.
> >
> > Something different.
> >
> > But overall, as I said, this is about common expectations.
> >
> > Today, some middle layers expect drivers to point their callback pointers
> > to the same routine in order to resue it (PCI, ACPI bus type), some of them
> > expect pm_runtime_force_suspend|resume() to be used (AMBA, maybe genpd),
> > and some of them have no expectations at all.
> >
> > There needs to be a common ground in that area for drivers to be able to
> > work with different middle layers.
> 
> Yes, reaching that point would be great, we should definitively aim for that!

Indeed.

Thanks,
Rafael
Ulf Hansson Oct. 19, 2017, 8:33 a.m. UTC | #10
On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>
>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>
> [...]
>
>> >>> That's the point. We know pm_runtime_force_* works nicely for the
>> >>> trivial middle-layer cases.
>> >>
>> >> In which cases the middle-layer callbacks don't exist, so it's just like
>> >> reusing driver callbacks directly. :-)
>>
>> I'd like to ask you clarify one point here and provide some info which I hope can be useful -
>> what's exactly means  "trivial middle-layer cases"?
>>
>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>> device framework struct dev_pm_domain omap_device_pm_domain
>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>> tegra_aconnect_pm_ops?
>>
>> if yes all above have PM runtime callbacks.
>
> Trivial ones don't actually do anything meaningful in their PM callbacks.
>
> Things like the platform bus type, spi bus type, i2c bus type and similar.
>
> If the middle-layer callbacks manipulate devices in a significant way, then
> they aren't trivial.

I fully agree with Rafael's description above, but let me also clarify
one more thing.

We have also been discussing PM domains as being trivial and
non-trivial. In some statements I even think the PM domain has been a
part the middle-layer terminology, which may have been a bit
confusing.

In this regards as we consider genpd being a trivial PM domain, those
examples your bring up above is too me also examples of trivial PM
domains. Especially because they don't deal with wakeups, as that is
taken care of by the drivers, right!?

Kind regards
Uffe
Ulf Hansson Oct. 19, 2017, 12:21 p.m. UTC | #11
On 19 October 2017 at 00:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote:
>> [...]
>>
>> >>
>> >> The reason why pm_runtime_force_* needs to respects the hierarchy of
>> >> the RPM callbacks, is because otherwise it can't safely update the
>> >> runtime PM status of the device.
>> >
>> > I'm not sure I follow this requirement.  Why is that so?
>>
>> If the PM domain controls some resources for the device in its RPM
>> callbacks and the driver controls some other resources in its RPM
>> callbacks - then these resources needs to be managed together.
>
> Right, but that doesn't automatically make it necessary to use runtime PM
> callbacks in the middle layer.  Its system-wide PM callbacks may be
> suitable for that just fine.
>
> That is, at least in some cases, you can combine ->runtime_suspend from a
> driver and ->suspend_late from a middle layer with no problems, for example.
>
> That's why some middle layers allow drivers to point ->suspend_late and
> ->runtime_suspend to the same routine if they want to reuse that code.
>
>> This follows the behavior of when a regular call to
>> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.
>
> But (a) it doesn't have to follow it and (b) in some cases it should not
> follow it.

Of course you don't explicitly *have to* respect the hierarchy of the
RPM callbacks in pm_runtime_force_*.

However, changing that would require some additional information
exchange between the driver and the middle-layer/PM domain, as to
instruct the middle-layer/PM domain of what to do during system-wide
PM. Especially in cases when the driver deals with wakeup, as in those
cases the instructions may change dynamically.

[...]

>> > In general, not if the wakeup settings are adjusted by the middle layer.
>>
>> Correct!
>>
>> To use pm_runtime_force* for these cases, one would need some
>> additional information exchange between the driver and the
>> middle-layer.
>
> Which pretty much defeats the purpose of the wrappers, doesn't it?

Well, no matter if the wrappers are used or not, we need some kind of
information exchange between the driver and the middle-layers/PM
domains.

Anyway, me personally think it's too early to conclude that using the
wrappers may not be useful going forward. At this point, they clearly
helps trivial cases to remain being trivial.

>
>> >
>> >> Regarding hibernation, honestly that's not really my area of
>> >> expertise. Although, I assume the middle-layer and driver can treat
>> >> that as a separate case, so if it's not suitable to use
>> >> pm_runtime_force* for that case, then they shouldn't do it.
>> >
>> > Well, agreed.
>> >
>> > In some simple cases, though, driver callbacks can be reused for hibernation
>> > too, so it would be good to have a common way to do that too, IMO.
>>
>> Okay, that makes sense!
>>
>> >
>> >> >
>> >> > Also, quite so often other middle layers interact with PCI directly or
>> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI
>> >> > device) and some optimizations need to take that into account (eg. parents
>> >> > generally need to be accessible when their childres are resumed and so on).
>> >>
>> >> A device's parent becomes informed when changing the runtime PM status
>> >> of the device via pm_runtime_force_suspend|resume(), as those calls
>> >> pm_runtime_set_suspended|active().
>> >
>> > This requires the parent driver or middle layer to look at the reference
>> > counter and understand it the same way as pm_runtime_force_*.
>> >
>> >> In case that isn't that sufficient, what else is needed? Perhaps you can
>> >> point me to an example so I can understand better?
>> >
>> > Say you want to leave the parent suspended after system resume, but the
>> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
>> > need to use pm_runtime_force_suspend|resume() too, no?
>>
>> Actually no.
>>
>> Currently the other options of "deferring resume" (not using
>> pm_runtime_force_*), is either using the "direct_complete" path or
>> similar to the approach you took for the i2c designware driver.
>>
>> Both cases should play nicely in combination of a child being managed
>> by pm_runtime_force_*. That's because only when the parent device is
>> kept runtime suspended during system suspend, resuming can be
>> deferred.
>
> And because the parent remains in runtime suspend late enough in the
> system suspend path, its children also are guaranteed to be suspended.

Yes.

>
> But then all of them need to be left in runtime suspend during system
> resume too, which is somewhat restrictive, because some drivers may
> want their devices to be resumed then.

Actually, this scenario is also addressed when using the pm_runtime_force_*.

The driver for the child would only need to bump the runtime PM usage
count (pm_runtime_get_noresume()) before calling
pm_runtime_force_suspend() at system suspend. That then also
propagates to the parent, leading to that both the parent and the
child will be resumed when pm_runtime_force_resume() is called for
them.

Of course, if the driver of the parent isn't using pm_runtime_force_,
we would have to assume that it's always being resumed at system
resume.

As at matter of fact, doesn't this scenario actually indicates that we
do need to involve the runtime PM core (updating RPM status according
to the HW state even during system-wide PM) to really get this right.
It's not enough to only use "driver PM flags"!?

Seems like we need to create a list of all requirements, pitfalls,
good things vs bad things etc. :-)

>
> [BTW, our current documentation recommends resuming devices during
> system resume, actually, and gives a list of reasons why. :-)]

Yes, but that too easy and to me not good enough. :-)

[...]

Kind regards
Uffe
Grygorii Strashko Oct. 19, 2017, 5:21 p.m. UTC | #12
On 10/19/2017 03:33 AM, Ulf Hansson wrote:
> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>>
>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>>
>> [...]
>>
>>>>>> That's the point. We know pm_runtime_force_* works nicely for the
>>>>>> trivial middle-layer cases.
>>>>>
>>>>> In which cases the middle-layer callbacks don't exist, so it's just like
>>>>> reusing driver callbacks directly. :-)
>>>
>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful -
>>> what's exactly means  "trivial middle-layer cases"?
>>>
>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>>> device framework struct dev_pm_domain omap_device_pm_domain
>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>>> tegra_aconnect_pm_ops?
>>>
>>> if yes all above have PM runtime callbacks.
>>
>> Trivial ones don't actually do anything meaningful in their PM callbacks.
>>
>> Things like the platform bus type, spi bus type, i2c bus type and similar.
>>
>> If the middle-layer callbacks manipulate devices in a significant way, then
>> they aren't trivial.
> 
> I fully agree with Rafael's description above, but let me also clarify
> one more thing.
> 
> We have also been discussing PM domains as being trivial and
> non-trivial. In some statements I even think the PM domain has been a
> part the middle-layer terminology, which may have been a bit
> confusing.
> 
> In this regards as we consider genpd being a trivial PM domain, those
> examples your bring up above is too me also examples of trivial PM
> domains. Especially because they don't deal with wakeups, as that is
> taken care of by the drivers, right!?

Not directly, for example, omap device framework has noirq callback implemented
which forcibly disable all devices which are not PM runtime suspended.
while doing this it calls drivers PM .runtime_suspend() which may return
non 0 value and in this case device will be left enabled (powered) at suspend for
wake up purposes (see _od_suspend_noirq()).
Ulf Hansson Oct. 19, 2017, 6:01 p.m. UTC | #13
[...]

>>> > Say you want to leave the parent suspended after system resume, but the
>>> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
>>> > need to use pm_runtime_force_suspend|resume() too, no?
>>>
>>> Actually no.
>>>
>>> Currently the other options of "deferring resume" (not using
>>> pm_runtime_force_*), is either using the "direct_complete" path or
>>> similar to the approach you took for the i2c designware driver.
>>>
>>> Both cases should play nicely in combination of a child being managed
>>> by pm_runtime_force_*. That's because only when the parent device is
>>> kept runtime suspended during system suspend, resuming can be
>>> deferred.
>>
>> And because the parent remains in runtime suspend late enough in the
>> system suspend path, its children also are guaranteed to be suspended.
>
> Yes.
>
>>
>> But then all of them need to be left in runtime suspend during system
>> resume too, which is somewhat restrictive, because some drivers may
>> want their devices to be resumed then.
>
> Actually, this scenario is also addressed when using the pm_runtime_force_*.
>
> The driver for the child would only need to bump the runtime PM usage
> count (pm_runtime_get_noresume()) before calling
> pm_runtime_force_suspend() at system suspend. That then also
> propagates to the parent, leading to that both the parent and the
> child will be resumed when pm_runtime_force_resume() is called for
> them.

I need to correct myself here. The above currently only works if the
child is runtime resumed while pm_runtime_force_suspend() is called.

The logic in pm_runtime_force_* needs to be improved to take care of
such scenarios. However I think that should be rather easy to fix, if
we want that.

[...]

Kind regards
Uffe
Ulf Hansson Oct. 19, 2017, 6:04 p.m. UTC | #14
On 19 October 2017 at 19:21, Grygorii Strashko <grygorii.strashko@ti.com> wrote:
>
>
> On 10/19/2017 03:33 AM, Ulf Hansson wrote:
>> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>>>
>>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>>>
>>> [...]
>>>
>>>>>>> That's the point. We know pm_runtime_force_* works nicely for the
>>>>>>> trivial middle-layer cases.
>>>>>>
>>>>>> In which cases the middle-layer callbacks don't exist, so it's just like
>>>>>> reusing driver callbacks directly. :-)
>>>>
>>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful -
>>>> what's exactly means  "trivial middle-layer cases"?
>>>>
>>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>>>> device framework struct dev_pm_domain omap_device_pm_domain
>>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>>>> tegra_aconnect_pm_ops?
>>>>
>>>> if yes all above have PM runtime callbacks.
>>>
>>> Trivial ones don't actually do anything meaningful in their PM callbacks.
>>>
>>> Things like the platform bus type, spi bus type, i2c bus type and similar.
>>>
>>> If the middle-layer callbacks manipulate devices in a significant way, then
>>> they aren't trivial.
>>
>> I fully agree with Rafael's description above, but let me also clarify
>> one more thing.
>>
>> We have also been discussing PM domains as being trivial and
>> non-trivial. In some statements I even think the PM domain has been a
>> part the middle-layer terminology, which may have been a bit
>> confusing.
>>
>> In this regards as we consider genpd being a trivial PM domain, those
>> examples your bring up above is too me also examples of trivial PM
>> domains. Especially because they don't deal with wakeups, as that is
>> taken care of by the drivers, right!?
>
> Not directly, for example, omap device framework has noirq callback implemented
> which forcibly disable all devices which are not PM runtime suspended.
> while doing this it calls drivers PM .runtime_suspend() which may return
> non 0 value and in this case device will be left enabled (powered) at suspend for
> wake up purposes (see _od_suspend_noirq()).
>

Yeah, I had that feeling that omap has some trickyness going on. :-)

I sure that can be fixed in the omap PM domain, although
Ulf Hansson Oct. 19, 2017, 6:11 p.m. UTC | #15
On 19 October 2017 at 20:04, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 19 October 2017 at 19:21, Grygorii Strashko <grygorii.strashko@ti.com> wrote:
>>
>>
>> On 10/19/2017 03:33 AM, Ulf Hansson wrote:
>>> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>>>>
>>>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>>>>
>>>> [...]
>>>>
>>>>>>>> That's the point. We know pm_runtime_force_* works nicely for the
>>>>>>>> trivial middle-layer cases.
>>>>>>>
>>>>>>> In which cases the middle-layer callbacks don't exist, so it's just like
>>>>>>> reusing driver callbacks directly. :-)
>>>>>
>>>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful -
>>>>> what's exactly means  "trivial middle-layer cases"?
>>>>>
>>>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>>>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>>>>> device framework struct dev_pm_domain omap_device_pm_domain
>>>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>>>>> tegra_aconnect_pm_ops?
>>>>>
>>>>> if yes all above have PM runtime callbacks.
>>>>
>>>> Trivial ones don't actually do anything meaningful in their PM callbacks.
>>>>
>>>> Things like the platform bus type, spi bus type, i2c bus type and similar.
>>>>
>>>> If the middle-layer callbacks manipulate devices in a significant way, then
>>>> they aren't trivial.
>>>
>>> I fully agree with Rafael's description above, but let me also clarify
>>> one more thing.
>>>
>>> We have also been discussing PM domains as being trivial and
>>> non-trivial. In some statements I even think the PM domain has been a
>>> part the middle-layer terminology, which may have been a bit
>>> confusing.
>>>
>>> In this regards as we consider genpd being a trivial PM domain, those
>>> examples your bring up above is too me also examples of trivial PM
>>> domains. Especially because they don't deal with wakeups, as that is
>>> taken care of by the drivers, right!?
>>
>> Not directly, for example, omap device framework has noirq callback implemented
>> which forcibly disable all devices which are not PM runtime suspended.
>> while doing this it calls drivers PM .runtime_suspend() which may return
>> non 0 value and in this case device will be left enabled (powered) at suspend for
>> wake up purposes (see _od_suspend_noirq()).
>>
>
> Yeah, I had that feeling that omap has some trickyness going on. :-)
>
> I sure that can be fixed in the omap PM domain, although

...slipped with my fingers.. here is the rest of the reply...

..of course that require us to use another way for drivers to signal
to the omap PM domain that it needs to stay powered as to deal with
wakeup.

I can have a look at that more closely, to see if it makes sense to change.

Kind regards
Uffe
Grygorii Strashko Oct. 19, 2017, 9:31 p.m. UTC | #16
On 10/19/2017 01:11 PM, Ulf Hansson wrote:
> On 19 October 2017 at 20:04, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 19 October 2017 at 19:21, Grygorii Strashko <grygorii.strashko@ti.com> wrote:
>>>
>>>
>>> On 10/19/2017 03:33 AM, Ulf Hansson wrote:
>>>> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>>>>>
>>>>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>>>> That's the point. We know pm_runtime_force_* works nicely for the
>>>>>>>>> trivial middle-layer cases.
>>>>>>>>
>>>>>>>> In which cases the middle-layer callbacks don't exist, so it's just like
>>>>>>>> reusing driver callbacks directly. :-)
>>>>>>
>>>>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful -
>>>>>> what's exactly means  "trivial middle-layer cases"?
>>>>>>
>>>>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>>>>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>>>>>> device framework struct dev_pm_domain omap_device_pm_domain
>>>>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>>>>>> tegra_aconnect_pm_ops?
>>>>>>
>>>>>> if yes all above have PM runtime callbacks.
>>>>>
>>>>> Trivial ones don't actually do anything meaningful in their PM callbacks.
>>>>>
>>>>> Things like the platform bus type, spi bus type, i2c bus type and similar.
>>>>>
>>>>> If the middle-layer callbacks manipulate devices in a significant way, then
>>>>> they aren't trivial.
>>>>
>>>> I fully agree with Rafael's description above, but let me also clarify
>>>> one more thing.
>>>>
>>>> We have also been discussing PM domains as being trivial and
>>>> non-trivial. In some statements I even think the PM domain has been a
>>>> part the middle-layer terminology, which may have been a bit
>>>> confusing.
>>>>
>>>> In this regards as we consider genpd being a trivial PM domain, those
>>>> examples your bring up above is too me also examples of trivial PM
>>>> domains. Especially because they don't deal with wakeups, as that is
>>>> taken care of by the drivers, right!?
>>>
>>> Not directly, for example, omap device framework has noirq callback implemented
>>> which forcibly disable all devices which are not PM runtime suspended.
>>> while doing this it calls drivers PM .runtime_suspend() which may return
>>> non 0 value and in this case device will be left enabled (powered) at suspend for
>>> wake up purposes (see _od_suspend_noirq()).
>>>
>>
>> Yeah, I had that feeling that omap has some trickyness going on. :-)
>>
>> I sure that can be fixed in the omap PM domain, although
> 
> ...slipped with my fingers.. here is the rest of the reply...
> 
> ..of course that require us to use another way for drivers to signal
> to the omap PM domain that it needs to stay powered as to deal with
> wakeup.
> 
> I can have a look at that more closely, to see if it makes sense to change.
> 

Also, additional note here. some IPs are reused between OMAP/Davinci/Keystone,
OMAP PM domain have some code running at noirq time to dial with devices left
in PM runtime enabled state (OMAP PM runtime centric), while Davinci/Keystone haven't (clock_ops.c),
so pm_runtime_force_* API is actually possibility now to make the same driver work 
 on all these platforms.
Rafael J. Wysocki Oct. 20, 2017, 1:19 a.m. UTC | #17
On Thursday, October 19, 2017 2:21:07 PM CEST Ulf Hansson wrote:
> On 19 October 2017 at 00:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote:
> >> [...]
> >>
> >> >>
> >> >> The reason why pm_runtime_force_* needs to respects the hierarchy of
> >> >> the RPM callbacks, is because otherwise it can't safely update the
> >> >> runtime PM status of the device.
> >> >
> >> > I'm not sure I follow this requirement.  Why is that so?
> >>
> >> If the PM domain controls some resources for the device in its RPM
> >> callbacks and the driver controls some other resources in its RPM
> >> callbacks - then these resources needs to be managed together.
> >
> > Right, but that doesn't automatically make it necessary to use runtime PM
> > callbacks in the middle layer.  Its system-wide PM callbacks may be
> > suitable for that just fine.
> >
> > That is, at least in some cases, you can combine ->runtime_suspend from a
> > driver and ->suspend_late from a middle layer with no problems, for example.
> >
> > That's why some middle layers allow drivers to point ->suspend_late and
> > ->runtime_suspend to the same routine if they want to reuse that code.
> >
> >> This follows the behavior of when a regular call to
> >> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.
> >
> > But (a) it doesn't have to follow it and (b) in some cases it should not
> > follow it.
> 
> Of course you don't explicitly *have to* respect the hierarchy of the
> RPM callbacks in pm_runtime_force_*.
> 
> However, changing that would require some additional information
> exchange between the driver and the middle-layer/PM domain, as to
> instruct the middle-layer/PM domain of what to do during system-wide
> PM. Especially in cases when the driver deals with wakeup, as in those
> cases the instructions may change dynamically.

Well, if wakeup matters, drivers can't simply point their PM callbacks
to pm_runtime_force_* anyway.

If the driver itself deals with wakeups, it clearly needs different callback
routines for system-wide PM and for runtime PM, so it can't reuse its runtime
PM callbacks at all then.

If the middle layer deals with wakeups, different callbacks are needed at
that level and so pm_runtime_force_* are unsuitable too.

Really, invoking runtime PM callbacks from the middle layer in
pm_runtime_force_* is a not a idea at all and there's no general requirement
for it whatever.

> [...]
> 
> >> > In general, not if the wakeup settings are adjusted by the middle layer.
> >>
> >> Correct!
> >>
> >> To use pm_runtime_force* for these cases, one would need some
> >> additional information exchange between the driver and the
> >> middle-layer.
> >
> > Which pretty much defeats the purpose of the wrappers, doesn't it?
> 
> Well, no matter if the wrappers are used or not, we need some kind of
> information exchange between the driver and the middle-layers/PM
> domains.

Right.

But if that information is exchanged, then why use wrappers *in* *addition*
to that?

> Anyway, me personally think it's too early to conclude that using the
> wrappers may not be useful going forward. At this point, they clearly
> helps trivial cases to remain being trivial.

I'm not sure about that really.  So far I've seen more complexity resulting
from using them than being avoided by using them, but I guess the beauty is
in the eye of the beholder. :-)

> >
> >> >
> >> >> Regarding hibernation, honestly that's not really my area of
> >> >> expertise. Although, I assume the middle-layer and driver can treat
> >> >> that as a separate case, so if it's not suitable to use
> >> >> pm_runtime_force* for that case, then they shouldn't do it.
> >> >
> >> > Well, agreed.
> >> >
> >> > In some simple cases, though, driver callbacks can be reused for hibernation
> >> > too, so it would be good to have a common way to do that too, IMO.
> >>
> >> Okay, that makes sense!
> >>
> >> >
> >> >> >
> >> >> > Also, quite so often other middle layers interact with PCI directly or
> >> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI
> >> >> > device) and some optimizations need to take that into account (eg. parents
> >> >> > generally need to be accessible when their childres are resumed and so on).
> >> >>
> >> >> A device's parent becomes informed when changing the runtime PM status
> >> >> of the device via pm_runtime_force_suspend|resume(), as those calls
> >> >> pm_runtime_set_suspended|active().
> >> >
> >> > This requires the parent driver or middle layer to look at the reference
> >> > counter and understand it the same way as pm_runtime_force_*.
> >> >
> >> >> In case that isn't that sufficient, what else is needed? Perhaps you can
> >> >> point me to an example so I can understand better?
> >> >
> >> > Say you want to leave the parent suspended after system resume, but the
> >> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
> >> > need to use pm_runtime_force_suspend|resume() too, no?
> >>
> >> Actually no.
> >>
> >> Currently the other options of "deferring resume" (not using
> >> pm_runtime_force_*), is either using the "direct_complete" path or
> >> similar to the approach you took for the i2c designware driver.
> >>
> >> Both cases should play nicely in combination of a child being managed
> >> by pm_runtime_force_*. That's because only when the parent device is
> >> kept runtime suspended during system suspend, resuming can be
> >> deferred.
> >
> > And because the parent remains in runtime suspend late enough in the
> > system suspend path, its children also are guaranteed to be suspended.
> 
> Yes.
> 
> >
> > But then all of them need to be left in runtime suspend during system
> > resume too, which is somewhat restrictive, because some drivers may
> > want their devices to be resumed then.
> 
> Actually, this scenario is also addressed when using the pm_runtime_force_*.
> 
> The driver for the child would only need to bump the runtime PM usage
> count (pm_runtime_get_noresume()) before calling
> pm_runtime_force_suspend() at system suspend. That then also
> propagates to the parent, leading to that both the parent and the
> child will be resumed when pm_runtime_force_resume() is called for
> them.
> 
> Of course, if the driver of the parent isn't using pm_runtime_force_,
> we would have to assume that it's always being resumed at system
> resume.

There may be other ways to avoid that, though.

BTW, I don't quite like using the RPM usage counter this way either, if
that hasn't been clear so far.

> As at matter of fact, doesn't this scenario actually indicates that we
> do need to involve the runtime PM core (updating RPM status according
> to the HW state even during system-wide PM) to really get this right.
> It's not enough to only use "driver PM flags"!?

I'm not sure what you are talking about.

For all devices with enabled runtime PM any state produced by system
suspend/resume has to be labeled either as RPM_SUSPENDED or as RPM_ACTIVE.
That has always been the case and hasn't involved any magic.

However, while runtime PM is disabled, the state of the device doesn't
need to be reflected by its RPM status and there's no need to track it then.
Moreover, in some cases it cannot be tracked even, because of the firmare
involvement (and we cannot track the firmware).

Besides, please really look at what happens in the patches I posted and
then we can talk.

> Seems like we need to create a list of all requirements, pitfalls,
> good things vs bad things etc. :-)

We surely need to know what general cases need to be addressed.

> >
> > [BTW, our current documentation recommends resuming devices during
> > system resume, actually, and gives a list of reasons why. :-)]
> 
> Yes, but that too easy and to me not good enough. :-)

But the list of reasons why is kind of valid still.  There may be better
reasons for not doing that, but it really is a tradeoff and drivers
should be able to decide which way they want to go.

IOW, the "leave the device in runtime suspend throughout system
suspend" optimization doesn't have to be bundled with the "leave the
device in suspend throughout and after system resume" one.

Thanks,
Rafael
Ulf Hansson Oct. 20, 2017, 5:57 a.m. UTC | #18
On 20 October 2017 at 03:19, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, October 19, 2017 2:21:07 PM CEST Ulf Hansson wrote:
>> On 19 October 2017 at 00:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote:
>> >> [...]
>> >>
>> >> >>
>> >> >> The reason why pm_runtime_force_* needs to respects the hierarchy of
>> >> >> the RPM callbacks, is because otherwise it can't safely update the
>> >> >> runtime PM status of the device.
>> >> >
>> >> > I'm not sure I follow this requirement.  Why is that so?
>> >>
>> >> If the PM domain controls some resources for the device in its RPM
>> >> callbacks and the driver controls some other resources in its RPM
>> >> callbacks - then these resources needs to be managed together.
>> >
>> > Right, but that doesn't automatically make it necessary to use runtime PM
>> > callbacks in the middle layer.  Its system-wide PM callbacks may be
>> > suitable for that just fine.
>> >
>> > That is, at least in some cases, you can combine ->runtime_suspend from a
>> > driver and ->suspend_late from a middle layer with no problems, for example.
>> >
>> > That's why some middle layers allow drivers to point ->suspend_late and
>> > ->runtime_suspend to the same routine if they want to reuse that code.
>> >
>> >> This follows the behavior of when a regular call to
>> >> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.
>> >
>> > But (a) it doesn't have to follow it and (b) in some cases it should not
>> > follow it.
>>
>> Of course you don't explicitly *have to* respect the hierarchy of the
>> RPM callbacks in pm_runtime_force_*.
>>
>> However, changing that would require some additional information
>> exchange between the driver and the middle-layer/PM domain, as to
>> instruct the middle-layer/PM domain of what to do during system-wide
>> PM. Especially in cases when the driver deals with wakeup, as in those
>> cases the instructions may change dynamically.
>
> Well, if wakeup matters, drivers can't simply point their PM callbacks
> to pm_runtime_force_* anyway.
>
> If the driver itself deals with wakeups, it clearly needs different callback
> routines for system-wide PM and for runtime PM, so it can't reuse its runtime
> PM callbacks at all then.

It can still re-use its runtime PM callbacks, simply by calling
pm_runtime_force_ from its system sleep callbacks.

Drivers already do that today, not only to deal with wakeups, but
generally when they need to deal with some additional operations.

>
> If the middle layer deals with wakeups, different callbacks are needed at
> that level and so pm_runtime_force_* are unsuitable too.
>
> Really, invoking runtime PM callbacks from the middle layer in
> pm_runtime_force_* is a not a idea at all and there's no general requirement
> for it whatever.
>
>> [...]
>>
>> >> > In general, not if the wakeup settings are adjusted by the middle layer.
>> >>
>> >> Correct!
>> >>
>> >> To use pm_runtime_force* for these cases, one would need some
>> >> additional information exchange between the driver and the
>> >> middle-layer.
>> >
>> > Which pretty much defeats the purpose of the wrappers, doesn't it?
>>
>> Well, no matter if the wrappers are used or not, we need some kind of
>> information exchange between the driver and the middle-layers/PM
>> domains.
>
> Right.
>
> But if that information is exchanged, then why use wrappers *in* *addition*
> to that?

If we can find a different method that both avoids both open coding
and offers the optimize system-wide PM path at resume, I am open to
that.

>
>> Anyway, me personally think it's too early to conclude that using the
>> wrappers may not be useful going forward. At this point, they clearly
>> helps trivial cases to remain being trivial.
>
> I'm not sure about that really.  So far I've seen more complexity resulting
> from using them than being avoided by using them, but I guess the beauty is
> in the eye of the beholder. :-)

Hehe, yeah you may be right. :-)

>
>> >
>> >> >
>> >> >> Regarding hibernation, honestly that's not really my area of
>> >> >> expertise. Although, I assume the middle-layer and driver can treat
>> >> >> that as a separate case, so if it's not suitable to use
>> >> >> pm_runtime_force* for that case, then they shouldn't do it.
>> >> >
>> >> > Well, agreed.
>> >> >
>> >> > In some simple cases, though, driver callbacks can be reused for hibernation
>> >> > too, so it would be good to have a common way to do that too, IMO.
>> >>
>> >> Okay, that makes sense!
>> >>
>> >> >
>> >> >> >
>> >> >> > Also, quite so often other middle layers interact with PCI directly or
>> >> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI
>> >> >> > device) and some optimizations need to take that into account (eg. parents
>> >> >> > generally need to be accessible when their childres are resumed and so on).
>> >> >>
>> >> >> A device's parent becomes informed when changing the runtime PM status
>> >> >> of the device via pm_runtime_force_suspend|resume(), as those calls
>> >> >> pm_runtime_set_suspended|active().
>> >> >
>> >> > This requires the parent driver or middle layer to look at the reference
>> >> > counter and understand it the same way as pm_runtime_force_*.
>> >> >
>> >> >> In case that isn't that sufficient, what else is needed? Perhaps you can
>> >> >> point me to an example so I can understand better?
>> >> >
>> >> > Say you want to leave the parent suspended after system resume, but the
>> >> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
>> >> > need to use pm_runtime_force_suspend|resume() too, no?
>> >>
>> >> Actually no.
>> >>
>> >> Currently the other options of "deferring resume" (not using
>> >> pm_runtime_force_*), is either using the "direct_complete" path or
>> >> similar to the approach you took for the i2c designware driver.
>> >>
>> >> Both cases should play nicely in combination of a child being managed
>> >> by pm_runtime_force_*. That's because only when the parent device is
>> >> kept runtime suspended during system suspend, resuming can be
>> >> deferred.
>> >
>> > And because the parent remains in runtime suspend late enough in the
>> > system suspend path, its children also are guaranteed to be suspended.
>>
>> Yes.
>>
>> >
>> > But then all of them need to be left in runtime suspend during system
>> > resume too, which is somewhat restrictive, because some drivers may
>> > want their devices to be resumed then.
>>
>> Actually, this scenario is also addressed when using the pm_runtime_force_*.
>>
>> The driver for the child would only need to bump the runtime PM usage
>> count (pm_runtime_get_noresume()) before calling
>> pm_runtime_force_suspend() at system suspend. That then also
>> propagates to the parent, leading to that both the parent and the
>> child will be resumed when pm_runtime_force_resume() is called for
>> them.
>>
>> Of course, if the driver of the parent isn't using pm_runtime_force_,
>> we would have to assume that it's always being resumed at system
>> resume.
>
> There may be other ways to avoid that, though.
>
> BTW, I don't quite like using the RPM usage counter this way either, if
> that hasn't been clear so far.
>
>> As at matter of fact, doesn't this scenario actually indicates that we
>> do need to involve the runtime PM core (updating RPM status according
>> to the HW state even during system-wide PM) to really get this right.
>> It's not enough to only use "driver PM flags"!?
>
> I'm not sure what you are talking about.
>
> For all devices with enabled runtime PM any state produced by system
> suspend/resume has to be labeled either as RPM_SUSPENDED or as RPM_ACTIVE.
> That has always been the case and hasn't involved any magic.
>
> However, while runtime PM is disabled, the state of the device doesn't
> need to be reflected by its RPM status and there's no need to track it then.
> Moreover, in some cases it cannot be tracked even, because of the firmare
> involvement (and we cannot track the firmware).
>
> Besides, please really look at what happens in the patches I posted and
> then we can talk.

Yes, I will have look.

>
>> Seems like we need to create a list of all requirements, pitfalls,
>> good things vs bad things etc. :-)
>
> We surely need to know what general cases need to be addressed.
>
>> >
>> > [BTW, our current documentation recommends resuming devices during
>> > system resume, actually, and gives a list of reasons why. :-)]
>>
>> Yes, but that too easy and to me not good enough. :-)
>
> But the list of reasons why is kind of valid still.  There may be better
> reasons for not doing that, but it really is a tradeoff and drivers
> should be able to decide which way they want to go.

Agree.

>
> IOW, the "leave the device in runtime suspend throughout system
> suspend" optimization doesn't have to be bundled with the "leave the
> device in suspend throughout and after system resume" one.

Agree.

Kind regards
Uffe
Ulf Hansson Oct. 20, 2017, 6:05 a.m. UTC | #19
[...]

>>>>> In this regards as we consider genpd being a trivial PM domain, those
>>>>> examples your bring up above is too me also examples of trivial PM
>>>>> domains. Especially because they don't deal with wakeups, as that is
>>>>> taken care of by the drivers, right!?
>>>>
>>>> Not directly, for example, omap device framework has noirq callback implemented
>>>> which forcibly disable all devices which are not PM runtime suspended.
>>>> while doing this it calls drivers PM .runtime_suspend() which may return
>>>> non 0 value and in this case device will be left enabled (powered) at suspend for
>>>> wake up purposes (see _od_suspend_noirq()).
>>>>
>>>
>>> Yeah, I had that feeling that omap has some trickyness going on. :-)
>>>
>>> I sure that can be fixed in the omap PM domain, although
>>
>> ...slipped with my fingers.. here is the rest of the reply...
>>
>> ..of course that require us to use another way for drivers to signal
>> to the omap PM domain that it needs to stay powered as to deal with
>> wakeup.
>>
>> I can have a look at that more closely, to see if it makes sense to change.
>>
>
> Also, additional note here. some IPs are reused between OMAP/Davinci/Keystone,
> OMAP PM domain have some code running at noirq time to dial with devices left
> in PM runtime enabled state (OMAP PM runtime centric), while Davinci/Keystone haven't (clock_ops.c),
> so pm_runtime_force_* API is actually possibility now to make the same driver work
>  on all these platforms.

That sounds great!

Also, in the end it would be nice to also convert the OMAP PM domain
to genpd. I think most of the needed infrastructure is already there
to do that.

Kind regards
Uffe
diff mbox

Patch

Index: linux-pm/drivers/amba/bus.c
===================================================================
--- linux-pm.orig/drivers/amba/bus.c
+++ linux-pm/drivers/amba/bus.c
@@ -132,52 +132,77 @@  static struct attribute *amba_dev_attrs[
 ATTRIBUTE_GROUPS(amba_dev);
 
 #ifdef CONFIG_PM
+static void amba_pm_suspend(struct device *dev)
+{
+	struct amba_device *pcdev = to_amba_device(dev);
+
+	if (!dev->driver)
+		return;
+
+	if (pm_runtime_is_irq_safe(dev))
+		clk_disable(pcdev->pclk);
+	else
+		clk_disable_unprepare(pcdev->pclk);
+}
+
+static int amba_pm_resume(struct device *dev)
+{
+	struct amba_device *pcdev = to_amba_device(dev);
+
+	if (!dev->driver)
+		return 0;
+
+	/* Failure is probably fatal to the system, but... */
+	if (pm_runtime_is_irq_safe(dev))
+		return clk_enable(pcdev->pclk);
+
+	return clk_prepare_enable(pcdev->pclk);
+}
+
 /*
  * Hooks to provide runtime PM of the pclk (bus clock).  It is safe to
  * enable/disable the bus clock at runtime PM suspend/resume as this
  * does not result in loss of context.
  */
+static int amba_pm_suspend_early(struct device *dev)
+{
+	int ret = pm_generic_suspend_early(dev);
+
+	if (ret)
+		return ret;
+
+	amba_pm_suspend(dev);
+	return 0;
+}
+
+static int amba_pm_resume_late(struct device *dev)
+{
+	int ret = amba_pm_resume(dev);
+
+	return ret ? ret : pm_generic_resume_late(dev);
+}
+
 static int amba_pm_runtime_suspend(struct device *dev)
 {
-	struct amba_device *pcdev = to_amba_device(dev);
 	int ret = pm_generic_runtime_suspend(dev);
 
-	if (ret == 0 && dev->driver) {
-		if (pm_runtime_is_irq_safe(dev))
-			clk_disable(pcdev->pclk);
-		else
-			clk_disable_unprepare(pcdev->pclk);
-	}
+	if (ret)
+		return ret;
 
-	return ret;
+	amba_pm_suspend(dev);
+	return 0;
 }
 
 static int amba_pm_runtime_resume(struct device *dev)
 {
-	struct amba_device *pcdev = to_amba_device(dev);
-	int ret;
-
-	if (dev->driver) {
-		if (pm_runtime_is_irq_safe(dev))
-			ret = clk_enable(pcdev->pclk);
-		else
-			ret = clk_prepare_enable(pcdev->pclk);
-		/* Failure is probably fatal to the system, but... */
-		if (ret)
-			return ret;
-	}
+	int ret = amba_pm_resume(dev);
 
-	return pm_generic_runtime_resume(dev);
+	return ret ? ret : pm_generic_runtime_resume(dev);
 }
 #endif /* CONFIG_PM */
 
 static const struct dev_pm_ops amba_pm = {
-	.suspend	= pm_generic_suspend,
-	.resume		= pm_generic_resume,
-	.freeze		= pm_generic_freeze,
-	.thaw		= pm_generic_thaw,
-	.poweroff	= pm_generic_poweroff,
-	.restore	= pm_generic_restore,
+	SET_LATE_SYSTEM_SLEEP_PM_OPS(amba_pm_suspend_late, amba_pm_resume_early)
 	SET_RUNTIME_PM_OPS(
 		amba_pm_runtime_suspend,
 		amba_pm_runtime_resume,
Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1636,14 +1636,15 @@  void pm_runtime_drop_link(struct device
  */
 int pm_runtime_force_suspend(struct device *dev)
 {
-	int (*callback)(struct device *);
+	int (*callback)(struct device *) = NULL;
 	int ret = 0;
 
 	pm_runtime_disable(dev);
 	if (pm_runtime_status_suspended(dev))
 		return 0;
 
-	callback = RPM_GET_CALLBACK(dev, runtime_suspend);
+	if (dev->driver && dev->driver->pm)
+		callback = dev->driver->pm->runtime_suspend;
 
 	if (!callback) {
 		ret = -ENOSYS;
@@ -1690,10 +1691,11 @@  EXPORT_SYMBOL_GPL(pm_runtime_force_suspe
  */
 int pm_runtime_force_resume(struct device *dev)
 {
-	int (*callback)(struct device *);
+	int (*callback)(struct device *) = NULL;
 	int ret = 0;
 
-	callback = RPM_GET_CALLBACK(dev, runtime_resume);
+	if (dev->driver && dev->driver->pm)
+		callback = dev->driver->pm->runtime_resume;
 
 	if (!callback) {
 		ret = -ENOSYS;