diff mbox

[Alternative,RFC] PM / Runtime: Introduce driver runtime PM work routine

Message ID 201208092242.32602.rjw@sisk.pl (mailing list archive)
State RFC, archived
Headers show

Commit Message

Rafael Wysocki Aug. 9, 2012, 8:42 p.m. UTC
Hi all,

On Thursday, August 09, 2012, Rafael J. Wysocki wrote:
> 
> Unfortunately, pm_runtime_get() is not a very useful interface,
> because if the device is not in the "active" state already, it
> only queues up a work item supposed to resume the device.  Then,
> the caller doesn't really know when the device is going to be
> resumed which makes it difficult to synchronize future device
> accesses with the resume completion.
> 
> In that case, if the caller is the device's driver, the most
> straightforward way for it to cope with the situation is to use its
> .runtime_resume() callback for data processing unrelated to the
> resume itself, but the correctness of this is questionable.  Namely,
> the driver's .runtime_resume() callback need not be executed directly
> by the core and it may be run from within a subsystem or PM domain
> callback.  Then, the data processing carried out by the driver's
> callback may disturb the subsystem's or PM domain's functionality
> (for example, the subsystem may still be unready for the device to
> process I/O when the driver's callback is about to return).  Besides,
> the .runtime_resume() callback is not supposed to do anything beyond
> what is needed to resume the device.
> 
> For this reason, it appears to be necessary to introduce a mechanism
> by which device drivers may schedule the execution of certain code
> (say a procedure) to occur when the device in question is known to be
> in the "active" state (preferably, as soon as it has been resumed).
> Thus introduce a new runtime PM helper function,
> __pm_runtime_get_and_call(), and its simplified variant
> pm_runtime_get_and_call(), allowing the caller to increment the
> device's runtime PM usage counter and execute a function provided
> as the second argument, either synchronously, if the device is
> in the "active" state, or from the runtime PM workqueue, after
> resuming the device (in that case the execution of the function is
> queued up along with a request to resume the device).
> 
> The simplified helper, pm_runtime_get_and_call(), will only execute
> the function if the runtime PM status of the device is "active".  The
> more complicated one, __pm_runtime_get_and_call(), has one more
> argument allowing the caller to specify whether or not the function
> should be executed even if runtime PM is disabled for the device or
> there has been a runtime PM error.  In those cases, the function will
> always be executed synchronously.
> 
> This version of the patch doesn't include any documentation updates.
> 
> No sign-off yet.

There are some known concerns about this approach.

First of all, the patch at

https://patchwork.kernel.org/patch/1299781/

increases the size of struct device by the size of a pointer, which may seem to
be a bit excessive to somebody, although I personally don't think it's a big
problem.  We don't use _that_ many struct device objects for it to matter much.

Second, which is more important to me, it seems that for a given device func()
will always be the same pointer and it will be used by the device's driver
only.  In that case, most likely, it will be possible to determine the
address of func() at the time of driver initialization, so the setting and
clearing of power.func and passing the address of func() as an argument every
time __pm_runtime_get_and_call() is run may turn out to be an unnecessary
overhead.  Thus it may be more efficient to use a function pointer in struct
device_driver (it can't be located in struct dev_pm_ops, because some drivers
don't use it at all, like USB drivers, and it wouldn't be useful for subsystems
and PM domains) to store the address of func() permanently.

For the above reasons, the appended patch implements an alternative approach,
which is to modify the way pm_runtime_get() works so that, when the device is
not active, it will queue a resume request for the device _along_ _with_ the
execution of a driver routine provided through a new function pointer
.pm_runtime_work().  There also is pm_runtime_get_nowork() that won't do that
and works in the same way as the "old" pm_runtime_get().

Of course, the disadvantage of the new patch is that it makes the change
in struct device_driver, but perhaps it's not a big deal.

I wonder what you think.

Thanks,
Rafael


---
 drivers/base/power/runtime.c |   71 +++++++++++++++++++++++++++++++++----------
 include/linux/device.h       |    2 +
 include/linux/pm.h           |    2 +
 include/linux/pm_runtime.h   |    6 +++
 4 files changed, 66 insertions(+), 15 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alan Stern Aug. 12, 2012, 4:43 p.m. UTC | #1
On Thu, 9 Aug 2012, Rafael J. Wysocki wrote:

> There are some known concerns about this approach.
> 
> First of all, the patch at
> 
> https://patchwork.kernel.org/patch/1299781/
> 
> increases the size of struct device by the size of a pointer, which may seem to
> be a bit excessive to somebody, although I personally don't think it's a big
> problem.  We don't use _that_ many struct device objects for it to matter much.
> 
> Second, which is more important to me, it seems that for a given device func()
> will always be the same pointer and it will be used by the device's driver
> only.  In that case, most likely, it will be possible to determine the
> address of func() at the time of driver initialization, so the setting and
> clearing of power.func and passing the address of func() as an argument every
> time __pm_runtime_get_and_call() is run may turn out to be an unnecessary
> overhead.  Thus it may be more efficient to use a function pointer in struct
> device_driver (it can't be located in struct dev_pm_ops, because some drivers
> don't use it at all, like USB drivers, and it wouldn't be useful for subsystems
> and PM domains) to store the address of func() permanently.
> 
> For the above reasons, the appended patch implements an alternative approach,
> which is to modify the way pm_runtime_get() works so that, when the device is
> not active, it will queue a resume request for the device _along_ _with_ the
> execution of a driver routine provided through a new function pointer
> .pm_runtime_work().  There also is pm_runtime_get_nowork() that won't do that
> and works in the same way as the "old" pm_runtime_get().
> 
> Of course, the disadvantage of the new patch is that it makes the change
> in struct device_driver, but perhaps it's not a big deal.
> 
> I wonder what you think.

I have some concerns about this patch.

Firstly, the patch doesn't do anything in the case where the device is
already at full power.  Should we invoke the callback routine 
synchronously?  This loses the advantage of a workqueue's "pristine" 
environment, but on the other hand it is much more efficient.  (And 
we're talking about hot pathways, where efficiency matters.)  The 
alternative, of course, is to have the driver invoke the callback 
directly if pm_runtime_get() returns 1.

Secondly, is it necessary for __pm_runtime_barrier() to wait for the 
callback routine?  More generally, does __pm_runtime_barrier() really 
do what it's supposed to do?  What happens if it runs at the same time 
as another thread is executing the pm_runtime_put(parent) call at the 
end of rpm_resume(), or the rpm_resume(parent, 0) in the middle?

Thirdly, I would reorganize the new code added to pm_runtime_work(); 
see below.


> @@ -533,6 +550,13 @@ static int rpm_resume(struct device *dev
>  		goto out;
>  	}
>  
> +	if ((rpmflags & RPM_ASYNC) && (rpmflags & RPM_RUN_WORK)) {
> +		dev->power.run_driver_work = true;
> +		rpm_queue_up_resume(dev);
> +		retval = 0;
> +		goto out;
> +	}
> +

The section of code just before the start of this hunk exits the
routine if the device is already active.  Do you want to put this new
section in the preceding "if" block?

Also, it feels odd to have this code here when there is another section
lower down that also tests for RPM_ASYNC and does almost the same
thing.  It suggests that this new section isn't in the right place.  
For instance, consider what happens in the "no_callbacks" case where
the parent is already active.

> @@ -715,11 +736,29 @@ static void pm_runtime_work(struct work_
>  		rpm_suspend(dev, RPM_NOWAIT | RPM_AUTO);
>  		break;
>  	case RPM_REQ_RESUME:
> -		rpm_resume(dev, RPM_NOWAIT);
> +		if (dev->power.run_driver_work && dev->driver->pm_runtime_work)
> +			driver_work = dev->driver->pm_runtime_work;
> +
> +		dev->power.run_driver_work = false;
> +		rpm_resume(dev, driver_work ? 0 : RPM_NOWAIT);
>  		break;
>  	}
>  
>   out:
> +	if (driver_work) {
> +		pm_runtime_get_noresume(dev);
> +		dev->power.work_in_progress = true;
> +		spin_unlock_irq(&dev->power.lock);
> +
> +		driver_work(dev);
> +
> +		spin_lock_irq(&dev->power.lock);
> +		dev->power.work_in_progress = false;
> +		wake_up_all(&dev->power.wait_queue);
> +		pm_runtime_put_noidle(dev);
> +		rpm_idle(dev, RPM_NOWAIT);
> +	}
> +

It seems very illgical to do all the callback stuff here, after the
"switch" statement.  IMO it would make more sense to put it all
together, more or less as follows:

	case RPM_REQ_RESUME:
		if (dev->power.run_driver_work && dev->driver->pm_runtime_work) {
			driver_work = dev->driver->pm_runtime_work;
			dev->power.run_driver_work = false;
			dev->power.work_in_progress = true;
			pm_runtime_get_noresume(dev);
			rpm_resume(dev, 0);
			spin_unlock_irq(&dev->power.lock);

			driver_work(dev);

			spin_lock_irq(&dev->power.lock);
			dev->power.work_in_progress = false;
			wake_up_all(&dev->power.wait_queue);
			pm_runtime_put_noidle(dev);
			rpm_idle(dev, RPM_NOWAIT);
		} else {
			rpm_resume(dev, RPM_NOWAIT);
		}
		break;

Notice also that it's important to do the _get_noresume _before_
calling rpm_resume().  Otherwise the device might get suspended again
before the callback gets a chance to run.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 12, 2012, 10:21 p.m. UTC | #2
On Sunday, August 12, 2012, Alan Stern wrote:
> On Thu, 9 Aug 2012, Rafael J. Wysocki wrote:
> 
> > There are some known concerns about this approach.
> > 
> > First of all, the patch at
> > 
> > https://patchwork.kernel.org/patch/1299781/
> > 
> > increases the size of struct device by the size of a pointer, which may seem to
> > be a bit excessive to somebody, although I personally don't think it's a big
> > problem.  We don't use _that_ many struct device objects for it to matter much.
> > 
> > Second, which is more important to me, it seems that for a given device func()
> > will always be the same pointer and it will be used by the device's driver
> > only.  In that case, most likely, it will be possible to determine the
> > address of func() at the time of driver initialization, so the setting and
> > clearing of power.func and passing the address of func() as an argument every
> > time __pm_runtime_get_and_call() is run may turn out to be an unnecessary
> > overhead.  Thus it may be more efficient to use a function pointer in struct
> > device_driver (it can't be located in struct dev_pm_ops, because some drivers
> > don't use it at all, like USB drivers, and it wouldn't be useful for subsystems
> > and PM domains) to store the address of func() permanently.
> > 
> > For the above reasons, the appended patch implements an alternative approach,
> > which is to modify the way pm_runtime_get() works so that, when the device is
> > not active, it will queue a resume request for the device _along_ _with_ the
> > execution of a driver routine provided through a new function pointer
> > .pm_runtime_work().  There also is pm_runtime_get_nowork() that won't do that
> > and works in the same way as the "old" pm_runtime_get().
> > 
> > Of course, the disadvantage of the new patch is that it makes the change
> > in struct device_driver, but perhaps it's not a big deal.
> > 
> > I wonder what you think.
> 
> I have some concerns about this patch.
> 
> Firstly, the patch doesn't do anything in the case where the device is
> already at full power.

This is intentional, because I'm not sure that the code to be run
if pm_runtime_get() returns 1 should always be pm_runtime_work().

For example, the driver may want to acquire a lock before running
pm_runtime_get() and execute that code under the lock.

> Should we invoke the callback routine 
> synchronously?  This loses the advantage of a workqueue's "pristine" 
> environment, but on the other hand it is much more efficient.

I'm not sure if it is always going to be more efficient.

> (And we're talking about hot pathways, where efficiency matters.)  The 
> alternative, of course, is to have the driver invoke the callback 
> directly if pm_runtime_get() returns 1.

Sure.  If every user of .pm_runtime_work() ends up calling it when
pm_runtime_get() returns 1, then there will be a point to modify the
core to do that instead.  However, I'm not sure if that's going to be the
case at the moment.

> Secondly, is it necessary for __pm_runtime_barrier() to wait for the 
> callback routine?

I believe so.  At least that's what is documented about __pm_runtime_barrier().

> More generally, does __pm_runtime_barrier() really 
> do what it's supposed to do?  What happens if it runs at the same time 
> as another thread is executing the pm_runtime_put(parent) call at the 
> end of rpm_resume(), or the rpm_resume(parent, 0) in the middle?

So these are two different situations.  When pm_runtime_put(parent) is
executed, the device has been resumed and no runtime PM code _for_ _the_
_device_ is running (although the trace_rpm_return_int() is at a wrong
place in my opinion).  The second one is more interesting, but it really
is equivalent to calling pm_runtime_resume() (in a different thread)
after __pm_runtime_barrier() has run.

> Thirdly, I would reorganize the new code added to pm_runtime_work(); 
> see below.
> 
> 
> > @@ -533,6 +550,13 @@ static int rpm_resume(struct device *dev
> >  		goto out;
> >  	}
> >  
> > +	if ((rpmflags & RPM_ASYNC) && (rpmflags & RPM_RUN_WORK)) {
> > +		dev->power.run_driver_work = true;
> > +		rpm_queue_up_resume(dev);
> > +		retval = 0;
> > +		goto out;
> > +	}
> > +
> 
> The section of code just before the start of this hunk exits the
> routine if the device is already active.  Do you want to put this new
> section in the preceding "if" block?

Yes, I do.  This is to ensure that the execution of pm_runtime_work() will
be scheduled if RPM_RUN_WORK is set.

> Also, it feels odd to have this code here when there is another section
> lower down that also tests for RPM_ASYNC and does almost the same
> thing.  It suggests that this new section isn't in the right place.

Yes, it does.  However, the code between the two questions contains some checks
that I want to skip if RPM_RUN_WORK is set (otherwis, the execution of
pm_runtime_work() may not be scheduled at all).

> For instance, consider what happens in the "no_callbacks" case where
> the parent is already active.

The no_callbacks case is actually interesting, because I think that the
function should return 1 in that case.  Otherwise, the caller of
pm_runtime_get() may think that it has to wait for the device to resume,
which isn't the case.  So, this seems to need fixing now.

Moreover, if power.no_callbacks is set, the RPM_SUSPENDING and RPM_RESUMING
status values are impossible, as far as I can say, so the entire "no callbacks"
section should be moved right after the check against RPM_ACTIVE.  The same
appears to apply to the analogous "no callbacks" check in rpm_suspend() (i.e.
it should be done earlier).

After those changes I'd put "my" check against RPM_RUN_WORK after the
"no callbacks" check, but before the "RPM_SUSPENDING or RPM_RESUMING" one.

> > @@ -715,11 +736,29 @@ static void pm_runtime_work(struct work_
> >  		rpm_suspend(dev, RPM_NOWAIT | RPM_AUTO);
> >  		break;
> >  	case RPM_REQ_RESUME:
> > -		rpm_resume(dev, RPM_NOWAIT);
> > +		if (dev->power.run_driver_work && dev->driver->pm_runtime_work)
> > +			driver_work = dev->driver->pm_runtime_work;
> > +
> > +		dev->power.run_driver_work = false;
> > +		rpm_resume(dev, driver_work ? 0 : RPM_NOWAIT);
> >  		break;
> >  	}
> >  
> >   out:
> > +	if (driver_work) {
> > +		pm_runtime_get_noresume(dev);
> > +		dev->power.work_in_progress = true;
> > +		spin_unlock_irq(&dev->power.lock);
> > +
> > +		driver_work(dev);
> > +
> > +		spin_lock_irq(&dev->power.lock);
> > +		dev->power.work_in_progress = false;
> > +		wake_up_all(&dev->power.wait_queue);
> > +		pm_runtime_put_noidle(dev);
> > +		rpm_idle(dev, RPM_NOWAIT);
> > +	}
> > +
> 
> It seems very illgical to do all the callback stuff here, after the
> "switch" statement.  IMO it would make more sense to put it all
> together, more or less as follows:
> 
> 	case RPM_REQ_RESUME:
> 		if (dev->power.run_driver_work && dev->driver->pm_runtime_work) {
> 			driver_work = dev->driver->pm_runtime_work;
> 			dev->power.run_driver_work = false;
> 			dev->power.work_in_progress = true;
> 			pm_runtime_get_noresume(dev);
> 			rpm_resume(dev, 0);
> 			spin_unlock_irq(&dev->power.lock);
> 
> 			driver_work(dev);
> 
> 			spin_lock_irq(&dev->power.lock);
> 			dev->power.work_in_progress = false;
> 			wake_up_all(&dev->power.wait_queue);
> 			pm_runtime_put_noidle(dev);
> 			rpm_idle(dev, RPM_NOWAIT);
> 		} else {
> 			rpm_resume(dev, RPM_NOWAIT);
> 		}
> 		break;

OK

> Notice also that it's important to do the _get_noresume _before_
> calling rpm_resume().  Otherwise the device might get suspended again
> before the callback gets a chance to run.

You're right.  I forgot about dropping the lock in order to call
pm_runtime_put(parent).

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 13, 2012, 4:23 p.m. UTC | #3
On Mon, 13 Aug 2012, Rafael J. Wysocki wrote:

> > Firstly, the patch doesn't do anything in the case where the device is
> > already at full power.
> 
> This is intentional, because I'm not sure that the code to be run
> if pm_runtime_get() returns 1 should always be pm_runtime_work().
> 
> For example, the driver may want to acquire a lock before running
> pm_runtime_get() and execute that code under the lock.

Your reason above makes sense, but the example isn't quite right.  If
the driver wants to execute the callback under a lock then the callback
has to take that lock -- this is necessary because of the async
workqueue case.

In other words, you're saying there may well be situations where the
synchronous and async cases need to run slightly different code.  True 
enough.

> > Secondly, is it necessary for __pm_runtime_barrier() to wait for the 
> > callback routine?
> 
> I believe so.  At least that's what is documented about __pm_runtime_barrier().
> 
> > More generally, does __pm_runtime_barrier() really 
> > do what it's supposed to do?  What happens if it runs at the same time 
> > as another thread is executing the pm_runtime_put(parent) call at the 
> > end of rpm_resume(), or the rpm_resume(parent, 0) in the middle?
> 
> So these are two different situations.  When pm_runtime_put(parent) is
> executed, the device has been resumed and no runtime PM code _for_ _the_
> _device_ is running (although the trace_rpm_return_int() is at a wrong
> place in my opinion).  The second one is more interesting, but it really
> is equivalent to calling pm_runtime_resume() (in a different thread)
> after __pm_runtime_barrier() has run.

__pm_runtime_barrier() has never made very strong guarantees about 
runtime_resume callbacks.  But the kerneldoc does claim that any 
pending callbacks will have been completed, and that claim evidently is 
violated in the rpm_resume(parent,0) case.  Maybe the kerneldoc needs 
to be changed, or maybe we need to fix the code.

> > For instance, consider what happens in the "no_callbacks" case where
> > the parent is already active.
> 
> The no_callbacks case is actually interesting, because I think that the
> function should return 1 in that case.  Otherwise, the caller of
> pm_runtime_get() may think that it has to wait for the device to resume,
> which isn't the case.  So, this seems to need fixing now.

Right.

> Moreover, if power.no_callbacks is set, the RPM_SUSPENDING and RPM_RESUMING
> status values are impossible, as far as I can say, so the entire "no callbacks"
> section should be moved right after the check against RPM_ACTIVE.  The same
> appears to apply to the analogous "no callbacks" check in rpm_suspend() (i.e.
> it should be done earlier).

I'm not so sure about that.  Basically we've got two conditions:

	if (A) {
		...
	}
	if (B) {
		...
	}

where A and B can never both be true.  In this situation it doesn't 
make much difference what order you do the tests.  We should use 
whichever order is better for adding the RPM_RUN_WORK code.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 13, 2012, 6:47 p.m. UTC | #4
On Monday, August 13, 2012, Alan Stern wrote:
> On Mon, 13 Aug 2012, Rafael J. Wysocki wrote:
> 
> > > Firstly, the patch doesn't do anything in the case where the device is
> > > already at full power.
> > 
> > This is intentional, because I'm not sure that the code to be run
> > if pm_runtime_get() returns 1 should always be pm_runtime_work().
> > 
> > For example, the driver may want to acquire a lock before running
> > pm_runtime_get() and execute that code under the lock.
> 
> Your reason above makes sense, but the example isn't quite right.  If
> the driver wants to execute the callback under a lock then the callback
> has to take that lock -- this is necessary because of the async
> workqueue case.

I thought about this scenario:

* acquire lock
* do something
* ret = pm_runtime_get(dev);
* if (ret > 0)
      do something more
* pm_runtime_put(dev);
* release lock

in which the "do something more" thing cannot be the same as the contents
of the .pm_runtime_work() callback (I notice that I have a collision of
names between the callback and the workqueue function in runtime.c), unless
that callback doesn't acquire the same lock.

> In other words, you're saying there may well be situations where the
> synchronous and async cases need to run slightly different code.

Yes.

> True enough.

:-)

> > > Secondly, is it necessary for __pm_runtime_barrier() to wait for the 
> > > callback routine?
> > 
> > I believe so.  At least that's what is documented about __pm_runtime_barrier().
> > 
> > > More generally, does __pm_runtime_barrier() really 
> > > do what it's supposed to do?  What happens if it runs at the same time 
> > > as another thread is executing the pm_runtime_put(parent) call at the 
> > > end of rpm_resume(), or the rpm_resume(parent, 0) in the middle?
> > 
> > So these are two different situations.  When pm_runtime_put(parent) is
> > executed, the device has been resumed and no runtime PM code _for_ _the_
> > _device_ is running (although the trace_rpm_return_int() is at a wrong
> > place in my opinion).  The second one is more interesting, but it really
> > is equivalent to calling pm_runtime_resume() (in a different thread)
> > after __pm_runtime_barrier() has run.
> 
> __pm_runtime_barrier() has never made very strong guarantees about 
> runtime_resume callbacks.  But the kerneldoc does claim that any 
> pending callbacks will have been completed, and that claim evidently is 
> violated in the rpm_resume(parent,0) case.

"Flush all pending requests for the device from pm_wq and wait for all
runtime PM operations involving the device in progress to complete."

It doesn't mention the parent.

But I agree, it's not very clear.

> Maybe the kerneldoc needs to be changed, or maybe we need to fix the code.

If anything, I'd change the kerneldoc.  The code pretty much has to be
what it is in this respect.

> > > For instance, consider what happens in the "no_callbacks" case where
> > > the parent is already active.
> > 
> > The no_callbacks case is actually interesting, because I think that the
> > function should return 1 in that case.  Otherwise, the caller of
> > pm_runtime_get() may think that it has to wait for the device to resume,
> > which isn't the case.  So, this seems to need fixing now.
> 
> Right.
> 
> > Moreover, if power.no_callbacks is set, the RPM_SUSPENDING and RPM_RESUMING
> > status values are impossible, as far as I can say, so the entire "no callbacks"
> > section should be moved right after the check against RPM_ACTIVE.  The same
> > appears to apply to the analogous "no callbacks" check in rpm_suspend() (i.e.
> > it should be done earlier).
> 
> I'm not so sure about that.  Basically we've got two conditions:
> 
> 	if (A) {
> 		...
> 	}
> 	if (B) {
> 		...
> 	}
> 
> where A and B can never both be true.  In this situation it doesn't 
> make much difference what order you do the tests.  We should use 
> whichever order is better for adding the RPM_RUN_WORK code.

OK, fair enough.

I think that it's better to reorder the checks so that the final ordering is:

* check power.no_callbacks and parent status
* check RPM_RUN_WORK
* check RPM_RESUMING || RPM_SUSPENDING
* check RPM_ASYNC

so that we don't schedule the execution of pm_runtime_work() if
power.no_callbacks is set and the parent is active and we still do the
power.deferred_resume optimization if RPM_RUN_WORK is unset.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 13, 2012, 7:20 p.m. UTC | #5
On Mon, 13 Aug 2012, Rafael J. Wysocki wrote:

> > __pm_runtime_barrier() has never made very strong guarantees about 
> > runtime_resume callbacks.  But the kerneldoc does claim that any 
> > pending callbacks will have been completed, and that claim evidently is 
> > violated in the rpm_resume(parent,0) case.
> 
> "Flush all pending requests for the device from pm_wq and wait for all
> runtime PM operations involving the device in progress to complete."
> 
> It doesn't mention the parent.

You're missing the point.  Suppose you do an async resume and while the
workqueue routine is executing pm_resume(parent,0), another thread
calls pm_runtime_barrier() for the same device.  The barrier will
return more or less immediately, even though there is a runtime PM
operation involving the device (that is, the async resume) still in
progress.  The rpm_resume() routine was running before
pm_runtime_barrier() was called and will still be running afterward.

> But I agree, it's not very clear.
> 
> > Maybe the kerneldoc needs to be changed, or maybe we need to fix the code.
> 
> If anything, I'd change the kerneldoc.  The code pretty much has to be
> what it is in this respect.

I'm not sure what guarantees pm_runtime_barrier() _can_ make about
runtime_resume callbacks.  If you call that routine while the device is
suspended then a runtime_resume callback could occur at any moment,
because userspace can write "on" to the power/control attribute
whenever it wants to.

I guess the best we can say is that if you call pm_runtime_barrier()  
after updating the dev_pm_ops method pointers then after the barrier
returns, the old method pointers will not be invoked and the old method
routines will not be running.  So we need an equivalent guarantee with
regard to the pm_runtime_work pointer.  (Yes, we could use a better 
name for that pointer.)

Which means the code in the patch isn't quite right, because it saves 
the pm_runtime_work pointer before calling rpm_resume().  Maybe we 
should avoid looking at the pointer until rpm_resume() returns.

> I think that it's better to reorder the checks so that the final ordering is:
> 
> * check power.no_callbacks and parent status
> * check RPM_RUN_WORK
> * check RPM_RESUMING || RPM_SUSPENDING
> * check RPM_ASYNC
> 
> so that we don't schedule the execution of pm_runtime_work() if
> power.no_callbacks is set and the parent is active and we still do the
> power.deferred_resume optimization if RPM_RUN_WORK is unset.

That seems reasonable.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 13, 2012, 7:56 p.m. UTC | #6
On Monday, August 13, 2012, Alan Stern wrote:
> On Mon, 13 Aug 2012, Rafael J. Wysocki wrote:
> 
> > > __pm_runtime_barrier() has never made very strong guarantees about 
> > > runtime_resume callbacks.  But the kerneldoc does claim that any 
> > > pending callbacks will have been completed, and that claim evidently is 
> > > violated in the rpm_resume(parent,0) case.
> > 
> > "Flush all pending requests for the device from pm_wq and wait for all
> > runtime PM operations involving the device in progress to complete."
> > 
> > It doesn't mention the parent.
> 
> You're missing the point.  Suppose you do an async resume and while the
> workqueue routine is executing pm_resume(parent,0), another thread
> calls pm_runtime_barrier() for the same device.  The barrier will
> return more or less immediately, even though there is a runtime PM
> operation involving the device (that is, the async resume) still in
> progress.  The rpm_resume() routine was running before
> pm_runtime_barrier() was called and will still be running afterward.

I see what you mean now.

> > But I agree, it's not very clear.
> > 
> > > Maybe the kerneldoc needs to be changed, or maybe we need to fix the code.
> > 
> > If anything, I'd change the kerneldoc.  The code pretty much has to be
> > what it is in this respect.
> 
> I'm not sure what guarantees pm_runtime_barrier() _can_ make about
> runtime_resume callbacks.  If you call that routine while the device is
> suspended then a runtime_resume callback could occur at any moment,
> because userspace can write "on" to the power/control attribute
> whenever it wants to.
> 
> I guess the best we can say is that if you call pm_runtime_barrier()  
> after updating the dev_pm_ops method pointers then after the barrier
> returns, the old method pointers will not be invoked and the old method
> routines will not be running.  So we need an equivalent guarantee with
> regard to the pm_runtime_work pointer.  (Yes, we could use a better 
> name for that pointer.)
> 
> Which means the code in the patch isn't quite right, because it saves 
> the pm_runtime_work pointer before calling rpm_resume().  Maybe we 
> should avoid looking at the pointer until rpm_resume() returns.

Yes, we can do that.

Alternatively, we can set power.work_in_progress before calling
rpm_resume(dev, 0) (i.e. regard the resume as a part of the work) to make
the barrier wait for all of it to complete.

> > I think that it's better to reorder the checks so that the final ordering is:
> > 
> > * check power.no_callbacks and parent status
> > * check RPM_RUN_WORK
> > * check RPM_RESUMING || RPM_SUSPENDING
> > * check RPM_ASYNC
> > 
> > so that we don't schedule the execution of pm_runtime_work() if
> > power.no_callbacks is set and the parent is active and we still do the
> > power.deferred_resume optimization if RPM_RUN_WORK is unset.
> 
> That seems reasonable.

OK

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Aug. 13, 2012, 9:39 p.m. UTC | #7
On Mon, 13 Aug 2012, Rafael J. Wysocki wrote:

> > I guess the best we can say is that if you call pm_runtime_barrier()  
> > after updating the dev_pm_ops method pointers then after the barrier
> > returns, the old method pointers will not be invoked and the old method
> > routines will not be running.  So we need an equivalent guarantee with
> > regard to the pm_runtime_work pointer.  (Yes, we could use a better 
> > name for that pointer.)
> > 
> > Which means the code in the patch isn't quite right, because it saves 
> > the pm_runtime_work pointer before calling rpm_resume().  Maybe we 
> > should avoid looking at the pointer until rpm_resume() returns.
> 
> Yes, we can do that.
> 
> Alternatively, we can set power.work_in_progress before calling
> rpm_resume(dev, 0) (i.e. regard the resume as a part of the work) to make
> the barrier wait for all of it to complete.

Yep, that would work.  In fact, I did it that way in the proposed code 
posted earlier in this thread.  (But that was just on general 
principles, not because I had this particular race in mind.)

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael Wysocki Aug. 13, 2012, 10:06 p.m. UTC | #8
On Monday, August 13, 2012, Alan Stern wrote:
> On Mon, 13 Aug 2012, Rafael J. Wysocki wrote:
> 
> > > I guess the best we can say is that if you call pm_runtime_barrier()  
> > > after updating the dev_pm_ops method pointers then after the barrier
> > > returns, the old method pointers will not be invoked and the old method
> > > routines will not be running.  So we need an equivalent guarantee with
> > > regard to the pm_runtime_work pointer.  (Yes, we could use a better 
> > > name for that pointer.)
> > > 
> > > Which means the code in the patch isn't quite right, because it saves 
> > > the pm_runtime_work pointer before calling rpm_resume().  Maybe we 
> > > should avoid looking at the pointer until rpm_resume() returns.
> > 
> > Yes, we can do that.
> > 
> > Alternatively, we can set power.work_in_progress before calling
> > rpm_resume(dev, 0) (i.e. regard the resume as a part of the work) to make
> > the barrier wait for all of it to complete.
> 
> Yep, that would work.  In fact, I did it that way in the proposed code 
> posted earlier in this thread.  (But that was just on general 
> principles, not because I had this particular race in mind.)

OK

I need to prepare a new patch now, but first I'll send a couple of (minor)
fixes for the core runtime PM code.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux/include/linux/device.h
===================================================================
--- linux.orig/include/linux/device.h
+++ linux/include/linux/device.h
@@ -203,6 +203,7 @@  extern struct klist *bus_get_device_klis
  *		automatically.
  * @pm:		Power management operations of the device which matched
  *		this driver.
+ * @pm_runtime_work: Called after asynchronous runtime resume of the device.
  * @p:		Driver core's private data, no one other than the driver
  *		core can touch this.
  *
@@ -232,6 +233,7 @@  struct device_driver {
 	const struct attribute_group **groups;
 
 	const struct dev_pm_ops *pm;
+	void (*pm_runtime_work)(struct device *dev);
 
 	struct driver_private *p;
 };
Index: linux/include/linux/pm.h
===================================================================
--- linux.orig/include/linux/pm.h
+++ linux/include/linux/pm.h
@@ -538,6 +538,8 @@  struct dev_pm_info {
 	unsigned int		irq_safe:1;
 	unsigned int		use_autosuspend:1;
 	unsigned int		timer_autosuspends:1;
+	bool			run_driver_work:1;
+	bool			work_in_progress:1;
 	enum rpm_request	request;
 	enum rpm_status		runtime_status;
 	int			runtime_error;
Index: linux/include/linux/pm_runtime.h
===================================================================
--- linux.orig/include/linux/pm_runtime.h
+++ linux/include/linux/pm_runtime.h
@@ -22,6 +22,7 @@ 
 #define RPM_GET_PUT		0x04	/* Increment/decrement the
 					    usage_count */
 #define RPM_AUTO		0x08	/* Use autosuspend_delay */
+#define RPM_RUN_WORK		0x10	/* Run asynchronous work routine */
 
 #ifdef CONFIG_PM_RUNTIME
 
@@ -189,6 +190,11 @@  static inline int pm_request_autosuspend
 
 static inline int pm_runtime_get(struct device *dev)
 {
+	return __pm_runtime_resume(dev, RPM_GET_PUT | RPM_ASYNC | RPM_RUN_WORK);
+}
+
+static inline int pm_runtime_get_nowork(struct device *dev)
+{
 	return __pm_runtime_resume(dev, RPM_GET_PUT | RPM_ASYNC);
 }
 
Index: linux/drivers/base/power/runtime.c
===================================================================
--- linux.orig/drivers/base/power/runtime.c
+++ linux/drivers/base/power/runtime.c
@@ -484,6 +484,15 @@  static int rpm_suspend(struct device *de
 	goto out;
 }
 
+void rpm_queue_up_resume(struct device *dev)
+{
+	dev->power.request = RPM_REQ_RESUME;
+	if (!dev->power.request_pending) {
+		dev->power.request_pending = true;
+		queue_work(pm_wq, &dev->power.work);
+	}
+}
+
 /**
  * rpm_resume - Carry out runtime resume of given device.
  * @dev: Device to resume.
@@ -495,8 +504,10 @@  static int rpm_suspend(struct device *de
  * RPM_NOWAIT and RPM_ASYNC flags.  Similarly, if there's a suspend running in
  * parallel with this function, either tell the other process to resume after
  * suspending (deferred_resume) or wait for it to finish.  If the RPM_ASYNC
- * flag is set then queue a resume request; otherwise run the
- * ->runtime_resume() callback directly.  Queue an idle notification for the
+ * flag is set, then queue a resume request and if the RPM_RUN_WORK flag is set
+ * too, schedule the executction of the device driver's .pm_runtime_work()
+ * callback after the resume request has been completed.  Otherwise run the
+ * .runtime_resume() callback directly and queue an idle notification for the
  * device if the resume succeeded.
  *
  * This function must be called under dev->power.lock with interrupts disabled.
@@ -519,12 +530,18 @@  static int rpm_resume(struct device *dev
 		goto out;
 
 	/*
-	 * Other scheduled or pending requests need to be canceled.  Small
-	 * optimization: If an autosuspend timer is running, leave it running
-	 * rather than cancelling it now only to restart it again in the near
-	 * future.
+	 * Other scheduled or pending requests need to be canceled.  If the
+	 * execution of driver work is queued up along with a resume request,
+	 * do not cancel it.
+	 */
+	if (dev->power.request != RPM_REQ_RESUME || !dev->power.run_driver_work)
+		dev->power.request = RPM_REQ_NONE;
+
+	/*
+	 * Small optimization: If an autosuspend timer is running, leave it
+	 * running rather than cancelling it now only to restart it again in the
+	 * near future.
 	 */
-	dev->power.request = RPM_REQ_NONE;
 	if (!dev->power.timer_autosuspends)
 		pm_runtime_deactivate_timer(dev);
 
@@ -533,6 +550,13 @@  static int rpm_resume(struct device *dev
 		goto out;
 	}
 
+	if ((rpmflags & RPM_ASYNC) && (rpmflags & RPM_RUN_WORK)) {
+		dev->power.run_driver_work = true;
+		rpm_queue_up_resume(dev);
+		retval = 0;
+		goto out;
+	}
+
 	if (dev->power.runtime_status == RPM_RESUMING
 	    || dev->power.runtime_status == RPM_SUSPENDING) {
 		DEFINE_WAIT(wait);
@@ -591,11 +615,7 @@  static int rpm_resume(struct device *dev
 
 	/* Carry out an asynchronous or a synchronous resume. */
 	if (rpmflags & RPM_ASYNC) {
-		dev->power.request = RPM_REQ_RESUME;
-		if (!dev->power.request_pending) {
-			dev->power.request_pending = true;
-			queue_work(pm_wq, &dev->power.work);
-		}
+		rpm_queue_up_resume(dev);
 		retval = 0;
 		goto out;
 	}
@@ -691,6 +711,7 @@  static int rpm_resume(struct device *dev
 static void pm_runtime_work(struct work_struct *work)
 {
 	struct device *dev = container_of(work, struct device, power.work);
+	void (*driver_work)(struct device *) = NULL;
 	enum rpm_request req;
 
 	spin_lock_irq(&dev->power.lock);
@@ -715,11 +736,29 @@  static void pm_runtime_work(struct work_
 		rpm_suspend(dev, RPM_NOWAIT | RPM_AUTO);
 		break;
 	case RPM_REQ_RESUME:
-		rpm_resume(dev, RPM_NOWAIT);
+		if (dev->power.run_driver_work && dev->driver->pm_runtime_work)
+			driver_work = dev->driver->pm_runtime_work;
+
+		dev->power.run_driver_work = false;
+		rpm_resume(dev, driver_work ? 0 : RPM_NOWAIT);
 		break;
 	}
 
  out:
+	if (driver_work) {
+		pm_runtime_get_noresume(dev);
+		dev->power.work_in_progress = true;
+		spin_unlock_irq(&dev->power.lock);
+
+		driver_work(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.work_in_progress = false;
+		wake_up_all(&dev->power.wait_queue);
+		pm_runtime_put_noidle(dev);
+		rpm_idle(dev, RPM_NOWAIT);
+	}
+
 	spin_unlock_irq(&dev->power.lock);
 }
 
@@ -982,7 +1021,8 @@  static void __pm_runtime_barrier(struct
 
 	if (dev->power.runtime_status == RPM_SUSPENDING
 	    || dev->power.runtime_status == RPM_RESUMING
-	    || dev->power.idle_notification) {
+	    || dev->power.idle_notification
+	    || dev->power.work_in_progress) {
 		DEFINE_WAIT(wait);
 
 		/* Suspend, wake-up or idle notification in progress. */
@@ -991,7 +1031,8 @@  static void __pm_runtime_barrier(struct
 					TASK_UNINTERRUPTIBLE);
 			if (dev->power.runtime_status != RPM_SUSPENDING
 			    && dev->power.runtime_status != RPM_RESUMING
-			    && !dev->power.idle_notification)
+			    && !dev->power.idle_notification
+			    && !dev->power.work_in_progress)
 				break;
 			spin_unlock_irq(&dev->power.lock);