Message ID: 201208092242.32602.rjw@sisk.pl (mailing list archive)
State: RFC, archived
On Thu, 9 Aug 2012, Rafael J. Wysocki wrote: > There are some known concerns about this approach. > > First of all, the patch at > > https://patchwork.kernel.org/patch/1299781/ > > increases the size of struct device by the size of a pointer, which may seem to > be a bit excessive to somebody, although I personally don't think it's a big > problem. We don't use _that_ many struct device objects for it to matter much. > > Second, which is more important to me, it seems that for a given device func() > will always be the same pointer and it will be used by the device's driver > only. In that case, most likely, it will be possible to determine the > address of func() at the time of driver initialization, so the setting and > clearing of power.func and passing the address of func() as an argument every > time __pm_runtime_get_and_call() is run may turn out to be an unnecessary > overhead. Thus it may be more efficient to use a function pointer in struct > device_driver (it can't be located in struct dev_pm_ops, because some drivers > don't use it at all, like USB drivers, and it wouldn't be useful for subsystems > and PM domains) to store the address of func() permanently. > > For the above reasons, the appended patch implements an alternative approach, > which is to modify the way pm_runtime_get() works so that, when the device is > not active, it will queue a resume request for the device _along_ _with_ the > execution of a driver routine provided through a new function pointer > .pm_runtime_work(). There also is pm_runtime_get_nowork() that won't do that > and works in the same way as the "old" pm_runtime_get(). > > Of course, the disadvantage of the new patch is that it makes the change > in struct device_driver, but perhaps it's not a big deal. > > I wonder what you think. I have some concerns about this patch. Firstly, the patch doesn't do anything in the case where the device is already at full power. Should we invoke the callback routine synchronously? 
This loses the advantage of a workqueue's "pristine" environment, but on the other hand it is much more efficient. (And we're talking about hot pathways, where efficiency matters.) The alternative, of course, is to have the driver invoke the callback directly if pm_runtime_get() returns 1. Secondly, is it necessary for __pm_runtime_barrier() to wait for the callback routine? More generally, does __pm_runtime_barrier() really do what it's supposed to do? What happens if it runs at the same time as another thread is executing the pm_runtime_put(parent) call at the end of rpm_resume(), or the rpm_resume(parent, 0) in the middle? Thirdly, I would reorganize the new code added to pm_runtime_work(); see below. > @@ -533,6 +550,13 @@ static int rpm_resume(struct device *dev > goto out; > } > > + if ((rpmflags & RPM_ASYNC) && (rpmflags & RPM_RUN_WORK)) { > + dev->power.run_driver_work = true; > + rpm_queue_up_resume(dev); > + retval = 0; > + goto out; > + } > + The section of code just before the start of this hunk exits the routine if the device is already active. Do you want to put this new section in the preceding "if" block? Also, it feels odd to have this code here when there is another section lower down that also tests for RPM_ASYNC and does almost the same thing. It suggests that this new section isn't in the right place. For instance, consider what happens in the "no_callbacks" case where the parent is already active. > @@ -715,11 +736,29 @@ static void pm_runtime_work(struct work_ > rpm_suspend(dev, RPM_NOWAIT | RPM_AUTO); > break; > case RPM_REQ_RESUME: > - rpm_resume(dev, RPM_NOWAIT); > + if (dev->power.run_driver_work && dev->driver->pm_runtime_work) > + driver_work = dev->driver->pm_runtime_work; > + > + dev->power.run_driver_work = false; > + rpm_resume(dev, driver_work ? 
0 : RPM_NOWAIT); > break; > } > > out: > + if (driver_work) { > + pm_runtime_get_noresume(dev); > + dev->power.work_in_progress = true; > + spin_unlock_irq(&dev->power.lock); > + > + driver_work(dev); > + > + spin_lock_irq(&dev->power.lock); > + dev->power.work_in_progress = false; > + wake_up_all(&dev->power.wait_queue); > + pm_runtime_put_noidle(dev); > + rpm_idle(dev, RPM_NOWAIT); > + } > + It seems very illogical to do all the callback stuff here, after the "switch" statement. IMO it would make more sense to put it all together, more or less as follows:

	case RPM_REQ_RESUME:
		if (dev->power.run_driver_work && dev->driver->pm_runtime_work) {
			driver_work = dev->driver->pm_runtime_work;
			dev->power.run_driver_work = false;
			dev->power.work_in_progress = true;
			pm_runtime_get_noresume(dev);
			rpm_resume(dev, 0);
			spin_unlock_irq(&dev->power.lock);

			driver_work(dev);

			spin_lock_irq(&dev->power.lock);
			dev->power.work_in_progress = false;
			wake_up_all(&dev->power.wait_queue);
			pm_runtime_put_noidle(dev);
			rpm_idle(dev, RPM_NOWAIT);
		} else {
			rpm_resume(dev, RPM_NOWAIT);
		}
		break;

Notice also that it's important to do the _get_noresume _before_ calling rpm_resume(). Otherwise the device might get suspended again before the callback gets a chance to run. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
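The driver-side convention Alan suggests (invoke the callback directly when pm_runtime_get() returns 1, and otherwise let the workqueue run it after the resume) can be sketched as a stand-alone userspace model. The struct layout, return values, and names here are simplified stand-ins for illustration only, not the real kernel API:

```c
#include <stdbool.h>

/* Toy device model -- not the kernel's struct device. */
enum rpm_status { RPM_SUSPENDED, RPM_ACTIVE };

struct device {
	enum rpm_status status;
	int usage_count;
	bool work_queued;	/* stands in for queue_work(pm_wq, ...) */
};

/*
 * Simplified pm_runtime_get(): bump the usage count; return 1 if the
 * device is already at full power, otherwise queue an async resume
 * (which will also run the driver's work routine) and return 0.
 */
static int pm_runtime_get(struct device *dev)
{
	dev->usage_count++;
	if (dev->status == RPM_ACTIVE)
		return 1;
	dev->work_queued = true;
	return 0;
}

static int io_started;
static void start_io(struct device *dev) { (void)dev; io_started++; }

/*
 * The driver-side pattern: run the callback directly on the synchronous
 * (already-active) path; on the return-0 path it runs later from
 * pm_runtime_work() once the resume has completed.
 */
static void submit_request(struct device *dev)
{
	if (pm_runtime_get(dev) > 0)
		start_io(dev);
}
```

This keeps the hot already-active path free of the workqueue round trip, at the cost of the driver having to tolerate start_io() running in two different contexts.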
On Sunday, August 12, 2012, Alan Stern wrote: > On Thu, 9 Aug 2012, Rafael J. Wysocki wrote: > > > There are some known concerns about this approach. > > > > First of all, the patch at > > > > https://patchwork.kernel.org/patch/1299781/ > > > > increases the size of struct device by the size of a pointer, which may seem to > > be a bit excessive to somebody, although I personally don't think it's a big > > problem. We don't use _that_ many struct device objects for it to matter much. > > > > Second, which is more important to me, it seems that for a given device func() > > will always be the same pointer and it will be used by the device's driver > > only. In that case, most likely, it will be possible to determine the > > address of func() at the time of driver initialization, so the setting and > > clearing of power.func and passing the address of func() as an argument every > > time __pm_runtime_get_and_call() is run may turn out to be an unnecessary > > overhead. Thus it may be more efficient to use a function pointer in struct > > device_driver (it can't be located in struct dev_pm_ops, because some drivers > > don't use it at all, like USB drivers, and it wouldn't be useful for subsystems > > and PM domains) to store the address of func() permanently. > > > > For the above reasons, the appended patch implements an alternative approach, > > which is to modify the way pm_runtime_get() works so that, when the device is > > not active, it will queue a resume request for the device _along_ _with_ the > > execution of a driver routine provided through a new function pointer > > .pm_runtime_work(). There also is pm_runtime_get_nowork() that won't do that > > and works in the same way as the "old" pm_runtime_get(). > > > > Of course, the disadvantage of the new patch is that it makes the change > > in struct device_driver, but perhaps it's not a big deal. > > > > I wonder what you think. > > I have some concerns about this patch. 
> > Firstly, the patch doesn't do anything in the case where the device is > already at full power. This is intentional, because I'm not sure that the code to be run if pm_runtime_get() returns 1 should always be pm_runtime_work(). For example, the driver may want to acquire a lock before running pm_runtime_get() and execute that code under the lock. > Should we invoke the callback routine > synchronously? This loses the advantage of a workqueue's "pristine" > environment, but on the other hand it is much more efficient. I'm not sure if it is always going to be more efficient. > (And we're talking about hot pathways, where efficiency matters.) The > alternative, of course, is to have the driver invoke the callback > directly if pm_runtime_get() returns 1. Sure. If every user of .pm_runtime_work() ends up calling it when pm_runtime_get() returns 1, then there will be a point to modify the core to do that instead. However, I'm not sure if that's going to be the case at the moment. > Secondly, is it necessary for __pm_runtime_barrier() to wait for the > callback routine? I believe so. At least that's what is documented about __pm_runtime_barrier(). > More generally, does __pm_runtime_barrier() really > do what it's supposed to do? What happens if it runs at the same time > as another thread is executing the pm_runtime_put(parent) call at the > end of rpm_resume(), or the rpm_resume(parent, 0) in the middle? So these are two different situations. When pm_runtime_put(parent) is executed, the device has been resumed and no runtime PM code _for_ _the_ _device_ is running (although the trace_rpm_return_int() is at a wrong place in my opinion). The second one is more interesting, but it really is equivalent to calling pm_runtime_resume() (in a different thread) after __pm_runtime_barrier() has run. > Thirdly, I would reorganize the new code added to pm_runtime_work(); > see below. 
> > > > @@ -533,6 +550,13 @@ static int rpm_resume(struct device *dev > > goto out; > > } > > > > + if ((rpmflags & RPM_ASYNC) && (rpmflags & RPM_RUN_WORK)) { > > + dev->power.run_driver_work = true; > > + rpm_queue_up_resume(dev); > > + retval = 0; > > + goto out; > > + } > > + > > The section of code just before the start of this hunk exits the > routine if the device is already active. Do you want to put this new > section in the preceding "if" block? Yes, I do. This is to ensure that the execution of pm_runtime_work() will be scheduled if RPM_RUN_WORK is set. > Also, it feels odd to have this code here when there is another section > lower down that also tests for RPM_ASYNC and does almost the same > thing. It suggests that this new section isn't in the right place. Yes, it does. However, the code between the two sections contains some checks that I want to skip if RPM_RUN_WORK is set (otherwise, the execution of pm_runtime_work() may not be scheduled at all). > For instance, consider what happens in the "no_callbacks" case where > the parent is already active. The no_callbacks case is actually interesting, because I think that the function should return 1 in that case. Otherwise, the caller of pm_runtime_get() may think that it has to wait for the device to resume, which isn't the case. So, this seems to need fixing now. Moreover, if power.no_callbacks is set, the RPM_SUSPENDING and RPM_RESUMING status values are impossible, as far as I can tell, so the entire "no callbacks" section should be moved right after the check against RPM_ACTIVE. The same appears to apply to the analogous "no callbacks" check in rpm_suspend() (i.e. it should be done earlier). After those changes I'd put "my" check against RPM_RUN_WORK after the "no callbacks" check, but before the "RPM_SUSPENDING or RPM_RESUMING" one. 
> > @@ -715,11 +736,29 @@ static void pm_runtime_work(struct work_ > > rpm_suspend(dev, RPM_NOWAIT | RPM_AUTO); > > break; > > case RPM_REQ_RESUME: > > - rpm_resume(dev, RPM_NOWAIT); > > + if (dev->power.run_driver_work && dev->driver->pm_runtime_work) > > + driver_work = dev->driver->pm_runtime_work; > > + > > + dev->power.run_driver_work = false; > > + rpm_resume(dev, driver_work ? 0 : RPM_NOWAIT); > > break; > > } > > > > out: > > + if (driver_work) { > > + pm_runtime_get_noresume(dev); > > + dev->power.work_in_progress = true; > > + spin_unlock_irq(&dev->power.lock); > > + > > + driver_work(dev); > > + > > + spin_lock_irq(&dev->power.lock); > > + dev->power.work_in_progress = false; > > + wake_up_all(&dev->power.wait_queue); > > + pm_runtime_put_noidle(dev); > > + rpm_idle(dev, RPM_NOWAIT); > > + } > > + > > It seems very illgical to do all the callback stuff here, after the > "switch" statement. IMO it would make more sense to put it all > together, more or less as follows: > > case RPM_REQ_RESUME: > if (dev->power.run_driver_work && dev->driver->pm_runtime_work) { > driver_work = dev->driver->pm_runtime_work; > dev->power.run_driver_work = false; > dev->power.work_in_progress = true; > pm_runtime_get_noresume(dev); > rpm_resume(dev, 0); > spin_unlock_irq(&dev->power.lock); > > driver_work(dev); > > spin_lock_irq(&dev->power.lock); > dev->power.work_in_progress = false; > wake_up_all(&dev->power.wait_queue); > pm_runtime_put_noidle(dev); > rpm_idle(dev, RPM_NOWAIT); > } else { > rpm_resume(dev, RPM_NOWAIT); > } > break; OK > Notice also that it's important to do the _get_noresume _before_ > calling rpm_resume(). Otherwise the device might get suspended again > before the callback gets a chance to run. You're right. I forgot about dropping the lock in order to call pm_runtime_put(parent). 
Thanks, Rafael
On Mon, 13 Aug 2012, Rafael J. Wysocki wrote: > > Firstly, the patch doesn't do anything in the case where the device is > > already at full power. > > This is intentional, because I'm not sure that the code to be run > if pm_runtime_get() returns 1 should always be pm_runtime_work(). > > For example, the driver may want to acquire a lock before running > pm_runtime_get() and execute that code under the lock. Your reason above makes sense, but the example isn't quite right. If the driver wants to execute the callback under a lock then the callback has to take that lock -- this is necessary because of the async workqueue case. In other words, you're saying there may well be situations where the synchronous and async cases need to run slightly different code. True enough. > > Secondly, is it necessary for __pm_runtime_barrier() to wait for the > > callback routine? > > I believe so. At least that's what is documented about __pm_runtime_barrier(). > > > More generally, does __pm_runtime_barrier() really > > do what it's supposed to do? What happens if it runs at the same time > > as another thread is executing the pm_runtime_put(parent) call at the > > end of rpm_resume(), or the rpm_resume(parent, 0) in the middle? > > So these are two different situations. When pm_runtime_put(parent) is > executed, the device has been resumed and no runtime PM code _for_ _the_ > _device_ is running (although the trace_rpm_return_int() is at a wrong > place in my opinion). The second one is more interesting, but it really > is equivalent to calling pm_runtime_resume() (in a different thread) > after __pm_runtime_barrier() has run. __pm_runtime_barrier() has never made very strong guarantees about runtime_resume callbacks. But the kerneldoc does claim that any pending callbacks will have been completed, and that claim evidently is violated in the rpm_resume(parent,0) case. Maybe the kerneldoc needs to be changed, or maybe we need to fix the code. 
> > For instance, consider what happens in the "no_callbacks" case where > > the parent is already active. > > The no_callbacks case is actually interesting, because I think that the > function should return 1 in that case. Otherwise, the caller of > pm_runtime_get() may think that it has to wait for the device to resume, > which isn't the case. So, this seems to need fixing now. Right. > Moreover, if power.no_callbacks is set, the RPM_SUSPENDING and RPM_RESUMING > status values are impossible, as far as I can tell, so the entire "no callbacks" > section should be moved right after the check against RPM_ACTIVE. The same > appears to apply to the analogous "no callbacks" check in rpm_suspend() (i.e. > it should be done earlier). I'm not so sure about that. Basically we've got two conditions:

	if (A) {
		...
	}
	if (B) {
		...
	}

where A and B can never both be true. In this situation it doesn't make much difference what order you do the tests. We should use whichever order is better for adding the RPM_RUN_WORK code. Alan Stern
On Monday, August 13, 2012, Alan Stern wrote: > On Mon, 13 Aug 2012, Rafael J. Wysocki wrote: > > > > Firstly, the patch doesn't do anything in the case where the device is > > > already at full power. > > > > This is intentional, because I'm not sure that the code to be run > > if pm_runtime_get() returns 1 should always be pm_runtime_work(). > > > > For example, the driver may want to acquire a lock before running > > pm_runtime_get() and execute that code under the lock. > > Your reason above makes sense, but the example isn't quite right. If > the driver wants to execute the callback under a lock then the callback > has to take that lock -- this is necessary because of the async > workqueue case. I thought about this scenario:

 * acquire lock
 * do something
 * ret = pm_runtime_get(dev);
 * if (ret > 0) do something more
 * pm_runtime_put(dev);
 * release lock

in which the "do something more" thing cannot be the same as the contents of the .pm_runtime_work() callback (I notice that I have a collision of names between the callback and the workqueue function in runtime.c), unless that callback doesn't acquire the same lock. > In other words, you're saying there may well be situations where the > synchronous and async cases need to run slightly different code. Yes. > True enough. :-) > > > Secondly, is it necessary for __pm_runtime_barrier() to wait for the > > > callback routine? > > > > I believe so. At least that's what is documented about __pm_runtime_barrier(). > > > > > More generally, does __pm_runtime_barrier() really > > > do what it's supposed to do? What happens if it runs at the same time > > > as another thread is executing the pm_runtime_put(parent) call at the > > > end of rpm_resume(), or the rpm_resume(parent, 0) in the middle? > > > > So these are two different situations. 
When pm_runtime_put(parent) is > > executed, the device has been resumed and no runtime PM code _for_ _the_ > > _device_ is running (although the trace_rpm_return_int() is at a wrong > > place in my opinion). The second one is more interesting, but it really > > is equivalent to calling pm_runtime_resume() (in a different thread) > > after __pm_runtime_barrier() has run. > > __pm_runtime_barrier() has never made very strong guarantees about > runtime_resume callbacks. But the kerneldoc does claim that any > pending callbacks will have been completed, and that claim evidently is > violated in the rpm_resume(parent,0) case. "Flush all pending requests for the device from pm_wq and wait for all runtime PM operations involving the device in progress to complete." It doesn't mention the parent. But I agree, it's not very clear. > Maybe the kerneldoc needs to be changed, or maybe we need to fix the code. If anything, I'd change the kerneldoc. The code pretty much has to be what it is in this respect. > > > For instance, consider what happens in the "no_callbacks" case where > > > the parent is already active. > > > > The no_callbacks case is actually interesting, because I think that the > > function should return 1 in that case. Otherwise, the caller of > > pm_runtime_get() may think that it has to wait for the device to resume, > > which isn't the case. So, this seems to need fixing now. > > Right. > > > Moreover, if power.no_callbacks is set, the RPM_SUSPENDING and RPM_RESUMING > > status values are impossible, as far as I can say, so the entire "no callbacks" > > section should be moved right after the check against RPM_ACTIVE. The same > > appears to apply to the analogous "no callbacks" check in rpm_suspend() (i.e. > > it should be done earlier). > > I'm not so sure about that. Basically we've got two conditions: > > if (A) { > ... > } > if (B) { > ... > } > > where A and B can never both be true. 
In this situation it doesn't > make much difference what order you do the tests. We should use > whichever order is better for adding the RPM_RUN_WORK code. OK, fair enough. I think that it's better to reorder the checks so that the final ordering is:

 * check power.no_callbacks and parent status
 * check RPM_RUN_WORK
 * check RPM_RESUMING || RPM_SUSPENDING
 * check RPM_ASYNC

so that we don't schedule the execution of pm_runtime_work() if power.no_callbacks is set and the parent is active and we still do the power.deferred_resume optimization if RPM_RUN_WORK is unset. Thanks, Rafael
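The final ordering of checks Rafael proposes for rpm_resume() can be modeled as a small pure function. The flag values match the patch in this thread, but the branch names and the reduced device state are illustrative assumptions, not kernel code:

```c
#include <stdbool.h>

/* Flag values as in the patch (RPM_RUN_WORK is the new one). */
#define RPM_ASYNC	0x01
#define RPM_RUN_WORK	0x10

enum branch {
	NO_CALLBACKS_FAST_PATH,	/* return 1: device effectively active */
	QUEUE_WORK,		/* queue resume + driver work routine */
	WAIT_OR_DEFER,		/* RPM_SUSPENDING/RPM_RESUMING handling,
				 * including the deferred_resume optimization */
	QUEUE_ASYNC,		/* plain async resume request */
	RESUME_SYNC		/* run ->runtime_resume() directly */
};

struct dev_state {
	bool no_callbacks;
	bool parent_active;
	bool suspending_or_resuming;
};

/* Which branch fires first, under the agreed ordering of checks. */
static enum branch rpm_resume_branch(const struct dev_state *d, int rpmflags)
{
	if (d->no_callbacks && d->parent_active)
		return NO_CALLBACKS_FAST_PATH;
	if ((rpmflags & RPM_ASYNC) && (rpmflags & RPM_RUN_WORK))
		return QUEUE_WORK;
	if (d->suspending_or_resuming)
		return WAIT_OR_DEFER;
	if (rpmflags & RPM_ASYNC)
		return QUEUE_ASYNC;
	return RESUME_SYNC;
}
```

The point of the ordering is visible in the model: the no-callbacks fast path wins even when RPM_RUN_WORK is set, and RPM_RUN_WORK wins over the suspending/resuming wait, so the work is always scheduled when requested.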
On Mon, 13 Aug 2012, Rafael J. Wysocki wrote: > > __pm_runtime_barrier() has never made very strong guarantees about > > runtime_resume callbacks. But the kerneldoc does claim that any > > pending callbacks will have been completed, and that claim evidently is > > violated in the rpm_resume(parent, 0) case. > > "Flush all pending requests for the device from pm_wq and wait for all > runtime PM operations involving the device in progress to complete." > > It doesn't mention the parent. You're missing the point. Suppose you do an async resume and while the workqueue routine is executing rpm_resume(parent, 0), another thread calls pm_runtime_barrier() for the same device. The barrier will return more or less immediately, even though there is a runtime PM operation involving the device (that is, the async resume) still in progress. The rpm_resume() routine was running before pm_runtime_barrier() was called and will still be running afterward. > But I agree, it's not very clear. > > > Maybe the kerneldoc needs to be changed, or maybe we need to fix the code. > > If anything, I'd change the kerneldoc. The code pretty much has to be > what it is in this respect. I'm not sure what guarantees pm_runtime_barrier() _can_ make about runtime_resume callbacks. If you call that routine while the device is suspended then a runtime_resume callback could occur at any moment, because userspace can write "on" to the power/control attribute whenever it wants to. I guess the best we can say is that if you call pm_runtime_barrier() after updating the dev_pm_ops method pointers then after the barrier returns, the old method pointers will not be invoked and the old method routines will not be running. So we need an equivalent guarantee with regard to the pm_runtime_work pointer. (Yes, we could use a better name for that pointer.) Which means the code in the patch isn't quite right, because it saves the pm_runtime_work pointer before calling rpm_resume(). 
Maybe we should avoid looking at the pointer until rpm_resume() returns. > I think that it's better to reorder the checks so that the final ordering is: > > * check power.no_callbacks and parent status > * check RPM_RUN_WORK > * check RPM_RESUMING || RPM_SUSPENDING > * check RPM_ASYNC > > so that we don't schedule the execution of pm_runtime_work() if > power.no_callbacks is set and the parent is active and we still do the > power.deferred_resume optimization if RPM_RUN_WORK is unset. That seems reasonable. Alan Stern
On Monday, August 13, 2012, Alan Stern wrote: > On Mon, 13 Aug 2012, Rafael J. Wysocki wrote: > > > > __pm_runtime_barrier() has never made very strong guarantees about > > > runtime_resume callbacks. But the kerneldoc does claim that any > > > pending callbacks will have been completed, and that claim evidently is > > > violated in the rpm_resume(parent,0) case. > > > > "Flush all pending requests for the device from pm_wq and wait for all > > runtime PM operations involving the device in progress to complete." > > > > It doesn't mention the parent. > > You're missing the point. Suppose you do an async resume and while the > workqueue routine is executing pm_resume(parent,0), another thread > calls pm_runtime_barrier() for the same device. The barrier will > return more or less immediately, even though there is a runtime PM > operation involving the device (that is, the async resume) still in > progress. The rpm_resume() routine was running before > pm_runtime_barrier() was called and will still be running afterward. I see what you mean now. > > But I agree, it's not very clear. > > > > > Maybe the kerneldoc needs to be changed, or maybe we need to fix the code. > > > > If anything, I'd change the kerneldoc. The code pretty much has to be > > what it is in this respect. > > I'm not sure what guarantees pm_runtime_barrier() _can_ make about > runtime_resume callbacks. If you call that routine while the device is > suspended then a runtime_resume callback could occur at any moment, > because userspace can write "on" to the power/control attribute > whenever it wants to. > > I guess the best we can say is that if you call pm_runtime_barrier() > after updating the dev_pm_ops method pointers then after the barrier > returns, the old method pointers will not be invoked and the old method > routines will not be running. So we need an equivalent guarantee with > regard to the pm_runtime_work pointer. (Yes, we could use a better > name for that pointer.) 
> > Which means the code in the patch isn't quite right, because it saves > the pm_runtime_work pointer before calling rpm_resume(). Maybe we > should avoid looking at the pointer until rpm_resume() returns. Yes, we can do that. Alternatively, we can set power.work_in_progress before calling rpm_resume(dev, 0) (i.e. regard the resume as a part of the work) to make the barrier wait for all of it to complete. > > I think that it's better to reorder the checks so that the final ordering is: > > > > * check power.no_callbacks and parent status > > * check RPM_RUN_WORK > > * check RPM_RESUMING || RPM_SUSPENDING > > * check RPM_ASYNC > > > > so that we don't schedule the execution of pm_runtime_work() if > > power.no_callbacks is set and the parent is active and we still do the > > power.deferred_resume optimization if RPM_RUN_WORK is unset. > > That seems reasonable. OK Thanks, Rafael
On Mon, 13 Aug 2012, Rafael J. Wysocki wrote: > > I guess the best we can say is that if you call pm_runtime_barrier() > > after updating the dev_pm_ops method pointers then after the barrier > > returns, the old method pointers will not be invoked and the old method > > routines will not be running. So we need an equivalent guarantee with > > regard to the pm_runtime_work pointer. (Yes, we could use a better > > name for that pointer.) > > > > Which means the code in the patch isn't quite right, because it saves > > the pm_runtime_work pointer before calling rpm_resume(). Maybe we > > should avoid looking at the pointer until rpm_resume() returns. > > Yes, we can do that. > > Alternatively, we can set power.work_in_progress before calling > rpm_resume(dev, 0) (i.e. regard the resume as a part of the work) to make > the barrier wait for all of it to complete. Yep, that would work. In fact, I did it that way in the proposed code posted earlier in this thread. (But that was just on general principles, not because I had this particular race in mind.) Alan Stern
On Monday, August 13, 2012, Alan Stern wrote: > On Mon, 13 Aug 2012, Rafael J. Wysocki wrote: > > > > I guess the best we can say is that if you call pm_runtime_barrier() > > > after updating the dev_pm_ops method pointers then after the barrier > > > returns, the old method pointers will not be invoked and the old method > > > routines will not be running. So we need an equivalent guarantee with > > > regard to the pm_runtime_work pointer. (Yes, we could use a better > > > name for that pointer.) > > > > > > Which means the code in the patch isn't quite right, because it saves > > > the pm_runtime_work pointer before calling rpm_resume(). Maybe we > > > should avoid looking at the pointer until rpm_resume() returns. > > > > Yes, we can do that. > > > > Alternatively, we can set power.work_in_progress before calling > > rpm_resume(dev, 0) (i.e. regard the resume as a part of the work) to make > > the barrier wait for all of it to complete. > > Yep, that would work. In fact, I did it that way in the proposed code > posted earlier in this thread. (But that was just on general > principles, not because I had this particular race in mind.) OK I need to prepare a new patch now, but first I'll send a couple of (minor) fixes for the core runtime PM code. Thanks, Rafael
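The sequencing the two agree on for the workqueue path (take a usage-count reference and mark the work in progress before resuming; drop them and re-run idle after the callback) can be sketched as a toy single-threaded simulation. None of this is the real locking or the real runtime PM core; it only checks the ordering argument:

```c
#include <stdbool.h>

struct dev_power {
	int usage_count;
	bool active;
	bool work_in_progress;
};

static int suspend_attempts_blocked;
static int callback_ran_while_active;

/* Toy rpm_idle(): suspends the device only if the usage count is zero. */
static void rpm_idle(struct dev_power *p)
{
	if (p->usage_count == 0)
		p->active = false;
	else
		suspend_attempts_blocked++;
}

static void rpm_resume(struct dev_power *p)
{
	p->active = true;
	/*
	 * rpm_resume() drops the lock internally, e.g. around
	 * pm_runtime_put(parent); another context could try to idle the
	 * device right here -- this models that window:
	 */
	rpm_idle(p);
}

static void driver_work(struct dev_power *p)
{
	if (p->active)
		callback_ran_while_active++;
}

/*
 * The agreed ordering: take the reference and set work_in_progress
 * *before* resuming, so the barrier treats resume + callback as one
 * unit and the device cannot be suspended under the callback.
 */
static void pm_runtime_work_resume(struct dev_power *p)
{
	p->work_in_progress = true;
	p->usage_count++;		/* pm_runtime_get_noresume() */
	rpm_resume(p);
	driver_work(p);
	p->work_in_progress = false;
	p->usage_count--;		/* pm_runtime_put_noidle() */
	rpm_idle(p);
}
```

With the get_noresume done first, the idle attempt in the resume window is blocked and the callback always sees an active device; doing the increment after rpm_resume() would reopen the race Alan pointed out.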
Index: linux/include/linux/device.h
===================================================================
--- linux.orig/include/linux/device.h
+++ linux/include/linux/device.h
@@ -203,6 +203,7 @@ extern struct klist *bus_get_device_klis
  *	automatically.
  * @pm:	Power management operations of the device which matched
  *	this driver.
+ * @pm_runtime_work: Called after asynchronous runtime resume of the device.
  * @p:	Driver core's private data, no one other than the driver
  *	core can touch this.
  *
@@ -232,6 +233,7 @@ struct device_driver {
 	const struct attribute_group **groups;
 
 	const struct dev_pm_ops *pm;
+	void (*pm_runtime_work)(struct device *dev);
 
 	struct driver_private *p;
 };
Index: linux/include/linux/pm.h
===================================================================
--- linux.orig/include/linux/pm.h
+++ linux/include/linux/pm.h
@@ -538,6 +538,8 @@ struct dev_pm_info {
 	unsigned int		irq_safe:1;
 	unsigned int		use_autosuspend:1;
 	unsigned int		timer_autosuspends:1;
+	bool			run_driver_work:1;
+	bool			work_in_progress:1;
 	enum rpm_request	request;
 	enum rpm_status		runtime_status;
 	int			runtime_error;
Index: linux/include/linux/pm_runtime.h
===================================================================
--- linux.orig/include/linux/pm_runtime.h
+++ linux/include/linux/pm_runtime.h
@@ -22,6 +22,7 @@
 #define RPM_GET_PUT		0x04	/* Increment/decrement the usage_count */
 #define RPM_AUTO		0x08	/* Use autosuspend_delay */
+#define RPM_RUN_WORK		0x10	/* Run asynchronous work routine */
 
 #ifdef CONFIG_PM_RUNTIME
 
@@ -189,6 +190,11 @@ static inline int pm_request_autosuspend
 static inline int pm_runtime_get(struct device *dev)
 {
+	return __pm_runtime_resume(dev, RPM_GET_PUT | RPM_ASYNC | RPM_RUN_WORK);
+}
+
+static inline int pm_runtime_get_nowork(struct device *dev)
+{
 	return __pm_runtime_resume(dev, RPM_GET_PUT | RPM_ASYNC);
 }
Index: linux/drivers/base/power/runtime.c
===================================================================
--- linux.orig/drivers/base/power/runtime.c
+++ linux/drivers/base/power/runtime.c
@@ -484,6 +484,15 @@ static int rpm_suspend(struct device *de
 	goto out;
 }
 
+void rpm_queue_up_resume(struct device *dev)
+{
+	dev->power.request = RPM_REQ_RESUME;
+	if (!dev->power.request_pending) {
+		dev->power.request_pending = true;
+		queue_work(pm_wq, &dev->power.work);
+	}
+}
+
 /**
  * rpm_resume - Carry out runtime resume of given device.
  * @dev: Device to resume.
@@ -495,8 +504,10 @@ static int rpm_suspend(struct device *de
  * RPM_NOWAIT and RPM_ASYNC flags. Similarly, if there's a suspend running in
  * parallel with this function, either tell the other process to resume after
  * suspending (deferred_resume) or wait for it to finish. If the RPM_ASYNC
- * flag is set then queue a resume request; otherwise run the
- * ->runtime_resume() callback directly. Queue an idle notification for the
+ * flag is set, then queue a resume request and if the RPM_RUN_WORK flag is set
+ * too, schedule the execution of the device driver's .pm_runtime_work()
+ * callback after the resume request has been completed. Otherwise run the
+ * .runtime_resume() callback directly and queue an idle notification for the
  * device if the resume succeeded.
  *
  * This function must be called under dev->power.lock with interrupts disabled.
@@ -519,12 +530,18 @@ static int rpm_resume(struct device *dev
 		goto out;
 
 	/*
-	 * Other scheduled or pending requests need to be canceled.  Small
-	 * optimization: If an autosuspend timer is running, leave it running
-	 * rather than cancelling it now only to restart it again in the near
-	 * future.
+	 * Other scheduled or pending requests need to be canceled.  If the
+	 * execution of driver work is queued up along with a resume request,
+	 * do not cancel it.
+	 */
+	if (dev->power.request != RPM_REQ_RESUME || !dev->power.run_driver_work)
+		dev->power.request = RPM_REQ_NONE;
+
+	/*
+	 * Small optimization: If an autosuspend timer is running, leave it
+	 * running rather than cancelling it now only to restart it again in the
+	 * near future.
 	 */
-	dev->power.request = RPM_REQ_NONE;
 	if (!dev->power.timer_autosuspends)
 		pm_runtime_deactivate_timer(dev);
 
@@ -533,6 +550,13 @@ static int rpm_resume(struct device *dev
 		goto out;
 	}
 
+	if ((rpmflags & RPM_ASYNC) && (rpmflags & RPM_RUN_WORK)) {
+		dev->power.run_driver_work = true;
+		rpm_queue_up_resume(dev);
+		retval = 0;
+		goto out;
+	}
+
 	if (dev->power.runtime_status == RPM_RESUMING
 	    || dev->power.runtime_status == RPM_SUSPENDING) {
 		DEFINE_WAIT(wait);
@@ -591,11 +615,7 @@ static int rpm_resume(struct device *dev
 
 	/* Carry out an asynchronous or a synchronous resume. */
 	if (rpmflags & RPM_ASYNC) {
-		dev->power.request = RPM_REQ_RESUME;
-		if (!dev->power.request_pending) {
-			dev->power.request_pending = true;
-			queue_work(pm_wq, &dev->power.work);
-		}
+		rpm_queue_up_resume(dev);
 		retval = 0;
 		goto out;
 	}
@@ -691,6 +711,7 @@ static int rpm_resume(struct device *dev
 static void pm_runtime_work(struct work_struct *work)
 {
 	struct device *dev = container_of(work, struct device, power.work);
+	void (*driver_work)(struct device *) = NULL;
 	enum rpm_request req;
 
 	spin_lock_irq(&dev->power.lock);
@@ -715,11 +736,29 @@ static void pm_runtime_work(struct work_
 		rpm_suspend(dev, RPM_NOWAIT | RPM_AUTO);
 		break;
 	case RPM_REQ_RESUME:
-		rpm_resume(dev, RPM_NOWAIT);
+		if (dev->power.run_driver_work && dev->driver->pm_runtime_work)
+			driver_work = dev->driver->pm_runtime_work;
+
+		dev->power.run_driver_work = false;
+		rpm_resume(dev, driver_work ? 0 : RPM_NOWAIT);
 		break;
 	}
 
 out:
+	if (driver_work) {
+		pm_runtime_get_noresume(dev);
+		dev->power.work_in_progress = true;
+		spin_unlock_irq(&dev->power.lock);
+
+		driver_work(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.work_in_progress = false;
+		wake_up_all(&dev->power.wait_queue);
+		pm_runtime_put_noidle(dev);
+		rpm_idle(dev, RPM_NOWAIT);
+	}
+
 	spin_unlock_irq(&dev->power.lock);
 }
 
@@ -982,7 +1021,8 @@ static void __pm_runtime_barrier(struct
 
 	if (dev->power.runtime_status == RPM_SUSPENDING
 	    || dev->power.runtime_status == RPM_RESUMING
-	    || dev->power.idle_notification) {
+	    || dev->power.idle_notification
+	    || dev->power.work_in_progress) {
 		DEFINE_WAIT(wait);
 
 		/* Suspend, wake-up or idle notification in progress. */
@@ -991,7 +1031,8 @@ static void __pm_runtime_barrier(struct
 				TASK_UNINTERRUPTIBLE);
 			if (dev->power.runtime_status != RPM_SUSPENDING
 			    && dev->power.runtime_status != RPM_RESUMING
-			    && !dev->power.idle_notification)
+			    && !dev->power.idle_notification
+			    && !dev->power.work_in_progress)
 				break;
 			spin_unlock_irq(&dev->power.lock);