diff mbox

PCI / PM: Block races between runtime PM and system sleep

Message ID 201106192149.19401.rjw@sisk.pl (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Rafael Wysocki June 19, 2011, 7:49 p.m. UTC
From: Rafael J. Wysocki <rjw@sisk.pl>

After commit e8665002477f0278f84f898145b1f141ba26ee26
(PM: Allow pm_runtime_suspend() to succeed during system suspend) it
is possible that a device resumed by the pm_runtime_resume(dev) in
pci_pm_prepare() will be suspended immediately from a work item,
timer function or otherwise, defeating the very purpose of calling
pm_runtime_resume(dev) from there.  To prevent that from happening
it is necessary to increment the runtime PM usage counter of the
device by replacing pm_runtime_resume() with pm_runtime_get_sync().
Moreover, the incremented runtime PM usage counter has to be
decremented by the corresponding pci_pm_complete(), via
pm_runtime_put_noidle().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org
---
 drivers/pci/pci-driver.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Alan Stern June 20, 2011, 2:46 p.m. UTC | #1
On Sun, 19 Jun 2011, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> After commit e8665002477f0278f84f898145b1f141ba26ee26
> (PM: Allow pm_runtime_suspend() to succeed during system suspend) it
> is possible that a device resumed by the pm_runtime_resume(dev) in
> pci_pm_prepare() will be suspended immediately from a work item,
> timer function or otherwise, defeating the very purpose of calling
> pm_runtime_resume(dev) from there.  To prevent that from happening
> it is necessary to increment the runtime PM usage counter of the
> device by replacing pm_runtime_resume() with pm_runtime_get_sync().
> Moreover, the incremented runtime PM usage counter has to be
> decremented by the corresponding pci_pm_complete(), via
> pm_runtime_put_noidle().

In both this and the previous patch, the final decrement should be done 
by pm_runtime_put_sync() instead of pm_runtime_put_idle().  Otherwise 
you face the possibility that the usage_count may go to 0 but the 
device will be left active.

Furthermore, since we're going to disable runtime PM as soon as the
suspend callback returns anyway, why not increment usage_count before
invoking the callback?  This will prevent runtime suspends from 
occurring while the callback runs, so no changes will be needed in the 
PCI or USB subsystems.

It also will prevent Kevin from calling pm_runtime_suspend from within
his suspend callbacks, but you have already determined that subsystems
and drivers should never do that in any case.

Alan Stern
Rafael Wysocki June 20, 2011, 7:42 p.m. UTC | #2
On Monday, June 20, 2011, Alan Stern wrote:
> On Sun, 19 Jun 2011, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > After commit e8665002477f0278f84f898145b1f141ba26ee26
> > (PM: Allow pm_runtime_suspend() to succeed during system suspend) it
> > is possible that a device resumed by the pm_runtime_resume(dev) in
> > pci_pm_prepare() will be suspended immediately from a work item,
> > timer function or otherwise, defeating the very purpose of calling
> > pm_runtime_resume(dev) from there.  To prevent that from happening
> > it is necessary to increment the runtime PM usage counter of the
> > device by replacing pm_runtime_resume() with pm_runtime_get_sync().
> > Moreover, the incremented runtime PM usage counter has to be
> > decremented by the corresponding pci_pm_complete(), via
> > pm_runtime_put_noidle().
> 
> In both this and the previous patch, the final decrement should be done 
> by pm_runtime_put_sync() instead of pm_runtime_put_idle().  Otherwise 
> you face the possibility that the usage_count may go to 0 but the 
> device will be left active.

OK, that's how the old code worked, BTW, I overlooked that.

> Furthermore, since we're going to disable runtime PM as soon as the
> suspend callback returns anyway, why not increment usage_count before
> invoking the callback?  This will prevent runtime suspends from 
> occurring while the callback runs, so no changes will be needed in the 
> PCI or USB subsystems.

The PCI case is different from the USB one.  PCI needs to resume devices
before calling their drivers' .suspend() callbacks, so it does that in
.prepare().  If the core acquired a reference to every device  before executing
the subsystem .suspend(), then pm_runtime_resume() could be moved from
pci_pm_prepare() to pci_prepare_suspend(), but then additionally it would
have to be called from pci_pm_freeze() and pci_pm_poweroff().  It simply is
more efficient to call it once from pci_pm_prepare(), but then PCI needs to
take the reference by itself.

Also the core doesn't call the subsystem-level .runtime_idle() after the
subsystem-level .complete() has run, which is useful as you pointed out above. :-)

> It also will prevent Kevin from calling pm_runtime_suspend from within
> his suspend callbacks, but you have already determined that subsystems
> and drivers should never do that in any case.

Then reverting commit e8665002477f0278f84f898145b1f141ba26ee26 would be
even better. :-)

Thanks,
Rafael
Alan Stern June 20, 2011, 9 p.m. UTC | #3
On Mon, 20 Jun 2011, Rafael J. Wysocki wrote:

> > Furthermore, since we're going to disable runtime PM as soon as the
> > suspend callback returns anyway, why not increment usage_count before
> > invoking the callback?  This will prevent runtime suspends from 
> > occurring while the callback runs, so no changes will be needed in the 
> > PCI or USB subsystems.
> 
> The PCI case is different from the USB one.  PCI needs to resume devices
> before calling their drivers' .suspend() callbacks, so it does that in
> .prepare().  If the core acquired a reference to every device  before executing
> the subsystem .suspend(), then pm_runtime_resume() could be moved from
> pci_pm_prepare() to pci_prepare_suspend(), but then additionally it would
> have to be called from pci_pm_freeze() and pci_pm_poweroff().  It simply is
> more efficient to call it once from pci_pm_prepare(), but then PCI needs to
> take the reference by itself.

Ah, okay.  The PCI part makes sense then.

> > It also will prevent Kevin from calling pm_runtime_suspend from within
> > his suspend callbacks, but you have already determined that subsystems
> > and drivers should never do that in any case.
> 
> Then reverting commit e8665002477f0278f84f898145b1f141ba26ee26 would be
> even better. :-)

See below.


> > As I see it, we never want a suspend or suspend_noirq callback to call 
> > pm_runtime_suspend().  However it's okay for the suspend callback to 
> > invoke pm_runtime_resume(), as long as this is all done in subsystem 
> > code.
> 
> First off, I don't really see a reason for a subsystem to call
> pm_runtime_resume() from its .suspend_noirq() callback.

I was referring to .suspend(), not .suspend_noirq().

>  Now, if
> pm_runtime_resume() is to be called concurrently with the subsystem's
> .suspend_noirq() callback, I'd rather won't let that happen. :-)

Me too.  But I see no reason to prevent pm_runtime_resume() from being 
called by .suspend().

> > And in between the prepare and suspend callbacks, runtime PM should be
> > more or less fully functional, right?  For most devices it will never
> > be triggered, because it has to run in process context and both
> > userspace and pm_wq are frozen.  It may trigger for devices marked as
> > IRQ-safe, though.
> 
> It also may trigger for drivers using non-freezable workqueues and calling
> runtime PM synchronously from there.

Right.  So we shouldn't ignore this window.

> > Maybe the barrier should be moved into __device_suspend().
> 
> I _really_ think that the initial approach, i.e. before commit
> e8665002477f0278f84f898145b1f141ba26ee26, made the most sense.  It didn't
> cover the "pm_runtime_resume() called during system suspend" case, but
> it did cover everything else.

But it prevented runtime PM from working during the window between 
.prepare() and .suspend(), and also between .resume() and .complete().
If a subsystem like PCI wants to rule out runtime PM during those 
windows, then fine -- it can do whatever it wants.  But the PM core 
shouldn't do this.

> So, I think there are serious technical arguments for reverting that commit.
> 
> I think we went really far trying to avoid that, but I'm not sure I want to go
> any further.

What I'm suggesting is to revert the commit but at the same time,
move the get_noresume() into __device_suspend() and the put_sync() into 
device_resume().

Alan Stern
diff mbox

Patch

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -624,7 +624,7 @@  static int pci_pm_prepare(struct device
 	 * system from the sleep state, we'll have to prevent it from signaling
 	 * wake-up.
 	 */
-	pm_runtime_resume(dev);
+	pm_runtime_get_sync(dev);
 
 	if (drv && drv->pm && drv->pm->prepare)
 		error = drv->pm->prepare(dev);
@@ -638,6 +638,8 @@  static void pci_pm_complete(struct devic
 
 	if (drv && drv->pm && drv->pm->complete)
 		drv->pm->complete(dev);
+
+	pm_runtime_put_noidle(dev);
 }
 
 #else /* !CONFIG_PM_SLEEP */