diff mbox

PCI / PM: Block races between runtime PM and system sleep

Message ID 201106240059.38687.rjw@sisk.pl (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Rafael Wysocki June 23, 2011, 10:59 p.m. UTC
On Friday, June 24, 2011, Rafael J. Wysocki wrote:
> On Thursday, June 23, 2011, Alan Stern wrote:
> > On Thu, 23 Jun 2011, Rafael J. Wysocki wrote:
> > 
> > > > Then maybe this disable_depth > 0 case should return something other
> > > > than 0.  Something new, like -EACCES.  That way the caller would
> > > > realize something strange was going on but wouldn't have to treat the 
> > > > situation as an error.
> > > 
> > > I would be fine with that, but then we'd need to reserve that error code,
> > > so that it's not returned by subsystem callbacks (or even we should convert
> > > it to a different error code if it is returned by the subsystem callback in
> > > rpm_resume()). 
> > > 
> > > > After all, the return value from pm_runtime_get_sync() is documented to 
> > > > be the error code for the underlying pm_runtime_resume().  It doesn't 
> > > > refer to the increment operation -- that always succeeds.
> > > 
> > > That means we should change the caller, which is the SCSI subsystem in this
> > > particular case, to check the error code.  The problem with this approach
> > > is that the same error code may be returned in a different situation, so
> > > we should prevent that from happening in the first place.  Still, suppose
> > > that we do that and that the caller checks the error code.  What is it
> > > supposed to do in that situation?  The only reasonable action for the
> > > caller is to ignore the error code if it means disable_depth > 0 and go
> > > on with whatever it has to do, but that's what it will do if the
> > > pm_runtime_get_sync() returns 0 in that situation.
> > > 
> > > So, in my opinion it simply may be best to update the documentation of
> > > pm_runtime_get_sync() along with the code changes. :-)
> > 
> > The only reason you're doing this is for the SCSI error-handler 
> > routine?
> 
> Yes, it is.
> 
> > I think it would be easier to change that routine instead of the PM 
> > core.  It should be smart enough to know that a runtime PM call isn't 
> > needed during a system sleep transition, i.e., between the scsi_host's 
> > suspend and resume callbacks.  Maybe check the new is_suspended flag.  
> > You'd also have to make sure the scsi_host wasn't runtime suspended 
> > when the sleep begins, rather like PCI.
> 
> Well, I think the problem is still there even at run time if runtime PM
> happens to be disabled for shost_gendev (this probably never happens in
> practice, though, except when runtime PM is disabled during system
> suspend, which also wasn't done before).
> 
> > I'm still not clear on why the error handler needs to run at this time.
> 
> Because SATA ports are suspended with the help of the SCSI error handling
> mechanism (which Tejun claims is the best way to do that).
> 
> But the thing done by scsi_autopm_get_host() is entirely reasonable
> (maybe except that calling pm_runtime_put_sync() after failing
> pm_runtime_get_sync() is rather pointless, pm_runtime_put_noidle() would
> be sufficient), but it doesn't work for disabled runtime PM.
> 
> So, the question is whether or not what scsi_autopm_get_host() does should work
> when runtime PM is disabled and I think it should.  Hence the patch.
> 
> Now, I agree that the proposed change is a little ugly, because it makes
> "resume with taking reference" behave differently from "resume".  The
> solution to that would be to return a special error code in the "disabled
> runtime PM" case.  However, in that case we'd need to change the runtime PM
> code in general to return this particular error code in the "disabled runtime
> PM" case and to prevent this error code from being returned in other
> circumstances.  In addition to that, we'de need to change the SCSI code, of
> course.
> 
> Do you think that would be better?

I've carried out this exercise to see how complicated it is going to be
and it doesn't really seem to be _that_ complicated.  The appended patch
illustrates this, but it hasn't been tested, so caveat emptor.

Thanks,
Rafael

---
 drivers/base/power/runtime.c |    9 +++++----
 drivers/scsi/scsi_pm.c       |    8 ++++----
 2 files changed, 9 insertions(+), 8 deletions(-)

Comments

Alan Stern June 26, 2011, 2:39 a.m. UTC | #1
On Fri, 24 Jun 2011, Rafael J. Wysocki wrote:

> > > I'm still not clear on why the error handler needs to run at this time.
> > 
> > Because SATA ports are suspended with the help of the SCSI error handling
> > mechanism (which Tejun claims is the best way to do that).

> I've carried out this exercise to see how complicated it is going to be
> and it doesn't really seem to be _that_ complicated.  The appended patch
> illustrates this, but it hasn't been tested, so caveat emptor.

The patch is straightforward enough.  But will it be sufficient?

Suppose a SATA port is already in runtime suspend when the system sleep
starts.  Will the error handler be able to do its special job?  I don't
know...  It may turn out to be necessary for the SATA port to be
runtime resumed somewhere along the line.

Alan Stern
Rafael Wysocki June 26, 2011, 12:22 p.m. UTC | #2
On Sunday, June 26, 2011, Alan Stern wrote:
> On Fri, 24 Jun 2011, Rafael J. Wysocki wrote:
> 
> > > > I'm still not clear on why the error handler needs to run at this time.
> > > 
> > > Because SATA ports are suspended with the help of the SCSI error handling
> > > mechanism (which Tejun claims is the best way to do that).
> 
> > I've carried out this exercise to see how complicated it is going to be
> > and it doesn't really seem to be _that_ complicated.  The appended patch
> > illustrates this, but it hasn't been tested, so caveat emptor.
> 
> The patch is straightforward enough.  But will it be sufficient?
> 
> Suppose a SATA port is already in runtime suspend when the system sleep
> starts.  Will the error handler be able to do its special job?  I don't
> know...  It may turn out to be necessary for the SATA port to be
> runtime resumed somewhere along the line.

That's correct, but at least in the ususal situation (i.e. sdev_gendev is
not suspended at run time) the patch makes things work when runtime PM
is disabled during system suspend and enabled during system resume.

It still may be necessary to add code to the SATA subsystem if sdev_gendev
is to be suspended at run time for SATA controllers.  I'm not aware of any
of such cases, though.

Thanks,
Rafael
diff mbox

Patch

Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- linux-2.6.orig/drivers/base/power/runtime.c
+++ linux-2.6/drivers/base/power/runtime.c
@@ -135,8 +135,9 @@  static int rpm_check_suspend_allowed(str
 
 	if (dev->power.runtime_error)
 		retval = -EINVAL;
-	else if (atomic_read(&dev->power.usage_count) > 0
-	    || dev->power.disable_depth > 0)
+	else if (dev->power.disable_depth > 0)
+		retval = -EACCES;
+	else if (atomic_read(&dev->power.usage_count) > 0)
 		retval = -EAGAIN;
 	else if (!pm_children_suspended(dev))
 		retval = -EBUSY;
@@ -262,7 +263,7 @@  static int rpm_callback(int (*cb)(struct
 		spin_lock_irq(&dev->power.lock);
 	}
 	dev->power.runtime_error = retval;
-	return retval;
+	return retval != -EACCES ? retval : -EIO;
 }
 
 /**
@@ -458,7 +459,7 @@  static int rpm_resume(struct device *dev
 	if (dev->power.runtime_error)
 		retval = -EINVAL;
 	else if (dev->power.disable_depth > 0)
-		retval = -EAGAIN;
+		retval = -EACCES;
 	if (retval)
 		goto out;
 
Index: linux-2.6/drivers/scsi/scsi_pm.c
===================================================================
--- linux-2.6.orig/drivers/scsi/scsi_pm.c
+++ linux-2.6/drivers/scsi/scsi_pm.c
@@ -144,9 +144,9 @@  int scsi_autopm_get_device(struct scsi_d
 	int	err;
 
 	err = pm_runtime_get_sync(&sdev->sdev_gendev);
-	if (err < 0)
+	if (err < 0 && err !=-EACCES)
 		pm_runtime_put_sync(&sdev->sdev_gendev);
-	else if (err > 0)
+	else
 		err = 0;
 	return err;
 }
@@ -173,9 +173,9 @@  int scsi_autopm_get_host(struct Scsi_Hos
 	int	err;
 
 	err = pm_runtime_get_sync(&shost->shost_gendev);
-	if (err < 0)
+	if (err < 0 && err !=-EACCES)
 		pm_runtime_put_sync(&shost->shost_gendev);
-	else if (err > 0)
+	else
 		err = 0;
 	return err;
 }