diff mbox

Allow runtime suspend during system resume

Message ID 1860008.68DHz0qP41@vostro.rjw.lan (mailing list archive)
State RFC, archived
Headers show

Commit Message

Rafael J. Wysocki Jan. 8, 2014, 10:55 p.m. UTC
On Wednesday, January 08, 2014 03:44:53 PM Alan Stern wrote:
> On Wed, 8 Jan 2014, Alan Stern wrote:
> 
> > On Wed, 8 Jan 2014, Alan Stern wrote:
> > 
> > > In essence, you want your disk's state to move directly from unpowered
> > > (system asleep) to runtime-suspended with no intermediate active -- but
> > > only if the disk doesn't spin up all by itself.
> > 
> > > In the end, it sounds like what you want can be done fairly easily.  
> > 
> > It turns out there is a snag.  The PM core is set up such that during
> > system sleep transitions, there's no way to tell whether or not
> > userspace has permitted runtime PM for a device.  (This is because the
> > core wants to prevent untimely runtime suspends from occurring in the
> > middle of a system sleep transition, and to do so it uses the same
> > mechanism as it does when userspace disallows runtime suspend.)
> > 
> > As a result, we must not go directly from unpowered to
> > runtime-suspended.  Not without some sort of change to the PM core.
> 
> Rafael:
> 
> Right now the PM core does pm_runtime_get_noresume() during the prepare 
> phase of system sleep, and pm_runtime_put() during the complete phase.  
> Maybe we should cut down the range of coverage.
> 
> For example, we could do the pm_runtime_put() during dpm_resume(), or 
> even earlier.  That way, the resume routine could do something like:
> 
> 	if (!pm_runtime_in_use(dev)) {
> 		pm_runtime_disable(dev);
> 		pm_runtime_set_suspended(dev);
> 		pm_runtime_enable(dev);
> 		return 0;
> 	}
> 
> where pm_runtime_in_use() would check the device's usage counters.  
> The device would come out of system sleep in a powered-down state, and
> could remain runtime-suspended with no need for an extra
> power-up/power-down cycle.  Isn't this the sort of thing the embedded
> people want to do?
> 
> As it is, drivers have no way to safely put devices directly into 
> runtime suspend when coming out of system sleep.
> 
> What do you think?

There are two problems here.

First off, for some devices that were runtime-suspended before system suspend
we may not want to touch them at all during system suspend/resume.  I think
I know how to address that, but I need to prepare patches yet.  That's going
to take a few days at least, sorry about that.

Second, we may want some devices to be set to "runtime suspended" during
system resume without touching them, so that runtime PM resumes them when
necessary.  I have a patch for that, which hasn't been published yet and which
is appended.

The way it works is that whoever wants the system resume of certain device
to be skipped, sets the (new) skip_resume flag for that device using
device_pm_skip_resume() and that causes the PM core to basically ignore that
device's PM callbacks during system resume.

Whether or not doing the pm_request_resume(dev) for devices with skip_resume
set in dpm_resume_early() is a good idea is a discussion item in my opinion
(evidently, I thought it was when I was preparing this patch).

Thanks,
Rafael


---
 drivers/base/power/main.c |   98 ++++++++++++++++++++++++++++++++++------------
 include/linux/pm.h        |   13 +++++-
 2 files changed, 85 insertions(+), 26 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alan Stern Jan. 8, 2014, 11:24 p.m. UTC | #1
On Wed, 8 Jan 2014, Rafael J. Wysocki wrote:

> On Wednesday, January 08, 2014 03:44:53 PM Alan Stern wrote:
> > On Wed, 8 Jan 2014, Alan Stern wrote:
> > 
> > > On Wed, 8 Jan 2014, Alan Stern wrote:
> > > 
> > > > In essence, you want your disk's state to move directly from unpowered
> > > > (system asleep) to runtime-suspended with no intermediate active -- but
> > > > only if the disk doesn't spin up all by itself.
> > > 
> > > > In the end, it sounds like what you want can be done fairly easily.  
> > > 
> > > It turns out there is a snag.  The PM core is set up such that during
> > > system sleep transitions, there's no way to tell whether or not
> > > userspace has permitted runtime PM for a device.  (This is because the
> > > core wants to prevent untimely runtime suspends from occurring in the
> > > middle of a system sleep transition, and to do so it uses the same
> > > mechanism as it does when userspace disallows runtime suspend.)
> > > 
> > > As a result, we must not go directly from unpowered to
> > > runtime-suspended.  Not without some sort of change to the PM core.
> > 
> > Rafael:
> > 
> > Right now the PM core does pm_runtime_get_noresume() during the prepare 
> > phase of system sleep, and pm_runtime_put() during the complete phase.  
> > Maybe we should cut down the range of coverage.
> > 
> > For example, we could do the pm_runtime_put() during dpm_resume(), or 
> > even earlier.  That way, the resume routine could do something like:
> > 
> > 	if (!pm_runtime_in_use(dev)) {
> > 		pm_runtime_disable(dev);
> > 		pm_runtime_set_suspended(dev);
> > 		pm_runtime_enable(dev);
> > 		return 0;
> > 	}
> > 
> > where pm_runtime_in_use() would check the device's usage counters.  
> > The device would come out of system sleep in a powered-down state, and
> > could remain runtime-suspended with no need for an extra
> > power-up/power-down cycle.  Isn't this the sort of thing the embedded
> > people want to do?
> > 
> > As it is, drivers have no way to safely put devices directly into 
> > runtime suspend when coming out of system sleep.
> > 
> > What do you think?
> 
> There are two problems here.
> 
> First off, for some devices that were runtime-suspended before system suspend
> we may not want to touch them at all during system suspend/resume.  I think
> I know how to address that, but I need to prepare patches yet.  That's going
> to take a few days at least, sorry about that.
> 
> Second, we may want some devices to be set to "runtime suspended" during
> system resume without touching them, so that runtime PM resumes them when
> necessary.  I have a patch for that, which hasn't been published yet and which
> is appended.
> 
> The way it works is that whoever wants the system resume of certain device
> to be skipped, sets the (new) skip_resume flag for that device using
> device_pm_skip_resume() and that causes the PM core to basically ignore that
> device's PM callbacks during system resume.
> 
> Whether or not doing the pm_request_resume(dev) for devices with skip_resume
> set in dpm_resume_early() is a good idea is a discussion item in my opinion
> (evidently, I thought it was when I was preparing this patch).

The situation isn't quite so simple.

We're talking specifically about disk drives.  Depending on the drive,
it may or may not spin up all by itself when the system wakes up.  If
it does spin up, the final state should be RPM_ACTIVE (the way it is
now).  If it doesn't, we want the final state to be RPM_SUSPENDED.  
This is true regardless of the runtime status before the system sleep.

This means we can't skip the resume routine.  At the very least, the 
routine needs to ask the disk whether or not it is spinning.  Depending 
on the result, it would then set the RPM status appropriately.

In addition, we only want to end up in RPM_SUSPENDED if the usage
counters are 0 (in particular, power/control = "auto").  Otherwise, we
want the resume routine always to spin up the drive, if it hasn't spun
up by itself.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Jan. 9, 2014, 12:05 a.m. UTC | #2
> On Wed, 8 Jan 2014, Rafael J. Wysocki wrote:
> > On Wednesday, January 08, 2014 03:44:53 PM Alan Stern wrote:
> > > On Wed, 8 Jan 2014, Alan Stern wrote:
> > > > On Wed, 8 Jan 2014, Alan Stern wrote:
> > > > > In essence, you want your disk's state to move directly from
> > > > > unpowered
> > > > > (system asleep) to runtime-suspended with no intermediate active --
> > > > > but
> > > > > only if the disk doesn't spin up all by itself.
> > > > > 
> > > > > In the end, it sounds like what you want can be done fairly easily.
> > > > 
> > > > It turns out there is a snag.  The PM core is set up such that during
> > > > system sleep transitions, there's no way to tell whether or not
> > > > userspace has permitted runtime PM for a device.  (This is because the
> > > > core wants to prevent untimely runtime suspends from occurring in the
> > > > middle of a system sleep transition, and to do so it uses the same
> > > > mechanism as it does when userspace disallows runtime suspend.)
> > > > 
> > > > As a result, we must not go directly from unpowered to
> > > > runtime-suspended.  Not without some sort of change to the PM core.
> > > 
> > > Rafael:
> > > 
> > > Right now the PM core does pm_runtime_get_noresume() during the prepare
> > > phase of system sleep, and pm_runtime_put() during the complete phase.
> > > Maybe we should cut down the range of coverage.
> > > 
> > > For example, we could do the pm_runtime_put() during dpm_resume(), or
> > > 
> > > even earlier.  That way, the resume routine could do something like:
> > > 	if (!pm_runtime_in_use(dev)) {
> > > 	
> > > 		pm_runtime_disable(dev);
> > > 		pm_runtime_set_suspended(dev);
> > > 		pm_runtime_enable(dev);
> > > 		return 0;
> > > 	
> > > 	}
> > > 
> > > where pm_runtime_in_use() would check the device's usage counters.
> > > The device would come out of system sleep in a powered-down state, and
> > > could remain runtime-suspended with no need for an extra
> > > power-up/power-down cycle.  Isn't this the sort of thing the embedded
> > > people want to do?
> > > 
> > > As it is, drivers have no way to safely put devices directly into
> > > runtime suspend when coming out of system sleep.
> > > 
> > > What do you think?
> > 
> > There are two problems here.
> > 
> > First off, for some devices that were runtime-suspended before system
> > suspend
> > we may not want to touch them at all during system suspend/resume.  I
> > think
> > I know how to address that, but I need to prepare patches yet.  That's
> > going
> > to take a few days at least, sorry about that.
> > 
> > Second, we may want some devices to be set to "runtime suspended" during
> > system resume without touching them, so that runtime PM resumes them when
> > necessary.  I have a patch for that, which hasn't been published yet and
> > which is appended.
> > 
> > The way it works is that whoever wants the system resume of certain device
> > to be skipped, sets the (new) skip_resume flag for that device using
> > device_pm_skip_resume() and that causes the PM core to basically ignore
> > that
> > device's PM callbacks during system resume.
> > 
> > Whether or not doing the pm_request_resume(dev) for devices with
> > skip_resume set in dpm_resume_early() is a good idea is a discussion item
> > in my opinion (evidently, I thought it was when I was preparing this patch).
> 
> The situation isn't quite so simple.
> 
> We're talking specifically about disk drives.  Depending on the drive,
> it may or may not spin up all by itself when the system wakes up.  If
> it does spin up, the final state should be RPM_ACTIVE (the way it is
> now).  If it doesn't, we want the final state to be RPM_SUSPENDED.
> This is true regardless of the runtime status before the system sleep.

OK

We can do all that in .resume_early() which runs with runtime PM disabled.

> This means we can't skip the resume routine.  At the very least, the
> routine needs to ask the disk whether or not it is spinning.  Depending
> on the result, it would then set the RPM status appropriately.

Yes, .resume_early() can do that.  And it can set a (internal) flag for
a device telling the driver/bus type .resume() and .complete() to
leave the device alone.

> In addition, we only want to end up in RPM_SUSPENDED if the usage
> counters are 0 (in particular, power/control = "auto").  Otherwise, we
> want the resume routine always to spin up the drive, if it hasn't spun
> up by itself.

That still can be done in .resume_early().  Or in .resume_noirq() if device
interrupts are not requisite for that.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Jan. 9, 2014, 3:32 p.m. UTC | #3
On Thu, 9 Jan 2014, Rafael J. Wysocki wrote:

> > The situation isn't quite so simple.
> > 
> > We're talking specifically about disk drives.  Depending on the drive,
> > it may or may not spin up all by itself when the system wakes up.  If
> > it does spin up, the final state should be RPM_ACTIVE (the way it is
> > now).  If it doesn't, we want the final state to be RPM_SUSPENDED.
> > This is true regardless of the runtime status before the system sleep.
> 
> OK
> 
> We can do all that in .resume_early() which runs with runtime PM disabled.

Either the .resume_early or the .resume routine would be okay, as far
as I'm concerned.  Is there any strong reason to prefer one over the
other, besides the fact that in .resume_early the pm_runtime_disable
and _enable calls wouldn't be needed?

> > This means we can't skip the resume routine.  At the very least, the
> > routine needs to ask the disk whether or not it is spinning.  Depending
> > on the result, it would then set the RPM status appropriately.
> 
> Yes, .resume_early() can do that.  And it can set a (internal) flag for
> a device telling the driver/bus type .resume() and .complete() to
> leave the device alone.

I had in mind something more like this for the driver's .resume (or
.resume_early) routine:

	bool leave_suspended = false;
	int rc = 0;

	if (!pm_runtime_in_use(dev) && !disk_is_spinning(dev))
		leave_suspended = true;
	else
		rc = spin_up_the_disk(dev);

	if (rc == 0) {
		pm_runtime_disable(dev);
		if (leave_suspended)
			pm_runtime_set_suspended(dev);
		else
			pm_runtime_set_active(dev);
		pm_runtime_enable(dev);
	}
	return rc;

That wouldn't need any special support from the PM core, other than the
new pm_runtime_in_use routine.

> > In addition, we only want to end up in RPM_SUSPENDED if the usage
> > counters are 0 (in particular, power/control = "auto").  Otherwise, we
> > want the resume routine always to spin up the drive, if it hasn't spun
> > up by itself.
> 
> That still can be done in .resume_early().  Or in .resume_noirq() if device
> interrupts are not requisite for that.

No, currently it cannot be done.

In both resume_early and resume_noirq, the usage counter is _always_ >
0.  This is because device_prepare calls pm_runtime_get_noresume, and
the corresponding pm_runtime_put doesn't occur until device_complete.  
That's the problem I need to fix.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Phillip Susi Jan. 9, 2014, 3:50 p.m. UTC | #4
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 1/9/2014 10:32 AM, Alan Stern wrote:
> I had in mind something more like this for the driver's .resume
> (or .resume_early) routine:
> 
> bool leave_suspended = false; int rc = 0;
> 
> if (!pm_runtime_in_use(dev) && !disk_is_spinning(dev)) 
> leave_suspended = true; else rc = spin_up_the_disk(dev);
> 
> if (rc == 0) { pm_runtime_disable(dev); if (leave_suspended) 
> pm_runtime_set_suspended(dev); else pm_runtime_set_active(dev); 
> pm_runtime_enable(dev); } return rc;

We also don't want to block the resume path while the REQUEST SENSE
runs, since it typically takes 10 seconds or so on normal ATA disks.
Thus, the resume or resume early needs to be able to return with the
device left in the transitioning state and complete or abort the
transition later.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJSzsVVAAoJEI5FoCIzSKrwLGAIAJj3GvPEcwtFvkJ6oAX+4fbf
z75OHkOkKtwOTY1SvqpgyYUMdrsTLRYO5ZPCCzZemow6s24xjEhtHVYONKogZ8Rv
lYzFpsZhWd7pLjHm5PDUSbEJiOvx9Ba+6JbRYGl9rInv2PdJOSscEpN+4a0/bvGU
F72tUfYI58su0RHcee9xXFRBOZkM61DCxNNXAyr+SHa7fIJi5jKrlNSbHPzBXAo0
g/qw8IY86a3SWE4cOGIfsrtL8GpmbwFNkT+BHWhaRlj2wj2VTP5FTZrOHndNL7pU
WC29Qxg2cKhSgm15BhNw3ESjn/3rdFaHsV96ietFKMlavXDqRmmwPsTmNK8Pbuo=
=5AE6
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Jan. 9, 2014, 4:08 p.m. UTC | #5
On Thu, 9 Jan 2014, Phillip Susi wrote:

> On 1/9/2014 10:32 AM, Alan Stern wrote:
> > I had in mind something more like this for the driver's .resume
> > (or .resume_early) routine:
> > 
> > bool leave_suspended = false; int rc = 0;
> > 
> > if (!pm_runtime_in_use(dev) && !disk_is_spinning(dev)) 
> > leave_suspended = true; else rc = spin_up_the_disk(dev);
> > 
> > if (rc == 0) { pm_runtime_disable(dev); if (leave_suspended) 
> > pm_runtime_set_suspended(dev); else pm_runtime_set_active(dev); 
> > pm_runtime_enable(dev); } return rc;
> 
> We also don't want to block the resume path while the REQUEST SENSE
> runs, since it typically takes 10 seconds or so on normal ATA disks.
> Thus, the resume or resume early needs to be able to return with the
> device left in the transitioning state and complete or abort the
> transition later.

The code will be restructured as part of the asynchronous spin-up 
change.  The resume routine would then look more like this:

	bool test_spinning = !pm_runtime_in_use(dev);

	pm_runtime_disable(dev);
	if (test_spinning)
		pm_runtime_set_suspended(dev);
	else
		pm_runtime_set_active(dev);
	pm_runtime_enable(dev);

	start_async_spin_up(dev, test_spinning);
	return 0;

The asynchronous routine would skip the initial test if its second 
argument is nonzero, and proceed directly to spin-up the disk.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Phillip Susi Jan. 9, 2014, 4:30 p.m. UTC | #6
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 1/9/2014 11:08 AM, Alan Stern wrote:
> The code will be restructured as part of the asynchronous spin-up 
> change.  The resume routine would then look more like this:
> 
> bool test_spinning = !pm_runtime_in_use(dev);
> 
> pm_runtime_disable(dev); if (test_spinning) 
> pm_runtime_set_suspended(dev); else pm_runtime_set_active(dev); 
> pm_runtime_enable(dev);
> 
> start_async_spin_up(dev, test_spinning); return 0;
> 
> The asynchronous routine would skip the initial test if its second
>  argument is nonzero, and proceed directly to spin-up the disk.

It isn't enough to background the spin up; the TEST will block, so it
needs to be async as well.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJSzs66AAoJEI5FoCIzSKrwY40H/1gAfH5KxfXvr9gHGMOHAdiw
VwUNKJTnADrEREOTEgWSPH8/VYyK2aFPjiGlWKvmZXMNC0CYLPRInSbhLx0cH4x1
uvezYnvVno/xME8QlLTqzVuQIrk3TKjyrd0rCBpj3YvAAyDy8TSGrJZfDAUixZkP
zoahqzIthPE52DirixVBYIamsDOZltHOjK2HbfTHZXo9m+dxQirDqL/ClQivt3Gq
e0IhRDEXsEb1b4ICQztuqNAp2BV57IzIHCd83+CTd/hkPalHFxJv6Y26IGYrORSZ
CRtXWMYiGs2/cIElSpIIQDUL60niqfFPaxsGRCT+AISwu/f4TRzAo2t/t+Vaa6c=
=aR9i
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Jan. 9, 2014, 5:04 p.m. UTC | #7
On Thu, 9 Jan 2014, Phillip Susi wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 1/9/2014 11:08 AM, Alan Stern wrote:
> > The code will be restructured as part of the asynchronous spin-up 
> > change.  The resume routine would then look more like this:
> > 
> > bool test_spinning = !pm_runtime_in_use(dev);
> > 
> > pm_runtime_disable(dev); if (test_spinning) 
> > pm_runtime_set_suspended(dev); else pm_runtime_set_active(dev); 
> > pm_runtime_enable(dev);
> > 
> > start_async_spin_up(dev, test_spinning); return 0;
> > 
> > The asynchronous routine would skip the initial test if its second
> >  argument is nonzero, and proceed directly to spin-up the disk.
> 
> It isn't enough to background the spin up; the TEST will block, so it
> needs to be async as well.

That's what I meant.  If the second argument is zero, the asynchronous 
routine will first test whether the disk is spinning.  If it is, the 
status will be set to RPM_ACTIVE.  (The routine may also send a START 
command -- for some disks this may be necessary.)

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Jan. 10, 2014, 1:25 a.m. UTC | #8
On Thursday, January 09, 2014 10:32:37 AM Alan Stern wrote:
> On Thu, 9 Jan 2014, Rafael J. Wysocki wrote:
> 
> > > The situation isn't quite so simple.
> > > 
> > > We're talking specifically about disk drives.  Depending on the drive,
> > > it may or may not spin up all by itself when the system wakes up.  If
> > > it does spin up, the final state should be RPM_ACTIVE (the way it is
> > > now).  If it doesn't, we want the final state to be RPM_SUSPENDED.
> > > This is true regardless of the runtime status before the system sleep.
> > 
> > OK
> > 
> > We can do all that in .resume_early() which runs with runtime PM disabled.
> 
> Either the .resume_early or the .resume routine would be okay, as far
> as I'm concerned.  Is there any strong reason to prefer one over the
> other, besides the fact that in .resume_early the pm_runtime_disable
> and _enable calls wouldn't be needed?

Not so much, at least I don't have anything like that in mind right now, but
I'm not a big fan of disabling/enabling runtime PM temporarily to have a look
at the "current" status.  That just doesn't seem quite right. :-)

> > > This means we can't skip the resume routine.  At the very least, the
> > > routine needs to ask the disk whether or not it is spinning.  Depending
> > > on the result, it would then set the RPM status appropriately.
> > 
> > Yes, .resume_early() can do that.  And it can set a (internal) flag for
> > a device telling the driver/bus type .resume() and .complete() to
> > leave the device alone.
> 
> I had in mind something more like this for the driver's .resume (or
> .resume_early) routine:
> 
> 	bool leave_suspended = false;
> 	int rc = 0;
> 
> 	if (!pm_runtime_in_use(dev) && !disk_is_spinning(dev))
> 		leave_suspended = true;
> 	else
> 		rc = spin_up_the_disk(dev);
> 
> 	if (rc == 0) {
> 		pm_runtime_disable(dev);
> 		if (leave_suspended)
> 			pm_runtime_set_suspended(dev);
> 		else
> 			pm_runtime_set_active(dev);
> 		pm_runtime_enable(dev);
> 	}
> 	return rc;
> 
> That wouldn't need any special support from the PM core, other than the
> new pm_runtime_in_use routine.

That looks reasonable, but in .resume_early() you can do it like this I think:

	bool disk_spinning = disk_is_spinning(dev);
	int rc = 0;

	if (!pm_runtime_in_use(dev) && !disk_spinning) {
		pm_runtime_set_suspended(dev);
	} else {
		pm_runtime_set_active(dev);
		if (!disk_spinning)
			rc = spin_up_the_disk(dev);
	}

and of course there needs to be a flag to pass to .resume() in case the
device is supposed to be left in the suspended state.

> > > In addition, we only want to end up in RPM_SUSPENDED if the usage
> > > counters are 0 (in particular, power/control = "auto").  Otherwise, we
> > > want the resume routine always to spin up the drive, if it hasn't spun
> > > up by itself.
> > 
> > That still can be done in .resume_early().  Or in .resume_noirq() if device
> > interrupts are not requisite for that.
> 
> No, currently it cannot be done.
> 
> In both resume_early and resume_noirq, the usage counter is _always_ >
> 0.  This is because device_prepare calls pm_runtime_get_noresume, and
> the corresponding pm_runtime_put doesn't occur until device_complete.  
> That's the problem I need to fix.

Hmm.  Why do you have to compare usage_count with 0 and not with 1?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Phillip Susi Jan. 10, 2014, 1:55 a.m. UTC | #9
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 01/09/2014 08:25 PM, Rafael J. Wysocki wrote:
> That looks reasonable, but in .resume_early() you can do it like
> this I think:
> 
> bool disk_spinning = disk_is_spinning(dev); int rc = 0;
> 
> if (!pm_runtime_in_use(dev) && !disk_spinning) { 
> pm_runtime_set_suspended(dev); } else { 
> pm_runtime_set_active(dev); if (!disk_spinning) rc =
> spin_up_the_disk(dev); }

You're not allowed to call pm_runtime_set_{suspended,active} without
disabling runtime pm.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCgAGBQJSz1L6AAoJEI5FoCIzSKrwNWoIAKXZ6E7Vcrhbf7odxxdp11nq
vmr76+RkywUvJWDeRkq5IrgXfepyc/tEdQURNXcUvquOwbxX6NfH/Sb4DYZ+UUva
nJ9Po/vCyQAZgXodK3In7cteY4NvZLGVt9m3JkaYHKs3vkDriusSLMI1ghVNZIkW
X/VrVWhlnl9XNYWzFZuCs3ccsYpDfQ0akrxqFDoEh/mj5UUVkVXQt+JpkGPuUDD4
8yR3omZ+OFwnjqHLtzC8AOI59wujY2NPjoohQH4Rd+JTJEmAu9Ml5fqkPg90z9b2
C2mnpkEBoDEO4L1pDrD1jmxCwZQApf0qFWfu8qo+OR/VtMnxF5Cpkd+O5H+0OQg=
=Ksl4
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Jan. 10, 2014, 1:35 p.m. UTC | #10
On Thursday, January 09, 2014 08:55:06 PM Phillip Susi wrote:
> On 01/09/2014 08:25 PM, Rafael J. Wysocki wrote:
> > That looks reasonable, but in .resume_early() you can do it like
> > this I think:
> >
> > bool disk_spinning = disk_is_spinning(dev); int rc = 0;
> >
> > if (!pm_runtime_in_use(dev) && !disk_spinning) {
> > pm_runtime_set_suspended(dev); } else {
> > pm_runtime_set_active(dev); if (!disk_spinning) rc =
> > spin_up_the_disk(dev); }
> 
> You're not allowed to call pm_runtime_set_{suspended,active} without
> disabling runtime pm.

Well, thanks for reminding me how my code works. ;-)

.resume_early() runs with runtime PM disabled.

Thanks!
Phillip Susi Jan. 10, 2014, 2:46 p.m. UTC | #11
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 1/10/2014 8:35 AM, Rafael J. Wysocki wrote:
> Well, thanks for reminding me how my code works. ;-)
> 
> .resume_early() runs with runtime PM disabled.

Ahh, then in that case we're back to not wanting to block the resume
path on disk_is_spinning().



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJS0Ae2AAoJEI5FoCIzSKrwiqcH/jaoQ9WtE124xT53OaD7XfZQ
C45ftL7iZG/8iF/oW48RaEa8l87yoisThDjdokuMqo6SEkxsxAegg+l9yz98tSoX
kTj3x7rE8wPAwC4JCq7+6Xu2XTiK9ujHmohwJNljsqIdhbuLo+Y4XWNrKUzag2WX
8rJbX7bPTfg0ijtOkYUHfpJyap1OkG007MIeH4qQ/vZVd8Pem3Oy1jLdyLtpUMB5
D5ebHo6IdJDJpxX2EDSyHsI5UKEDycH6VLkCz06RpeMBkL4jePzL76D//9qfw+Kc
RpOyoOFhYM5q2XeXRSjtPX4vrUe5EqdLfqYriIYiYjNNDqrirhsBDUQ5SdXr7zE=
=HLer
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Jan. 10, 2014, 3:25 p.m. UTC | #12
On Fri, 10 Jan 2014, Rafael J. Wysocki wrote:

> On Thursday, January 09, 2014 10:32:37 AM Alan Stern wrote:
> > On Thu, 9 Jan 2014, Rafael J. Wysocki wrote:
> > 
> > > > The situation isn't quite so simple.
> > > > 
> > > > We're talking specifically about disk drives.  Depending on the drive,
> > > > it may or may not spin up all by itself when the system wakes up.  If
> > > > it does spin up, the final state should be RPM_ACTIVE (the way it is
> > > > now).  If it doesn't, we want the final state to be RPM_SUSPENDED.
> > > > This is true regardless of the runtime status before the system sleep.
> > > 
> > > OK
> > > 
> > > We can do all that in .resume_early() which runs with runtime PM disabled.
> > 
> > Either the .resume_early or the .resume routine would be okay, as far
> > as I'm concerned.  Is there any strong reason to prefer one over the
> > other, besides the fact that in .resume_early the pm_runtime_disable
> > and _enable calls wouldn't be needed?
> 
> Not so much, at least I don't have anything like that in mind right now, but
> I'm not a big fan of disabling/enabling runtime PM temporarily to have a look
> at the "current" status.  That just doesn't seem quite right. :-)

The disable/enable is in order to _set_ the status, not to have a look 
at it.

> > I had in mind something more like this for the driver's .resume (or
> > .resume_early) routine:
> > 
> > 	bool leave_suspended = false;
> > 	int rc = 0;
> > 
> > 	if (!pm_runtime_in_use(dev) && !disk_is_spinning(dev))
> > 		leave_suspended = true;
> > 	else
> > 		rc = spin_up_the_disk(dev);
> > 
> > 	if (rc == 0) {
> > 		pm_runtime_disable(dev);
> > 		if (leave_suspended)
> > 			pm_runtime_set_suspended(dev);
> > 		else
> > 			pm_runtime_set_active(dev);
> > 		pm_runtime_enable(dev);
> > 	}
> > 	return rc;
> > 
> > That wouldn't need any special support from the PM core, other than the
> > new pm_runtime_in_use routine.
> 
> That looks reasonable, but in .resume_early() you can do it like this I think:
> 
> 	bool disk_spinning = disk_is_spinning(dev);
> 	int rc = 0;
> 
> 	if (!pm_runtime_in_use(dev) && !disk_spinning) {
> 		pm_runtime_set_suspended(dev);
> 	} else {
> 		pm_runtime_set_active(dev);
> 		if (!disk_spinning)
> 			rc = spin_up_the_disk(dev);
> 	}
> 
> and of course there needs to be a flag to pass to .resume() in case the
> device is supposed to be left in the suspended state.

Agreed, there are various ways to get the same end result.  My question 
here is: How should we implement pm_runtime_in_use()?

> > > > In addition, we only want to end up in RPM_SUSPENDED if the usage
> > > > counters are 0 (in particular, power/control = "auto").  Otherwise, we
> > > > want the resume routine always to spin up the drive, if it hasn't spun
> > > > up by itself.
> > > 
> > > That still can be done in .resume_early().  Or in .resume_noirq() if device
> > > interrupts are not requisite for that.
> > 
> > No, currently it cannot be done.
> > 
> > In both resume_early and resume_noirq, the usage counter is _always_ >
> > 0.  This is because device_prepare calls pm_runtime_get_noresume, and
> > the corresponding pm_runtime_put doesn't occur until device_complete.  
> > That's the problem I need to fix.
> 
> Hmm.  Why do you have to compare usage_count with 0 and not with 1?

It is an awkward problem.  During a system sleep transition,
pm_runtime_in_use() should return 0 if the usage_count is 1 (because of
the pm_runtime_get_noresume() in device_prepare()).

But at other times (i.e., during normal operation), pm_runtime_in_use()  
should return 0 if the usage_count is 0.

I suppose a simple way to solve the problem is to document that
pm_runtime_in_use() should only be invoked from within a system sleep
callback routine.  Then we can compare against 1 and not worry about
the normal-operation case.  Maybe call it
pm_runtime_usage_during_sleep() to make this restriction explicit.

Would that be acceptable?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Jan. 10, 2014, 11:02 p.m. UTC | #13
On Friday, January 10, 2014 10:25:32 AM Alan Stern wrote:
> On Fri, 10 Jan 2014, Rafael J. Wysocki wrote:
> 
> > On Thursday, January 09, 2014 10:32:37 AM Alan Stern wrote:
> > > On Thu, 9 Jan 2014, Rafael J. Wysocki wrote:
> > > 
> > > > > The situation isn't quite so simple.
> > > > > 
> > > > > We're talking specifically about disk drives.  Depending on the drive,
> > > > > it may or may not spin up all by itself when the system wakes up.  If
> > > > > it does spin up, the final state should be RPM_ACTIVE (the way it is
> > > > > now).  If it doesn't, we want the final state to be RPM_SUSPENDED.
> > > > > This is true regardless of the runtime status before the system sleep.
> > > > 
> > > > OK
> > > > 
> > > > We can do all that in .resume_early() which runs with runtime PM disabled.
> > > 
> > > Either the .resume_early or the .resume routine would be okay, as far
> > > as I'm concerned.  Is there any strong reason to prefer one over the
> > > other, besides the fact that in .resume_early the pm_runtime_disable
> > > and _enable calls wouldn't be needed?
> > 
> > Not so much, at least I don't have anything like that in mind right now, but
> > I'm not a big fan of disabling/enabling runtime PM temporarily to have a look
> > at the "current" status.  That just doesn't seem quite right. :-)
> 
> The disable/enable is in order to _set_ the status, not to have a look 
> at it.

Right.

I should have said that in my opinion disabling runtime PM in order to do
anything with the fields owned by it and re-enabling it right after that
wasn't quite right in principle.  Re-adjusting those fields *before* we
re-enable runtime PM during system resume is quite different from doing
the on-off trick *after* we've re-enabled runtime PM.

So, my opinion is that the runtime PM internal information should be made
reflect the actual (and/or desirable) state of things before it is re-enabled,
which practically means in the .resume_noirq()/.resume_early() time frame.

Doing it after that might actually work, but is questionable from the
overall consistency perspective (it basically would mean that we ran with
wrong runtime PM data for a while after we'd re-enabled it and before it
got adjusted, so we shouldn't have re-enabled it in the first place).

> > > I had in mind something more like this for the driver's .resume (or
> > > .resume_early) routine:
> > > 
> > > 	bool leave_suspended = false;
> > > 	int rc = 0;
> > > 
> > > 	if (!pm_runtime_in_use(dev) && !disk_is_spinning(dev))
> > > 		leave_suspended = true;
> > > 	else
> > > 		rc = spin_up_the_disk(dev);
> > > 
> > > 	if (rc == 0) {
> > > 		pm_runtime_disable(dev);
> > > 		if (leave_suspended)
> > > 			pm_runtime_set_suspended(dev);
> > > 		else
> > > 			pm_runtime_set_active(dev);
> > > 		pm_runtime_enable(dev);
> > > 	}
> > > 	return rc;
> > > 
> > > That wouldn't need any special support from the PM core, other than the
> > > new pm_runtime_in_use routine.
> > 
> > That looks reasonable, but in .resume_early() you can do it like this I think:
> > 
> > 	bool disk_spinning = disk_is_spinning(dev);
> > 	int rc = 0;
> > 
> > 	if (!pm_runtime_in_use(dev) && !disk_spinning) {
> > 		pm_runtime_set_suspended(dev);
> > 	} else {
> > 		pm_runtime_set_active(dev);
> > 		if (!disk_spinning)
> > 			rc = spin_up_the_disk(dev);
> > 	}
> > 
> > and of course there needs to be a flag to pass to .resume() in case the
> > device is supposed to be left in the suspended state.
> 
> Agreed, there are various ways to get the same end result.  My question 
> here is: How should we implement pm_runtime_in_use()?

In my opinion what really matters is (a) whether or not the pm_runtime_enable()
in device_resume_early() will actually enable runtime PM and (b) whether or not
the pm_runtime_put() in device_complete() will actually make it possible to
suspend the device.  So, these are the conditions that I would check.

> > > > > In addition, we only want to end up in RPM_SUSPENDED if the usage
> > > > > counters are 0 (in particular, power/control = "auto").  Otherwise, we
> > > > > want the resume routine always to spin up the drive, if it hasn't spun
> > > > > up by itself.
> > > > 
> > > > That still can be done in .resume_early().  Or in .resume_noirq() if device
> > > > interrupts are not requisite for that.
> > > 
> > > No, currently it cannot be done.
> > > 
> > > In both resume_early and resume_noirq, the usage counter is _always_ >
> > > 0.  This is because device_prepare calls pm_runtime_get_noresume, and
> > > the corresponding pm_runtime_put doesn't occur until device_complete.  
> > > That's the problem I need to fix.
> > 
> > Hmm.  Why do you have to compare usage_count with 0 and not with 1?
> 
> It is an awkward problem.  During a system sleep transition,
> pm_runtime_in_use() should return 0 if the usage_count is 1 (because of
> the pm_runtime_get_noresume() in device_prepare()).
> 
> But at other times (i.e., during normal operation), pm_runtime_in_use()  
> should return 0 if the usage_count is 0.
> 
> I suppose a simple way to solve the problem is to document that
> pm_runtime_in_use() should only be invoked from within a system sleep
> callback routine.  Then we can compare against 1 and not worry about
> the normal-operation case.  Maybe call it
> pm_runtime_usage_during_sleep() to make this restriction explicit.
> 
> Would that be acceptable?

Yes, it would, as far as I'm concerned.

In my opinion it only is really useful to call it before calling
pm_runtime_enable() in device_resume_early().

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Phillip Susi Jan. 11, 2014, 2:08 a.m. UTC | #14
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 01/10/2014 06:02 PM, Rafael J. Wysocki wrote:
> So, my opinion is that the runtime PM internal information should
> be made reflect the actual (and/or desirable) state of things
> before it is re-enabled, which practically means in the
> .resume_noirq()/.resume_early() time frame.
> 
> Doing it after that might actually work, but is questionable from
> the overall consistency perspective (it basically would mean that
> we ran with wrong runtime PM data for a while after we'd re-enabled
> it and before it got adjusted, so we shouldn't have re-enabled it
> in the first place).

But we don't want to block the system resume path for too long while
we figure out what the correct state is.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCgAGBQJS0KedAAoJEI5FoCIzSKrw41EH/0phrR8FKj2D8H2YidMluZ0/
0crckeYkz4VMEkxMrzjnFl1sxwaWLfYW1GOpJ6FnGZqArjLr0GktOHBJfVTWqNqI
bNIFW5rKX/JKgns3Nmv1Zx5twxHuIfVtAF2mjgorabyKyRpaVwsiZa9ogLgBnUVr
ipLYX1mBRvmoMPDHYB3/wrefc4c0Eg4n1O24/wVhehxtsBCeN9BKUUEcIRvkiRUj
FwMmWXuZMlTGK2LhjmgCMuH7Ea9r8sAJWUkGA8zJJLuS8y7kK9grUwAW5IhfrJGH
Wg5sBce8sKhoX7h5gze+rVktrM1B4DDzEt4J356lmIjfeQ75rUrHMzEDvzh021M=
=+ubG
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Jan. 11, 2014, 10:34 p.m. UTC | #15
On Sat, 11 Jan 2014, Rafael J. Wysocki wrote:

> I should have said that in my opinion disabling runtime PM in order to do
> anything with the fields owned by it and re-enabling it right after that
> wasn't quite right in principle.  Re-adjusting those fields *before* we
> re-enable runtime PM during system resume is quite different from doing
> the on-off trick *after* we've re-enabled runtime PM.
> 
> So, my opinion is that the runtime PM internal information should be made
> reflect the actual (and/or desirable) state of things before it is re-enabled,
> which practically means in the .resume_noirq()/.resume_early() time frame.

That makes sense.

> Doing it after that might actually work, but is questionable from the
> overall consistency perspective (it basically would mean that we ran with
> wrong runtime PM data for a while after we'd re-enabled it and before it
> got adjusted, so we shouldn't have re-enabled it in the first place).

Of course, the middle of a system resume is a rather special time.  
Still, I agree with your basic point.

> > Agreed, there are various ways to get the same end result.  My question 
> > here is: How should we implement pm_runtime_in_use()?
> 
> In my opinion what really matters is (a) whether or not the pm_runtime_enable()
> in device_resume_early() will actually enable runtime PM and (b) whether or not
> the pm_runtime_put() in device_complete() will actually make it possible to
> suspend the device.  So, these are the conditions that I would check.

Yes.  The key being that the check will give the correct result only
during the _noirq and _late/_early phases of system sleep, not at other
times.

> > I suppose a simple way to solve the problem is to document that
> > pm_runtime_in_use() should only be invoked from within a system sleep
> > callback routine.  Then we can compare against 1 and not worry about
> > the normal-operation case.  Maybe call it
> > pm_runtime_usage_during_sleep() to make this restriction explicit.
> > 
> > Would that be acceptable?
> 
> Yes, it would, as far as I'm concerned.
> 
> In my opinion it only is really useful to call it before calling
> pm_runtime_enable() in device_resume_early().

Okay, I'll put something together.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Jan. 11, 2014, 10:50 p.m. UTC | #16
On Fri, 10 Jan 2014, Phillip Susi wrote:

> On 01/10/2014 06:02 PM, Rafael J. Wysocki wrote:
> > So, my opinion is that the runtime PM internal information should
> > be made reflect the actual (and/or desirable) state of things
> > before it is re-enabled, which practically means in the
> > .resume_noirq()/.resume_early() time frame.
> > 
> > Doing it after that might actually work, but is questionable from
> > the overall consistency perspective (it basically would mean that
> > we ran with wrong runtime PM data for a while after we'd re-enabled
> > it and before it got adjusted, so we shouldn't have re-enabled it
> > in the first place).
> 
> But we don't want to block the system resume path for too long while
> we figure out what the correct state is.

We won't.  The resume_early routine will set the status to RPM_ACTIVE 
or RPM_SUSPENDED, according to the result from 
pm_runtime_usage_during_sleep.  It will also do a pm_runtime_get_sync 
on the disk's parent, to prevent the parent from going into runtime 
suspend before the async routine can run.

Then the async routine will do the following:

	If the status is RPM_ACTIVE, spin up the disk.

	Else (the status is RPM_SUSPENDED) check whether the disk has
	spun up by itself.  If it has, or if you can't tell, call 
	pm_runtime_resume to set the status correctly and do an 
	explicit spin-up.

	Call pm_runtime_put_sync on the parent.

I think this will give the desired result.  But there is one potential
problem: If the disk has a child device (such as a partition) and the
child's driver needs to do I/O as part of its resume routine, the I/O
is likely to hang if the disk is runtime suspended.  However, I don't
know of any child devices like this.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Phillip Susi Jan. 12, 2014, 1:50 a.m. UTC | #17
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 01/11/2014 05:50 PM, Alan Stern wrote:
> We won't.  The resume_early routine will set the status to
> RPM_ACTIVE or RPM_SUSPENDED, according to the result from 
> pm_runtime_usage_during_sleep.  It will also do a
> pm_runtime_get_sync on the disk's parent, to prevent the parent
> from going into runtime suspend before the async routine can run.
> 
> Then the async routine will do the following:
> 
> If the status is RPM_ACTIVE, spin up the disk.
> 
> Else (the status is RPM_SUSPENDED) check whether the disk has spun
> up by itself.  If it has, or if you can't tell, call 
> pm_runtime_resume to set the status correctly and do an explicit
> spin-up.
> 
> Call pm_runtime_put_sync on the parent.

I see now... thanks.

> I think this will give the desired result.  But there is one
> potential problem: If the disk has a child device (such as a
> partition) and the child's driver needs to do I/O as part of its
> resume routine, the I/O is likely to hang if the disk is runtime
> suspended.  However, I don't know of any child devices like this.

Probably should check dm and md to see if they do IO in their resume.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCgAGBQJS0fTbAAoJEI5FoCIzSKrwTVoH/3Cl4ZEmE+5aUJX5+mCGmoPD
Ap+UHe517qsfDWwKaDqURTcXaZOqMfbk71jdTCAIIVTSPB95NWFQg3d/XpWAVY29
yoXdUj28WHTlgtfhtqa92K7lImOijxJ/32myEK9/+dQBDXXnG4n+jt1eL+vo9Rhy
vlcAJLKcCsL4SpDD8z/N2lvsmJ5F1vIB5SkM7GiyaR3HHfRIM3hzV4JErtHozgWE
x2zNfD6DXk7CR3GnISq2bznIYvRlfsmvzwCjfJ3AYBJdtFQmuwMDznaVKlCpy/Ya
G07n0HqghB3ZYseZLZt8HyB/men72SZolJCTb/UnkODTsymq9+6J3XrJ6bb29Vc=
=V+EM
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -532,9 +532,12 @@  struct dev_pm_info {
 	struct wakeup_source	*wakeup;
 	bool			wakeup_path:1;
 	bool			syscore:1;
+#ifdef CONFIG_PM_RUNTIME
+	bool			skip_resume:1;
+#endif
 #else
 	unsigned int		should_wakeup:1;
-#endif
+#endif /* CONFIG_PM_SLEEP */
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list	suspend_timer;
 	unsigned long		timer_expires;
@@ -561,7 +564,7 @@  struct dev_pm_info {
 	unsigned long		active_jiffies;
 	unsigned long		suspended_jiffies;
 	unsigned long		accounting_timestamp;
-#endif
+#endif /* CONFIG_PM_RUNTIME */
 	struct pm_subsys_data	*subsys_data;  /* Owned by the subsystem. */
 	struct dev_pm_qos	*qos;
 };
@@ -716,4 +719,10 @@  enum dpm_order {
 	DPM_ORDER_DEV_LAST,
 };
 
+#if defined(CONFIG_PM_RUNTIME) && defined(CONFIG_PM_SLEEP)
+void device_pm_skip_resume(struct device *dev, bool enable);
+#else
+static inline void device_pm_skip_resume(struct device *dev, bool enable) {}
+#endif /* CONFIG_PM_RUNTIME && CONFIG_PM_SLEEP */
+
 #endif /* _LINUX_PM_H */
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -384,6 +384,29 @@  static int dpm_run_callback(pm_callback_
 	return error;
 }
 
+#ifdef CONFIG_PM_RUNTIME
+void device_pm_skip_resume(struct device *dev, bool enable)
+{
+	device_pm_lock();
+
+	if (!dev->power.is_prepared)
+		dev->power.skip_resume = !!enable;
+
+	device_pm_unlock();
+}
+
+static inline bool skip_device_resume(struct device *dev, pm_message_t state)
+{
+	return dev->power.skip_resume && (state.event == PM_EVENT_RESUME
+		|| state.event == PM_EVENT_RESTORE);
+}
+#else
+static inline bool skip_device_resume(struct device *dev, pm_message_t state)
+{
+	return false;
+}
+#endif /* CONFIG_PM_RUNTIME */
+
 /*------------------------- Resume routines -------------------------*/
 
 /**
@@ -446,21 +469,24 @@  static void dpm_resume_noirq(pm_message_
 	mutex_lock(&dpm_list_mtx);
 	while (!list_empty(&dpm_noirq_list)) {
 		struct device *dev = to_device(dpm_noirq_list.next);
-		int error;
 
 		get_device(dev);
 		list_move_tail(&dev->power.entry, &dpm_late_early_list);
-		mutex_unlock(&dpm_list_mtx);
+		if (!skip_device_resume(dev, state)) {
+			int error;
 
-		error = device_resume_noirq(dev, state);
-		if (error) {
-			suspend_stats.failed_resume_noirq++;
-			dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
-			dpm_save_failed_dev(dev_name(dev));
-			pm_dev_err(dev, state, " noirq", error);
-		}
+			mutex_unlock(&dpm_list_mtx);
+
+			error = device_resume_noirq(dev, state);
+			if (error) {
+				suspend_stats.failed_resume_noirq++;
+				dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
+				dpm_save_failed_dev(dev_name(dev));
+				pm_dev_err(dev, state, " noirq", error);
+			}
 
-		mutex_lock(&dpm_list_mtx);
+			mutex_lock(&dpm_list_mtx);
+		}
 		put_device(dev);
 	}
 	mutex_unlock(&dpm_list_mtx);
@@ -527,21 +553,39 @@  static void dpm_resume_early(pm_message_
 	mutex_lock(&dpm_list_mtx);
 	while (!list_empty(&dpm_late_early_list)) {
 		struct device *dev = to_device(dpm_late_early_list.next);
-		int error;
 
 		get_device(dev);
 		list_move_tail(&dev->power.entry, &dpm_suspended_list);
-		mutex_unlock(&dpm_list_mtx);
+		if (skip_device_resume(dev, state)) {
+			pm_runtime_set_suspended(dev);
+			pm_runtime_enable(dev);
+			/*
+			 * Balance the pm_runtime_get_noresume() in
+			 * device_prepare().
+			 */
+			pm_runtime_put_noidle(dev);
+			/*
+			 * The device might have been powered up by the platform
+			 * firmware already, so make it resume and then possibly
+			 * suspend again to avoid leaving powered up devices as
+			 * "suspended" for too long.
+			 */
+			pm_request_resume(dev);
+		} else {
+			int error;
 
-		error = device_resume_early(dev, state);
-		if (error) {
-			suspend_stats.failed_resume_early++;
-			dpm_save_failed_step(SUSPEND_RESUME_EARLY);
-			dpm_save_failed_dev(dev_name(dev));
-			pm_dev_err(dev, state, " early", error);
-		}
+			mutex_unlock(&dpm_list_mtx);
+
+			error = device_resume_early(dev, state);
+			if (error) {
+				suspend_stats.failed_resume_early++;
+				dpm_save_failed_step(SUSPEND_RESUME_EARLY);
+				dpm_save_failed_dev(dev_name(dev));
+				pm_dev_err(dev, state, " early", error);
+			}
 
-		mutex_lock(&dpm_list_mtx);
+			mutex_lock(&dpm_list_mtx);
+		}
 		put_device(dev);
 	}
 	mutex_unlock(&dpm_list_mtx);
@@ -682,6 +726,10 @@  void dpm_resume(pm_message_t state)
 
 	list_for_each_entry(dev, &dpm_suspended_list, power.entry) {
 		INIT_COMPLETION(dev->power.completion);
+		if (skip_device_resume(dev, state)) {
+			complete(&dev->power.completion);
+			continue;
+		}
 		if (is_async(dev)) {
 			get_device(dev);
 			async_schedule(async_resume, dev);
@@ -691,7 +739,7 @@  void dpm_resume(pm_message_t state)
 	while (!list_empty(&dpm_suspended_list)) {
 		dev = to_device(dpm_suspended_list.next);
 		get_device(dev);
-		if (!is_async(dev)) {
+		if (!is_async(dev) && !skip_device_resume(dev, state)) {
 			int error;
 
 			mutex_unlock(&dpm_list_mtx);
@@ -780,11 +828,13 @@  void dpm_complete(pm_message_t state)
 		get_device(dev);
 		dev->power.is_prepared = false;
 		list_move(&dev->power.entry, &list);
-		mutex_unlock(&dpm_list_mtx);
+		if (!skip_device_resume(dev, state)) {
+			mutex_unlock(&dpm_list_mtx);
 
-		device_complete(dev, state);
+			device_complete(dev, state);
 
-		mutex_lock(&dpm_list_mtx);
+			mutex_lock(&dpm_list_mtx);
+		}
 		put_device(dev);
 	}
 	list_splice(&list, &dpm_list);