
[2/6] PM: Asynchronous resume of devices

Message ID 200908262221.09224.rjw@sisk.pl (mailing list archive)
State Superseded, archived

Commit Message

Rafael Wysocki Aug. 26, 2009, 8:21 p.m. UTC
From: Rafael J. Wysocki <rjw@sisk.pl>

Theoretically, the total time of system sleep transitions (suspend
to RAM, hibernation) can be reduced by running suspend and resume
callbacks of device drivers in parallel with each other.  However,
there are dependencies between devices such that, for example, we may
not be allowed to put one device into a low power state before
another one has been suspended (e.g. we cannot suspend a bridge
before suspending all devices behind it).  In particular, we're not
allowed to suspend the parent of a device before suspending the
device itself.  Analogously, we're not allowed to resume a device
before resuming its parent.

Thus, to make it possible to execute suspend and resume callbacks
provided by device drivers in parallel with each other, we need to
provide a synchronization mechanism preventing the dependencies
between devices from being violated.

The patch below allows some devices to be resumed asynchronously,
with the following rules:
(1) There is a wait queue head, dev->power.wait_queue, and an
    "operation complete" flag, dev->power.op_complete, for each device
    object.
(2) All of the power.op_complete flags are reset before suspend as
    well as after each resume stage (dpm_resume_noirq(),
    dpm_resume()).
(3) If power.async_suspend is set for dev or for one of the devices it
    depends on, the PM core waits for the "master" device's
    power.op_complete flag to be set before attempting to run the
    resume callbacks appropriate for this particular stage of resume
    for dev.
(4) dev->power.op_complete is set for each device after running its
    resume callbacks at each stage of resume (dpm_resume_noirq(),
    dpm_resume()) and all threads waiting in the device's wait queue
    are woken up (a simplified sketch of this handshake follows the
    list).
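
For illustration, here is a minimal sketch of that handshake (simplified;
run_resume_callbacks() is a placeholder for the stage-appropriate callbacks,
not a function from the patch):

#include <linux/device.h>
#include <linux/wait.h>

/* Simplified sketch of rules (3) and (4) for a single device. */
static void resume_one_device(struct device *dev, struct device *master)
{
	/* Rule (3): wait until the "master" device has been handled. */
	if (master)
		wait_event(master->power.wait_queue,
			   master->power.op_complete);

	run_resume_callbacks(dev);	/* placeholder for this stage */

	/* Rule (4): mark the operation complete and wake up any waiters. */
	dev->power.op_complete = true;
	wake_up_all(&dev->power.wait_queue);
}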

With this mechanism in place, drivers that want their resume
callbacks to be executed asynchronously can set
dev->power.async_suspend for their devices with the help of
device_enable_async_suspend().  In addition to that, the off-tree PM
dependencies between devices have to be represented by
'struct pm_link' objects introduced by the previous patch.
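
For example, a driver could do this from its probe routine (illustrative
"foo" driver; device_enable_async_suspend() is the helper added by this
patch):

#include <linux/device.h>

/* Illustrative only: opt the device in to asynchronous suspend/resume. */
static int foo_probe(struct device *dev)
{
	device_enable_async_suspend(dev, true);
	return 0;
}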

In this version of the patch, the async threads started to execute
the resume callbacks of a specific device don't exit immediately after
having done that, but search dpm_list for devices whose PM dependencies
have already been satisfied and execute their callbacks without waiting.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/power/common.c |    5 
 drivers/base/power/main.c   |  327 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/device.h      |    6 
 include/linux/pm.h          |    6 
 4 files changed, 329 insertions(+), 15 deletions(-)



Comments

Alan Stern Aug. 28, 2009, 3:43 p.m. UTC | #1
On Wed, 26 Aug 2009, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Theoretically, the total time of system sleep transitions (suspend
> to RAM, hibernation) can be reduced by running suspend and resume
> callbacks of device drivers in parallel with each other.  However,
> there are dependencies between devices such that, for example, we may
> not be allowed to put one device into a low power state before
> anohter one has been suspended (e.g. we cannot suspend a bridge
> before suspending all devices behind it).  In particular, we're not
> allowed to suspend the parent of a device before suspending the
> device itself.  Analogously, we're not allowed to resume a device
> before resuming its parent.

> In this version of the patch the async threads started to execute
> the resume callbacks of specific device don't exit immediately having
> done that, but search dpm_list for devices whose PM dependencies have
> already been satisfied and execute their callbacks without waiting.

Given this design, why bother to invoke device_resume() for the async 
devices?  Why not just start up a bunch of async threads, each of which 
calls async_resume() repeatedly until everything is finished?  (And 
rearrange async_resume() to scan the list first and do the actual 
resume second.)

The same goes for the noirq versions.

Alan Stern

Rafael Wysocki Aug. 28, 2009, 7:38 p.m. UTC | #2
On Friday 28 August 2009, Alan Stern wrote:
> On Wed, 26 Aug 2009, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Theoretically, the total time of system sleep transitions (suspend
> > to RAM, hibernation) can be reduced by running suspend and resume
> > callbacks of device drivers in parallel with each other.  However,
> > there are dependencies between devices such that, for example, we may
> > not be allowed to put one device into a low power state before
> > anohter one has been suspended (e.g. we cannot suspend a bridge
> > before suspending all devices behind it).  In particular, we're not
> > allowed to suspend the parent of a device before suspending the
> > device itself.  Analogously, we're not allowed to resume a device
> > before resuming its parent.
> 
> > In this version of the patch the async threads started to execute
> > the resume callbacks of specific device don't exit immediately having
> > done that, but search dpm_list for devices whose PM dependencies have
> > already been satisfied and execute their callbacks without waiting.
> 
> Given this design, why bother to invoke device_resume() for the async 
> devices?  Why not just start up a bunch of async threads, each of which 
> calls async_resume() repeatedly until everything is finished?  (And 
> rearrange async_resume() to scan the list first and do the actual 
> resume second.)
> 
> The same goes for the noirq versions.

I thought about that, but there are a few things to figure out:
- how many threads to start
- when to start them
- stop condition
I had a few ideas, but then I thought it would be simpler to start an async
thread when we know there's some async work to do (ie. there's an async
device to handle) and make each async thread browse the list just once (the
rationale is that we've just handled a device, so there's a chance there are
some devices with satisfied dependencies down the list).

Thanks,
Rafael
Alan Stern Aug. 28, 2009, 7:59 p.m. UTC | #3
On Fri, 28 Aug 2009, Rafael J. Wysocki wrote:

> > Given this design, why bother to invoke device_resume() for the async 
> > devices?  Why not just start up a bunch of async threads, each of which 
> > calls async_resume() repeatedly until everything is finished?  (And 
> > rearrange async_resume() to scan the list first and do the actual 
> > resume second.)
> > 
> > The same goes for the noirq versions.
> 
> I thought about that, but there are a few things to figure out:
> - how many threads to start

That's a tough question.  Right now you start roughly as many threads
as there are async devices.  That seems like overkill.

I would expect that a reasonably small number of threads would suffice 
to achieve most of the possible time savings.  Something on the order 
of 10 should work well.  If the majority of the time is spent 
handling N devices then N+1 threads would be enough.  Judging from some 
of the comments posted earlier, even 4 threads would give a big 
advantage.

> - when to start them

You might as well start them at the beginning of dpm_resume and 
dpm_resume_noirq.  That way they can overlap with the synchronous 
operations.

> - stop condition

When an error occurs or when op_started has been set for every 
remaining async device.

> I had a few ideas, but then I thought it would be simpler to start an async
> thread when we know there's some async work to do (ie. there's an async
> device to handle) and make each async thread browse the list just once (the
> rationale is that we've just handled a device, so there's a chance there are
> some devices with satisfied dependencies down the list).

It comes down to this: Should there be many threads, each of which 
browses the list only once, or should there be a few threads, each of 
which browses the list many times?

Alan Stern

Rafael Wysocki Aug. 28, 2009, 10:22 p.m. UTC | #4
On Friday 28 August 2009, Alan Stern wrote:
> On Fri, 28 Aug 2009, Rafael J. Wysocki wrote:
> 
> > > Given this design, why bother to invoke device_resume() for the async 
> > > devices?  Why not just start up a bunch of async threads, each of which 
> > > calls async_resume() repeatedly until everything is finished?  (And 
> > > rearrange async_resume() to scan the list first and do the actual 
> > > resume second.)
> > > 
> > > The same goes for the noirq versions.
> > 
> > I thought about that, but there are a few things to figure out:
> > - how many threads to start
> 
> That's a tough question.  Right now you start roughly as many threads
> as there are async devices.  That seems like overkill.

In fact they are substantially fewer than that, for the following reasons.

First, the async framework will not start more than MAX_THREADS threads,
which is 256 at the moment.  This number is less than the number of async
devices to handle on an average system.

Second, no new async threads are started while the main thread is handling the
sync devices, so the existing threads have a chance to do their job.  If
there's a "cluster" of sync devices in dpm_list, the number of async threads
running is likely to drop rapidly while those devices are being handled.
(BTW, if there were no sync devices, the whole thing would be much simpler,
but I don't think it's realistic to assume we'll be able to get rid of them any
time soon).

Finally, but not least importantly, async threads are not started for the
async devices that were previously handled "out of order" by the already
running async threads (or by async threads that have already finished).  My
testing shows that there are quite a few of them on the average.  For example,
on the HP nx6325 typically there are as many as 580 async devices handled "out
of order" during a _single_ suspend-resume cycle (including the "early" and
"late" phases), while only a few (below 10) devices are waited for by at least
one async thread.

I can try to monitor the number of async threads started if you're interested.

> I would expect that a reasonably small number of threads would suffice 
> to achieve most of the possible time savings.  Something on the order 
> of 10 should work well.  If the majority of the time is spent 
> handling N devices then N+1 threads would be enough.  Judging from some 
> of the comments posted earlier, even 4 threads would give a big 
> advantage.

That unfortunately is not the case with the set of async devices including
PCI, ACPI and serio devices only.  The average time savings are between 5% and
14%, depending on the system and the phase of the cycle (the relative savings
are typically greater for suspend).  Still, that amounts to 0.5 s in some cases.

> > - when to start them
> 
> You might as well start them at the beginning of dpm_resume and 
> dpm_resume_noirq.  That way they can overlap with the synchronous 
> operations.

In that case they would have to wait in the beginning, so I'd need a mechanism
to wake them up.

Alternatively, there could be a limit to the number of async threads started
within the current design, but I'd prefer to leave that to the async framework
(namely, if MAX_THREADS makes sense for boot, it's also likely to make sense
for PM).

> > - stop condition
> 
> When an error occurs or when op_started has been set for every 
> remaining async device.

Yeah, that's the easy one. :-)

> > I had a few ideas, but then I thought it would be simpler to start an async
> > thread when we know there's some async work to do (ie. there's an async
> > device to handle) and make each async thread browse the list just once (the
> > rationale is that we've just handled a device, so there's a chance there are
> > some devices with satisfied dependencies down the list).
> 
> It comes down to this: Should there be many threads, each of which 
> browses the list only once, or should there be a few threads, each of 
> which browses the list many times?

Well, quite obviously I prefer the many threads version. :-)

Thanks,
Rafael
Alan Stern Aug. 29, 2009, 2:06 a.m. UTC | #5
On Sat, 29 Aug 2009, Rafael J. Wysocki wrote:

> On Friday 28 August 2009, Alan Stern wrote:
> > On Fri, 28 Aug 2009, Rafael J. Wysocki wrote:
> > 
> > > > Given this design, why bother to invoke device_resume() for the async 
> > > > devices?  Why not just start up a bunch of async threads, each of which 
> > > > calls async_resume() repeatedly until everything is finished?  (And 
> > > > rearrange async_resume() to scan the list first and do the actual 
> > > > resume second.)
> > > > 
> > > > The same goes for the noirq versions.
> > > 
> > > I thought about that, but there are a few things to figure out:
> > > - how many threads to start
> > 
> > That's a tough question.  Right now you start roughly as many threads
> > as there are async devices.  That seems like overkill.
> 
> In fact they are substantially fewer than that, for the following reasons.
> 
> First, the async framework will not start more than MAX_THREADS threads,
> which is 256 at the moment.  This number is less than the number of async
> devices to handle on an average system.

Okay, but MAX_THREADS isn't under your control.  Remember also that 
each thread takes up some memory, and during hibernation we are in a 
memory-constrained situation.

> Second, no new async threads are started while the main thread is handling the
> sync devices , so the existing threads have a chance to do their job.  If
> there's a "cluster" of sync devices in dpm_list, the number of async threads
> running is likely to drop rapidly while those devices are being handled.
> (BTW, if there were no sync devices, the whole thing would be much simpler,
> but I don't think it's realistic to assume we'll be able to get rid of them any
> time soon).

Perhaps not, but it would be interesting to see what happens if every 
device is async.  Maybe you can try it and get a meaningful result.

> Finally, but not least importantly, async threads are not started for the
> async devices that were previously handled "out of order" by the already
> running async threads (or by async threads that have already finished).  My
> testing shows that there are quite a few of them on the average.  For example,
> on the HP nx6325 typically there are as many as 580 async devices handled "out
> of order" during a _single_ suspend-resume cycle (including the "early" and
> "late" phases), while only a few (below 10) devices are waited for by at least
> one async thread.

That is a difficult sort of thing to know in advance.  It ought to be 
highly influenced by the percentage of async devices; that's another 
reason for wanting to know what happens when every device is async.

> > I would expect that a reasonably small number of threads would suffice 
> > to achieve most of the possible time savings.  Something on the order 
> > of 10 should work well.  If the majority of the time is spent 
> > handling N devices then N+1 threads would be enough.  Judging from some 
> > of the comments posted earlier, even 4 threads would give a big 
> > advantage.
> 
> That unfortunately is not the case with the set of async devices including
> PCI, ACPI and serio devices only.  The average time savings are between 5% to
> 14%, depending on the system and the phase of the cycle (the relative savings
> are typically greater for suspend).  Still, that amounts to .5 s in some cases.

Without context it's hard to be sure, but I don't think your numbers 
contradict what I said.  If you get between 5% and 14% time savings 
with 14 threads, then you might get between 4% and 10% savings with 
only 4 threads.

I must agree, 14 threads isn't a lot.  But at the moment that number is 
random, not under your control.

> > > - when to start them
> > 
> > You might as well start them at the beginning of dpm_resume and 
> > dpm_resume_noirq.  That way they can overlap with the synchronous 
> > operations.
> 
> In that case they would have to wait in the beginning, so I'd need a mechanism
> to wake them up.

You already have two such mechanisms: dpm_list_mtx and the embedded 
wait_queue_heads.  Although in the scheme I'm proposing, no async 
threads would ever have to wait on a per-device waitqueue.  A 
system-wide waitqueue might work out better (for use when a thread 
reaches the end of the list and then waits before starting over at the 
beginning).
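
Roughly along these lines (a hypothetical sketch; the dpm_*() predicates
are placeholders, not existing functions):

#include <linux/async.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(dpm_async_wq);

/* Hypothetical worker for the few-threads scheme: resume whatever is
 * ready, then sleep on a system-wide waitqueue (woken whenever a device
 * completes) before scanning dpm_list again from the beginning. */
static void async_resume_worker(void *data, async_cookie_t cookie)
{
	while (!dpm_resume_finished()) {		/* placeholder */
		if (!dpm_resume_ready_devices())	/* placeholder */
			wait_event(dpm_async_wq,
				   dpm_resume_finished() ||
				   dpm_device_became_ready());	/* placeholder */
	}
}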

> Alternatively, there could be a limit to the number of async threads started
> within the current design, but I'd prefer to leave that to the async framework
> (namely, if MAX_THREADS makes sense for boot, it's also likely to make sense
> for PM).

Strictly speaking, a new thread should be started only when needed.  
That is, only when all the existing threads are busy running a 
callback.  It shouldn't be too hard to keep track of when that happens.

> > It comes down to this: Should there be many threads, each of which 
> > browses the list only once, or should there be a few threads, each of 
> > which browses the list many times?
> 
> Well, quite obviously I prefer the many threads version. :-)

Okay, clearly it's a matter of taste.  To me the many-threads version 
seems less elegant and less well controlled.

Alan Stern

Rafael Wysocki Aug. 29, 2009, 12:49 p.m. UTC | #6
On Saturday 29 August 2009, Alan Stern wrote:
> On Sat, 29 Aug 2009, Rafael J. Wysocki wrote:
> 
> > On Friday 28 August 2009, Alan Stern wrote:
> > > On Fri, 28 Aug 2009, Rafael J. Wysocki wrote:
> > > 
> > > > > Given this design, why bother to invoke device_resume() for the async 
> > > > > devices?  Why not just start up a bunch of async threads, each of which 
> > > > > calls async_resume() repeatedly until everything is finished?  (And 
> > > > > rearrange async_resume() to scan the list first and do the actual 
> > > > > resume second.)
> > > > > 
> > > > > The same goes for the noirq versions.
> > > > 
> > > > I thought about that, but there are a few things to figure out:
> > > > - how many threads to start
> > > 
> > > That's a tough question.  Right now you start roughly as many threads
> > > as there are async devices.  That seems like overkill.
> > 
> > In fact they are substantially fewer than that, for the following reasons.
> > 
> > First, the async framework will not start more than MAX_THREADS threads,
> > which is 256 at the moment.  This number is less than the number of async
> > devices to handle on an average system.
> 
> Okay, but MAX_THREADS isn't under your control.  Remember also that 
> each thread takes up some memory, and during hibernation we are in a 
> memory-constrained situation.

We keep some extra free memory for things like this.  It's not likely to be
exhausted by the async threads alone.

> > Second, no new async threads are started while the main thread is handling the
> > sync devices , so the existing threads have a chance to do their job.  If
> > there's a "cluster" of sync devices in dpm_list, the number of async threads
> > running is likely to drop rapidly while those devices are being handled.
> > (BTW, if there were no sync devices, the whole thing would be much simpler,
> > but I don't think it's realistic to assume we'll be able to get rid of them any
> > time soon).
> 
> Perhaps not, but it would be interesting to see what happens if every 
> device is async.  Maybe you can try it and get a meaningful result.

I could if it didn't crash.  Perhaps I can with init=/bin/bash.

> > Finally, but not least importantly, async threads are not started for the
> > async devices that were previously handled "out of order" by the already
> > running async threads (or by async threads that have already finished).  My
> > testing shows that there are quite a few of them on the average.  For example,
> > on the HP nx6325 typically there are as many as 580 async devices handled "out
> > of order" during a _single_ suspend-resume cycle (including the "early" and
> > "late" phases), while only a few (below 10) devices are waited for by at least
> > one async thread.
> 
> That is a difficult sort of thing to know in advance.  It ought to be 
> highly influenced by the percentage of async devices; that's another 
> reason for wanting to know what happens when every device is async.

There are a few factors beyond our direct control influencing this, such as
how much time it takes to handle each individual device (which may vary
from 0.5 s down to microseconds, AFAICS).

In addition to this, not only the percentage but also the distribution of sync
devices in dpm_list has an effect.  For example, a list in which every second
device is sync and one in which the first half of the devices are async and
the other half sync would lead to different kinds of behavior.

> > > I would expect that a reasonably small number of threads would suffice 
> > > to achieve most of the possible time savings.  Something on the order 
> > > of 10 should work well.  If the majority of the time is spent 
> > > handling N devices then N+1 threads would be enough.  Judging from some 
> > > of the comments posted earlier, even 4 threads would give a big 
> > > advantage.
> > 
> > That unfortunately is not the case with the set of async devices including
> > PCI, ACPI and serio devices only.  The average time savings are between 5% to
> > 14%, depending on the system and the phase of the cycle (the relative savings
> > are typically greater for suspend).  Still, that amounts to .5 s in some cases.
> 
> Without context it's hard to be sure, but I don't think your numbers 
> contradict what I said.  If you get between 5% and 14% time savings 
> with 14 threads, then you might get between 4% and 10% savings with 
> only 4 threads.

I only wanted to say that the advantage is not really that "big". :-)

> I must agree, 14 threads isn't a lot.  But at the moment that number is 
> random, not under your control.

It's not directly controlled, but there are some interactions between the
async threads, the main thread and the async framework that don't allow this
number to grow too much.

IMO it sometimes is better to allow things to work themselves out, as long as
they don't explode, than to try to keep everything under strict control.  YMMV.

> > > > - when to start them
> > > 
> > > You might as well start them at the beginning of dpm_resume and 
> > > dpm_resume_noirq.  That way they can overlap with the synchronous 
> > > operations.
> > 
> > In that case they would have to wait in the beginning, so I'd need a mechanism
> > to wake them up.
> 
> You already have two such mechanisms: dpm_list_mtx and the embedded 
> wait_queue_heads.  Although in the scheme I'm proposing, no async 
> threads would ever have to wait on a per-device waitqueue.  A 
> system-wide waitqueue might work out better (for use when a thread 
> reaches the end of the list and then waits before starting over at the 
> beginning).

However, sync devices may depend on the async ones and they need the
per-device wait queues for this purpose.

> > Alternatively, there could be a limit to the number of async threads started
> > within the current design, but I'd prefer to leave that to the async framework
> > (namely, if MAX_THREADS makes sense for boot, it's also likely to make sense
> > for PM).
> 
> Strictly speaking, a new thread should be started only when needed.  
> That is, only when all the existing threads are busy running a 
> callback.  It shouldn't be too hard to keep track of when that happens.

The async framework does that for us. :-)

Thanks,
Rafael
Rafael Wysocki Aug. 29, 2009, 7:17 p.m. UTC | #7
On Saturday 29 August 2009, Rafael J. Wysocki wrote:
> On Saturday 29 August 2009, Alan Stern wrote:
...
> > Strictly speaking, a new thread should be started only when needed.  
> > That is, only when all the existing threads are busy running a 
> > callback.  It shouldn't be too hard to keep track of when that happens.
> 
> The async framework does that for us. :-)

So, when I said "async threads", I should have rather said "async functions",
because one async thread can execute multiple functions scheduled with
async_schedule().

Taking that into consideration, it all works like this.  When there's a new
async device to handle and there are no async threads available, a new thread
is started and it runs the async function that handles the device and
"opportunistically" searches dpm_list (once) for other devices that are ready
for handling.  After this function has returned, the thread goes to sleep and
it only exits after being idle for 1 s.  In turn, when there's a new async
device to handle and there are async threads available, one of them gets the
async function to run.

I don't really think we can do much better than this in any other approach.

Thanks,
Rafael
Alan Stern Aug. 30, 2009, 12:48 a.m. UTC | #8
On Sat, 29 Aug 2009, Rafael J. Wysocki wrote:

> I only wanted to say that the advantage is not really that "big". :-)
> 
> > I must agree, 14 threads isn't a lot.  But at the moment that number is 
> > random, not under your control.
> 
> It's not directly controlled, but there are some interactions between the
> async threads, the main threads and the async framework that don't allow this
> number to grow too much.
> 
> IMO it sometimes is better to allow things to work themselves out, as long as
> they don't explode, than to try to keep everything under strict control.  YMMV.

For testing purposes it would be nice to have a one-line summary for
each device containing a thread ID, start timestamp, end timestamp, and
elapsed time.  With that information you could evaluate the amount of
parallelism and determine where the bottlenecks are.  It would give a
much more detailed picture of the entire process than the total time of
your recent patch 9.
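
Something along these lines would do (hypothetical wrapper; the message
format is illustrative only):

#include <linux/device.h>
#include <linux/ktime.h>
#include <linux/sched.h>

/* Illustrative: log which thread handled the device and for how long. */
static int timed_device_resume(struct device *dev, pm_message_t state)
{
	s64 start_ns = ktime_to_ns(ktime_get());
	int error = __device_resume(dev, state);
	s64 end_ns = ktime_to_ns(ktime_get());

	dev_info(dev, "PM: pid %d, start %lld ns, end %lld ns, elapsed %lld ns\n",
		 task_pid_nr(current), (long long)start_ns,
		 (long long)end_ns, (long long)(end_ns - start_ns));
	return error;
}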

Alan Stern


Alan Stern Aug. 30, 2009, 12:53 a.m. UTC | #9
On Sat, 29 Aug 2009, Rafael J. Wysocki wrote:

> On Saturday 29 August 2009, Rafael J. Wysocki wrote:
> > On Saturday 29 August 2009, Alan Stern wrote:
> ...
> > > Strictly speaking, a new thread should be started only when needed.  
> > > That is, only when all the existing threads are busy running a 
> > > callback.  It shouldn't be too hard to keep track of when that happens.
> > 
> > The async framework does that for us. :-)
> 
> So, when I said "async threads", I should have rather said "async functions",
> because one async thread can execute multiple functions scheduled with
> async_schedule().
> 
> Taking that into consideration, it all works like this.  When there's a new
> async device to handle and there are no async threads available, a new thread
> is started and it runs the async function that handles the device and
> "opportunistically" searches dpm_list (once) for other device that are ready
> for handling.  After this function has returned, the thread goes to sleep and
> it only exits after being idle for 1 s.  In turn, when there's a new async
> device to handle and there are async threads available, on of them gets the
> async function to run.
> 
> I don't really think we can do much better than this in any other approach.

Since it obviously works and is relatively simple, I guess we ought to 
adopt it.

Alan Stern

Pavel Machek Aug. 30, 2009, 6:45 a.m. UTC | #10
Hi!

> > > > > > The same goes for the noirq versions.
> > > > > 
> > > > > I thought about that, but there are a few things to figure out:
> > > > > - how many threads to start
> > > > 
> > > > That's a tough question.  Right now you start roughly as many threads
> > > > as there are async devices.  That seems like overkill.
> > > 
> > > In fact they are substantially fewer than that, for the following reasons.
> > > 
> > > First, the async framework will not start more than MAX_THREADS threads,
> > > which is 256 at the moment.  This number is less than the number of async
> > > devices to handle on an average system.
> > 
> > Okay, but MAX_THREADS isn't under your control.  Remember also that 
> > each thread takes up some memory, and during hibernation we are in a 
> > memory-constrained situation.
> 
> We keep some extra free memory for things like this.  It's not likely to be
> exhausted by the async threads alone.

What extra memory? You are creating quite a lot of threads. For 256 of
them, it would take circa 2 MB... You recently removed code from s2disk
that freed 4MB of extra memory, so you are now essentially relying on
min_free_kbytes -- and even then it is not reliable, since the user might
s2disk under memory pressure. min_free_kbytes is 1MB on my system.
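(The 2 MB figure assumes roughly 8 KB of kernel stack per thread:
256 * 8 KB = 2 MB.)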

								Pavel
Rafael Wysocki Aug. 30, 2009, 1:09 p.m. UTC | #11
On Sunday 30 August 2009, Pavel Machek wrote:
> Hi!
> 
> > > > > > > The same goes for the noirq versions.
> > > > > > 
> > > > > > I thought about that, but there are a few things to figure out:
> > > > > > - how many threads to start
> > > > > 
> > > > > That's a tough question.  Right now you start roughly as many threads
> > > > > as there are async devices.  That seems like overkill.
> > > > 
> > > > In fact they are substantially fewer than that, for the following reasons.
> > > > 
> > > > First, the async framework will not start more than MAX_THREADS threads,
> > > > which is 256 at the moment.  This number is less than the number of async
> > > > devices to handle on an average system.
> > > 
> > > Okay, but MAX_THREADS isn't under your control.  Remember also that 
> > > each thread takes up some memory, and during hibernation we are in a 
> > > memory-constrained situation.
> > 
> > We keep some extra free memory for things like this.  It's not likely to be
> > exhausted by the async threads alone.
> 
> What extra memory? You are creating quite a lot of threads. For 256 of
> them, it would take cca 2MB...

We never start that many threads and even if there's not enough memory to
start a new thread, the async framework will handle that for us.

> You recently removed code from s2disk that freed 4MB of extra memory,

That was removed from s2ram.  For STD we still have PAGES_FOR_IO and
SPARE_PAGES, nothing's changed there.

Thanks,
Rafael
Rafael Wysocki Aug. 30, 2009, 1:15 p.m. UTC | #12
On Sunday 30 August 2009, Alan Stern wrote:
> On Sat, 29 Aug 2009, Rafael J. Wysocki wrote:
> 
> > I only wanted to say that the advantage is not really that "big". :-)
> > 
> > > I must agree, 14 threads isn't a lot.  But at the moment that number is 
> > > random, not under your control.
> > 
> > It's not directly controlled, but there are some interactions between the
> > async threads, the main threads and the async framework that don't allow this
> > number to grow too much.
> > 
> > IMO it sometimes is better to allow things to work themselves out, as long as
> > they don't explode, than to try to keep everything under strict control.  YMMV.
> 
> For testing purposes it would be nice to have a one-line summary for
> each device containing a thread ID, start timestamp, end timestamp, and
> elapsed time.  With that information you could evaluate the amount of
> parallelism and determine where the bottlenecks are.  It would give a
> much more detailed picture of the entire process than the total time of
> your recent patch 9.

Of course it would.  I think I'll implement it.

The purpose of patch 9 is basically to allow one to see how much time is
spent on the handling of devices overall and to compare that with the time
spent on the other operations during suspend-resume.

Thanks,
Rafael

Patch

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@ 
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -409,20 +410,23 @@  enum rpm_request {
 
 struct dev_pm_info {
 	spinlock_t		lock;
+	wait_queue_head_t	wait_queue;
 	struct list_head	master_links;
 	struct list_head	slave_links;
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
 	unsigned int		should_wakeup:1;
+	unsigned int		async_suspend:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
+	unsigned int		op_started:1;
+	unsigned int		op_complete:1;
 #endif
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list	suspend_timer;
 	unsigned long		timer_expires;
 	struct work_struct	work;
-	wait_queue_head_t	wait_queue;
 	atomic_t		usage_count;
 	atomic_t		child_count;
 	unsigned int		disable_depth:3;
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -472,6 +472,12 @@  static inline int device_is_registered(s
 	return dev->kobj.state_in_sysfs;
 }
 
+static inline void device_enable_async_suspend(struct device *dev, bool enable)
+{
+	if (dev->power.status == DPM_ON)
+		dev->power.async_suspend = enable;
+}
+
 void driver_init(void);
 
 /*
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -26,6 +26,8 @@ 
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
+#include <linux/async.h>
+#include <linux/completion.h>
 
 #include "../base.h"
 #include "power.h"
@@ -43,6 +45,7 @@ 
 LIST_HEAD(dpm_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
+static pm_message_t pm_transition;
 
 /*
  * Set once the preparation of devices for a PM transition has started, reset
@@ -145,6 +148,131 @@  void device_pm_move_last(struct device *
 }
 
 /**
+ * dpm_reset - Clear power.op_started and power.op_complete for given device.
+ * @dev: Device to handle.
+ */
+static void dpm_reset(struct device *dev)
+{
+	dev->power.op_started = false;
+	dev->power.op_complete = false;
+}
+
+/**
+ * dpm_reset_all - Call dpm_reset() for all devices.
+ */
+static void dpm_reset_all(void)
+{
+	struct device *dev;
+
+	list_for_each_entry(dev, &dpm_list, power.entry)
+		dpm_reset(dev);
+}
+
+/**
+ * dpm_synchronize_noirq - Wait for "late" or "early" PM callbacks to complete.
+ *
+ * Wait for the "late" or "early" suspend/resume callbacks of all devices to
+ * complete and clear power.op_started and power.op_complete for all devices.
+ */
+static void dpm_synchronize_noirq(void)
+{
+	async_synchronize_full();
+	dpm_reset_all();
+}
+
+/**
+ * dpm_synchronize - Wait for PM callbacks to complete.
+ *
+ * Wait for the "regular" suspend/resume callbacks of all devices to complete
+ * and clear power.op_started and power.op_complete for all devices.
+ */
+static void dpm_synchronize(void)
+{
+	async_synchronize_full();
+	mutex_lock(&dpm_list_mtx);
+	dpm_reset_all();
+	mutex_unlock(&dpm_list_mtx);
+}
+
+/**
+ * device_pm_wait - Wait for a PM operation to complete.
+ * @sub: "Slave" device.
+ * @dev: Device to wait for.
+ *
+ * Wait for a PM operation carried out for @dev to complete, unless both @sub
+ * and @dev have to be handled synchronously (in such a case they are going to
+ * be handled in the right order anyway thanks to the dpm_list ordering).
+ */
+static void device_pm_wait(struct device *sub, struct device *dev)
+{
+	if (!dev)
+		return;
+
+	if (!(sub->power.async_suspend || dev->power.async_suspend))
+		return;
+
+	if (!dev->power.op_complete) {
+		dev_dbg(sub, "PM: Waiting for %s %s\n", dev_driver_string(dev),
+			dev_name(dev));
+		wait_event(dev->power.wait_queue, !!dev->power.op_complete);
+	}
+}
+
+/**
+ * device_pm_wait_fn - Wrapper for device_pm_wait().
+ * @dev: Device to wait for.
+ * @data: Pointer to the "slave" device object.
+ */
+static int device_pm_wait_fn(struct device *dev, void *data)
+{
+	device_pm_wait((struct device *)data, dev);
+	return 0;
+}
+
+/**
+ * device_pm_wait_for_masters - Wait for all masters of given device.
+ * @slave: Device to wait for the masters of.
+ */
+static void device_pm_wait_for_masters(struct device *slave)
+{
+	if (!pm_trace_enabled)
+		device_for_each_master(slave, slave, device_pm_wait_fn);
+}
+
+/**
+ * device_pm_check - Check the power.op_complete flag of given device.
+ * @dev: Device to check.
+ */
+static bool device_pm_check(struct device *dev)
+{
+	int ret = 0;
+
+	if (dev)
+		ret = !dev->power.op_complete;
+
+	return ret;
+}
+
+/**
+ * device_pm_check_fn - Wrapper for device_pm_check().
+ * @dev: Device to check.
+ * @data: Ignored.
+ */
+static int device_pm_check_fn(struct device *dev, void *data)
+{
+	return device_pm_check(dev);
+}
+
+/**
+ * device_pm_check_masters - Check power.op_complete for masters of a device.
+ * @slave: Device to check the masters of.
+ */
+static int device_pm_check_masters(struct device *slave)
+{
+	return device_for_each_master(slave, NULL, device_pm_check_fn);
+}
+
+/**
  * pm_op - Execute the PM operation appropriate for given PM event.
  * @dev: Device to handle.
  * @ops: PM operations to choose from.
@@ -269,6 +397,24 @@  static int pm_noirq_op(struct device *de
 	return error;
 }
 
+/**
+ * pm_op_started - Mark the beginning of a PM operation for given device.
+ * @dev: Device to handle.
+ */
+static bool pm_op_started(struct device *dev)
+{
+	bool ret = false;
+
+	spin_lock_irq(&dev->power.lock);
+	if (dev->power.op_started)
+		ret = true;
+	else
+		dev->power.op_started = true;
+	spin_unlock_irq(&dev->power.lock);
+
+	return ret;
+}
+
 static char *pm_verb(int event)
 {
 	switch (event) {
@@ -310,33 +456,102 @@  static void pm_dev_err(struct device *de
 /*------------------------- Resume routines -------------------------*/
 
 /**
- * device_resume_noirq - Execute an "early resume" callback for given device.
+ * __device_resume_noirq - Execute an "early resume" callback for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  *
  * The driver of @dev will not receive interrupts while this function is being
  * executed.
  */
-static int device_resume_noirq(struct device *dev, pm_message_t state)
+static int __device_resume_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
-	if (!dev->bus)
-		goto End;
-
-	if (dev->bus->pm) {
+	if (dev->bus && dev->bus->pm) {
 		pm_dev_dbg(dev, state, "EARLY ");
 		error = pm_noirq_op(dev, dev->bus->pm, state);
 	}
- End:
+
+	dev->power.op_complete = true;
+	wake_up_all(&dev->power.wait_queue);
+
 	TRACE_RESUME(error);
 	return error;
 }
 
 /**
+ * async_device_resume_noirq - Wrapper of __device_resume_noirq().
+ * @dev: Device to resume.
+ */
+static void async_device_resume_noirq(struct device *dev)
+{
+	int error;
+
+	pm_dev_dbg(dev, pm_transition, "async EARLY ");
+	error = __device_resume_noirq(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async EARLY", error);
+}
+
+/**
+ * async_resume_noirq - Execute "early" resume callbacks asynchronously.
+ * @data: Pointer to the first device to resume.
+ * @cookie: Ignored.
+ *
+ * The execution of this function is scheduled with async_schedule(), so it runs
+ * in its own kernel thread.  It first calls the "early" resume callback for the
+ * device passed to it as @data.  Next, it walks dpm_list looking for devices
+ * that can be resumed without waiting for their "masters".  If such a device is
+ * found, its "early" resume callback is run.
+ */
+static void async_resume_noirq(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+
+	device_pm_wait_for_masters(dev);
+	async_device_resume_noirq(dev);
+
+	list_for_each_entry_continue(dev, &dpm_list, power.entry) {
+		if (!dev->power.async_suspend || dev->power.status <= DPM_OFF)
+			continue;
+
+		if (device_pm_check_masters(dev))
+			continue;
+
+		if (pm_op_started(dev))
+			continue;
+
+		pm_dev_dbg(dev, pm_transition, "out of order EARLY ");
+		async_device_resume_noirq(dev);
+	}
+}
+
+/**
+ * device_resume_noirq - Execute or schedule "early" resume callback.
+ * @dev: Device to resume.
+ *
+ * If @dev can be resumed asynchronously, schedule the execution of
+ * async_resume_noirq() for it.  Otherwise, execute its "early" resume callback
+ * directly.
+ */
+static int device_resume_noirq(struct device *dev)
+{
+	if (pm_op_started(dev))
+		return 0;
+
+	if (dev->power.async_suspend && !pm_trace_enabled) {
+		async_schedule(async_resume_noirq, dev);
+		return 0;
+	}
+
+	device_pm_wait_for_masters(dev);
+	return __device_resume_noirq(dev, pm_transition);
+}
+
+/**
  * dpm_resume_noirq - Execute "early resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
  *
@@ -349,26 +564,28 @@  void dpm_resume_noirq(pm_message_t state
 
 	mutex_lock(&dpm_list_mtx);
 	transition_started = false;
+	pm_transition = state;
 	list_for_each_entry(dev, &dpm_list, power.entry)
 		if (dev->power.status > DPM_OFF) {
 			int error;
 
 			dev->power.status = DPM_OFF;
-			error = device_resume_noirq(dev, state);
+			error = device_resume_noirq(dev);
 			if (error)
-				pm_dev_err(dev, state, " early", error);
+				pm_dev_err(dev, state, " EARLY", error);
 		}
+	dpm_synchronize_noirq();
 	mutex_unlock(&dpm_list_mtx);
 	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);
 
 /**
- * device_resume - Execute "resume" callbacks for given device.
+ * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_resume(struct device *dev, pm_message_t state)
+static int __device_resume(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
@@ -409,12 +626,92 @@  static int device_resume(struct device *
 	}
  End:
 	up(&dev->sem);
+	dev->power.op_complete = true;
+	wake_up_all(&dev->power.wait_queue);
 
 	TRACE_RESUME(error);
 	return error;
 }
 
 /**
+ * async_device_resume - Wrapper of __device_resume().
+ * @dev: Device to resume.
+ */
+static void async_device_resume(struct device *dev)
+{
+	int error;
+
+	pm_dev_dbg(dev, pm_transition, "async ");
+	error = __device_resume(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async", error);
+}
+
+/**
+ * async_resume - Execute resume callbacks asynchronously.
+ * @data: Pointer to the first device to resume.
+ * @cookie: Ignored.
+ *
+ * The execution of this function is scheduled with async_schedule(), so it runs
+ * in its own kernel thread.  It first calls the resume callbacks for the device
+ * passed to it as @data.  Next, it walks dpm_list looking for devices that can
+ * be resumed without waiting for their "masters".  If such a device is found,
+ * its resume callbacks are run.
+ */
+static void async_resume(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+
+	device_pm_wait_for_masters(dev);
+
+ repeat:
+	async_device_resume(dev);
+	put_device(dev);
+
+	mutex_lock(&dpm_list_mtx);
+	if (dev->power.status < DPM_OFF)
+		dev = to_device(dpm_list.next);
+	list_for_each_entry_continue(dev, &dpm_list, power.entry) {
+		if (!dev->power.async_suspend || dev->power.status < DPM_OFF)
+			continue;
+
+		if (device_pm_check_masters(dev))
+			continue;
+
+		if (pm_op_started(dev))
+			continue;
+
+		get_device(dev);
+		mutex_unlock(&dpm_list_mtx);
+		pm_dev_dbg(dev, pm_transition, "out of order ");
+		goto repeat;
+	}
+	mutex_unlock(&dpm_list_mtx);
+}
+
+/**
+ * device_resume - Execute or schedule resume callbacks for given device.
+ * @dev: Device to resume.
+ *
+ * If @dev can be resumed asynchronously, schedule the execution of
+ * async_resume() for it.  Otherwise, execute its resume callbacks directly.
+ */
+static int device_resume(struct device *dev)
+{
+	if (pm_op_started(dev))
+		return 0;
+
+	if (dev->power.async_suspend && !pm_trace_enabled) {
+		get_device(dev);
+		async_schedule(async_resume, dev);
+		return 0;
+	}
+
+	device_pm_wait_for_masters(dev);
+	return __device_resume(dev, pm_transition);
+}
+
+/**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
  *
@@ -427,6 +724,7 @@  static void dpm_resume(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -437,7 +735,7 @@  static void dpm_resume(pm_message_t stat
 			dev->power.status = DPM_RESUMING;
 			mutex_unlock(&dpm_list_mtx);
 
-			error = device_resume(dev, state);
+			error = device_resume(dev);
 
 			mutex_lock(&dpm_list_mtx);
 			if (error)
@@ -452,6 +750,7 @@  static void dpm_resume(pm_message_t stat
 	}
 	list_splice(&list, &dpm_list);
 	mutex_unlock(&dpm_list_mtx);
+	dpm_synchronize();
 }
 
 /**
@@ -775,8 +1074,10 @@  static int dpm_prepare(pm_message_t stat
 			break;
 		}
 		dev->power.status = DPM_SUSPENDING;
-		if (!list_empty(&dev->power.entry))
+		if (!list_empty(&dev->power.entry)) {
 			list_move_tail(&dev->power.entry, &list);
+			dpm_reset(dev);
+		}
 		put_device(dev);
 	}
 	list_splice(&list, &dpm_list);
Index: linux-2.6/drivers/base/power/common.c
===================================================================
--- linux-2.6.orig/drivers/base/power/common.c
+++ linux-2.6/drivers/base/power/common.c
@@ -19,10 +19,11 @@ 
  */
 void device_pm_init(struct device *dev)
 {
-	dev->power.status = DPM_ON;
 	spin_lock_init(&dev->power.lock);
+	init_waitqueue_head(&dev->power.wait_queue);
 	INIT_LIST_HEAD(&dev->power.master_links);
 	INIT_LIST_HEAD(&dev->power.slave_links);
+	dev->power.status = DPM_ON;
 	pm_runtime_init(dev);
 }
 
@@ -117,6 +118,8 @@  int pm_link_add(struct device *slave, st
 	return 0;
 
  err_link:
+	master->power.async_suspend = false;
+	slave->power.async_suspend = false;
 	error = -ENOMEM;
 	put_device(slave);