
[v1,3/5] PM: sleep: Resume children right after resuming the parent

Message ID 1819312.VLH7GnMWUR@rjwysocki.net (mailing list archive)
State Superseded, archived
Series PM: sleep: Improvements of async suspend and resume of devices

Commit Message

Rafael J. Wysocki Feb. 25, 2025, 4:45 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

According to [1], the handling of device suspend and resume, and
particularly the latter, involves unnecessary overhead related to
starting new async work items for devices that cannot make progress
right away because they have to wait for other devices.

To reduce this problem in the resume path, use the observation that
starting the async resume of the children of a device after resuming
the parent is likely to produce less scheduling and memory management
noise than starting it upfront while at the same time it should not
increase the resume duration substantially.

Accordingly, modify the code to start the async resume of the device's
children when the processing of the parent has been completed in each
stage of device resume and only start async resume upfront for devices
without parents.

Also make it check if a given device can be resumed asynchronously
before starting the synchronous resume of it in case it will have to
wait for another that is already resuming asynchronously.

In addition to making the async resume of devices more friendly to
systems with relatively less computing resources, this change is also
preliminary for analogous changes in the suspend path.

On the systems where it has been tested, this change by itself does
not affect the overall system resume duration in a measurable way.

Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/base/power/main.c |   72 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 63 insertions(+), 9 deletions(-)
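
In terms of control flow, the change condenses to the following (taken from the patch at the bottom of this page; the dpm_resume() stage is shown, and the noirq and early stages follow the same pattern):

	/* 1. Upfront, async work is queued only for devices without parents. */
	list_for_each_entry(dev, &dpm_suspended_list, power.entry) {
		dpm_clear_async_state(dev);
		if (!dev->parent)
			dpm_async_fn(dev, async_resume);
	}

	/* 2. In the main loop, each device gets one more chance to be handled
	 *    asynchronously before the synchronous fallback runs.
	 */
	dpm_async_unless_in_progress(dev, async_resume);
	if (!dev->power.work_in_progress) {
		/* ... resume the device synchronously ... */
	}

	/* 3. At the end of device_resume(), the device's children are queued;
	 *    they can make progress now that their parent has been handled.
	 */
	dpm_async_resume_children(dev, async_resume);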

Comments

Saravana Kannan March 13, 2025, 1:46 a.m. UTC | #1
Sorry for the delay, I wanted to double, triple, multiple check my
replies to make sure I didn't get it wrong. I hope I didn't.

On Tue, Feb 25, 2025 at 8:46 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> According to [1], the handling of device suspend and resume, and
> particularly the latter, involves unnecessary overhead related to
> starting new async work items for devices that cannot make progress
> right away because they have to wait for other devices.
>
> To reduce this problem in the resume path, use the observation that
> starting the async resume of the children of a device after resuming
> the parent is likely to produce less scheduling and memory management
> noise than starting it upfront while at the same time it should not
> increase the resume duration substantially.
>
> Accordingly, modify the code to start the async resume of the device's
> children when the processing of the parent has been completed in each
> stage of device resume and only start async resume upfront for devices
> without parents.
>
> Also make it check if a given device can be resumed asynchronously
> before starting the synchronous resume of it in case it will have to
> wait for another that is already resuming asynchronously.
>
> In addition to making the async resume of devices more friendly to
> systems with relatively less computing resources, this change is also
> preliminary for analogous changes in the suspend path.
>
> On the systems where it has been tested, this change by itself does
> not affect the overall system resume duration in a measurable way.
>
> Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1]
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/base/power/main.c |   72 ++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 63 insertions(+), 9 deletions(-)
>
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -621,12 +621,41 @@
>         return false;
>  }
>
> +static int dpm_async_unless_in_progress(struct device *dev, void *fn)
> +{
> +       async_func_t func = fn;
> +
> +       if (!dev->power.work_in_progress)
> +               dpm_async_fn(dev, func);
> +
> +       return 0;
> +}
> +
> +static void dpm_async_resume_children(struct device *dev, async_func_t func)
> +{
> +       mutex_lock(&dpm_list_mtx);
> +
> +       /*
> +        * Start processing "async" children of the device unless it's been
> +        * started already for them.
> +        *
> +        * This could have been done for the device's "async" consumers too, but
> +        * they either need to wait for their parents or the processing has
> +        * already started for them after their parents were processed.
> +        */
> +       device_for_each_child(dev, func, dpm_async_unless_in_progress);

Is there a reason you aren't resuming the consumers? Or checking that
the children won't block on any suppliers?

Not dealing with device links might be ok for systems where there
aren't a lot of device links, but in DT based systems with fw_devlink
this patch will not make much of a difference. There'll still be a ton
of "sleep and try again" issues because of the supplier/consumer
dependencies. And when you include device links, it's a dependency
graph and no longer a dependency tree. So things become a bit more
complicated.

Also, not taking device links into consideration when kicking off
async work is not just about additional sleep/wake cycles; it'll
also cause more thread creation, because blocked "async" worker threads
will quickly use up the worker thread pool and more threads will need
to be created.

> +
> +       mutex_unlock(&dpm_list_mtx);
> +}
> +
>  static void dpm_clear_async_state(struct device *dev)
>  {
>         reinit_completion(&dev->power.completion);
>         dev->power.work_in_progress = false;
>  }
>
> +static void async_resume_noirq(void *data, async_cookie_t cookie);
> +
>  /**
>   * device_resume_noirq - Execute a "noirq resume" callback for given device.
>   * @dev: Device to handle.
> @@ -710,6 +739,8 @@
>                 dpm_save_failed_dev(dev_name(dev));
>                 pm_dev_err(dev, state, async ? " async noirq" : " noirq", error);
>         }
> +
> +       dpm_async_resume_children(dev, async_resume_noirq);

Similar comment here.

>  }
>
>  static void async_resume_noirq(void *data, async_cookie_t cookie)
> @@ -733,19 +764,24 @@
>         mutex_lock(&dpm_list_mtx);
>
>         /*
> -        * Trigger the resume of "async" devices upfront so they don't have to
> -        * wait for the "non-async" ones they don't depend on.
> +        * Start processing "async" devices without parents upfront so they
> +        * don't wait for the "sync" devices they don't depend on.
>          */
>         list_for_each_entry(dev, &dpm_noirq_list, power.entry) {
>                 dpm_clear_async_state(dev);
> -               dpm_async_fn(dev, async_resume_noirq);
> +               if (!dev->parent)
> +                       dpm_async_fn(dev, async_resume_noirq);
>         }
>
>         while (!list_empty(&dpm_noirq_list)) {
>                 dev = to_device(dpm_noirq_list.next);
>                 list_move_tail(&dev->power.entry, &dpm_late_early_list);
>
> +               dpm_async_unless_in_progress(dev, async_resume_noirq);
> +
>                 if (!dev->power.work_in_progress) {
> +                       dev->power.work_in_progress = true;
> +
>                         get_device(dev);
>
>                         mutex_unlock(&dpm_list_mtx);
> @@ -781,6 +817,8 @@
>         device_wakeup_disarm_wake_irqs();
>  }
>
> +static void async_resume_early(void *data, async_cookie_t cookie);
> +
>  /**
>   * device_resume_early - Execute an "early resume" callback for given device.
>   * @dev: Device to handle.
> @@ -848,6 +886,8 @@
>                 dpm_save_failed_dev(dev_name(dev));
>                 pm_dev_err(dev, state, async ? " async early" : " early", error);
>         }
> +
> +       dpm_async_resume_children(dev, async_resume_early);

Similar comment here.

>  }
>
>  static void async_resume_early(void *data, async_cookie_t cookie)
> @@ -875,19 +915,24 @@
>         mutex_lock(&dpm_list_mtx);
>
>         /*
> -        * Trigger the resume of "async" devices upfront so they don't have to
> -        * wait for the "non-async" ones they don't depend on.
> +        * Start processing "async" devices without parents upfront so they
> +        * don't wait for the "sync" devices they don't depend on.
>          */
>         list_for_each_entry(dev, &dpm_late_early_list, power.entry) {
>                 dpm_clear_async_state(dev);

I initially thought that there could be a race here, but it's not the
case because you are using the dpm_list_mtx here. However, since you
are grabbing the dpm_list_mtx lock before you loop through and kick
off the async threads for child devices, it'll cause a lot of
unnecessary lock contention/waiting, which is what this series is
trying to avoid in the first place.

If you take device links into consideration (which we need to do for
this patch series to improve suspend/resume for DT based systems),
there is an even higher chance of racing (or more lock contention)
because multiple threads might attempt to resume the same device.
Which is why, in my patch series, I try to use the device links read
lock rather than the single dpm_list_mtx mutex.

Thanks,
Saravana

> -               dpm_async_fn(dev, async_resume_early);
> +               if (!dev->parent)
> +                       dpm_async_fn(dev, async_resume_early);
>         }
>
>         while (!list_empty(&dpm_late_early_list)) {
>                 dev = to_device(dpm_late_early_list.next);
>                 list_move_tail(&dev->power.entry, &dpm_suspended_list);
>
> +               dpm_async_unless_in_progress(dev, async_resume_early);
> +
>                 if (!dev->power.work_in_progress) {
> +                       dev->power.work_in_progress = true;
> +
>                         get_device(dev);
>
>                         mutex_unlock(&dpm_list_mtx);
> @@ -919,6 +964,8 @@
>  }
>  EXPORT_SYMBOL_GPL(dpm_resume_start);
>
> +static void async_resume(void *data, async_cookie_t cookie);
> +
>  /**
>   * device_resume - Execute "resume" callbacks for given device.
>   * @dev: Device to handle.
> @@ -1012,6 +1059,8 @@
>                 dpm_save_failed_dev(dev_name(dev));
>                 pm_dev_err(dev, state, async ? " async" : "", error);
>         }
> +
> +       dpm_async_resume_children(dev, async_resume);
>  }
>
>  static void async_resume(void *data, async_cookie_t cookie)
> @@ -1043,19 +1092,24 @@
>         mutex_lock(&dpm_list_mtx);
>
>         /*
> -        * Trigger the resume of "async" devices upfront so they don't have to
> -        * wait for the "non-async" ones they don't depend on.
> +        * Start processing "async" devices without parents upfront so they
> +        * don't wait for the "sync" devices they don't depend on.
>          */
>         list_for_each_entry(dev, &dpm_suspended_list, power.entry) {
>                 dpm_clear_async_state(dev);
> -               dpm_async_fn(dev, async_resume);
> +               if (!dev->parent)
> +                       dpm_async_fn(dev, async_resume);
>         }
>
>         while (!list_empty(&dpm_suspended_list)) {
>                 dev = to_device(dpm_suspended_list.next);
>                 list_move_tail(&dev->power.entry, &dpm_prepared_list);
>
> +               dpm_async_unless_in_progress(dev, async_resume);
> +
>                 if (!dev->power.work_in_progress) {
> +                       dev->power.work_in_progress = true;
> +
>                         get_device(dev);
>
>                         mutex_unlock(&dpm_list_mtx);
>
>
>
Rafael J. Wysocki March 13, 2025, 3:16 p.m. UTC | #2
On Thu, Mar 13, 2025 at 2:47 AM Saravana Kannan <saravanak@google.com> wrote:
>
> Sorry for the delay, I wanted to double, triple, multiple check my
> replies to make sure I didn't get it wrong. I hope I didn't.
>
> On Tue, Feb 25, 2025 at 8:46 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > According to [1], the handling of device suspend and resume, and
> > particularly the latter, involves unnecessary overhead related to
> > starting new async work items for devices that cannot make progress
> > right away because they have to wait for other devices.
> >
> > To reduce this problem in the resume path, use the observation that
> > starting the async resume of the children of a device after resuming
> > the parent is likely to produce less scheduling and memory management
> > noise than starting it upfront while at the same time it should not
> > increase the resume duration substantially.
> >
> > Accordingly, modify the code to start the async resume of the device's
> > children when the processing of the parent has been completed in each
> > stage of device resume and only start async resume upfront for devices
> > without parents.
> >
> > Also make it check if a given device can be resumed asynchronously
> > before starting the synchronous resume of it in case it will have to
> > wait for another that is already resuming asynchronously.
> >
> > In addition to making the async resume of devices more friendly to
> > systems with relatively less computing resources, this change is also
> > preliminary for analogous changes in the suspend path.
> >
> > On the systems where it has been tested, this change by itself does
> > not affect the overall system resume duration in a measurable way.
> >
> > Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1]
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/base/power/main.c |   72 ++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 63 insertions(+), 9 deletions(-)
> >
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -621,12 +621,41 @@
> >         return false;
> >  }
> >
> > +static int dpm_async_unless_in_progress(struct device *dev, void *fn)
> > +{
> > +       async_func_t func = fn;
> > +
> > +       if (!dev->power.work_in_progress)
> > +               dpm_async_fn(dev, func);
> > +
> > +       return 0;
> > +}
> > +
> > +static void dpm_async_resume_children(struct device *dev, async_func_t func)
> > +{
> > +       mutex_lock(&dpm_list_mtx);
> > +
> > +       /*
> > +        * Start processing "async" children of the device unless it's been
> > +        * started already for them.
> > +        *
> > +        * This could have been done for the device's "async" consumers too, but
> > +        * they either need to wait for their parents or the processing has
> > +        * already started for them after their parents were processed.
> > +        */
> > +       device_for_each_child(dev, func, dpm_async_unless_in_progress);
>
> Is there a reason you aren't resuming the consumers? Or checking that
> the children won't block on any suppliers?

This is deliberate and the above comment is about it.

At this point, all of the consumers either still need to wait for
their parents, in which case it is not useful to queue them up, or
their parents have already completed the resume and started their
processing, if they are async.

Checking whether the children would block can be added here, but then
(a) the code will need to track their dependencies and (b) walking the
consumers would become necessary, which is extra overhead.

Arguably, there are no devices with tons of children, so adding
several async work items that will just wait until they get unblocked
here should not add too much overhead.
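
(For context, the reason a prematurely queued work item only costs a
sleeping worker is the per-device completion: every async callback
starts by waiting for the parent, roughly as in the existing dpm_wait()
helper, shown here in simplified form; the real dpm_wait_for_superior()
wrapper also handles the parent going away while waiting.)

	static void dpm_wait(struct device *dev, bool async)
	{
		if (!dev)
			return;

		if (async || (pm_async_enabled && dev->power.async_suspend))
			wait_for_completion(&dev->power.completion);
	}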

> Not dealing with device links might be ok for systems where there
> aren't a lot of device links, but in DT based systems with fw_devlink
> this patch will not make much of a difference. There'll still be a ton
> of "sleep and try again" issues because of the supplier/consumer
> dependencies.

That IMO really needs to be evaluated, because it all depends on how
long it takes to resume individual devices.

> And when you include device links, it's a dependency graph and no longer
> a dependency tree. So things become a bit more complicated.

They are, but all I said above still holds I believe.

> Also, not taking device links into consideration when kicking off
> async work is not just about additional sleep/wake cycles; it'll
> also cause more thread creation, because blocked "async" worker threads
> will quickly use up the worker thread pool and more threads will need
> to be created.

Again, this needs to be evaluated on an actual system.  It would have
been the case if devices took no time to actually resume, but that
is far from reality AFAICS.

[skip]

> >  }
> >
> >  static void async_resume_early(void *data, async_cookie_t cookie)
> > @@ -875,19 +915,24 @@
> >         mutex_lock(&dpm_list_mtx);
> >
> >         /*
> > -        * Trigger the resume of "async" devices upfront so they don't have to
> > -        * wait for the "non-async" ones they don't depend on.
> > +        * Start processing "async" devices without parents upfront so they
> > +        * don't wait for the "sync" devices they don't depend on.
> >          */
> >         list_for_each_entry(dev, &dpm_late_early_list, power.entry) {
> >                 dpm_clear_async_state(dev);
>
> I initially thought that there could be a race here, but it's not the
> case because you are using the dpm_list_mtx here. However, since you
> are grabbing the dpm_list_mtx lock before you loop through and kick
> off the async threads for child devices, it'll cause a lot of
> unnecessary lock contention/waiting.

I don't think this is what happens here.  It only processes the
devices without parents, which are not too many.

> Which is what this series is trying to avoid in the first place.
>
> If you take device links into consideration (which we need to do for
> this patch series to improve suspend/resume for DT based systems),
> there is an even higher chance of racing (or more lock contention)
> because multiple threads might attempt to resume the same device.
> Which is why, in my patch series, I try to use the device links read
> lock rather than the single dpm_list_mtx mutex.

I think that you are talking about acquiring dpm_list_mtx in
dpm_async_resume_children().

The reason for acquiring it there is dpm_async_unless_in_progress(),
particularly the power.work_in_progress check in it and the
invocation of dpm_async_fn(), which manipulates power.work_in_progress.

If dpm_list_mtx is too contentious for this, the device's power.lock
can be used for protecting the power.work_in_progress accesses in
principle, but that would mean more complexity in the code.
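
A minimal sketch of that alternative, assuming power.work_in_progress
would then be set only by this helper (and that the failure path of
dpm_async_fn() would clear it again, which is part of the extra
complexity mentioned above):

	static int dpm_async_unless_in_progress(struct device *dev, void *fn)
	{
		async_func_t func = fn;
		bool queue_it;

		/* Test and set the flag under the per-device lock instead of
		 * relying on the caller holding dpm_list_mtx.
		 */
		spin_lock_irq(&dev->power.lock);
		queue_it = !dev->power.work_in_progress;
		if (queue_it)
			dev->power.work_in_progress = true;
		spin_unlock_irq(&dev->power.lock);

		if (queue_it)
			dpm_async_fn(dev, func);

		return 0;
	}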

This is unrelated to whether or not device links are walked in
dpm_async_resume_children() (well, if they were, it would need to be
renamed), though.
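
For reference, "taking device links into consideration" in that helper
could look roughly like the existing dpm_wait_for_consumers() walk, with
the queuing helper from the patch substituted for dpm_wait().  This is
only an illustration, not code from either series, and the
power.work_in_progress check would still need the protection discussed
above:

	static void dpm_async_resume_consumers(struct device *dev, async_func_t func)
	{
		struct device_link *link;
		int idx;

		/* The SRCU read lock is enough to walk the links safely. */
		idx = device_links_read_lock();

		list_for_each_entry_rcu(link, &dev->links.consumers, s_node)
			if (READ_ONCE(link->status) != DL_STATE_DORMANT)
				dpm_async_unless_in_progress(link->consumer, func);

		device_links_read_unlock(idx);
	}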

Patch

--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -621,12 +621,41 @@ 
 	return false;
 }
 
+static int dpm_async_unless_in_progress(struct device *dev, void *fn)
+{
+	async_func_t func = fn;
+
+	if (!dev->power.work_in_progress)
+		dpm_async_fn(dev, func);
+
+	return 0;
+}
+
+static void dpm_async_resume_children(struct device *dev, async_func_t func)
+{
+	mutex_lock(&dpm_list_mtx);
+
+	/*
+	 * Start processing "async" children of the device unless it's been
+	 * started already for them.
+	 *
+	 * This could have been done for the device's "async" consumers too, but
+	 * they either need to wait for their parents or the processing has
+	 * already started for them after their parents were processed.
+	 */
+	device_for_each_child(dev, func, dpm_async_unless_in_progress);
+
+	mutex_unlock(&dpm_list_mtx);
+}
+
 static void dpm_clear_async_state(struct device *dev)
 {
 	reinit_completion(&dev->power.completion);
 	dev->power.work_in_progress = false;
 }
 
+static void async_resume_noirq(void *data, async_cookie_t cookie);
+
 /**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
@@ -710,6 +739,8 @@ 
 		dpm_save_failed_dev(dev_name(dev));
 		pm_dev_err(dev, state, async ? " async noirq" : " noirq", error);
 	}
+
+	dpm_async_resume_children(dev, async_resume_noirq);
 }
 
 static void async_resume_noirq(void *data, async_cookie_t cookie)
@@ -733,19 +764,24 @@ 
 	mutex_lock(&dpm_list_mtx);
 
 	/*
-	 * Trigger the resume of "async" devices upfront so they don't have to
-	 * wait for the "non-async" ones they don't depend on.
+	 * Start processing "async" devices without parents upfront so they
+	 * don't wait for the "sync" devices they don't depend on.
 	 */
 	list_for_each_entry(dev, &dpm_noirq_list, power.entry) {
 		dpm_clear_async_state(dev);
-		dpm_async_fn(dev, async_resume_noirq);
+		if (!dev->parent)
+			dpm_async_fn(dev, async_resume_noirq);
 	}
 
 	while (!list_empty(&dpm_noirq_list)) {
 		dev = to_device(dpm_noirq_list.next);
 		list_move_tail(&dev->power.entry, &dpm_late_early_list);
 
+		dpm_async_unless_in_progress(dev, async_resume_noirq);
+
 		if (!dev->power.work_in_progress) {
+			dev->power.work_in_progress = true;
+
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
@@ -781,6 +817,8 @@ 
 	device_wakeup_disarm_wake_irqs();
 }
 
+static void async_resume_early(void *data, async_cookie_t cookie);
+
 /**
  * device_resume_early - Execute an "early resume" callback for given device.
  * @dev: Device to handle.
@@ -848,6 +886,8 @@ 
 		dpm_save_failed_dev(dev_name(dev));
 		pm_dev_err(dev, state, async ? " async early" : " early", error);
 	}
+
+	dpm_async_resume_children(dev, async_resume_early);
 }
 
 static void async_resume_early(void *data, async_cookie_t cookie)
@@ -875,19 +915,24 @@ 
 	mutex_lock(&dpm_list_mtx);
 
 	/*
-	 * Trigger the resume of "async" devices upfront so they don't have to
-	 * wait for the "non-async" ones they don't depend on.
+	 * Start processing "async" devices without parents upfront so they
+	 * don't wait for the "sync" devices they don't depend on.
 	 */
 	list_for_each_entry(dev, &dpm_late_early_list, power.entry) {
 		dpm_clear_async_state(dev);
-		dpm_async_fn(dev, async_resume_early);
+		if (!dev->parent)
+			dpm_async_fn(dev, async_resume_early);
 	}
 
 	while (!list_empty(&dpm_late_early_list)) {
 		dev = to_device(dpm_late_early_list.next);
 		list_move_tail(&dev->power.entry, &dpm_suspended_list);
 
+		dpm_async_unless_in_progress(dev, async_resume_early);
+
 		if (!dev->power.work_in_progress) {
+			dev->power.work_in_progress = true;
+
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
@@ -919,6 +964,8 @@ 
 }
 EXPORT_SYMBOL_GPL(dpm_resume_start);
 
+static void async_resume(void *data, async_cookie_t cookie);
+
 /**
  * device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
@@ -1012,6 +1059,8 @@ 
 		dpm_save_failed_dev(dev_name(dev));
 		pm_dev_err(dev, state, async ? " async" : "", error);
 	}
+
+	dpm_async_resume_children(dev, async_resume);
 }
 
 static void async_resume(void *data, async_cookie_t cookie)
@@ -1043,19 +1092,24 @@ 
 	mutex_lock(&dpm_list_mtx);
 
 	/*
-	 * Trigger the resume of "async" devices upfront so they don't have to
-	 * wait for the "non-async" ones they don't depend on.
+	 * Start processing "async" devices without parents upfront so they
+	 * don't wait for the "sync" devices they don't depend on.
 	 */
 	list_for_each_entry(dev, &dpm_suspended_list, power.entry) {
 		dpm_clear_async_state(dev);
-		dpm_async_fn(dev, async_resume);
+		if (!dev->parent)
+			dpm_async_fn(dev, async_resume);
 	}
 
 	while (!list_empty(&dpm_suspended_list)) {
 		dev = to_device(dpm_suspended_list.next);
 		list_move_tail(&dev->power.entry, &dpm_prepared_list);
 
+		dpm_async_unless_in_progress(dev, async_resume);
+
 		if (!dev->power.work_in_progress) {
+			dev->power.work_in_progress = true;
+
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);