| Message ID | 2016539.usQuhbGJ8B@rjwysocki.net (mailing list archive) |
|---|---|
| State | Superseded, archived |
| Series | PM: sleep: Improvements of async suspend and resume of devices |
On Tue, Feb 25, 2025 at 8:46 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> In analogy with previous changes, make device_suspend_late() and
> device_suspend_noirq() start the async suspend of the device's parent
> and suppliers after the device itself has been processed and make
> dpm_suspend_late() and dpm_noirq_suspend_devices() start processing
> "async" leaf devices (that is, devices without children or consumers)
> upfront because they don't need to wait for any other devices.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/base/power/main.c | 60 +++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 52 insertions(+), 8 deletions(-)
>
> [...]
>
> +	/*
> +	 * Start processing "async" leaf devices upfront because they don't need
> +	 * to wait.
> +	 */
> +	list_for_each_entry_reverse(dev, &dpm_late_early_list, power.entry) {
> +		dpm_clear_async_state(dev);
> +		if (dpm_leaf_device(dev))
> +			dpm_async_fn(dev, async_suspend_noirq);
> +	}
> +
>  	while (!list_empty(&dpm_late_early_list)) {
> -		struct device *dev = to_device(dpm_late_early_list.prev);
> +		dev = to_device(dpm_late_early_list.prev);
>
>  		list_move(&dev->power.entry, &dpm_noirq_list);

This issue is present in the previous patch too, but it's easier for
me to point it out here. Suspend abort will break now.

For example, say the devices are suspended in the order A -> B -> C ->
D -> E if everything was sync.

Case 1: Fully sync devices.
If C aborts, only A and B will be in the dpm_noirq_list. When we try
to undo the suspend, we just resume the devices in dpm_noirq_list, and
that resumes only A and B.

Case 2: Only C is sync.
When C aborts, A, B, D, and E could have finished suspending, but only
A and B will be in the dpm_noirq_list. When we try to undo the
suspend, we just resume the devices in dpm_noirq_list, and that
resumes only A and B. D and E never get resumed.

My fix for this is to move all devices to dpm_noirq_list if a suspend
aborts and then use the existing
is_suspended/is_noirq_suspended/is_late_suspended flags to skip over
devices that haven't been suspended. That works nicely except in
is_suspended, and I tried to fix that in [2]. But you had an issue with
[2] that I didn't fully understand, and I meant to dig deeper and fix
it. I didn't get around to it, though, as I got swamped with other work.

[2] - https://lore.kernel.org/linux-pm/20241114220921.2529905-2-saravanak@google.com/

Thanks,
Saravana

> [...]
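The two cases Saravana describes can be reproduced in a small user-space model. This is only an illustrative sketch: the device names, the flat arrays standing in for `dpm_noirq_list`, and the abort point are all made up, and `toy_suspend()` is not a kernel API. It models Case 2: the async leaves D and E finish early, C fails synchronously, and an undo pass that walks only the target list strands D and E.

```c
#include <assert.h>
#include <stdbool.h>

#define NDEV 5

/* Toy stand-in for struct device's PM bookkeeping. */
struct toy_dev {
	char name;
	bool suspended;      /* stands in for is_noirq_suspended */
	bool on_target_list; /* "moved to dpm_noirq_list" */
};

/* C (the third device) is the sync device whose callback fails. */
static int toy_suspend(struct toy_dev *d)
{
	if (d->name == 'C')
		return -1; /* abort */
	d->suspended = true;
	return 0;
}

/* Walk A..E; the async leaves D and E are assumed to finish early. */
int run_abort_scenario(struct toy_dev devs[NDEV])
{
	for (int i = 0; i < NDEV; i++)
		devs[i] = (struct toy_dev){ .name = 'A' + i };

	/* "Async leaf devices started upfront": D and E complete early,
	 * but they were never moved to the target list. */
	devs[3].suspended = true;
	devs[4].suspended = true;

	/* Main loop: move to the target list, then suspend; stop on error. */
	for (int i = 0; i < 3; i++) { /* A, B, C are handled synchronously */
		devs[i].on_target_list = true;
		if (toy_suspend(&devs[i]))
			return -1; /* suspend aborted at C */
	}
	return 0;
}

/* Naive undo: resume only what is on the target list. D and E, which
 * suspended asynchronously but were never moved, are stranded. */
int count_stranded(const struct toy_dev devs[NDEV])
{
	int stranded = 0;
	for (int i = 0; i < NDEV; i++)
		if (devs[i].suspended && !devs[i].on_target_list)
			stranded++;
	return stranded;
}
```

Running the scenario leaves exactly two devices (D and E) suspended but invisible to the resume pass, which is the bug being reported.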
On Thu, Mar 13, 2025 at 2:47 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Feb 25, 2025 at 8:46 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > In analogy with previous changes, make device_suspend_late() and
> > device_suspend_noirq() start the async suspend of the device's parent
> > and suppliers after the device itself has been processed and make
> > dpm_suspend_late() and dpm_noirq_suspend_devices() start processing
> > "async" leaf devices (that is, devices without children or consumers)
> > upfront because they don't need to wait for any other devices.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > [...]
>
> This issue is present in the previous patch too, but it's easier for
> me to point it out here. Suspend abort will break now.
>
> For example, say the devices are suspended in the order A -> B -> C ->
> D -> E if everything was sync.
>
> Case 1: Fully sync devices.
> If C aborts, only A and B will be in the dpm_noirq_list. When we try
> to undo the suspend, we just resume the devices in dpm_noirq_list, and
> that resumes only A and B.
>
> Case 2: Only C is sync.
> When C aborts, A, B, D, and E could have finished suspending, but only
> A and B will be in the dpm_noirq_list. When we try to undo the
> suspend, we just resume the devices in dpm_noirq_list, and that
> resumes only A and B. D and E never get resumed.
>
> My fix for this is to move all devices to dpm_noirq_list if a suspend
> aborts and then use the existing
> is_suspended/is_noirq_suspended/is_late_suspended flags to skip over
> devices that haven't been suspended. That works nicely except in
> is_suspended, and I tried to fix that in [2]. But you had an issue with
> [2] that I didn't fully understand, and I meant to dig deeper and fix
> it. I didn't get around to it, though, as I got swamped with other work.
>
> [2] - https://lore.kernel.org/linux-pm/20241114220921.2529905-2-saravanak@google.com/

I hope that my last message in this thread clarifies this a bit.

Anyway, yes, this is a bug, and yes, it can be addressed by continuing
to move devices to the target list on errors until all of them have
been moved, and then looking at
is_suspended/is_noirq_suspended/is_late_suspended during resume.

I'll post a fix for the direct-complete handling in device_suspend()
and an update of the last three patches in this series with this issue
addressed.
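The agreed remedy can be shown in the same toy model as above (again a sketch with made-up names, not the kernel code): on an abort, every remaining device is still moved onto the target list, and the resume pass skips any device whose per-device "suspended" flag is unset, playing the role of is_noirq_suspended.

```c
#include <assert.h>
#include <stdbool.h>

#define NDEV 5

struct toy_dev {
	char name;
	bool suspended;      /* stands in for is_noirq_suspended */
	bool on_target_list; /* "moved to dpm_noirq_list" */
};

static int toy_suspend(struct toy_dev *d)
{
	if (d->name == 'C')
		return -1; /* abort */
	d->suspended = true;
	return 0;
}

/* Fixed flow: even after the abort, keep moving the remaining devices
 * onto the target list so the resume pass can see all of them. */
int run_fixed_scenario(struct toy_dev devs[NDEV])
{
	int error = 0;

	for (int i = 0; i < NDEV; i++)
		devs[i] = (struct toy_dev){ .name = 'A' + i };

	/* Async leaves D and E finished early, as before. */
	devs[3].suspended = true;
	devs[4].suspended = true;

	for (int i = 0; i < 3; i++) {
		devs[i].on_target_list = true;
		if (!error && toy_suspend(&devs[i]))
			error = -1; /* remember the abort but keep moving devices */
	}
	/* D and E are moved too, instead of being left behind. */
	devs[3].on_target_list = true;
	devs[4].on_target_list = true;

	return error;
}

/* Resume pass: walk the target list, skipping never-suspended devices. */
int count_resumed(struct toy_dev devs[NDEV])
{
	int resumed = 0;
	for (int i = 0; i < NDEV; i++) {
		if (!devs[i].on_target_list || !devs[i].suspended)
			continue; /* e.g. C: on the list, but its suspend never ran */
		devs[i].suspended = false;
		resumed++;
	}
	return resumed;
}
```

With this flow, the undo pass resumes A, B, D, and E while correctly skipping C, so nothing is stranded.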
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1317,6 +1317,8 @@
 	device_links_read_unlock(idx);
 }
 
+static void async_suspend_noirq(void *data, async_cookie_t cookie);
+
 /**
  * device_suspend_noirq - Execute a "noirq suspend" callback for given device.
  * @dev: Device to handle.
@@ -1396,7 +1398,13 @@
 Complete:
 	complete_all(&dev->power.completion);
 	TRACE_SUSPEND(error);
-	return error;
+
+	if (error || async_error)
+		return error;
+
+	dpm_async_suspend_superior(dev, async_suspend_noirq);
+
+	return 0;
 }
 
 static void async_suspend_noirq(void *data, async_cookie_t cookie)
@@ -1410,6 +1418,7 @@
 static int dpm_noirq_suspend_devices(pm_message_t state)
 {
 	ktime_t starttime = ktime_get();
+	struct device *dev;
 	int error = 0;
 
 	trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, true);
@@ -1419,15 +1428,28 @@
 
 	mutex_lock(&dpm_list_mtx);
 
+	/*
+	 * Start processing "async" leaf devices upfront because they don't need
+	 * to wait.
+	 */
+	list_for_each_entry_reverse(dev, &dpm_late_early_list, power.entry) {
+		dpm_clear_async_state(dev);
+		if (dpm_leaf_device(dev))
+			dpm_async_fn(dev, async_suspend_noirq);
+	}
+
 	while (!list_empty(&dpm_late_early_list)) {
-		struct device *dev = to_device(dpm_late_early_list.prev);
+		dev = to_device(dpm_late_early_list.prev);
 
 		list_move(&dev->power.entry, &dpm_noirq_list);
 
-		dpm_clear_async_state(dev);
-		if (dpm_async_fn(dev, async_suspend_noirq))
+		dpm_async_unless_in_progress(dev, async_suspend_noirq);
+
+		if (dev->power.work_in_progress)
 			continue;
 
+		dev->power.work_in_progress = true;
+
 		get_device(dev);
 
 		mutex_unlock(&dpm_list_mtx);
@@ -1492,6 +1514,8 @@
 	spin_unlock_irq(&parent->power.lock);
 }
 
+static void async_suspend_late(void *data, async_cookie_t cookie);
+
 /**
  * device_suspend_late - Execute a "late suspend" callback for given device.
  * @dev: Device to handle.
@@ -1568,7 +1592,13 @@
 Complete:
 	TRACE_SUSPEND(error);
 	complete_all(&dev->power.completion);
-	return error;
+
+	if (error || async_error)
+		return error;
+
+	dpm_async_suspend_superior(dev, async_suspend_late);
+
+	return 0;
 }
 
 static void async_suspend_late(void *data, async_cookie_t cookie)
@@ -1586,6 +1616,7 @@
 int dpm_suspend_late(pm_message_t state)
 {
 	ktime_t starttime = ktime_get();
+	struct device *dev;
 	int error = 0;
 
 	trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
@@ -1597,15 +1628,28 @@
 
 	mutex_lock(&dpm_list_mtx);
 
+	/*
+	 * Start processing "async" leaf devices upfront because they don't need
+	 * to wait.
+	 */
+	list_for_each_entry_reverse(dev, &dpm_suspended_list, power.entry) {
+		dpm_clear_async_state(dev);
+		if (dpm_leaf_device(dev))
+			dpm_async_fn(dev, async_suspend_late);
+	}
+
 	while (!list_empty(&dpm_suspended_list)) {
-		struct device *dev = to_device(dpm_suspended_list.prev);
+		dev = to_device(dpm_suspended_list.prev);
 
 		list_move(&dev->power.entry, &dpm_late_early_list);
 
-		dpm_clear_async_state(dev);
-		if (dpm_async_fn(dev, async_suspend_late))
+		dpm_async_unless_in_progress(dev, async_suspend_late);
+
+		if (dev->power.work_in_progress)
 			continue;
 
+		dev->power.work_in_progress = true;
+
 		get_device(dev);
 
 		mutex_unlock(&dpm_list_mtx);
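The scheduling idea in the patch, start leaf devices first and let each completed device kick off its superior, can be modeled with a per-device child counter. This is a toy user-space sketch: the topology, names, and counter mechanism are illustrative, and `toy_complete()` only approximates what `dpm_async_suspend_superior()` does (it also ignores supplier links, handling only parents).

```c
#include <assert.h>

#define NDEV 6

struct toy_dev {
	int parent;                /* index of the parent, -1 for none */
	int unsuspended_children;  /* children whose suspend has not finished */
	int order;                 /* position in completion order, -1 = not done */
};

static int done_count;

/* When a device finishes, decrement its parent's child counter; the
 * parent becomes eligible (and runs) once its last child is done. */
static void toy_complete(struct toy_dev devs[], int i)
{
	int p;

	devs[i].order = done_count++;
	p = devs[i].parent;
	if (p >= 0 && --devs[p].unsuspended_children == 0)
		toy_complete(devs, p);
}

/* Topology: 0 is the root, 1 and 2 are its children, 3..5 are leaves
 * under 1.  Only leaves (no children) are started upfront. */
int run_leaf_first(struct toy_dev devs[NDEV])
{
	int parents[NDEV] = { -1, 0, 0, 1, 1, 1 };
	int i;

	done_count = 0;
	for (i = 0; i < NDEV; i++)
		devs[i] = (struct toy_dev){ .parent = parents[i], .order = -1 };
	for (i = 0; i < NDEV; i++)
		if (parents[i] >= 0)
			devs[parents[i]].unsuspended_children++;

	/* "Start async leaf devices upfront": those with no children. */
	for (i = 0; i < NDEV; i++)
		if (devs[i].unsuspended_children == 0)
			toy_complete(devs, i);

	return done_count;
}
```

Every device completes, each parent strictly after all of its children, and the root last, which is the ordering invariant the suspend path has to preserve.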