Message ID | 1476923170-111986-2-git-send-email-briannorris@chromium.org (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Ugh, as I hope the patch context makes clear, the subject should be s/early/late/ as should the body of the commit message. On Wed, Oct 19, 2016 at 05:26:10PM -0700, Brian Norris wrote: > Consider two devices, A and B, where B is a child of A, and B utilizes > asynchronous suspend (it does not matter whether A is sync or async). If > B fails to suspend_noirq() or suspend_early(), or is interrupted by a s/early/late/ > wakeup (pm_wakeup_pending()), then it aborts and sets the async_error > variable. However, device A does not (immediately) check the async_error > variable; it may continue to run its own suspend_noirq()/suspend_early() s/early/late/ > callback. This is bad. > > We can resolve this problem by checking the async_error flag after > waiting for children to suspend, using the same logic for the noirq and > late suspend cases as we already do for __device_suspend(). > > It's easy to observe this erroneous behavior by, for example, forcing a > device to sleep a bit in its suspend_noirq() (to ensure the parent is > waiting for the child to complete), then return an error, and watch the > parent suspend_noirq() still get called. (Or similarly, fake a wakeup > event at the right (or is it wrong?) time.) > > Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late") > Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq") > Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com> > Signed-off-by: Brian Norris <briannorris@chromium.org> If the patch is otherwise acceptable, feel free to make the above edits. Or I can fix them up if I send v2. Sorry for the noise, Brian -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 19, 2016 at 05:26:10PM -0700, Brian Norris wrote: > Consider two devices, A and B, where B is a child of A, and B utilizes > asynchronous suspend (it does not matter whether A is sync or async). If > B fails to suspend_noirq() or suspend_early(), or is interrupted by a > wakeup (pm_wakeup_pending()), then it aborts and sets the async_error > variable. However, device A does not (immediately) check the async_error > variable; it may continue to run its own suspend_noirq()/suspend_early() > callback. This is bad. > > We can resolve this problem by checking the async_error flag after > waiting for children to suspend, using the same logic for the noirq and > late suspend cases as we already do for __device_suspend(). > > It's easy to observe this erroneous behavior by, for example, forcing a > device to sleep a bit in its suspend_noirq() (to ensure the parent is > waiting for the child to complete), then return an error, and watch the > parent suspend_noirq() still get called. (Or similarly, fake a wakeup > event at the right (or is it wrong?) time.) > > Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late") > Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq") > Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com> > Signed-off-by: Brian Norris <briannorris@chromium.org> Makes sense to me. Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> > --- > Tested mostly on 4.4, but the code (and seemingly the bug) is mostly untouched > since then. I can try to conjure up a proper upstream test if required. > > drivers/base/power/main.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > index c58563581345..eaf6b53463a5 100644 > --- a/drivers/base/power/main.c > +++ b/drivers/base/power/main.c > @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a > > dpm_wait_for_children(dev, async); > > + if (async_error) > + goto Complete; > + > if (dev->pm_domain) { > info = "noirq power domain "; > callback = pm_noirq_op(&dev->pm_domain->ops, state); > @@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as > > dpm_wait_for_children(dev, async); > > + if (async_error) > + goto Complete; > + > if (dev->pm_domain) { > info = "late power domain "; > callback = pm_late_early_op(&dev->pm_domain->ops, state); > -- > 2.8.0.rc3.226.g39d4020 >
On Wed, Oct 19, 2016 at 05:46:11PM -0700, Brian Norris wrote: > Ugh, as I hope the patch context makes clear, the subject should be > > s/early/late/ > > as should the body of the commit message. > > On Wed, Oct 19, 2016 at 05:26:10PM -0700, Brian Norris wrote: > > Consider two devices, A and B, where B is a child of A, and B utilizes > > asynchronous suspend (it does not matter whether A is sync or async). If > > B fails to suspend_noirq() or suspend_early(), or is interrupted by a > > s/early/late/ > > > wakeup (pm_wakeup_pending()), then it aborts and sets the async_error > > variable. However, device A does not (immediately) check the async_error > > variable; it may continue to run its own suspend_noirq()/suspend_early() > > s/early/late/ > > > callback. This is bad. > > > > We can resolve this problem by checking the async_error flag after > > waiting for children to suspend, using the same logic for the noirq and > > late suspend cases as we already do for __device_suspend(). > > > > It's easy to observe this erroneous behavior by, for example, forcing a > > device to sleep a bit in its suspend_noirq() (to ensure the parent is > > waiting for the child to complete), then return an error, and watch the > > parent suspend_noirq() still get called. (Or similarly, fake a wakeup > > event at the right (or is it wrong?) time.) > > > > Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late") > > Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq") > > Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com> > > Signed-off-by: Brian Norris <briannorris@chromium.org> > > If the patch is otherwise acceptable, feel free to make the above edits. > Or I can fix them up if I send v2. Please fix up, we should never have to hand-edit a changelog text... -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 27, 2016 at 05:34:06PM +0200, Greg Kroah-Hartman wrote: > On Wed, Oct 19, 2016 at 05:46:11PM -0700, Brian Norris wrote: > > Ugh, as I hope the patch context makes clear, the subject should be > > > > s/early/late/ > > > > as should the body of the commit message. [...] > > If the patch is otherwise acceptable, feel free to make the above edits. > > Or I can fix them up if I send v2. > > Please fix up, we should never have to hand-edit a changelog text... Sorry, I figured it was possible to get review without spamming everyone. If I was going to need a v2 anyway, for example, then why bother for a trivial change. I'll send v2 (of just this patch) shortly. Thanks, Brian -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index c58563581345..eaf6b53463a5 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a dpm_wait_for_children(dev, async); + if (async_error) + goto Complete; + if (dev->pm_domain) { info = "noirq power domain "; callback = pm_noirq_op(&dev->pm_domain->ops, state); @@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as dpm_wait_for_children(dev, async); + if (async_error) + goto Complete; + if (dev->pm_domain) { info = "late power domain "; callback = pm_late_early_op(&dev->pm_domain->ops, state);
Consider two devices, A and B, where B is a child of A, and B utilizes asynchronous suspend (it does not matter whether A is sync or async). If B fails to suspend_noirq() or suspend_early(), or is interrupted by a wakeup (pm_wakeup_pending()), then it aborts and sets the async_error variable. However, device A does not (immediately) check the async_error variable; it may continue to run its own suspend_noirq()/suspend_early() callback. This is bad. We can resolve this problem by checking the async_error flag after waiting for children to suspend, using the same logic for the noirq and late suspend cases as we already do for __device_suspend(). It's easy to observe this erroneous behavior by, for example, forcing a device to sleep a bit in its suspend_noirq() (to ensure the parent is waiting for the child to complete), then return an error, and watch the parent suspend_noirq() still get called. (Or similarly, fake a wakeup event at the right (or is it wrong?) time.) Fixes: de377b397272 ("PM / sleep: Asynchronous threads for suspend_late") Fixes: 28b6fd6e3779 ("PM / sleep: Asynchronous threads for suspend_noirq") Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com> Signed-off-by: Brian Norris <briannorris@chromium.org> --- Tested mostly on 4.4, but the code (and seemingly the bug) is mostly untouched since then. I can try to conjure up a proper upstream test if required. drivers/base/power/main.c | 6 ++++++ 1 file changed, 6 insertions(+)