Message ID | 2601275.1tEomSadG4@kreacher (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | PM: core: Fix handling of devices deleted during system-wide resume | expand |
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > If a device is deleted by one of its system-wide resume callbacks > (for example, because it does not appear to be present or accessible > any more) along with its children, the resume of the children may > continue leading to use-after-free errors and other issues > (potentially). > > Namely, if the device's children are resumed asynchronously, their > resume may have been scheduled already before the device's callback > runs and so the device may be deleted while dpm_wait_for_superior() > is being executed for them. The memory taken up by the parent device > object may be freed then while dpm_wait() is waiting for the parent's > resume callback to complete, which leads to a use-after-free. > Moreover, the resume of the children is really not expected to > continue after they have been unregistered, so it must be terminated > right away in that case.Seokjoo Lee <seokjoo.lee@lge.com> > > To address this problem, modify dpm_wait_for_superior() to check > if the target device is still there in the system-wide PM list of > devices and if so, to increment its parent's reference counter, both > under dpm_list_mtx which prevents device_del() running for the child > from dropping the parent's reference counter prematurely. > > If the device is not present in the system-wide PM list of devices > any more, the resume of it cannot continue, so check that again after > dpm_wait() returns, which means that the parent's callback has been > completed, and pass the result of that check to the caller of > dpm_wait_for_superior() to allow it to abort the device's resume > if it is not there any more. > > Link: https://lore.kernel.org/linux-pm/1579568452-27253-1-git-send-email-chanho.min@lge.com > Reported-by: Chanho Min <chanho.min@lge.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/base/power/main.c | 42 +++++++++++++++++++++++++++++++++++++----- > 1 file changed, 37 insertions(+), 5 deletions(-) > > Index: linux-pm/drivers/base/power/main.c > =================================================================== > --- linux-pm.orig/drivers/base/power/main.cSeokjoo Lee <seokjoo.lee@lge.com> > +++ linux-pm/drivers/base/power/main.c20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글: > @@ -273,10 +273,38 @@ static void dpm_wait_for_suppliers(struc > device_links_read_unlock(idx); > } > > -static void dpm_wait_for_superior(struct device *dev, bool async) > +static bool dpm_wait_for_superior(struct device *dev, bool async) > { > - dpm_wait(dev->parent, async); > + struct device *parent;board > + > + /* > + * If the device is resumed asynchronously and the parent's callback > + * deletes both the device and the parent itself, the parent object may > + * be freed while this function is running, so avoid that by reference > + * counting the parent once more unless the device has been deleted > + * already (in which case return right away). > + */ > + mutex_lock(&dpm_list_mtx); > + > + if (!device_pm_initialized(dev)) {20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글: > + mutex_unlock(&dpm_list_mtx); > + return false; > + } > + > + parent = get_device(dev->parent); > + > + mutex_unlock(&dpm_list_mtx); > + > + dpm_wait(parent, async); > + put_device(parent); > + > dpm_wait_for_suppliers(dev, async); > + > + /* > + * If the parent's callback has deleted the device, attempting to resume > + * it would be invalid, so avoid doing that then. > + */ > + return device_pm_initialized(dev);20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글: > } > > static void dpm_wait_for_consumers(struct device *dev, bool async) > @@ -621,7 +649,8 @@ static int device_resume_noirq(struct de > if (!dev->power.is_noirq_suspended) > goto Out; > > - dpm_wait_for_superior(dev, async); > + if (!dpm_wait_for_superior(dev, async)) > + goto Out; > > skip_resume = dev_pm_may_skip_resume(dev); > > @@ -829,7 +858,8 @@ static int device_resume_early(struct de > if (!dev->power.is_late_suspended) > goto Out; > > - dpm_wait_for_superior(dev, async);Seokjoo Lee <seokjoo.lee@lge.com> > + if (!dpm_wait_for_superior(dev, async)) > + goto Out; > > callback = dpm_subsys_resume_early_cb(dev, state, &info); > > @@ -944,7 +974,9 @@ static int device_resume(struct device * > goto Complete; > } > > - dpm_wait_for_superior(dev, async); > + if (!dpm_wait_for_superior(dev, async)) > + goto Complete; > + > dpm_watchdog_set(&wd, dev); > device_lock(dev);Thanks, This seems to solve the rare hang on our target. Actually, the problem is occurred in v4.4. Shouldn't it apply to -stable? Chanho
On Thu, Jan 23, 2020 at 3:14 AM Chanho Min <chanho.min@lge.com> wrote: > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > If a device is deleted by one of its system-wide resume callbacks > > (for example, because it does not appear to be present or accessible > > any more) along with its children, the resume of the children may > > continue leading to use-after-free errors and other issues > > (potentially). > > > > Namely, if the device's children are resumed asynchronously, their > > resume may have been scheduled already before the device's callback > > runs and so the device may be deleted while dpm_wait_for_superior() > > is being executed for them. The memory taken up by the parent device > > object may be freed then while dpm_wait() is waiting for the parent's > > resume callback to complete, which leads to a use-after-free. > > Moreover, the resume of the children is really not expected to > > continue after they have been unregistered, so it must be terminated > > right away in that case.Seokjoo Lee <seokjoo.lee@lge.com> > > > > To address this problem, modify dpm_wait_for_superior() to check > > if the target device is still there in the system-wide PM list of > > devices and if so, to increment its parent's reference counter, both > > under dpm_list_mtx which prevents device_del() running for the child > > from dropping the parent's reference counter prematurely. > > > > If the device is not present in the system-wide PM list of devices > > any more, the resume of it cannot continue, so check that again after > > dpm_wait() returns, which means that the parent's callback has been > > completed, and pass the result of that check to the caller of > > dpm_wait_for_superior() to allow it to abort the device's resume > > if it is not there any more. > > > > Link: https://lore.kernel.org/linux-pm/1579568452-27253-1-git-send-email-chanho.min@lge.com > > Reported-by: Chanho Min <chanho.min@lge.com> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > drivers/base/power/main.c | 42 +++++++++++++++++++++++++++++++++++++----- > > 1 file changed, 37 insertions(+), 5 deletions(-) > > > > Index: linux-pm/drivers/base/power/main.c > > =================================================================== > > --- linux-pm.orig/drivers/base/power/main.cSeokjoo Lee <seokjoo.lee@lge.com> > > +++ linux-pm/drivers/base/power/main.c20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글: > > @@ -273,10 +273,38 @@ static void dpm_wait_for_suppliers(struc > > device_links_read_unlock(idx); > > } > > > > -static void dpm_wait_for_superior(struct device *dev, bool async) > > +static bool dpm_wait_for_superior(struct device *dev, bool async) > > { > > - dpm_wait(dev->parent, async); > > + struct device *parent;board > > + > > + /* > > + * If the device is resumed asynchronously and the parent's callback > > + * deletes both the device and the parent itself, the parent object may > > + * be freed while this function is running, so avoid that by reference > > + * counting the parent once more unless the device has been deleted > > + * already (in which case return right away). > > + */ > > + mutex_lock(&dpm_list_mtx); > > + > > + if (!device_pm_initialized(dev)) {20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글: > > + mutex_unlock(&dpm_list_mtx); > > + return false; > > + } > > + > > + parent = get_device(dev->parent); > > + > > + mutex_unlock(&dpm_list_mtx); > > + > > + dpm_wait(parent, async); > > + put_device(parent); > > + > > dpm_wait_for_suppliers(dev, async); > > + > > + /* > > + * If the parent's callback has deleted the device, attempting to resume > > + * it would be invalid, so avoid doing that then. > > + */ > > + return device_pm_initialized(dev);20. 1. 23. 오전 8:11에 Rafael J. Wysocki 이(가) 쓴 글: > > } > > > > static void dpm_wait_for_consumers(struct device *dev, bool async) > > @@ -621,7 +649,8 @@ static int device_resume_noirq(struct de > > if (!dev->power.is_noirq_suspended) > > goto Out; > > > > - dpm_wait_for_superior(dev, async); > > + if (!dpm_wait_for_superior(dev, async)) > > + goto Out; > > > > skip_resume = dev_pm_may_skip_resume(dev); > > > > @@ -829,7 +858,8 @@ static int device_resume_early(struct de > > if (!dev->power.is_late_suspended) > > goto Out; > > > > - dpm_wait_for_superior(dev, async);Seokjoo Lee <seokjoo.lee@lge.com> > > + if (!dpm_wait_for_superior(dev, async)) > > + goto Out; > > > > callback = dpm_subsys_resume_early_cb(dev, state, &info); > > > > @@ -944,7 +974,9 @@ static int device_resume(struct device * > > goto Complete; > > } > > > > - dpm_wait_for_superior(dev, async); > > + if (!dpm_wait_for_superior(dev, async)) > > + goto Complete; > > + > > dpm_watchdog_set(&wd, dev); > > device_lock(dev);Thanks, This seems to solve the rare hang on our target. > Actually, the problem is occurred in v4.4. > Shouldn't it apply to -stable? Yes, it should, but I'll add a "stable" tag later.
On Thu, Jan 23, 2020 at 12:11:24AM +0100, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > If a device is deleted by one of its system-wide resume callbacks > (for example, because it does not appear to be present or accessible > any more) along with its children, the resume of the children may > continue leading to use-after-free errors and other issues > (potentially). > > Namely, if the device's children are resumed asynchronously, their > resume may have been scheduled already before the device's callback > runs and so the device may be deleted while dpm_wait_for_superior() > is being executed for them. The memory taken up by the parent device > object may be freed then while dpm_wait() is waiting for the parent's > resume callback to complete, which leads to a use-after-free. > Moreover, the resume of the children is really not expected to > continue after they have been unregistered, so it must be terminated > right away in that case. > > To address this problem, modify dpm_wait_for_superior() to check > if the target device is still there in the system-wide PM list of > devices and if so, to increment its parent's reference counter, both > under dpm_list_mtx which prevents device_del() running for the child > from dropping the parent's reference counter prematurely. > > If the device is not present in the system-wide PM list of devices > any more, the resume of it cannot continue, so check that again after > dpm_wait() returns, which means that the parent's callback has been > completed, and pass the result of that check to the caller of > dpm_wait_for_superior() to allow it to abort the device's resume > if it is not there any more. > > Link: https://lore.kernel.org/linux-pm/1579568452-27253-1-git-send-email-chanho.min@lge.com > Reported-by: Chanho Min <chanho.min@lge.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/base/power/main.c | 42 +++++++++++++++++++++++++++++++++++++----- > 1 file changed, 37 insertions(+), 5 deletions(-) Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Index: linux-pm/drivers/base/power/main.c =================================================================== --- linux-pm.orig/drivers/base/power/main.c +++ linux-pm/drivers/base/power/main.c @@ -273,10 +273,38 @@ static void dpm_wait_for_suppliers(struc device_links_read_unlock(idx); } -static void dpm_wait_for_superior(struct device *dev, bool async) +static bool dpm_wait_for_superior(struct device *dev, bool async) { - dpm_wait(dev->parent, async); + struct device *parent; + + /* + * If the device is resumed asynchronously and the parent's callback + * deletes both the device and the parent itself, the parent object may + * be freed while this function is running, so avoid that by reference + * counting the parent once more unless the device has been deleted + * already (in which case return right away). + */ + mutex_lock(&dpm_list_mtx); + + if (!device_pm_initialized(dev)) { + mutex_unlock(&dpm_list_mtx); + return false; + } + + parent = get_device(dev->parent); + + mutex_unlock(&dpm_list_mtx); + + dpm_wait(parent, async); + put_device(parent); + dpm_wait_for_suppliers(dev, async); + + /* + * If the parent's callback has deleted the device, attempting to resume + * it would be invalid, so avoid doing that then. + */ + return device_pm_initialized(dev); } static void dpm_wait_for_consumers(struct device *dev, bool async) @@ -621,7 +649,8 @@ static int device_resume_noirq(struct de if (!dev->power.is_noirq_suspended) goto Out; - dpm_wait_for_superior(dev, async); + if (!dpm_wait_for_superior(dev, async)) + goto Out; skip_resume = dev_pm_may_skip_resume(dev); @@ -829,7 +858,8 @@ static int device_resume_early(struct de if (!dev->power.is_late_suspended) goto Out; - dpm_wait_for_superior(dev, async); + if (!dpm_wait_for_superior(dev, async)) + goto Out; callback = dpm_subsys_resume_early_cb(dev, state, &info); @@ -944,7 +974,9 @@ static int device_resume(struct device * goto Complete; } - dpm_wait_for_superior(dev, async); + if (!dpm_wait_for_superior(dev, async)) + goto Complete; + dpm_watchdog_set(&wd, dev); device_lock(dev);