Message ID | 1600780266-155908-1-git-send-email-chenxiang66@hisilicon.com (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | [v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier() | expand |
On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote: > > From: Xiang Chen <chenxiang66@hisilicon.com> > > To support runtime PM for hisi SAS driver (the driver is in directory > drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev > (consumer device) and hisi_hba->dev(supplier device) with flags > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE. > After runtime suspended consumers and supplier, unload the dirver which > cause a hung. We find that it calls function device_release_driver_internal() > to release supplier device hisi_hba->dev, as the device link is busy, > it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function > device_release_driver_internal() to release consumer device > scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync() > to resume consumer device, as consumer-supplier relation exists, it will try > to resume supplier first, but as the link state is already set as > DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume > consumer which cause a hung (it sends IOs to resume scsi_device while > SAS controller is suspended). Simple flow is as follows: > > device_release_driver_internal -> (supplier device) > if device_links_busy -> > device_links_unbind_consumers -> > ... > WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND) > device_release_driver_internal (consumer device) > pm_runtime_get_sync -> (consumer device) > ... > __rpm_callback -> > rpm_get_suppliers -> > if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier > ... > pm_runtime_clean_up_links > ... > > It should guarantee correct suspend/resume ordering between a supplier > device and its consumer devices (resume the supplier device before resuming > consumer devices, and suspend consumer devices before suspending supplier > device) for runtime PM, but it seems the checks in rpm_get_supplier() and > rpm_put_supplier() break the rule, so remove them. > > Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> > --- > drivers/base/power/runtime.c | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c > index 8143210..6f605f7 100644 > --- a/drivers/base/power/runtime.c > +++ b/drivers/base/power/runtime.c > @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev) > device_links_read_lock_held()) { > int retval; > > - if (!(link->flags & DL_FLAG_PM_RUNTIME) || > - READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) > + if (!(link->flags & DL_FLAG_PM_RUNTIME)) > continue; > > retval = pm_runtime_get_sync(link->supplier); > @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev) > > list_for_each_entry_rcu(link, &dev->links.suppliers, c_node, > device_links_read_lock_held()) { > - if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) > - continue; > > while (refcount_dec_not_one(&link->rpm_active)) > pm_runtime_put(link->supplier); > -- Applied as 5.10 material with some edits in the subject and changelog, thanks!
On Fri, Sep 25, 2020 at 5:02 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote: > > > > From: Xiang Chen <chenxiang66@hisilicon.com> > > > > To support runtime PM for hisi SAS driver (the driver is in directory > > drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev > > (consumer device) and hisi_hba->dev(supplier device) with flags > > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE. > > After runtime suspended consumers and supplier, unload the dirver which > > cause a hung. We find that it calls function device_release_driver_internal() > > to release supplier device hisi_hba->dev, as the device link is busy, > > it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function > > device_release_driver_internal() to release consumer device > > scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync() > > to resume consumer device, as consumer-supplier relation exists, it will try > > to resume supplier first, It looks like I have overlooked one thing here. Does it help (without the $subject patch) to move the pm_runtime_get_sync(dev) in __device_release_driver() before the while (device_links_busy(dev)) loop? > > but as the link state is already set as > > DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume > > consumer which cause a hung (it sends IOs to resume scsi_device while > > SAS controller is suspended). Simple flow is as follows: > > > > device_release_driver_internal -> (supplier device) > > if device_links_busy -> > > device_links_unbind_consumers -> > > ... > > WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND) > > device_release_driver_internal (consumer device) > > pm_runtime_get_sync -> (consumer device) > > ... > > __rpm_callback -> > > rpm_get_suppliers -> > > if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier > > ... > > pm_runtime_clean_up_links > > ... > > > > It should guarantee correct suspend/resume ordering between a supplier > > device and its consumer devices (resume the supplier device before resuming > > consumer devices, and suspend consumer devices before suspending supplier > > device) for runtime PM, but it seems the checks in rpm_get_supplier() and > > rpm_put_supplier() break the rule, so remove them. > > > > Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> > > --- > > drivers/base/power/runtime.c | 5 +---- > > 1 file changed, 1 insertion(+), 4 deletions(-) > > > > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c > > index 8143210..6f605f7 100644 > > --- a/drivers/base/power/runtime.c > > +++ b/drivers/base/power/runtime.c > > @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev) > > device_links_read_lock_held()) { > > int retval; > > > > - if (!(link->flags & DL_FLAG_PM_RUNTIME) || > > - READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) > > + if (!(link->flags & DL_FLAG_PM_RUNTIME)) > > continue; > > > > retval = pm_runtime_get_sync(link->supplier); > > @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev) > > > > list_for_each_entry_rcu(link, &dev->links.suppliers, c_node, > > device_links_read_lock_held()) { > > - if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) > > - continue; > > > > while (refcount_dec_not_one(&link->rpm_active)) > > pm_runtime_put(link->supplier); > > -- > > Applied as 5.10 material with some edits in the subject and changelog, thanks! I may need to revert it, because it introduced a (theoretical for now) race condition between PM-runtime and pm_runtime_clean_up_links(). Thanks!
Hi Rafael, 在 2020/10/20 1:03, Rafael J. Wysocki 写道: > On Fri, Sep 25, 2020 at 5:02 PM Rafael J. Wysocki <rafael@kernel.org> wrote: >> On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote: >>> From: Xiang Chen <chenxiang66@hisilicon.com> >>> >>> To support runtime PM for hisi SAS driver (the driver is in directory >>> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev >>> (consumer device) and hisi_hba->dev(supplier device) with flags >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE. >>> After runtime suspended consumers and supplier, unload the dirver which >>> cause a hung. We find that it calls function device_release_driver_internal() >>> to release supplier device hisi_hba->dev, as the device link is busy, >>> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function >>> device_release_driver_internal() to release consumer device >>> scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync() >>> to resume consumer device, as consumer-supplier relation exists, it will try >>> to resume supplier first, > It looks like I have overlooked one thing here. > > Does it help (without the $subject patch) to move the > pm_runtime_get_sync(dev) in __device_release_driver() before the while > (device_links_busy(dev)) loop? I have tested it, and it also solve the issue. > >>> but as the link state is already set as >>> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume >>> consumer which cause a hung (it sends IOs to resume scsi_device while >>> SAS controller is suspended). Simple flow is as follows: >>> >>> device_release_driver_internal -> (supplier device) >>> if device_links_busy -> >>> device_links_unbind_consumers -> >>> ... >>> WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND) >>> device_release_driver_internal (consumer device) >>> pm_runtime_get_sync -> (consumer device) >>> ... >>> __rpm_callback -> >>> rpm_get_suppliers -> >>> if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier >>> ... >>> pm_runtime_clean_up_links >>> ... >>> >>> It should guarantee correct suspend/resume ordering between a supplier >>> device and its consumer devices (resume the supplier device before resuming >>> consumer devices, and suspend consumer devices before suspending supplier >>> device) for runtime PM, but it seems the checks in rpm_get_supplier() and >>> rpm_put_supplier() break the rule, so remove them. >>> >>> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> >>> --- >>> drivers/base/power/runtime.c | 5 +---- >>> 1 file changed, 1 insertion(+), 4 deletions(-) >>> >>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c >>> index 8143210..6f605f7 100644 >>> --- a/drivers/base/power/runtime.c >>> +++ b/drivers/base/power/runtime.c >>> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev) >>> device_links_read_lock_held()) { >>> int retval; >>> >>> - if (!(link->flags & DL_FLAG_PM_RUNTIME) || >>> - READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) >>> + if (!(link->flags & DL_FLAG_PM_RUNTIME)) >>> continue; >>> >>> retval = pm_runtime_get_sync(link->supplier); >>> @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev) >>> >>> list_for_each_entry_rcu(link, &dev->links.suppliers, c_node, >>> device_links_read_lock_held()) { >>> - if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) >>> - continue; >>> >>> while (refcount_dec_not_one(&link->rpm_active)) >>> pm_runtime_put(link->supplier); >>> -- >> Applied as 5.10 material with some edits in the subject and changelog, thanks! > I may need to revert it, because it introduced a (theoretical for now) > race condition between PM-runtime and pm_runtime_clean_up_links(). > > Thanks! > > . >
On Tue, Oct 20, 2020 at 4:24 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote: > > Hi Rafael, > > 在 2020/10/20 1:03, Rafael J. Wysocki 写道: > > On Fri, Sep 25, 2020 at 5:02 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > >> On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote: > >>> From: Xiang Chen <chenxiang66@hisilicon.com> > >>> > >>> To support runtime PM for hisi SAS driver (the driver is in directory > >>> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev > >>> (consumer device) and hisi_hba->dev(supplier device) with flags > >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE. > >>> After runtime suspended consumers and supplier, unload the dirver which > >>> cause a hung. We find that it calls function device_release_driver_internal() > >>> to release supplier device hisi_hba->dev, as the device link is busy, > >>> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function > >>> device_release_driver_internal() to release consumer device > >>> scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync() > >>> to resume consumer device, as consumer-supplier relation exists, it will try > >>> to resume supplier first, > > It looks like I have overlooked one thing here. > > > > Does it help (without the $subject patch) to move the > > pm_runtime_get_sync(dev) in __device_release_driver() before the while > > (device_links_busy(dev)) loop? > > I have tested it, and it also solve the issue. OK, thanks!
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c index 8143210..6f605f7 100644 --- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev) device_links_read_lock_held()) { int retval; - if (!(link->flags & DL_FLAG_PM_RUNTIME) || - READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) + if (!(link->flags & DL_FLAG_PM_RUNTIME)) continue; retval = pm_runtime_get_sync(link->supplier); @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev) list_for_each_entry_rcu(link, &dev->links.suppliers, c_node, device_links_read_lock_held()) { - if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) - continue; while (refcount_dec_not_one(&link->rpm_active)) pm_runtime_put(link->supplier);