Message ID | 1596088129-88814-1-git-send-email-chenxiang66@hisilicon.com (mailing list archive) |
---|---|
State | RFC, archived |
Headers | show |
Series | [RFC] PM:runtime:Remove the link state check in function rpm_get_supplier() | expand |
在 2020/7/30 13:48, chenxiang 写道: > From: Xiang Chen <chenxiang66@hisilicon.com> > > To support runtime PM for hisi SAS driver (the dirver is in directory > drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev > (consumer device) and hisi_hba->dev(supplier device) with flags > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE. > After runtime suspended consumers and supplier, unload the dirver which > cause a hung. We find that it calls function device_release_driver_internal() > to release supplier device hisi_hba->dev, as the device link is busy, > it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function > device_release_driver_internal() to release consumer device > scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync() > to resume consumer device, as consumer-supplier relation exists, it will try > to resume supplier first, but as the link state is already set as > DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume > consumer which cause a hung (it sends IOs to resume scsi_device while > SAS controller is suspended). Simple flow is as follows: > > device_release_driver_internal -> (supplier device) > if device_links_busy -> > device_links_unbind_consumers -> > ... > WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND) > device_release_driver_internal (consumer device) > pm_runtime_get_sync -> (consumer device) > ... > __rpm_callback -> > rpm_get_suppliers -> > if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier > ... > pm_runtime_clean_up_links > ... Hi Rafael, do you have any idea about this issue? > It should guarantee correct suspend/resume ordering between a supplier > device and its consumer devices (resume the supplier device before resuming > consumer devices, and suspend consumer devices before suspending supplier > device) for runtime PM, but it seems the check in rpm_get_supplier() breaks > the rule, so remove it. > > Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> > --- > drivers/base/power/runtime.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c > index 9f62790..a8edd92 100644 > --- a/drivers/base/power/runtime.c > +++ b/drivers/base/power/runtime.c > @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev) > device_links_read_lock_held()) { > int retval; > > - if (!(link->flags & DL_FLAG_PM_RUNTIME) || > - READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) > + if (!(link->flags & DL_FLAG_PM_RUNTIME)) > continue; > > retval = pm_runtime_get_sync(link->supplier);
On Mon, Sep 21, 2020 at 9:12 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote: > > > > 在 2020/7/30 13:48, chenxiang 写道: > > From: Xiang Chen <chenxiang66@hisilicon.com> > > > > To support runtime PM for hisi SAS driver (the dirver is in directory > > drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev > > (consumer device) and hisi_hba->dev(supplier device) with flags > > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE. > > After runtime suspended consumers and supplier, unload the dirver which > > cause a hung. We find that it calls function device_release_driver_internal() > > to release supplier device hisi_hba->dev, as the device link is busy, > > it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function > > device_release_driver_internal() to release consumer device > > scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync() > > to resume consumer device, as consumer-supplier relation exists, it will try > > to resume supplier first, but as the link state is already set as > > DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume > > consumer which cause a hung (it sends IOs to resume scsi_device while > > SAS controller is suspended). Simple flow is as follows: > > > > device_release_driver_internal -> (supplier device) > > if device_links_busy -> > > device_links_unbind_consumers -> > > ... > > WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND) > > device_release_driver_internal (consumer device) > > pm_runtime_get_sync -> (consumer device) > > ... > > __rpm_callback -> > > rpm_get_suppliers -> > > if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier > > ... > > pm_runtime_clean_up_links > > ... > > Hi Rafael, do you have any idea about this issue? > > > It should guarantee correct suspend/resume ordering between a supplier > > device and its consumer devices (resume the supplier device before resuming > > consumer devices, and suspend consumer devices before suspending supplier > > device) for runtime PM, but it seems the check in rpm_get_supplier() breaks > > the rule, so remove it. > > > > Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> > > --- > > drivers/base/power/runtime.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c > > index 9f62790..a8edd92 100644 > > --- a/drivers/base/power/runtime.c > > +++ b/drivers/base/power/runtime.c > > @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev) > > device_links_read_lock_held()) { > > int retval; > > > > - if (!(link->flags & DL_FLAG_PM_RUNTIME) || > > - READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) > > + if (!(link->flags & DL_FLAG_PM_RUNTIME)) AFAICS you need to make the analogous change in rpm_put_suppliers(), but apart from this it should be fine. Thanks! > > continue; > > > > retval = pm_runtime_get_sync(link->supplier); > >
在 2020/9/22 0:02, Rafael J. Wysocki 写道: > On Mon, Sep 21, 2020 at 9:12 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote: >> >> >> 在 2020/7/30 13:48, chenxiang 写道: >>> From: Xiang Chen <chenxiang66@hisilicon.com> >>> >>> To support runtime PM for hisi SAS driver (the dirver is in directory >>> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev >>> (consumer device) and hisi_hba->dev(supplier device) with flags >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE. >>> After runtime suspended consumers and supplier, unload the dirver which >>> cause a hung. We find that it calls function device_release_driver_internal() >>> to release supplier device hisi_hba->dev, as the device link is busy, >>> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function >>> device_release_driver_internal() to release consumer device >>> scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync() >>> to resume consumer device, as consumer-supplier relation exists, it will try >>> to resume supplier first, but as the link state is already set as >>> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume >>> consumer which cause a hung (it sends IOs to resume scsi_device while >>> SAS controller is suspended). Simple flow is as follows: >>> >>> device_release_driver_internal -> (supplier device) >>> if device_links_busy -> >>> device_links_unbind_consumers -> >>> ... >>> WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND) >>> device_release_driver_internal (consumer device) >>> pm_runtime_get_sync -> (consumer device) >>> ... >>> __rpm_callback -> >>> rpm_get_suppliers -> >>> if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier >>> ... >>> pm_runtime_clean_up_links >>> ... >> Hi Rafael, do you have any idea about this issue? >> >>> It should guarantee correct suspend/resume ordering between a supplier >>> device and its consumer devices (resume the supplier device before resuming >>> consumer devices, and suspend consumer devices before suspending supplier >>> device) for runtime PM, but it seems the check in rpm_get_supplier() breaks >>> the rule, so remove it. >>> >>> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> >>> --- >>> drivers/base/power/runtime.c | 3 +-- >>> 1 file changed, 1 insertion(+), 2 deletions(-) >>> >>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c >>> index 9f62790..a8edd92 100644 >>> --- a/drivers/base/power/runtime.c >>> +++ b/drivers/base/power/runtime.c >>> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev) >>> device_links_read_lock_held()) { >>> int retval; >>> >>> - if (!(link->flags & DL_FLAG_PM_RUNTIME) || >>> - READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) >>> + if (!(link->flags & DL_FLAG_PM_RUNTIME)) > AFAICS you need to make the analogous change in rpm_put_suppliers(), > but apart from this it should be fine. Thanks, i will add the change in next version. > > Thanks! > >>> continue; >>> >>> retval = pm_runtime_get_sync(link->supplier); >> > . >
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c index 9f62790..a8edd92 100644 --- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev) device_links_read_lock_held()) { int retval; - if (!(link->flags & DL_FLAG_PM_RUNTIME) || - READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND) + if (!(link->flags & DL_FLAG_PM_RUNTIME)) continue; retval = pm_runtime_get_sync(link->supplier);