diff mbox series

[v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier()

Message ID 1600780266-155908-1-git-send-email-chenxiang66@hisilicon.com (mailing list archive)
State Mainlined, archived
Headers show
Series [v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier() | expand

Commit Message

chenxiang Sept. 22, 2020, 1:11 p.m. UTC
From: Xiang Chen <chenxiang66@hisilicon.com>

To support runtime PM for hisi SAS driver (the driver is in directory
drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
(consumer device) and hisi_hba->dev(supplier device) with flags
DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
After runtime suspended consumers and supplier, unload the dirver which
cause a hung. We find that it calls function device_release_driver_internal()
to release supplier device hisi_hba->dev, as the device link is busy,
it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
device_release_driver_internal() to release consumer device
scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
to resume consumer device, as consumer-supplier relation exists, it will try
to resume supplier first, but as the link state is already set as
DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
consumer which cause a hung (it sends IOs to resume scsi_device while
SAS controller is suspended). Simple flow is as follows:

device_release_driver_internal -> (supplier device)
    if device_links_busy ->
	device_links_unbind_consumers ->
	    ...
	    WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
	    device_release_driver_internal (consumer device)
    pm_runtime_get_sync -> (consumer device)
	...
	__rpm_callback ->
	    rpm_get_suppliers ->
		if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
		...
    pm_runtime_clean_up_links
    ...

It should guarantee correct suspend/resume ordering between a supplier
device and its consumer devices (resume the supplier device before resuming
consumer devices, and suspend consumer devices before suspending supplier
device) for runtime PM, but it seems the checks in rpm_get_supplier() and
rpm_put_supplier() break the rule, so remove them.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/base/power/runtime.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Comments

Rafael J. Wysocki Sept. 25, 2020, 3:02 p.m. UTC | #1
On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote:
>
> From: Xiang Chen <chenxiang66@hisilicon.com>
>
> To support runtime PM for hisi SAS driver (the driver is in directory
> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> (consumer device) and hisi_hba->dev(supplier device) with flags
> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> After runtime suspended consumers and supplier, unload the dirver which
> cause a hung. We find that it calls function device_release_driver_internal()
> to release supplier device hisi_hba->dev, as the device link is busy,
> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> device_release_driver_internal() to release consumer device
> scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
> to resume consumer device, as consumer-supplier relation exists, it will try
> to resume supplier first, but as the link state is already set as
> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
> consumer which cause a hung (it sends IOs to resume scsi_device while
> SAS controller is suspended). Simple flow is as follows:
>
> device_release_driver_internal -> (supplier device)
>     if device_links_busy ->
>         device_links_unbind_consumers ->
>             ...
>             WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
>             device_release_driver_internal (consumer device)
>     pm_runtime_get_sync -> (consumer device)
>         ...
>         __rpm_callback ->
>             rpm_get_suppliers ->
>                 if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
>                 ...
>     pm_runtime_clean_up_links
>     ...
>
> It should guarantee correct suspend/resume ordering between a supplier
> device and its consumer devices (resume the supplier device before resuming
> consumer devices, and suspend consumer devices before suspending supplier
> device) for runtime PM, but it seems the checks in rpm_get_supplier() and
> rpm_put_supplier() break the rule, so remove them.
>
> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
> ---
>  drivers/base/power/runtime.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 8143210..6f605f7 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
>                                 device_links_read_lock_held()) {
>                 int retval;
>
> -               if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> -                   READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> +               if (!(link->flags & DL_FLAG_PM_RUNTIME))
>                         continue;
>
>                 retval = pm_runtime_get_sync(link->supplier);
> @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev)
>
>         list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
>                                 device_links_read_lock_held()) {
> -               if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> -                       continue;
>
>                 while (refcount_dec_not_one(&link->rpm_active))
>                         pm_runtime_put(link->supplier);
> --

Applied as 5.10 material with some edits in the subject and changelog, thanks!
Rafael J. Wysocki Oct. 19, 2020, 5:03 p.m. UTC | #2
On Fri, Sep 25, 2020 at 5:02 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote:
> >
> > From: Xiang Chen <chenxiang66@hisilicon.com>
> >
> > To support runtime PM for hisi SAS driver (the driver is in directory
> > drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> > (consumer device) and hisi_hba->dev(supplier device) with flags
> > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> > After runtime suspended consumers and supplier, unload the dirver which
> > cause a hung. We find that it calls function device_release_driver_internal()
> > to release supplier device hisi_hba->dev, as the device link is busy,
> > it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> > device_release_driver_internal() to release consumer device
> > scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
> > to resume consumer device, as consumer-supplier relation exists, it will try
> > to resume supplier first,

It looks like I have overlooked one thing here.

Does it help (without the $subject patch) to move the
pm_runtime_get_sync(dev) in __device_release_driver() before the while
(device_links_busy(dev)) loop?

> > but as the link state is already set as
> > DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
> > consumer which cause a hung (it sends IOs to resume scsi_device while
> > SAS controller is suspended). Simple flow is as follows:
> >
> > device_release_driver_internal -> (supplier device)
> >     if device_links_busy ->
> >         device_links_unbind_consumers ->
> >             ...
> >             WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
> >             device_release_driver_internal (consumer device)
> >     pm_runtime_get_sync -> (consumer device)
> >         ...
> >         __rpm_callback ->
> >             rpm_get_suppliers ->
> >                 if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
> >                 ...
> >     pm_runtime_clean_up_links
> >     ...
> >
> > It should guarantee correct suspend/resume ordering between a supplier
> > device and its consumer devices (resume the supplier device before resuming
> > consumer devices, and suspend consumer devices before suspending supplier
> > device) for runtime PM, but it seems the checks in rpm_get_supplier() and
> > rpm_put_supplier() break the rule, so remove them.
> >
> > Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
> > ---
> >  drivers/base/power/runtime.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 8143210..6f605f7 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
> >                                 device_links_read_lock_held()) {
> >                 int retval;
> >
> > -               if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> > -                   READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> > +               if (!(link->flags & DL_FLAG_PM_RUNTIME))
> >                         continue;
> >
> >                 retval = pm_runtime_get_sync(link->supplier);
> > @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev)
> >
> >         list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> >                                 device_links_read_lock_held()) {
> > -               if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> > -                       continue;
> >
> >                 while (refcount_dec_not_one(&link->rpm_active))
> >                         pm_runtime_put(link->supplier);
> > --
>
> Applied as 5.10 material with some edits in the subject and changelog, thanks!

I may need to revert it, because it introduced a (theoretical for now)
race condition between PM-runtime and pm_runtime_clean_up_links().

Thanks!
chenxiang Oct. 20, 2020, 2:23 a.m. UTC | #3
Hi Rafael,

在 2020/10/20 1:03, Rafael J. Wysocki 写道:
> On Fri, Sep 25, 2020 at 5:02 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote:
>>> From: Xiang Chen <chenxiang66@hisilicon.com>
>>>
>>> To support runtime PM for hisi SAS driver (the driver is in directory
>>> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
>>> (consumer device) and hisi_hba->dev(supplier device) with flags
>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
>>> After runtime suspended consumers and supplier, unload the dirver which
>>> cause a hung. We find that it calls function device_release_driver_internal()
>>> to release supplier device hisi_hba->dev, as the device link is busy,
>>> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
>>> device_release_driver_internal() to release consumer device
>>> scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
>>> to resume consumer device, as consumer-supplier relation exists, it will try
>>> to resume supplier first,
> It looks like I have overlooked one thing here.
>
> Does it help (without the $subject patch) to move the
> pm_runtime_get_sync(dev) in __device_release_driver() before the while
> (device_links_busy(dev)) loop?

I have tested it, and it also solve the issue.


>
>>> but as the link state is already set as
>>> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
>>> consumer which cause a hung (it sends IOs to resume scsi_device while
>>> SAS controller is suspended). Simple flow is as follows:
>>>
>>> device_release_driver_internal -> (supplier device)
>>>      if device_links_busy ->
>>>          device_links_unbind_consumers ->
>>>              ...
>>>              WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
>>>              device_release_driver_internal (consumer device)
>>>      pm_runtime_get_sync -> (consumer device)
>>>          ...
>>>          __rpm_callback ->
>>>              rpm_get_suppliers ->
>>>                  if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
>>>                  ...
>>>      pm_runtime_clean_up_links
>>>      ...
>>>
>>> It should guarantee correct suspend/resume ordering between a supplier
>>> device and its consumer devices (resume the supplier device before resuming
>>> consumer devices, and suspend consumer devices before suspending supplier
>>> device) for runtime PM, but it seems the checks in rpm_get_supplier() and
>>> rpm_put_supplier() break the rule, so remove them.
>>>
>>> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
>>> ---
>>>   drivers/base/power/runtime.c | 5 +----
>>>   1 file changed, 1 insertion(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>>> index 8143210..6f605f7 100644
>>> --- a/drivers/base/power/runtime.c
>>> +++ b/drivers/base/power/runtime.c
>>> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
>>>                                  device_links_read_lock_held()) {
>>>                  int retval;
>>>
>>> -               if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
>>> -                   READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
>>> +               if (!(link->flags & DL_FLAG_PM_RUNTIME))
>>>                          continue;
>>>
>>>                  retval = pm_runtime_get_sync(link->supplier);
>>> @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev)
>>>
>>>          list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
>>>                                  device_links_read_lock_held()) {
>>> -               if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
>>> -                       continue;
>>>
>>>                  while (refcount_dec_not_one(&link->rpm_active))
>>>                          pm_runtime_put(link->supplier);
>>> --
>> Applied as 5.10 material with some edits in the subject and changelog, thanks!
> I may need to revert it, because it introduced a (theoretical for now)
> race condition between PM-runtime and pm_runtime_clean_up_links().
>
> Thanks!
>
> .
>
Rafael J. Wysocki Oct. 20, 2020, 12:45 p.m. UTC | #4
On Tue, Oct 20, 2020 at 4:24 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>
> Hi Rafael,
>
> 在 2020/10/20 1:03, Rafael J. Wysocki 写道:
> > On Fri, Sep 25, 2020 at 5:02 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote:
> >>> From: Xiang Chen <chenxiang66@hisilicon.com>
> >>>
> >>> To support runtime PM for hisi SAS driver (the driver is in directory
> >>> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> >>> (consumer device) and hisi_hba->dev(supplier device) with flags
> >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> >>> After runtime suspended consumers and supplier, unload the dirver which
> >>> cause a hung. We find that it calls function device_release_driver_internal()
> >>> to release supplier device hisi_hba->dev, as the device link is busy,
> >>> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> >>> device_release_driver_internal() to release consumer device
> >>> scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
> >>> to resume consumer device, as consumer-supplier relation exists, it will try
> >>> to resume supplier first,
> > It looks like I have overlooked one thing here.
> >
> > Does it help (without the $subject patch) to move the
> > pm_runtime_get_sync(dev) in __device_release_driver() before the while
> > (device_links_busy(dev)) loop?
>
> I have tested it, and it also solve the issue.

OK, thanks!
diff mbox series

Patch

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 8143210..6f605f7 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -291,8 +291,7 @@  static int rpm_get_suppliers(struct device *dev)
 				device_links_read_lock_held()) {
 		int retval;
 
-		if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
-		    READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
+		if (!(link->flags & DL_FLAG_PM_RUNTIME))
 			continue;
 
 		retval = pm_runtime_get_sync(link->supplier);
@@ -312,8 +311,6 @@  static void rpm_put_suppliers(struct device *dev)
 
 	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
 				device_links_read_lock_held()) {
-		if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
-			continue;
 
 		while (refcount_dec_not_one(&link->rpm_active))
 			pm_runtime_put(link->supplier);