diff mbox series

[RFC] PM:runtime:Remove the link state check in function rpm_get_supplier()

Message ID 1596088129-88814-1-git-send-email-chenxiang66@hisilicon.com (mailing list archive)
State RFC, archived
Headers show
Series [RFC] PM:runtime:Remove the link state check in function rpm_get_supplier() | expand

Commit Message

chenxiang July 30, 2020, 5:48 a.m. UTC
From: Xiang Chen <chenxiang66@hisilicon.com>

To support runtime PM for hisi SAS driver (the dirver is in directory
drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
(consumer device) and hisi_hba->dev(supplier device) with flags
DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
After runtime suspended consumers and supplier, unload the dirver which
cause a hung. We find that it calls function device_release_driver_internal()
to release supplier device hisi_hba->dev, as the device link is busy, 
it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
device_release_driver_internal() to release consumer device
scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync()
to resume consumer device, as consumer-supplier relation exists, it will try
to resume supplier first, but as the link state is already set as 
DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume 
consumer which cause a hung (it sends IOs to resume scsi_device while 
SAS controller is suspended). Simple flow is as follows:

device_release_driver_internal -> (supplier device)
    if device_links_busy ->
	    device_links_unbind_consumers ->
		...
		WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
		device_release_driver_internal (consumer device)
    pm_runtime_get_sync -> (consumer device)
	...
	__rpm_callback ->
	    rpm_get_suppliers ->
		if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
		...
    pm_runtime_clean_up_links
    ...

It should guarantee correct suspend/resume ordering between a supplier
device and its consumer devices (resume the supplier device before resuming
consumer devices, and suspend consumer devices before suspending supplier
device) for runtime PM, but it seems the check in rpm_get_supplier() breaks 
the rule, so remove it.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/base/power/runtime.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

chenxiang Sept. 21, 2020, 7:11 a.m. UTC | #1
在 2020/7/30 13:48, chenxiang 写道:
> From: Xiang Chen <chenxiang66@hisilicon.com>
>
> To support runtime PM for hisi SAS driver (the dirver is in directory
> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> (consumer device) and hisi_hba->dev(supplier device) with flags
> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> After runtime suspended consumers and supplier, unload the dirver which
> cause a hung. We find that it calls function device_release_driver_internal()
> to release supplier device hisi_hba->dev, as the device link is busy,
> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> device_release_driver_internal() to release consumer device
> scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync()
> to resume consumer device, as consumer-supplier relation exists, it will try
> to resume supplier first, but as the link state is already set as
> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
> consumer which cause a hung (it sends IOs to resume scsi_device while
> SAS controller is suspended). Simple flow is as follows:
>
> device_release_driver_internal -> (supplier device)
>      if device_links_busy ->
> 	    device_links_unbind_consumers ->
> 		...
> 		WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
> 		device_release_driver_internal (consumer device)
>      pm_runtime_get_sync -> (consumer device)
> 	...
> 	__rpm_callback ->
> 	    rpm_get_suppliers ->
> 		if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
> 		...
>      pm_runtime_clean_up_links
>      ...

Hi Rafael, do you have any idea about this issue?

> It should guarantee correct suspend/resume ordering between a supplier
> device and its consumer devices (resume the supplier device before resuming
> consumer devices, and suspend consumer devices before suspending supplier
> device) for runtime PM, but it seems the check in rpm_get_supplier() breaks
> the rule, so remove it.
>
> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
> ---
>   drivers/base/power/runtime.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 9f62790..a8edd92 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
>   				device_links_read_lock_held()) {
>   		int retval;
>   
> -		if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> -		    READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> +		if (!(link->flags & DL_FLAG_PM_RUNTIME))
>   			continue;
>   
>   		retval = pm_runtime_get_sync(link->supplier);
Rafael J. Wysocki Sept. 21, 2020, 4:02 p.m. UTC | #2
On Mon, Sep 21, 2020 at 9:12 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>
>
>
> 在 2020/7/30 13:48, chenxiang 写道:
> > From: Xiang Chen <chenxiang66@hisilicon.com>
> >
> > To support runtime PM for hisi SAS driver (the dirver is in directory
> > drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> > (consumer device) and hisi_hba->dev(supplier device) with flags
> > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> > After runtime suspended consumers and supplier, unload the dirver which
> > cause a hung. We find that it calls function device_release_driver_internal()
> > to release supplier device hisi_hba->dev, as the device link is busy,
> > it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> > device_release_driver_internal() to release consumer device
> > scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync()
> > to resume consumer device, as consumer-supplier relation exists, it will try
> > to resume supplier first, but as the link state is already set as
> > DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
> > consumer which cause a hung (it sends IOs to resume scsi_device while
> > SAS controller is suspended). Simple flow is as follows:
> >
> > device_release_driver_internal -> (supplier device)
> >      if device_links_busy ->
> >           device_links_unbind_consumers ->
> >               ...
> >               WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
> >               device_release_driver_internal (consumer device)
> >      pm_runtime_get_sync -> (consumer device)
> >       ...
> >       __rpm_callback ->
> >           rpm_get_suppliers ->
> >               if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
> >               ...
> >      pm_runtime_clean_up_links
> >      ...
>
> Hi Rafael, do you have any idea about this issue?
>
> > It should guarantee correct suspend/resume ordering between a supplier
> > device and its consumer devices (resume the supplier device before resuming
> > consumer devices, and suspend consumer devices before suspending supplier
> > device) for runtime PM, but it seems the check in rpm_get_supplier() breaks
> > the rule, so remove it.
> >
> > Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
> > ---
> >   drivers/base/power/runtime.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 9f62790..a8edd92 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
> >                               device_links_read_lock_held()) {
> >               int retval;
> >
> > -             if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> > -                 READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> > +             if (!(link->flags & DL_FLAG_PM_RUNTIME))

AFAICS you need to make the analogous change in rpm_put_suppliers(),
but apart from this it should be fine.

Thanks!

> >                       continue;
> >
> >               retval = pm_runtime_get_sync(link->supplier);
>
>
chenxiang Sept. 22, 2020, 9:32 a.m. UTC | #3
在 2020/9/22 0:02, Rafael J. Wysocki 写道:
> On Mon, Sep 21, 2020 at 9:12 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>>
>>
>> 在 2020/7/30 13:48, chenxiang 写道:
>>> From: Xiang Chen <chenxiang66@hisilicon.com>
>>>
>>> To support runtime PM for hisi SAS driver (the dirver is in directory
>>> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
>>> (consumer device) and hisi_hba->dev(supplier device) with flags
>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
>>> After runtime suspended consumers and supplier, unload the dirver which
>>> cause a hung. We find that it calls function device_release_driver_internal()
>>> to release supplier device hisi_hba->dev, as the device link is busy,
>>> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
>>> device_release_driver_internal() to release consumer device
>>> scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync()
>>> to resume consumer device, as consumer-supplier relation exists, it will try
>>> to resume supplier first, but as the link state is already set as
>>> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
>>> consumer which cause a hung (it sends IOs to resume scsi_device while
>>> SAS controller is suspended). Simple flow is as follows:
>>>
>>> device_release_driver_internal -> (supplier device)
>>>       if device_links_busy ->
>>>            device_links_unbind_consumers ->
>>>                ...
>>>                WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
>>>                device_release_driver_internal (consumer device)
>>>       pm_runtime_get_sync -> (consumer device)
>>>        ...
>>>        __rpm_callback ->
>>>            rpm_get_suppliers ->
>>>                if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
>>>                ...
>>>       pm_runtime_clean_up_links
>>>       ...
>> Hi Rafael, do you have any idea about this issue?
>>
>>> It should guarantee correct suspend/resume ordering between a supplier
>>> device and its consumer devices (resume the supplier device before resuming
>>> consumer devices, and suspend consumer devices before suspending supplier
>>> device) for runtime PM, but it seems the check in rpm_get_supplier() breaks
>>> the rule, so remove it.
>>>
>>> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
>>> ---
>>>    drivers/base/power/runtime.c | 3 +--
>>>    1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>>> index 9f62790..a8edd92 100644
>>> --- a/drivers/base/power/runtime.c
>>> +++ b/drivers/base/power/runtime.c
>>> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
>>>                                device_links_read_lock_held()) {
>>>                int retval;
>>>
>>> -             if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
>>> -                 READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
>>> +             if (!(link->flags & DL_FLAG_PM_RUNTIME))
> AFAICS you need to make the analogous change in rpm_put_suppliers(),
> but apart from this it should be fine.

Thanks, i will add the change in next version.

>
> Thanks!
>
>>>                        continue;
>>>
>>>                retval = pm_runtime_get_sync(link->supplier);
>>
> .
>
diff mbox series

Patch

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 9f62790..a8edd92 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -291,8 +291,7 @@  static int rpm_get_suppliers(struct device *dev)
 				device_links_read_lock_held()) {
 		int retval;
 
-		if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
-		    READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
+		if (!(link->flags & DL_FLAG_PM_RUNTIME))
 			continue;
 
 		retval = pm_runtime_get_sync(link->supplier);