diff mbox series

[v5,1/3] scsi: core: Add new helper to iterate all devices of host

Message ID 20240605091731.3111195-2-haowenchao22@gmail.com (mailing list archive)
State Changes Requested
Headers show
Series SCSI: Fix issues between removing device and error handle | expand

Commit Message

Hao Wenchao June 5, 2024, 9:17 a.m. UTC
shost_for_each_device() would skip devices which is in SDEV_CANCEL or
SDEV_DEL state, for some scenarios, we donot want to skip these devices,
so add a new macro shost_for_each_device_include_deleted() to handle it.

Following changes are introduced:

1. Rework scsi_device_get(), add new helper __scsi_device_get() which
   determine if skip deleted scsi_device by parameter "skip_deleted".
2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
   is used when calling __scsi_device_get()
3. Update shost_for_each_device() to call __scsi_iterate_devices() with
   "skip_deleted" true
4. Add new macro shost_for_each_device_include_deleted() which call
   __scsi_iterate_devices() with "skip_deleted" false

Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
---
 drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
 include/scsi/scsi_device.h | 25 ++++++++++++++++++---
 2 files changed, 54 insertions(+), 17 deletions(-)

Comments

Hannes Reinecke June 12, 2024, 8:33 a.m. UTC | #1
On 6/5/24 11:17, Wenchao Hao wrote:
> shost_for_each_device() would skip devices which is in SDEV_CANCEL or
> SDEV_DEL state, for some scenarios, we donot want to skip these devices,
> so add a new macro shost_for_each_device_include_deleted() to handle it.
> 
> Following changes are introduced:
> 
> 1. Rework scsi_device_get(), add new helper __scsi_device_get() which
>     determine if skip deleted scsi_device by parameter "skip_deleted".
> 2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
>     is used when calling __scsi_device_get()
> 3. Update shost_for_each_device() to call __scsi_iterate_devices() with
>     "skip_deleted" true
> 4. Add new macro shost_for_each_device_include_deleted() which call
>     __scsi_iterate_devices() with "skip_deleted" false
> 
> Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
> ---
>   drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
>   include/scsi/scsi_device.h | 25 ++++++++++++++++++---
>   2 files changed, 54 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 3e0c0381277a..5913de543d93 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -735,20 +735,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
>   	return 0;
>   }
>   
> -/**
> - * scsi_device_get  -  get an additional reference to a scsi_device
> +/*
> + * __scsi_device_get  -  get an additional reference to a scsi_device
>    * @sdev:	device to get a reference to
> - *
> - * Description: Gets a reference to the scsi_device and increments the use count
> - * of the underlying LLDD module.  You must hold host_lock of the
> - * parent Scsi_Host or already have a reference when calling this.
> - *
> - * This will fail if a device is deleted or cancelled, or when the LLD module
> - * is in the process of being unloaded.
> + * @skip_deleted: when true, would return failed if device is deleted
>    */
> -int scsi_device_get(struct scsi_device *sdev)
> +static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
>   {
> -	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
> +	/*
> +	 * if skip_deleted is true and device is in removing, return failed
> +	 */
> +	if (skip_deleted &&
> +	    (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
>   		goto fail;

Nack.
SDEV_DEL means the device is about to be deleted, so we _must not_ 
access it at all.

Cheers,

Hannes
Hao Wenchao June 12, 2024, 3:06 p.m. UTC | #2
On 6/12/24 4:33 PM, Hannes Reinecke wrote:
> On 6/5/24 11:17, Wenchao Hao wrote:
>> shost_for_each_device() would skip devices which is in SDEV_CANCEL or
>> SDEV_DEL state, for some scenarios, we donot want to skip these devices,
>> so add a new macro shost_for_each_device_include_deleted() to handle it.
>>
>> Following changes are introduced:
>>
>> 1. Rework scsi_device_get(), add new helper __scsi_device_get() which
>>     determine if skip deleted scsi_device by parameter "skip_deleted".
>> 2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
>>     is used when calling __scsi_device_get()
>> 3. Update shost_for_each_device() to call __scsi_iterate_devices() with
>>     "skip_deleted" true
>> 4. Add new macro shost_for_each_device_include_deleted() which call
>>     __scsi_iterate_devices() with "skip_deleted" false
>>
>> Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
>> ---
>>   drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
>>   include/scsi/scsi_device.h | 25 ++++++++++++++++++---
>>   2 files changed, 54 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>> index 3e0c0381277a..5913de543d93 100644
>> --- a/drivers/scsi/scsi.c
>> +++ b/drivers/scsi/scsi.c
>> @@ -735,20 +735,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
>>       return 0;
>>   }
>>   -/**
>> - * scsi_device_get  -  get an additional reference to a scsi_device
>> +/*
>> + * __scsi_device_get  -  get an additional reference to a scsi_device
>>    * @sdev:    device to get a reference to
>> - *
>> - * Description: Gets a reference to the scsi_device and increments the use count
>> - * of the underlying LLDD module.  You must hold host_lock of the
>> - * parent Scsi_Host or already have a reference when calling this.
>> - *
>> - * This will fail if a device is deleted or cancelled, or when the LLD module
>> - * is in the process of being unloaded.
>> + * @skip_deleted: when true, would return failed if device is deleted
>>    */
>> -int scsi_device_get(struct scsi_device *sdev)
>> +static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
>>   {
>> -    if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
>> +    /*
>> +     * if skip_deleted is true and device is in removing, return failed
>> +     */
>> +    if (skip_deleted &&
>> +        (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
>>           goto fail;
> 
> Nack.
> SDEV_DEL means the device is about to be deleted, so we _must not_ access it at all.
> 

Sorry I added SDEV_DEL here at hand without understanding what it means.
Actually, just include scsi_device which is in SDEV_CANCEL would fix the
issues I described.

The issues are because device removing concurrent with error handle.
Normally, error handle would not be triggered when scsi_device is in
SDEV_DEL. Below is my analysis, if it is wrong, please correct me.

If there are scsi_cmnd remain unfinished when removing scsi_device,
the removing process would waiting for all commands to be finished.
If commands error happened and trigger error handle, the removing
process would be blocked until error handle finished, because
__scsi_remove_device called  del_gendisk() which would wait all
requests to be finished. So now scsi_device is in SDEV_CANCEL.

If the scsi_device is already in SDEV_DEL, then no scsi_cmnd has been
dispatched to this scsi_device, then error handle would never triggered.

I want to change the new function __scsi_device_get() as following,
please help to review.

/*
 * __scsi_device_get  -  get an additional reference to a scsi_device
 * @sdev:	device to get a reference to
 * @skip_canceled: when true, would return failed if device is deleted
 */
static int __scsi_device_get(struct scsi_device *sdev, bool skip_canceled)
{
	/*
	 * if skip_canceled is true and device is in removing, return failed
	 */
	if (sdev->sdev_state == SDEV_DEL ||
	    (sdev->sdev_state == SDEV_CANCEL && skip_canceled))
		goto fail;
	if (!try_module_get(sdev->host->hostt->module))
		goto fail;
	if (!get_device(&sdev->sdev_gendev))
		goto fail_put_module;
	return 0;

fail_put_module:
	module_put(sdev->host->hostt->module);
fail:
	return -ENXIO;
}

> Cheers,
> 
> Hannes
Hannes Reinecke June 13, 2024, 6:27 a.m. UTC | #3
On 6/12/24 17:06, Wenchao Hao wrote:
> On 6/12/24 4:33 PM, Hannes Reinecke wrote:
>> On 6/5/24 11:17, Wenchao Hao wrote:
>>> shost_for_each_device() would skip devices which is in SDEV_CANCEL or
>>> SDEV_DEL state, for some scenarios, we donot want to skip these devices,
>>> so add a new macro shost_for_each_device_include_deleted() to handle it.
>>>
>>> Following changes are introduced:
>>>
>>> 1. Rework scsi_device_get(), add new helper __scsi_device_get() which
>>>      determine if skip deleted scsi_device by parameter "skip_deleted".
>>> 2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
>>>      is used when calling __scsi_device_get()
>>> 3. Update shost_for_each_device() to call __scsi_iterate_devices() with
>>>      "skip_deleted" true
>>> 4. Add new macro shost_for_each_device_include_deleted() which call
>>>      __scsi_iterate_devices() with "skip_deleted" false
>>>
>>> Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
>>> ---
>>>    drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
>>>    include/scsi/scsi_device.h | 25 ++++++++++++++++++---
>>>    2 files changed, 54 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>> index 3e0c0381277a..5913de543d93 100644
>>> --- a/drivers/scsi/scsi.c
>>> +++ b/drivers/scsi/scsi.c
>>> @@ -735,20 +735,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
>>>        return 0;
>>>    }
>>>    -/**
>>> - * scsi_device_get  -  get an additional reference to a scsi_device
>>> +/*
>>> + * __scsi_device_get  -  get an additional reference to a scsi_device
>>>     * @sdev:    device to get a reference to
>>> - *
>>> - * Description: Gets a reference to the scsi_device and increments the use count
>>> - * of the underlying LLDD module.  You must hold host_lock of the
>>> - * parent Scsi_Host or already have a reference when calling this.
>>> - *
>>> - * This will fail if a device is deleted or cancelled, or when the LLD module
>>> - * is in the process of being unloaded.
>>> + * @skip_deleted: when true, would return failed if device is deleted
>>>     */
>>> -int scsi_device_get(struct scsi_device *sdev)
>>> +static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
>>>    {
>>> -    if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
>>> +    /*
>>> +     * if skip_deleted is true and device is in removing, return failed
>>> +     */
>>> +    if (skip_deleted &&
>>> +        (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
>>>            goto fail;
>>
>> Nack.
>> SDEV_DEL means the device is about to be deleted, so we _must not_ access it at all.
>>
> 
> Sorry I added SDEV_DEL here at hand without understanding what it means.
> Actually, just include scsi_device which is in SDEV_CANCEL would fix the
> issues I described.
> 
> The issues are because device removing concurrent with error handle.
> Normally, error handle would not be triggered when scsi_device is in
> SDEV_DEL. Below is my analysis, if it is wrong, please correct me.
> 
> If there are scsi_cmnd remain unfinished when removing scsi_device,
> the removing process would waiting for all commands to be finished.
> If commands error happened and trigger error handle, the removing
> process would be blocked until error handle finished, because
> __scsi_remove_device called  del_gendisk() which would wait all
> requests to be finished. So now scsi_device is in SDEV_CANCEL.
> 
> If the scsi_device is already in SDEV_DEL, then no scsi_cmnd has been
> dispatched to this scsi_device, then error handle would never triggered.
> 
> I want to change the new function __scsi_device_get() as following,
> please help to review.
> 
> /*
>   * __scsi_device_get  -  get an additional reference to a scsi_device
>   * @sdev:	device to get a reference to
>   * @skip_canceled: when true, would return failed if device is deleted
>   */
> static int __scsi_device_get(struct scsi_device *sdev, bool skip_canceled)
> {
> 	/*
> 	 * if skip_canceled is true and device is in removing, return failed
> 	 */
> 	if (sdev->sdev_state == SDEV_DEL ||
> 	    (sdev->sdev_state == SDEV_CANCEL && skip_canceled))
> 		goto fail;
> 	if (!try_module_get(sdev->host->hostt->module))
> 		goto fail;
> 	if (!get_device(&sdev->sdev_gendev))
> 		goto fail_put_module;
> 	return 0;
> 
> fail_put_module:
> 	module_put(sdev->host->hostt->module);
> fail:
> 	return -ENXIO;
> }
> 
I don't think that's required.
With your above analysis, wouldn't the problem be solved with:

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 775df00021e4..911fcfa80d69 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1470,6 +1470,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
         if (sdev->sdev_state == SDEV_DEL)
                 return;

+       scsi_block_when_processing_errors(sdev);
+
         if (sdev->is_visible) {
                 /*
                  * If scsi_internal_target_block() is running concurrently,

Hmm?

Cheers,

Hannes
Hao Wenchao June 13, 2024, 7:10 a.m. UTC | #4
On 2024/6/13 14:27, Hannes Reinecke wrote:
> On 6/12/24 17:06, Wenchao Hao wrote:
>> On 6/12/24 4:33 PM, Hannes Reinecke wrote:
>>> On 6/5/24 11:17, Wenchao Hao wrote:
>>>> shost_for_each_device() would skip devices which is in SDEV_CANCEL or
>>>> SDEV_DEL state, for some scenarios, we donot want to skip these 
>>>> devices,
>>>> so add a new macro shost_for_each_device_include_deleted() to handle 
>>>> it.
>>>>
>>>> Following changes are introduced:
>>>>
>>>> 1. Rework scsi_device_get(), add new helper __scsi_device_get() which
>>>>      determine if skip deleted scsi_device by parameter "skip_deleted".
>>>> 2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
>>>>      is used when calling __scsi_device_get()
>>>> 3. Update shost_for_each_device() to call __scsi_iterate_devices() with
>>>>      "skip_deleted" true
>>>> 4. Add new macro shost_for_each_device_include_deleted() which call
>>>>      __scsi_iterate_devices() with "skip_deleted" false
>>>>
>>>> Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
>>>> ---
>>>>    drivers/scsi/scsi.c        | 46 
>>>> ++++++++++++++++++++++++++------------
>>>>    include/scsi/scsi_device.h | 25 ++++++++++++++++++---
>>>>    2 files changed, 54 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>>> index 3e0c0381277a..5913de543d93 100644
>>>> --- a/drivers/scsi/scsi.c
>>>> +++ b/drivers/scsi/scsi.c
>>>> @@ -735,20 +735,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, 
>>>> bool enable)
>>>>        return 0;
>>>>    }
>>>>    -/**
>>>> - * scsi_device_get  -  get an additional reference to a scsi_device
>>>> +/*
>>>> + * __scsi_device_get  -  get an additional reference to a scsi_device
>>>>     * @sdev:    device to get a reference to
>>>> - *
>>>> - * Description: Gets a reference to the scsi_device and increments 
>>>> the use count
>>>> - * of the underlying LLDD module.  You must hold host_lock of the
>>>> - * parent Scsi_Host or already have a reference when calling this.
>>>> - *
>>>> - * This will fail if a device is deleted or cancelled, or when the 
>>>> LLD module
>>>> - * is in the process of being unloaded.
>>>> + * @skip_deleted: when true, would return failed if device is deleted
>>>>     */
>>>> -int scsi_device_get(struct scsi_device *sdev)
>>>> +static int __scsi_device_get(struct scsi_device *sdev, bool 
>>>> skip_deleted)
>>>>    {
>>>> -    if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == 
>>>> SDEV_CANCEL)
>>>> +    /*
>>>> +     * if skip_deleted is true and device is in removing, return 
>>>> failed
>>>> +     */
>>>> +    if (skip_deleted &&
>>>> +        (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == 
>>>> SDEV_CANCEL))
>>>>            goto fail;
>>>
>>> Nack.
>>> SDEV_DEL means the device is about to be deleted, so we _must not_ 
>>> access it at all.
>>>
>>
>> Sorry I added SDEV_DEL here at hand without understanding what it means.
>> Actually, just include scsi_device which is in SDEV_CANCEL would fix the
>> issues I described.
>>
>> The issues are because device removing concurrent with error handle.
>> Normally, error handle would not be triggered when scsi_device is in
>> SDEV_DEL. Below is my analysis, if it is wrong, please correct me.
>>
>> If there are scsi_cmnd remain unfinished when removing scsi_device,
>> the removing process would waiting for all commands to be finished.
>> If commands error happened and trigger error handle, the removing
>> process would be blocked until error handle finished, because
>> __scsi_remove_device called  del_gendisk() which would wait all
>> requests to be finished. So now scsi_device is in SDEV_CANCEL.
>>
>> If the scsi_device is already in SDEV_DEL, then no scsi_cmnd has been
>> dispatched to this scsi_device, then error handle would never triggered.
>>
>> I want to change the new function __scsi_device_get() as following,
>> please help to review.
>>
>> /*
>>   * __scsi_device_get  -  get an additional reference to a scsi_device
>>   * @sdev:    device to get a reference to
>>   * @skip_canceled: when true, would return failed if device is deleted
>>   */
>> static int __scsi_device_get(struct scsi_device *sdev, bool 
>> skip_canceled)
>> {
>>     /*
>>      * if skip_canceled is true and device is in removing, return failed
>>      */
>>     if (sdev->sdev_state == SDEV_DEL ||
>>         (sdev->sdev_state == SDEV_CANCEL && skip_canceled))
>>         goto fail;
>>     if (!try_module_get(sdev->host->hostt->module))
>>         goto fail;
>>     if (!get_device(&sdev->sdev_gendev))
>>         goto fail_put_module;
>>     return 0;
>>
>> fail_put_module:
>>     module_put(sdev->host->hostt->module);
>> fail:
>>     return -ENXIO;
>> }
>>
> I don't think that's required.
> With your above analysis, wouldn't the problem be solved with:
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 775df00021e4..911fcfa80d69 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1470,6 +1470,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>          if (sdev->sdev_state == SDEV_DEL)
>                  return;
> 
> +       scsi_block_when_processing_errors(sdev);
> +
>          if (sdev->is_visible) {
>                  /*
>                   * If scsi_internal_target_block() is running 
> concurrently,
> 
> Hmm?
> 

We can not make sure no scsi_cmnd remain unfinished after 
scsi_block_when_processing_errors(). For example, there is a
command has been dispatched but it's not timeouted when removing
device, no error happened. After scsi_device is set to SDEV_CANCEL,
the removing process would be blocked by del_gendisk() because there
is still a request.

Then the request timeout and abort failed, error handle would be 
triggered, now scsi_device is SDEV_CANCEL.

The error handle would skip this scsi_device when doing device reset.

> Cheers,
> 
> Hannes
Hao Wenchao July 12, 2024, 2:20 a.m. UTC | #5
On 2024/6/13 15:10, Wenchao Hao wrote:
> On 2024/6/13 14:27, Hannes Reinecke wrote:
>> On 6/12/24 17:06, Wenchao Hao wrote:
>>> On 6/12/24 4:33 PM, Hannes Reinecke wrote:
>>>> On 6/5/24 11:17, Wenchao Hao wrote:
>>>>> shost_for_each_device() would skip devices which is in SDEV_CANCEL or
>>>>> SDEV_DEL state, for some scenarios, we donot want to skip these devices,
>>>>> so add a new macro shost_for_each_device_include_deleted() to handle it.
>>>>>
>>>>> Following changes are introduced:
>>>>>
>>>>> 1. Rework scsi_device_get(), add new helper __scsi_device_get() which
>>>>>      determine if skip deleted scsi_device by parameter "skip_deleted".
>>>>> 2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
>>>>>      is used when calling __scsi_device_get()
>>>>> 3. Update shost_for_each_device() to call __scsi_iterate_devices() with
>>>>>      "skip_deleted" true
>>>>> 4. Add new macro shost_for_each_device_include_deleted() which call
>>>>>      __scsi_iterate_devices() with "skip_deleted" false
>>>>>
>>>>> Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
>>>>> ---
>>>>>    drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
>>>>>    include/scsi/scsi_device.h | 25 ++++++++++++++++++---
>>>>>    2 files changed, 54 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>>>> index 3e0c0381277a..5913de543d93 100644
>>>>> --- a/drivers/scsi/scsi.c
>>>>> +++ b/drivers/scsi/scsi.c
>>>>> @@ -735,20 +735,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
>>>>>        return 0;
>>>>>    }
>>>>>    -/**
>>>>> - * scsi_device_get  -  get an additional reference to a scsi_device
>>>>> +/*
>>>>> + * __scsi_device_get  -  get an additional reference to a scsi_device
>>>>>     * @sdev:    device to get a reference to
>>>>> - *
>>>>> - * Description: Gets a reference to the scsi_device and increments the use count
>>>>> - * of the underlying LLDD module.  You must hold host_lock of the
>>>>> - * parent Scsi_Host or already have a reference when calling this.
>>>>> - *
>>>>> - * This will fail if a device is deleted or cancelled, or when the LLD module
>>>>> - * is in the process of being unloaded.
>>>>> + * @skip_deleted: when true, would return failed if device is deleted
>>>>>     */
>>>>> -int scsi_device_get(struct scsi_device *sdev)
>>>>> +static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
>>>>>    {
>>>>> -    if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
>>>>> +    /*
>>>>> +     * if skip_deleted is true and device is in removing, return failed
>>>>> +     */
>>>>> +    if (skip_deleted &&
>>>>> +        (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
>>>>>            goto fail;
>>>>
>>>> Nack.
>>>> SDEV_DEL means the device is about to be deleted, so we _must not_ access it at all.
>>>>
>>>
>>> Sorry I added SDEV_DEL here at hand without understanding what it means.
>>> Actually, just include scsi_device which is in SDEV_CANCEL would fix the
>>> issues I described.
>>>
>>> The issues are because device removing concurrent with error handle.
>>> Normally, error handle would not be triggered when scsi_device is in
>>> SDEV_DEL. Below is my analysis, if it is wrong, please correct me.
>>>
>>> If there are scsi_cmnd remain unfinished when removing scsi_device,
>>> the removing process would waiting for all commands to be finished.
>>> If commands error happened and trigger error handle, the removing
>>> process would be blocked until error handle finished, because
>>> __scsi_remove_device called  del_gendisk() which would wait all
>>> requests to be finished. So now scsi_device is in SDEV_CANCEL.
>>>
>>> If the scsi_device is already in SDEV_DEL, then no scsi_cmnd has been
>>> dispatched to this scsi_device, then error handle would never triggered.
>>>
>>> I want to change the new function __scsi_device_get() as following,
>>> please help to review.
>>>
>>> /*
>>>   * __scsi_device_get  -  get an additional reference to a scsi_device
>>>   * @sdev:    device to get a reference to
>>>   * @skip_canceled: when true, would return failed if device is deleted
>>>   */
>>> static int __scsi_device_get(struct scsi_device *sdev, bool skip_canceled)
>>> {
>>>     /*
>>>      * if skip_canceled is true and device is in removing, return failed
>>>      */
>>>     if (sdev->sdev_state == SDEV_DEL ||
>>>         (sdev->sdev_state == SDEV_CANCEL && skip_canceled))
>>>         goto fail;
>>>     if (!try_module_get(sdev->host->hostt->module))
>>>         goto fail;
>>>     if (!get_device(&sdev->sdev_gendev))
>>>         goto fail_put_module;
>>>     return 0;
>>>
>>> fail_put_module:
>>>     module_put(sdev->host->hostt->module);
>>> fail:
>>>     return -ENXIO;
>>> }
>>>
>> I don't think that's required.
>> With your above analysis, wouldn't the problem be solved with:
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 775df00021e4..911fcfa80d69 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -1470,6 +1470,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>          if (sdev->sdev_state == SDEV_DEL)
>>                  return;
>>
>> +       scsi_block_when_processing_errors(sdev);
>> +
>>          if (sdev->is_visible) {
>>                  /*
>>                   * If scsi_internal_target_block() is running concurrently,
>>
>> Hmm?
>>
> 
> We can not make sure no scsi_cmnd remain unfinished after scsi_block_when_processing_errors(). For example, there is a
> command has been dispatched but it's not timeouted when removing
> device, no error happened. After scsi_device is set to SDEV_CANCEL,
> the removing process would be blocked by del_gendisk() because there
> is still a request.
> 
> Then the request timeout and abort failed, error handle would be triggered, now scsi_device is SDEV_CANCEL.
> 
> The error handle would skip this scsi_device when doing device reset.
> 
>> Cheers,
>>
>> Hannes
> 
Hi Hannes,

Would you review these patches? Or do how do you suggest to fix the issues?

thanks.
diff mbox series

Patch

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 3e0c0381277a..5913de543d93 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -735,20 +735,18 @@  int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
 	return 0;
 }
 
-/**
- * scsi_device_get  -  get an additional reference to a scsi_device
+/*
+ * __scsi_device_get  -  get an additional reference to a scsi_device
  * @sdev:	device to get a reference to
- *
- * Description: Gets a reference to the scsi_device and increments the use count
- * of the underlying LLDD module.  You must hold host_lock of the
- * parent Scsi_Host or already have a reference when calling this.
- *
- * This will fail if a device is deleted or cancelled, or when the LLD module
- * is in the process of being unloaded.
+ * @skip_deleted: when true, would return failed if device is deleted
  */
-int scsi_device_get(struct scsi_device *sdev)
+static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
 {
-	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
+	/*
+	 * if skip_deleted is true and device is in removing, return failed
+	 */
+	if (skip_deleted &&
+	    (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
 		goto fail;
 	if (!try_module_get(sdev->host->hostt->module))
 		goto fail;
@@ -761,6 +759,22 @@  int scsi_device_get(struct scsi_device *sdev)
 fail:
 	return -ENXIO;
 }
+
+/**
+ * scsi_device_get  -  get an additional reference to a scsi_device
+ * @sdev:	device to get a reference to
+ *
+ * Description: Gets a reference to the scsi_device and increments the use count
+ * of the underlying LLDD module.  You must hold host_lock of the
+ * parent Scsi_Host or already have a reference when calling this.
+ *
+ * This will fail if a device is deleted or cancelled, or when the LLD module
+ * is in the process of being unloaded.
+ */
+int scsi_device_get(struct scsi_device *sdev)
+{
+	return __scsi_device_get(sdev, 0);
+}
 EXPORT_SYMBOL(scsi_device_get);
 
 /**
@@ -780,9 +794,13 @@  void scsi_device_put(struct scsi_device *sdev)
 }
 EXPORT_SYMBOL(scsi_device_put);
 
-/* helper for shost_for_each_device, see that for documentation */
+/**
+ * helper for shost_for_each_device, see that for documentation
+ * @skip_deleted: if true, sdev in progress of removing would be skipped
+ */
 struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
-					   struct scsi_device *prev)
+					   struct scsi_device *prev,
+					   bool skip_deleted)
 {
 	struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
 	struct scsi_device *next = NULL;
@@ -792,7 +810,7 @@  struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 	while (list->next != &shost->__devices) {
 		next = list_entry(list->next, struct scsi_device, siblings);
 		/* skip devices that we can't get a reference to */
-		if (!scsi_device_get(next))
+		if (!__scsi_device_get(next, skip_deleted))
 			break;
 		next = NULL;
 		list = list->next;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 9c540f5468eb..5cb294ff0a41 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -412,7 +412,8 @@  extern void __starget_for_each_device(struct scsi_target *, void *,
 
 /* only exposed to implement shost_for_each_device */
 extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
-						  struct scsi_device *);
+						  struct scsi_device *,
+						  bool);
 
 /**
  * shost_for_each_device - iterate over all devices of a host
@@ -422,11 +423,29 @@  extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
  * Iterator that returns each device attached to @shost.  This loop
  * takes a reference on each device and releases it at the end.  If
  * you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro would skip sdev which is in progress of removing
  */
 #define shost_for_each_device(sdev, shost) \
-	for ((sdev) = __scsi_iterate_devices((shost), NULL); \
+	for ((sdev) = __scsi_iterate_devices((shost), NULL, 1); \
+	     (sdev); \
+	     (sdev) = __scsi_iterate_devices((shost), (sdev), 1))
+
+/*
+ * shost_for_each_device_include_deleted- iterate over all devices of a host
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @shost: the &struct scsi_host to iterate over
+ *
+ * Iterator that returns each device attached to @shost.  This loop
+ * takes a reference on each device and releases it at the end.  If
+ * you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro would include sdev which is in progress of removing
+ */
+#define shost_for_each_device_include_deleted(sdev, shost) \
+	for ((sdev) = __scsi_iterate_devices((shost), NULL, 0); \
 	     (sdev); \
-	     (sdev) = __scsi_iterate_devices((shost), (sdev)))
+	     (sdev) = __scsi_iterate_devices((shost), (sdev), 0))
 
 /**
  * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)