diff mbox series

[1/1] scsi: fix hang when device state is set via sysfs

Message ID 20211006043117.11121-1-michael.christie@oracle.com (mailing list archive)
State Changes Requested
Headers show
Series [1/1] scsi: fix hang when device state is set via sysfs | expand

Commit Message

Mike Christie Oct. 6, 2021, 4:31 a.m. UTC
This fixes a regression added with:

commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
offlinining device")

The problem is that after iSCSI recovery, iscsid will call into the kernel
to set the dev's state to running, and with that patch we now call
scsi_rescan_device with the state_mutex held. If the scsi error handler
thread is just starting to test the device in scsi_send_eh_cmnd then it's
going to try to grab the state_mutex.

We are then stuck, because when scsi_rescan_device tries to send its IO
scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
will return true (the host state is still in recovery) and IO will just be
requeued. scsi_send_eh_cmnd will then never be able to grab the
state_mutex to finish error handling.

This just moves the scsi_rescan_device call to after we drop the
state_mutex.

Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
offlinining device")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_sysfs.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Mike Christie Oct. 6, 2021, 4:45 a.m. UTC | #1
Cc'ing lee.

On 10/5/21 11:31 PM, Mike Christie wrote:
> This fixes a regression added with:
> 
> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
> offlinining device")
> 
> The problem is that after iSCSI recovery, iscsid will call into the kernel
> to set the dev's state to running, and with that patch we now call
> scsi_rescan_device with the state_mutex held. If the scsi error handler
> thread is just starting to test the device in scsi_send_eh_cmnd then it's
> going to try to grab the state_mutex.
> 
> We are then stuck, because when scsi_rescan_device tries to send its IO
> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
> will return true (the host state is still in recovery) and IO will just be
> requeued. scsi_send_eh_cmnd will then never be able to grab the
> state_mutex to finish error handling.
> 
> This just moves the scsi_rescan_device call to after we drop the
> state_mutex.


I want to maybe nak my own patch. There is still a problem where if one
of the rescan IOs hits an issue then userspace is stuck waiting for
however long it takes to perform recovery. For iscsid, this will cause
problems because it sets the device state from its main thread. So
while scsi_rescan_device is hung then iscsid can't do anything for
any session.

I think we either want to:

1. Do the patch below, but Lee will need to change iscsid so it sets
the dev state from a worker thread.

2. Have the kernel kick off the rescan from a workqueue. This seems
easiest but I'm not sure if it will cause issues for lijinlin's use
case.


> 
> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
> offlinining device")
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/scsi_sysfs.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 86793259e541..5b63407c3a3f 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -788,6 +788,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>  	int i, ret;
>  	struct scsi_device *sdev = to_scsi_device(dev);
>  	enum scsi_device_state state = 0;
> +	bool rescan_dev = false;
>  
>  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
>  		const int len = strlen(sdev_states[i].name);
> @@ -817,10 +818,13 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>  	 */
>  	if (ret == 0 && state == SDEV_RUNNING) {
>  		blk_mq_run_hw_queues(sdev->request_queue, true);
> -		scsi_rescan_device(dev);
> +		rescan_dev = true;
>  	}
>  	mutex_unlock(&sdev->state_mutex);
>  
> +	if (rescan_dev)
> +		scsi_rescan_device(dev);
> +
>  	return ret == 0 ? count : -EINVAL;
>  }
>  
>
Lee Duncan Oct. 6, 2021, 4:59 p.m. UTC | #2
On 10/5/21 9:45 PM, Mike Christie wrote:
> Cc'ing lee.
> 
> On 10/5/21 11:31 PM, Mike Christie wrote:
>> This fixes a regression added with:
>>
>> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>>
>> The problem is that after iSCSI recovery, iscsid will call into the kernel
>> to set the dev's state to running, and with that patch we now call
>> scsi_rescan_device with the state_mutex held. If the scsi error handler
>> thread is just starting to test the device in scsi_send_eh_cmnd then it's
>> going to try to grab the state_mutex.
>>
>> We are then stuck, because when scsi_rescan_device tries to send its IO
>> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
>> will return true (the host state is still in recovery) and IO will just be
>> requeued. scsi_send_eh_cmnd will then never be able to grab the
>> state_mutex to finish error handling.
>>
>> This just moves the scsi_rescan_device call to after we drop the
>> state_mutex.
> 
> 
> I want to maybe nak my own patch. There is still a problem where if one
> of the rescan IOs hits an issue then userspace is stuck waiting for
> however long it takes to perform recovery. For iscsid, this will cause
> problems because it sets the device state from its main thread. So
> while scsi_rescan_device is hung then iscsid can't do anything for
> any session.
> 
> I think we either want to:
> 
> 1. Do the patch below, but Lee will need to change iscsid so it sets
> the dev state from a worker thread.
> 
> 2. Have the kernel kick off the rescan from a workqueue. This seems
> easiest but I'm not sure if it will cause issues for lijinlin's use
> case.

I vote for #2, if possible.

> 
> 
>>
>> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>  drivers/scsi/scsi_sysfs.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 86793259e541..5b63407c3a3f 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -788,6 +788,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	int i, ret;
>>  	struct scsi_device *sdev = to_scsi_device(dev);
>>  	enum scsi_device_state state = 0;
>> +	bool rescan_dev = false;
>>  
>>  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
>>  		const int len = strlen(sdev_states[i].name);
>> @@ -817,10 +818,13 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	 */
>>  	if (ret == 0 && state == SDEV_RUNNING) {
>>  		blk_mq_run_hw_queues(sdev->request_queue, true);
>> -		scsi_rescan_device(dev);
>> +		rescan_dev = true;
>>  	}
>>  	mutex_unlock(&sdev->state_mutex);
>>  
>> +	if (rescan_dev)
>> +		scsi_rescan_device(dev);
>> +
>>  	return ret == 0 ? count : -EINVAL;
>>  }
>>  
>>
>
Mike Christie Oct. 12, 2021, 3:50 p.m. UTC | #3
On 10/5/21 11:45 PM, Mike Christie wrote:
> Cc'ing lee.
> 
> On 10/5/21 11:31 PM, Mike Christie wrote:
>> This fixes a regression added with:
>>
>> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>>
>> The problem is that after iSCSI recovery, iscsid will call into the kernel
>> to set the dev's state to running, and with that patch we now call
>> scsi_rescan_device with the state_mutex held. If the scsi error handler
>> thread is just starting to test the device in scsi_send_eh_cmnd then it's
>> going to try to grab the state_mutex.
>>
>> We are then stuck, because when scsi_rescan_device tries to send its IO
>> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
>> will return true (the host state is still in recovery) and IO will just be
>> requeued. scsi_send_eh_cmnd will then never be able to grab the
>> state_mutex to finish error handling.
>>
>> This just moves the scsi_rescan_device call to after we drop the
>> state_mutex.
> 
> 
> I want to maybe nak my own patch. There is still a problem where if one
> of the rescan IOs hits an issue then userspace is stuck waiting for
> however long it takes to perform recovery. For iscsid, this will cause
> problems because it sets the device state from its main thread. So
> while scsi_rescan_device is hung then iscsid can't do anything for
> any session.
> 
> I think we either want to:
> 
> 1. Do the patch below, but Lee will need to change iscsid so it sets
> the dev state from a worker thread.
> 
> 2. Have the kernel kick off the rescan from a workqueue. This seems
> easiest but I'm not sure if it will cause issues for lijinlin's use
> case.

I have not heard from huawei, but I don't think we can do 2. The problem
is that I think userspace will not assume once the write returns that the
device is ready to go. So we can't:

1. just kick off an async rescan, then return. There are issues with this
assumption though. See below.

2.
	2.A kick off rescan
	2.B wait for rescan to complete or device/host to change state

this could hang iscsid until the scsi cmd timer fires or until the
transport timer fires.


I think the options are:

1. Revert the patch. The problem is that the rescan can still fail,
and if it didn't cause a deadlock due to the bug in this thread, the
device could go back to offline, but we would still return success.

Why didn't userspace just rescan from sysfs?

2. Do the patch below so we at least don't deadlock and fix iscsid. Maybe
iscsid was a little too smart for its own good, and should not have assumed
that writing to the state file could not block for long periods.

3. ?


> 
> 
>>
>> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>  drivers/scsi/scsi_sysfs.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 86793259e541..5b63407c3a3f 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -788,6 +788,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	int i, ret;
>>  	struct scsi_device *sdev = to_scsi_device(dev);
>>  	enum scsi_device_state state = 0;
>> +	bool rescan_dev = false;
>>  
>>  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
>>  		const int len = strlen(sdev_states[i].name);
>> @@ -817,10 +818,13 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	 */
>>  	if (ret == 0 && state == SDEV_RUNNING) {
>>  		blk_mq_run_hw_queues(sdev->request_queue, true);
>> -		scsi_rescan_device(dev);
>> +		rescan_dev = true;
>>  	}
>>  	mutex_unlock(&sdev->state_mutex);
>>  
>> +	if (rescan_dev)
>> +		scsi_rescan_device(dev);
>> +
>>  	return ret == 0 ? count : -EINVAL;
>>  }
>>  
>>
>
Mike Christie Oct. 12, 2021, 3:52 p.m. UTC | #4
On 10/12/21 10:50 AM, Mike Christie wrote:
> On 10/5/21 11:45 PM, Mike Christie wrote:
>> Cc'ing lee.
>>
>> On 10/5/21 11:31 PM, Mike Christie wrote:
>>> This fixes a regression added with:
>>>
>>> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>>> offlinining device")
>>>
>>> The problem is that after iSCSI recovery, iscsid will call into the kernel
>>> to set the dev's state to running, and with that patch we now call
>>> scsi_rescan_device with the state_mutex held. If the scsi error handler
>>> thread is just starting to test the device in scsi_send_eh_cmnd then it's
>>> going to try to grab the state_mutex.
>>>
>>> We are then stuck, because when scsi_rescan_device tries to send its IO
>>> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
>>> will return true (the host state is still in recovery) and IO will just be
>>> requeued. scsi_send_eh_cmnd will then never be able to grab the
>>> state_mutex to finish error handling.
>>>
>>> This just moves the scsi_rescan_device call to after we drop the
>>> state_mutex.
>>
>>
>> I want to maybe nak my own patch. There is still a problem where if one
>> of the rescan IOs hits an issue then userspace is stuck waiting for
>> however long it takes to perform recovery. For iscsid, this will cause
>> problems because it sets the device state from its main thread. So
>> while scsi_rescan_device is hung then iscsid can't do anything for
>> any session.
>>
>> I think we either want to:
>>
>> 1. Do the patch below, but Lee will need to change iscsid so it sets
>> the dev state from a worker thread.
>>
>> 2. Have the kernel kick off the rescan from a workqueue. This seems
>> easiest but I'm not sure if it will cause issues for lijinlin's use
>> case.
> 
> I have not heard from huawei, but I don't think we can do 2. The problem
> is that I think userspace will not assume once the write returns that the

Meant userspace will now assume.
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 86793259e541..5b63407c3a3f 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -788,6 +788,7 @@  store_state_field(struct device *dev, struct device_attribute *attr,
 	int i, ret;
 	struct scsi_device *sdev = to_scsi_device(dev);
 	enum scsi_device_state state = 0;
+	bool rescan_dev = false;
 
 	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
 		const int len = strlen(sdev_states[i].name);
@@ -817,10 +818,13 @@  store_state_field(struct device *dev, struct device_attribute *attr,
 	 */
 	if (ret == 0 && state == SDEV_RUNNING) {
 		blk_mq_run_hw_queues(sdev->request_queue, true);
-		scsi_rescan_device(dev);
+		rescan_dev = true;
 	}
 	mutex_unlock(&sdev->state_mutex);
 
+	if (rescan_dev)
+		scsi_rescan_device(dev);
+
 	return ret == 0 ? count : -EINVAL;
 }