[1/1] scsi: fix hang when device state is set via sysfs

Message ID	20211006043117.11121-1-michael.christie@oracle.com (mailing list archive)
State	Changes Requested
Series	[1/1] scsi: fix hang when device state is set via sysfs

Headers

From: Mike Christie <michael.christie@oracle.com>
To: lijinlin3@huawei.com, qiulaibin@huawei.com, bvanassche@acm.org,
        wubo40@huawei.com, martin.petersen@oracle.com,
        linux-scsi@vger.kernel.org, james.bottomley@hansenpartnership.com
Cc: Mike Christie <michael.christie@oracle.com>
Subject: [PATCH 1/1] scsi: fix hang when device state is set via sysfs
Date: Tue,  5 Oct 2021 23:31:17 -0500
Message-Id: <20211006043117.11121-1-michael.christie@oracle.com>
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-scsi.vger.kernel.org>
X-Mailing-List: linux-scsi@vger.kernel.org

Commit Message

Mike Christie Oct. 6, 2021, 4:31 a.m. UTC

This fixes a regression added with:

commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
offlinining device")

The problem is that after iSCSI recovery, iscsid will call into the kernel
to set the dev's state to running, and with that patch we now call
scsi_rescan_device with the state_mutex held. If the scsi error handler
thread is just starting to test the device in scsi_send_eh_cmnd then it's
going to try to grab the state_mutex.

We are then stuck, because when scsi_rescan_device tries to send its IO,
scsi_queue_rq calls scsi_host_queue_ready, where scsi_host_in_recovery
returns true (the host state is still in recovery), so the IO is just
requeued. scsi_send_eh_cmnd will then never be able to grab the
state_mutex to finish error handling.

This just moves the scsi_rescan_device call to after we drop the
state_mutex.

Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
offlinining device")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_sysfs.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
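
The locking change described above can be sketched in userspace C as
follows. This is only an illustration of the pattern (record the decision
while holding the mutex, do the potentially blocking work after dropping
it). struct demo_dev, demo_rescan and demo_store_state are hypothetical
names, not kernel code, and the mutex_held flag exists only so the sketch
can show that the rescan runs unlocked.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct scsi_device; not the kernel type. */
struct demo_dev {
	pthread_mutex_t state_mutex;
	int state;          /* 0 = offline, 1 = running */
	int rescans;        /* how many times demo_rescan() ran */
	int rescans_locked; /* rescans that ran with state_mutex held */
	bool mutex_held;    /* tracked by hand, for illustration only */
};

static void demo_dev_init(struct demo_dev *d)
{
	pthread_mutex_init(&d->state_mutex, NULL);
	d->state = 0;
	d->rescans = 0;
	d->rescans_locked = 0;
	d->mutex_held = false;
}

/* Stand-in for the blocking rescan; it only counts how it was called. */
static void demo_rescan(struct demo_dev *d)
{
	d->rescans++;
	if (d->mutex_held)
		d->rescans_locked++;
}

/* Returns 0 on success, -1 for an unknown state. */
static int demo_store_state(struct demo_dev *d, int new_state)
{
	bool rescan_dev = false;
	int ret = 0;

	pthread_mutex_lock(&d->state_mutex);
	d->mutex_held = true;
	if (new_state != 0 && new_state != 1)
		ret = -1;
	else
		d->state = new_state;
	/* Only remember that a rescan is needed; do not run it here. */
	if (ret == 0 && new_state == 1)
		rescan_dev = true;
	d->mutex_held = false;
	pthread_mutex_unlock(&d->state_mutex);

	/* The blocking work happens after the mutex is dropped. */
	if (rescan_dev)
		demo_rescan(d);

	return ret;
}
```

Calling demo_store_state(&d, 1) on an initialized device runs the rescan
exactly once, and never while state_mutex is held, so another thread that
needs the mutex is not blocked behind it.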

Comments

Mike Christie Oct. 6, 2021, 4:45 a.m. UTC | #1

Cc'ing lee.

On 10/5/21 11:31 PM, Mike Christie wrote:
> This fixes a regression added with:
> 
> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
> offlinining device")
> 
> The problem is that after iSCSI recovery, iscsid will call into the kernel
> to set the dev's state to running, and with that patch we now call
> scsi_rescan_device with the state_mutex held. If the scsi error handler
> thread is just starting to test the device in scsi_send_eh_cmnd then it's
> going to try to grab the state_mutex.
> 
> We are then stuck, because when scsi_rescan_device tries to send its IO
> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
> will return true (the host state is still in recovery) and IO will just be
> requeued. scsi_send_eh_cmnd will then never be able to grab the
> state_mutex to finish error handling.
> 
> This just moves the scsi_rescan_device call to after we drop the
> state_mutex.


I may want to NAK my own patch. There is still a problem: if one of the
rescan IOs hits an issue, userspace is stuck waiting for however long it
takes to perform recovery. For iscsid, this will cause problems because
it sets the device state from its main thread. So while
scsi_rescan_device is hung, iscsid can't do anything for any session.

I think we either want to:

1. Do the patch below, but Lee will need to change iscsid so it sets
the dev state from a worker thread.

2. Have the kernel kick off the rescan from a workqueue. This seems
easiest but I'm not sure if it will cause issues for lijinlin's use
case.
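
For option 1, a rough userspace sketch of moving the state write off the
main thread might look like the following. This is not iscsid code: struct
state_job, state_job_run and state_job_start are hypothetical names, and
the sketch assumes a plain pthread per write rather than whatever worker
infrastructure iscsid would actually use.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical job description; not an iscsid structure. */
struct state_job {
	char path[256];  /* e.g. a device's sysfs "state" file */
	char value[32];  /* e.g. "running" */
	int result;      /* 0 on success, -1 on failure; valid after join */
};

/* Worker thread body: performs the write that may block for a long time. */
static void *state_job_run(void *arg)
{
	struct state_job *job = arg;
	FILE *f = fopen(job->path, "w");

	job->result = -1;
	if (!f)
		return NULL;
	if (fputs(job->value, f) >= 0)
		job->result = 0;
	/* Write errors on sysfs files may only show up at close time. */
	if (fclose(f) != 0)
		job->result = -1;
	return NULL;
}

/* Starts the write on its own thread; returns 0 if the thread was created. */
static int state_job_start(struct state_job *job, pthread_t *tid,
			   const char *path, const char *value)
{
	if (strlen(path) >= sizeof(job->path) ||
	    strlen(value) >= sizeof(job->value))
		return -1;
	strcpy(job->path, path);
	strcpy(job->value, value);
	job->result = -1;
	return pthread_create(tid, NULL, state_job_run, job) == 0 ? 0 : -1;
}
```

The main thread would call state_job_start() and keep servicing other
sessions, then pthread_join() the thread later and check job.result. The
job struct must stay alive until the join.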


> 
> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
> offlinining device")
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/scsi_sysfs.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 86793259e541..5b63407c3a3f 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -788,6 +788,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>  	int i, ret;
>  	struct scsi_device *sdev = to_scsi_device(dev);
>  	enum scsi_device_state state = 0;
> +	bool rescan_dev = false;
>  
>  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
>  		const int len = strlen(sdev_states[i].name);
> @@ -817,10 +818,13 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>  	 */
>  	if (ret == 0 && state == SDEV_RUNNING) {
>  		blk_mq_run_hw_queues(sdev->request_queue, true);
> -		scsi_rescan_device(dev);
> +		rescan_dev = true;
>  	}
>  	mutex_unlock(&sdev->state_mutex);
>  
> +	if (rescan_dev)
> +		scsi_rescan_device(dev);
> +
>  	return ret == 0 ? count : -EINVAL;
>  }
>  
>

Lee Duncan Oct. 6, 2021, 4:59 p.m. UTC | #2

On 10/5/21 9:45 PM, Mike Christie wrote:
> Cc'ing lee.
> 
> On 10/5/21 11:31 PM, Mike Christie wrote:
>> This fixes a regression added with:
>>
>> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>>
>> The problem is that after iSCSI recovery, iscsid will call into the kernel
>> to set the dev's state to running, and with that patch we now call
>> scsi_rescan_device with the state_mutex held. If the scsi error handler
>> thread is just starting to test the device in scsi_send_eh_cmnd then it's
>> going to try to grab the state_mutex.
>>
>> We are then stuck, because when scsi_rescan_device tries to send its IO
>> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
>> will return true (the host state is still in recovery) and IO will just be
>> requeued. scsi_send_eh_cmnd will then never be able to grab the
>> state_mutex to finish error handling.
>>
>> This just moves the scsi_rescan_device call to after we drop the
>> state_mutex.
> 
> 
> I want to maybe nak my own patch. There is still a problem where if one
> of the rescan IOs hits an issue then userspace is stuck waiting for
> however long it takes to perform recovery. For iscsid, this will cause
> problems because it sets the device state from its main thread. So
> while scsi_rescan_device is hung then iscsid can't do anything for
> any session.
> 
> I think we either want to:
> 
> 1. Do the patch below, but Lee will need to change iscsid so it sets
> the dev state from a worker thread.
> 
> 2. Have the kernel kick off the rescan from a workqueue. This seems
> easiest but I'm not sure if it will cause issues for lijinlin's use
> case.

I vote for #2, if possible.

> 
> 
>>
>> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>  drivers/scsi/scsi_sysfs.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 86793259e541..5b63407c3a3f 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -788,6 +788,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	int i, ret;
>>  	struct scsi_device *sdev = to_scsi_device(dev);
>>  	enum scsi_device_state state = 0;
>> +	bool rescan_dev = false;
>>  
>>  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
>>  		const int len = strlen(sdev_states[i].name);
>> @@ -817,10 +818,13 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	 */
>>  	if (ret == 0 && state == SDEV_RUNNING) {
>>  		blk_mq_run_hw_queues(sdev->request_queue, true);
>> -		scsi_rescan_device(dev);
>> +		rescan_dev = true;
>>  	}
>>  	mutex_unlock(&sdev->state_mutex);
>>  
>> +	if (rescan_dev)
>> +		scsi_rescan_device(dev);
>> +
>>  	return ret == 0 ? count : -EINVAL;
>>  }
>>  
>>
>

Mike Christie Oct. 12, 2021, 3:50 p.m. UTC | #3

On 10/5/21 11:45 PM, Mike Christie wrote:
> Cc'ing lee.
> 
> On 10/5/21 11:31 PM, Mike Christie wrote:
>> This fixes a regression added with:
>>
>> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>>
>> The problem is that after iSCSI recovery, iscsid will call into the kernel
>> to set the dev's state to running, and with that patch we now call
>> scsi_rescan_device with the state_mutex held. If the scsi error handler
>> thread is just starting to test the device in scsi_send_eh_cmnd then it's
>> going to try to grab the state_mutex.
>>
>> We are then stuck, because when scsi_rescan_device tries to send its IO
>> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
>> will return true (the host state is still in recovery) and IO will just be
>> requeued. scsi_send_eh_cmnd will then never be able to grab the
>> state_mutex to finish error handling.
>>
>> This just moves the scsi_rescan_device call to after we drop the
>> state_mutex.
> 
> 
> I want to maybe nak my own patch. There is still a problem where if one
> of the rescan IOs hits an issue then userspace is stuck waiting for
> however long it takes to perform recovery. For iscsid, this will cause
> problems because it sets the device state from its main thread. So
> while scsi_rescan_device is hung then iscsid can't do anything for
> any session.
> 
> I think we either want to:
> 
> 1. Do the patch below, but Lee will need to change iscsid so it sets
> the dev state from a worker thread.
> 
> 2. Have the kernel kick off the rescan from a workqueue. This seems
> easiest but I'm not sure if it will cause issues for lijinlin's use
> case.

I have not heard from huawei, but I don't think we can do 2. The problem
is that I think userspace will not assume once the write returns that the
device is ready to go. So we can't:

1. Just kick off an async rescan, then return. There are issues with this
assumption though. See below.

2. Kick off the rescan, then wait for it to complete or for the
device/host to change state. This could hang iscsid until the scsi cmd
timer fires or until the transport timer fires.


I think the options are:

1. Revert the patch. The problem is that the rescan can still fail,
and if it didn't cause a deadlock due to the bug in this thread, the
device could go back to offline, but we would still return success.

Why didn't userspace just rescan from sysfs?

2. Do the patch below so we at least don't deadlock and fix iscsid. Maybe
iscsid was a little too smart for its own good, and should not have assumed
that writing to the state file could not block for long periods.

3. ?
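
As a concrete picture of what "rescan from sysfs" means for userspace, here
is a minimal C sketch that builds the path of a SCSI device attribute and
writes to it. The helper names scsi_attr_path and write_attr are
hypothetical, and the sysfs root is passed in as a parameter so the sketch
can be exercised against a scratch directory.

```c
#include <stdio.h>

/*
 * Builds "<root>/class/scsi_device/H:C:T:L/device/<attr>" into buf.
 * Returns 0 on success, -1 if the result does not fit.
 */
static int scsi_attr_path(char *buf, size_t len, const char *sysfs_root,
			  unsigned int host, unsigned int channel,
			  unsigned int id, unsigned long long lun,
			  const char *attr)
{
	int n = snprintf(buf, len,
			 "%s/class/scsi_device/%u:%u:%u:%llu/device/%s",
			 sysfs_root, host, channel, id, lun, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Writes value to the file at path. Returns 0 on success, -1 on error. */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(value, f) < 0)
		ret = -1;
	/* Errors from the attribute's store handler can surface at close. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

With these helpers, userspace could write "running" to the "state"
attribute and then, as a separate step, write "1" to the "rescan"
attribute of the same device, and check the result of each write on its
own. Either write can still block, as discussed above.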


> 
> 
>>
>> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>  drivers/scsi/scsi_sysfs.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 86793259e541..5b63407c3a3f 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -788,6 +788,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	int i, ret;
>>  	struct scsi_device *sdev = to_scsi_device(dev);
>>  	enum scsi_device_state state = 0;
>> +	bool rescan_dev = false;
>>  
>>  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
>>  		const int len = strlen(sdev_states[i].name);
>> @@ -817,10 +818,13 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	 */
>>  	if (ret == 0 && state == SDEV_RUNNING) {
>>  		blk_mq_run_hw_queues(sdev->request_queue, true);
>> -		scsi_rescan_device(dev);
>> +		rescan_dev = true;
>>  	}
>>  	mutex_unlock(&sdev->state_mutex);
>>  
>> +	if (rescan_dev)
>> +		scsi_rescan_device(dev);
>> +
>>  	return ret == 0 ? count : -EINVAL;
>>  }
>>  
>>
>

Mike Christie Oct. 12, 2021, 3:52 p.m. UTC | #4

On 10/12/21 10:50 AM, Mike Christie wrote:
> On 10/5/21 11:45 PM, Mike Christie wrote:
>> Cc'ing lee.
>>
>> On 10/5/21 11:31 PM, Mike Christie wrote:
>>> This fixes a regression added with:
>>>
>>> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>>> offlinining device")
>>>
>>> The problem is that after iSCSI recovery, iscsid will call into the kernel
>>> to set the dev's state to running, and with that patch we now call
>>> scsi_rescan_device with the state_mutex held. If the scsi error handler
>>> thread is just starting to test the device in scsi_send_eh_cmnd then it's
>>> going to try to grab the state_mutex.
>>>
>>> We are then stuck, because when scsi_rescan_device tries to send its IO
>>> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
>>> will return true (the host state is still in recovery) and IO will just be
>>> requeued. scsi_send_eh_cmnd will then never be able to grab the
>>> state_mutex to finish error handling.
>>>
>>> This just moves the scsi_rescan_device call to after we drop the
>>> state_mutex.
>>
>>
>> I want to maybe nak my own patch. There is still a problem where if one
>> of the rescan IOs hits an issue then userspace is stuck waiting for
>> however long it takes to perform recovery. For iscsid, this will cause
>> problems because it sets the device state from its main thread. So
>> while scsi_rescan_device is hung then iscsid can't do anything for
>> any session.
>>
>> I think we either want to:
>>
>> 1. Do the patch below, but Lee will need to change iscsid so it sets
>> the dev state from a worker thread.
>>
>> 2. Have the kernel kick off the rescan from a workqueue. This seems
>> easiest but I'm not sure if it will cause issues for lijinlin's use
>> case.
> 
> I have not heard from huawei, but I don't think we can do 2. The problem
> is that I think userspace will not assume once the write returns that the

Meant userspace will now assume.

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 86793259e541..5b63407c3a3f 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -788,6 +788,7 @@  store_state_field(struct device *dev, struct device_attribute *attr,
 	int i, ret;
 	struct scsi_device *sdev = to_scsi_device(dev);
 	enum scsi_device_state state = 0;
+	bool rescan_dev = false;
 
 	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
 		const int len = strlen(sdev_states[i].name);
@@ -817,10 +818,13 @@  store_state_field(struct device *dev, struct device_attribute *attr,
 	 */
 	if (ret == 0 && state == SDEV_RUNNING) {
 		blk_mq_run_hw_queues(sdev->request_queue, true);
-		scsi_rescan_device(dev);
+		rescan_dev = true;
 	}
 	mutex_unlock(&sdev->state_mutex);
 
+	if (rescan_dev)
+		scsi_rescan_device(dev);
+
 	return ret == 0 ? count : -EINVAL;
 }
