diff mbox

[v2,3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device

Message ID 20180110181817.25166-4-bart.vanassche@wdc.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Bart Van Assche Jan. 10, 2018, 6:18 p.m. UTC
Several SCSI transport and LLD drivers surround code that does not
tolerate concurrent calls of .queuecommand() with scsi_target_block() /
scsi_target_unblock(). These last two functions use
blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
queues to prevent concurrent .queuecommand() calls. However, that is
not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
Hence surround the .queuecommand() call from the SCSI error handler with
blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().

Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
code that calls blk_get_request(), e.g. scsi_execute_req(), is not an
option since scsi_send_eh_cmnd() can be called if all requests are
allocated and if no requests will make progress without aborting any
of these requests.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_error.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Ming Lei Jan. 11, 2018, 2:23 a.m. UTC | #1
On Wed, Jan 10, 2018 at 10:18:16AM -0800, Bart Van Assche wrote:
> Several SCSI transport and LLD drivers surround code that does not
> tolerate concurrent calls of .queuecommand() with scsi_target_block() /
> scsi_target_unblock(). These last two functions use
> blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
> queues to prevent concurrent .queuecommand() calls. However, that is

Actually blk_mq_quiesce_queue() is supposed to disable and drain dispatch,
not for prevent concurrent .queuecommand() calls.

> not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().

Given it is error handling, do we need to prevent the .queuecommand() call
in scsi_send_eh_cmnd()? Could you share us what the actual issue
observed is from user view?

> Hence surround the .queuecommand() call from the SCSI error handler with
> blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().
> 
> Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
> code that calls blk_get_request(), e.g. scsi_execute_req(), is not an
> option since scsi_send_eh_cmnd() can be called if all requests are
> allocated and if no requests will make progress without aborting any
> of these requests.

If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what
do you think of the approach by requeuing the EH command via
scsi_mq_requeue_cmd()/scsi_requeue_cmd()? And the EH request will be
dispatched finally when the queue becomes unquiesced or the STOPPED
is cleared.
Bart Van Assche Jan. 12, 2018, 10:45 p.m. UTC | #2
T24gVGh1LCAyMDE4LTAxLTExIGF0IDEwOjIzICswODAwLCBNaW5nIExlaSB3cm90ZToNCj4gPiBu
b3Qgc3VmZmljaWVudCB0byBwcmV2ZW50IC5xdWV1ZWNvbW1hbmQoKSBjYWxscyBmcm9tIHNjc2lf
c2VuZF9laF9jbW5kKCkuDQo+IA0KPiBHaXZlbiBpdCBpcyBlcnJvciBoYW5kbGluZywgZG8gd2Ug
bmVlZCB0byBwcmV2ZW50IHRoZSAucXVldWVjb21tYW5kKCkgY2FsbA0KPiBpbiBzY3NpX3NlbmRf
ZWhfY21uZCgpPyBDb3VsZCB5b3Ugc2hhcmUgdXMgd2hhdCB0aGUgYWN0dWFsIGlzc3VlDQo+IG9i
c2VydmVkIGlzIGZyb20gdXNlciB2aWV3Pw0KDQpQbGVhc2UgaGF2ZSBhIGxvb2sgYXQgdGhlIGtl
cm5lbCBidWcgcmVwb3J0IGluIHRoZSBkZXNjcmlwdGlvbiBvZiBwYXRjaCA0LzQgb2YNCnRoaXMg
c2VyaWVzLg0KDQo+ID4gSGVuY2Ugc3Vycm91bmQgdGhlIC5xdWV1ZWNvbW1hbmQoKSBjYWxsIGZy
b20gdGhlIFNDU0kgZXJyb3IgaGFuZGxlciB3aXRoDQo+ID4gYmxrX3N0YXJ0X3dhaXRfaWZfcXVp
ZXNjZWQoKSAvIGJsa19maW5pc2hfd2FpdF9pZl9xdWllc2NlZCgpLg0KPiA+IA0KPiA+IE5vdGU6
IGNvbnZlcnRpbmcgdGhlIC5xdWV1ZWNvbW1hbmQoKSBjYWxsIGluIHNjc2lfc2VuZF9laF9jbW5k
KCkgaW50bw0KPiA+IGNvZGUgdGhhdCBjYWxscyBibGtfZ2V0X3JlcXVlc3QoKSwgZS5nLiBzY3Np
X2V4ZWN1dGVfcmVxKCksIGlzIG5vdCBhbg0KPiA+IG9wdGlvbiBzaW5jZSBzY3NpX3NlbmRfZWhf
Y21uZCgpIGNhbiBiZSBjYWxsZWQgaWYgYWxsIHJlcXVlc3RzIGFyZQ0KPiA+IGFsbG9jYXRlZCBh
bmQgaWYgbm8gcmVxdWVzdHMgd2lsbCBtYWtlIHByb2dyZXNzIHdpdGhvdXQgYWJvcnRpbmcgYW55
DQo+ID4gb2YgdGhlc2UgcmVxdWVzdHMuDQo+IA0KPiBJZiB3ZSBuZWVkIHRvIHByZXZlbnQgdGhl
IC5xdWV1ZWNvbW1hbmQoKSBpbiBzY3NpX3NlbmRfZWhfY21uZCgpLCB3aGF0DQo+IGRvIHlvdSB0
aGluayBvZiB0aGUgYXBwcm9hY2ggYnkgcmVxdWV1aW5nIHRoZSBFSCBjb21tYW5kIHZpYQ0KPiBz
Y3NpX21xX3JlcXVldWVfY21kKCkvc2NzaV9yZXF1ZXVlX2NtZCgpPyBBbmQgdGhlIEVIIHJlcXVl
c3Qgd2lsbCBiZQ0KPiBkaXNwYXRjaGVkIGZpbmFsbHkgd2hlbiB0aGUgcXVldWUgYmVjb21lcyB1
bnF1aWVzY2VkIG9yIHRoZSBTVE9QUEVEDQo+IGlzIGNsZWFyZWQuDQoNCkNhbGxpbmcgc2NzaV9t
cV9yZXF1ZXVlX2NtZCgpIHdvdWxkIHRyaWdnZXIgc2NzaV9tcV91bmluaXRfY21kKCksDQpibGtf
bXFfcHV0X2RyaXZlcl90YWcoKSwgYmxrX21xX2dldF9kcml2ZXJfdGFnKCkgYW5kIHNjc2lfbXFf
cHJlcF9mbigpLg0KVGhhdCBjb3VsZCBjYXVzZSBzY3NpX2VoX3NjbWRfYWRkKCkgdG8gYmUgY2Fs
bGVkIG11bHRpcGxlIHRpbWVzIGZvciB0aGUNCnNhbWUgU0NTSSBjb21tYW5kLiBIb3dldmVyLCB0
aGF0IHdvdWxkIGJyZWFrIG9uZSBvZiB0aGUgYXNzdW1wdGlvbnMNCnNjc2lfZWhfc2NtZF9hZGQo
KSBpcyBiYXNlZCBvbiwgbmFtZWx5IHRoYXQgdGhhdCBmdW5jdGlvbiBnZXRzIGNhbGxlZA0Kb25s
eSBvbmNlIHBlciBTQ1NJIGNvbW1hbmQgdGhhdCBpcyBpbiBwcm9ncmVzcy4NCg0KQmFydC4g
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Jan. 13, 2018, 3:54 p.m. UTC | #3
On Fri, Jan 12, 2018 at 10:45:57PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote:
> > > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
> > 
> > Given it is error handling, do we need to prevent the .queuecommand() call
> > in scsi_send_eh_cmnd()? Could you share us what the actual issue
> > observed is from user view?
> 
> Please have a look at the kernel bug report in the description of patch 4/4 of
> this series.

Thanks for your mentioning, then I can find the following comment in
srp_queuecommand():

		/*
		 * The SCSI EH thread is the only context from which srp_queuecommand()
		 * can get invoked for blocked devices (SDEV_BLOCK /
		 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
		 * locking the rport mutex if invoked from inside the SCSI EH.
		 */

That means EH request is allowed to send to blocked device.

I also replied in patch 4/4, looks there is one simple one line change
which should address the issue of 'sleep in atomic context', please discuss that
in patch 4/4 thread.
Bart Van Assche Jan. 25, 2018, 5:07 p.m. UTC | #4
On Sat, 2018-01-13 at 23:54 +0800, Ming Lei wrote:
> Thanks for your mentioning, then I can find the following comment in

> srp_queuecommand():

> 

> 		/*

> 		 * The SCSI EH thread is the only context from which srp_queuecommand()

> 		 * can get invoked for blocked devices (SDEV_BLOCK /

> 		 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by

> 		 * locking the rport mutex if invoked from inside the SCSI EH.

> 		 */

> 

> That means EH request is allowed to send to blocked device.


No way. That's a bug and a bug that needs to be fixed.

Bart.
diff mbox

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 62b56de38ae8..f7154ea86715 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1016,6 +1016,7 @@  static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 {
 	struct scsi_device *sdev = scmd->device;
 	struct Scsi_Host *shost = sdev->host;
+	struct request_queue *q = sdev->request_queue;
 	DECLARE_COMPLETION_ONSTACK(done);
 	unsigned long timeleft = timeout;
 	struct scsi_eh_save ses;
@@ -1028,7 +1029,9 @@  static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
+	blk_start_wait_if_quiesced(q);
 	rtn = shost->hostt->queuecommand(shost, scmd);
+	blk_finish_wait_if_quiesced(q);
 	if (rtn) {
 		if (timeleft > stall_for) {
 			scsi_eh_restore_cmnd(scmd, &ses);