Message ID | 20180110181817.25166-4-bart.vanassche@wdc.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Wed, Jan 10, 2018 at 10:18:16AM -0800, Bart Van Assche wrote: > Several SCSI transport and LLD drivers surround code that does not > tolerate concurrent calls of .queuecommand() with scsi_target_block() / > scsi_target_unblock(). These last two functions use > blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request > queues to prevent concurrent .queuecommand() calls. However, that is Actually blk_mq_quiesce_queue() is supposed to disable and drain dispatch, not for prevent concurrent .queuecommand() calls. > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). Given it is error handling, do we need to prevent the .queuecommand() call in scsi_send_eh_cmnd()? Could you share us what the actual issue observed is from user view? > Hence surround the .queuecommand() call from the SCSI error handler with > blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced(). > > Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into > code that calls blk_get_request(), e.g. scsi_execute_req(), is not an > option since scsi_send_eh_cmnd() can be called if all requests are > allocated and if no requests will make progress without aborting any > of these requests. If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what do you think of the approach by requeuing the EH command via scsi_mq_requeue_cmd()/scsi_requeue_cmd()? And the EH request will be dispatched finally when the queue becomes unquiesced or the STOPPED is cleared.
T24gVGh1LCAyMDE4LTAxLTExIGF0IDEwOjIzICswODAwLCBNaW5nIExlaSB3cm90ZToNCj4gPiBu b3Qgc3VmZmljaWVudCB0byBwcmV2ZW50IC5xdWV1ZWNvbW1hbmQoKSBjYWxscyBmcm9tIHNjc2lf c2VuZF9laF9jbW5kKCkuDQo+IA0KPiBHaXZlbiBpdCBpcyBlcnJvciBoYW5kbGluZywgZG8gd2Ug bmVlZCB0byBwcmV2ZW50IHRoZSAucXVldWVjb21tYW5kKCkgY2FsbA0KPiBpbiBzY3NpX3NlbmRf ZWhfY21uZCgpPyBDb3VsZCB5b3Ugc2hhcmUgdXMgd2hhdCB0aGUgYWN0dWFsIGlzc3VlDQo+IG9i c2VydmVkIGlzIGZyb20gdXNlciB2aWV3Pw0KDQpQbGVhc2UgaGF2ZSBhIGxvb2sgYXQgdGhlIGtl cm5lbCBidWcgcmVwb3J0IGluIHRoZSBkZXNjcmlwdGlvbiBvZiBwYXRjaCA0LzQgb2YNCnRoaXMg c2VyaWVzLg0KDQo+ID4gSGVuY2Ugc3Vycm91bmQgdGhlIC5xdWV1ZWNvbW1hbmQoKSBjYWxsIGZy b20gdGhlIFNDU0kgZXJyb3IgaGFuZGxlciB3aXRoDQo+ID4gYmxrX3N0YXJ0X3dhaXRfaWZfcXVp ZXNjZWQoKSAvIGJsa19maW5pc2hfd2FpdF9pZl9xdWllc2NlZCgpLg0KPiA+IA0KPiA+IE5vdGU6 IGNvbnZlcnRpbmcgdGhlIC5xdWV1ZWNvbW1hbmQoKSBjYWxsIGluIHNjc2lfc2VuZF9laF9jbW5k KCkgaW50bw0KPiA+IGNvZGUgdGhhdCBjYWxscyBibGtfZ2V0X3JlcXVlc3QoKSwgZS5nLiBzY3Np X2V4ZWN1dGVfcmVxKCksIGlzIG5vdCBhbg0KPiA+IG9wdGlvbiBzaW5jZSBzY3NpX3NlbmRfZWhf Y21uZCgpIGNhbiBiZSBjYWxsZWQgaWYgYWxsIHJlcXVlc3RzIGFyZQ0KPiA+IGFsbG9jYXRlZCBh bmQgaWYgbm8gcmVxdWVzdHMgd2lsbCBtYWtlIHByb2dyZXNzIHdpdGhvdXQgYWJvcnRpbmcgYW55 DQo+ID4gb2YgdGhlc2UgcmVxdWVzdHMuDQo+IA0KPiBJZiB3ZSBuZWVkIHRvIHByZXZlbnQgdGhl IC5xdWV1ZWNvbW1hbmQoKSBpbiBzY3NpX3NlbmRfZWhfY21uZCgpLCB3aGF0DQo+IGRvIHlvdSB0 aGluayBvZiB0aGUgYXBwcm9hY2ggYnkgcmVxdWV1aW5nIHRoZSBFSCBjb21tYW5kIHZpYQ0KPiBz Y3NpX21xX3JlcXVldWVfY21kKCkvc2NzaV9yZXF1ZXVlX2NtZCgpPyBBbmQgdGhlIEVIIHJlcXVl c3Qgd2lsbCBiZQ0KPiBkaXNwYXRjaGVkIGZpbmFsbHkgd2hlbiB0aGUgcXVldWUgYmVjb21lcyB1 bnF1aWVzY2VkIG9yIHRoZSBTVE9QUEVEDQo+IGlzIGNsZWFyZWQuDQoNCkNhbGxpbmcgc2NzaV9t cV9yZXF1ZXVlX2NtZCgpIHdvdWxkIHRyaWdnZXIgc2NzaV9tcV91bmluaXRfY21kKCksDQpibGtf bXFfcHV0X2RyaXZlcl90YWcoKSwgYmxrX21xX2dldF9kcml2ZXJfdGFnKCkgYW5kIHNjc2lfbXFf cHJlcF9mbigpLg0KVGhhdCBjb3VsZCBjYXVzZSBzY3NpX2VoX3NjbWRfYWRkKCkgdG8gYmUgY2Fs bGVkIG11bHRpcGxlIHRpbWVzIGZvciB0aGUNCnNhbWUgU0NTSSBjb21tYW5kLiBIb3dldmVyLCB0 aGF0IHdvdWxkIGJyZWFrIG9uZSBvZiB0aGUgYXNzdW1wdGlvbnMNCnNjc2lfZWhfc2NtZF9hZGQo KSBpcyBiYXNlZCBvbiwgbmFtZWx5IHRoYXQgdGhhdCBmdW5jdGlvbiBnZXRzIGNhbGxlZA0Kb25s eSBvbmNlIHBlciBTQ1NJIGNvbW1hbmQgdGhhdCBpcyBpbiBwcm9ncmVzcy4NCg0KQmFydC4g -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 12, 2018 at 10:45:57PM +0000, Bart Van Assche wrote: > On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote: > > > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). > > > > Given it is error handling, do we need to prevent the .queuecommand() call > > in scsi_send_eh_cmnd()? Could you share us what the actual issue > > observed is from user view? > > Please have a look at the kernel bug report in the description of patch 4/4 of > this series. Thanks for your mentioning, then I can find the following comment in srp_queuecommand(): /* * The SCSI EH thread is the only context from which srp_queuecommand() * can get invoked for blocked devices (SDEV_BLOCK / * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by * locking the rport mutex if invoked from inside the SCSI EH. */ That means EH request is allowed to send to blocked device. I also replied in patch 4/4, looks there is one simple one line change which should address the issue of 'sleep in atomic context', please discuss that in patch 4/4 thread.
On Sat, 2018-01-13 at 23:54 +0800, Ming Lei wrote: > Thanks for your mentioning, then I can find the following comment in > srp_queuecommand(): > > /* > * The SCSI EH thread is the only context from which srp_queuecommand() > * can get invoked for blocked devices (SDEV_BLOCK / > * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by > * locking the rport mutex if invoked from inside the SCSI EH. > */ > > That means EH request is allowed to send to blocked device. No way. That's a bug and a bug that needs to be fixed. Bart.
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 62b56de38ae8..f7154ea86715 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1016,6 +1016,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, { struct scsi_device *sdev = scmd->device; struct Scsi_Host *shost = sdev->host; + struct request_queue *q = sdev->request_queue; DECLARE_COMPLETION_ONSTACK(done); unsigned long timeleft = timeout; struct scsi_eh_save ses; @@ -1028,7 +1029,9 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, scsi_log_send(scmd); scmd->scsi_done = scsi_eh_done; + blk_start_wait_if_quiesced(q); rtn = shost->hostt->queuecommand(shost, scmd); + blk_finish_wait_if_quiesced(q); if (rtn) { if (timeleft > stall_for) { scsi_eh_restore_cmnd(scmd, &ses);