Message ID | 20170927054853.6647-7-ming.lei@redhat.com (mailing list archive) |
---|---|
State | Not Applicable |
On Wed, 2017-09-27 at 13:48 +0800, Ming Lei wrote:
> @@ -2928,12 +2929,28 @@ scsi_device_quiesce(struct scsi_device *sdev)
>  {
>  	int err;
>  
> +	/*
> +	 * Simply quiesing SCSI device isn't safe, it is easy
> +	 * to use up requests because all these allocated requests
> +	 * can't be dispatched when device is put in QIUESCE.
> +	 * Then no request can be allocated and we may hang
> +	 * somewhere, such as system suspend/resume.
> +	 *
> +	 * So we set block queue in preempt only first, no new
> +	 * normal request can enter queue any more, and all pending
> +	 * requests are drained once blk_set_preempt_only()
> +	 * returns. Only RQF_PREEMPT is allowed in preempt only mode.
> +	 */
> +	blk_set_preempt_only(sdev->request_queue, true);
> +
>  	mutex_lock(&sdev->state_mutex);
>  	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
>  	mutex_unlock(&sdev->state_mutex);
>  
> -	if (err)
> +	if (err) {
> +		blk_set_preempt_only(sdev->request_queue, false);
>  		return err;
> +	}
>  
>  	scsi_run_queue(sdev->request_queue);
>  	while (atomic_read(&sdev->device_busy)) {
> @@ -2964,6 +2981,8 @@ void scsi_device_resume(struct scsi_device *sdev)
>  	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
>  		scsi_run_queue(sdev->request_queue);
>  	mutex_unlock(&sdev->state_mutex);
> +
> +	blk_set_preempt_only(sdev->request_queue, false);

You should have realized yourself that this code is racy. If a request is
allocated just before scsi_device_quiesce() is called and dispatched just
after the device state has been changed into SDEV_QUIESCE then the loop that
waits for all commands to complete will wait forever due to the SCSI prep
function returning BLKPREP_DEFER.

Bart.
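The objection above hinges on how the SCSI prep function treats a request once the device is in SDEV_QUIESCE. As a rough illustration only, the following user-space sketch models that decision. It is not the kernel code: `model_prep_state_check()`, the `MODEL_*` enumerators and the `preempt` field are hypothetical stand-ins for the prep function, BLKPREP_OK/BLKPREP_DEFER and RQF_PREEMPT, and the sketch assumes that a quiesced device defers every request that is not marked preempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SCSI device states named in the thread. */
enum model_sdev_state { MODEL_SDEV_RUNNING, MODEL_SDEV_QUIESCE };

/* Hypothetical stand-ins for BLKPREP_OK and BLKPREP_DEFER. */
enum model_prep_result { MODEL_PREP_OK, MODEL_PREP_DEFER };

struct model_request {
	bool preempt;	/* models RQF_PREEMPT */
};

/*
 * Assumed behaviour: while the device is quiesced, only preempt requests
 * are prepared; every other request is deferred and stays queued.
 */
static enum model_prep_result
model_prep_state_check(enum model_sdev_state state,
		       const struct model_request *rq)
{
	if (state == MODEL_SDEV_QUIESCE && !rq->preempt)
		return MODEL_PREP_DEFER;
	return MODEL_PREP_OK;
}

/* Usage example: walks through the four state/flag combinations. */
void model_prep_demo(void)
{
	const struct model_request normal = { .preempt = false };
	const struct model_request pm = { .preempt = true };

	assert(model_prep_state_check(MODEL_SDEV_RUNNING, &normal) == MODEL_PREP_OK);
	assert(model_prep_state_check(MODEL_SDEV_RUNNING, &pm) == MODEL_PREP_OK);
	assert(model_prep_state_check(MODEL_SDEV_QUIESCE, &normal) == MODEL_PREP_DEFER);
	assert(model_prep_state_check(MODEL_SDEV_QUIESCE, &pm) == MODEL_PREP_OK);
}
```

Under these assumptions, a normal request that reaches the prep step after the state change is never prepared, which is the scenario the objection describes. Whether such a request can still exist at that point is the question the rest of the thread argues about.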
On Wed, Sep 27, 2017 at 09:54:09AM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-27 at 13:48 +0800, Ming Lei wrote:
> > @@ -2928,12 +2929,28 @@ scsi_device_quiesce(struct scsi_device *sdev)
> >  {
> >  	int err;
> >  
> > +	/*
> > +	 * Simply quiesing SCSI device isn't safe, it is easy
> > +	 * to use up requests because all these allocated requests
> > +	 * can't be dispatched when device is put in QIUESCE.
> > +	 * Then no request can be allocated and we may hang
> > +	 * somewhere, such as system suspend/resume.
> > +	 *
> > +	 * So we set block queue in preempt only first, no new
> > +	 * normal request can enter queue any more, and all pending
> > +	 * requests are drained once blk_set_preempt_only()
> > +	 * returns. Only RQF_PREEMPT is allowed in preempt only mode.
> > +	 */
> > +	blk_set_preempt_only(sdev->request_queue, true);
> > +
> >  	mutex_lock(&sdev->state_mutex);
> >  	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
> >  	mutex_unlock(&sdev->state_mutex);
> >  
> > -	if (err)
> > +	if (err) {
> > +		blk_set_preempt_only(sdev->request_queue, false);
> >  		return err;
> > +	}
> >  
> >  	scsi_run_queue(sdev->request_queue);
> >  	while (atomic_read(&sdev->device_busy)) {
> > @@ -2964,6 +2981,8 @@ void scsi_device_resume(struct scsi_device *sdev)
> >  	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
> >  		scsi_run_queue(sdev->request_queue);
> >  	mutex_unlock(&sdev->state_mutex);
> > +
> > +	blk_set_preempt_only(sdev->request_queue, false);
>
> You should have realized yourself that this code is racy. If a request is
> allocated just before scsi_device_quiesce() is called and dispatched just
> after the device state has been changed into SDEV_QUIESCE then the loop that

That won't happen: any request allocated before blk_set_preempt_only(true)
will be drained, and normal requests are prevented from entering the queue
once blk_set_preempt_only(true) returns.

Please look at blk_set_preempt_only():

+void blk_set_preempt_only(struct request_queue *q, bool preempt_only)
+{
+	blk_mq_freeze_queue(q);
+	if (preempt_only)
+		queue_flag_set_unlocked(QUEUE_FLAG_PREEMPT_ONLY, q);
+	else
+		queue_flag_clear_unlocked(QUEUE_FLAG_PREEMPT_ONLY, q);
+	blk_mq_unfreeze_queue(q);
+}
+EXPORT_SYMBOL(blk_set_preempt_only);

blk_set_preempt_only(true) is called before
scsi_device_set_state(sdev, SDEV_QUIESCE), so all requests are drained by
blk_mq_freeze_queue() inside blk_set_preempt_only(); in the meantime, new
normal requests are prevented from entering the queue.

Once blk_set_preempt_only() returns, only RQF_PREEMPT requests are allowed
to enter the queue.
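The freeze, set flag, unfreeze ordering described in the reply can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the block layer implementation: `struct model_queue` and the `model_*` functions are hypothetical, the `frozen` field stands in for the effect of blk_mq_freeze_queue(), and the drain is modelled by completing the outstanding requests synchronously, whereas the kernel sleeps until they complete on their own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a request queue. */
struct model_queue {
	int in_flight;		/* requests that have entered and not yet completed */
	bool frozen;		/* models the window between freeze and unfreeze */
	bool preempt_only;	/* models QUEUE_FLAG_PREEMPT_ONLY */
};

/* Models entering the queue; false means the caller would have to wait. */
static bool model_queue_enter(struct model_queue *q, bool preempt)
{
	if (q->frozen)
		return false;
	if (q->preempt_only && !preempt)
		return false;
	q->in_flight++;
	return true;
}

/* Models completion of one request. */
static void model_queue_exit(struct model_queue *q)
{
	assert(q->in_flight > 0);
	q->in_flight--;
}

/* Same ordering as the quoted function: freeze, flip the flag, unfreeze. */
static void model_set_preempt_only(struct model_queue *q, bool preempt_only)
{
	q->frozen = true;
	while (q->in_flight > 0)	/* drain, modelled synchronously */
		model_queue_exit(q);
	q->preempt_only = preempt_only;
	q->frozen = false;
}

/* Usage example: returns true if the model behaves as the reply describes. */
bool model_gate_demo(void)
{
	struct model_queue q = { 0 };
	bool ok = true;

	ok = ok && model_queue_enter(&q, false);	/* two normal requests */
	ok = ok && model_queue_enter(&q, false);
	model_set_preempt_only(&q, true);
	ok = ok && q.in_flight == 0;			/* drained on return */
	ok = ok && !model_queue_enter(&q, false);	/* normal I/O is gated */
	ok = ok && model_queue_enter(&q, true);		/* preempt still enters */
	model_queue_exit(&q);
	model_set_preempt_only(&q, false);
	ok = ok && model_queue_enter(&q, false);	/* normal I/O resumes */
	return ok;
}
```

In this model no normal request is left in flight when `model_set_preempt_only(q, true)` returns, which is the property the reply relies on. The model says nothing about requests that were allocated through other paths or about concurrency, so it does not settle the disagreement by itself.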
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..82c51619f1b7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -252,9 +252,10 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 
-	req = blk_get_request(sdev->request_queue,
+	req = __blk_get_request(sdev->request_queue,
 			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM,
+			BLK_REQ_PREEMPT);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
@@ -2928,12 +2929,28 @@ scsi_device_quiesce(struct scsi_device *sdev)
 {
 	int err;
 
+	/*
+	 * Simply quiesing SCSI device isn't safe, it is easy
+	 * to use up requests because all these allocated requests
+	 * can't be dispatched when device is put in QIUESCE.
+	 * Then no request can be allocated and we may hang
+	 * somewhere, such as system suspend/resume.
+	 *
+	 * So we set block queue in preempt only first, no new
+	 * normal request can enter queue any more, and all pending
+	 * requests are drained once blk_set_preempt_only()
+	 * returns. Only RQF_PREEMPT is allowed in preempt only mode.
+	 */
+	blk_set_preempt_only(sdev->request_queue, true);
+
 	mutex_lock(&sdev->state_mutex);
 	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
 	mutex_unlock(&sdev->state_mutex);
 
-	if (err)
+	if (err) {
+		blk_set_preempt_only(sdev->request_queue, false);
 		return err;
+	}
 
 	scsi_run_queue(sdev->request_queue);
 	while (atomic_read(&sdev->device_busy)) {
@@ -2964,6 +2981,8 @@ void scsi_device_resume(struct scsi_device *sdev)
 	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
 		scsi_run_queue(sdev->request_queue);
 	mutex_unlock(&sdev->state_mutex);
+
+	blk_set_preempt_only(sdev->request_queue, false);
 }
 EXPORT_SYMBOL(scsi_device_resume);
Simply quiescing the SCSI device and waiting for completion of the I/O
dispatched to the SCSI queue isn't safe: it is easy to use up the request
pool, because none of the requests allocated earlier can be dispatched once
the device is put in QUIESCE. Then no request can be allocated for
RQF_PREEMPT, and the system may hang somewhere, such as when sending
sync_cache or start_stop commands in the system suspend path.

Before quiescing SCSI, this patch sets the block queue in preempt-only mode
first, so no new normal request can enter the queue any more, and all
pending requests have been drained once blk_set_preempt_only(true) returns.
Then RQF_PREEMPT requests can be allocated successfully during the preempt
freeze.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)
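The exhaustion scenario in the commit message can be sketched with a toy request pool. Everything here is an assumption made for illustration: `struct model_pool`, `MODEL_POOL_SIZE` and the `model_*` functions are hypothetical, the pool depth of 4 is arbitrary, and a failed allocation stands in for the point where the kernel caller would sleep waiting for a free request.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_POOL_SIZE 4	/* hypothetical pool depth */

struct model_pool {
	int free;		/* requests still available */
	bool preempt_only;	/* models QUEUE_FLAG_PREEMPT_ONLY */
};

/* Take one request; normal requests are refused in preempt-only mode. */
static bool model_alloc(struct model_pool *p, bool preempt)
{
	assert(p->free >= 0 && p->free <= MODEL_POOL_SIZE);
	if (p->preempt_only && !preempt)
		return false;
	if (p->free == 0)
		return false;	/* the kernel caller would sleep here */
	p->free--;
	return true;
}

/* Models the drain: every outstanding request completes and is returned. */
static void model_drain(struct model_pool *p)
{
	p->free = MODEL_POOL_SIZE;
}

/* Usage example: returns true if both halves behave as described. */
bool model_pool_demo(void)
{
	struct model_pool p = { .free = MODEL_POOL_SIZE, .preempt_only = false };
	bool starved, normal_refused, preempt_ok;

	/*
	 * Without the gate: normal I/O takes every request, and a quiesced
	 * device never dispatches them, so nothing is ever returned.
	 */
	while (model_alloc(&p, false))
		;
	starved = !model_alloc(&p, true);	/* nothing left for RQF_PREEMPT */

	/* With the gate: drain and enter preempt-only mode before quiescing. */
	model_drain(&p);
	p.preempt_only = true;
	normal_refused = !model_alloc(&p, false);
	preempt_ok = model_alloc(&p, true);

	return starved && normal_refused && preempt_ok;
}
```

The first half shows why a preempt request can starve when normal requests hold the whole pool; the second half shows the intended effect of gating normal requests before the device is quiesced.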