Message ID | 20180316174016.5127-1-bart.vanassche@wdc.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote: > @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd > *scmd, unsigned char *cmnd, > > scsi_log_send(scmd); > scmd->scsi_done = scsi_eh_done; > - rtn = shost->hostt->queuecommand(shost, scmd); > + mutex_lock(&sdev->state_mutex); > + while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0) { > + mutex_unlock(&sdev->state_mutex); > + SCSI_LOG_ERROR_RECOVERY(5, sdev_printk(KERN_DEBUG, > sdev, > + "%s: state %d <> %d\n", __func__, sdev- > >sdev_state, > + SDEV_QUIESCE)); > + delay = min(timeleft, stall_for); > + timeleft -= delay; > + msleep(jiffies_to_msecs(delay)); > + mutex_lock(&sdev->state_mutex); > + } What's the point of this loop? if you eliminate it, you still get exactly the same msleep from the stall_for retry processing. Plus I really don't think you want to call ->queuecommand() with the state mutex held. You don't even need to hold the state mutex to read sdev->state because the read is atomic and the mutex doesn't mediate anything. The check to queuecommand race is the same for every consumer. James
On Fri, 2018-03-16 at 15:00 -0700, James Bottomley wrote: > On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote: > > @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd > > *scmd, unsigned char *cmnd, > > > > scsi_log_send(scmd); > > scmd->scsi_done = scsi_eh_done; > > - rtn = shost->hostt->queuecommand(shost, scmd); > > + mutex_lock(&sdev->state_mutex); > > + while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0) { > > + mutex_unlock(&sdev->state_mutex); > > + SCSI_LOG_ERROR_RECOVERY(5, sdev_printk(KERN_DEBUG, > > sdev, > > + "%s: state %d <> %d\n", __func__, sdev- > > > sdev_state, > > > > + SDEV_QUIESCE)); > > + delay = min(timeleft, stall_for); > > + timeleft -= delay; > > + msleep(jiffies_to_msecs(delay)); > > + mutex_lock(&sdev->state_mutex); > > + } > > What's the point of this loop? if you eliminate it, you still get > exactly the same msleep from the stall_for retry processing. Hello James, The purpose of that loop is to check the SCSI device state every "stall_for" jiffies and to avoid that more than "timeleft" jiffies is spent on waiting. > Plus I really don't think you want to call ->queuecommand() with the > state mutex held. Since .queuecommand() is not allowed to sleep I think that holding the SCSI device state mutex does not introduce the possibility of a new deadlock. What makes you think that calling ->queuecommand() with that mutex held wouldn't be safe? > You don't even need to hold the state mutex to read sdev->state because > the read is atomic and the mutex doesn't mediate anything. The check to > queuecommand race is the same for every consumer. Since both scsi_device_quiesce() and scsi_device_resume() acquire and release the SCSI device .state_mutex, obtaining that mutex realizes serialization of the above code against scsi_device_quiesce() and scsi_device_resume(). If the above code would not obtain .state_mutex then the following race would be possible: * scsi_send_eh_cmnd() sees that .sdev_state == SDEV_QUIESCE and decides that it is safe to call .queuecommand(). * Another thread calls scsi_device_quiesce() and removes some of the resources needed by .queuecommand(). * scsi_send_eh_cmnd() calls .queuecommand(), resulting in a kernel oops. Please let me know if you need more information. Thanks, Bart.
On Fri, 2018-03-16 at 22:21 +0000, Bart Van Assche wrote: > On Fri, 2018-03-16 at 15:00 -0700, James Bottomley wrote: > > > > On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote: > > > > > > @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct > > > scsi_cmnd > > > *scmd, unsigned char *cmnd, > > > > > > scsi_log_send(scmd); > > > scmd->scsi_done = scsi_eh_done; > > > - rtn = shost->hostt->queuecommand(shost, scmd); > > > + mutex_lock(&sdev->state_mutex); > > > + while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0) > > > { > > > + mutex_unlock(&sdev->state_mutex); > > > + SCSI_LOG_ERROR_RECOVERY(5, > > > sdev_printk(KERN_DEBUG, > > > sdev, > > > + "%s: state %d <> %d\n", __func__, sdev- > > > > > > > > sdev_state, > > > > > > + SDEV_QUIESCE)); > > > + delay = min(timeleft, stall_for); > > > + timeleft -= delay; > > > + msleep(jiffies_to_msecs(delay)); > > > + mutex_lock(&sdev->state_mutex); > > > + } > > > > What's the point of this loop? if you eliminate it, you still get > > exactly the same msleep from the stall_for retry processing. > > Hello James, > > The purpose of that loop is to check the SCSI device state every > "stall_for" jiffies and to avoid that more than "timeleft" jiffies is > spent on waiting. I know what the loop does; the the question I was asking is doesn't setting rtn instead of calling ->queuecommand() achieve the same thing? James
On Fri, 2018-03-16 at 15:31 -0700, James Bottomley wrote: > On Fri, 2018-03-16 at 22:21 +0000, Bart Van Assche wrote: > > On Fri, 2018-03-16 at 15:00 -0700, James Bottomley wrote: > > > > > > On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote: > > > > > > > > @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct > > > > scsi_cmnd > > > > *scmd, unsigned char *cmnd, > > > > > > > > scsi_log_send(scmd); > > > > scmd->scsi_done = scsi_eh_done; > > > > - rtn = shost->hostt->queuecommand(shost, scmd); > > > > + mutex_lock(&sdev->state_mutex); > > > > + while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0) > > > > { > > > > + mutex_unlock(&sdev->state_mutex); > > > > + SCSI_LOG_ERROR_RECOVERY(5, > > > > sdev_printk(KERN_DEBUG, > > > > sdev, > > > > + "%s: state %d <> %d\n", __func__, sdev- > > > > > > > > > > sdev_state, > > > > > > > > + SDEV_QUIESCE)); > > > > + delay = min(timeleft, stall_for); > > > > + timeleft -= delay; > > > > + msleep(jiffies_to_msecs(delay)); > > > > + mutex_lock(&sdev->state_mutex); > > > > + } > > > > > > What's the point of this loop? if you eliminate it, you still get > > > exactly the same msleep from the stall_for retry processing. > > > > Hello James, > > > > The purpose of that loop is to check the SCSI device state every > > "stall_for" jiffies and to avoid that more than "timeleft" jiffies is > > spent on waiting. > > I know what the loop does; the the question I was asking is doesn't > setting rtn instead of calling ->queuecommand() achieve the same thing? Sorry but I don't understand that last question. How could setting rtn be an alternative for calling ->queuecommand() since we really want to call ->queuecommand if the SCSI device leaves the quiesced state before the timeout has expired? Can you rephrase that last question? Thanks, Bart.
On Fri, 2018-03-16 at 22:40 +0000, Bart Van Assche wrote: > On Fri, 2018-03-16 at 15:31 -0700, James Bottomley wrote: > > > > On Fri, 2018-03-16 at 22:21 +0000, Bart Van Assche wrote: > > > > > > On Fri, 2018-03-16 at 15:00 -0700, James Bottomley wrote: > > > > > > > > > > > > On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote: > > > > > > > > > > > > > > > @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct > > > > > scsi_cmnd > > > > > *scmd, unsigned char *cmnd, > > > > > > > > > > scsi_log_send(scmd); > > > > > scmd->scsi_done = scsi_eh_done; > > > > > - rtn = shost->hostt->queuecommand(shost, scmd); > > > > > + mutex_lock(&sdev->state_mutex); > > > > > + while (sdev->sdev_state == SDEV_QUIESCE && timeleft > > > > > > 0) > > > > > { > > > > > + mutex_unlock(&sdev->state_mutex); > > > > > + SCSI_LOG_ERROR_RECOVERY(5, > > > > > sdev_printk(KERN_DEBUG, > > > > > sdev, > > > > > + "%s: state %d <> %d\n", __func__, > > > > > sdev- > > > > > > > > > > > > > > > > > > sdev_state, > > > > > > > > > > + SDEV_QUIESCE)); > > > > > + delay = min(timeleft, stall_for); > > > > > + timeleft -= delay; > > > > > + msleep(jiffies_to_msecs(delay)); > > > > > + mutex_lock(&sdev->state_mutex); > > > > > + } > > > > > > > > What's the point of this loop? if you eliminate it, you still > > > > get > > > > exactly the same msleep from the stall_for retry processing. > > > > > > Hello James, > > > > > > The purpose of that loop is to check the SCSI device state every > > > "stall_for" jiffies and to avoid that more than "timeleft" > > > jiffies is > > > spent on waiting. > > > > I know what the loop does; the the question I was asking is doesn't > > setting rtn instead of calling ->queuecommand() achieve the same > > thing? > > Sorry but I don't understand that last question. How could setting > rtn be an alternative for calling ->queuecommand() since we really > want to call ->queuecommand if the SCSI device leaves the quiesced > state before the timeout has expired? Can you rephrase that last > question? In your new code you have + if (sdev->sdev_state != SDEV_QUIESCE) + rtn = shost->hostt->queuecommand(shost, scmd); + else + rtn = SCSI_MLQUEUE_DEVICE_BUSY; That sets rtn instead of calling queuecommand Then you drop through to this code below: if (rtn) { if (timeleft > stall_for) { scsi_eh_restore_cmnd(scmd, &ses); timeleft -= stall_for; msleep(jiffies_to_msecs(stall_for)); goto retry; } Which causes a msleep which is equivalent to the while loop. James
On Fri, 2018-03-16 at 15:49 -0700, James Bottomley wrote: > In your new code you have > > + if (sdev->sdev_state != SDEV_QUIESCE) > + rtn = shost->hostt->queuecommand(shost, scmd); > + else > + rtn = SCSI_MLQUEUE_DEVICE_BUSY; > > That sets rtn instead of calling queuecommand > > Then you drop through to this code below: > > if (rtn) { > if (timeleft > stall_for) { > scsi_eh_restore_cmnd(scmd, &ses); > timeleft -= stall_for; > msleep(jiffies_to_msecs(stall_for)); > goto retry; > } > > Which causes a msleep which is equivalent to the while loop. Hello James, Thanks for the clarification - apparently we were each looking at another part of the code. If the "rtn = SCSI_MLQUEUE_DEVICE_BUSY" statement is executed that means that the if-statement that controls that statement noticed that sdev->sdev_state == SDEV_QUIESCE. Since the while loop above that statement only finishes if either sdev->sdev_state != SDEV_QUIESCE or timeleft <= 0, if the "rtn = SCSI_MLQUEUE_DEVICE_BUSY" statement is executed that implies that timeleft <= 0. Since stall_for > 0, the expression timeleft > stall_for will evaluate to false. In other words, the msleep() shown in your e-mail will be skipped. Bart.
On Fri, 2018-03-16 at 23:10 +0000, Bart Van Assche wrote: > On Fri, 2018-03-16 at 15:49 -0700, James Bottomley wrote: > > > > In your new code you have > > > > + if (sdev->sdev_state != SDEV_QUIESCE) > > + rtn = shost->hostt->queuecommand(shost, scmd); > > + else > > + rtn = SCSI_MLQUEUE_DEVICE_BUSY; > > > > That sets rtn instead of calling queuecommand > > > > Then you drop through to this code below: > > > > if (rtn) { > > if (timeleft > stall_for) { > > scsi_eh_restore_cmnd(scmd, &ses); > > timeleft -= stall_for; > > msleep(jiffies_to_msecs(stall_for)); > > goto retry; > > } > > > > Which causes a msleep which is equivalent to the while loop. > > Hello James, > > Thanks for the clarification - apparently we were each looking at > another part of the code. > > If the "rtn = SCSI_MLQUEUE_DEVICE_BUSY" statement is executed that > means that the if-statement that controls that statement noticed that > sdev->sdev_state == SDEV_QUIESCE. Since the while loop above that > statement only finishes if either sdev->sdev_state != SDEV_QUIESCE or > timeleft <= 0, if the "rtn = SCSI_MLQUEUE_DEVICE_BUSY" statement is > executed that implies that timeleft <= 0. Since stall_for > 0, the > expression timeleft > stall_for will evaluate to false. In other > words, the msleep() shown in your e-mail will be skipped. Not if you remove the while loop from the patch which was the original request. James
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 946039117bf4..71d7d2b893ab 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1039,7 +1039,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, struct scsi_device *sdev = scmd->device; struct Scsi_Host *shost = sdev->host; DECLARE_COMPLETION_ONSTACK(done); - unsigned long timeleft = timeout; + unsigned long timeleft = timeout, delay; struct scsi_eh_save ses; const unsigned long stall_for = msecs_to_jiffies(100); int rtn; @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, scsi_log_send(scmd); scmd->scsi_done = scsi_eh_done; - rtn = shost->hostt->queuecommand(shost, scmd); + mutex_lock(&sdev->state_mutex); + while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0) { + mutex_unlock(&sdev->state_mutex); + SCSI_LOG_ERROR_RECOVERY(5, sdev_printk(KERN_DEBUG, sdev, + "%s: state %d <> %d\n", __func__, sdev->sdev_state, + SDEV_QUIESCE)); + delay = min(timeleft, stall_for); + timeleft -= delay; + msleep(jiffies_to_msecs(delay)); + mutex_lock(&sdev->state_mutex); + } + if (sdev->sdev_state != SDEV_QUIESCE) + rtn = shost->hostt->queuecommand(shost, scmd); + else + rtn = SCSI_MLQUEUE_DEVICE_BUSY; + mutex_unlock(&sdev->state_mutex); if (rtn) { if (timeleft > stall_for) { scsi_eh_restore_cmnd(scmd, &ses);
Several SCSI transport and LLD drivers surround code that does not tolerate concurrent calls of .queuecommand() with scsi_target_block() / scsi_target_unblock(). These last two functions use blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request queues to prevent concurrent .queuecommand() calls. However, that is not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). Hence surround the .queuecommand() call from the SCSI error handler with code that avoids that .queuecommand() gets called in the quiesced state. Notes: - Converting the .queuecommand() call in scsi_send_eh_cmnd() into code that calls blk_get_request() + blk_execute_rq() is not an option since scsi_send_eh_cmnd() must be able to make forward progress even if all requests are allocated. - Converting the .queuecommand() call in scsi_send_eh_cmnd() into a blk_execute_rq() or blk_mq_requeue_request() call is not an option either because that would require to change every individual function in the I/O path. Each function in the I/O path would have to be modified such that it handles requests received from the block layer core and request received from the SCSI EH differently. Since struct scsi_cmnd is not initialized by the block layer for filesystem requests, it is not possible to determine in scsi_queue_rq() whether or not a request has been submitted by the SCSI EH without modifying the block layer. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> --- Changes compared to v1: - As requested by James, removed the wait queue again that was added to the SCSI device structure. drivers/scsi/scsi_error.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-)