diff mbox

[v2] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device

Message ID 20180316174016.5127-1-bart.vanassche@wdc.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Bart Van Assche March 16, 2018, 5:40 p.m. UTC
Several SCSI transport and LLD drivers surround code that does not
tolerate concurrent calls of .queuecommand() with scsi_target_block() /
scsi_target_unblock(). These last two functions use
blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
queues to prevent concurrent .queuecommand() calls. However, that is
not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
Hence surround the .queuecommand() call from the SCSI error handler with
code that avoids that .queuecommand() gets called in the quiesced state.

Notes:
- Converting the .queuecommand() call in scsi_send_eh_cmnd() into
  code that calls blk_get_request() + blk_execute_rq() is not an
  option since scsi_send_eh_cmnd() must be able to make forward progress
  even if all requests are allocated.
- Converting the .queuecommand() call in scsi_send_eh_cmnd() into a
  blk_execute_rq() or blk_mq_requeue_request() call is not an option either
  because that would require to change every individual function in the I/O
  path. Each function in the I/O path would have to be modified such that it
  handles requests received from the block layer core and request received
  from the SCSI EH differently. Since struct scsi_cmnd is not initialized by
  the block layer for filesystem requests, it is not possible to determine
  in scsi_queue_rq() whether or not a request has been submitted by the
  SCSI EH without modifying the block layer.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---

Changes compared to v1:
- As requested by James, removed the wait queue again that was added to the
  SCSI device structure.

 drivers/scsi/scsi_error.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

James Bottomley March 16, 2018, 10 p.m. UTC | #1
On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote:
> @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd
> *scmd, unsigned char *cmnd,
>  
>  	scsi_log_send(scmd);
>  	scmd->scsi_done = scsi_eh_done;
> -	rtn = shost->hostt->queuecommand(shost, scmd);
> +	mutex_lock(&sdev->state_mutex);
> +	while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0) {
> +		mutex_unlock(&sdev->state_mutex);
> +		SCSI_LOG_ERROR_RECOVERY(5, sdev_printk(KERN_DEBUG,
> sdev,
> +			"%s: state %d <> %d\n", __func__, sdev-
> >sdev_state,
> +			SDEV_QUIESCE));
> +		delay = min(timeleft, stall_for);
> +		timeleft -= delay;
> +		msleep(jiffies_to_msecs(delay));
> +		mutex_lock(&sdev->state_mutex);
> +	}

What's the point of this loop?  if you eliminate it, you still get
exactly the same msleep from the stall_for retry processing.

Plus I really don't think you want to call ->queuecommand() with the
state mutex held.

You don't even need to hold the state mutex to read sdev->state because
the read is atomic and the mutex doesn't mediate anything. The check to
queuecommand race is the same for every consumer.

James
Bart Van Assche March 16, 2018, 10:21 p.m. UTC | #2
On Fri, 2018-03-16 at 15:00 -0700, James Bottomley wrote:
> On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote:

> > @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd

> > *scmd, unsigned char *cmnd,

> >  

> >  	scsi_log_send(scmd);

> >  	scmd->scsi_done = scsi_eh_done;

> > -	rtn = shost->hostt->queuecommand(shost, scmd);

> > +	mutex_lock(&sdev->state_mutex);

> > +	while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0) {

> > +		mutex_unlock(&sdev->state_mutex);

> > +		SCSI_LOG_ERROR_RECOVERY(5, sdev_printk(KERN_DEBUG,

> > sdev,

> > +			"%s: state %d <> %d\n", __func__, sdev-

> > > sdev_state,

> > 

> > +			SDEV_QUIESCE));

> > +		delay = min(timeleft, stall_for);

> > +		timeleft -= delay;

> > +		msleep(jiffies_to_msecs(delay));

> > +		mutex_lock(&sdev->state_mutex);

> > +	}

> 

> What's the point of this loop?  if you eliminate it, you still get

> exactly the same msleep from the stall_for retry processing.


Hello James,

The purpose of that loop is to check the SCSI device state every "stall_for"
jiffies and to avoid that more than "timeleft" jiffies is spent on waiting.

> Plus I really don't think you want to call ->queuecommand() with the

> state mutex held.


Since .queuecommand() is not allowed to sleep I think that holding the
SCSI device state mutex does not introduce the possibility of a new deadlock.
What makes you think that calling ->queuecommand()
with that mutex held
wouldn't be safe?

> You don't even need to hold the state mutex to read sdev->state because

> the read is atomic and the mutex doesn't mediate anything. The check to

> queuecommand race is the same for every consumer.


Since both scsi_device_quiesce() and scsi_device_resume() acquire and release
the SCSI device .state_mutex, obtaining that mutex realizes serialization of
the above code against scsi_device_quiesce() and scsi_device_resume(). If the
above code would not obtain .state_mutex then the following race would be
possible:
* scsi_send_eh_cmnd() sees that .sdev_state == SDEV_QUIESCE and decides that
  it is safe to call .queuecommand().
* Another thread calls scsi_device_quiesce() and removes some of the resources
  needed by .queuecommand().
* scsi_send_eh_cmnd() calls .queuecommand(), resulting in a kernel oops.

Please let me know if you need more information.

Thanks,

Bart.
James Bottomley March 16, 2018, 10:31 p.m. UTC | #3
On Fri, 2018-03-16 at 22:21 +0000, Bart Van Assche wrote:
> On Fri, 2018-03-16 at 15:00 -0700, James Bottomley wrote:
> > 
> > On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote:
> > > 
> > > @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct
> > > scsi_cmnd
> > > *scmd, unsigned char *cmnd,
> > >  
> > >  	scsi_log_send(scmd);
> > >  	scmd->scsi_done = scsi_eh_done;
> > > -	rtn = shost->hostt->queuecommand(shost, scmd);
> > > +	mutex_lock(&sdev->state_mutex);
> > > +	while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0)
> > > {
> > > +		mutex_unlock(&sdev->state_mutex);
> > > +		SCSI_LOG_ERROR_RECOVERY(5,
> > > sdev_printk(KERN_DEBUG,
> > > sdev,
> > > +			"%s: state %d <> %d\n", __func__, sdev-
> > > > 
> > > > sdev_state,
> > > 
> > > +			SDEV_QUIESCE));
> > > +		delay = min(timeleft, stall_for);
> > > +		timeleft -= delay;
> > > +		msleep(jiffies_to_msecs(delay));
> > > +		mutex_lock(&sdev->state_mutex);
> > > +	}
> > 
> > What's the point of this loop?  if you eliminate it, you still get
> > exactly the same msleep from the stall_for retry processing.
> 
> Hello James,
> 
> The purpose of that loop is to check the SCSI device state every
> "stall_for" jiffies and to avoid that more than "timeleft" jiffies is
> spent on waiting.

I know what the loop does; the the question I was asking is doesn't
setting rtn instead of calling ->queuecommand() achieve the same thing?

James
Bart Van Assche March 16, 2018, 10:40 p.m. UTC | #4
On Fri, 2018-03-16 at 15:31 -0700, James Bottomley wrote:
> On Fri, 2018-03-16 at 22:21 +0000, Bart Van Assche wrote:

> > On Fri, 2018-03-16 at 15:00 -0700, James Bottomley wrote:

> > > 

> > > On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote:

> > > > 

> > > > @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct

> > > > scsi_cmnd

> > > > *scmd, unsigned char *cmnd,

> > > >  

> > > >  	scsi_log_send(scmd);

> > > >  	scmd->scsi_done = scsi_eh_done;

> > > > -	rtn = shost->hostt->queuecommand(shost, scmd);

> > > > +	mutex_lock(&sdev->state_mutex);

> > > > +	while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0)

> > > > {

> > > > +		mutex_unlock(&sdev->state_mutex);

> > > > +		SCSI_LOG_ERROR_RECOVERY(5,

> > > > sdev_printk(KERN_DEBUG,

> > > > sdev,

> > > > +			"%s: state %d <> %d\n", __func__, sdev-

> > > > > 

> > > > > sdev_state,

> > > > 

> > > > +			SDEV_QUIESCE));

> > > > +		delay = min(timeleft, stall_for);

> > > > +		timeleft -= delay;

> > > > +		msleep(jiffies_to_msecs(delay));

> > > > +		mutex_lock(&sdev->state_mutex);

> > > > +	}

> > > 

> > > What's the point of this loop?  if you eliminate it, you still get

> > > exactly the same msleep from the stall_for retry processing.

> > 

> > Hello James,

> > 

> > The purpose of that loop is to check the SCSI device state every

> > "stall_for" jiffies and to avoid that more than "timeleft" jiffies is

> > spent on waiting.

> 

> I know what the loop does; the the question I was asking is doesn't

> setting rtn instead of calling ->queuecommand() achieve the same thing?


Sorry but I don't understand that last question. How could setting rtn be
an alternative for calling ->queuecommand() since we really want to call
->queuecommand if the SCSI device leaves the quiesced state before the
timeout has expired? Can you rephrase that last question?

Thanks,

Bart.
James Bottomley March 16, 2018, 10:49 p.m. UTC | #5
On Fri, 2018-03-16 at 22:40 +0000, Bart Van Assche wrote:
> On Fri, 2018-03-16 at 15:31 -0700, James Bottomley wrote:
> > 
> > On Fri, 2018-03-16 at 22:21 +0000, Bart Van Assche wrote:
> > > 
> > > On Fri, 2018-03-16 at 15:00 -0700, James Bottomley wrote:
> > > > 
> > > > 
> > > > On Fri, 2018-03-16 at 10:40 -0700, Bart Van Assche wrote:
> > > > > 
> > > > > 
> > > > > @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct
> > > > > scsi_cmnd
> > > > > *scmd, unsigned char *cmnd,
> > > > >  
> > > > >  	scsi_log_send(scmd);
> > > > >  	scmd->scsi_done = scsi_eh_done;
> > > > > -	rtn = shost->hostt->queuecommand(shost, scmd);
> > > > > +	mutex_lock(&sdev->state_mutex);
> > > > > +	while (sdev->sdev_state == SDEV_QUIESCE && timeleft
> > > > > > 0)
> > > > > {
> > > > > +		mutex_unlock(&sdev->state_mutex);
> > > > > +		SCSI_LOG_ERROR_RECOVERY(5,
> > > > > sdev_printk(KERN_DEBUG,
> > > > > sdev,
> > > > > +			"%s: state %d <> %d\n", __func__,
> > > > > sdev-
> > > > > > 
> > > > > > 
> > > > > > sdev_state,
> > > > > 
> > > > > +			SDEV_QUIESCE));
> > > > > +		delay = min(timeleft, stall_for);
> > > > > +		timeleft -= delay;
> > > > > +		msleep(jiffies_to_msecs(delay));
> > > > > +		mutex_lock(&sdev->state_mutex);
> > > > > +	}
> > > > 
> > > > What's the point of this loop?  if you eliminate it, you still
> > > > get
> > > > exactly the same msleep from the stall_for retry processing.
> > > 
> > > Hello James,
> > > 
> > > The purpose of that loop is to check the SCSI device state every
> > > "stall_for" jiffies and to avoid that more than "timeleft"
> > > jiffies is
> > > spent on waiting.
> > 
> > I know what the loop does; the the question I was asking is doesn't
> > setting rtn instead of calling ->queuecommand() achieve the same
> > thing?
> 
> Sorry but I don't understand that last question. How could setting
> rtn be an alternative for calling ->queuecommand() since we really
> want to call ->queuecommand if the SCSI device leaves the quiesced
> state before the timeout has expired? Can you rephrase that last
> question?

In your new code you have

+       if (sdev->sdev_state != SDEV_QUIESCE)
+               rtn = shost->hostt->queuecommand(shost, scmd);
+       else
+               rtn = SCSI_MLQUEUE_DEVICE_BUSY;

That sets rtn instead of calling queuecommand

Then you drop through to this code below:

	if (rtn) {
		if (timeleft > stall_for) {
			scsi_eh_restore_cmnd(scmd, &ses);
			timeleft -= stall_for;
			msleep(jiffies_to_msecs(stall_for));
			goto retry;
		}

Which causes a msleep which is equivalent to the while loop.

James
Bart Van Assche March 16, 2018, 11:10 p.m. UTC | #6
On Fri, 2018-03-16 at 15:49 -0700, James Bottomley wrote:
> In your new code you have

> 

> +       if (sdev->sdev_state != SDEV_QUIESCE)

> +               rtn = shost->hostt->queuecommand(shost, scmd);

> +       else

> +               rtn = SCSI_MLQUEUE_DEVICE_BUSY;

> 

> That sets rtn instead of calling queuecommand

> 

> Then you drop through to this code below:

> 

> 	if (rtn) {

> 		if (timeleft > stall_for) {

> 			scsi_eh_restore_cmnd(scmd, &ses);

> 			timeleft -= stall_for;

> 			msleep(jiffies_to_msecs(stall_for));

> 			goto retry;

> 		}

> 

> Which causes a msleep which is equivalent to the while loop.


Hello James,

Thanks for the clarification - apparently we were each looking at another
part of the code.

If the "rtn = SCSI_MLQUEUE_DEVICE_BUSY" statement is executed that means that
the if-statement that controls that statement noticed that sdev->sdev_state ==
SDEV_QUIESCE. Since the while loop above that statement only finishes if
either sdev->sdev_state != SDEV_QUIESCE or timeleft <= 0, if the "rtn =
SCSI_MLQUEUE_DEVICE_BUSY" statement is executed that implies that timeleft <=
0. Since stall_for > 0, the expression timeleft > stall_for will evaluate to
false. In other words, the msleep() shown in your e-mail will be skipped.

Bart.
James Bottomley March 17, 2018, 3:07 p.m. UTC | #7
On Fri, 2018-03-16 at 23:10 +0000, Bart Van Assche wrote:
> On Fri, 2018-03-16 at 15:49 -0700, James Bottomley wrote:
> > 
> > In your new code you have
> > 
> > +       if (sdev->sdev_state != SDEV_QUIESCE)
> > +               rtn = shost->hostt->queuecommand(shost, scmd);
> > +       else
> > +               rtn = SCSI_MLQUEUE_DEVICE_BUSY;
> > 
> > That sets rtn instead of calling queuecommand
> > 
> > Then you drop through to this code below:
> > 
> > 	if (rtn) {
> > 		if (timeleft > stall_for) {
> > 			scsi_eh_restore_cmnd(scmd, &ses);
> > 			timeleft -= stall_for;
> > 			msleep(jiffies_to_msecs(stall_for));
> > 			goto retry;
> > 		}
> > 
> > Which causes a msleep which is equivalent to the while loop.
> 
> Hello James,
> 
> Thanks for the clarification - apparently we were each looking at
> another part of the code.
> 
> If the "rtn = SCSI_MLQUEUE_DEVICE_BUSY" statement is executed that
> means that the if-statement that controls that statement noticed that
> sdev->sdev_state == SDEV_QUIESCE. Since the while loop above that
> statement only finishes if either sdev->sdev_state != SDEV_QUIESCE or
> timeleft <= 0, if the "rtn = SCSI_MLQUEUE_DEVICE_BUSY" statement is
> executed that implies that timeleft <= 0. Since stall_for > 0, the
> expression timeleft > stall_for will evaluate to false. In other
> words, the msleep() shown in your e-mail will be skipped.

Not if you remove the while loop from the patch which was the original
request.

James
diff mbox

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 946039117bf4..71d7d2b893ab 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1039,7 +1039,7 @@  static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	struct scsi_device *sdev = scmd->device;
 	struct Scsi_Host *shost = sdev->host;
 	DECLARE_COMPLETION_ONSTACK(done);
-	unsigned long timeleft = timeout;
+	unsigned long timeleft = timeout, delay;
 	struct scsi_eh_save ses;
 	const unsigned long stall_for = msecs_to_jiffies(100);
 	int rtn;
@@ -1050,7 +1050,22 @@  static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
-	rtn = shost->hostt->queuecommand(shost, scmd);
+	mutex_lock(&sdev->state_mutex);
+	while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0) {
+		mutex_unlock(&sdev->state_mutex);
+		SCSI_LOG_ERROR_RECOVERY(5, sdev_printk(KERN_DEBUG, sdev,
+			"%s: state %d <> %d\n", __func__, sdev->sdev_state,
+			SDEV_QUIESCE));
+		delay = min(timeleft, stall_for);
+		timeleft -= delay;
+		msleep(jiffies_to_msecs(delay));
+		mutex_lock(&sdev->state_mutex);
+	}
+	if (sdev->sdev_state != SDEV_QUIESCE)
+		rtn = shost->hostt->queuecommand(shost, scmd);
+	else
+		rtn = SCSI_MLQUEUE_DEVICE_BUSY;
+	mutex_unlock(&sdev->state_mutex);
 	if (rtn) {
 		if (timeleft > stall_for) {
 			scsi_eh_restore_cmnd(scmd, &ses);