diff mbox

[2/2] scsi: set timed out out mq requests to complete

Message ID 20180719212618.2406-2-keith.busch@intel.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Keith Busch July 19, 2018, 9:26 p.m. UTC
The scsi block layer requires requests claimed be the error handling be
completed by the error handler. A previous commit allowed completions
to proceed for blk-mq, breaking that assumption.

This patch prevents completions that may race with the timeout handler
by marking the state to complete, restoring the previous behavior.

Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
Signed-off-by: Keith Busch <keith.busch@intel.com>
---

Tested using Jianchao's abort injection to scsi_debug described here:

  https://lore.kernel.org/lkml/a68ad043-26a1-d3d8-2009-504ba4230e0f@oracle.com/

 drivers/scsi/scsi_error.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Johannes Thumshirn July 20, 2018, 6:52 a.m. UTC | #1
On Thu, Jul 19, 2018 at 03:26:18PM -0600, Keith Busch wrote:
> The scsi block layer requires requests claimed be the error handling be
> completed by the error handler. A previous commit allowed completions
> to proceed for blk-mq, breaking that assumption.

Ahm the first sentence is a bit hard to parse, sorry.
Did you mean something like:

"The scsi block later requires requests claimed by the error handling
to be completed by the error handler"?
Keith Busch July 20, 2018, 2:05 p.m. UTC | #2
On Fri, Jul 20, 2018 at 08:52:58AM +0200, Johannes Thumshirn wrote:
> On Thu, Jul 19, 2018 at 03:26:18PM -0600, Keith Busch wrote:
> > The scsi block layer requires requests claimed be the error handling be
> > completed by the error handler. A previous commit allowed completions
> > to proceed for blk-mq, breaking that assumption.
> 
> Ahm the first sentence is a bit hard to parse, sorry.
> Did you mean something like:
> 
> "The scsi block later requires requests claimed by the error handling
> to be completed by the error handler"?

Ugh, sorry. Yes, you are correct on the intended phrasing.
Christoph Hellwig July 20, 2018, 2:41 p.m. UTC | #3
Has this been tested?  Especially with the blk_abort_request corner
case?
Keith Busch July 20, 2018, 2:50 p.m. UTC | #4
On Fri, Jul 20, 2018 at 07:41:46AM -0700, Christoph Hellwig wrote:
> Has this been tested?  Especially with the blk_abort_request corner
> case?

Yes, that's what Jianchao's modification to scsi_debug used.
Bart Van Assche July 20, 2018, 3:12 p.m. UTC | #5
On Thu, 2018-07-19 at 15:26 -0600, Keith Busch wrote:
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 8932ae81a15a..86ee10b2c775 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -286,6 +286,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>  	enum blk_eh_timer_return rtn = BLK_EH_DONE;
>  	struct Scsi_Host *host = scmd->device->host;
>  
> +	if (req->q->mq_ops && blk_mq_mark_complete(req))
> +		return rtn;
> +
>  	trace_scsi_dispatch_cmd_timeout(scmd);
>  	scsi_log_completion(scmd, TIMEOUT_ERROR);
>  
> @@ -300,7 +303,8 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>  			set_host_byte(scmd, DID_TIME_OUT);
>  			scsi_eh_scmd_add(scmd);
>  		}
> -	}
> +	} else if (req->q->mq_ops)
> +		WRITE_ONCE(req->state, MQ_RQ_IN_FLIGHT);
>  
>  	return rtn;
>  }

Modifying the completion state and req->state from the SCSI core are layering
violations. Have you considered to move the above changes into blk_mq_rq_timed_out()?
An additional benefit of that approach is that the req->q->mq_ops checks can be
left out.

Thanks,

Bart.
Keith Busch July 20, 2018, 3:56 p.m. UTC | #6
On Fri, Jul 20, 2018 at 03:12:59PM +0000, Bart Van Assche wrote:
> On Thu, 2018-07-19 at 15:26 -0600, Keith Busch wrote:
> > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > index 8932ae81a15a..86ee10b2c775 100644
> > --- a/drivers/scsi/scsi_error.c
> > +++ b/drivers/scsi/scsi_error.c
> > @@ -286,6 +286,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
> >  	enum blk_eh_timer_return rtn = BLK_EH_DONE;
> >  	struct Scsi_Host *host = scmd->device->host;
> >  
> > +	if (req->q->mq_ops && blk_mq_mark_complete(req))
> > +		return rtn;
> > +
> >  	trace_scsi_dispatch_cmd_timeout(scmd);
> >  	scsi_log_completion(scmd, TIMEOUT_ERROR);
> >  
> > @@ -300,7 +303,8 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
> >  			set_host_byte(scmd, DID_TIME_OUT);
> >  			scsi_eh_scmd_add(scmd);
> >  		}
> > -	}
> > +	} else if (req->q->mq_ops)
> > +		WRITE_ONCE(req->state, MQ_RQ_IN_FLIGHT);
> >  
> >  	return rtn;
> >  }
> 
> Modifying the completion state and req->state from the SCSI core are layering
> violations. Have you considered to move the above changes into blk_mq_rq_timed_out()?
> An additional benefit of that approach is that the req->q->mq_ops checks can be
> left out.

SCSI is the only block driver that wants this behavior. Moving it back
to generic where it used to be breaks other block drivers.
Bart Van Assche July 20, 2018, 4:03 p.m. UTC | #7
On Fri, 2018-07-20 at 09:56 -0600, Keith Busch wrote:
> On Fri, Jul 20, 2018 at 03:12:59PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-07-19 at 15:26 -0600, Keith Busch wrote:
> > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > > index 8932ae81a15a..86ee10b2c775 100644
> > > --- a/drivers/scsi/scsi_error.c
> > > +++ b/drivers/scsi/scsi_error.c
> > > @@ -286,6 +286,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
> > >  	enum blk_eh_timer_return rtn = BLK_EH_DONE;
> > >  	struct Scsi_Host *host = scmd->device->host;
> > >  
> > > +	if (req->q->mq_ops && blk_mq_mark_complete(req))
> > > +		return rtn;
> > > +
> > >  	trace_scsi_dispatch_cmd_timeout(scmd);
> > >  	scsi_log_completion(scmd, TIMEOUT_ERROR);
> > >  
> > > @@ -300,7 +303,8 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
> > >  			set_host_byte(scmd, DID_TIME_OUT);
> > >  			scsi_eh_scmd_add(scmd);
> > >  		}
> > > -	}
> > > +	} else if (req->q->mq_ops)
> > > +		WRITE_ONCE(req->state, MQ_RQ_IN_FLIGHT);
> > >  
> > >  	return rtn;
> > >  }
> > 
> > Modifying the completion state and req->state from the SCSI core are layering
> > violations. Have you considered to move the above changes into blk_mq_rq_timed_out()?
> > An additional benefit of that approach is that the req->q->mq_ops checks can be
> > left out.
> 
> SCSI is the only block driver that wants this behavior. Moving it back
> to generic where it used to be breaks other block drivers.

That's new to me. What would break in the NVMe driver if the above change would be
present in the block layer?

Thanks,

Bart.
Keith Busch July 20, 2018, 4:12 p.m. UTC | #8
On Fri, Jul 20, 2018 at 04:03:18PM +0000, Bart Van Assche wrote:
> On Fri, 2018-07-20 at 09:56 -0600, Keith Busch wrote:
> > SCSI is the only block driver that wants this behavior. Moving it back
> > to generic where it used to be breaks other block drivers.
> 
> That's new to me. What would break in the NVMe driver if the above change would be
> present in the block layer?

This is what causes the block layer to lose completions, and most drivers
don't want the kernel to lose their completions.
Bart Van Assche July 20, 2018, 4:20 p.m. UTC | #9
On Fri, 2018-07-20 at 10:12 -0600, Keith Busch wrote:
> On Fri, Jul 20, 2018 at 04:03:18PM +0000, Bart Van Assche wrote:
> > On Fri, 2018-07-20 at 09:56 -0600, Keith Busch wrote:
> > > SCSI is the only block driver that wants this behavior. Moving it back
> > > to generic where it used to be breaks other block drivers.
> > 
> > That's new to me. What would break in the NVMe driver if the above change would be
> > present in the block layer?
> 
> This is what causes the block layer to lose completions, and most drivers
> don't want the kernel to lose their completions.

Hello Keith,

Have you considered to introduce a fourth state for block layer requests to
avoid that completions that occur while a timeout handler is in progress get
lost? That would avoid that completions get lost not only for the NVMe driver
but also for SCSI drivers. See e.g. the MQ_RQ_TIMED_OUT state in
https://www.mail-archive.com/linux-block@vger.kernel.org/msg22196.html

Thanks,

Bart.
Keith Busch July 20, 2018, 4:23 p.m. UTC | #10
On Fri, Jul 20, 2018 at 04:20:01PM +0000, Bart Van Assche wrote:
> On Fri, 2018-07-20 at 10:12 -0600, Keith Busch wrote:
> > On Fri, Jul 20, 2018 at 04:03:18PM +0000, Bart Van Assche wrote:
> > > On Fri, 2018-07-20 at 09:56 -0600, Keith Busch wrote:
> > > > SCSI is the only block driver that wants this behavior. Moving it back
> > > > to generic where it used to be breaks other block drivers.
> > > 
> > > That's new to me. What would break in the NVMe driver if the above change would be
> > > present in the block layer?
> > 
> > This is what causes the block layer to lose completions, and most drivers
> > don't want the kernel to lose their completions.
> 
> Hello Keith,
> 
> Have you considered to introduce a fourth state for block layer requests to
> avoid that completions that occur while a timeout handler is in progress get
> lost? That would avoid that completions get lost not only for the NVMe driver
> but also for SCSI drivers. See e.g. the MQ_RQ_TIMED_OUT state in
> https://www.mail-archive.com/linux-block@vger.kernel.org/msg22196.html

Yes, I've considered that, and I really want to use it, but scsi may
still reference a freed request in scmd_eh_abort_handler that way.
Bart Van Assche July 20, 2018, 4:45 p.m. UTC | #11
On Fri, 2018-07-20 at 10:23 -0600, Keith Busch wrote:
> On Fri, Jul 20, 2018 at 04:20:01PM +0000, Bart Van Assche wrote:
> > On Fri, 2018-07-20 at 10:12 -0600, Keith Busch wrote:
> > > On Fri, Jul 20, 2018 at 04:03:18PM +0000, Bart Van Assche wrote:
> > > > On Fri, 2018-07-20 at 09:56 -0600, Keith Busch wrote:
> > > > > SCSI is the only block driver that wants this behavior. Moving it back
> > > > > to generic where it used to be breaks other block drivers.
> > > > 
> > > > That's new to me. What would break in the NVMe driver if the above change would be
> > > > present in the block layer?
> > > 
> > > This is what causes the block layer to lose completions, and most drivers
> > > don't want the kernel to lose their completions.
> > 
> > Hello Keith,
> > 
> > Have you considered to introduce a fourth state for block layer requests to
> > avoid that completions that occur while a timeout handler is in progress get
> > lost? That would avoid that completions get lost not only for the NVMe driver
> > but also for SCSI drivers. See e.g. the MQ_RQ_TIMED_OUT state in
> > https://www.mail-archive.com/linux-block@vger.kernel.org/msg22196.html
> 
> Yes, I've considered that, and I really want to use it, but scsi may
> still reference a freed request in scmd_eh_abort_handler that way.

I think that's a misunderstanding. If scsi_times_out() queues an abort
asynchronously then it tells the block layer through its return value that the
SCSI core still owns the request and hence that the block layer should ignore any
completions that occur until the SCSI core calls scsi_finish_command(). That
scsi_finish_command() will trigger a call to __blk_mq_end_request(). The
scsi_times_out() return value I was referring to is called BLK_EH_DONE today and
was called BLK_EH_NOT_HANDLED in kernel version v4.17.

This also means that I got the BLK_EH_NOT_HANDLED case wrong in "blk-mq: Rework
blk-mq timeout handling again": in that case concurrent a blk_mq_complete_request()
call should be ignored instead of triggering request completion.

Bart.
diff mbox

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 8932ae81a15a..86ee10b2c775 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -286,6 +286,9 @@  enum blk_eh_timer_return scsi_times_out(struct request *req)
 	enum blk_eh_timer_return rtn = BLK_EH_DONE;
 	struct Scsi_Host *host = scmd->device->host;
 
+	if (req->q->mq_ops && blk_mq_mark_complete(req))
+		return rtn;
+
 	trace_scsi_dispatch_cmd_timeout(scmd);
 	scsi_log_completion(scmd, TIMEOUT_ERROR);
 
@@ -300,7 +303,8 @@  enum blk_eh_timer_return scsi_times_out(struct request *req)
 			set_host_byte(scmd, DID_TIME_OUT);
 			scsi_eh_scmd_add(scmd);
 		}
-	}
+	} else if (req->q->mq_ops)
+		WRITE_ONCE(req->state, MQ_RQ_IN_FLIGHT);
 
 	return rtn;
 }