
[2/2] scsi: set timed out out mq requests to complete

Message ID 20180720172444.GH4093@localhost.localdomain (mailing list archive)
State New, archived

Commit Message

Keith Busch July 20, 2018, 5:24 p.m. UTC
On Fri, Jul 20, 2018 at 04:45:05PM +0000, Bart Van Assche wrote:
> I think that's a misunderstanding. If scsi_times_out() queues an abort
> asynchronously then it tells the block layer through its return value that the
> SCSI core still owns the request and hence that the block layer should ignore any
> completions that occur until the SCSI core calls scsi_finish_command(). That
> scsi_finish_command() will trigger a call to __blk_mq_end_request(). The
> scsi_times_out() return value I was referring to is called BLK_EH_DONE today and
> was called BLK_EH_NOT_HANDLED in kernel version v4.17.
> 
> This also means that I got the BLK_EH_NOT_HANDLED case wrong in "blk-mq: Rework
> blk-mq timeout handling again": in that case a concurrent blk_mq_complete_request()
> call should be ignored instead of triggering request completion.

I definitely think it's worth revisiting that for the longer term.

For near term, I don't want scsi error handling broken for 4.18, but also
not revert the changes that fixed all the other drivers. Restoring the
old behavior that scsi wants isolated to the scsi driver seems like the
lowest touch option.

My patch restores the state that scsi had in 4.17. It still has that
gap that may lose requests forever when the scsi LLD always returns
BLK_EH_RESET_TIMER (see virtio-scsi, for example). That gap existed prior,
so that's not new with my patch. Maybe we can fix that with a slight
modification to my previous patch. It looks like SCSI really wants to
block completions only when it hands off the command to the error handler,
so we don't need to have the inflight -> complete -> inflight transition,
and the following is all that's needed:

---
--

Comments

Christoph Hellwig July 23, 2018, 8:12 a.m. UTC | #1
On Fri, Jul 20, 2018 at 11:24:45AM -0600, Keith Busch wrote:
> My patch restores the state that scsi had in 4.17. It still has that
> gap that may lose requests forever when the scsi LLD always returns
> BLK_EH_RESET_TIMER (see virtio-scsi, for example). That gap existed prior,
> so that's not new with my patch. Maybe we can fix that with a slight
> modification to my previous patch. It looks like SCSI really wants to
> block completions only when it hands off the command to the error handler,
> so we don't need to have the inflight -> complete -> inflight transition,
> and the following is all that's needed:

Btw, one thing we should do in blk-mq and scsi is to make the timeout
optional.  If the blk_mq driver doesn't even have a timeout structure
there is no point in timing out requests and entering the timeout handler
at all.  Same for those scsi drivers always returning BLK_EH_RESET_TIMER.

Whether never having timeouts is a good idea is a different discussion,
but as long as we have such drivers we should handle them somewhat sanely.

> ---
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 8932ae81a15a..902c30d3c0ed 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -296,6 +296,8 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>  		rtn = host->hostt->eh_timed_out(scmd);
>  
>  	if (rtn == BLK_EH_DONE) {
> +		if (req->q->mq_ops && blk_mq_mark_complete(req))
> +			return rtn;

This looks pretty sensible to me as a band-aid.  It just needs a very
detailed comment explaining what is going on here.

Bart Van Assche July 23, 2018, 1:59 p.m. UTC | #2
On Mon, 2018-07-23 at 10:12 +0200, hch@lst.de wrote:
> Btw, one thing we should do in blk-mq and scsi is to make the timeout
> optional.  If the blk_mq driver doesn't even have a timeout structure
> there is no point in timing out requests and entering the timeout handler
> at all.

Are there any blk-mq drivers that do not define a timeout handler and that
use shared tags? I think such drivers need periodic calls to blk_mq_tag_idle().
Do you perhaps want these calls to happen from another context?

Thanks,

Bart.

Keith Busch July 23, 2018, 2:04 p.m. UTC | #3
On Mon, Jul 23, 2018 at 10:12:31AM +0200, hch@lst.de wrote:
> > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > index 8932ae81a15a..902c30d3c0ed 100644
> > --- a/drivers/scsi/scsi_error.c
> > +++ b/drivers/scsi/scsi_error.c
> > @@ -296,6 +296,8 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
> >  		rtn = host->hostt->eh_timed_out(scmd);
> >  
> >  	if (rtn == BLK_EH_DONE) {
> > +		if (req->q->mq_ops && blk_mq_mark_complete(req))
> > +			return rtn;
> 
> This looks pretty sensible to me as a band-aid.  It just needs a very
> detailed comment explaining what is going on here.

Sounds good, v2 will be sent shortly.

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 8932ae81a15a..902c30d3c0ed 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -296,6 +296,8 @@  enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_DONE) {
+		if (req->q->mq_ops && blk_mq_mark_complete(req))
+			return rtn;
 		if (scsi_abort_command(scmd) != SUCCESS) {
 			set_host_byte(scmd, DID_TIME_OUT);
 			scsi_eh_scmd_add(scmd);
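
For readers following the thread, the idea of blocking completions once a timed-out command is handed to the error handler can be modelled in user space with a single compare-and-exchange on the request state. The following is only a rough sketch under stated assumptions, not the kernel code: the names model_request, model_claim_request(), model_complete(), model_timeout() and model_demo() are hypothetical, C11 atomics stand in for the kernel's primitives, and the return convention is the model's own and is not claimed to match blk_mq_mark_complete().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request states, loosely modelled on the blk-mq ones. */
enum model_rq_state { MODEL_RQ_IDLE, MODEL_RQ_IN_FLIGHT, MODEL_RQ_COMPLETE };

/* Hypothetical stand-in for a request; only the state matters here. */
struct model_request {
	_Atomic int state;
};

/*
 * Try to move the request from in-flight to complete.  Returns true only
 * for the single caller that performs the transition; every later caller
 * sees a state other than in-flight and gets false.
 */
static bool model_claim_request(struct model_request *rq)
{
	int expected = MODEL_RQ_IN_FLIGHT;

	return atomic_compare_exchange_strong(&rq->state, &expected,
					      MODEL_RQ_COMPLETE);
}

/* Normal completion path, e.g. the LLD reporting a finished command. */
static bool model_complete(struct model_request *rq)
{
	return model_claim_request(rq);
}

/*
 * Timeout path: claim the request before handing it to error handling.
 * Returns true if the error handler now owns the request, false if a
 * normal completion got there first and nothing more needs to be done.
 */
static bool model_timeout(struct model_request *rq)
{
	return model_claim_request(rq);
}

/* Walk through both orderings of the race. */
void model_demo(void)
{
	struct model_request rq;

	/* Timeout wins: a late completion is ignored. */
	atomic_init(&rq.state, MODEL_RQ_IN_FLIGHT);
	assert(model_timeout(&rq));
	assert(!model_complete(&rq));

	/* Completion wins: the timeout path backs off. */
	atomic_store(&rq.state, MODEL_RQ_IN_FLIGHT);
	assert(model_complete(&rq));
	assert(!model_timeout(&rq));
}
```

In this model there is no transition back to in-flight: whichever path claims the request first keeps it, which is the property the discussion above is after.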