diff mbox series

[v3,09/18] scsi: sd: handle read/write CDL timeout failures

Message ID 20230124190308.127318-10-niklas.cassel@wdc.com (mailing list archive)
State Changes Requested
Headers show
Series Add Command Duration Limits support | expand

Commit Message

Niklas Cassel Jan. 24, 2023, 7:02 p.m. UTC
Commands using a duration limit descriptor that has limit policies set
to a value other than 0x0 may be failed by the device if one of the
limits are exceeded. For such commands, since the failure is the result
of the user duration limit configuration and workload, the commands
should not be retried and terminated immediately. Furthermore, to allow
the user to differentiate these "soft" failures from hard errors due to
hardware problem, a different error code than EIO should be returned.

There are 2 cases to consider:
(1) The failure is due to a limit policy failing the command with a
check condition sense key, that is, any limit policy other than 0xD.
For this case, scsi_check_sense() is modified to detect failures with
the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING
or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING
PROCESSING DUE TO ERROR RECOVERY additional sense code. For these
failures, a SUCCESS disposition is returned so that
scsi_finish_command() is called to terminate the command.

(2) The failure is due to a limit policy set to 0xD, which result in the
command being terminated with a GOOD status, COMPLETED sense key, and
DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case,
the scsi_check_sense() is modified to return a SUCCESS disposition so
that scsi_finish_command() is called to terminate the command.
In addition, scsi_decide_disposition() has to be modified to see if a
command being terminated with GOOD status has sense data.
This is as defined in SCSI Primary Commands - 6 (SPC-6), so all
according to spec, even if GOOD status commands were not checked before.

If scsi_check_sense() detects sense data representing a duration limit,
scsi_check_sense() will set the newly introduced SCSI ML byte
SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in
scsi_noretry_cmd(), so that a command that failed because of a CDL
timeout cannot be retried. The SCSI ML byte is also checked in
scsi_result_to_blk_status() to complete the command request with the
BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME
errors for the failed commands.

Co-developed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c   |  4 ++++
 drivers/scsi/scsi_priv.h  |  1 +
 3 files changed, 51 insertions(+)

Comments

Hannes Reinecke Jan. 27, 2023, 3:34 p.m. UTC | #1
On 1/24/23 20:02, Niklas Cassel wrote:
> Commands using a duration limit descriptor that has limit policies set
> to a value other than 0x0 may be failed by the device if one of the
> limits are exceeded. For such commands, since the failure is the result
> of the user duration limit configuration and workload, the commands
> should not be retried and terminated immediately. Furthermore, to allow
> the user to differentiate these "soft" failures from hard errors due to
> hardware problem, a different error code than EIO should be returned.
> 
> There are 2 cases to consider:
> (1) The failure is due to a limit policy failing the command with a
> check condition sense key, that is, any limit policy other than 0xD.
> For this case, scsi_check_sense() is modified to detect failures with
> the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING
> or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING
> PROCESSING DUE TO ERROR RECOVERY additional sense code. For these
> failures, a SUCCESS disposition is returned so that
> scsi_finish_command() is called to terminate the command.
> 
> (2) The failure is due to a limit policy set to 0xD, which result in the
> command being terminated with a GOOD status, COMPLETED sense key, and
> DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case,
> the scsi_check_sense() is modified to return a SUCCESS disposition so
> that scsi_finish_command() is called to terminate the command.
> In addition, scsi_decide_disposition() has to be modified to see if a
> command being terminated with GOOD status has sense data.
> This is as defined in SCSI Primary Commands - 6 (SPC-6), so all
> according to spec, even if GOOD status commands were not checked before.
> 
> If scsi_check_sense() detects sense data representing a duration limit,
> scsi_check_sense() will set the newly introduced SCSI ML byte
> SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in
> scsi_noretry_cmd(), so that a command that failed because of a CDL
> timeout cannot be retried. The SCSI ML byte is also checked in
> scsi_result_to_blk_status() to complete the command request with the
> BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME
> errors for the failed commands.
> 
> Co-developed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>   drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++
>   drivers/scsi/scsi_lib.c   |  4 ++++
>   drivers/scsi/scsi_priv.h  |  1 +
>   3 files changed, 51 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index cf5ec5f5f4f6..9988539bc348 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status)
>    */
>   enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>   {
> +	struct request *req = scsi_cmd_to_rq(scmd);
>   	struct scsi_device *sdev = scmd->device;
>   	struct scsi_sense_hdr sshdr;
>   
> @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>   		if (sshdr.asc == 0x10) /* DIF */
>   			return SUCCESS;
>   
> +		/*
> +		 * Check aborts due to command duration limit policy:
> +		 * ABORTED COMMAND additional sense code with the
> +		 * COMMAND TIMEOUT BEFORE PROCESSING or
> +		 * COMMAND TIMEOUT DURING PROCESSING or
> +		 * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY
> +		 * additional sense code qualifiers.
> +		 */
> +		if (sshdr.asc == 0x2e &&
> +		    sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) {
> +			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
> +			req->cmd_flags |= REQ_FAILFAST_DEV;
> +			req->rq_flags |= RQF_QUIET;
> +			return SUCCESS;
> +		}
> +
>   		if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
>   			return ADD_TO_MLQUEUE;
>   		if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
> @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>   		}
>   		return SUCCESS;
>   
> +	case COMPLETED:
> +		if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) {
> +			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
> +			req->cmd_flags |= REQ_FAILFAST_DEV;
> +			req->rq_flags |= RQF_QUIET;
> +			return SUCCESS;

You can kill this line, will be done anyway.

> +		}
> +		return SUCCESS;
> +
>   	default:
>   		return SUCCESS;
>   	}
> @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd)
>   	switch (get_status_byte(scmd)) {
>   	case SAM_STAT_GOOD:
>   		scsi_handle_queue_ramp_up(scmd->device);
> +		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
> +			/*
> +			 * If we have sense data, call scsi_check_sense() in
> +			 * order to set the correct SCSI ML byte (if any).
> +			 * No point in checking the return value, since the
> +			 * command has already completed successfully.
> +			 */
> +			scsi_check_sense(scmd);

I am every so slightly nervous here.
We never checked the sense code for GOOD status, so heaven knows if 
there are devices out there which return something here.
And you have checked that we've cleared the sense code before submitting 
(or retrying, even), right?

>   		fallthrough;
>   	case SAM_STAT_COMMAND_TERMINATED:
>   		return SUCCESS;
> @@ -1807,6 +1841,10 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd)
>   		return !!(req->cmd_flags & REQ_FAILFAST_DRIVER);
>   	}
>   
> +	/* Never retry commands aborted due to a duration limit timeout */
> +	if (scsi_ml_byte(scmd->result) == SCSIML_STAT_DL_TIMEOUT)
> +		return true;
> +
>   	if (!scsi_status_is_check_condition(scmd->result))
>   		return false;
>   
> @@ -1966,6 +2004,14 @@ enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *scmd)
>   		if (scmd->cmnd[0] == REPORT_LUNS)
>   			scmd->device->sdev_target->expecting_lun_change = 0;
>   		scsi_handle_queue_ramp_up(scmd->device);
> +		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
> +			/*
> +			 * If we have sense data, call scsi_check_sense() in
> +			 * order to set the correct SCSI ML byte (if any).
> +			 * No point in checking the return value, since the
> +			 * command has already completed successfully.
> +			 */
> +			scsi_check_sense(scmd);
>   		fallthrough;
>   	case SAM_STAT_COMMAND_TERMINATED:
>   		return SUCCESS;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index e1a021dd4da2..406952e72a68 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -600,6 +600,8 @@ static blk_status_t scsi_result_to_blk_status(int result)
>   		return BLK_STS_MEDIUM;
>   	case SCSIML_STAT_TGT_FAILURE:
>   		return BLK_STS_TARGET;
> +	case SCSIML_STAT_DL_TIMEOUT:
> +		return BLK_STS_DURATION_LIMIT;
>   	}
>   
>   	switch (host_byte(result)) {
> @@ -797,6 +799,8 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
>   				blk_stat = BLK_STS_ZONE_OPEN_RESOURCE;
>   			}
>   			break;
> +		case COMPLETED:
> +			fallthrough;
>   		default:
>   			action = ACTION_FAIL;
>   			break;
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 74324fba4281..f42388ecb024 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -27,6 +27,7 @@ enum scsi_ml_status {
>   	SCSIML_STAT_NOSPC		= 0x02,	/* Space allocation on the dev failed */
>   	SCSIML_STAT_MED_ERROR		= 0x03,	/* Medium error */
>   	SCSIML_STAT_TGT_FAILURE		= 0x04,	/* Permanent target failure */
> +	SCSIML_STAT_DL_TIMEOUT		= 0x05, /* Command Duration Limit timeout */
>   };
>   
>   static inline u8 scsi_ml_byte(int result)

Cheers,

Hannes
Damien Le Moal Jan. 28, 2023, 12:06 a.m. UTC | #2
On 1/28/23 00:34, Hannes Reinecke wrote:
> On 1/24/23 20:02, Niklas Cassel wrote:
>> Commands using a duration limit descriptor that has limit policies set
>> to a value other than 0x0 may be failed by the device if one of the
>> limits are exceeded. For such commands, since the failure is the result
>> of the user duration limit configuration and workload, the commands
>> should not be retried and terminated immediately. Furthermore, to allow
>> the user to differentiate these "soft" failures from hard errors due to
>> hardware problem, a different error code than EIO should be returned.
>>
>> There are 2 cases to consider:
>> (1) The failure is due to a limit policy failing the command with a
>> check condition sense key, that is, any limit policy other than 0xD.
>> For this case, scsi_check_sense() is modified to detect failures with
>> the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING
>> or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING
>> PROCESSING DUE TO ERROR RECOVERY additional sense code. For these
>> failures, a SUCCESS disposition is returned so that
>> scsi_finish_command() is called to terminate the command.
>>
>> (2) The failure is due to a limit policy set to 0xD, which result in the
>> command being terminated with a GOOD status, COMPLETED sense key, and
>> DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case,
>> the scsi_check_sense() is modified to return a SUCCESS disposition so
>> that scsi_finish_command() is called to terminate the command.
>> In addition, scsi_decide_disposition() has to be modified to see if a
>> command being terminated with GOOD status has sense data.
>> This is as defined in SCSI Primary Commands - 6 (SPC-6), so all
>> according to spec, even if GOOD status commands were not checked before.
>>
>> If scsi_check_sense() detects sense data representing a duration limit,
>> scsi_check_sense() will set the newly introduced SCSI ML byte
>> SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in
>> scsi_noretry_cmd(), so that a command that failed because of a CDL
>> timeout cannot be retried. The SCSI ML byte is also checked in
>> scsi_result_to_blk_status() to complete the command request with the
>> BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME
>> errors for the failed commands.
>>
>> Co-developed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>> ---
>>   drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++
>>   drivers/scsi/scsi_lib.c   |  4 ++++
>>   drivers/scsi/scsi_priv.h  |  1 +
>>   3 files changed, 51 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index cf5ec5f5f4f6..9988539bc348 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status)
>>    */
>>   enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>>   {
>> +	struct request *req = scsi_cmd_to_rq(scmd);
>>   	struct scsi_device *sdev = scmd->device;
>>   	struct scsi_sense_hdr sshdr;
>>   
>> @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>>   		if (sshdr.asc == 0x10) /* DIF */
>>   			return SUCCESS;
>>   
>> +		/*
>> +		 * Check aborts due to command duration limit policy:
>> +		 * ABORTED COMMAND additional sense code with the
>> +		 * COMMAND TIMEOUT BEFORE PROCESSING or
>> +		 * COMMAND TIMEOUT DURING PROCESSING or
>> +		 * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY
>> +		 * additional sense code qualifiers.
>> +		 */
>> +		if (sshdr.asc == 0x2e &&
>> +		    sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) {
>> +			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
>> +			req->cmd_flags |= REQ_FAILFAST_DEV;
>> +			req->rq_flags |= RQF_QUIET;
>> +			return SUCCESS;
>> +		}
>> +
>>   		if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
>>   			return ADD_TO_MLQUEUE;
>>   		if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
>> @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>>   		}
>>   		return SUCCESS;
>>   
>> +	case COMPLETED:
>> +		if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) {
>> +			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
>> +			req->cmd_flags |= REQ_FAILFAST_DEV;
>> +			req->rq_flags |= RQF_QUIET;
>> +			return SUCCESS;
> 
> You can kill this line, will be done anyway.
> 
>> +		}
>> +		return SUCCESS;
>> +
>>   	default:
>>   		return SUCCESS;
>>   	}
>> @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd)
>>   	switch (get_status_byte(scmd)) {
>>   	case SAM_STAT_GOOD:
>>   		scsi_handle_queue_ramp_up(scmd->device);
>> +		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
>> +			/*
>> +			 * If we have sense data, call scsi_check_sense() in
>> +			 * order to set the correct SCSI ML byte (if any).
>> +			 * No point in checking the return value, since the
>> +			 * command has already completed successfully.
>> +			 */
>> +			scsi_check_sense(scmd);
> 
> I am every so slightly nervous here.
> We never checked the sense code for GOOD status, so heaven knows if 
> there are devices out there which return something here.
> And you have checked that we've cleared the sense code before submitting 
> (or retrying, even), right?

We'll double check that.
Niklas Cassel Feb. 3, 2023, 4:49 p.m. UTC | #3
Hello Hannes,

On Fri, Jan 27, 2023 at 04:34:59PM +0100, Hannes Reinecke wrote:
> > @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
> >   		}
> >   		return SUCCESS;
> > +	case COMPLETED:
> > +		if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) {
> > +			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
> > +			req->cmd_flags |= REQ_FAILFAST_DEV;
> > +			req->rq_flags |= RQF_QUIET;
> > +			return SUCCESS;
> 
> You can kill this line, will be done anyway.

Thanks, we will fix in next revision.

> 
> > +		}
> > +		return SUCCESS;
> > +
> >   	default:
> >   		return SUCCESS;
> >   	}
> > @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd)
> >   	switch (get_status_byte(scmd)) {
> >   	case SAM_STAT_GOOD:
> >   		scsi_handle_queue_ramp_up(scmd->device);
> > +		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
> > +			/*
> > +			 * If we have sense data, call scsi_check_sense() in
> > +			 * order to set the correct SCSI ML byte (if any).
> > +			 * No point in checking the return value, since the
> > +			 * command has already completed successfully.
> > +			 */
> > +			scsi_check_sense(scmd);
> 
> I am every so slightly nervous here.
> We never checked the sense code for GOOD status, so heaven knows if there
> are devices out there which return something here.

Well, right now we have had quite the opposite problem, to even allow
sense data while not being a CHECK_COMMAND, since they have been very
intertwined historically.

That alone makes me seriously doubt that this is going to be a problem.

But if there is a device that returns sense data when it shouldn't,
that is obviously a spec violation, and I assume that we would deal
with it in the same way we usually do, i.e. a device that does not
follow the spec has to be quirked. However, right now, I think that
it is sheer speculation that such a device even exists.

> And you have checked that we've cleared the sense code before submitting (or
> retrying, even), right?

Yes, scsi_queue_rq() does an unconditionall memset(cmd->sense_buffer, ...).

A retried command will call scsi_queue_insert() -> blk_mq_requeue_request(),
which will eventually reach blk_mq_dispatch_rq_list(), which will again lead
to scsi_queue_insert() getting called (which will memset the sense_buffer).

> 
> >   		fallthrough;
> >   	case SAM_STAT_COMMAND_TERMINATED:
> >   		return SUCCESS;
> > @@ -1807,6 +1841,10 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd)
> >   		return !!(req->cmd_flags & REQ_FAILFAST_DRIVER);
> >   	}
> > +	/* Never retry commands aborted due to a duration limit timeout */
> > +	if (scsi_ml_byte(scmd->result) == SCSIML_STAT_DL_TIMEOUT)
> > +		return true;
> > +
> >   	if (!scsi_status_is_check_condition(scmd->result))
> >   		return false;
> > @@ -1966,6 +2004,14 @@ enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *scmd)
> >   		if (scmd->cmnd[0] == REPORT_LUNS)
> >   			scmd->device->sdev_target->expecting_lun_change = 0;
> >   		scsi_handle_queue_ramp_up(scmd->device);
> > +		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
> > +			/*
> > +			 * If we have sense data, call scsi_check_sense() in
> > +			 * order to set the correct SCSI ML byte (if any).
> > +			 * No point in checking the return value, since the
> > +			 * command has already completed successfully.
> > +			 */
> > +			scsi_check_sense(scmd);
> >   		fallthrough;
> >   	case SAM_STAT_COMMAND_TERMINATED:
> >   		return SUCCESS;
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index e1a021dd4da2..406952e72a68 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -600,6 +600,8 @@ static blk_status_t scsi_result_to_blk_status(int result)
> >   		return BLK_STS_MEDIUM;
> >   	case SCSIML_STAT_TGT_FAILURE:
> >   		return BLK_STS_TARGET;
> > +	case SCSIML_STAT_DL_TIMEOUT:
> > +		return BLK_STS_DURATION_LIMIT;
> >   	}
> >   	switch (host_byte(result)) {
> > @@ -797,6 +799,8 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
> >   				blk_stat = BLK_STS_ZONE_OPEN_RESOURCE;
> >   			}
> >   			break;
> > +		case COMPLETED:
> > +			fallthrough;
> >   		default:
> >   			action = ACTION_FAIL;
> >   			break;
> > diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> > index 74324fba4281..f42388ecb024 100644
> > --- a/drivers/scsi/scsi_priv.h
> > +++ b/drivers/scsi/scsi_priv.h
> > @@ -27,6 +27,7 @@ enum scsi_ml_status {
> >   	SCSIML_STAT_NOSPC		= 0x02,	/* Space allocation on the dev failed */
> >   	SCSIML_STAT_MED_ERROR		= 0x03,	/* Medium error */
> >   	SCSIML_STAT_TGT_FAILURE		= 0x04,	/* Permanent target failure */
> > +	SCSIML_STAT_DL_TIMEOUT		= 0x05, /* Command Duration Limit timeout */
> >   };
> >   static inline u8 scsi_ml_byte(int result)
> 
> Cheers,
> 
> Hannes
>
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index cf5ec5f5f4f6..9988539bc348 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -536,6 +536,7 @@  static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status)
  */
 enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 {
+	struct request *req = scsi_cmd_to_rq(scmd);
 	struct scsi_device *sdev = scmd->device;
 	struct scsi_sense_hdr sshdr;
 
@@ -595,6 +596,22 @@  enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		if (sshdr.asc == 0x10) /* DIF */
 			return SUCCESS;
 
+		/*
+		 * Check aborts due to command duration limit policy:
+		 * ABORTED COMMAND additional sense code with the
+		 * COMMAND TIMEOUT BEFORE PROCESSING or
+		 * COMMAND TIMEOUT DURING PROCESSING or
+		 * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY
+		 * additional sense code qualifiers.
+		 */
+		if (sshdr.asc == 0x2e &&
+		    sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) {
+			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
+			req->cmd_flags |= REQ_FAILFAST_DEV;
+			req->rq_flags |= RQF_QUIET;
+			return SUCCESS;
+		}
+
 		if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
 			return ADD_TO_MLQUEUE;
 		if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
@@ -691,6 +708,15 @@  enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		}
 		return SUCCESS;
 
+	case COMPLETED:
+		if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) {
+			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
+			req->cmd_flags |= REQ_FAILFAST_DEV;
+			req->rq_flags |= RQF_QUIET;
+			return SUCCESS;
+		}
+		return SUCCESS;
+
 	default:
 		return SUCCESS;
 	}
@@ -785,6 +811,14 @@  static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd)
 	switch (get_status_byte(scmd)) {
 	case SAM_STAT_GOOD:
 		scsi_handle_queue_ramp_up(scmd->device);
+		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
+			/*
+			 * If we have sense data, call scsi_check_sense() in
+			 * order to set the correct SCSI ML byte (if any).
+			 * No point in checking the return value, since the
+			 * command has already completed successfully.
+			 */
+			scsi_check_sense(scmd);
 		fallthrough;
 	case SAM_STAT_COMMAND_TERMINATED:
 		return SUCCESS;
@@ -1807,6 +1841,10 @@  bool scsi_noretry_cmd(struct scsi_cmnd *scmd)
 		return !!(req->cmd_flags & REQ_FAILFAST_DRIVER);
 	}
 
+	/* Never retry commands aborted due to a duration limit timeout */
+	if (scsi_ml_byte(scmd->result) == SCSIML_STAT_DL_TIMEOUT)
+		return true;
+
 	if (!scsi_status_is_check_condition(scmd->result))
 		return false;
 
@@ -1966,6 +2004,14 @@  enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *scmd)
 		if (scmd->cmnd[0] == REPORT_LUNS)
 			scmd->device->sdev_target->expecting_lun_change = 0;
 		scsi_handle_queue_ramp_up(scmd->device);
+		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
+			/*
+			 * If we have sense data, call scsi_check_sense() in
+			 * order to set the correct SCSI ML byte (if any).
+			 * No point in checking the return value, since the
+			 * command has already completed successfully.
+			 */
+			scsi_check_sense(scmd);
 		fallthrough;
 	case SAM_STAT_COMMAND_TERMINATED:
 		return SUCCESS;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e1a021dd4da2..406952e72a68 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -600,6 +600,8 @@  static blk_status_t scsi_result_to_blk_status(int result)
 		return BLK_STS_MEDIUM;
 	case SCSIML_STAT_TGT_FAILURE:
 		return BLK_STS_TARGET;
+	case SCSIML_STAT_DL_TIMEOUT:
+		return BLK_STS_DURATION_LIMIT;
 	}
 
 	switch (host_byte(result)) {
@@ -797,6 +799,8 @@  static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
 				blk_stat = BLK_STS_ZONE_OPEN_RESOURCE;
 			}
 			break;
+		case COMPLETED:
+			fallthrough;
 		default:
 			action = ACTION_FAIL;
 			break;
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 74324fba4281..f42388ecb024 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -27,6 +27,7 @@  enum scsi_ml_status {
 	SCSIML_STAT_NOSPC		= 0x02,	/* Space allocation on the dev failed */
 	SCSIML_STAT_MED_ERROR		= 0x03,	/* Medium error */
 	SCSIML_STAT_TGT_FAILURE		= 0x04,	/* Permanent target failure */
+	SCSIML_STAT_DL_TIMEOUT		= 0x05, /* Command Duration Limit timeout */
 };
 
 static inline u8 scsi_ml_byte(int result)