diff mbox series

[v12,04/20] scsi: sd: Have scsi-ml retry read_capacity_16 errors

Message ID 20231114013750.76609-5-michael.christie@oracle.com (mailing list archive)
State Changes Requested
Headers show
Series scsi: Allow scsi_execute users to request retries | expand

Commit Message

Mike Christie Nov. 14, 2023, 1:37 a.m. UTC
This has read_capacity_16 have scsi-ml retry errors instead of driving
them itself.

There are 2 behavior changes with this patch:
1. There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs since the block layer waits/retries for us. For possible
memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
retrying will probably not help.
2. For the specific UAs we checked for and retried, we would get
READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
from the main loop's retries. Each UA now gets
READ_CAPACITY_RETRIES_ON_RESET reties, and the other errors get up to 3
retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET
is already 10 and is not based on anything specific like a spec or
device, so the extra 3 we got from the main loop was probably just an
accident and is not going to help.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c | 108 ++++++++++++++++++++++++++++++----------------
 1 file changed, 71 insertions(+), 37 deletions(-)

Comments

John Garry Nov. 16, 2023, 11:39 a.m. UTC | #1
On 14/11/2023 01:37, Mike Christie wrote:

Feel free to pick up:
Reviewed-by: John Garry <john.g.garry@oracle.com>

> +	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
> +				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,

BTW, some might find it a little confusing that we have 
scsi_execute_cmd() retries arg as well as being able to pass 
exec_args.failure, and exec_args.failure may allow us to retry. Could 
this be improved, even in terms of arg or struct member naming/comments?

Thanks,
John

> +				      &exec_args);
>   
> -	} while (the_result && retries);
> +	if (the_result > 0) {
> +		if (media_not_present(sdkp, &sshdr))
> +			return -ENODEV;
> +
> +		sense_valid = scsi_sense_valid(&sshdr);
> +		if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST &&
> +		    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
> +		     sshdr.ascq == 0x00) {
> +			/*
> +			 * Invalid Command Operation Code or Invalid Field in
> +			 * CDB, just retry silently with RC10
> +			 */
> +			return -EINVAL;
> +		}
> +	}
Mike Christie Nov. 16, 2023, 5:15 p.m. UTC | #2
On 11/16/23 5:39 AM, John Garry wrote:
> On 14/11/2023 01:37, Mike Christie wrote:
> 
> Feel free to pick up:
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> 
>> +    the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
>> +                      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
> 
> BTW, some might find it a little confusing that we have scsi_execute_cmd() retries arg as well as being able to pass exec_args.failure, and exec_args.failure may allow us to retry. Could this be improved, even in terms of arg or struct member naming/comments?

Will add some comments.

Martin W, if you are going to ask about this again, the answer is that
in a future patchset, we can kill the reties passed directly into
scsi_execute_cmd.

We could make scsi_decide_disposition's failures an array of
scsi_failure structs that gets checked in scsi_decide_disposition
and scsi_check_passthrough. We would need to add a function callout
to the scsi_failure struct for some of the more complicated failure
handling. That could then be used for some of the other scsi_execute_cmd
cases as well. For the normal/fast path case though we would need
something to avoid function pointers.
Martin Wilck Nov. 16, 2023, 5:57 p.m. UTC | #3
On Thu, 2023-11-16 at 11:15 -0600, Mike Christie wrote:
> On 11/16/23 5:39 AM, John Garry wrote:
> > On 14/11/2023 01:37, Mike Christie wrote:
> > 
> > Feel free to pick up:
> > Reviewed-by: John Garry <john.g.garry@oracle.com>
> > 
> > > +    the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
> > > buffer,
> > > +                      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
> > 
> > BTW, some might find it a little confusing that we have
> > scsi_execute_cmd() retries arg as well as being able to pass
> > exec_args.failure, and exec_args.failure may allow us to retry.
> > Could this be improved, even in terms of arg or struct member
> > naming/comments?
> 
> Will add some comments.
> 
> Martin W, if you are going to ask about this again, the answer is
> that
> in a future patchset, we can kill the reties passed directly into
> scsi_execute_cmd.
> 

Did I ask about this? ... I dimly remember ;-)

> We could make scsi_decide_disposition's failures an array of
> scsi_failure structs that gets checked in scsi_decide_disposition
> and scsi_check_passthrough. We would need to add a function callout
> to the scsi_failure struct for some of the more complicated failure
> handling. That could then be used for some of the other
> scsi_execute_cmd
> cases as well. For the normal/fast path case though we would need
> something to avoid function pointers.

I am not sure if I like this idea. scsi_decide_disposition() is
challenging to read already. I am not convinced that converting it into
a loop over "struct scsi_failure" would make it easier to understand.
I guess we'd need to see how it would look like. 

To my understanding, the two retry counts are currently orthogonal, the
one in scsi_execute_cmd() representing the standard mid-layer
"maybe_retry" case in scsi_decide_disposition, and the the other one
representing the additional "high level" passthrough retry. We need to
be careful when merging them.

Wrt the callout, I'd actually thought about that when looking at your
previous set, but I didn't want to interfere too much. I thought that
using callouts might simplify the scsi_failure handling, and might
actually make the new code easier to read. I can't judge possible
effects on performance.

Cheers,
Martin
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fa00dd503cbf..1af04b01e1df 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2453,55 +2453,89 @@  static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 						unsigned char *buffer)
 {
-	unsigned char cmd[16];
-	struct scsi_sense_hdr sshdr;
-	const struct scsi_exec_args exec_args = {
-		.sshdr = &sshdr,
+	static const u8 cmd[16] = {
+		[0] = SERVICE_ACTION_IN_16,
+		[1] = SAI_READ_CAPACITY_16,
+		[13] = RC16_LEN,
 	};
+	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
+	struct scsi_failure failure_defs[] = {
+		/*
+		 * Do not retry Invalid Command Operation Code or Invalid
+		 * Field in CDB.
+		 */
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x20,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x24,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Do not retry Medium Not Present */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x3A,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = 0x3A,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Device reset might occur several times so retry a lot */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Any other error not listed above retry 3 times */
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+			.allowed = 3,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
+	const struct scsi_exec_args exec_args = {
+		.sshdr = &sshdr,
+		.failures = &failures,
+	};
 
 	if (sdp->no_read_capacity_16)
 		return -EINVAL;
 
-	do {
-		memset(cmd, 0, 16);
-		cmd[0] = SERVICE_ACTION_IN_16;
-		cmd[1] = SAI_READ_CAPACITY_16;
-		cmd[13] = RC16_LEN;
-		memset(buffer, 0, RC16_LEN);
-
-		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
-					      buffer, RC16_LEN, SD_TIMEOUT,
-					      sdkp->max_retries, &exec_args);
-		if (the_result > 0) {
-			if (media_not_present(sdkp, &sshdr))
-				return -ENODEV;
+	memset(buffer, 0, RC16_LEN);
 
-			sense_valid = scsi_sense_valid(&sshdr);
-			if (sense_valid &&
-			    sshdr.sense_key == ILLEGAL_REQUEST &&
-			    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
-			    sshdr.ascq == 0x00)
-				/* Invalid Command Operation Code or
-				 * Invalid Field in CDB, just retry
-				 * silently with RC10 */
-				return -EINVAL;
-			if (sense_valid &&
-			    sshdr.sense_key == UNIT_ATTENTION &&
-			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
-				/* Device reset might occur several times,
-				 * give it one more chance */
-				if (--reset_retries > 0)
-					continue;
-		}
-		retries--;
+	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
+				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
+				      &exec_args);
 
-	} while (the_result && retries);
+	if (the_result > 0) {
+		if (media_not_present(sdkp, &sshdr))
+			return -ENODEV;
+
+		sense_valid = scsi_sense_valid(&sshdr);
+		if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST &&
+		    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
+		     sshdr.ascq == 0x00) {
+			/*
+			 * Invalid Command Operation Code or Invalid Field in
+			 * CDB, just retry silently with RC10
+			 */
+			return -EINVAL;
+		}
+	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);