diff mbox

scsi: Always retry internal target error

Message ID 1508224263-130302-1-git-send-email-hare@suse.de (mailing list archive)
State Changes Requested
Headers show

Commit Message

Hannes Reinecke Oct. 17, 2017, 7:11 a.m. UTC
EMC Symmetrix is returning 'internal target error' for a variety
of conditions, most of which will be transient.
So we should always retry it, even with failfast set.
Otherwise we'd get spurious path flaps with multipath.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Martin K. Petersen Oct. 18, 2017, 2:58 a.m. UTC | #1
Hannes,

> +		if (!strncmp(scmd->device->vendor, "EMC", 3) &&
> +		    !strncmp(scmd->device->model, "SYMMETRIX", 9) &&
> +		    (sshdr.asc == 0x44) && (sshdr.ascq == 0x0)) {
> +			/*
> +			 * EMC Symmetrix returns 'Internal target failure'
> +			 * for a variety of internal issues, all of which
> +			 * can be recovered by retry.
> +			 */
> +			return ADD_TO_MLQUEUE;
> +		}

It's decidedly awful to have vendor/model-specific triggers in
scsi_error.

What are the drawbacks of just always refiring on AC/0x44/ITF?
Hannes Reinecke Oct. 18, 2017, 6:15 a.m. UTC | #2
On 10/18/2017 04:58 AM, Martin K. Petersen wrote:
> 
> Hannes,
> 
>> +		if (!strncmp(scmd->device->vendor, "EMC", 3) &&
>> +		    !strncmp(scmd->device->model, "SYMMETRIX", 9) &&
>> +		    (sshdr.asc == 0x44) && (sshdr.ascq == 0x0)) {
>> +			/*
>> +			 * EMC Symmetrix returns 'Internal target failure'
>> +			 * for a variety of internal issues, all of which
>> +			 * can be recovered by retry.
>> +			 */
>> +			return ADD_TO_MLQUEUE;
>> +		}
> 
> It's decidedly awful to have vendor/model-specific triggers in
> scsi_error.
> 
> What are the drawbacks of just always refiring on AC/0x44/ITF?
> 
Hmm. 'Internal target failure' is not very descriptive, so it could mean
anything. Hence the rather awkward approach.
But I just checked with the qemu code, and that returns 0x44/0x00 only
if some (internal) call returned with -ENOMEM.
So I guess we're safe to always retry here.

Cheers,

Hannes
Christoph Hellwig Oct. 18, 2017, 6:34 a.m. UTC | #3
On Wed, Oct 18, 2017 at 08:15:46AM +0200, Hannes Reinecke wrote:
> > It's decidedly awful to have vendor/model-specific triggers in
> > scsi_error.
> > 
> > What are the drawbacks of just always refiring on AC/0x44/ITF?
> > 
> Hmm. 'Internal target failure' is not very descriptive, so it could mean
> anything. Hence the rather awkward approach.
> But I just checked with the qemu code, and that returns 0x44/0x00 only
> if some (internal) call returned with -ENOMEM.
> So I guess we're safe to always retry here.

And even if not we should add a quirk so that we can just check a
bit instead of doing two string compares in the I/O completion path..
Hannes Reinecke Oct. 18, 2017, 7:27 a.m. UTC | #4
On 10/18/2017 08:34 AM, Christoph Hellwig wrote:
> On Wed, Oct 18, 2017 at 08:15:46AM +0200, Hannes Reinecke wrote:
>>> It's decidedly awful to have vendor/model-specific triggers in
>>> scsi_error.
>>>
>>> What are the drawbacks of just always refiring on AC/0x44/ITF?
>>>
>> Hmm. 'Internal target failure' is not very descriptive, so it could mean
>> anything. Hence the rather awkward approach.
>> But I just checked with the qemu code, and that returns 0x44/0x00 only
>> if some (internal) call returned with -ENOMEM.
>> So I guess we're safe to always retry here.
> 
> And even if not we should add a quirk so that we can just check a
> bit instead of doing two string compares in the I/O completion path..
> 
Yeah, you are right.
Will be adding a blacklist entry for this.

Cheers,

Hannes
Xose Vazquez Perez Jan. 10, 2018, 4:54 p.m. UTC | #5
> On 10/17/2017 09:11 AM, Hannes Reinecke wrote:
>> EMC Symmetrix is returning 'internal target error' for a variety
>> of conditions, most of which will be transient.
>> So we should always retry it, even with failfast set.
>> Otherwise we'd get spurious path flaps with multipath.
>> 
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>  drivers/scsi/scsi_error.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>> 
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index 5086489..5722c2e 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -497,6 +497,16 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>>  		if (sshdr.asc == 0x10) /* DIF */
>>  			return SUCCESS;
>>  
>> +		if (!strncmp(scmd->device->vendor, "EMC", 3) &&
>> +		    !strncmp(scmd->device->model, "SYMMETRIX", 9) &&
>> +		    (sshdr.asc == 0x44) && (sshdr.ascq == 0x0)) {
>> +			/*
>> +			 * EMC Symmetrix returns 'Internal target failure'
>> +			 * for a variety of internal issues, all of which
>> +			 * can be recovered by retry.
>> +			 */
>> +			return ADD_TO_MLQUEUE;
>> +		}
>>  		return NEEDS_RETRY;
>>  	case NOT_READY:
>>  	case UNIT_ATTENTION:
>> @@ -1846,6 +1856,9 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
>>  		rtn = scsi_check_sense(scmd);
>>  		if (rtn == NEEDS_RETRY)
>>  			goto maybe_retry;
>> +		else if (rtn == ADD_TO_MLQUEUE)
>> +			/* Always enforce a retry for ADD_TO_MLQUEUE */
>> +			rtn = NEEDS_RETRY;
>>  		/* if rtn == FAILED, we have no sense information;
>>  		 * returning FAILED will wake the error handler thread
>>  		 * to collect the sense and redo the decide
>> 

On 10/18/2017 07:27 AM, Hannes Reinecke wrote:
>> Yeah, you are right.
>> Will be adding a blacklist entry for this.


Is there a new patch ongoing?
diff mbox

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5086489..5722c2e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -497,6 +497,16 @@  int scsi_check_sense(struct scsi_cmnd *scmd)
 		if (sshdr.asc == 0x10) /* DIF */
 			return SUCCESS;
 
+		if (!strncmp(scmd->device->vendor, "EMC", 3) &&
+		    !strncmp(scmd->device->model, "SYMMETRIX", 9) &&
+		    (sshdr.asc == 0x44) && (sshdr.ascq == 0x0)) {
+			/*
+			 * EMC Symmetrix returns 'Internal target failure'
+			 * for a variety of internal issues, all of which
+			 * can be recovered by retry.
+			 */
+			return ADD_TO_MLQUEUE;
+		}
 		return NEEDS_RETRY;
 	case NOT_READY:
 	case UNIT_ATTENTION:
@@ -1846,6 +1856,9 @@  int scsi_decide_disposition(struct scsi_cmnd *scmd)
 		rtn = scsi_check_sense(scmd);
 		if (rtn == NEEDS_RETRY)
 			goto maybe_retry;
+		else if (rtn == ADD_TO_MLQUEUE)
+			/* Always enforce a retry for ADD_TO_MLQUEUE */
+			rtn = NEEDS_RETRY;
 		/* if rtn == FAILED, we have no sense information;
 		 * returning FAILED will wake the error handler thread
 		 * to collect the sense and redo the decide