
sd: always retry READ CAPACITY for ALUA state transition

Message ID 1430127309-90412-1-git-send-email-hare@suse.de (mailing list archive)
State New, archived

Commit Message

Hannes Reinecke April 27, 2015, 9:35 a.m. UTC
During ALUA state transitions the device might return
a sense code 02/04/0a (Logical unit not accessible, asymmetric
access state transition). As this is a transient error,
we should just retry the READ CAPACITY call until
the state transition finishes and the correct
capacity can be returned.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/sd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
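For reference, 02/04/0a is sense key NOT READY (0x2) with ASC 0x04 and ASCQ 0x0A. The patch tests those fields on the kernel's already-normalized struct scsi_sense_hdr. The standalone sketch below shows roughly how the same three fields come out of raw fixed-format or descriptor-format sense data. The struct, macro and function names are hypothetical, and the decoder is simplified: it does not use the ADDITIONAL SENSE LENGTH field to bound its reads.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 02/04/0A: NOT READY, LOGICAL UNIT NOT ACCESSIBLE,
 * ASYMMETRIC ACCESS STATE TRANSITION */
#define SK_NOT_READY         0x02
#define ASC_LUN_NOT_READY    0x04
#define ASCQ_ALUA_TRANSITION 0x0A

/* Hypothetical stand-in for the kernel's struct scsi_sense_hdr. */
struct sense_hdr {
	uint8_t response_code;
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Extract sense key, ASC and ASCQ from fixed (0x70/0x71) or descriptor
 * (0x72/0x73) format sense data. Returns false if the buffer is too
 * short or uses an unknown response code. Simplified: the ADDITIONAL
 * SENSE LENGTH field is not used to bound the read. */
static bool decode_sense(const uint8_t *buf, size_t len, struct sense_hdr *hdr)
{
	if (len == 0)
		return false;
	hdr->response_code = buf[0] & 0x7f;
	switch (hdr->response_code) {
	case 0x70:
	case 0x71:	/* fixed format */
		if (len < 14)
			return false;
		hdr->sense_key = buf[2] & 0x0f;
		hdr->asc = buf[12];
		hdr->ascq = buf[13];
		return true;
	case 0x72:
	case 0x73:	/* descriptor format */
		if (len < 4)
			return false;
		hdr->sense_key = buf[1] & 0x0f;
		hdr->asc = buf[2];
		hdr->ascq = buf[3];
		return true;
	default:
		return false;
	}
}

/* Same test as the patch: sense key NOT READY with ASC/ASCQ 04/0A. */
static bool is_alua_transition(const struct sense_hdr *hdr)
{
	return hdr->sense_key == SK_NOT_READY &&
	       hdr->asc == ASC_LUN_NOT_READY &&
	       hdr->ascq == ASCQ_ALUA_TRANSITION;
}
```

A descriptor-format buffer starting 0x72, 0x02, 0x04, 0x0a decodes as an ALUA transition, while 0x72, 0x02, 0x04, 0x01 (logical unit becoming ready) does not.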

Comments

James Bottomley April 28, 2015, 9:18 p.m. UTC | #1
On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
> During ALUA state transitions the device might return
> a sense code 02/04/0a (Logical unit not accessible, asymmetric
> access state transition). As this is a transient error
> we should just retry the READ CAPACITY call until
> the state transition finishes and the correct
> capacity can be returned.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/sd.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 79beebf..7178b05 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  				 * give it one more chance */
>  				if (--reset_retries > 0)
>  					continue;
> +			if (sense_valid &&
> +			    sshdr.sense_key == NOT_READY &&
> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> +				/* ALUA state transition; always retry */
> +				continue;
>  		}
>  		retries--;
>  
> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  				 * give it one more chance */
>  				if (--reset_retries > 0)
>  					continue;
> +			if (sense_valid &&
> +			    sshdr.sense_key == NOT_READY &&
> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> +				/* ALUA state transition; always retry */
> +				continue;
>  		}
>  		retries--;
>  

Got to say I really don't like this infinite retry possibility.  How
long does the ALUA transition take?  Would increasing retries work (or
even hijacking reset_retries)?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
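James's alternative, bounding the ALUA retries with their own counter in the same way reset_retries bounds the device-reset case in the quoted hunk, might look roughly like this. It is a userspace model, not sd.c: the enum, the ALUA_TRANSITION_RETRIES budget of 60 and the fake target are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcome of one READ CAPACITY attempt. */
enum rc_result { RC_OK, RC_ALUA_TRANSITION, RC_OTHER_ERROR };

/* Hypothetical budget for 02/04/0A responses, kept separate from
 * 'retries' in the same way reset_retries is in sd.c. */
#define ALUA_TRANSITION_RETRIES 60

/* Call issue() until it succeeds or 'retries' runs out. ALUA
 * transition responses do not consume 'retries' until the separate
 * ALUA budget is exhausted, so the loop always terminates. */
static bool read_capacity_bounded(enum rc_result (*issue)(void *ctx),
				  void *ctx, int retries)
{
	int alua_retries = ALUA_TRANSITION_RETRIES;

	while (retries > 0) {
		enum rc_result r = issue(ctx);

		if (r == RC_OK)
			return true;
		if (r == RC_ALUA_TRANSITION && --alua_retries > 0)
			continue;	/* transient: retry without charging 'retries' */
		retries--;
	}
	return false;
}

/* Test double: reports a transition 'transitions_left' times, then succeeds. */
struct fake_target {
	int transitions_left;
	int calls;
};

static enum rc_result fake_issue(void *ctx)
{
	struct fake_target *t = ctx;

	t->calls++;
	if (t->transitions_left > 0) {
		t->transitions_left--;
		return RC_ALUA_TRANSITION;
	}
	return RC_OK;
}
```

With 10 transition responses followed by success, the loop returns true after 11 calls. With a target that never leaves the transition state and retries = 3, it gives up after 62 calls instead of looping forever.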
Hannes Reinecke April 30, 2015, 12:26 p.m. UTC | #2
On 04/28/2015 11:18 PM, James Bottomley wrote:
> Got to say I really don't like this infinite retry possibility.  How
> long does the ALUA transition take?  Would increasing retries work (or
> even hijacking reset_retries)?
> 
Well ... transitioning could be quite long (NetApp FAS has a
transition timeout of 30 _minutes_ ...).
But yeah, I could look into limiting this somewhat.

Cheers,

Hannes
Martin George May 1, 2015, 12:39 p.m. UTC | #3
On 4/30/2015 5:56 PM, Hannes Reinecke wrote:
> On 04/28/2015 11:18 PM, James Bottomley wrote:
>> Got to say I really don't like this infinite retry possibility.  How
>> long does the ALUA transition take?  Would increasing retries work (or
>> even hijacking reset_retries)?
>>
> Well ... transitioning could be quite long (NetApp FAS has a
> transition timeout of 30 _minutes_ ...).

Well, actually, NetApp FAS has a transition timeout of 2 minutes, not
30 minutes, as reported in the IMPLICIT TRANSITION TIME field of the
extended REPORT TARGET PORT GROUPS (RTPG) data.

-Martin
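The value Martin refers to comes from the extended header of the REPORT TARGET PORT GROUPS parameter data. A minimal sketch of reading it, assuming the SPC-4 layout: FORMAT TYPE in bits 6-4 of byte 4 (001b meaning extended header) and IMPLICIT TRANSITION TIME in byte 5, in seconds, with 0 meaning unspecified. The constant names and the 60-second fallback are assumptions for illustration. In this layout the field is a single byte, so it can report at most 255 seconds.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed SPC-4 extended-header layout of REPORT TARGET PORT GROUPS data:
 * bytes 0-3 RETURN DATA LENGTH, byte 4 bits 6-4 FORMAT TYPE
 * (001b = extended header), byte 5 IMPLICIT TRANSITION TIME (seconds). */
#define RTPG_FORMAT_MASK      0x70
#define RTPG_FORMAT_EXTENDED  0x10
#define DEFAULT_TRANSITION_S  60	/* hypothetical fallback */

/* Return the implicit transition timeout in seconds, falling back to
 * DEFAULT_TRANSITION_S when the data has the length-only header or the
 * target reports 0 (unspecified). */
static unsigned int rtpg_transition_timeout(const uint8_t *buf, size_t len)
{
	if (len < 8 || (buf[4] & RTPG_FORMAT_MASK) != RTPG_FORMAT_EXTENDED)
		return DEFAULT_TRANSITION_S;
	return buf[5] ? buf[5] : DEFAULT_TRANSITION_S;
}
```

With the length-only header, byte 4 is the first target port group descriptor, whose bits 6-4 are reserved, so the format check falls through to the default.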
James Bottomley May 1, 2015, 1:22 p.m. UTC | #4
On Thu, 2015-04-30 at 14:26 +0200, Hannes Reinecke wrote:
> On 04/28/2015 11:18 PM, James Bottomley wrote:
> > Got to say I really don't like this infinite retry possibility.  How
> > long does the ALUA transition take?  Would increasing retries work (or
> > even hijacking reset_retries)?
> > 
> Well ... transitioning could be quite long (NetApp FAS has a
> transition timeout of 30 _minutes_ ...).
> But yeah, I could look into limiting this somewhat.

I think that might be a good idea.  We can't hold this device (and the
corresponding asynchronous probe thread) in a continuous loop for 30
minutes ...

James
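One way to address the concern about holding the probe thread is to bound the retries by time rather than by count, for example using the implicit transition timeout reported by the target. The sketch below puts the command submission, the clock and the delay behind hypothetical hooks so the policy can be exercised without hardware; in-kernel code would more likely use jiffies with time_before() and msleep(). It only models the ALUA case and treats any other error as final, which is a simplification.

```c
#include <stdbool.h>

/* Hypothetical outcome of one command attempt. */
enum attempt { ATTEMPT_OK, ATTEMPT_ALUA_TRANSITION, ATTEMPT_FAILED };

/* Hypothetical hooks, so the policy can run without hardware. */
struct retry_ops {
	enum attempt (*issue)(void *ctx);
	double (*now_s)(void *ctx);		/* monotonic time in seconds */
	void (*wait_s)(void *ctx, double s);	/* sleep between attempts */
};

/* Retry while the target reports an ALUA transition, but stop once
 * 'timeout_s' seconds have elapsed. Any other error fails immediately.
 * Returns true if an attempt succeeds. */
static bool retry_until_deadline(const struct retry_ops *ops, void *ctx,
				 double timeout_s, double delay_s)
{
	double deadline = ops->now_s(ctx) + timeout_s;

	for (;;) {
		enum attempt a = ops->issue(ctx);

		if (a == ATTEMPT_OK)
			return true;
		if (a != ATTEMPT_ALUA_TRANSITION || ops->now_s(ctx) >= deadline)
			return false;
		ops->wait_s(ctx, delay_s);
	}
}

/* Test double: a simulated clock plus a target that reports a
 * transition 'transitions_left' times before succeeding. */
struct fake_env {
	double t;
	int transitions_left;
	int calls;
};

static enum attempt fake_issue(void *ctx)
{
	struct fake_env *e = ctx;

	e->calls++;
	if (e->transitions_left > 0) {
		e->transitions_left--;
		return ATTEMPT_ALUA_TRANSITION;
	}
	return ATTEMPT_OK;
}

static double fake_now(void *ctx)
{
	return ((struct fake_env *)ctx)->t;
}

static void fake_wait(void *ctx, double s)
{
	((struct fake_env *)ctx)->t += s;
}

static const struct retry_ops fake_ops = { fake_issue, fake_now, fake_wait };
```

With a 120-second deadline and a 1-second delay (example values only), a simulated target that never leaves the transition state is abandoned after 121 attempts, while one that finishes after 10 transition responses succeeds on the 11th.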



Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 79beebf..7178b05 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;
 
@@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;