
sd: always retry READ CAPACITY for ALUA state transition

Message ID 1508224276-130348-1-git-send-email-hare@suse.de (mailing list archive)
State Changes Requested

Commit Message

Hannes Reinecke Oct. 17, 2017, 7:11 a.m. UTC
During ALUA state transitions the device might return
a sense code 02/04/0a (Logical unit not accessible, asymmetric
access state transition). As this is a transient error
we should just retry the READ CAPACITY call until
the state transition finishes and the correct
capacity can be returned.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/sd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
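
The sense triple named in the commit message (sense key 02, ASC 04, ASCQ 0a) can be tested in isolation. The following is a minimal standalone sketch, not kernel code: struct sense_hdr is a simplified stand-in for the kernel's struct scsi_sense_hdr, and SK_NOT_READY and sense_is_alua_transition() are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct scsi_sense_hdr (assumption:
 * only the three fields the patch inspects are modelled here). */
struct sense_hdr {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Hypothetical name; same value (0x02) as the kernel's NOT_READY. */
#define SK_NOT_READY 0x02

/* Hypothetical helper: true for sense 02/04/0a, i.e. "Logical unit not
 * accessible, asymmetric access state transition". */
static bool sense_is_alua_transition(const struct sense_hdr *s)
{
	return s->sense_key == SK_NOT_READY &&
	       s->asc == 0x04 && s->ascq == 0x0a;
}
```

The patch open-codes the same comparison in both read_capacity_16() and read_capacity_10(); a shared predicate of this kind would be one way to avoid the duplication.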

Comments

James Bottomley Oct. 17, 2017, 1:57 p.m. UTC | #1
On Tue, 2017-10-17 at 09:11 +0200, Hannes Reinecke wrote:
> During ALUA state transitions the device might return
> a sense code 02/04/0a (Logical unit not accessible, asymmetric
> access state transition). As this is a transient error
> we should just retry the READ CAPACITY call until
> the state transition finishes and the correct
> capacity can be returned.

This will lock up the system if some ALUA initiator gets into a state
where it always returns transitioning and never completes, which
doesn't look like the best way to handle problem devices.

I thought after the ALUA transition the LUN gives a unit attention ...
can't you use that some way to trigger the capacity re-read, so do
asynchronous event notification instead of polling forever.

James
Hannes Reinecke Oct. 18, 2017, 5:54 a.m. UTC | #2
On 10/17/2017 03:57 PM, James Bottomley wrote:
> On Tue, 2017-10-17 at 09:11 +0200, Hannes Reinecke wrote:
>> During ALUA state transitions the device might return
>> a sense code 02/04/0a (Logical unit not accessible, asymmetric
>> access state transition). As this is a transient error
>> we should just retry the READ CAPACITY call until
>> the state transition finishes and the correct
>> capacity can be returned.
> 
> This will lock up the system if some ALUA initiator gets into a state
> where it always returns transitioning and never completes, which
> doesn't look like the best way to handle problem devices.
> 
> I thought after the ALUA transition the LUN gives a unit attention ...
> can't you use that some way to trigger the capacity re-read, so do
> asynchronous event notification instead of polling forever.
> 
Hmm.
Will give it a try.

Cheers,

Hannes
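
The lock-up concern raised in comment #1 can be illustrated with a bounded variant of the retry loop. This is only a sketch under stated assumptions and not what either poster proposed: it keeps polling (rather than reacting to a unit attention) but gives the ALUA transition case its own finite budget. All names here (rc_status, rc_issue_fn, read_capacity_bounded, fake_dev, fake_issue) are hypothetical, and the device is simulated rather than driven through the SCSI midlayer.

```c
#include <stddef.h>

/* Hypothetical classification of one READ CAPACITY attempt. */
enum rc_status { RC_OK, RC_ALUA_TRANSITION, RC_ERROR };

/* Hypothetical callback that issues one READ CAPACITY and classifies it. */
typedef enum rc_status (*rc_issue_fn)(void *ctx);

/*
 * Retry with two separate budgets: 'retries' for ordinary failures and
 * 'transition_retries' for ALUA state transitions. A transition does not
 * consume the ordinary budget, but it cannot be retried forever either.
 * Returns 0 on success, -1 once the budgets are exhausted.
 */
static int read_capacity_bounded(rc_issue_fn issue, void *ctx,
				 int retries, int transition_retries)
{
	while (retries > 0) {
		enum rc_status st = issue(ctx);

		if (st == RC_OK)
			return 0;
		if (st == RC_ALUA_TRANSITION && transition_retries > 0) {
			/* Transient condition: retry, but count the attempt. */
			transition_retries--;
			continue;
		}
		retries--;
	}
	return -1;
}

/* Simulated device: reports "transitioning" a fixed number of times. */
struct fake_dev {
	int transitions_left;
	int calls;
};

static enum rc_status fake_issue(void *ctx)
{
	struct fake_dev *dev = ctx;

	dev->calls++;
	if (dev->transitions_left > 0) {
		dev->transitions_left--;
		return RC_ALUA_TRANSITION;
	}
	return RC_OK;
}
```

With a device that finishes its transition after a few attempts the loop succeeds; with one that never leaves the transitioning state it gives up after the transition budget plus the ordinary retries, instead of spinning indefinitely. A real implementation would presumably also sleep between attempts, which this sketch omits.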

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 37daf9a..b4647f5 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2333,6 +2333,11 @@  static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;
 
@@ -2418,6 +2423,11 @@  static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;