[1/1] multipath-tools: Change path checker for IBM IPR devices

Message ID 20140925165743.GA20621@infradead.org (mailing list archive)
State Not Applicable, archived
Delegated to: Mike Snitzer

Commit Message

Christoph Hellwig Sept. 25, 2014, 4:57 p.m. UTC
On Thu, Sep 25, 2014 at 11:47:42AM -0500, Brian King wrote:
> The issue we've run into started when this patch started making its
> way into distros:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e
> 
> That changed the behaviour for user initiated TUR commands. After an ipr
> adapter gets reset, all disk array devices require a start unit command
> to be issued to them before they will accept commands. So, with the SCSI
> EH change, we now end up in a scenario with dual ipr adapters where the
> TUR getting issued from the health checker returns with a Not Ready response
> and since SCSI EH no longer triggers the Start Unit in this scenario,
> the path never recovers.
> 
> The alternative solution would be to change the TUR path checker in multipath-tools
> to issue a Start Unit if it sees a 02/04/02.

Or we could fix up the check introduced by the commit, with something
ala:


> 
> Thanks,
> 
> Brian
> 
> -- 
> Brian King
> Power Linux I/O
> IBM Linux Technology Center
> 
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
---end quoted text---

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
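
For reference, the "02/04/02" in the quoted message is sense key 02h (NOT READY) with ASC/ASCQ 04h/02h (LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED). The following is a minimal user-space sketch of how a checker could recognise that condition in raw sense data. The function name `sense_needs_start_unit` is hypothetical and is not taken from multipath-tools or the kernel; only the byte offsets follow the standard fixed and descriptor sense formats.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of multipath-tools): returns true when the
 * sense buffer reports 02/04/02, i.e. NOT READY, LOGICAL UNIT NOT READY,
 * INITIALIZING COMMAND REQUIRED.
 */
bool sense_needs_start_unit(const uint8_t *sense, size_t len)
{
	uint8_t key, asc, ascq;

	if (sense == NULL || len < 1)
		return false;

	switch (sense[0] & 0x7f) {	/* response code, valid bit masked off */
	case 0x70:
	case 0x71:			/* fixed format sense data */
		if (len < 14)
			return false;
		key = sense[2] & 0x0f;
		asc = sense[12];
		ascq = sense[13];
		break;
	case 0x72:
	case 0x73:			/* descriptor format sense data */
		if (len < 4)
			return false;
		key = sense[1] & 0x0f;
		asc = sense[2];
		ascq = sense[3];
		break;
	default:
		return false;		/* unknown response code */
	}

	return key == 0x02 && asc == 0x04 && ascq == 0x02;
}
```

A path checker following the alternative described above would call such a helper on the sense data returned with the failed TUR and, on a match, issue a START STOP UNIT command before retrying. How that command is sent is outside the scope of this sketch.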

Comments

wenxiong@linux.vnet.ibm.com Sept. 30, 2014, 6:05 p.m. UTC | #1
Quoting Christoph Hellwig <hch@infradead.org>:

> On Thu, Sep 25, 2014 at 11:47:42AM -0500, Brian King wrote:
>> The issue we've run into started when this patch started making its
>> way into distros:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e
>>
>> That changed the behaviour for user initiated TUR commands. After an ipr
>> adapter gets reset, all disk array devices require a start unit command
>> to be issued to them before they will accept commands. So, with the SCSI
>> EH change, we now end up in a scenario with dual ipr adapters where the
>> TUR getting issued from the health checker returns with a Not Ready response
>> and since SCSI EH no longer triggers the Start Unit in this scenario,
>> the path never recovers.
>>
>> The alternative solution would be to change the TUR path checker in multipath-tools
>> to issue a Start Unit if it sees a 02/04/02.
>
> Or we could fix up the check introduced by the commit, with something
> ala:
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index a2c3d3d..7228d9e 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -459,13 +459,18 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
>  	if (! scsi_command_normalize_sense(scmd, &sshdr))
>  		return FAILED;	/* no valid sense data */
>
> -	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
> +	if (scmd->cmnd[0] == TEST_UNIT_READY &&
> +	    scmd->request->cmd_type == REQ_TYPE_FS &&
> +	    scmd->scsi_done != scsi_eh_done) {
>  		/*
>  		 * nasty: for mid-layer issued TURs, we need to return the
>  		 * actual sense data without any recovery attempt.  For eh
> -		 * issued ones, we need to try to recover and interpret
> +		 * issued ones, we need to try to recover and interpret,
> +		 * and for pass through TURs we just need to stay out of the
> +		 * way, so that the device handlers can do the right thing.
>  		 */
>  		return SUCCESS;
> +	}
>
>  	scsi_report_sense(sdev, &sshdr);
>
>

Hi Christoph,

We verified the above patch on our test group's system yesterday and
today. It works fine with their test cases.

Thanks,
Wendy



Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index a2c3d3d..7228d9e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -459,13 +459,18 @@  static int scsi_check_sense(struct scsi_cmnd *scmd)
 	if (! scsi_command_normalize_sense(scmd, &sshdr))
 		return FAILED;	/* no valid sense data */
 
-	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+	if (scmd->cmnd[0] == TEST_UNIT_READY &&
+	    scmd->request->cmd_type == REQ_TYPE_FS &&
+	    scmd->scsi_done != scsi_eh_done) {
 		/*
 		 * nasty: for mid-layer issued TURs, we need to return the
 		 * actual sense data without any recovery attempt.  For eh
-		 * issued ones, we need to try to recover and interpret
+		 * issued ones, we need to try to recover and interpret,
+		 * and for pass through TURs we just need to stay out of the
+		 * way, so that the device handlers can do the right thing.
 		 */
 		return SUCCESS;
+	}
 
 	scsi_report_sense(sdev, &sshdr);
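
As a reading aid, the condition in the hunk above can be modelled outside the kernel. The sketch below is a simplified user-space model, not kernel code: `struct tur_model`, `enum req_kind`, and `tur_skips_sense_handling` are hypothetical stand-ins for `scmd->cmnd[0]`, `scmd->request->cmd_type`, and the `scmd->scsi_done != scsi_eh_done` comparison. It shows only which combinations take the early `return SUCCESS` and which fall through to the sense handling that follows.

```c
#include <stdbool.h>
#include <stdint.h>

#define TUR_OPCODE 0x00	/* TEST UNIT READY opcode */

/* Hypothetical stand-in for the request type tested by the patch. */
enum req_kind {
	TUR_REQ_FS,		/* models cmd_type == REQ_TYPE_FS */
	TUR_REQ_PASSTHROUGH	/* models any other cmd_type */
};

/* Hypothetical model of the fields the patched condition looks at. */
struct tur_model {
	uint8_t opcode;		/* models scmd->cmnd[0] */
	enum req_kind kind;	/* models scmd->request->cmd_type */
	bool issued_by_eh;	/* models scmd->scsi_done == scsi_eh_done */
};

/*
 * Returns true when the patched check would return SUCCESS early,
 * i.e. hand back the sense data without any recovery attempt.
 */
bool tur_skips_sense_handling(const struct tur_model *c)
{
	return c->opcode == TUR_OPCODE &&
	       c->kind == TUR_REQ_FS &&
	       !c->issued_by_eh;
}
```

Under this model, a pass-through TUR or one issued by the error handler no longer matches the early return and continues into the rest of scsi_check_sense().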