diff mbox

target/qla2xxx: Honor max_data_sg_nents I/O transfer limit

Message ID D2174D86.B2810%himanshu.madhani@qlogic.com (mailing list archive)
State New, archived
Headers show

Commit Message

Himanshu Madhani Sept. 10, 2015, 10:16 p.m. UTC
Hi Nic, 

Sorry about the delay in response.

I have tested with RHEL 6.5 and noticed that IO were not able to complete
with 16M and 32M read/write. IO¹s were getting error due to Mid-layer
underflow

With initiator running upstream kernel version 4.2 as well I was seeing
error with Mid-layer reporting under-run.

I made change in the driver to report DID_OK to Mid-layer with residual
count and I was able to see IO completed without any error.
Basically, overriding driver¹s logic of failing an I/O, when residual
makes it an error with scsi_cmnd->underflow set.
In reality, though, with scsi_cmnd->underflow being set, Linux Qlogic
initiator will always see this as an error.
Maybe this is useful in other OSes. We have verified both read and write
data on FC trace and it looks good.

Here¹s diff that was done in qla2xxx driver to report response to
mid-layer as DID_OK.


# git diff
                } else if (lscsi_status != SAM_STAT_TASK_SET_FULL &&
(END)


Please go ahead and add patch to mainline kernel.

Thanks,
-Himanshu




On 9/9/15, 4:44 PM, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:

>Hi Arun & Co,
>
>On Thu, 2015-08-20 at 16:17 -0700, Nicholas A. Bellinger wrote:
>> On Wed, 2015-08-19 at 17:48 -0700, Arun Easi wrote:
>> > Hi nab,
>> > 
>> > On Thu, 13 Aug 2015, 1:45am -0700, Nicholas A. Bellinger wrote:
>> > 
>> > > From: Nicholas Bellinger <nab@linux-iscsi.org>
>> > >
>> > > Hi Arun, Roland & Co,
>> > >
>> > > Based upon the feedback from last week, here is a proper
>> > > patch for target-core to honor a fabric provided SGL limit
>> > > using residual count plus underflow response bit.
>> > >
>> > > Everything appears to be working as expected with tcm-loop
>> > > LUNs with basic I/O and sg_raw to test CDB underflow, but
>> > > still needs to be verified on real qla2xxx hardware over
>> > > the next days for v4.2.0 code.
>> > >
>> > > Please review + test.
>> > 
>> > Changes look good. I could not test the changes, though. I have
>>requested 
>> > internally to test this patch. Himanshu (copied) will get back with
>>the 
>> > test results (Thanks Himanshu).
>> > 
>> 
>> Thanks for the update.  Btw, this patch has been pushed to
>> refs/heads/queue, atop the other recent v4.2-rc fixes here:
>> 
>> 
>>https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commi
>>t/?h=queue
>> 
>> Just an FYI for Himanshu, for testing against Linux hosts that do honor
>> EVPD block-limits, you'll want to include the following small hack to
>> report a larger block-limits value so the new residual handling can
>> actually get invoked.
>> 
>> diff --git a/drivers/target/target_core_spc.c
>>b/drivers/target/target_core_spc.c
>> index 01421e9..f02767b 100644
>> --- a/drivers/target/target_core_spc.c
>> +++ b/drivers/target/target_core_spc.c
>> @@ -523,6 +523,7 @@ spc_emulate_evpd_b0(struct se_cmd *cmd, unsigned
>>char *buf)
>>         if (cmd->se_tfo->max_data_sg_nents) {
>>                 mtl = (cmd->se_tfo->max_data_sg_nents * PAGE_SIZE) /
>>                        dev->dev_attrib.block_size;
>> +               mtl *= 2;
>>         }
>>         put_unaligned_be32(min_not_zero(mtl,
>>dev->dev_attrib.hw_max_sectors), &buf[8]);
>> 
>> Also for testing, I'd recommend lowering tcm_qla2xxx's max_data_sg_nents
>> to something arbitrarily small to exercise the new code.
>> 
>> Craig, have you had a chance to test the new logic on your setup with
>> hosts that don't honor EVPD block-limits..?
>> 
>
>I've not seen any testing feedback on this patch over the last weeks
>from Himanshu or Craig.
>
>So barring any specific objections, I think it's safe enough to include
>in the upcoming v4.3-rc1 PULL request, minus the stable CC'.
>
>Himanshu + Craig, please let me know how your testing is going.
>
>Based upon your feedback we can address any regressions post merge, and
>subsequently get the patch (plus any other bugfixes) picked up for
>stable code.
>
>Thank you,
>
>--nab

Comments

Nicholas A. Bellinger Sept. 11, 2015, 6:36 a.m. UTC | #1
On Thu, 2015-09-10 at 22:16 +0000, Himanshu Madhani wrote:
> Hi Nic, 
> 
> Sorry about the delay in response.
> 
> I have tested with RHEL 6.5 and noticed that IO were not able to complete
> with 16M and 32M read/write. IO¹s were getting error due to Mid-layer
> underflow
> 
> With initiator running upstream kernel version 4.2 as well I was seeing
> error with Mid-layer reporting under-run.
> 
> I made change in the driver to report DID_OK to Mid-layer with residual
> count and I was able to see IO completed without any error.
> Basically, overriding driver¹s logic of failing an I/O, when residual
> makes it an error with scsi_cmnd->underflow set.
> In reality, though, with scsi_cmnd->underflow being set, Linux Qlogic
> initiator will always see this as an error.
> Maybe this is useful in other OSes. We have verified both read and write
> data on FC trace and it looks good.
> 

Great.  Thanks for the update.

> Here¹s diff that was done in qla2xxx driver to report response to
> mid-layer as DID_OK.
> 
> 
> # git diff
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c
> b/drivers/scsi/qla2xxx/qla_isr.c
> index ccf6a7f..fc7b6a2 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -2257,7 +2257,7 @@ qla2x00_status_entry(scsi_qla_host_t *vha, struct
> rsp_que *rsp, void *pkt)
>                                     "detected (0x%x of 0x%x bytes).\n",
>                                     resid, scsi_bufflen(cp));
> 
> -                               res = DID_ERROR << 16;
> +                               res = DID_OK << 16;
>                                 break;
>                         }
>                 } else if (lscsi_status != SAM_STAT_TASK_SET_FULL &&
> (END)
> 
> 
> Please go ahead and add patch to mainline kernel.
> 

This missed the last linux-next build by a day, but with your Tested-by
+ Signed-off-by on this patch, I think it's still safe enough to queue
for v4.3-rc1 code.  (Adding JEJB CC')

qla2xxx: Allow DID_OK status for residual under/overflow completion
https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=46376bdc1e8a77c6d2cc65b77a1e340d4b4c9f1c

The target-pending/for-next PULL request will be going out in the next
48 hours.

Please let me know ASAP if you hit any regressions.

Thank you,

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arun Easi Sept. 11, 2015, 6:56 a.m. UTC | #2
On Thu, 10 Sep 2015, 11:36pm -0700, Nicholas A. Bellinger wrote:

> On Thu, 2015-09-10 at 22:16 +0000, Himanshu Madhani wrote:
>> Hi Nic,
>>
>> Sorry about the delay in response.
>>
>> I have tested with RHEL 6.5 and noticed that IO were not able to complete
>> with 16M and 32M read/write. IO¹s were getting error due to Mid-layer
>> underflow
>>
>> With initiator running upstream kernel version 4.2 as well I was seeing
>> error with Mid-layer reporting under-run.
>>
>> I made change in the driver to report DID_OK to Mid-layer with residual
>> count and I was able to see IO completed without any error.
>> Basically, overriding driver¹s logic of failing an I/O, when residual
>> makes it an error with scsi_cmnd->underflow set.
>> In reality, though, with scsi_cmnd->underflow being set, Linux Qlogic
>> initiator will always see this as an error.
>> Maybe this is useful in other OSes. We have verified both read and write
>> data on FC trace and it looks good.
>>
>
> Great.  Thanks for the update.
>
>> Here¹s diff that was done in qla2xxx driver to report response to
>> mid-layer as DID_OK.
>>
>>
>> # git diff
>> diff --git a/drivers/scsi/qla2xxx/qla_isr.c
>> b/drivers/scsi/qla2xxx/qla_isr.c
>> index ccf6a7f..fc7b6a2 100644
>> --- a/drivers/scsi/qla2xxx/qla_isr.c
>> +++ b/drivers/scsi/qla2xxx/qla_isr.c
>> @@ -2257,7 +2257,7 @@ qla2x00_status_entry(scsi_qla_host_t *vha, struct
>> rsp_que *rsp, void *pkt)
>>                                     "detected (0x%x of 0x%x bytes).\n",
>>                                     resid, scsi_bufflen(cp));
>>
>> -                               res = DID_ERROR << 16;
>> +                               res = DID_OK << 16;
>>                                 break;
>>                         }
>>                 } else if (lscsi_status != SAM_STAT_TASK_SET_FULL &&
>> (END)
>>
>>
>> Please go ahead and add patch to mainline kernel.
>>
>
> This missed the last linux-next build by a day, but with your Tested-by
> + Signed-off-by on this patch, I think it's still safe enough to queue
> for v4.3-rc1 code.  (Adding JEJB CC')
>
> qla2xxx: Allow DID_OK status for residual under/overflow completion
> https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=46376bdc1e8a77c6d2cc65b77a1e340d4b4c9f1c
>
> The target-pending/for-next PULL request will be going out in the next
> 48 hours.
>
> Please let me know ASAP if you hit any regressions.
>

With existing scsi_cmnd->underflow semantics, LLDs are required (I 
believe) to treat the I/O as error if data transferred is less than 
underflow. "underflow", I think, is set to scsi_bufflen() today making any 
non-zero residual cases an error. So the above patch would break that 
semantic. We were not planning to push that change when we made that 
internally for testing.

From scsi_cmnd.h:
--8<-- struct scsi_cmnd --
         unsigned underflow;     /* Return error if less than
                                    this amount is transferred */
--8<--

That said, I think moving the check (of scsi_cmnd->underflow) to upper 
layers make more sense to me. LLDs would continue to set residual, if 
there is a residual. SCSI M/L or sd/st can do the enforcement of 
"scsi_cmnd->underflow" requirement.

Regards,
-Arun
Nicholas A. Bellinger Sept. 11, 2015, 7:07 a.m. UTC | #3
On Thu, 2015-09-10 at 23:56 -0700, Arun Easi wrote:
> On Thu, 10 Sep 2015, 11:36pm -0700, Nicholas A. Bellinger wrote:
> 
> > On Thu, 2015-09-10 at 22:16 +0000, Himanshu Madhani wrote:
> >> Hi Nic,
> >>
> >> Sorry about the delay in response.
> >>
> >> I have tested with RHEL 6.5 and noticed that IO were not able to complete
> >> with 16M and 32M read/write. IO¹s were getting error due to Mid-layer
> >> underflow
> >>
> >> With initiator running upstream kernel version 4.2 as well I was seeing
> >> error with Mid-layer reporting under-run.
> >>
> >> I made change in the driver to report DID_OK to Mid-layer with residual
> >> count and I was able to see IO completed without any error.
> >> Basically, overriding driver¹s logic of failing an I/O, when residual
> >> makes it an error with scsi_cmnd->underflow set.
> >> In reality, though, with scsi_cmnd->underflow being set, Linux Qlogic
> >> initiator will always see this as an error.
> >> Maybe this is useful in other OSes. We have verified both read and write
> >> data on FC trace and it looks good.
> >>
> >
> > Great.  Thanks for the update.
> >
> >> Here¹s diff that was done in qla2xxx driver to report response to
> >> mid-layer as DID_OK.
> >>
> >>
> >> # git diff
> >> diff --git a/drivers/scsi/qla2xxx/qla_isr.c
> >> b/drivers/scsi/qla2xxx/qla_isr.c
> >> index ccf6a7f..fc7b6a2 100644
> >> --- a/drivers/scsi/qla2xxx/qla_isr.c
> >> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> >> @@ -2257,7 +2257,7 @@ qla2x00_status_entry(scsi_qla_host_t *vha, struct
> >> rsp_que *rsp, void *pkt)
> >>                                     "detected (0x%x of 0x%x bytes).\n",
> >>                                     resid, scsi_bufflen(cp));
> >>
> >> -                               res = DID_ERROR << 16;
> >> +                               res = DID_OK << 16;
> >>                                 break;
> >>                         }
> >>                 } else if (lscsi_status != SAM_STAT_TASK_SET_FULL &&
> >> (END)
> >>
> >>
> >> Please go ahead and add patch to mainline kernel.
> >>
> >
> > This missed the last linux-next build by a day, but with your Tested-by
> > + Signed-off-by on this patch, I think it's still safe enough to queue
> > for v4.3-rc1 code.  (Adding JEJB CC')
> >
> > qla2xxx: Allow DID_OK status for residual under/overflow completion
> > https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=46376bdc1e8a77c6d2cc65b77a1e340d4b4c9f1c
> >
> > The target-pending/for-next PULL request will be going out in the next
> > 48 hours.
> >
> > Please let me know ASAP if you hit any regressions.
> >
> 
> With existing scsi_cmnd->underflow semantics, LLDs are required (I 
> believe) to treat the I/O as error if data transferred is less than 
> underflow. "underflow", I think, is set to scsi_bufflen() today making any 
> non-zero residual cases an error. So the above patch would break that 
> semantic. We were not planning to push that change when we made that 
> internally for testing.
> 
> From scsi_cmnd.h:
> --8<-- struct scsi_cmnd --
>          unsigned underflow;     /* Return error if less than
>                                     this amount is transferred */
> --8<--
> 
> That said, I think moving the check (of scsi_cmnd->underflow) to upper 
> layers make more sense to me. LLDs would continue to set residual, if 
> there is a residual. SCSI M/L or sd/st can do the enforcement of 
> "scsi_cmnd->underflow" requirement.
> 

Er, sorry.  I read Himanshu's response wrong.

Dropping the LLD side patch now.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/qla2xxx/qla_isr.c
b/drivers/scsi/qla2xxx/qla_isr.c
index ccf6a7f..fc7b6a2 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -2257,7 +2257,7 @@  qla2x00_status_entry(scsi_qla_host_t *vha, struct
rsp_que *rsp, void *pkt)
                                    "detected (0x%x of 0x%x bytes).\n",
                                    resid, scsi_bufflen(cp));

-                               res = DID_ERROR << 16;
+                               res = DID_OK << 16;
                                break;
                        }