mbox series

[0/4] scsi: qla2xxx: Bug fixes

Message ID 20190912003919.8488-1-r.bolshakov@yadro.com (mailing list archive)
Headers show
Series scsi: qla2xxx: Bug fixes | expand

Message

Roman Bolshakov Sept. 12, 2019, 12:39 a.m. UTC
Hi Martin,

This series has a few bug fixes for the driver.

Note, #1 only fixes the crash in the kernel. The complete fix for clean
ACL deletion from initiator side is in works and requires a discussion.

As of now initiator is not aware that target no longer wants talking to
it, that implies unneeded timeout. It might be fixed by making LOGO
explicit on session deletion but it's an issue I want to raise first
before making the change. Whether we need implicit LOGO in qla2xxx,
explicit or use both.

Also, an unsolicited ABTS from a port without session would still result
in BA_RJT response instead of frame discard and LOGO ELS, as specified
in FCP (12.3.3 Target FCP_Port response to Exchange termination):

   When an ABTS-LS is received at the target FCP_Port, it shall abort
   the designated Exchange and return one of the following responses:

   a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
      ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
      (i.e., no N_Port Login exists);

FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
requests, exchange or sequence errors. IIRC, some initiators requeue
SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
prevent a perceived session freeze on the initiators.

Thanks,
Roman

Roman Bolshakov (4):
  scsi: qla2xxx: Ignore NULL pointer in tcm_qla2xxx_free_mcmd
  scsi: qla2xxx: Initialize free_work before flushing it
  scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
  scsi: qla2xxx: Change discovery state before PLOGI

 drivers/scsi/qla2xxx/qla_init.c    | 2 ++
 drivers/scsi/qla2xxx/qla_target.c  | 2 --
 drivers/scsi/qla2xxx/tcm_qla2xxx.c | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

Comments

Bart Van Assche Sept. 12, 2019, 5:37 a.m. UTC | #1
On 9/12/19 1:39 AM, Roman Bolshakov wrote:
> This series has a few bug fixes for the driver.
> 
> Note, #1 only fixes the crash in the kernel. The complete fix for clean
> ACL deletion from initiator side is in works and requires a discussion.
> 
> As of now initiator is not aware that target no longer wants talking to
> it, that implies unneeded timeout. It might be fixed by making LOGO
> explicit on session deletion but it's an issue I want to raise first
> before making the change. Whether we need implicit LOGO in qla2xxx,
> explicit or use both.
> 
> Also, an unsolicited ABTS from a port without session would still result
> in BA_RJT response instead of frame discard and LOGO ELS, as specified
> in FCP (12.3.3 Target FCP_Port response to Exchange termination):
> 
>    When an ABTS-LS is received at the target FCP_Port, it shall abort
>    the designated Exchange and return one of the following responses:
> 
>    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
>       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
>       (i.e., no N_Port Login exists);
> 
> FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
> RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
> requests, exchange or sequence errors. IIRC, some initiators requeue
> SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
> prevent a perceived session freeze on the initiators.

Hi Roman,

Has this patch series been prepared against Linus' master branch,
against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
branch? I'm asking this because some patches in this series look similar
to patches that are already present in the 5.4/scsi-queue branch.

Thanks,

Bart.
Roman Bolshakov Sept. 12, 2019, 1:49 p.m. UTC | #2
On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
> On 9/12/19 1:39 AM, Roman Bolshakov wrote:
> > This series has a few bug fixes for the driver.
> > 
> > Note, #1 only fixes the crash in the kernel. The complete fix for clean
> > ACL deletion from initiator side is in works and requires a discussion.
> > 
> > As of now initiator is not aware that target no longer wants talking to
> > it, that implies unneeded timeout. It might be fixed by making LOGO
> > explicit on session deletion but it's an issue I want to raise first
> > before making the change. Whether we need implicit LOGO in qla2xxx,
> > explicit or use both.
> > 
> > Also, an unsolicited ABTS from a port without session would still result
> > in BA_RJT response instead of frame discard and LOGO ELS, as specified
> > in FCP (12.3.3 Target FCP_Port response to Exchange termination):
> > 
> >    When an ABTS-LS is received at the target FCP_Port, it shall abort
> >    the designated Exchange and return one of the following responses:
> > 
> >    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
> >       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
> >       (i.e., no N_Port Login exists);
> > 
> > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
> > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
> > requests, exchange or sequence errors. IIRC, some initiators requeue
> > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
> > prevent a perceived session freeze on the initiators.
> 
> Hi Roman,
> 
> Has this patch series been prepared against Linus' master branch,
> against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
> branch? I'm asking this because some patches in this series look similar
> to patches that are already present in the 5.4/scsi-queue branch.
> 
> Thanks,
> 
> Bart.
> 

Hi Bart,

To be honest it was prepared against next-20190904 but it applies to
5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
are related to stuck PRLI and unhandled RSCN while #4 is related to
stuck PLOGI after qla_post_els_plogi_work.

Thank you,
Roman
Himanshu Madhani Sept. 12, 2019, 1:53 p.m. UTC | #3
Adding Correct Quinn. Please use "qutran@mavell.com"

We'll take a look at the series

On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote:

    On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
    > On 9/12/19 1:39 AM, Roman Bolshakov wrote:
    > > This series has a few bug fixes for the driver.
    > > 
    > > Note, #1 only fixes the crash in the kernel. The complete fix for clean
    > > ACL deletion from initiator side is in works and requires a discussion.
    > > 
    > > As of now initiator is not aware that target no longer wants talking to
    > > it, that implies unneeded timeout. It might be fixed by making LOGO
    > > explicit on session deletion but it's an issue I want to raise first
    > > before making the change. Whether we need implicit LOGO in qla2xxx,
    > > explicit or use both.
    > > 
    > > Also, an unsolicited ABTS from a port without session would still result
    > > in BA_RJT response instead of frame discard and LOGO ELS, as specified
    > > in FCP (12.3.3 Target FCP_Port response to Exchange termination):
    > > 
    > >    When an ABTS-LS is received at the target FCP_Port, it shall abort
    > >    the designated Exchange and return one of the following responses:
    > > 
    > >    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
    > >       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
    > >       (i.e., no N_Port Login exists);
    > > 
    > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
    > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
    > > requests, exchange or sequence errors. IIRC, some initiators requeue
    > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
    > > prevent a perceived session freeze on the initiators.
    > 
    > Hi Roman,
    > 
    > Has this patch series been prepared against Linus' master branch,
    > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
    > branch? I'm asking this because some patches in this series look similar
    > to patches that are already present in the 5.4/scsi-queue branch.
    > 
    > Thanks,
    > 
    > Bart.
    > 
    
    Hi Bart,
    
    To be honest it was prepared against next-20190904 but it applies to
    5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
    are related to stuck PRLI and unhandled RSCN while #4 is related to
    stuck PLOGI after qla_post_els_plogi_work.
    
    Thank you,
    Roman
Roman Bolshakov Nov. 7, 2019, 7 p.m. UTC | #4
Hi Himanshu,

Could you please take a look at the series and anwser if we should stop
doing BA_RJT as a response on ABTS when there's no session?

Thank you,
Roman

On Thu, Sep 12, 2019 at 01:53:03PM +0000, Himanshu Madhani wrote:
> Adding Correct Quinn. Please use "qutran@mavell.com"
> 
> We'll take a look at the series
> 
> On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote:
> 
>     On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
>     > On 9/12/19 1:39 AM, Roman Bolshakov wrote:
>     > > This series has a few bug fixes for the driver.
>     > > 
>     > > Note, #1 only fixes the crash in the kernel. The complete fix for clean
>     > > ACL deletion from initiator side is in works and requires a discussion.
>     > > 
>     > > As of now initiator is not aware that target no longer wants talking to
>     > > it, that implies unneeded timeout. It might be fixed by making LOGO
>     > > explicit on session deletion but it's an issue I want to raise first
>     > > before making the change. Whether we need implicit LOGO in qla2xxx,
>     > > explicit or use both.
>     > > 
>     > > Also, an unsolicited ABTS from a port without session would still result
>     > > in BA_RJT response instead of frame discard and LOGO ELS, as specified
>     > > in FCP (12.3.3 Target FCP_Port response to Exchange termination):
>     > > 
>     > >    When an ABTS-LS is received at the target FCP_Port, it shall abort
>     > >    the designated Exchange and return one of the following responses:
>     > > 
>     > >    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
>     > >       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
>     > >       (i.e., no N_Port Login exists);
>     > > 
>     > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
>     > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
>     > > requests, exchange or sequence errors. IIRC, some initiators requeue
>     > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
>     > > prevent a perceived session freeze on the initiators.
>     > 
>     > Hi Roman,
>     > 
>     > Has this patch series been prepared against Linus' master branch,
>     > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
>     > branch? I'm asking this because some patches in this series look similar
>     > to patches that are already present in the 5.4/scsi-queue branch.
>     > 
>     > Thanks,
>     > 
>     > Bart.
>     > 
>     
>     Hi Bart,
>     
>     To be honest it was prepared against next-20190904 but it applies to
>     5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
>     are related to stuck PRLI and unhandled RSCN while #4 is related to
>     stuck PLOGI after qla_post_els_plogi_work.
>     
>     Thank you,
>     Roman
>     
>
Roman Bolshakov Nov. 13, 2019, 6:54 p.m. UTC | #5
Hi Himanshu,

Could you please answer the questions below?
And if we should start doing explicit LOGO on session shutdown?

Thank you,
Roman

On Thu, Nov 07, 2019 at 10:00:32PM +0300, Roman Bolshakov wrote:
> Hi Himanshu,
> 
> Could you please take a look at the series and anwser if we should stop
> doing BA_RJT as a response on ABTS when there's no session?
> 
> Thank you,
> Roman
> 
> On Thu, Sep 12, 2019 at 01:53:03PM +0000, Himanshu Madhani wrote:
> > Adding Correct Quinn. Please use "qutran@mavell.com"
> > 
> > We'll take a look at the series
> > 
> > On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote:
> > 
> >     On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
> >     > On 9/12/19 1:39 AM, Roman Bolshakov wrote:
> >     > > This series has a few bug fixes for the driver.
> >     > > 
> >     > > Note, #1 only fixes the crash in the kernel. The complete fix for clean
> >     > > ACL deletion from initiator side is in works and requires a discussion.
> >     > > 
> >     > > As of now initiator is not aware that target no longer wants talking to
> >     > > it, that implies unneeded timeout. It might be fixed by making LOGO
> >     > > explicit on session deletion but it's an issue I want to raise first
> >     > > before making the change. Whether we need implicit LOGO in qla2xxx,
> >     > > explicit or use both.
> >     > > 
> >     > > Also, an unsolicited ABTS from a port without session would still result
> >     > > in BA_RJT response instead of frame discard and LOGO ELS, as specified
> >     > > in FCP (12.3.3 Target FCP_Port response to Exchange termination):
> >     > > 
> >     > >    When an ABTS-LS is received at the target FCP_Port, it shall abort
> >     > >    the designated Exchange and return one of the following responses:
> >     > > 
> >     > >    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
> >     > >       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
> >     > >       (i.e., no N_Port Login exists);
> >     > > 
> >     > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
> >     > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
> >     > > requests, exchange or sequence errors. IIRC, some initiators requeue
> >     > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
> >     > > prevent a perceived session freeze on the initiators.
> >     > 
> >     > Hi Roman,
> >     > 
> >     > Has this patch series been prepared against Linus' master branch,
> >     > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
> >     > branch? I'm asking this because some patches in this series look similar
> >     > to patches that are already present in the 5.4/scsi-queue branch.
> >     > 
> >     > Thanks,
> >     > 
> >     > Bart.
> >     > 
> >     
> >     Hi Bart,
> >     
> >     To be honest it was prepared against next-20190904 but it applies to
> >     5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
> >     are related to stuck PRLI and unhandled RSCN while #4 is related to
> >     stuck PLOGI after qla_post_els_plogi_work.
> >     
> >     Thank you,
> >     Roman
> >     
> >
Roman Bolshakov Nov. 19, 2019, 9:46 p.m. UTC | #6
Hi Himanshu,

I've tried the patch and it seems that LOGO doesn't succeed yet:
[ 1079.073246] qla2xxx [0000:00:06.0]-2870:1: Async-logout - hdl=0 loop-id=0 portid=000002 21:00:00:24:ff:7f:35:c6.
[ 1079.073333] qla2xxx [0000:00:06.0]-5837:1: Async-logout failed - 21:00:00:24:ff:7f:35:c6 hdl=12 portid=000002 comp=31 iop0=19 iop1=c.

It means that firmware detected IOCB parameter error at offset 0xc.
I'll examine IOCB parameter dumps tomorrow.

Are you ok if I send the patch (keeping Quinn's authorship) in the my
patch set once I get it fixed?

I also consider to add one more patch to address the issue with BA_RJT.
The idea is to discard a frame that has no session (not logged in) and
send explicit LOGO ELS instead of BA_RJT to follow FCP spec 12.3.3:
   When an ABTS-LS is received at the target FCP_Port, it shall abort
   the designated Exchange and return one of the following responses:

   a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
      ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
      (i.e., no N_Port Login exists);

Although, I don't know if firmware can handle that yet. We'll see.

Thank you,
Roman

On Wed, Nov 13, 2019 at 10:33:35PM +0000, Himanshu Madhani wrote:
> Hi Roman,
> 
> 
> > On Nov 13, 2019, at 12:54 PM, Roman Bolshakov <r.bolshakov@yadro.com> wrote:
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Hi Himanshu,
> >
> > Could you please answer the questions below?
> > And if we should start doing explicit LOGO on session shutdown?
> >
> 
> You are correct in target mode driver should do explicit logo if target does not want to talk to initiator anymore.
> 
> We had patch for the explicit_logo in our internal code base, but did not sent it upstream. I’ve attached it here for you to try it out and see if it helps in your env.
> 
> Thanks,
> Himanshu
> 
> 
> 
> > Thank you,
> > Roman
> >
> > On Thu, Nov 07, 2019 at 10:00:32PM +0300, Roman Bolshakov wrote:
> >> Hi Himanshu,
> >>
> >> Could you please take a look at the series and anwser if we should stop
> >> doing BA_RJT as a response on ABTS when there's no session?
> >>
> >> Thank you,
> >> Roman
> >>
> >> On Thu, Sep 12, 2019 at 01:53:03PM +0000, Himanshu Madhani wrote:
> >>> Adding Correct Quinn. Please use "qutran@mavell.com"
> >>>
> >>> We'll take a look at the series
> >>>
> >>> On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote:
> >>>
> >>>    On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
> >>>> On 9/12/19 1:39 AM, Roman Bolshakov wrote:
> >>>>> This series has a few bug fixes for the driver.
> >>>>>
> >>>>> Note, #1 only fixes the crash in the kernel. The complete fix for clean
> >>>>> ACL deletion from initiator side is in works and requires a discussion.
> >>>>>
> >>>>> As of now initiator is not aware that target no longer wants talking to
> >>>>> it, that implies unneeded timeout. It might be fixed by making LOGO
> >>>>> explicit on session deletion but it's an issue I want to raise first
> >>>>> before making the change. Whether we need implicit LOGO in qla2xxx,
> >>>>> explicit or use both.
> >>>>>
> >>>>> Also, an unsolicited ABTS from a port without session would still result
> >>>>> in BA_RJT response instead of frame discard and LOGO ELS, as specified
> >>>>> in FCP (12.3.3 Target FCP_Port response to Exchange termination):
> >>>>>
> >>>>>   When an ABTS-LS is received at the target FCP_Port, it shall abort
> >>>>>   the designated Exchange and return one of the following responses:
> >>>>>
> >>>>>   a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
> >>>>>      ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
> >>>>>      (i.e., no N_Port Login exists);
> >>>>>
> >>>>> FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
> >>>>> RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
> >>>>> requests, exchange or sequence errors. IIRC, some initiators requeue
> >>>>> SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
> >>>>> prevent a perceived session freeze on the initiators.
> >>>>
> >>>> Hi Roman,
> >>>>
> >>>> Has this patch series been prepared against Linus' master branch,
> >>>> against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
> >>>> branch? I'm asking this because some patches in this series look similar
> >>>> to patches that are already present in the 5.4/scsi-queue branch.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Bart.
> >>>>
> >>>
> >>>    Hi Bart,
> >>>
> >>>    To be honest it was prepared against next-20190904 but it applies to
> >>>    5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
> >>>    are related to stuck PRLI and unhandled RSCN while #4 is related to
> >>>    stuck PLOGI after qla_post_els_plogi_work.
> >>>
> >>>    Thank you,
> >>>    Roman
> >>>
> >>>
>
Roman Bolshakov Nov. 20, 2019, 12:09 a.m. UTC | #7
On Wed, Nov 20, 2019 at 12:46:56AM +0300, Roman Bolshakov wrote:
> Hi Himanshu,
> 
> I've tried the patch and it seems that LOGO doesn't succeed yet:
> [ 1079.073246] qla2xxx [0000:00:06.0]-2870:1: Async-logout - hdl=0 loop-id=0 portid=000002 21:00:00:24:ff:7f:35:c6.
> [ 1079.073333] qla2xxx [0000:00:06.0]-5837:1: Async-logout failed - 21:00:00:24:ff:7f:35:c6 hdl=12 portid=000002 comp=31 iop0=19 iop1=c.
> 
> It means that firmware detected IOCB parameter error at offset 0xc.
> I'll examine IOCB parameter dumps tomorrow.
> 

FWIW, it was an easy fix, we have to set either explicit LOGO bit or implicit
LOGO bit. Free N_Port handle is not allowed to be set alone. I corrected
the patch:
https://github.com/roolebo/linux/commit/6e86300a60552bfc0a4c49d65d89e5011dd90f10

--
Roman