Message ID | 20190912003919.8488-1-r.bolshakov@yadro.com (mailing list archive) |
---|---|
Headers | show |
Series | scsi: qla2xxx: Bug fixes | expand |
On 9/12/19 1:39 AM, Roman Bolshakov wrote: > This series has a few bug fixes for the driver. > > Note, #1 only fixes the crash in the kernel. The complete fix for clean > ACL deletion from initiator side is in works and requires a discussion. > > As of now initiator is not aware that target no longer wants talking to > it, that implies unneeded timeout. It might be fixed by making LOGO > explicit on session deletion but it's an issue I want to raise first > before making the change. Whether we need implicit LOGO in qla2xxx, > explicit or use both. > > Also, an unsolicited ABTS from a port without session would still result > in BA_RJT response instead of frame discard and LOGO ELS, as specified > in FCP (12.3.3 Target FCP_Port response to Exchange termination): > > When an ABTS-LS is received at the target FCP_Port, it shall abort > the designated Exchange and return one of the following responses: > > a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO > ELS if the Nx_Port issuing the ABTS-LS is not currently logged in > (i.e., no N_Port Login exists); > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission > requests, exchange or sequence errors. IIRC, some initiators requeue > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will > prevent a perceived session freeze on the initiators. Hi Roman, Has this patch series been prepared against Linus' master branch, against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue branch? I'm asking this because some patches in this series look similar to patches that are already present in the 5.4/scsi-queue branch. Thanks, Bart.
On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote: > On 9/12/19 1:39 AM, Roman Bolshakov wrote: > > This series has a few bug fixes for the driver. > > > > Note, #1 only fixes the crash in the kernel. The complete fix for clean > > ACL deletion from initiator side is in works and requires a discussion. > > > > As of now initiator is not aware that target no longer wants talking to > > it, that implies unneeded timeout. It might be fixed by making LOGO > > explicit on session deletion but it's an issue I want to raise first > > before making the change. Whether we need implicit LOGO in qla2xxx, > > explicit or use both. > > > > Also, an unsolicited ABTS from a port without session would still result > > in BA_RJT response instead of frame discard and LOGO ELS, as specified > > in FCP (12.3.3 Target FCP_Port response to Exchange termination): > > > > When an ABTS-LS is received at the target FCP_Port, it shall abort > > the designated Exchange and return one of the following responses: > > > > a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO > > ELS if the Nx_Port issuing the ABTS-LS is not currently logged in > > (i.e., no N_Port Login exists); > > > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission > > requests, exchange or sequence errors. IIRC, some initiators requeue > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will > > prevent a perceived session freeze on the initiators. > > Hi Roman, > > Has this patch series been prepared against Linus' master branch, > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue > branch? I'm asking this because some patches in this series look similar > to patches that are already present in the 5.4/scsi-queue branch. > > Thanks, > > Bart. > Hi Bart, To be honest it was prepared against next-20190904 but it applies to 5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but are related to stuck PRLI and unhandled RSCN while #4 is related to stuck PLOGI after qla_post_els_plogi_work. Thank you, Roman
Adding Correct Quinn. Please use "qutran@mavell.com" We'll take a look at the series On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote: On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote: > On 9/12/19 1:39 AM, Roman Bolshakov wrote: > > This series has a few bug fixes for the driver. > > > > Note, #1 only fixes the crash in the kernel. The complete fix for clean > > ACL deletion from initiator side is in works and requires a discussion. > > > > As of now initiator is not aware that target no longer wants talking to > > it, that implies unneeded timeout. It might be fixed by making LOGO > > explicit on session deletion but it's an issue I want to raise first > > before making the change. Whether we need implicit LOGO in qla2xxx, > > explicit or use both. > > > > Also, an unsolicited ABTS from a port without session would still result > > in BA_RJT response instead of frame discard and LOGO ELS, as specified > > in FCP (12.3.3 Target FCP_Port response to Exchange termination): > > > > When an ABTS-LS is received at the target FCP_Port, it shall abort > > the designated Exchange and return one of the following responses: > > > > a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO > > ELS if the Nx_Port issuing the ABTS-LS is not currently logged in > > (i.e., no N_Port Login exists); > > > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission > > requests, exchange or sequence errors. IIRC, some initiators requeue > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will > > prevent a perceived session freeze on the initiators. > > Hi Roman, > > Has this patch series been prepared against Linus' master branch, > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue > branch? I'm asking this because some patches in this series look similar > to patches that are already present in the 5.4/scsi-queue branch. > > Thanks, > > Bart. > Hi Bart, To be honest it was prepared against next-20190904 but it applies to 5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but are related to stuck PRLI and unhandled RSCN while #4 is related to stuck PLOGI after qla_post_els_plogi_work. Thank you, Roman
Hi Himanshu, Could you please take a look at the series and anwser if we should stop doing BA_RJT as a response on ABTS when there's no session? Thank you, Roman On Thu, Sep 12, 2019 at 01:53:03PM +0000, Himanshu Madhani wrote: > Adding Correct Quinn. Please use "qutran@mavell.com" > > We'll take a look at the series > > On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote: > > On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote: > > On 9/12/19 1:39 AM, Roman Bolshakov wrote: > > > This series has a few bug fixes for the driver. > > > > > > Note, #1 only fixes the crash in the kernel. The complete fix for clean > > > ACL deletion from initiator side is in works and requires a discussion. > > > > > > As of now initiator is not aware that target no longer wants talking to > > > it, that implies unneeded timeout. It might be fixed by making LOGO > > > explicit on session deletion but it's an issue I want to raise first > > > before making the change. Whether we need implicit LOGO in qla2xxx, > > > explicit or use both. > > > > > > Also, an unsolicited ABTS from a port without session would still result > > > in BA_RJT response instead of frame discard and LOGO ELS, as specified > > > in FCP (12.3.3 Target FCP_Port response to Exchange termination): > > > > > > When an ABTS-LS is received at the target FCP_Port, it shall abort > > > the designated Exchange and return one of the following responses: > > > > > > a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO > > > ELS if the Nx_Port issuing the ABTS-LS is not currently logged in > > > (i.e., no N_Port Login exists); > > > > > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN > > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission > > > requests, exchange or sequence errors. IIRC, some initiators requeue > > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will > > > prevent a perceived session freeze on the initiators. > > > > Hi Roman, > > > > Has this patch series been prepared against Linus' master branch, > > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue > > branch? I'm asking this because some patches in this series look similar > > to patches that are already present in the 5.4/scsi-queue branch. > > > > Thanks, > > > > Bart. > > > > Hi Bart, > > To be honest it was prepared against next-20190904 but it applies to > 5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but > are related to stuck PRLI and unhandled RSCN while #4 is related to > stuck PLOGI after qla_post_els_plogi_work. > > Thank you, > Roman > >
Hi Himanshu, Could you please answer the questions below? And if we should start doing explicit LOGO on session shutdown? Thank you, Roman On Thu, Nov 07, 2019 at 10:00:32PM +0300, Roman Bolshakov wrote: > Hi Himanshu, > > Could you please take a look at the series and anwser if we should stop > doing BA_RJT as a response on ABTS when there's no session? > > Thank you, > Roman > > On Thu, Sep 12, 2019 at 01:53:03PM +0000, Himanshu Madhani wrote: > > Adding Correct Quinn. Please use "qutran@mavell.com" > > > > We'll take a look at the series > > > > On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote: > > > > On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote: > > > On 9/12/19 1:39 AM, Roman Bolshakov wrote: > > > > This series has a few bug fixes for the driver. > > > > > > > > Note, #1 only fixes the crash in the kernel. The complete fix for clean > > > > ACL deletion from initiator side is in works and requires a discussion. > > > > > > > > As of now initiator is not aware that target no longer wants talking to > > > > it, that implies unneeded timeout. It might be fixed by making LOGO > > > > explicit on session deletion but it's an issue I want to raise first > > > > before making the change. Whether we need implicit LOGO in qla2xxx, > > > > explicit or use both. > > > > > > > > Also, an unsolicited ABTS from a port without session would still result > > > > in BA_RJT response instead of frame discard and LOGO ELS, as specified > > > > in FCP (12.3.3 Target FCP_Port response to Exchange termination): > > > > > > > > When an ABTS-LS is received at the target FCP_Port, it shall abort > > > > the designated Exchange and return one of the following responses: > > > > > > > > a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO > > > > ELS if the Nx_Port issuing the ABTS-LS is not currently logged in > > > > (i.e., no N_Port Login exists); > > > > > > > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN > > > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission > > > > requests, exchange or sequence errors. IIRC, some initiators requeue > > > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will > > > > prevent a perceived session freeze on the initiators. > > > > > > Hi Roman, > > > > > > Has this patch series been prepared against Linus' master branch, > > > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue > > > branch? I'm asking this because some patches in this series look similar > > > to patches that are already present in the 5.4/scsi-queue branch. > > > > > > Thanks, > > > > > > Bart. > > > > > > > Hi Bart, > > > > To be honest it was prepared against next-20190904 but it applies to > > 5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but > > are related to stuck PRLI and unhandled RSCN while #4 is related to > > stuck PLOGI after qla_post_els_plogi_work. > > > > Thank you, > > Roman > > > >
Hi Himanshu, I've tried the patch and it seems that LOGO doesn't succeed yet: [ 1079.073246] qla2xxx [0000:00:06.0]-2870:1: Async-logout - hdl=0 loop-id=0 portid=000002 21:00:00:24:ff:7f:35:c6. [ 1079.073333] qla2xxx [0000:00:06.0]-5837:1: Async-logout failed - 21:00:00:24:ff:7f:35:c6 hdl=12 portid=000002 comp=31 iop0=19 iop1=c. It means that firmware detected IOCB parameter error at offset 0xc. I'll examine IOCB parameter dumps tomorrow. Are you ok if I send the patch (keeping Quinn's authorship) in the my patch set once I get it fixed? I also consider to add one more patch to address the issue with BA_RJT. The idea is to discard a frame that has no session (not logged in) and send explicit LOGO ELS instead of BA_RJT to follow FCP spec 12.3.3: When an ABTS-LS is received at the target FCP_Port, it shall abort the designated Exchange and return one of the following responses: a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO ELS if the Nx_Port issuing the ABTS-LS is not currently logged in (i.e., no N_Port Login exists); Although, I don't know if firmware can handle that yet. We'll see. Thank you, Roman On Wed, Nov 13, 2019 at 10:33:35PM +0000, Himanshu Madhani wrote: > Hi Roman, > > > > On Nov 13, 2019, at 12:54 PM, Roman Bolshakov <r.bolshakov@yadro.com> wrote: > > > > External Email > > > > ---------------------------------------------------------------------- > > Hi Himanshu, > > > > Could you please answer the questions below? > > And if we should start doing explicit LOGO on session shutdown? > > > > You are correct in target mode driver should do explicit logo if target does not want to talk to initiator anymore. > > We had patch for the explicit_logo in our internal code base, but did not sent it upstream. I’ve attached it here for you to try it out and see if it helps in your env. > > Thanks, > Himanshu > > > > > Thank you, > > Roman > > > > On Thu, Nov 07, 2019 at 10:00:32PM +0300, Roman Bolshakov wrote: > >> Hi Himanshu, > >> > >> Could you please take a look at the series and anwser if we should stop > >> doing BA_RJT as a response on ABTS when there's no session? > >> > >> Thank you, > >> Roman > >> > >> On Thu, Sep 12, 2019 at 01:53:03PM +0000, Himanshu Madhani wrote: > >>> Adding Correct Quinn. Please use "qutran@mavell.com" > >>> > >>> We'll take a look at the series > >>> > >>> On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote: > >>> > >>> On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote: > >>>> On 9/12/19 1:39 AM, Roman Bolshakov wrote: > >>>>> This series has a few bug fixes for the driver. > >>>>> > >>>>> Note, #1 only fixes the crash in the kernel. The complete fix for clean > >>>>> ACL deletion from initiator side is in works and requires a discussion. > >>>>> > >>>>> As of now initiator is not aware that target no longer wants talking to > >>>>> it, that implies unneeded timeout. It might be fixed by making LOGO > >>>>> explicit on session deletion but it's an issue I want to raise first > >>>>> before making the change. Whether we need implicit LOGO in qla2xxx, > >>>>> explicit or use both. > >>>>> > >>>>> Also, an unsolicited ABTS from a port without session would still result > >>>>> in BA_RJT response instead of frame discard and LOGO ELS, as specified > >>>>> in FCP (12.3.3 Target FCP_Port response to Exchange termination): > >>>>> > >>>>> When an ABTS-LS is received at the target FCP_Port, it shall abort > >>>>> the designated Exchange and return one of the following responses: > >>>>> > >>>>> a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO > >>>>> ELS if the Nx_Port issuing the ABTS-LS is not currently logged in > >>>>> (i.e., no N_Port Login exists); > >>>>> > >>>>> FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN > >>>>> RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission > >>>>> requests, exchange or sequence errors. IIRC, some initiators requeue > >>>>> SCSI commands if BA_RJT is received. Therefore, a timely LOGO will > >>>>> prevent a perceived session freeze on the initiators. > >>>> > >>>> Hi Roman, > >>>> > >>>> Has this patch series been prepared against Linus' master branch, > >>>> against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue > >>>> branch? I'm asking this because some patches in this series look similar > >>>> to patches that are already present in the 5.4/scsi-queue branch. > >>>> > >>>> Thanks, > >>>> > >>>> Bart. > >>>> > >>> > >>> Hi Bart, > >>> > >>> To be honest it was prepared against next-20190904 but it applies to > >>> 5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but > >>> are related to stuck PRLI and unhandled RSCN while #4 is related to > >>> stuck PLOGI after qla_post_els_plogi_work. > >>> > >>> Thank you, > >>> Roman > >>> > >>> >
On Wed, Nov 20, 2019 at 12:46:56AM +0300, Roman Bolshakov wrote: > Hi Himanshu, > > I've tried the patch and it seems that LOGO doesn't succeed yet: > [ 1079.073246] qla2xxx [0000:00:06.0]-2870:1: Async-logout - hdl=0 loop-id=0 portid=000002 21:00:00:24:ff:7f:35:c6. > [ 1079.073333] qla2xxx [0000:00:06.0]-5837:1: Async-logout failed - 21:00:00:24:ff:7f:35:c6 hdl=12 portid=000002 comp=31 iop0=19 iop1=c. > > It means that firmware detected IOCB parameter error at offset 0xc. > I'll examine IOCB parameter dumps tomorrow. > FWIW, it was an easy fix, we have to set either explicit LOGO bit or implicit LOGO bit. Free N_Port handle is not allowed to be set alone. I corrected the patch: https://github.com/roolebo/linux/commit/6e86300a60552bfc0a4c49d65d89e5011dd90f10 -- Roman