Message ID | CAANLjFpRCwGSh=HV12dc_OtFKgcjQqERSPirqxWD1w0N-G_mCg@mail.gmail.com |
---|---|
State | Not Applicable |
Sorry, sent prematurely...

On Thu, Jan 12, 2017 at 2:22 PM, Robert LeBlanc <robert@leblancnet.us> wrote:
> I'm having trouble replicating the D state issue on Infiniband (I was
> able to trigger it reliably a couple weeks back, I don't know if OFED
> to verify the same results happen there as well.

I'm having trouble replicating the D state issue on Infiniband to verify that the same results happen there as well. (I was able to trigger it reliably a couple of weeks back. I don't know if having OFED installed is altering things, but it only installed for 3.10. The ConnectX-4-LX exposes the issue easily if you have those cards.)

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
----- Original Message -----
> From: "Robert LeBlanc" <robert@leblancnet.us>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: "Doug Ledford" <dledford@redhat.com>, "Nicholas A. Bellinger" <nab@linux-iscsi.org>, "Zhu Lingshan"
> <lszhu@suse.com>, "linux-rdma" <linux-rdma@vger.kernel.org>, linux-scsi@vger.kernel.org, "Sagi Grimberg"
> <sagi@grimberg.me>, "Christoph Hellwig" <hch@lst.de>
> Sent: Thursday, January 12, 2017 4:26:05 PM
> Subject: Re: iscsi_trx going into D state

I am only back in the office next Wednesday.
I have this all set up using ConnectX-4 with IB/iSER, but have no way of remotely creating the disconnect, as I currently have it back-to-back.
I have run multiple tests with IB and iSER, hard resetting the client to break the IB connection, but have not been able to reproduce it as yet.
So it will have to wait until I can pull cables next week, as that seemed to be the way you have been reproducing this.

This is a code area whose flow I also don't know much about, but I have started trying to understand it better.

Thanks
Laurence
Laurence,

I'm really starting to think that the stars aligned with the phase of the moon or something when I reproduced this in my lab before, because I've been unable to reproduce it on Infiniband for the last two days. The problem with this issue is that it is so hard to trigger, but it causes a lot of problems when it does happen. I really hate wasting people's time when I can't reproduce it reliably myself. Please don't spend too much time on it if you can't reproduce it on Infiniband; I'll have to wait until someone with ConnectX-4-LX cards can replicate it.

Hmm.... you do have ConnectX-4 cards, which may have the same bug in Ethernet mode. I don't see the RoCE bug on my ConnectX-3 cards, but your ConnectX-4 cards may work for reproducing it. Try putting the cards into Ethernet mode, set the speed and advertised speed to something lower than the maximum speed, and verify with ethtool that the link actually came up at that speed. On the ConnectX-4-LX cards, I just had to set both interfaces down and then back up at the same time; on the ConnectX-3 I had to pull the cable (shutting down the client might have worked). Then set up the target and client with iSER, format, and run the test, and it should trigger automatically.

Looking at the release notes for the ConnectX-4-LX cards, the latest firmware may fix the bug that so easily exposes the problem with that card. My cards are SuperMicro-branded and don't have the new firmware available yet.

Good luck.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Fri, Jan 13, 2017 at 8:10 AM, Laurence Oberman <loberman@redhat.com> wrote:
> So it will have to wait until I can pull cables next week as that seemed to
> be the way you have been reproducing this.
----- Original Message -----
> From: "Robert LeBlanc" <robert@leblancnet.us>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: "Doug Ledford" <dledford@redhat.com>, "Nicholas A. Bellinger" <nab@linux-iscsi.org>, "Zhu Lingshan"
> <lszhu@suse.com>, "linux-rdma" <linux-rdma@vger.kernel.org>, linux-scsi@vger.kernel.org, "Sagi Grimberg"
> <sagi@grimberg.me>, "Christoph Hellwig" <hch@lst.de>
> Sent: Friday, January 13, 2017 6:38:33 PM
> Subject: Re: iscsi_trx going into D state

Hello Robert

I will try this sometime tomorrow by running in Ethernet mode.
It's been days of resets with no reproduction, so I agree: it is very hard to reproduce with Infiniband.

Thanks
Laurence
```diff
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8368764..ed36748 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2089,3 +2089,19 @@ void ib_drain_qp(struct ib_qp *qp)
 		ib_drain_rq(qp);
 }
 EXPORT_SYMBOL(ib_drain_qp);
+
+void ib_reset_sq(struct ib_qp *qp)
+{
+	struct ib_qp_attr attr = { .qp_state = IB_QPS_RESET};
+	int ret;
+
+	ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
+}
+EXPORT_SYMBOL(ib_reset_sq);
+
+void ib_reset_qp(struct ib_qp *qp)
+{
+	printk("ib_reset_qp calling ib_reset_sq.\n");
+	ib_reset_sq(qp);
+}
+EXPORT_SYMBOL(ib_reset_qp);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 6dd43f6..619dbc7 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2595,10 +2595,9 @@ static void isert_wait_conn(struct iscsi_conn *conn)
 	isert_conn_terminate(isert_conn);
 	mutex_unlock(&isert_conn->mutex);
 
-	ib_drain_qp(isert_conn->qp);
+	ib_reset_qp(isert_conn->qp);
 	isert_put_unsol_pending_cmds(conn);
-	isert_wait4cmds(conn);
-	isert_wait4logout(isert_conn);
+	cancel_work_sync(&isert_conn->release_work);
 
 	queue_work(isert_release_wq, &isert_conn->release_work);
 }
@@ -2607,7 +2606,7 @@ static void isert_free_conn(struct iscsi_conn *conn)
 {
 	struct isert_conn *isert_conn = conn->context;
 
-	ib_drain_qp(isert_conn->qp);
+	ib_close_qp(isert_conn->qp);
 	isert_put_conn(isert_conn);
 }
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..3310c37 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3357,4 +3357,6 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
 void ib_drain_rq(struct ib_qp *qp);
 void ib_drain_sq(struct ib_qp *qp);
 void ib_drain_qp(struct ib_qp *qp);
+void ib_reset_sq(struct ib_qp *qp);
+void ib_reset_qp(struct ib_qp *qp);
```