Message ID | 1515791472.2396.57.camel@wdc.com (mailing list archive) |
---|---|
State | Not Applicable |
On Fri, 2018-01-12 at 21:11 +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 16:33 -0500, Laurence Oberman wrote:
> > I just rebooted the server into 4.13 and its fine again and found all
> > the targets with the same kernel on the client.
> >
> > So its specific to your new tree with srpt
> >
> > I will reboot again and re-load LIO and show you but here is my ACL
> > list that has been this way for some time.
> >
> > o- srpt ................................................... [Targets: 2]
> > | o- ib.fe800000000000007cfe900300726e4e ................. [no-gen-acls]
> > | | o- acls .................................................. [ACLs: 8]
> > | | | o- ib.4e6e72000390fe7c7cfe900300726ed2
> >
> > [ ... ]
>
> Hello Laurence,
>
> Although I'm not sure I think I found the root cause of this failure. The
> following patch should fix the failure:
>
> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
> index 96142110a155..5297963c834d 100644
> --- a/drivers/infiniband/ulp/srpt/ib_srpt.c
> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
> @@ -2083,7 +2083,7 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
>  		struct ib_cm_rep_param ib_cm;
>  	} *rep_param = NULL;
>  	struct srpt_rdma_ch *ch;
> -	char i_port_id[24];
> +	char i_port_id[36];
>  	u32 it_iu_len;
>  	int i, ret;
>  
> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h
> index bf4525b24d98..02883f8e9c71 100644
> --- a/drivers/infiniband/ulp/srpt/ib_srpt.h
> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
> @@ -308,7 +308,7 @@ struct srpt_rdma_ch {
>  	bool using_rdma_cm;
>  	bool processing_wait_list;
>  	struct se_session *sess;
> -	u8 sess_name[36];
> +	u8 sess_name[24];
>  	struct work_struct release_work;
>  };
>  
>
> I wrote "should" because targetcli is not installed on my test setup and
> because I have not yet verified this change with targetcli. If you have the
> time to verify this change that would be great. If not then I will install
> targetcli myself and verify this change.
>
> Thanks,
>
> Bart.

Hi Bart

I will get this tested tonight and report back.

Fix makes sense.

Regards
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2018-01-12 at 19:09 -0500, Laurence Oberman wrote:
> On Fri, 2018-01-12 at 21:11 +0000, Bart Van Assche wrote:
> > [ ... ]
> >
> > Although I'm not sure I think I found the root cause of this
> > failure. The following patch should fix the failure:
> >
> > [ ... ]
>
> Hi Bart
>
> I will get this tested tonight and report back.
>
> Fix makes sense.
>
> Regards
> Laurence

Hello Bart
For the patch above:

This corrects the connectivity issue with LIO targets, and I will now
continue testing your patches from your tree.

Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>

Thank you for your quick response Sir.

Laurence
On Fri, 2018-01-12 at 20:57 -0500, Laurence Oberman wrote:
> [ ... ]
>
> Hello Bart
> For the patch above:
>
> This corrects the connectivity issue with LIO targets, and I will now
> continue testing your patches from your tree.
>
> Reviewed-by: Laurence Oberman <loberman@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
>
> Thank you for your quick response Sir.
>
> Laurence

Hello Bart

I missed some logs when I tested last night. It's working fine, as
mentioned, with the above patch, and I see all the targets (that's what I
checked for).

However, I still see these messages on the srpt server, even though I now
get access to all the targets on the client:

[ 239.502025] ib_srpt Received SRP_LOGIN_REQ with i_port_id 7cfe:9003:0072:6e4f:7cfe:9003:0072:6ed3, t_port_id 7cfe:9003:0072:6e4e:7cfe:9003:0072:6e4e and it_iu_len 2116 on port 1 (guid=fe80:0000:0000:0000:7cfe:9003:0072:6e4f); pkey 0xffff
[ 239.623881] ib_srpt failed to create queue pair with sq_size = 16384 (-12) - retrying
[ 239.669381] ib_srpt failed to create queue pair with sq_size = 8192 (-12) - retrying
[ 239.715366] ib_srpt Received SRP_LOGIN_REQ with i_port_id 7cfe:9003:0072:6e4e:7cfe:9003:0072:6ed2, t_port_id 7cfe:9003:0072:6e4e:7cfe:9003:0072:6e4e and it_iu_len 2116 on port 1 (guid=fe80:0000:0000:0000:7cfe:9003:0072:6e4e); pkey 0xffff
[ 239.831661] ib_srpt failed to create queue pair with sq_size = 16384 (-12) - retrying
[ 239.877193] ib_srpt failed to create queue pair with sq_size = 8192 (-12) - retrying
[ 239.967259] ib_srpt Received SRP_LOGIN_REQ with i_port_id 7cfe:9003:0072:6e4f:7cfe:9003:0072:6ed3, t_port_id 7cfe:9003:0072:6e4e:7cfe:9003:0072:6e4e and it_iu_len 2116 on port 1 (guid=fe80:0000:0000:0000:7cfe:9003:0072:6e4f); pkey 0xffff
[ 240.087362] ib_srpt failed to create queue pair with sq_size = 16384 (-12) - retrying
[ 240.130981] ib_srpt failed to create queue pair with sq_size = 8192 (-12) - retrying
..
..

So the functional report was valid, but we need to find out why we are
still getting the messages above.

Apologies, I should have checked all the logs last night before my first
reply.
On Sat, 2018-01-13 at 09:53 -0500, Laurence Oberman wrote:
> [ 239.502025] ib_srpt Received SRP_LOGIN_REQ with i_port_id
> 7cfe:9003:0072:6e4f:7cfe:9003:0072:6ed3, t_port_id
> 7cfe:9003:0072:6e4e:7cfe:9003:0072:6e4e and it_iu_len 2116 on port 1
> (guid=fe80:0000:0000:0000:7cfe:9003:0072:6e4f); pkey 0xffff
> [ 239.623881] ib_srpt failed to create queue pair with sq_size = 16384
> (-12) - retrying
> [ 239.669381] ib_srpt failed to create queue pair with sq_size = 8192
> (-12) - retrying
> [ 239.715366] ib_srpt Received SRP_LOGIN_REQ with i_port_id
> 7cfe:9003:0072:6e4e:7cfe:9003:0072:6ed2, t_port_id
> 7cfe:9003:0072:6e4e:7cfe:9003:0072:6e4e and it_iu_len 2116 on port 1
> (guid=fe80:0000:0000:0000:7cfe:9003:0072:6e4e); pkey 0xffff
> [ 239.831661] ib_srpt failed to create queue pair with sq_size = 16384
> (-12) - retrying
> [ 239.877193] ib_srpt failed to create queue pair with sq_size = 8192
> (-12) - retrying

Hello Laurence,

These messages are expected and do not indicate a failure. The retry loop
the above messages refer to was introduced a long time ago:

commit ab477c1ff5e0a744c072404bf7db51bfe1f05b6e
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun Oct 19 18:05:33 2014 +0300

    srp-target: Retry when QP creation fails with ENOMEM

    It is not guaranteed to that srp_sq_size is supported
    by the HCA. So if we failed to create the QP with ENOMEM,
    try with a smaller srp_sq_size. Keep it up until we hit
    MIN_SRPT_SQ_SIZE, then fail the connection.

[ ... ]

The only recent change in that code is that retry attempts are now logged.
From commit 0e9949f1db6c "IB/srpt: Add RDMA/CM support":

+		if (ret) {
+			bool retry = sq_size > MIN_SRPT_SQ_SIZE;
+
+			pr_err("failed to create queue pair with sq_size = %d (%d)%s\n",
+			       sq_size, ret, retry ? " - retrying" : "");
+			if (retry) {
+				ib_free_cq(ch->cq);
+				sq_size = max(sq_size / 2, MIN_SRPT_SQ_SIZE);
+				goto retry;
+			} else {
+				goto err_destroy_cq;
 			}
-			pr_err("failed to create_qp ret= %d\n", ret);
-			goto err_destroy_cq;
 		}

Do you perhaps want that pr_err() to be changed into a pr_debug() for retry
attempts?

Thanks,

Bart.
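The retry policy described in commit ab477c1ff5e0 and in the hunk above (halve sq_size after each failed QP creation, never below MIN_SRPT_SQ_SIZE, give up after the minimum size fails) can be sketched in user space as follows. This is an illustration, not the ib_srpt code: fake_create_qp() and SKETCH_MIN_SQ_SIZE are hypothetical stand-ins, the simulated HCA limit is an arbitrary assumption, and a for (;;) loop replaces the kernel's goto retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lower bound standing in for MIN_SRPT_SQ_SIZE. */
#define SKETCH_MIN_SQ_SIZE 16

/*
 * Hypothetical stand-in for QP creation: fails with -ENOMEM whenever the
 * requested send queue is larger than the simulated HCA supports.
 */
static int fake_create_qp(int sq_size, int hca_max_sq_size)
{
	return sq_size > hca_max_sq_size ? -ENOMEM : 0;
}

/*
 * Halve sq_size after each failed attempt, never going below the minimum,
 * and give up once an attempt at the minimum size has failed.
 * Returns the send queue size that succeeded, or a negative error code.
 * *attempts receives the number of creation attempts made.
 */
static int create_qp_with_retry(int sq_size, int hca_max_sq_size, int *attempts)
{
	assert(attempts != NULL);
	*attempts = 0;

	for (;;) {
		int ret = fake_create_qp(sq_size, hca_max_sq_size);

		++*attempts;
		if (ret == 0)
			return sq_size;

		bool retry = sq_size > SKETCH_MIN_SQ_SIZE;

		/* Every failed attempt is logged, as in the hunk above. */
		fprintf(stderr, "failed to create queue pair with sq_size = %d (%d)%s\n",
			sq_size, ret, retry ? " - retrying" : "");
		if (!retry)
			return ret;

		sq_size = sq_size / 2 > SKETCH_MIN_SQ_SIZE ? sq_size / 2
							    : SKETCH_MIN_SQ_SIZE;
	}
}
```

For example, create_qp_with_retry(16384, 4096, &n) logs failures at 16384 and 8192, then returns 4096 with n == 3; on Linux, where ENOMEM is 12, those two lines have the same shape as the "(-12) - retrying" messages in the log. Bart's pr_debug() suggestion would amount to demoting the per-retry message while still reporting the final failure at error level.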
On Mon, 2018-01-15 at 16:12 +0000, Bart Van Assche wrote:
> On Sat, 2018-01-13 at 09:53 -0500, Laurence Oberman wrote:
> > [ ... ]
>
> Hello Laurence,
>
> These messages are expected and do not indicate a failure.
>
> [ ... ]
>
> Do you perhaps want that pr_err() to be changed into a pr_debug() for
> retry attempts?
>
> Thanks,
>
> Bart.

Hi Bart,

I recognized those as possibly just informational messages, so I thought
we were good with the recent patch that fixed the connection issue.

However, when I attempted to actually use the targets with your latest
SRPT, I had failures on the client. It was a tough weekend for me, and
maybe I made mistakes. Let me complete the irq/cpu test Ming is waiting
for, and then I will revisit this fully with a clean build and your most
recent patch.

I will answer off list while we figure it out.

Many Thanks
Laurence
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 96142110a155..5297963c834d 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2083,7 +2083,7 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
 		struct ib_cm_rep_param ib_cm;
 	} *rep_param = NULL;
 	struct srpt_rdma_ch *ch;
-	char i_port_id[24];
+	char i_port_id[36];
 	u32 it_iu_len;
 	int i, ret;
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h
index bf4525b24d98..02883f8e9c71 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.h
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
@@ -308,7 +308,7 @@ struct srpt_rdma_ch {
 	bool using_rdma_cm;
 	bool processing_wait_list;
 	struct se_session *sess;
-	u8 sess_name[36];
+	u8 sess_name[24];
 	struct work_struct release_work;
 };
 
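Why the size of i_port_id can decide whether an initiator matches its ACL can be illustrated with a small user-space sketch. It assumes, which this thread does not state, that the 128-bit initiator port ID is rendered as "0x" followed by 32 hexadecimal digits and then compared verbatim against a configured name such as the 32 digits in ib.4e6e72000390fe7c7cfe900300726ed2. The helpers format_port_id() and port_id_matches() are hypothetical. That string needs 34 characters plus a terminating NUL, so a 24-byte buffer makes snprintf() truncate it silently, while 36 bytes is enough.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the ib_srpt code: render a 128-bit port ID
 * (two 64-bit halves) as "0x" plus 32 hex digits. The "0x" prefix is an
 * assumption made for this sketch.
 * Returns true only if the complete string fit into buf.
 */
static bool format_port_id(char *buf, size_t len, uint64_t hi, uint64_t lo)
{
	assert(buf != NULL && len > 0);

	int n = snprintf(buf, len, "0x%016" PRIx64 "%016" PRIx64, hi, lo);

	/*
	 * snprintf() returns the length the untruncated string would have had
	 * (34 here), so n >= len means characters were dropped.
	 */
	return n >= 0 && (size_t)n < len;
}

/* Hypothetical exact-match lookup, standing in for an ACL name comparison. */
static bool port_id_matches(const char *formatted, const char *acl_id)
{
	return strcmp(formatted, acl_id) == 0;
}
```

With hi = 0x4e6e72000390fe7c and lo = 0x7cfe900300726ed2 (the digits of the ACL in Laurence's listing), a 24-byte buffer ends up holding the truncated "0x4e6e72000390fe7c7cfe9" and format_port_id() returns false, so an exact-match lookup against the full identifier fails; a 36-byte buffer holds all of "0x4e6e72000390fe7c7cfe900300726ed2".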