Message ID | VI1PR0502MB3008273F9B13095A781D8764D18D0@VI1PR0502MB3008.eurprd05.prod.outlook.com (mailing list archive) |
---|---|
State | Deferred |
Headers | show |
On Friday, April 04/27/18, 2018 at 03:36:18 +0000, Parav Pandit wrote: Parav, > Hi Shiraz, Raju, > > > -----Original Message----- > > My validation team reported the same issue on i40iw with 4.17-rc kernels. > > > > Some more data. Looks like the failure is because we can't find the cached gid > > due to the gid idx being wrong in query_gid. > > > > rdma_connect > > cma_connect_iw > > cma_modify_qp_rtr > > ib_query_gid > > ib_get_cached_gid > > __ib_cache_gid_get (EINVAL) > > > This call trace is helpful. > Can you please run ibv_devinfo -v | grep GID and see that you are getting the expected GID. > For iWarp we have only one entry GID table. So want to make sure that GID table is build correctly. > > From the above call trace, is appears that, > cma_modify_qp_rtr() contains, > struct ib_qp_attr qp_attr; > > ah_attr from above qp_attr remains uninitialized by iw_cm_init_qp_attr() and iwcm_init_qp_init_attr(). > Before my fix, gid_index was always ignored by the query_gid() callback such as i40iw_query_gid() and c4iw_query_gid(). > So it used to work. > Now my fix expects all values to be correct; due to uninitialized ib_qp_attr it is likely failing. > > So can you please try below hunk and see if ib_query_gid() progresses for you? > If it works, I will send the proper patch shortly. > Its working. Issue is not seen with the below patch. > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 8512f63..e119cff 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id) > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > struct rdma_conn_param *conn_param) > { > - struct ib_qp_attr qp_attr; > + struct ib_qp_attr qp_attr = {}; > int qp_attr_mask, ret; > union ib_gid sgid; > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Raju, > -----Original Message----- > From: Raju Rangoju [mailto:rajur@chelsio.com] > Sent: Friday, April 27, 2018 4:08 AM > To: Parav Pandit <parav@mellanox.com> > Cc: Shiraz Saleem <shiraz.saleem@intel.com>; linux-rdma@vger.kernel.org; > SWise OGC <swise@opengridcomputing.com>; sean.hefty@intel.com > Subject: Re: iwarp kernel mode applications are broken with commit f35faa4ba > > On Friday, April 04/27/18, 2018 at 03:36:18 +0000, Parav Pandit wrote: > > Parav, > > > Hi Shiraz, Raju, > > > > > -----Original Message----- > > > My validation team reported the same issue on i40iw with 4.17-rc kernels. > > > > > > Some more data. Looks like the failure is because we can't find the > > > cached gid due to the gid idx being wrong in query_gid. > > > > > > rdma_connect > > > cma_connect_iw > > > cma_modify_qp_rtr > > > ib_query_gid > > > ib_get_cached_gid > > > __ib_cache_gid_get (EINVAL) > > > > > This call trace is helpful. > > Can you please run ibv_devinfo -v | grep GID and see that you are getting the > expected GID. > > For iWarp we have only one entry GID table. So want to make sure that GID > table is build correctly. > > > > From the above call trace, is appears that, > > cma_modify_qp_rtr() contains, > > struct ib_qp_attr qp_attr; > > > > ah_attr from above qp_attr remains uninitialized by iw_cm_init_qp_attr() and > iwcm_init_qp_init_attr(). > > Before my fix, gid_index was always ignored by the query_gid() callback such > as i40iw_query_gid() and c4iw_query_gid(). > > So it used to work. > > Now my fix expects all values to be correct; due to uninitialized ib_qp_attr it is > likely failing. > > > > So can you please try below hunk and see if ib_query_gid() progresses for you? > > If it works, I will send the proper patch shortly. > > > > Its working. Issue is not seen with the below patch. Thanks a lot for the quick test. I will send the patch through Leon's queue for for-rc with your Tested-by tag. > > > > diff --git a/drivers/infiniband/core/cma.c > > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644 > > --- a/drivers/infiniband/core/cma.c > > +++ b/drivers/infiniband/core/cma.c > > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id) > > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > > struct rdma_conn_param *conn_param) { > > - struct ib_qp_attr qp_attr; > > + struct ib_qp_attr qp_attr = {}; > > int qp_attr_mask, ret; > > union ib_gid sgid; > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>Subject: RE: iwarp kernel mode applications are broken with commit f35faa4ba > >> >> > diff --git a/drivers/infiniband/core/cma.c >> > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644 >> > --- a/drivers/infiniband/core/cma.c >> > +++ b/drivers/infiniband/core/cma.c >> > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id) >> > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, >> > struct rdma_conn_param *conn_param) { >> > - struct ib_qp_attr qp_attr; >> > + struct ib_qp_attr qp_attr = {}; >> > int qp_attr_mask, ret; >> > union ib_gid sgid; >> > >-- Hi Parav - The patch resolves the issue. But shouldn't we remove ib_query_gid() from cma_modify_qp_rtr() altogether? What is its purpose in that function? Shiraz -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, April 04/27/18, 2018 at 16:35:32 +0000, Parav Pandit wrote: > Hi Raju, > > > -----Original Message----- > > From: Raju Rangoju [mailto:rajur@chelsio.com] > > Sent: Friday, April 27, 2018 4:08 AM > > To: Parav Pandit <parav@mellanox.com> > > Cc: Shiraz Saleem <shiraz.saleem@intel.com>; linux-rdma@vger.kernel.org; > > SWise OGC <swise@opengridcomputing.com>; sean.hefty@intel.com > > Subject: Re: iwarp kernel mode applications are broken with commit f35faa4ba > > > > On Friday, April 04/27/18, 2018 at 03:36:18 +0000, Parav Pandit wrote: > > > > Parav, > > > > > Hi Shiraz, Raju, > > > > > > > -----Original Message----- > > > > My validation team reported the same issue on i40iw with 4.17-rc kernels. > > > > > > > > Some more data. Looks like the failure is because we can't find the > > > > cached gid due to the gid idx being wrong in query_gid. > > > > > > > > rdma_connect > > > > cma_connect_iw > > > > cma_modify_qp_rtr > > > > ib_query_gid > > > > ib_get_cached_gid > > > > __ib_cache_gid_get (EINVAL) > > > > > > > This call trace is helpful. > > > Can you please run ibv_devinfo -v | grep GID and see that you are getting the > > expected GID. > > > For iWarp we have only one entry GID table. So want to make sure that GID > > table is build correctly. > > > > > > From the above call trace, is appears that, > > > cma_modify_qp_rtr() contains, > > > struct ib_qp_attr qp_attr; > > > > > > ah_attr from above qp_attr remains uninitialized by iw_cm_init_qp_attr() and > > iwcm_init_qp_init_attr(). > > > Before my fix, gid_index was always ignored by the query_gid() callback such > > as i40iw_query_gid() and c4iw_query_gid(). > > > So it used to work. > > > Now my fix expects all values to be correct; due to uninitialized ib_qp_attr it is > > likely failing. > > > > > > So can you please try below hunk and see if ib_query_gid() progresses for you? > > > If it works, I will send the proper patch shortly. > > > > > > > Its working. Issue is not seen with the below patch. > > Thanks a lot for the quick test. > I will send the patch through Leon's queue for for-rc with your Tested-by tag. > Sure. Thanks Parav. > > > > > > > diff --git a/drivers/infiniband/core/cma.c > > > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644 > > > --- a/drivers/infiniband/core/cma.c > > > +++ b/drivers/infiniband/core/cma.c > > > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id) > > > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > > > struct rdma_conn_param *conn_param) { > > > - struct ib_qp_attr qp_attr; > > > + struct ib_qp_attr qp_attr = {}; > > > int qp_attr_mask, ret; > > > union ib_gid sgid; > > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Shiraz, > -----Original Message----- > From: Saleem, Shiraz [mailto:shiraz.saleem@intel.com] > Sent: Friday, April 27, 2018 1:26 PM > To: Parav Pandit <parav@mellanox.com>; Raju Rangoju <rajur@chelsio.com> > Cc: linux-rdma@vger.kernel.org; SWise OGC <swise@opengridcomputing.com>; > Hefty, Sean <sean.hefty@intel.com> > Subject: RE: iwarp kernel mode applications are broken with commit f35faa4ba > > >Subject: RE: iwarp kernel mode applications are broken with commit > >f35faa4ba > > > >> > >> > diff --git a/drivers/infiniband/core/cma.c > >> > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644 > >> > --- a/drivers/infiniband/core/cma.c > >> > +++ b/drivers/infiniband/core/cma.c > >> > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id) > >> > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > >> > struct rdma_conn_param *conn_param) { > >> > - struct ib_qp_attr qp_attr; > >> > + struct ib_qp_attr qp_attr = {}; > >> > int qp_attr_mask, ret; > >> > union ib_gid sgid; > >> > > >-- > > Hi Parav - The patch resolves the issue. > But shouldn't we remove ib_query_gid() from cma_modify_qp_rtr() altogether? > What is its purpose in that function? I agree it is not needed. I looked at the commit dd5f03beb4f7 from Matan where sgid was used for smac address. But now the code has improved a lot over it and this query_gid should have been removed when smac thing was refactored, but it was not. For this rc cycle I think we just keep it and only make qp_attr {} which is anyway good thing to do, just trying to avoid too many changes in rc now. But I am fine either ways. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Hi Shiraz, > > > -----Original Message----- > > From: Saleem, Shiraz [mailto:shiraz.saleem@intel.com] > > Sent: Friday, April 27, 2018 1:26 PM > > To: Parav Pandit <parav@mellanox.com>; Raju Rangoju > <rajur@chelsio.com> > > Cc: linux-rdma@vger.kernel.org; SWise OGC > <swise@opengridcomputing.com>; > > Hefty, Sean <sean.hefty@intel.com> > > Subject: RE: iwarp kernel mode applications are broken with commit > f35faa4ba > > > > >Subject: RE: iwarp kernel mode applications are broken with commit > > >f35faa4ba > > > > > >> > > >> > diff --git a/drivers/infiniband/core/cma.c > > >> > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644 > > >> > --- a/drivers/infiniband/core/cma.c > > >> > +++ b/drivers/infiniband/core/cma.c > > >> > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id) > > >> > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > > >> > struct rdma_conn_param *conn_param) { > > >> > - struct ib_qp_attr qp_attr; > > >> > + struct ib_qp_attr qp_attr = {}; > > >> > int qp_attr_mask, ret; > > >> > union ib_gid sgid; > > >> > > > >-- > > > > Hi Parav - The patch resolves the issue. > > But shouldn't we remove ib_query_gid() from cma_modify_qp_rtr() > altogether? > > What is its purpose in that function? > I agree it is not needed. I looked at the commit dd5f03beb4f7 from Matan > where sgid was used for smac address. > But now the code has improved a lot over it and this query_gid should have > been removed when smac thing was refactored, but it was not. > For this rc cycle I think we just keep it and only make qp_attr {} which is > anyway good thing to do, just trying to avoid too many changes in rc now. > But I am fine either ways. Let's just initialize qp_attr for this -rc. Please post a patch for Doug to merge. And tag it for -stable if that is needed. Thanks, Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Steve Wise [mailto:swise@opengridcomputing.com] > Sent: Friday, April 27, 2018 2:21 PM > To: Parav Pandit <parav@mellanox.com>; 'Saleem, Shiraz' > <shiraz.saleem@intel.com>; 'Raju Rangoju' <rajur@chelsio.com> > Cc: linux-rdma@vger.kernel.org; 'Hefty, Sean' <sean.hefty@intel.com> > Subject: RE: iwarp kernel mode applications are broken with commit f35faa4ba > > > Hi Shiraz, > > > > > -----Original Message----- > > > From: Saleem, Shiraz [mailto:shiraz.saleem@intel.com] > > > Sent: Friday, April 27, 2018 1:26 PM > > > To: Parav Pandit <parav@mellanox.com>; Raju Rangoju > > <rajur@chelsio.com> > > > Cc: linux-rdma@vger.kernel.org; SWise OGC > > <swise@opengridcomputing.com>; > > > Hefty, Sean <sean.hefty@intel.com> > > > Subject: RE: iwarp kernel mode applications are broken with commit > > f35faa4ba > > > > > > >Subject: RE: iwarp kernel mode applications are broken with commit > > > >f35faa4ba > > > > > > > >> > > > >> > diff --git a/drivers/infiniband/core/cma.c > > > >> > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644 > > > >> > --- a/drivers/infiniband/core/cma.c > > > >> > +++ b/drivers/infiniband/core/cma.c > > > >> > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id) > > > >> > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > > > >> > struct rdma_conn_param *conn_param) { > > > >> > - struct ib_qp_attr qp_attr; > > > >> > + struct ib_qp_attr qp_attr = {}; > > > >> > int qp_attr_mask, ret; > > > >> > union ib_gid sgid; > > > >> > > > > >-- > > > > > > Hi Parav - The patch resolves the issue. > > > But shouldn't we remove ib_query_gid() from cma_modify_qp_rtr() > > altogether? > > > What is its purpose in that function? > > I agree it is not needed. I looked at the commit dd5f03beb4f7 from > > Matan where sgid was used for smac address. > > But now the code has improved a lot over it and this query_gid should > > have been removed when smac thing was refactored, but it was not. > > For this rc cycle I think we just keep it and only make qp_attr {} > > which > is > > anyway good thing to do, just trying to avoid too many changes in rc now. > > But I am fine either ways. > > Let's just initialize qp_attr for this -rc. Please post a patch for Doug to merge. > And tag it for -stable if that is needed. ok Steve, we usually let Leon post it. I have already enqueued to Leon's internal tree, so mostly Sun/Mon it should be out to the ML. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id) static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { - struct ib_qp_attr qp_attr; + struct ib_qp_attr qp_attr = {}; int qp_attr_mask, ret; union ib_gid sgid;