Message ID | 20171013184347.28410.87093.stgit@phlsvslse11.ph.intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
PiBUaGUgTUxJRCB2YWx1ZSBwYXNzZWQgdG8gaWJ2X2F0dGFjaC9kZXRhY2hfbWNhc3QoKSBtdXN0 IGJlIGluIGhvc3QNCj4gYnl0ZSBvcmRlci4NCj4gDQo+IGFjbXAuYyBpbmNvcnJlY3RseSB1c2Vz IHRoZSBiaWcgZW5kaWFuIGZvcm1hdCB3aGVuIGRvaW5nIGEgbXVsdGljYXN0DQo+IGF0dGFjaC9k ZXRhY2ggKGpvaW4pLiBNdWx0aWNhc3QgcGFja2V0cyBhcmUgdXNlZCB0byBkbyBuYW1lIHJlc29s dXRpb24NCj4gYnkgdGhlIGxpYmliYWNtcCBsaWJyYXJ5Lg0KPiANCj4gVGhlcmUgYXJlIHR3byBw b3NzaWJsZSByZXN1bHRzIGJlY2F1c2Ugb2YgdGhpcyBpc3N1ZS4NCj4gDQo+IElmIGEga2VybmVs IGhhcyBjb21taXQgMDBiOGEzMzUxYjJiLCB0aGUgYXR0YWNoIHdpbGwgZmFpbCB3aXRoIGFuDQo+ IEVJTlZBTC4gIGliYWNtIHdpbGwgbG9nIHRoaXMgYXMgYSBmYWlsdXJlIGR1cmluZyB0aGUgbXVs dGljYXN0IGpvaW4uDQo+IA0KPiBJZiBhIGtlcm5lbCBkb2VzIG5vdCBoYXZlIGNvbW1pdCAwMGI4 YTMzNTFiMmIsIHRoZSBhdHRhY2ggd2lsbA0KPiBjb21wbGV0ZSBzdWNjZXNzZnVsbHkuICBQYWNr ZXRzIHNlbnQgdG8gdGhpcyBhZGRyZXNzIHdpbGwgYmUgZHJvcHBlZA0KPiBiZWNhdXNlIHRoZSBw YWNrZXQgZGxpZCB2YWx1ZSBhbmQgdGhlIG11bHRpY2FzdCBhZGRyZXNzIGluZm9ybWF0aW9uDQo+ IGdpdmVuIGJ5IHRoZSBhdHRhY2ggd2lsbCBub3QgbWF0Y2guDQo+IA0KPiBVcGRhdGUgTUxJRCB1 c2FnZSB0byB1c2UgdGhlIGNvcnJlY3QgYnl0ZSBvcmRlci4NCj4gDQo+IFJldmlld2VkLWJ5OiBN aWtlIE1hcmNpbmlzenluIDxtaWtlLm1hcmNpbmlzenluQGludGVsLmNvbT4NCj4gU2lnbmVkLW9m Zi1ieTogTWljaGFlbCBKLiBSdWhsIDxtaWNoYWVsLmoucnVobEBpbnRlbC5jb20+DQo+IC0tLQ0K PiAgaWJhY20vcHJvdi9hY21wL3NyYy9hY21wLmMgfCAgICA0ICsrLS0NCj4gIDEgZmlsZXMgY2hh bmdlZCwgMiBpbnNlcnRpb25zKCspLCAyIGRlbGV0aW9ucygtKQ0KPiANCj4gZGlmZiAtLWdpdCBh L2liYWNtL3Byb3YvYWNtcC9zcmMvYWNtcC5jIGIvaWJhY20vcHJvdi9hY21wL3NyYy9hY21wLmMN Cj4gaW5kZXggYWE3ODQxNi4uNzhkOWEyOSAxMDA2NDQNCj4gLS0tIGEvaWJhY20vcHJvdi9hY21w L3NyYy9hY21wLmMNCj4gKysrIGIvaWJhY20vcHJvdi9hY21wL3NyYy9hY21wLmMNCj4gQEAgLTcz Miw3ICs3MzIsNyBAQCBzdGF0aWMgdm9pZCBhY21wX3Byb2Nlc3Nfam9pbl9yZXNwKHN0cnVjdA0K PiBhY21fc2FfbWFkICpzYV9tYWQpDQo+ICAJCQlhY21fbG9nKDAsICJFUlJPUiAtIHVuYWJsZSB0 byBjcmVhdGUgYWhcbiIpOw0KPiAgCQkJZ290byBvdXQ7DQo+ICAJCX0NCj4gLQkJcmV0ID0gaWJ2 X2F0dGFjaF9tY2FzdChlcC0+cXAsICZtY19yZWMtPm1naWQsIG1jX3JlYy0NCj4gPm1saWQpOw0K PiArCQlyZXQgPSBpYnZfYXR0YWNoX21jYXN0KGVwLT5xcCwgJmRlc3QtPm1naWQsIGRlc3QtDQo+ ID5hdi5kbGlkKTsNCj4gIAkJaWYgKHJldCkgew0KPiAgCQkJYWNtX2xvZygwLCAiRVJST1IgLSB1 bmFibGUgdG8gYXR0YWNoIFFQIHRvIG11bHRpY2FzdA0KPiBncm91cFxuIik7DQo+ICAJCQlpYnZf ZGVzdHJveV9haChkZXN0LT5haCk7DQo+IEBAIC0xNDI5LDcgKzE0MjksNyBAQCBzdGF0aWMgdm9p ZCBhY21wX2VwX2pvaW4oc3RydWN0IGFjbXBfZXAgKmVwKQ0KPiANCj4gIAlpZiAoZXAtPm1jX2Rl c3RbMF0uc3RhdGUgPT0gQUNNUF9SRUFEWSAmJiBlcC0+bWNfZGVzdFswXS5haCkgew0KPiAgCQlp YnZfZGV0YWNoX21jYXN0KGVwLT5xcCwgJmVwLT5tY19kZXN0WzBdLm1naWQsDQo+IC0JCQkJIGJl MTZ0b2goZXAtPm1jX2Rlc3RbMF0uYXYuZGxpZCkpOw0KPiArCQkJCSBlcC0+bWNfZGVzdFswXS5h di5kbGlkKTsNCj4gIAkJaWJ2X2Rlc3Ryb3lfYWgoZXAtPm1jX2Rlc3RbMF0uYWgpOw0KPiAgCQll cC0+bWNfZGVzdFswXS5haCA9IE5VTEw7DQo+ICAJfQ0KDQpDaGFuZ2VzIGxvb2sgY29ycmVjdCBm b3IgYm90aCBwYXRjaGVzLg0KDQpBY2tlZC1ieTogU2VhbiBIZWZ0eSA8c2Vhbi5oZWZ0eUBpbnRl bC5jb20+DQoNCkl0IHdvdWxkIGJlIG5pY2UgdG8gdW5kZXJzdGFuZCBob3cgdGhlIGNvZGUgd2Fz IHdvcmtpbmcgaW4gdGhlIHBhc3QuICBBdCBsZWFzdCBpYmFjbSBoYXMgYmVlbiBhYmxlIHRvIHJl cG9ydCBjYWNoZWQgZGF0YS4gIEFsbCBub2RlcyB3b3VsZCBoYXZlIGpvaW5lZCB0aGUgd3Jvbmcg bWNhc3QgZ3JvdXAsIGJ1dCBhcyB5b3UgbWVudGlvbiB0aGUgZGxpZCBpbiB0aGUgQVYgd291bGRu J3QgaGF2ZSBtYXRjaGVkLiAgSSB0cmllZCBsb29raW5nIGJhY2sgdGhyb3VnaCB0aGUgaWJhY20g aGlzdG9yeSwgYnV0IGRpZG4ndCBzZWUgYW55IHJlbGV2YW50IGNoYW5nZXMuIA0K -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Hefty, Sean > Sent: Friday, October 13, 2017 3:37 PM > To: Ruhl, Michael J <michael.j.ruhl@intel.com> > Cc: linux-rdma@vger.kernel.org; hal@dev.mellanox.co.il > Subject: RE: [PATCH 1/2] ibacm: Incorrect usage of BE byte order of MLID > attach/detach_mcast() > > > The MLID value passed to ibv_attach/detach_mcast() must be in host > > byte order. > > > > acmp.c incorrectly uses the big endian format when doing a multicast > > attach/detach (join). Multicast packets are used to do name resolution > > by the libibacmp library. > > > > There are two possible results because of this issue. > > > > If a kernel has commit 00b8a3351b2b, the attach will fail with an > > EINVAL. ibacm will log this as a failure during the multicast join. > > > > If a kernel does not have commit 00b8a3351b2b, the attach will > > complete successfully. Packets sent to this address will be dropped > > because the packet dlid value and the multicast address information > > given by the attach will not match. > > > > Update MLID usage to use the correct byte order. > > > > Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> > > Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> > > --- > > ibacm/prov/acmp/src/acmp.c | 4 ++-- > > 1 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/ibacm/prov/acmp/src/acmp.c b/ibacm/prov/acmp/src/acmp.c > > index aa78416..78d9a29 100644 > > --- a/ibacm/prov/acmp/src/acmp.c > > +++ b/ibacm/prov/acmp/src/acmp.c > > @@ -732,7 +732,7 @@ static void acmp_process_join_resp(struct > > acm_sa_mad *sa_mad) > > acm_log(0, "ERROR - unable to create ah\n"); > > goto out; > > } > > - ret = ibv_attach_mcast(ep->qp, &mc_rec->mgid, mc_rec- > > >mlid); > > + ret = ibv_attach_mcast(ep->qp, &dest->mgid, dest- > > >av.dlid); > > if (ret) { > > acm_log(0, "ERROR - unable to attach QP to multicast > > group\n"); > > ibv_destroy_ah(dest->ah); > > @@ -1429,7 +1429,7 @@ static void acmp_ep_join(struct acmp_ep *ep) > > > > if (ep->mc_dest[0].state == ACMP_READY && ep->mc_dest[0].ah) { > > ibv_detach_mcast(ep->qp, &ep->mc_dest[0].mgid, > > - be16toh(ep->mc_dest[0].av.dlid)); > > + ep->mc_dest[0].av.dlid); > > ibv_destroy_ah(ep->mc_dest[0].ah); > > ep->mc_dest[0].ah = NULL; > > } > > Changes look correct for both patches. > > Acked-by: Sean Hefty <sean.hefty@intel.com> > > It would be nice to understand how the code was working in the past. At > least ibacm has been able to report cached data. All nodes would have > joined the wrong mcast group, but as you mention the dlid in the AV > wouldn't have matched. I tried looking back through the ibacm history, but > didn't see any relevant changes. I don’t believe that the MLID value was checked. I added a patch several months ago to verify that the MLID was a mcast lid (in ib_attach_mcast). And it may be that this is what caused this to stop working. M
On Fri, Oct 13, 2017 at 02:43:47PM -0400, Michael J. Ruhl wrote: > From: Michael J. Ruhl <michael.j.ruhl@intel.com> > > The MLID value passed to ibv_attach/detach_mcast() must be in host > byte order. > > acmp.c incorrectly uses the big endian format when doing a multicast > attach/detach (join). Multicast packets are used to do name resolution > by the libibacmp library. > > There are two possible results because of this issue. > > If a kernel has commit 00b8a3351b2b, the attach will fail with an > EINVAL. ibacm will log this as a failure during the multicast join. Kernel doesn't have such commit. https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git/commit/?h=k.o/for-next&id=00b8a3351b2b > > If a kernel does not have commit 00b8a3351b2b, the attach will > complete successfully. Packets sent to this address will be dropped > because the packet dlid value and the multicast address information > given by the attach will not match. > > Update MLID usage to use the correct byte order. > > Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> > Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> > --- > ibacm/prov/acmp/src/acmp.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/ibacm/prov/acmp/src/acmp.c b/ibacm/prov/acmp/src/acmp.c > index aa78416..78d9a29 100644 > --- a/ibacm/prov/acmp/src/acmp.c > +++ b/ibacm/prov/acmp/src/acmp.c > @@ -732,7 +732,7 @@ static void acmp_process_join_resp(struct acm_sa_mad *sa_mad) > acm_log(0, "ERROR - unable to create ah\n"); > goto out; > } > - ret = ibv_attach_mcast(ep->qp, &mc_rec->mgid, mc_rec->mlid); > + ret = ibv_attach_mcast(ep->qp, &dest->mgid, dest->av.dlid); > if (ret) { > acm_log(0, "ERROR - unable to attach QP to multicast group\n"); > ibv_destroy_ah(dest->ah); > @@ -1429,7 +1429,7 @@ static void acmp_ep_join(struct acmp_ep *ep) > > if (ep->mc_dest[0].state == ACMP_READY && ep->mc_dest[0].ah) { > ibv_detach_mcast(ep->qp, &ep->mc_dest[0].mgid, > - be16toh(ep->mc_dest[0].av.dlid)); > + ep->mc_dest[0].av.dlid); > ibv_destroy_ah(ep->mc_dest[0].ah); > ep->mc_dest[0].ah = NULL; > } > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/ibacm/prov/acmp/src/acmp.c b/ibacm/prov/acmp/src/acmp.c index aa78416..78d9a29 100644 --- a/ibacm/prov/acmp/src/acmp.c +++ b/ibacm/prov/acmp/src/acmp.c @@ -732,7 +732,7 @@ static void acmp_process_join_resp(struct acm_sa_mad *sa_mad) acm_log(0, "ERROR - unable to create ah\n"); goto out; } - ret = ibv_attach_mcast(ep->qp, &mc_rec->mgid, mc_rec->mlid); + ret = ibv_attach_mcast(ep->qp, &dest->mgid, dest->av.dlid); if (ret) { acm_log(0, "ERROR - unable to attach QP to multicast group\n"); ibv_destroy_ah(dest->ah); @@ -1429,7 +1429,7 @@ static void acmp_ep_join(struct acmp_ep *ep) if (ep->mc_dest[0].state == ACMP_READY && ep->mc_dest[0].ah) { ibv_detach_mcast(ep->qp, &ep->mc_dest[0].mgid, - be16toh(ep->mc_dest[0].av.dlid)); + ep->mc_dest[0].av.dlid); ibv_destroy_ah(ep->mc_dest[0].ah); ep->mc_dest[0].ah = NULL; }