Message ID | 1434358036-15526-5-git-send-email-haggaie@mellanox.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
> drivers/infiniband/core/cm.c | 7 +++++++ > include/rdma/ib_cm.h | 2 ++ > 2 files changed, 9 insertions(+) > > diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c > index c5f5f89e274a..46f99ec4080a 100644 > --- a/drivers/infiniband/core/cm.c > +++ b/drivers/infiniband/core/cm.c > @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work > *work, > param->pkey = __be16_to_cpu(sidr_req_msg->pkey); > param->listen_id = listen_id; > param->service_id = sidr_req_msg->service_id; > + if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) { > + param->grh = 1; > + memcpy(¶m->dgid, &work->mad_recv_wc->recv_buf.grh->dgid, > + sizeof(param->dgid)); > + } else { > + param->grh = 0; What is the use case here? Are you trying to sort by device? How does the GID of the GMP relate to the listen? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 15, 2015 at 09:32:53PM +0000, Hefty, Sean wrote: > > drivers/infiniband/core/cm.c | 7 +++++++ > > include/rdma/ib_cm.h | 2 ++ > > 2 files changed, 9 insertions(+) > > > > diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c > > index c5f5f89e274a..46f99ec4080a 100644 > > +++ b/drivers/infiniband/core/cm.c > > @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work > > *work, > > param->pkey = __be16_to_cpu(sidr_req_msg->pkey); > > param->listen_id = listen_id; > > param->service_id = sidr_req_msg->service_id; > > + if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) { > > + param->grh = 1; > > + memcpy(¶m->dgid, &work->mad_recv_wc->recv_buf.grh->dgid, > > + sizeof(param->dgid)); > > + } else { > > + param->grh = 0; > > What is the use case here? Are you trying to sort by device? How does the GID of the GMP relate to the listen? Ouch, that is getting fugly, it used by cma_save_req_info, which feeds the data into ib_get_net_dev_by_params - basically it chooses the alias GUID'd netdev to use. But how is that going to work? How is the sender to know it should be sending a GRH with the CM message? Falling back to use the primary_path sgid seems like a poor substitute, if APM is being used that might be a totally different port than the CM message. I'm also not sure about the pkey, it seems to me that the pkey used to select the ingress netdevice should be the pkey of the rx'd CM GMP, not the pkey of the future RDMA channel, so this looks like it should change: +static int cma_save_req_info(const struct ib_cm_event *ib_event, + struct cma_req_info *req) [..] + switch (ib_event->event) { + case IB_CM_REQ_RECEIVED: + req->pkey = be16_to_cpu(req_param->primary_path->pkey); [..] + case IB_CM_SIDR_REQ_RECEIVED: + req->pkey = sidr_param->pkey; Some comment for the GID, if the GRH is present, then the DGID from there should alwas be used, not the content of the REQ. All this is because the CM IP protocol didn't include the LLADDR of the target's IPoIB interface.. If we are already looking at a breaking change to force GRH, how hard would it be to add on the LLADDR somehow instead? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 16/06/2015 00:32, Hefty, Sean wrote: >> drivers/infiniband/core/cm.c | 7 +++++++ >> include/rdma/ib_cm.h | 2 ++ >> 2 files changed, 9 insertions(+) >> >> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c >> index c5f5f89e274a..46f99ec4080a 100644 >> --- a/drivers/infiniband/core/cm.c >> +++ b/drivers/infiniband/core/cm.c >> @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work >> *work, >> param->pkey = __be16_to_cpu(sidr_req_msg->pkey); >> param->listen_id = listen_id; >> param->service_id = sidr_req_msg->service_id; >> + if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) { >> + param->grh = 1; >> + memcpy(¶m->dgid, &work->mad_recv_wc->recv_buf.grh->dgid, >> + sizeof(param->dgid)); >> + } else { >> + param->grh = 0; > > What is the use case here? Are you trying to sort by device? How does the GID of the GMP relate to the listen? The idea is to allow SIDR request to be sorted by the GID, when we will have alias GIDs for IPoIB. Unlike the CM requests, SIDR requests do not contain the remote GID, so I thought we could use the GID from the GRH and turn on GRH on such systems. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 16/06/2015 01:08, Jason Gunthorpe wrote: > On Mon, Jun 15, 2015 at 09:32:53PM +0000, Hefty, Sean wrote: >>> drivers/infiniband/core/cm.c | 7 +++++++ >>> include/rdma/ib_cm.h | 2 ++ >>> 2 files changed, 9 insertions(+) >>> >>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c >>> index c5f5f89e274a..46f99ec4080a 100644 >>> +++ b/drivers/infiniband/core/cm.c >>> @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work >>> *work, >>> param->pkey = __be16_to_cpu(sidr_req_msg->pkey); >>> param->listen_id = listen_id; >>> param->service_id = sidr_req_msg->service_id; >>> + if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) { >>> + param->grh = 1; >>> + memcpy(¶m->dgid, &work->mad_recv_wc->recv_buf.grh->dgid, >>> + sizeof(param->dgid)); >>> + } else { >>> + param->grh = 0; >> >> What is the use case here? Are you trying to sort by device? How does the GID of the GMP relate to the listen? > > Ouch, that is getting fugly, it used by cma_save_req_info, which feeds > the data into ib_get_net_dev_by_params - basically it chooses the > alias GUID'd netdev to use. > > But how is that going to work? How is the sender to know it should be > sending a GRH with the CM message? If the admin wants to use SIDR with alias GIDs, they will need to configure the system to enable GRH for such GMPs. (This series doesn't include such a patch). > > Falling back to use the primary_path sgid seems like a poor > substitute, if APM is being used that might be a totally different > port than the CM message. Note that the patchset uses primary_path for CM requests always (not as a fallback), and uses GRH as a fallback for SIDR requests. Regarding APM, currently the ib_cm code always sends the GMP to the primary path anyway, right? And in any case, one would expect the primary path's GID to have a valid net_device and local routing rules, so I think for the purpose of demuxing and validating the request using the primary path should be fine. > > I'm also not sure about the pkey, it seems to me that the pkey used to > select the ingress netdevice should be the pkey of the rx'd CM GMP, > not the pkey of the future RDMA channel, so this looks like it should > change: > > +static int cma_save_req_info(const struct ib_cm_event *ib_event, > + struct cma_req_info *req) > [..] > + switch (ib_event->event) { > + case IB_CM_REQ_RECEIVED: > + req->pkey = be16_to_cpu(req_param->primary_path->pkey); > [..] > + case IB_CM_SIDR_REQ_RECEIVED: > + req->pkey = sidr_param->pkey; > > Some comment for the GID, if the GRH is present, then the DGID from > there should alwas be used, not the content of the REQ. Why do you think the GMP's net_device should be used over the one of the future RDMA channel? Thinking about the network namespaces implications, I'm having trouble thinking of a good use case where a port redirector is in one namespace and the future RDMA channel is in another namespace. > > All this is because the CM IP protocol didn't include the LLADDR of > the target's IPoIB interface.. If we are already looking at a breaking > change to force GRH, how hard would it be to add on the LLADDR > somehow instead? So far we can work without GRH for CM requests, and also without GRH for SIDR requests if we rely on layer 3 for the interface resolution. I'm not against adding a LLADDR to the protocol somehow, but I don't think we should abandon all these use cases and the interoperability with existing software. Regards, Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> The idea is to allow SIDR request to be sorted by the GID, when we will > have alias GIDs for IPoIB. Please limit this series, or at least the early patches in this series, to simply moving the de-mux out of the ib_cm and into the rdma_cm. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday, June 16, 2015 7:47 PM, Hefty, Sean <sean.hefty@intel.com> wrote: > To: Haggai Eran; Doug Ledford > Cc: linux-rdma@vger.kernel.org; netdev@vger.kernel.org; Liran Liss; Guy Shapiro; Shachar Raindel; Yotam Kenneth; Jason Gunthorpe > Subject: RE: [PATCH 04/11] IB/cm: Expose DGID in SIDR request events > >> The idea is to allow SIDR request to be sorted by the GID, when we will >> have alias GIDs for IPoIB. > > Please limit this series, or at least the early patches in this series, to simply moving the de-mux out of the ib_cm and into the rdma_cm. Fair enough. I'll remove this patch and the related code from this series. Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 16, 2015 at 02:25:07PM +0300, Haggai Eran wrote: > > But how is that going to work? How is the sender to know it should be > > sending a GRH with the CM message? > > If the admin wants to use SIDR with alias GIDs, they will need to > configure the system to enable GRH for such GMPs. (This series doesn't > include such a patch). Gross. > Regarding APM, currently the ib_cm code always sends the GMP to the > primary path anyway, right? And in any case, one would expect the > primary path's GID to have a valid net_device and local routing rules, > so I think for the purpose of demuxing and validating the request using > the primary path should be fine. The current code works that way, but it is not what I'd expect generally. For instance, future APM support will be able to drive dual-rail and policy will decide which rail is the current best rail for data transfer. So the GMP may be directed to the IPoIB device with port 1, but the data transfer may happen on the RDMA port 2. [Note, I already have very rough patches that do this de-coupling] > Why do you think the GMP's net_device should be used over the one of the > future RDMA channel? The code needs to match the incoming GMP with the logical netdev that rx's *that GMP*. The fact that goes on to setup an RDMA channel is not relevant, the nature of the future RDMA channel should not impact how the GMP is recieved. > So far we can work without GRH for CM requests, and also without GRH for > SIDR requests if we rely on layer 3 for the interface resolution. I'm > not against adding a LLADDR to the protocol somehow, but I don't think > we should abandon all these use cases and the interoperability with > existing software. Well, there is a middle ground. Lets say we get the LLADDR in the GMP somehow, then we get 100% correct operation when it is present. For degraded operation we have the (device,port,pkey) and possibly (device,port,pkey,gid) if there was a GRH. We also have the IP address hack. So, I'd say, search in this sequence: - If the LLADDR is present, just find the right netdev - Otherwise search for (device,port,pkey) / (device,port,pkey,gid) If there is only one match then that is unambiguously the correct device to use. - Repeat the above search, but add the IP address. This is the hack we perform for compatibility. There is no reason to hackily look at the GMP path parameters if we are relying on #3 above as the hack to save us in the legacy ambiguous case. .. and to answer your question in the other email, I think we should keep the hack clearly distinct from the proper operation (in fact, perhaps it should be user configurable). So #3 should be a distinct step taken when all else fails, not integrated into earlier steps. So, this series as it stands just needs to do #2/#3 above and you guys can figure out the LLADDR business later. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17/06/2015 20:06, Jason Gunthorpe wrote: > On Tue, Jun 16, 2015 at 02:25:07PM +0300, Haggai Eran wrote: >> Regarding APM, currently the ib_cm code always sends the GMP to the >> primary path anyway, right? And in any case, one would expect the >> primary path's GID to have a valid net_device and local routing rules, >> so I think for the purpose of demuxing and validating the request using >> the primary path should be fine. > > The current code works that way, but it is not what I'd expect > generally. > > For instance, future APM support will be able to drive dual-rail and > policy will decide which rail is the current best rail for data > transfer. So the GMP may be directed to the IPoIB device with port 1, > but the data transfer may happen on the RDMA port 2. [Note, I already > have very rough patches that do this de-coupling] > >> Why do you think the GMP's net_device should be used over the one of the >> future RDMA channel? > > The code needs to match the incoming GMP with the logical netdev that > rx's *that GMP*. The fact that goes on to setup an RDMA channel is not > relevant, the nature of the future RDMA channel should not impact how > the GMP is recieved. From what I understand, ib_cm and rdma_cm keeps their own addresses. I thought that ib_cm's addresses would be used to handle GMPs, and the rdma_cm addresses (id.route.addr) to represent the created RDMA channel. After all, that is what ucma_query_addr returns. So are you proposing that we use the logical netdev that was resolved by the GMP to fill up the source address returned to user-space? It sounds like it would prevent the APM usage you described above. > >> So far we can work without GRH for CM requests, and also without GRH for >> SIDR requests if we rely on layer 3 for the interface resolution. I'm >> not against adding a LLADDR to the protocol somehow, but I don't think >> we should abandon all these use cases and the interoperability with >> existing software. > > Well, there is a middle ground. Lets say we get the LLADDR in the GMP > somehow, then we get 100% correct operation when it is present. > > For degraded operation we have the (device,port,pkey) and possibly > (device,port,pkey,gid) if there was a GRH. We also have the IP address > hack. > > So, I'd say, search in this sequence: > - If the LLADDR is present, just find the right netdev > - Otherwise search for (device,port,pkey) / (device,port,pkey,gid) > If there is only one match then that is unambiguously the correct > device to use. > - Repeat the above search, but add the IP address. This is the hack > we perform for compatibility. > > There is no reason to hackily look at the GMP path parameters if we are > relying on #3 above as the hack to save us in the legacy ambiguous case. > > .. and to answer your question in the other email, I think we should > keep the hack clearly distinct from the proper operation (in fact, > perhaps it should be user configurable). So #3 should be a distinct > step taken when all else fails, not integrated into earlier steps. > > So, this series as it stands just needs to do #2/#3 above and you guys > can figure out the LLADDR business later. Okay. I can add a first search without the IP address. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index c5f5f89e274a..46f99ec4080a 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work *work, param->pkey = __be16_to_cpu(sidr_req_msg->pkey); param->listen_id = listen_id; param->service_id = sidr_req_msg->service_id; + if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) { + param->grh = 1; + memcpy(¶m->dgid, &work->mad_recv_wc->recv_buf.grh->dgid, + sizeof(param->dgid)); + } else { + param->grh = 0; + } param->port = work->port->port_num; work->cm_event.private_data = &sidr_req_msg->private_data; } diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h index 1b567bbc3ad4..3a5d70d79790 100644 --- a/include/rdma/ib_cm.h +++ b/include/rdma/ib_cm.h @@ -224,6 +224,8 @@ struct ib_cm_apr_event_param { struct ib_cm_sidr_req_event_param { struct ib_cm_id *listen_id; __be64 service_id; + union ib_gid dgid; + u8 grh:1; u8 port; u16 pkey; };
The destination GID can be used to uniquely resolve the request. Expose the GID in SIDR request events when it is available, so that the rdma_cm module can use that information. Signed-off-by: Haggai Eran <haggaie@mellanox.com> --- drivers/infiniband/core/cm.c | 7 +++++++ include/rdma/ib_cm.h | 2 ++ 2 files changed, 9 insertions(+)