Message ID | 168616682205.2099.4473975057644323224.stgit@oracle-102.nfsv4bat.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v1] RDMA/core: Handle ARPHRD_NONE devices for iWARP | expand |
> On Jun 7, 2023, at 3:43 PM, Chuck Lever <cel@kernel.org> wrote: > > From: Chuck Lever <chuck.lever@oracle.com> > > We would like to enable the use of siw on top of a VPN that is > constructed and managed via a tun device. That hasn't worked up > until now because ARPHRD_NONE devices (such as tun devices) have > no GID for the RDMA/core to look up. > > But it turns out that the egress device has already been picked for > us. addr_handler() just has to do the right thing with it. > > Tested with siw and qedr, both initiator and target. > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > drivers/infiniband/core/cma.c | 3 +++ > 1 file changed, 3 insertions(+) > > This of course needs broader testing, but it seems to work, and it's > a little nicer than "if (dev_type == ARPHRD_NONE)". > > One thing I discovered is that the NFS/RDMA server implementation > does not deal at all with more than one RDMA device on the system. > I will address that with an ib_client; SunRPC patches forthcoming. Or maybe not. I'm looking at cma_iw_acquire_dev(). Where is the listen_id_priv->id.port_num field supposed to be initialized? > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 56e568fcd32b..c9a2bdb49e3c 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, > if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) > return ERR_PTR(-ENODEV); > > + if (rdma_protocol_iwarp(device, port)) > + return rdma_get_gid_attr(device, port, 0); > + > if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port)) > return ERR_PTR(-ENODEV); > > > -- Chuck Lever
On 6/7/2023 3:43 PM, Chuck Lever wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > We would like to enable the use of siw on top of a VPN that is > constructed and managed via a tun device. That hasn't worked up > until now because ARPHRD_NONE devices (such as tun devices) have > no GID for the RDMA/core to look up. > > But it turns out that the egress device has already been picked for > us. addr_handler() just has to do the right thing with it. > > Tested with siw and qedr, both initiator and target. > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > drivers/infiniband/core/cma.c | 3 +++ > 1 file changed, 3 insertions(+) > > This of course needs broader testing, but it seems to work, and it's > a little nicer than "if (dev_type == ARPHRD_NONE)". > > One thing I discovered is that the NFS/RDMA server implementation > does not deal at all with more than one RDMA device on the system. > I will address that with an ib_client; SunRPC patches forthcoming. > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 56e568fcd32b..c9a2bdb49e3c 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, > if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) > return ERR_PTR(-ENODEV); > > + if (rdma_protocol_iwarp(device, port)) > + return rdma_get_gid_attr(device, port, 0); This might work, but I'm skeptical of the conditional. There's nothing special about iWARP that says a GID should come from the outgoing device mac. And, other protocols without IB GID addressing won't benefit. Wouldn't it be better to detect a missing GID, or infer the need from some other transport attribute? Tom. > + > if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port)) > return ERR_PTR(-ENODEV); > > > >
在 2023/6/10 4:49, Tom Talpey 写道: > On 6/7/2023 3:43 PM, Chuck Lever wrote: >> From: Chuck Lever <chuck.lever@oracle.com> >> >> We would like to enable the use of siw on top of a VPN that is >> constructed and managed via a tun device. That hasn't worked up >> until now because ARPHRD_NONE devices (such as tun devices) have >> no GID for the RDMA/core to look up. >> >> But it turns out that the egress device has already been picked for >> us. addr_handler() just has to do the right thing with it. >> >> Tested with siw and qedr, both initiator and target. >> >> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >> --- >> drivers/infiniband/core/cma.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> This of course needs broader testing, but it seems to work, and it's >> a little nicer than "if (dev_type == ARPHRD_NONE)". >> >> One thing I discovered is that the NFS/RDMA server implementation >> does not deal at all with more than one RDMA device on the system. >> I will address that with an ib_client; SunRPC patches forthcoming. >> >> diff --git a/drivers/infiniband/core/cma.c >> b/drivers/infiniband/core/cma.c >> index 56e568fcd32b..c9a2bdb49e3c 100644 >> --- a/drivers/infiniband/core/cma.c >> +++ b/drivers/infiniband/core/cma.c >> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, >> if (!rdma_dev_access_netns(device, >> id_priv->id.route.addr.dev_addr.net)) >> return ERR_PTR(-ENODEV); >> + if (rdma_protocol_iwarp(device, port)) >> + return rdma_get_gid_attr(device, port, 0); > > This might work, but I'm skeptical of the conditional. There's nothing > special about iWARP that says a GID should come from the outgoing device > mac. And, other protocols without IB GID addressing won't benefit. Agree with you. Other protocols, such as RXE, also need be handled. So a better solution than this needs. Zhu Yanjun > > Wouldn't it be better to detect a missing GID, or infer the need from > some other transport attribute? > > Tom. > >> + >> if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, >> port)) >> return ERR_PTR(-ENODEV); >> >> >>
On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote: > On 6/7/2023 3:43 PM, Chuck Lever wrote: > > From: Chuck Lever <chuck.lever@oracle.com> > > > > We would like to enable the use of siw on top of a VPN that is > > constructed and managed via a tun device. That hasn't worked up > > until now because ARPHRD_NONE devices (such as tun devices) have > > no GID for the RDMA/core to look up. > > > > But it turns out that the egress device has already been picked for > > us. addr_handler() just has to do the right thing with it. > > > > Tested with siw and qedr, both initiator and target. > > > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > --- > > drivers/infiniband/core/cma.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > This of course needs broader testing, but it seems to work, and it's > > a little nicer than "if (dev_type == ARPHRD_NONE)". > > > > One thing I discovered is that the NFS/RDMA server implementation > > does not deal at all with more than one RDMA device on the system. > > I will address that with an ib_client; SunRPC patches forthcoming. > > > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > > index 56e568fcd32b..c9a2bdb49e3c 100644 > > --- a/drivers/infiniband/core/cma.c > > +++ b/drivers/infiniband/core/cma.c > > @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, > > if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) > > return ERR_PTR(-ENODEV); > > + if (rdma_protocol_iwarp(device, port)) > > + return rdma_get_gid_attr(device, port, 0); > > This might work, but I'm skeptical of the conditional. There's nothing > special about iWARP that says a GID should come from the outgoing device > mac. And, other protocols without IB GID addressing won't benefit. The GID represents the source address, so it better come from the outgoing device or something is really wrong. iWARP gets a conditional because iwarp always has a single GID, other devices do not work that way. Jason
On 6/10/2023 8:05 AM, Jason Gunthorpe wrote: > On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote: >> On 6/7/2023 3:43 PM, Chuck Lever wrote: >>> From: Chuck Lever <chuck.lever@oracle.com> >>> >>> We would like to enable the use of siw on top of a VPN that is >>> constructed and managed via a tun device. That hasn't worked up >>> until now because ARPHRD_NONE devices (such as tun devices) have >>> no GID for the RDMA/core to look up. >>> >>> But it turns out that the egress device has already been picked for >>> us. addr_handler() just has to do the right thing with it. >>> >>> Tested with siw and qedr, both initiator and target. >>> >>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >>> --- >>> drivers/infiniband/core/cma.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> This of course needs broader testing, but it seems to work, and it's >>> a little nicer than "if (dev_type == ARPHRD_NONE)". >>> >>> One thing I discovered is that the NFS/RDMA server implementation >>> does not deal at all with more than one RDMA device on the system. >>> I will address that with an ib_client; SunRPC patches forthcoming. >>> >>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >>> index 56e568fcd32b..c9a2bdb49e3c 100644 >>> --- a/drivers/infiniband/core/cma.c >>> +++ b/drivers/infiniband/core/cma.c >>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, >>> if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) >>> return ERR_PTR(-ENODEV); >>> + if (rdma_protocol_iwarp(device, port)) >>> + return rdma_get_gid_attr(device, port, 0); >> >> This might work, but I'm skeptical of the conditional. There's nothing >> special about iWARP that says a GID should come from the outgoing device >> mac. And, other protocols without IB GID addressing won't benefit. > > The GID represents the source address, so it better come from the > outgoing device or something is really wrong. > > iWARP gets a conditional because iwarp always has a single GID, other > devices do not work that way. Not sure I follow. TCP is routable so it can use multiple egress ports. That same routing means an incoming packet bearing an appropriate local address will be accepted on any port. So still, I don't think this an iWARP property per se. It's coming from the transport and its addressing. I'd hope that this can be gleaned from something other than "it's iWARP, cma needs to do ...". Tom.
> On Jun 9, 2023, at 8:04 PM, Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > 在 2023/6/10 4:49, Tom Talpey 写道: >> On 6/7/2023 3:43 PM, Chuck Lever wrote: >>> From: Chuck Lever <chuck.lever@oracle.com> >>> >>> We would like to enable the use of siw on top of a VPN that is >>> constructed and managed via a tun device. That hasn't worked up >>> until now because ARPHRD_NONE devices (such as tun devices) have >>> no GID for the RDMA/core to look up. >>> >>> But it turns out that the egress device has already been picked for >>> us. addr_handler() just has to do the right thing with it. >>> >>> Tested with siw and qedr, both initiator and target. >>> >>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >>> --- >>> drivers/infiniband/core/cma.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> This of course needs broader testing, but it seems to work, and it's >>> a little nicer than "if (dev_type == ARPHRD_NONE)". >>> >>> One thing I discovered is that the NFS/RDMA server implementation >>> does not deal at all with more than one RDMA device on the system. >>> I will address that with an ib_client; SunRPC patches forthcoming. >>> >>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >>> index 56e568fcd32b..c9a2bdb49e3c 100644 >>> --- a/drivers/infiniband/core/cma.c >>> +++ b/drivers/infiniband/core/cma.c >>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, >>> if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) >>> return ERR_PTR(-ENODEV); >>> + if (rdma_protocol_iwarp(device, port)) >>> + return rdma_get_gid_attr(device, port, 0); >> This might work, but I'm skeptical of the conditional. There's nothing >> special about iWARP that says a GID should come from the outgoing device >> mac. And, other protocols without IB GID addressing won't benefit. Hi Zhu, thanks for having a look. > Agree with you. Other protocols, such as RXE, also need be handled. > So a better solution than this needs. I assume you mean, in particular, RoCEv2 with rxe on a non-Ethernet device? Tom and I discussed that possibility privately while I developed this patch. I do not object to that idea. But I would feel more comfortable if that was justified and tested via a separate patch. I expect that the logic for those cases will look different to what I'm adding here. > Zhu Yanjun > >> Wouldn't it be better to detect a missing GID, or infer the need from >> some other transport attribute? >> Tom. >>> + >>> if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port)) >>> return ERR_PTR(-ENODEV); -- Chuck Lever
> On Jun 10, 2023, at 12:38 PM, Tom Talpey <tom@talpey.com> wrote: > > On 6/10/2023 8:05 AM, Jason Gunthorpe wrote: >> On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote: >>> On 6/7/2023 3:43 PM, Chuck Lever wrote: >>>> From: Chuck Lever <chuck.lever@oracle.com> >>>> >>>> We would like to enable the use of siw on top of a VPN that is >>>> constructed and managed via a tun device. That hasn't worked up >>>> until now because ARPHRD_NONE devices (such as tun devices) have >>>> no GID for the RDMA/core to look up. >>>> >>>> But it turns out that the egress device has already been picked for >>>> us. addr_handler() just has to do the right thing with it. >>>> >>>> Tested with siw and qedr, both initiator and target. >>>> >>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >>>> --- >>>> drivers/infiniband/core/cma.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> This of course needs broader testing, but it seems to work, and it's >>>> a little nicer than "if (dev_type == ARPHRD_NONE)". >>>> >>>> One thing I discovered is that the NFS/RDMA server implementation >>>> does not deal at all with more than one RDMA device on the system. >>>> I will address that with an ib_client; SunRPC patches forthcoming. >>>> >>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >>>> index 56e568fcd32b..c9a2bdb49e3c 100644 >>>> --- a/drivers/infiniband/core/cma.c >>>> +++ b/drivers/infiniband/core/cma.c >>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, >>>> if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) >>>> return ERR_PTR(-ENODEV); >>>> + if (rdma_protocol_iwarp(device, port)) >>>> + return rdma_get_gid_attr(device, port, 0); >>> >>> This might work, but I'm skeptical of the conditional. There's nothing >>> special about iWARP that says a GID should come from the outgoing device >>> mac. And, other protocols without IB GID addressing won't benefit. >> The GID represents the source address, so it better come from the >> outgoing device or something is really wrong. >> iWARP gets a conditional because iwarp always has a single GID, other >> devices do not work that way. > > Not sure I follow. TCP is routable so it can use multiple egress ports. > That same routing means an incoming packet bearing an appropriate local > address will be accepted on any port. An siw virtual device, for example, is paired with one particular network i/f. I'm not sure that "multiple egress ports" is applicable. I would think this is also the case with a device-offloaded iWARP implementation: the egress device is always the i/f on the offload card. > So still, I don't think this an iWARP property per se. Agreed, it's not particular to the iWARP protocol. But it does seem to be particular to the entire class of iWARP protocol implementations on Linux. Adding a comment about this distinction would certainly be apropos. > It's coming from > the transport and its addressing. I'd hope that this can be gleaned from > something other than "it's iWARP, cma needs to do ...". I could add another flag somewhere to indicate this property of a device. I expect only Linux iWARP devices would set it, though, so would it really add much value? Do you have a recommendation for where to home such a flag? Would it be a fixed device property, or something that would be set dynamically during earlier steps of address resolution? Again, it feels like something that would be set iff rdma_protocol_iwarp() is true... -- Chuck Lever
在 2023/6/11 0:43, Chuck Lever III 写道: > >> On Jun 9, 2023, at 8:04 PM, Zhu Yanjun <yanjun.zhu@linux.dev> wrote: >> >> 在 2023/6/10 4:49, Tom Talpey 写道: >>> On 6/7/2023 3:43 PM, Chuck Lever wrote: >>>> From: Chuck Lever <chuck.lever@oracle.com> >>>> >>>> We would like to enable the use of siw on top of a VPN that is >>>> constructed and managed via a tun device. That hasn't worked up >>>> until now because ARPHRD_NONE devices (such as tun devices) have >>>> no GID for the RDMA/core to look up. >>>> >>>> But it turns out that the egress device has already been picked for >>>> us. addr_handler() just has to do the right thing with it. >>>> >>>> Tested with siw and qedr, both initiator and target. >>>> >>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >>>> --- >>>> drivers/infiniband/core/cma.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> This of course needs broader testing, but it seems to work, and it's >>>> a little nicer than "if (dev_type == ARPHRD_NONE)". >>>> >>>> One thing I discovered is that the NFS/RDMA server implementation >>>> does not deal at all with more than one RDMA device on the system. >>>> I will address that with an ib_client; SunRPC patches forthcoming. >>>> >>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >>>> index 56e568fcd32b..c9a2bdb49e3c 100644 >>>> --- a/drivers/infiniband/core/cma.c >>>> +++ b/drivers/infiniband/core/cma.c >>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, >>>> if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) >>>> return ERR_PTR(-ENODEV); >>>> + if (rdma_protocol_iwarp(device, port)) >>>> + return rdma_get_gid_attr(device, port, 0); >>> This might work, but I'm skeptical of the conditional. There's nothing >>> special about iWARP that says a GID should come from the outgoing device >>> mac. And, other protocols without IB GID addressing won't benefit. > Hi Zhu, thanks for having a look. > > >> Agree with you. Other protocols, such as RXE, also need be handled. >> So a better solution than this needs. > I assume you mean, in particular, RoCEv2 with rxe on a non-Ethernet > device? Tom and I discussed that possibility privately while I > developed this patch. I am not sure if I can get you correctly or not. I mean, if RXE is attached to a loopback device, the symptopms are similar to iWARP. So if you want to fix this problem on iWARP, it is better to also fix the similar problem on RXE. That is , your patch should fix the similar problems on all the rdma devices, not just for iWARP. I am not sure if this request is appropriate to you or not. Zhu Yanjun > > I do not object to that idea. But I would feel more comfortable if > that was justified and tested via a separate patch. I expect that > the logic for those cases will look different to what I'm adding here. > > >> Zhu Yanjun >> >>> Wouldn't it be better to detect a missing GID, or infer the need from >>> some other transport attribute? >>> Tom. >>>> + >>>> if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port)) >>>> return ERR_PTR(-ENODEV); > > -- > Chuck Lever > >
在 2023/6/10 20:05, Jason Gunthorpe 写道: > On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote: >> On 6/7/2023 3:43 PM, Chuck Lever wrote: >>> From: Chuck Lever <chuck.lever@oracle.com> >>> >>> We would like to enable the use of siw on top of a VPN that is >>> constructed and managed via a tun device. That hasn't worked up >>> until now because ARPHRD_NONE devices (such as tun devices) have >>> no GID for the RDMA/core to look up. >>> >>> But it turns out that the egress device has already been picked for >>> us. addr_handler() just has to do the right thing with it. >>> >>> Tested with siw and qedr, both initiator and target. >>> >>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >>> --- >>> drivers/infiniband/core/cma.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> This of course needs broader testing, but it seems to work, and it's >>> a little nicer than "if (dev_type == ARPHRD_NONE)". >>> >>> One thing I discovered is that the NFS/RDMA server implementation >>> does not deal at all with more than one RDMA device on the system. >>> I will address that with an ib_client; SunRPC patches forthcoming. >>> >>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >>> index 56e568fcd32b..c9a2bdb49e3c 100644 >>> --- a/drivers/infiniband/core/cma.c >>> +++ b/drivers/infiniband/core/cma.c >>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, >>> if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) >>> return ERR_PTR(-ENODEV); >>> + if (rdma_protocol_iwarp(device, port)) >>> + return rdma_get_gid_attr(device, port, 0); >> >> This might work, but I'm skeptical of the conditional. There's nothing >> special about iWARP that says a GID should come from the outgoing device >> mac. And, other protocols without IB GID addressing won't benefit. > > The GID represents the source address, so it better come from the > outgoing device or something is really wrong. > > iWARP gets a conditional because iwarp always has a single GID, other > devices do not work that way. Thanks. If I get this problem correctly, this problem is about "GID can not be generated correctly if a iWARP device is attached to a loopback device". This patch will fix this problem. So iWARP device can be managed via this unique GID. Do you mean that RXE does not have the similar problem? But if a RXE device is attached to a loopback device, the similar problem occurs. Zhu Yanjun > > Jason
On Sat, Jun 10, 2023 at 12:38:53PM -0400, Tom Talpey wrote: > On 6/10/2023 8:05 AM, Jason Gunthorpe wrote: > > On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote: > > > On 6/7/2023 3:43 PM, Chuck Lever wrote: > > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > > > > > We would like to enable the use of siw on top of a VPN that is > > > > constructed and managed via a tun device. That hasn't worked up > > > > until now because ARPHRD_NONE devices (such as tun devices) have > > > > no GID for the RDMA/core to look up. > > > > > > > > But it turns out that the egress device has already been picked for > > > > us. addr_handler() just has to do the right thing with it. > > > > > > > > Tested with siw and qedr, both initiator and target. > > > > > > > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > > > --- > > > > drivers/infiniband/core/cma.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > This of course needs broader testing, but it seems to work, and it's > > > > a little nicer than "if (dev_type == ARPHRD_NONE)". > > > > > > > > One thing I discovered is that the NFS/RDMA server implementation > > > > does not deal at all with more than one RDMA device on the system. > > > > I will address that with an ib_client; SunRPC patches forthcoming. > > > > > > > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > > > > index 56e568fcd32b..c9a2bdb49e3c 100644 > > > > --- a/drivers/infiniband/core/cma.c > > > > +++ b/drivers/infiniband/core/cma.c > > > > @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, > > > > if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) > > > > return ERR_PTR(-ENODEV); > > > > + if (rdma_protocol_iwarp(device, port)) > > > > + return rdma_get_gid_attr(device, port, 0); > > > > > > This might work, but I'm skeptical of the conditional. There's nothing > > > special about iWARP that says a GID should come from the outgoing device > > > mac. And, other protocols without IB GID addressing won't benefit. > > > > The GID represents the source address, so it better come from the > > outgoing device or something is really wrong. > > > > iWARP gets a conditional because iwarp always has a single GID, other > > devices do not work that way. > > Not sure I follow. TCP is routable so it can use multiple egress ports. > That same routing means an incoming packet bearing an appropriate local > address will be accepted on any port. sgid is about the outgoing device (and usually IP, but not for iwarp), and RDMA is always bound to a single outgoing netdevice > So still, I don't think this an iWARP property per se. It's coming from > the transport and its addressing. I'd hope that this can be gleaned from > something other than "it's iWARP, cma needs to do ...". Well, we are supposed to match the sgid based on the outgoing route selected, but the iwarp drivers don't setup their sgid's properly.. You are right it should be fixed better, but I'm not sure anyone is left familiar enough with all the iwarp drivers to do the change. Jason
> On Jun 10, 2023, at 8:05 AM, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote: >> On 6/7/2023 3:43 PM, Chuck Lever wrote: >>> From: Chuck Lever <chuck.lever@oracle.com> >>> >>> We would like to enable the use of siw on top of a VPN that is >>> constructed and managed via a tun device. That hasn't worked up >>> until now because ARPHRD_NONE devices (such as tun devices) have >>> no GID for the RDMA/core to look up. >>> >>> But it turns out that the egress device has already been picked for >>> us. addr_handler() just has to do the right thing with it. >>> >>> Tested with siw and qedr, both initiator and target. >>> >>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >>> --- >>> drivers/infiniband/core/cma.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> This of course needs broader testing, but it seems to work, and it's >>> a little nicer than "if (dev_type == ARPHRD_NONE)". >>> >>> One thing I discovered is that the NFS/RDMA server implementation >>> does not deal at all with more than one RDMA device on the system. >>> I will address that with an ib_client; SunRPC patches forthcoming. >>> >>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >>> index 56e568fcd32b..c9a2bdb49e3c 100644 >>> --- a/drivers/infiniband/core/cma.c >>> +++ b/drivers/infiniband/core/cma.c >>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, >>> if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) >>> return ERR_PTR(-ENODEV); >>> + if (rdma_protocol_iwarp(device, port)) >>> + return rdma_get_gid_attr(device, port, 0); >> >> This might work, but I'm skeptical of the conditional. There's nothing >> special about iWARP that says a GID should come from the outgoing device >> mac. And, other protocols without IB GID addressing won't benefit. > > The GID represents the source address, so it better come from the > outgoing device or something is really wrong. > > iWARP gets a conditional because iwarp always has a single GID, other > devices do not work that way. If rdma_get_gid_attr() works for rxe, perhaps we could change the conditional to match on only software-emulated devices. -- Chuck Lever
在 2023/6/12 8:19, Chuck Lever III 写道: > > >> On Jun 10, 2023, at 8:05 AM, Jason Gunthorpe <jgg@nvidia.com> wrote: >> >> On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote: >>> On 6/7/2023 3:43 PM, Chuck Lever wrote: >>>> From: Chuck Lever <chuck.lever@oracle.com> >>>> >>>> We would like to enable the use of siw on top of a VPN that is >>>> constructed and managed via a tun device. That hasn't worked up >>>> until now because ARPHRD_NONE devices (such as tun devices) have >>>> no GID for the RDMA/core to look up. >>>> >>>> But it turns out that the egress device has already been picked for >>>> us. addr_handler() just has to do the right thing with it. >>>> >>>> Tested with siw and qedr, both initiator and target. >>>> >>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >>>> --- >>>> drivers/infiniband/core/cma.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> This of course needs broader testing, but it seems to work, and it's >>>> a little nicer than "if (dev_type == ARPHRD_NONE)". >>>> >>>> One thing I discovered is that the NFS/RDMA server implementation >>>> does not deal at all with more than one RDMA device on the system. >>>> I will address that with an ib_client; SunRPC patches forthcoming. >>>> >>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >>>> index 56e568fcd32b..c9a2bdb49e3c 100644 >>>> --- a/drivers/infiniband/core/cma.c >>>> +++ b/drivers/infiniband/core/cma.c >>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, >>>> if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) >>>> return ERR_PTR(-ENODEV); >>>> + if (rdma_protocol_iwarp(device, port)) >>>> + return rdma_get_gid_attr(device, port, 0); >>> >>> This might work, but I'm skeptical of the conditional. There's nothing >>> special about iWARP that says a GID should come from the outgoing device >>> mac. And, other protocols without IB GID addressing won't benefit. >> >> The GID represents the source address, so it better come from the >> outgoing device or something is really wrong. >> >> iWARP gets a conditional because iwarp always has a single GID, other >> devices do not work that way. > > If rdma_get_gid_attr() works for rxe, perhaps we could change > the conditional to match on only software-emulated devices. drivers/infiniband/sw/rxe/rxe_net.c: 453 struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av, 454 int paylen, struct rxe_pkt_info *pkt) 455 { 456 unsigned int hdr_len; 457 struct sk_buff *skb = NULL; 458 struct net_device *ndev; 459 const struct ib_gid_attr *attr; 460 const int port_num = 1; 461 462 attr = rdma_get_gid_attr(&rxe->ib_dev, port_num, av->grh.sgid_index); 463 if (IS_ERR(attr)) 464 return NULL; 465 ... Zhu Yanjun > > > -- > Chuck Lever > >
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 56e568fcd32b..c9a2bdb49e3c 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port, if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net)) return ERR_PTR(-ENODEV); + if (rdma_protocol_iwarp(device, port)) + return rdma_get_gid_attr(device, port, 0); + if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port)) return ERR_PTR(-ENODEV);