[RFC,0/2] IB device in-kernel API support indication

Message ID: 1546335025-31360-1-git-send-email-galpress@amazon.com (mailing list archive)
From: Gal Pressman <galpress@amazon.com>
To: Jason Gunthorpe <jgg@ziepe.ca>, Doug Ledford <dledford@redhat.com>
CC: <linux-rdma@vger.kernel.org>, Alexander Matushevsky <matua@amazon.com>, Yossi Leybovich <sleybo@amazon.com>, Dave Goodell <goodell@amazon.com>, "Brian Barrett" <bbarrett@amazon.com>, Leah Shalev <shalevl@amazon.com>, Sean Hefty <sean.hefty@intel.com>, Gal Pressman <galpress@amazon.com>
Subject: [PATCH RFC 0/2] IB device in-kernel API support indication
Date: Tue, 1 Jan 2019 11:30:23 +0200
Series: IB device in-kernel API support indication
  [RFC,0/2] IB device in-kernel API support indication
  [RFC,1/2] RDMA: Add indication for in kernel API support to IB device
  [RFC,2/2] IB/usnic: Mark device as a non kernel verbs provider

Gal Pressman Jan. 1, 2019, 9:30 a.m. UTC

Hello all,
This RFC allows device drivers to indicate their support for in-kernel API
through a flag in the IB device.
Currently, devices that do not support in-kernel APIs (such as usnic) have no
way to communicate that to the ULPs which try to use the device and fail.
Instead, make the driver advertise its support upfront and allow clients to
exit gracefully in case of unsupported device.

Patch #1 adds the flag to the IB device, sets all existing drivers as kernel
verbs providers and changes the IB clients.
Patch #2 changes usnic driver to a non-kernel verbs provider as it offers no
kernel API support.
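
As a rough illustration of the idea (not the code from the patches), the sketch below models a device carrying a support flag and a client that checks it before doing any work. The `demo_` types and the integer return value are assumptions made so the example stands alone; only the name `kverbs_provider` comes from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device; only the proposed flag is modeled. */
struct demo_ib_device {
	const char *name;
	bool kverbs_provider;	/* set by drivers that implement the in-kernel API */
};

/*
 * Hypothetical client "add" callback.
 * Returns 0 when the client binds to the device, -1 when it skips it.
 */
static int demo_client_add_one(const struct demo_ib_device *dev)
{
	assert(dev != NULL);

	/* Exit gracefully instead of failing later inside a verbs call. */
	if (!dev->kverbs_provider)
		return -1;

	/* ... a real client would allocate its PDs, CQs and QPs here ... */
	return 0;
}
```

With this shape, a device marked as a non-provider is skipped before the client allocates any resources on it.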

This RFC is introduced following the discussion over the EFA driver [1], which
initially does not provide in-kernel API support.

[1] https://patchwork.kernel.org/cover/10711629/

Thanks,
Gal

Gal Pressman (2):
  RDMA: Add indication for in kernel API support to IB device
  IB/usnic: Mark device as a non kernel verbs provider

 drivers/infiniband/core/cm.c                    | 3 +++
 drivers/infiniband/core/cma.c                   | 3 +++
 drivers/infiniband/core/mad.c                   | 3 +++
 drivers/infiniband/core/multicast.c             | 3 +++
 drivers/infiniband/core/sa_query.c              | 3 +++
 drivers/infiniband/core/ucm.c                   | 3 ++-
 drivers/infiniband/core/user_mad.c              | 3 +++
 drivers/infiniband/hw/bnxt_re/main.c            | 1 +
 drivers/infiniband/hw/cxgb3/iwch_provider.c     | 1 +
 drivers/infiniband/hw/cxgb4/provider.c          | 1 +
 drivers/infiniband/hw/hns/hns_roce_main.c       | 1 +
 drivers/infiniband/hw/i40iw/i40iw_verbs.c       | 1 +
 drivers/infiniband/hw/mlx4/main.c               | 1 +
 drivers/infiniband/hw/mlx5/main.c               | 1 +
 drivers/infiniband/hw/mthca/mthca_provider.c    | 1 +
 drivers/infiniband/hw/nes/nes_verbs.c           | 1 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c      | 1 +
 drivers/infiniband/hw/qedr/main.c               | 1 +
 drivers/infiniband/hw/usnic/usnic_ib_main.c     | 1 +
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c  | 1 +
 drivers/infiniband/sw/rdmavt/vt.c               | 1 +
 drivers/infiniband/sw/rxe/rxe_verbs.c           | 1 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c       | 3 +++
 drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c | 2 +-
 drivers/infiniband/ulp/srp/ib_srp.c             | 3 +++
 drivers/infiniband/ulp/srpt/ib_srpt.c           | 3 +++
 include/rdma/ib_verbs.h                         | 1 +
 net/rds/ib.c                                    | 4 ++--
 net/smc/smc_ib.c                                | 2 +-
 29 files changed, 49 insertions(+), 5 deletions(-)

Bart Van Assche Jan. 1, 2019, 4:01 p.m. UTC | #1

On 1/1/19 1:30 AM, Gal Pressman wrote:
> Hello all,
> This RFC allows device drivers to indicate their support for in-kernel API
> through a flag in the IB device.
> Currently, devices that do not support in-kernel APIs (such as usnic) have no
> way to communicate that to the ULPs which try to use the device and fail.
> Instead, make the driver advertise its support upfront and allow clients to
> exit gracefully in case of unsupported device.
> 
> Patch #1 adds the flag to the IB device, sets all existing drivers as kernel
> verbs providers and changes the IB clients.
> Patch #2 changes usnic driver to a non-kernel verbs provider as it offers no
> kernel API support.
> 
> This RFC is introduced following the discussion over the EFA driver [1], which
> initially does not provide in-kernel API support.
> 
> [1] https://patchwork.kernel.org/cover/10711629/

Having some drivers support kernel verbs and others not is confusing to 
Linux RDMA users. If we add a kverbs_provider flag then that means that 
we officially support that not all drivers support kverbs. I don't like 
this - I think new RDMA drivers should support both kverbs and uverbs.

What is so hard about adding kverbs support to the EFA driver?

Bart.

Leon Romanovsky Jan. 1, 2019, 4:37 p.m. UTC | #2

On Tue, Jan 01, 2019 at 08:01:39AM -0800, Bart Van Assche wrote:
> On 1/1/19 1:30 AM, Gal Pressman wrote:
> > Hello all,
> > This RFC allows device drivers to indicate their support for in-kernel API
> > through a flag in the IB device.
> > Currently, devices that do not support in-kernel APIs (such as usnic) have no
> > way to communicate that to the ULPs which try to use the device and fail.
> > Instead, make the driver advertise its support upfront and allow clients to
> > exit gracefully in case of unsupported device.
> >
> > Patch #1 adds the flag to the IB device, sets all existing drivers as kernel
> > verbs providers and changes the IB clients.
> > Patch #2 changes usnic driver to a non-kernel verbs provider as it offers no
> > kernel API support.
> >
> > This RFC is introduced following the discussion over the EFA driver [1], which
> > initially does not provide in-kernel API support.
> >
> > [1] https://patchwork.kernel.org/cover/10711629/
>
> Having some drivers support kernel verbs and others not is confusing to
> Linux RDMA users. If we add a kverbs_provider flag then that means that we
> officially support that not all drivers support kverbs. I don't like this -
> I think new RDMA drivers should support both kverbs and uverbs.

+1, and I'm not sure that we came to any meaningful conclusion that we
allow drivers without kverbs, do we?

>
> What is so hard about adding kverbs support to the EFA driver?

It is not hard, but they don't need it, won't test it and probably
won't care to support it either.

So this series returns us to square one: the discussion of whether EFA
belongs in drivers/infiniband/ or not.

Thanks

>
> Bart.

Gal Pressman Jan. 1, 2019, 4:38 p.m. UTC | #3

On 01-Jan-19 18:01, Bart Van Assche wrote:
> On 1/1/19 1:30 AM, Gal Pressman wrote:
>> Hello all,
>> This RFC allows device drivers to indicate their support for in-kernel API
>> through a flag in the IB device.
>> Currently, devices that do not support in-kernel APIs (such as usnic) have no
>> way to communicate that to the ULPs which try to use the device and fail.
>> Instead, make the driver advertise its support upfront and allow clients to
>> exit gracefully in case of unsupported device.
>>
>> Patch #1 adds the flag to the IB device, sets all existing drivers as kernel
>> verbs providers and changes the IB clients.
>> Patch #2 changes usnic driver to a non-kernel verbs provider as it offers no
>> kernel API support.
>>
>> This RFC is introduced following the discussion over the EFA driver [1], which
>> initially does not provide in-kernel API support.
>>
>> [1] https://patchwork.kernel.org/cover/10711629/
> 
> Having some drivers support kernel verbs and others not is confusing to Linux
> RDMA users. If we add a kverbs_provider flag then that means that we officially
> support that not all drivers support kverbs. I don't like this - I think new
> RDMA drivers should support both kverbs and uverbs.
> 
> What is so hard about adding kverbs support to the EFA driver?
> 
> Bart.

Hi Bart,
This RFC is meant to help prevent the confusion: there is already a driver that
does not support kverbs and, lacking a proper way to advertise that, simply
fails the kernel callbacks. Informing the IB clients upfront seems like
healthier behavior to me.

For the RDMA users, I can add a sysfs/rdma tool indication that makes it clear
whether the device supports kernel verbs or not. Does that sound reasonable?

EFA supported QP types are UD and Scalable Reliable Datagram (SRD, driver QP
type). Since kernel verbs do not make use of either of those QP types we do not
support in-kernel APIs initially.

Sagi Grimberg Jan. 2, 2019, 12:27 a.m. UTC | #4

Hey,

>>> Hello all,
>>> This RFC allows device drivers to indicate their support for in-kernel API
>>> through a flag in the IB device.
>>> Currently, devices that do not support in-kernel APIs (such as usnic) have no
>>> way to communicate that to the ULPs which try to use the device and fail.
>>> Instead, make the driver advertise its support upfront and allow clients to
>>> exit gracefully in case of unsupported device.
>>>
>>> Patch #1 adds the flag to the IB device, sets all existing drivers as kernel
>>> verbs providers and changes the IB clients.
>>> Patch #2 changes usnic driver to a non-kernel verbs provider as it offers no
>>> kernel API support.
>>>
>>> This RFC is introduced following the discussion over the EFA driver [1], which
>>> initially does not provide in-kernel API support.
>>>
>>> [1] https://patchwork.kernel.org/cover/10711629/
>>
>> Having some drivers support kernel verbs and others not is confusing to
>> Linux RDMA users. If we add a kverbs_provider flag then that means that we
>> officially support that not all drivers support kverbs. I don't like this -
>> I think new RDMA drivers should support both kverbs and uverbs.
> 
> +1, and I'm not sure that we came to any meaningful conclusion that we
> allow drivers without kverbs, do we?

I think that the discussion is a bit backwards and so is the RFC..

It's not that EFA does not support kverbs; isn't kverbs just an abstract
name for "what our current ulps require"?

Why is this kernel specific anyway? The exact same holds for uverbs
applications that use RC. In other words, we could also see a kernel
consumer that uses UD (and does not rely on IB addressing like ipoib)
and that can run on the EFA device...

Perhaps an rdma device needs to specify a mask of the QP types it
supports, which the core or the consumer can look at if it wants to
log a meaningful error message (or it can simply fail ib_create_qp).
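
A minimal sketch of what such a mask could look like, assuming a bit-per-QP-type encoding. Every `demo_`/`DEMO_` name here is hypothetical and is not an existing RDMA core API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical QP type numbering, used only for this sketch. */
enum demo_qp_type {
	DEMO_QPT_RC = 0,
	DEMO_QPT_UC,
	DEMO_QPT_UD,
	DEMO_QPT_DRIVER,
};

#define DEMO_QPT_BIT(t) (UINT32_C(1) << (t))

/* Hypothetical device descriptor carrying the supported-QP-types mask. */
struct demo_rdma_device {
	const char *name;
	uint32_t supported_qp_types;	/* OR of DEMO_QPT_BIT() values */
};

/* True only if every QP type in 'required' is offered by the device. */
static bool demo_supports_qp_types(const struct demo_rdma_device *dev,
				   uint32_t required)
{
	assert(dev != NULL);
	return (dev->supported_qp_types & required) == required;
}
```

A consumer that needs RC would pass `DEMO_QPT_BIT(DEMO_QPT_RC)` and could log a clear message when the check fails, while a UD-only consumer could still use a device that offers only UD and a driver QP type.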

And, fwiw, I'm not sure I understand why should a new device support our
kernel ulps (again, not kverbs, functionality required by our existing
kernel consumers) if its users are not interested in it (if it was the
case then it would probably be supported). Isn't it enough that
something like rsockets can run on a device to justify its existence?

Gal Pressman Jan. 2, 2019, 8:40 a.m. UTC | #5

Hey Sagi,

On 02-Jan-19 02:27, Sagi Grimberg wrote:
> Hey,
> 
>>>> Hello all,
>>>> This RFC allows device drivers to indicate their support for in-kernel API
>>>> through a flag in the IB device.
>>>> Currently, devices that do not support in-kernel APIs (such as usnic) have no
>>>> way to communicate that to the ULPs which try to use the device and fail.
>>>> Instead, make the driver advertise its support upfront and allow clients to
>>>> exit gracefully in case of unsupported device.
>>>>
>>>> Patch #1 adds the flag to the IB device, sets all existing drivers as kernel
>>>> verbs providers and changes the IB clients.
>>>> Patch #2 changes usnic driver to a non-kernel verbs provider as it offers no
>>>> kernel API support.
>>>>
>>>> This RFC is introduced following the discussion over the EFA driver [1], which
>>>> initially does not provide in-kernel API support.
>>>>
>>>> [1] https://patchwork.kernel.org/cover/10711629/
>>>
>>> Having some drivers support kernel verbs and others not is confusing to
>>> Linux RDMA users. If we add a kverbs_provider flag then that means that we
>>> officially support that not all drivers support kverbs. I don't like this -
>>> I think new RDMA drivers should support both kverbs and uverbs.
>>
>> +1, and I'm not sure that we came to any meaningful conclusion that we
>> allow drivers without kverbs, do we?
> 
> I think that the discussion is a bit backwards and so is the RFC..
> 
> Its not that EFA does not support kverbs, isn't kverbs is an abstract
> name for "what our current ulps require"?
> 
> Why is this kernel specific anyways? The exact same holds for uverbs
> applications that use RC, or in other words we can also see a kernel
> consumer that use UD (and does not rely on IB addressing like ipoib)
> that can run on the EFA device...

Makes sense.

> 
> Perhaps an rdma device needs to specify mask of which QP types it
> supports such that the core or the consumer can look at if it wants to
> log a meaningful error message (or it can simply fail ib_create_qp).
Supported QP types for the device sounds good.
This could both move a chunk of the drivers' QP type checks to the core and help
clients identify an unsupported device.
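
To make the "move the checks to the core" part concrete, here is a rough sketch under the assumption that the device exposes a bit-per-type mask. The `demo_` names, the callback signature and the choice of `-EOPNOTSUPP` are illustrative only and are not taken from the RDMA core.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_device;

/* Hypothetical driver callback; returns 0 on success. */
typedef int (*demo_create_qp_fn)(struct demo_device *dev, unsigned int qp_type);

struct demo_device {
	uint32_t supported_qp_types;	/* bit N set: QP type N is supported */
	demo_create_qp_fn create_qp;	/* driver implementation */
	int driver_calls;		/* counts how often the driver was reached */
};

/* Core-side wrapper: rejects unsupported QP types before calling the driver. */
static int demo_core_create_qp(struct demo_device *dev, unsigned int qp_type)
{
	assert(dev != NULL && dev->create_qp != NULL);

	if (qp_type >= 32 ||
	    !(dev->supported_qp_types & (UINT32_C(1) << qp_type)))
		return -EOPNOTSUPP;

	return dev->create_qp(dev, qp_type);
}

/* Trivial driver callback that no longer needs its own QP type check. */
static int demo_driver_create_qp(struct demo_device *dev, unsigned int qp_type)
{
	(void)qp_type;
	dev->driver_calls++;
	return 0;
}
```

The driver callback is reached only for types the device advertised, so the per-driver validation collapses into one place.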

> 
> And, fwiw, I'm not sure I understand why should a new device support our
> kernel ulps (again, not kverbs, functionality required by our existing
> kernel consumers) if its users are not interested in it (if it was the
> case then it would probably be supported). Isn't it enough that
> something like rsockets can run on a device to justify its existence?
Exactly, we just need a proper way to let the ulps know the device does not
provide the required functionality.

Leon Romanovsky Jan. 2, 2019, 11:31 a.m. UTC | #6

On Wed, Jan 02, 2019 at 10:40:52AM +0200, Gal Pressman wrote:
> Hey Sagi,
>
> On 02-Jan-19 02:27, Sagi Grimberg wrote:
> > Hey,
> >
> >>>> Hello all,
> >>>> This RFC allows device drivers to indicate their support for in-kernel API
> >>>> through a flag in the IB device.
> >>>> Currently, devices that do not support in-kernel APIs (such as usnic) have no
> >>>> way to communicate that to the ULPs which try to use the device and fail.
> >>>> Instead, make the driver advertise its support upfront and allow clients to
> >>>> exit gracefully in case of unsupported device.
> >>>>
> >>>> Patch #1 adds the flag to the IB device, sets all existing drivers as kernel
> >>>> verbs providers and changes the IB clients.
> >>>> Patch #2 changes usnic driver to a non-kernel verbs provider as it offers no
> >>>> kernel API support.
> >>>>
> >>>> This RFC is introduced following the discussion over the EFA driver [1], which
> >>>> initially does not provide in-kernel API support.
> >>>>
> >>>> [1] https://patchwork.kernel.org/cover/10711629/
> >>>
> >>> Having some drivers support kernel verbs and others not is confusing to
> >>> Linux RDMA users. If we add a kverbs_provider flag then that means that we
> >>> officially support that not all drivers support kverbs. I don't like this -
> >>> I think new RDMA drivers should support both kverbs and uverbs.
> >>
> >> +1, and I'm not sure that we came to any meaningful conclusion that we
> >> allow drivers without kverbs, do we?
> >
> > I think that the discussion is a bit backwards and so is the RFC..
> >
> > Its not that EFA does not support kverbs, isn't kverbs is an abstract
> > name for "what our current ulps require"?
> >
> > Why is this kernel specific anyways? The exact same holds for uverbs
> > applications that use RC, or in other words we can also see a kernel
> > consumer that use UD (and does not rely on IB addressing like ipoib)
> > that can run on the EFA device...
>
> Makes sense.
>
> >
> > Perhaps an rdma device needs to specify mask of which QP types it
> > supports such that the core or the consumer can look at if it wants to
> > log a meaningful error message (or it can simply fail ib_create_qp).
> Supported QP types for the device sounds good.
> This could both move a chunk of drivers qp types checks to the core and help
> clients identify an unsupported device.
>
> >
> > And, fwiw, I'm not sure I understand why should a new device support our
> > kernel ulps (again, not kverbs, functionality required by our existing
> > kernel consumers) if its users are not interested in it (if it was the
> > case then it would probably be supported). Isn't it enough that
> > something like rsockets can run on a device to justify its existence?
> Exactly, we just need a proper way to let the ulps know the device does not
> provide the required functionality.

For me it sounds a little bit different.

I think that it will complicate ULPs and will push responsibility from
driver authors and their employer companies to ULP authors, who are not
supposed to need to be spec-savvy in order to use kverbs.

Thanks

Gal Pressman Jan. 2, 2019, 1:45 p.m. UTC | #7

On 02-Jan-19 13:31, Leon Romanovsky wrote:
> On Wed, Jan 02, 2019 at 10:40:52AM +0200, Gal Pressman wrote:
>> Hey Sagi,
>>
>> On 02-Jan-19 02:27, Sagi Grimberg wrote:
>>> Hey,
>>>
>>>>>> Hello all,
>>>>>> This RFC allows device drivers to indicate their support for in-kernel API
>>>>>> through a flag in the IB device.
>>>>>> Currently, devices that do not support in-kernel APIs (such as usnic) have no
>>>>>> way to communicate that to the ULPs which try to use the device and fail.
>>>>>> Instead, make the driver advertise its support upfront and allow clients to
>>>>>> exit gracefully in case of unsupported device.
>>>>>>
>>>>>> Patch #1 adds the flag to the IB device, sets all existing drivers as kernel
>>>>>> verbs providers and changes the IB clients.
>>>>>> Patch #2 changes usnic driver to a non-kernel verbs provider as it offers no
>>>>>> kernel API support.
>>>>>>
>>>>>> This RFC is introduced following the discussion over the EFA driver [1], which
>>>>>> initially does not provide in-kernel API support.
>>>>>>
>>>>>> [1] https://patchwork.kernel.org/cover/10711629/
>>>>>
>>>>> Having some drivers support kernel verbs and others not is confusing to
>>>>> Linux RDMA users. If we add a kverbs_provider flag then that means that we
>>>>> officially support that not all drivers support kverbs. I don't like this -
>>>>> I think new RDMA drivers should support both kverbs and uverbs.
>>>>
>>>> +1, and I'm not sure that we came to any meaningful conclusion that we
>>>> allow drivers without kverbs, do we?
>>>
>>> I think that the discussion is a bit backwards and so is the RFC..
>>>
>>> Its not that EFA does not support kverbs, isn't kverbs is an abstract
>>> name for "what our current ulps require"?
>>>
>>> Why is this kernel specific anyways? The exact same holds for uverbs
>>> applications that use RC, or in other words we can also see a kernel
>>> consumer that use UD (and does not rely on IB addressing like ipoib)
>>> that can run on the EFA device...
>>
>> Makes sense.
>>
>>>
>>> Perhaps an rdma device needs to specify mask of which QP types it
>>> supports such that the core or the consumer can look at if it wants to
>>> log a meaningful error message (or it can simply fail ib_create_qp).
>> Supported QP types for the device sounds good.
>> This could both move a chunk of drivers qp types checks to the core and help
>> clients identify an unsupported device.
>>
>>>
>>> And, fwiw, I'm not sure I understand why should a new device support our
>>> kernel ulps (again, not kverbs, functionality required by our existing
>>> kernel consumers) if its users are not interested in it (if it was the
>>> case then it would probably be supported). Isn't it enough that
>>> something like rsockets can run on a device to justify its existence?
>> Exactly, we just need a proper way to let the ulps know the device does not
>> provide the required functionality.
> 
> For me it sounds a little bit different.
> 
> I think that it will complicate ULPs and will push responsibility from
> driver authors and their employer companies to ULP authors who are not
> supposed to be spec savvies in order to use kverbs.
> 
> Thanks
> 

My original suggestion doesn't really complicate the ULPs in any way; an
additional if statement is not a lot of responsibility.

The QP type suggestion does impose a few more requirements on the ULPs, but it
provides them with more flexibility.
A ULP that can make use of the UD QP type, for example, could use devices that
it could not use with the all-or-nothing 'kverbs_provider' flag.

Jason Gunthorpe Jan. 2, 2019, 5:27 p.m. UTC | #8

On Tue, Jan 01, 2019 at 04:27:51PM -0800, Sagi Grimberg wrote:

> Its not that EFA does not support kverbs, isn't kverbs is an abstract
> name for "what our current ulps require"?

kverbs is an abstract name for "what our current ulps require" and
EFA clearly doesn't support that..  What are you trying to say?

> Why is this kernel specific anyways? The exact same holds for uverbs
> applications that use RC, or in other words we can also see a kernel
> consumer that use UD (and does not rely on IB addressing like ipoib)
> that can run on the EFA device...

We could but we don't have such a thing today.. 

.. and I'm not sure we ever will, as UD is kind of useless on an
ethernet based network. The saner thing to do is to use UDP and one of
the high speed ethernet packet processing flavours available these
days.

> And, fwiw, I'm not sure I understand why should a new device support our
> kernel ulps (again, not kverbs, functionality required by our existing
> kernel consumers) if its users are not interested in it (if it was the
> case then it would probably be supported). 

So what is your standard for determining if a device is part of the
RDMA subsystem or not?

If we don't have 'implements kverbs' as a requirement, and just permit
UD QP verbs as the baseline requirement, is that OK? (and again note
the original EFA submission didn't even support UD QP verbs)

This is where usnic is already.

The EFA device doesn't support rkeys: it *clearly* doesn't do the
thing we call RDMA.

> Isn't it enough that something like rsockets can run on a device to
> justify its existence?

?? rsockets requires RC RDMA QPs, EFA won't support it.

Jason

Sagi Grimberg Jan. 2, 2019, 7:32 p.m. UTC | #9

>> Its not that EFA does not support kverbs, isn't kverbs is an abstract
>> name for "what our current ulps require"?
> 
> kverbs is an abstract name for "what our current ulps require" and
> EFA clearly doesn't support that..  What are you trying to say?

This was directed at the proposal in the patch set;
ib_device->kverbs_provider is really not a great design choice for
an interface..

>> Why is this kernel specific anyways? The exact same holds for uverbs
>> applications that use RC, or in other words we can also see a kernel
>> consumer that use UD (and does not rely on IB addressing like ipoib)
>> that can run on the EFA device...
> 
> We could but we don't have such a thing today..
> 
> .. and I'm not sure we ever will, as UD is kind of useless on an
> ethernet based network.

I don't know either, but I was merely arguing that this matter is
not kverbs specific, hence the interface should not reflect it as such.

> The saner thing to do is to use UDP and one of
> the high speed ethernet packet processing flavours available these
> days.

AFAICT we still have a long way to go before an application can actually do
termination + zcopy with what's available today. Forwarding is more
reasonable, I agree.

>> And, fwiw, I'm not sure I understand why should a new device support our
>> kernel ulps (again, not kverbs, functionality required by our existing
>> kernel consumers) if its users are not interested in it (if it was the
>> case then it would probably be supported).
> 
> So what is your standard for determining if a device is part of the
> RDMA subsystem or not?

I'm not sure, but it seems slightly odd to me that the vast majority
of RDMA use cases are probably not our ulps (as much as I'd like them to
be ;)) but we state them as the bar.

> If we don't have 'implements kverbs' as a requirement, and just permit
> UD QP verbs as the baseline requirement, is that OK? (and again note
> the original EFA submission didn't even support UD QP verbs)
> 
> This is where usnic is already.

Well, I agree that UD is kind of a low bar, and probably most of the
value of the EFA device comes from their SRD transport.

Is usnic a burden because it's not actively maintained in a subsystem
that is constantly evolving, or because it implements a small subset of
an RDMA device's functionality?

Personally, given that most of RDMA usage lives in userland, I would
think that having a uverbs provider is a more appropriate bar than
supporting our kernel consumers. But I (like you and others) would be
more than happy seeing both supported.

Both can be set as a bar, but one could argue that it's an unnecessarily
high bar (if the driver's user base has no interest in running our kernel
consumers).

> The EFA device doesn't support rkeys: it *clearly* doesn't do the
> thing we call RDMA.

A lot of applications don't use rkeys. We even have a kernel consumer
that doesn't use rkeys (9p) but still uses RDMA devices.

>> Isn't it enough that something like rsockets can run on a device to
>> justify its existence?
> 
> ?? rsockets requires RC RDMA QPs, EFA won't support it.

I was referring to datagram rsockets...

Jason Gunthorpe Jan. 2, 2019, 8:31 p.m. UTC | #10

On Wed, Jan 02, 2019 at 11:32:46AM -0800, Sagi Grimberg wrote:

> This was directed to the proposal in the patch set,
> ib_device->kverbs_provider is really not a great design choice for
> an interface..

Yeah, it really isn't..

> > The saner thing to do is to use UDP and one of
> > the high speed ethernet packet processing flavours available these
> > days.
> 
> AFAICT we still have a long way before an application can actually do
> termination + zcopy with whats available today. Forwarding is more
> reasonable, I agree.

I thought the AF_XDP stuff was doing zcopy now?

> > So what is your standard for determining if a device is part of the
> > RDMA subsystem or not?
> 
> I'm not sure, but its seems slightly odd to me that the vast majority
> of RDMA use cases is probably not our ulps (as much as I'd like them to
> be ;)) but we state them as the bar.

Well, it is the vast majority of the in-kernel use cases, for sure :)

> Well, I agree that UD is kind of a low bar, and probably most of the
> value of the EFA device comes from their SRD transport.
> 
> Is usnic a burden because it's not actively maintained in a
> subsystem that is constantly evolving, or because it implements a
> small subset of an RDMA device functionality?

I think because every time someone wants to do something to refactor
the core APIs (like the completion stuff, or the RDMA WC stuff)
they've looked into usnic to see if it would break and got all
confused.

For instance, EFA did not implement create_ah properly, so anyone
looking at how AH works for kverbs will become very confused by it.

I suppose this is why just blocking kverbs entirely has come up as a
proposal. People working on kverbs *do not* need to care about
non-kverbs drivers at all.

Part of this is how uverbs and kverbs are really roughly pushed into
the same driver API, so a driver can't just say it supports uverbs
only...

> Personally, given that most of RDMA usage lives in userland, I would
> think that having a uverbs provider is a more appropriate bar than
> supporting our kernel consumers. But I (like you and others) would be
> more than happy seeing both supported.

We don't really have a compliance test or anything for uverbs, beyond
the stuff in rdma-core, so if a driver doesn't support that stuff
there is no way to know if the device or driver is implementing verbs
correctly..

> Both can be set as a bar, but one could argue that its an
> unnecessarily high bar (if its user-base has no interest in running
> our kernel consumers).

So how should we support non-verbs things? EFA seems particularly
difficult because it is a *little* verbs like, and does seem to fit
into the uverbs system somewhat OK.

> > The EFA device doesn't support rkeys: it *clearly* doesn't do the
> > thing we call RDMA.
> 
> A lot of applications don't use rkeys. We even have a kernel consumer
> that don't use rkeys (9p) but still is using RDMA devices.

It uses only SEND?

> > > Isn't it enough that something like rsockets can run on a device to
> > > justify its existence?
> > 
> > ?? rsockets requires RC RDMA QPs, EFA won't support it.
> 
> I was referring to datagram rsockets...

Even so, I don't think EFA has an addressing model compatible with
rsockets, it doesn't use RDMA-CM either, which I think rsockets
requires still for UD??

Jason

Sagi Grimberg Jan. 2, 2019, 9:41 p.m. UTC | #11

>> AFAICT we still have a long way before an application can actually do
>> termination + zcopy with whats available today. Forwarding is more
>> reasonable, I agree.
> 
> I thought the AF_XDP stuff was doing zcopy now?

Apparently it does, on Intel NICs for now... cool.

>> Well, I agree that UD is kind of a low bar, and probably most of the
>> value of the EFA device comes from their SRD transport.
>>
>> Is usnic a burdan is because its not actively maintained in a
>> subsystem that is constantly evolving, or because it implements a
>> small subset of an RDMA device functionality?
> 
> I think because everytime someone wants to do something to refactor
> the core API's (like the completion stuff, or the RDMA WC stuff)
> they've looked into usnic to see if it would break and got all
> confused.

You're right, every time I swung by it, I didn't understand what that
thing was doing but assumed it's broken, so I didn't worry about it too
much...

> For instance, EFA did not implement create_ah properly, so anyone
> looking at how AH works for kverbs will become very confused by it.

To an extent it cannot be fixed?

> I suppose this is why just blocking kverbs entirely has come up as a
> proposal. People working on kverbs *do not* need to care about
> non-kverbs drivers at all.
> 
> Part of this is how uverbs and kverbs are really roughly pushed into
> the same driver API, so a driver can't just say it supports uverbs
> only...

Well, there are a few devices that we don't really know support our
kernel consumers (or at least I've never heard of someone who verified
them), and some that are known to be broken/deprecated. The reason for
this is that no one uses them to run ulps, which is most likely the case
for EFA.

>> Personally, given that most of RDMA usage lives in userland, I would
>> think that having a uverbs provider is a more appropriate bar than
>> supporting our kernel consumers. But I (like you and others) would be
>> more than happy seeing both supported.
> 
> We don't really have a compliance test or anything for uverbs, beyond
> the stuff in rdma-core, so if a driver doesn't support that stuff
> there is no way to know if the device or driver is implementing verbs
> correctly..

Sounds like a needed testing suite (even regardless of the discussion
here).

>> Both can be set as a bar, but one could argue that its an
>> unnecessarily high bar (if its user-base has no interest in running
>> our kernel consumers).
> 
> So how should we support non-verbs things? EFA seems particularly
> difficult because it is a *little* verbs like, and does seem to fit
> into the uverbs system somewhat OK.

I didn't say support non-verbs things. I personally think that uverbs
interface is required (and I think Gal agreed to add one to his future
submissions).

Anyways... it's complicated I guess.. it's hard to come up with a
reasonable bar halfway in... It's just one's opinion..

>>> The EFA device doesn't support rkeys: it *clearly* doesn't do the
>>> thing we call RDMA.
>>
>> A lot of applications don't use rkeys. We even have a kernel consumer
>> that don't use rkeys (9p) but still is using RDMA devices.
> 
> It uses only SEND?

Yep.

>>>> Isn't it enough that something like rsockets can run on a device to
>>>> justify its existence?
>>>
>>> ?? rsockets requires RC RDMA QPs, EFA won't support it.
>>
>> I was referring to datagram rsockets...
> 
> Even so, I don't think EFA has an addressing model compatible with
> rsockets, it doesn't use RDMA-CM either, which I think rsockets
> requires still for UD??

I'd assume that EFA proprietary stuff matches IPv4/v6 to their
something so they can hook into ucma?

otherwise how would addressing work at all? Perhaps Gal can share
more on this...

Gal Pressman Jan. 3, 2019, 10:07 a.m. UTC | #12

On 02-Jan-19 23:41, Sagi Grimberg wrote:
> 
>>> AFAICT we still have a long way before an application can actually do
>>> termination + zcopy with whats available today. Forwarding is more
>>> reasonable, I agree.
>>
>> I thought the AF_XDP stuff was doing zcopy now?
> 
> Apparently it does, on a intel nics for now... cool.
> 
>>> Well, I agree that UD is kind of a low bar, and probably most of the
>>> value of the EFA device comes from their SRD transport.
>>>
>>> Is usnic a burdan is because its not actively maintained in a
>>> subsystem that is constantly evolving, or because it implements a
>>> small subset of an RDMA device functionality?
>>
>> I think because everytime someone wants to do something to refactor
>> the core API's (like the completion stuff, or the RDMA WC stuff)
>> they've looked into usnic to see if it would break and got all
>> confused.
> 
> You're right, every time I swung by it, I didn't understand what that
> thing is doing but assumed its broken so I didn't worry about it too
> much...
> 
>> For instance, EFA did not implement create_ah properly, so anyone
>> looking at how AH works for kverbs will become very confused by it.
> 
> To an extent it cannot be fixed?

Are we talking about the fact that EFA's create_ah is not atomic? That was fixed
using the sleepable flag.

We can also make it atomic, but as discussed, it makes no sense to do that until
we actually go through the atomic flows.

> 
>> I suppose this is why just blocking kverbs entirely has come up as a
>> proposal. People working on kverbs *do not* need to care about
>> non-kverbs drivers at all.
>>
>> Part of this is how uverbs and kverbs are really roughly pushed into
>> the same driver API, so a driver can't just say it supports uverbs
>> only...
> 
> Well, there are a few devices that we don't really know to support our
> kernel consumers (or at least I've never heard of someone who verified
> them), and some that are known to be broken/deprecated. The reason for
> this is because no one uses them to run ulps, which is the case for EFA
> most likely.

Correct, that's the reasoning for the kverbs_provider flag.
Jason, do you prefer Sagi's suggestion to make the drivers advertise their
supported QP types instead of the kverbs provider interface?
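
As a rough illustration of the kverbs_provider idea, here is a minimal
user-space sketch of a ULP that checks the flag up front and skips the
device instead of failing later. This is not the code from the patch:
struct demo_ib_device, demo_ulp_add_one() and the result enum are
hypothetical stand-ins, and the field name kverbs_provider is assumed
from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device; not the kernel definition. */
struct demo_ib_device {
	const char *name;
	bool kverbs_provider;	/* assumed name of the flag discussed above */
};

enum demo_add_result { DEMO_ADDED, DEMO_SKIPPED };

/*
 * Hypothetical ULP "add device" callback: skip devices that do not
 * offer in-kernel verbs instead of failing on a later verbs call.
 */
static enum demo_add_result demo_ulp_add_one(const struct demo_ib_device *dev)
{
	assert(dev != NULL);

	if (!dev->kverbs_provider)
		return DEMO_SKIPPED;	/* graceful exit, nothing allocated */

	/* A real ULP would set up its PDs, CQs and QPs here. */
	return DEMO_ADDED;
}
```

With this shape, a uverbs-only driver only has to leave the flag clear;
none of the kernel consumers needs to know anything else about it.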

> 
>>> Personally, given that most of RDMA usage lives in userland, I would
>>> think that having a uverbs provider is a more appropriate bar than
>>> supporting our kernel consumers. But I (like you and others) would be
>>> more than happy seeing both supported.
>>
>> We don't really have a compliance test or anything for uverbs, beyond
>> the stuff in rdma-core, so if a driver doesn't support that stuff
>> there is no way to know if the device or driver is implementing verbs
>> correctly..
> 
> Sounds like a needed testing suite (even regardless of the discussion
> here).
> 
>>> Both can be set as a bar, but one could argue that its an
>>> unnecessarily high bar (if its user-base has no interest in running
>>> our kernel consumers).
>>
>> So how should we support non-verbs things? EFA seems particularly
>> difficult because it is a *little* verbs like, and does seem to fit
>> into the uverbs system somewhat OK.
> 
> I didn't say support non-verbs things. I personally think that uverbs
> interface is required (and I think Gal agreed to add one to his future
> submissions).
> 
> Anyways... its complicated I guess.. its hard to come up with a
> reasonable bar half way in... Its just ones opinion..
> 
>>>> The EFA device doesn't support rkeys: it *clearly* doesn't do the
>>>> thing we call RDMA.
>>>
>>> A lot of applications don't use rkeys. We even have a kernel consumer
>>> that don't use rkeys (9p) but still is using RDMA devices.
>>
>> It uses only SEND?
> 
> Yep.
> 
>>>>> Isn't it enough that something like rsockets can run on a device to
>>>>> justify its existence?
>>>>
>>>> ?? rsockets requires RC RDMA QPs, EFA won't support it.
>>>
>>> I was referring to datagram rsockets...
>>
>> Even so, I don't think EFA has an addressing model compatible with
>> rsockets, it doesn't use RDMA-CM either, which I think rsockets
>> requires still for UD??
> 
> I'd assume that EFA proprietary stuff matches IPv4/v6 to their
> something so they can hook into ucma?
> 
> otherwise how would addressing work at all? Perhaps Gal can share
> more on this...

Our addressing does not rely on rdmacm. Also, there is no matching netdevice
(ipv4/6) for the EFA ib device.
Each EFA device has a 16-byte opaque GID (queried from the device) that should
be specified when creating the AH.

libfabric's connection manager (out of band) is used to exchange these device
GIDs and destination QP numbers.

Does that answer your questions?
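
To make the exchange described above concrete, here is a minimal sketch
of an address blob that two peers could pass over an out-of-band channel.
It is not EFA or libfabric code: the struct, the function names and the
20-byte layout (16-byte GID followed by a big-endian 32-bit QP number)
are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_GID_LEN	16			/* opaque device GID, as in the text */
#define DEMO_BLOB_LEN	(DEMO_GID_LEN + 4)	/* GID + 32-bit QP number (assumed) */

/* Hypothetical peer address: what a sender needs to create an AH and post. */
struct demo_peer_addr {
	uint8_t  gid[DEMO_GID_LEN];
	uint32_t qpn;
};

/* Serialize into a fixed-size blob; the QP number is stored big-endian. */
static void demo_addr_pack(const struct demo_peer_addr *a,
			   uint8_t out[DEMO_BLOB_LEN])
{
	assert(a != NULL && out != NULL);

	memcpy(out, a->gid, DEMO_GID_LEN);
	out[16] = (uint8_t)(a->qpn >> 24);
	out[17] = (uint8_t)(a->qpn >> 16);
	out[18] = (uint8_t)(a->qpn >> 8);
	out[19] = (uint8_t)(a->qpn);
}

/* Inverse of demo_addr_pack(): rebuild the peer address from the blob. */
static void demo_addr_unpack(const uint8_t in[DEMO_BLOB_LEN],
			     struct demo_peer_addr *a)
{
	assert(in != NULL && a != NULL);

	memcpy(a->gid, in, DEMO_GID_LEN);
	a->qpn = ((uint32_t)in[16] << 24) | ((uint32_t)in[17] << 16) |
		 ((uint32_t)in[18] << 8)  |  (uint32_t)in[19];
}
```

The receiver of such a blob would use the GID when creating the AH and
the QP number as the destination in each work request; how the blob
travels (sockets, a job launcher, and so on) is outside this sketch.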

Sagi Grimberg Jan. 4, 2019, 11:32 p.m. UTC | #13

>>>>>> Isn't it enough that something like rsockets can run on a device to
>>>>>> justify its existence?
>>>>>
>>>>> ?? rsockets requires RC RDMA QPs, EFA won't support it.
>>>>
>>>> I was referring to datagram rsockets...
>>>
>>> Even so, I don't think EFA has an addressing model compatible with
>>> rsockets, it doesn't use RDMA-CM either, which I think rsockets
>>> requires still for UD??
>>
>> I'd assume that EFA proprietary stuff matches IPv4/v6 to their
>> something so they can hook into ucma?
>>
>> otherwise how would addressing work at all? Perhaps Gal can share
>> more on this...
> 
> Our addressing does not rely on rdmacm, also, there is no matching netdevice
> (ipv4/6) for the EFA ib device.

I suppose one can be made though? even in SW implementing some sort of
indirection table?

> Each EFA device has a 16 bytes opaque GID (queried from the device) that should
> be specified when creating the AH.
> 
> libfabric's connection manager (out of band) is used to exchange these device
> GIDs and destination QP numbers.
> 
> Does that answer your questions?

Yes, but I do tend to agree with the notion that efa needs to be made to
fit uverbs better. That means a libibverbs provider and an efacm
component that can map to rdma_ucm. Is that feasible?

Hefty, Sean Jan. 5, 2019, 12:25 a.m. UTC | #14

> Yes, but I do tend to agree with the notion that efa needs to be made to
> fit uverbs better. That means a libibverbs provider and a efacm
> component that can map to rdma_ucm. Is that feasible?

EFA supports connectionless communication.  Mapping to rdma_ucm doesn't make sense to me.

- Sean

Sagi Grimberg Jan. 5, 2019, 12:58 a.m. UTC | #15

>> Yes, but I do tend to agree with the notion that efa needs to be made to
>> fit uverbs better. That means a libibverbs provider and a efacm
>> component that can map to rdma_ucm. Is that feasible?
> 
> EFA supports connectionless communication.  Mapping to rdma_ucm doesn't make sense to me.

Gal said that libfabric's connection manager is used to resolve
addressing... so there is some connection manager involved...

Hefty, Sean Jan. 5, 2019, 1:09 a.m. UTC | #16

> >> Yes, but I do tend to agree with the notion that efa needs to be made to
> >> fit uverbs better. That means a libibverbs provider and a efacm
> >> component that can map to rdma_ucm. Is that feasible?
> >
> > EFA supports connectionless communication.  Mapping to rdma_ucm doesn't make
> sense to me.
> 
> Gal said that address libfabric connection manager is used to resolve
> addressing... so there is some connection manager involved...

I haven't seen the libfabric provider yet, but libfabric has a generic out-of-band socket-based name service that can be used by providers.  I'm guessing that's what Gal is referring to.  The name service is primarily there to support fabtests.  In realistic use cases, those providers rely on a job manager to exchange addressing, with name service support disabled.

- Sean

Sagi Grimberg Jan. 5, 2019, 1:27 a.m. UTC | #17

On 1/4/19 5:09 PM, Hefty, Sean wrote:
>>>> Yes, but I do tend to agree with the notion that efa needs to be made to
>>>> fit uverbs better. That means a libibverbs provider and a efacm
>>>> component that can map to rdma_ucm. Is that feasible?
>>>
>>> EFA supports connectionless communication.  Mapping to rdma_ucm doesn't make
>> sense to me.
>>
>> Gal said that address libfabric connection manager is used to resolve
>> addressing... so there is some connection manager involved...
> 
> I haven't seen the libfabric provider yet, but libfabric has generic out-of-band socket-based name service that can be used by providers.  I'm guessing that's what Gal is referring to.  The name service is primarily there to support fabtests.
> In realistic use cases, those providers rely on a job manager to exchange addressing, with name service support disabled.

I think that this is what I was referring to by introducing efacm like
ibcm and iwcm... Isn't it in essence the same thing?

Hefty, Sean Jan. 7, 2019, 4:28 p.m. UTC | #18

> > I haven't seen the libfabric provider yet, but libfabric has generic
> > out-of-band socket-based name service that can be used by providers.
> > I'm guessing that's what Gal is referring to.  The name service is
> > primarily there to support fabtests.
> > In realistic use cases, those providers rely on a job manager to exchange
> > addressing, with name service support disabled.
> 
> I think that this is what I was referring to by introducing efacm like
> ibcm and iwcm... Isn't it in essence the same thing?

Not quite - this isn't running a connection protocol.  The closest in tree comparison would be the IB SIDR protocol used in conjunction with IP addresses.  I’m not aware of anyone using that, however.  Unconnected endpoints typically have an existing out of band mechanism (e.g. PMI) that can be used for address exchange.  The PSM/2 drivers make a similar assumption.

- Sean

Jason Gunthorpe Jan. 7, 2019, 11:42 p.m. UTC | #19

On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote:
> > > I haven't seen the libfabric provider yet, but libfabric has generic
> > > out-of-band socket-based name service that can be used by providers.
> > > I'm guessing that's what Gal is referring to.  The name service is
> > > primarily there to support fabtests.
> > > In realistic use cases, those providers rely on a job manager to exchange
> > > addressing, with name service support disabled.
> > 
> > I think that this is what I was referring to by introducing efacm like
> > ibcm and iwcm... Isn't it in essence the same thing?
> 
> Not quite - this isn't running a connection protocol.  The closest
> in tree comparison would be the IB SIDR protocol used in conjunction
> with IP addresses.  I’m not aware of anyone using that, however.
> Unconnected endpoints typically have an existing out of band
> mechanism (e.g. PMI) that can be used for address exchange.  The
> PSM/2 drivers make a similar assumption.

Dare I ask how it avoids duplicate messages without a connection
protocol?

Jason

Barrett, Brian Jan. 7, 2019, 11:56 p.m. UTC | #20

On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote:
>>>> I haven't seen the libfabric provider yet, but libfabric has generic
>>>> out-of-band socket-based name service that can be used by providers.
>>>> I'm guessing that's what Gal is referring to.  The name service is
>>>> primarily there to support fabtests.
>>>> In realistic use cases, those providers rely on a job manager to exchange
>>>> addressing, with name service support disabled.
>>> 
>>> I think that this is what I was referring to by introducing efacm like
>>> ibcm and iwcm... Isn't it in essence the same thing?
>> 
>> Not quite - this isn't running a connection protocol.  The closest
>> in tree comparison would be the IB SIDR protocol used in conjunction
>> with IP addresses.  I’m not aware of anyone using that, however.
>> Unconnected endpoints typically have an existing out of band
>> mechanism (e.g. PMI) that can be used for address exchange.  The
>> PSM/2 drivers make a similar assumption.
> 
> Dare I ask how it avoids duplicate messages without a connection
> protocol?

In SRD’s case, there is a connection-like structure between any two NICs that is dynamically established as part of packet transmission.  If you look at Sandia Portals (which is even further from standard VERBS, but is a well documented communication interface so worth referencing), it assumes a job configuration step that, while not establishing a connection in the VERBS sense of the word connection, does give a time period for which reliability data can be stored.

Brian

Jason Gunthorpe Jan. 8, 2019, 12:28 a.m. UTC | #21

On Mon, Jan 07, 2019 at 11:56:02PM +0000, Barrett, Brian wrote:
> On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote:
> >>>> I haven't seen the libfabric provider yet, but libfabric has generic
> >>>> out-of-band socket-based name service that can be used by providers.
> >>>> I'm guessing that's what Gal is referring to.  The name service is
> >>>> primarily there to support fabtests.
> >>>> In realistic use cases, those providers rely on a job manager to exchange
> >>>> addressing, with name service support disabled.
> >>> 
> >>> I think that this is what I was referring to by introducing efacm like
> >>> ibcm and iwcm... Isn't it in essence the same thing?
> >> 
> >> Not quite - this isn't running a connection protocol.  The closest
> >> in tree comparison would be the IB SIDR protocol used in conjunction
> >> with IP addresses.  I’m not aware of anyone using that, however.
> >> Unconnected endpoints typically have an existing out of band
> >> mechanism (e.g. PMI) that can be used for address exchange.  The
> >> PSM/2 drivers make a similar assumption.
> > 
> > Dare I ask how it avoids duplicate messages without a connection
> > protocol?
> 
> In SRD’s case, there is a connection-like structure between any two
> NICs that is dynamically established as part of packet transmission.
> If you look at Sandia Portals (which is even further from standard
> VERBS, but is a well documented communication interface so worth
> referencing), it assumes a job configuration step that, while not
> establishing a connection in the VERBS sense of the word connection,
> does give a time period for which reliability data can be stored.

Usually the reason a protocol needs an explicit exchange of connection
parameters is to solve collisions with ID re-use, ie the source ID
matching the 'connection-like' structure gets improperly re-used due
to machine reboot, general ID recycling, or whatever.

Does SRD inherently rely on the job-like scheme for correct operation?

A mandatory job-like scheme would probably preclude using it directly
in kernel ULPs in future..

Jason

Barrett, Brian Jan. 8, 2019, 3:53 a.m. UTC | #22

> 
> On Jan 7, 2019, at 16:29, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
>> On Mon, Jan 07, 2019 at 11:56:02PM +0000, Barrett, Brian wrote:
>>> On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>> 
>>> On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote:
>>>>>> I haven't seen the libfabric provider yet, but libfabric has generic
>>>>>> out-of-band socket-based name service that can be used by providers.
>>>>>> I'm guessing that's what Gal is referring to.  The name service is
>>>>>> primarily there to support fabtests.
>>>>>> In realistic use cases, those providers rely on a job manager to exchange
>>>>>> addressing, with name service support disabled.
>>>>> 
>>>>> I think that this is what I was referring to by introducing efacm like
>>>>> ibcm and iwcm... Isn't it in essence the same thing?
>>>> 
>>>> Not quite - this isn't running a connection protocol.  The closest
>>>> in tree comparison would be the IB SIDR protocol used in conjunction
>>>> with IP addresses.  I’m not aware of anyone using that, however.
>>>> Unconnected endpoints typically have an existing out of band
>>>> mechanism (e.g. PMI) that can be used for address exchange.  The
>>>> PSM/2 drivers make a similar assumption.
>>> 
>>> Dare I ask how it avoids duplicate messages without a connection
>>> protocol?
>> 
>> In SRD’s case, there is a connection-like structure between any two
>> NICs that is dynamically established as part of packet transmission.
>> If you look at Sandia Portals (which is even further from standard
>> VERBS, but is a well documented communication interface so worth
>> referencing), it assumes a job configuration step that, while not
>> establishing a connection in the VERBS sense of the word connection,
>> does give a time period for which reliability data can be stored.
> 
> Usually the reason a protocol needs an explicit exchange of connection
> parameters is to solve collisions with ID re-use, ie the source ID
> matching the 'connection-like' structure gets improperly re-used due
> to machine reboot, general ID recycling, or whatever.
> 
> Does SRD inherently rely on the job-like scheme for correct operation?
> 
> A mandatory job-like scheme would probably preclude using it directly
> in kernel ULPs in future..

Sorry, that wasn’t clear.  No, SRD does not require any job-like indicators.  It has a protocol to establish / invalidate reliability state in firmware.  My point was that whether or not there’s a connection established under the covers, there’s no visible connection to the user with SRD; the usage flow is similar to UD or RD (obviously with different reliability, ordering, and performance characteristics).  My experience (perhaps incorrect, but matching with Gal’s expectations) with UD and RD is that consumers of the datagram protocols don’t use a connection manager (because there isn’t a connection).  If this is a bad assumption, we’ll go back and rethink our strategy.

Brian

Jason Gunthorpe Jan. 8, 2019, 4:19 a.m. UTC | #23

On Tue, Jan 08, 2019 at 03:53:49AM +0000, Barrett, Brian wrote:

> Sorry, that wasn’t clear.  No, SRD does not require any job-like
> indicators.  It has a protocol to establish / invalidate reliability
> state in firmware.  My point was that whether or not there’s a
> connection established under the covers, there’s no visible
> connection to the user with SRD; 

A hidden connection manager makes a lot more sense.

But still, the hidden and uncontrolled resource usage is probably
still not so great for anything but a job-like HPC application. Any
client/server thing is going to want to control this resource more
finely.

> characteristics).  My experience (perhaps incorrect, but matching
> with Gal’s expectations) with UD and RD is that consumers of the
> datagram protocols don’t use a connection manager (because there
> isn’t a connection).  If this is a bad assumption, we’ll go back and
> rethink our strategy.

Well, sure sounds like there *is* a connection - you've just removed
all visibility and control over the underlying connection resources and
state from the OS and application.

UD has both connected and unconnected flows that are interesting, and
as soon as there is a resource and state, generally, people will
eventually find a reason to need control over that. (although probably
not from a HPC workload perspective)

For instance, most enterprise applications will want to tear down and
restart their 'connection' - in the SRD perspective this means
forgetting about all the connection state and setting it up again.

In typical cases for other protocols this might select a different
network multi-path, or side step some bug that was preventing forward
progress.

So, you can choose to hide all of this, but I wouldn't describe SRD as
unconnected, more as 'automatically connected'.

Jason

Shalev, Leah Jan. 8, 2019, 9:17 a.m. UTC | #24

> From: Jason Gunthorpe <jgg@ziepe.ca>
> 
> But still, the hidden and uncontrolled resource usage is probably still not so
> great for anything but a job-like HPC application. Any client/server thing is
> going to want to control this resource more finely.
> 

Here is an excerpt from "SRD spec" we will provide, hope it will clarify things:

SRD QPs provide reliable but out-of-order delivery without segmentation support.
This allows decoupling of transport processing from QP buffer management, so
that separate application flows can be multiplexed without interfering with each
other.
As in UD QPs, each WR includes the AH of the remote destination, allowing a 
process to communicate with any process on any endnode using a single QP, on 
both send and receive side. Each Address Handle is associated with an SRD 
context. SRD context is used to provide reliable communication to a remote node,
similar to RD EE context, but without explicit management by a user. SRD 
contexts are implicitly controlled by AH and QP management operations. If a QP 
is destroyed, all pending Send WRs on that QP are implicitly canceled, and their transport
processing is aborted, without affecting SRD processing of other WRs. If an AH 
is destroyed, any outstanding WRs using that AH are completed in error.

Completions for Send WRs posted to SRD QPs are the same as for WRs posted to
regular QPs. Success is reported after the WR is acked by the responder.
In addition to local errors, new types of remote errors are returned for 
requests that caused the responder to send a NAK. These errors could have been 
caused when the destination QP either does not exist, or is in error state, or 
does not have posted Recv WRs. These errors do not affect SRD context state.
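
The teardown rules in the excerpt above can be modeled in a few lines.
This is only a toy model under stated assumptions, not device or driver
code: the demo_wr struct, the integer QP/AH ids and the function names
are hypothetical, and the SRD context itself is not modeled beyond the
fact that neither operation touches work requests on other QPs or AHs.

```c
#include <assert.h>
#include <stddef.h>

enum demo_wr_state { DEMO_WR_PENDING, DEMO_WR_ERROR, DEMO_WR_CANCELED };

/* Hypothetical outstanding Send WR: which QP posted it, which AH it targets. */
struct demo_wr {
	int qp;
	int ah;
	enum demo_wr_state state;
};

/* AH destroyed: outstanding WRs using that AH are completed in error. */
static size_t demo_destroy_ah(struct demo_wr *wrs, size_t n, int ah)
{
	size_t hit = 0;

	assert(wrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (wrs[i].ah == ah && wrs[i].state == DEMO_WR_PENDING) {
			wrs[i].state = DEMO_WR_ERROR;
			hit++;
		}
	}
	return hit;
}

/* QP destroyed: its pending Send WRs are canceled; other QPs are untouched. */
static size_t demo_destroy_qp(struct demo_wr *wrs, size_t n, int qp)
{
	size_t hit = 0;

	assert(wrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (wrs[i].qp == qp && wrs[i].state == DEMO_WR_PENDING) {
			wrs[i].state = DEMO_WR_CANCELED;
			hit++;
		}
	}
	return hit;
}
```

The point of the model is the scoping: both operations are keyed on the
user-visible object (AH or QP), while the reliability state shared
between the two NICs is never addressed directly by the user.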

> 
> UD has both connected and unconnected flows that are interesting, and as
> soon as there is a resource and state, generally, people will eventually find a
> reason to need control over that. (although probably not from a HPC
> workload perspective)
We can support CM (UD-style) if anybody ever needs it, but it would be used only 
to control QP states, not SRD transport state.

> 
> For instance, most enterprise applications will want to tear down and restart
> their 'connection' - in the SRD perspective this means forgetting about all the
> connection state and setting it up again.
This is how it is today because of the tight coupling of the interface and the
underlying protocol; it does not have to be achieved by connection tear down.

> 
> In typical cases for other protocols this might select a different network
> multi-path, or side step some bug that was preventing forward progress.
Which is exactly why we chose to design a new protocol.

> 
> So, you can choose to hide all of this, but I wouldn't describe SRD as
> unconnected, more as 'automatically connected'.
Is there any difference from a user perspective?

Leah

Jason Gunthorpe Jan. 9, 2019, 12:10 a.m. UTC | #25

On Tue, Jan 08, 2019 at 09:17:53AM +0000, Shalev, Leah wrote:

> > So, you can choose to hide all of this, but I wouldn't describe SRD as
> > unconnected, more as 'automatically connected'.

> Is there any difference from a user perspective?

Unconnected implies there is no state. Automatically connected
implies there is a state but it is hidden from the user.

They are equivalent until the state actually matters, like there are
bugs in the state management :)

Jason

Hefty, Sean Jan. 9, 2019, 12:25 a.m. UTC | #26

> > > So, you can choose to hide all of this, but I wouldn't describe SRD as
> > > unconnected, more as 'automatically connected'.
> 
> > Is there any difference from a user perspective?
> 
> Unconnected implies there is no state. Automatically connected
> implies there is a state but it is hidden from the user.

Brian said that the state is maintained NIC to NIC, not at the QP level.  He went further and suggested that the state is maintained for some period of time, and isn't necessarily tied to the lifetime of the QP.  The SRD QPs are unconnected, but let's argue what the definition of a connection is now.

> They are equivalent until the state actually matters, like there are
> bugs in the state management :)

Yes, everything that has transport offload and provides reliability maintains some sort of state.  And they can all have bugs.  So?

Jason Gunthorpe Jan. 9, 2019, 3:53 a.m. UTC | #27

On Wed, Jan 09, 2019 at 12:25:23AM +0000, Hefty, Sean wrote:
> > > > So, you can choose to hide all of this, but I wouldn't describe SRD as
> > > > unconnected, more as 'automatically connected'.
> > 
> > > Is there any difference from a user perspective?
> > 
> > Unconnected implies there is no state. Automatically connected
> > implies there is a state but it is hidden from the user.
> 
> Brian said that the state is maintained NIC to NIC, not at the QP
> level.  He went further and suggested that the state is maintained
> for some period of time, and isn't necessarily tied to the lifetime
> of the QP.  The SRD QPs are unconnected, but let's argue what the
> definition of a connection is now.

I didn't say QP, I said the protocol was 'automatically connected'

SRD seems similar to the IB spec concept of RD which is described as
an unconnected QP running a 'connection oriented' RD protocol.

EFA wants to hide the objects related to connection state (ie in IBA
RD terms RDC and EEC) of the protocol, that's fine, but let's not
pretend it doesn't exist, OK?

Sagi wanted to know where CM was done, and now we know. It is hidden
in the NIC, and managed automatically.

Jason
