Message ID | 1546335025-31360-1-git-send-email-galpress@amazon.com (mailing list archive) |
---|---|
Series | IB device in-kernel API support indication |
On 1/1/19 1:30 AM, Gal Pressman wrote:
> Hello all,
> This RFC allows device drivers to indicate their support for in-kernel API through a flag in the IB device.
> Currently, devices that do not support in-kernel APIs (such as usnic) have no way to communicate that to the ULPs which try to use the device and fail.
> Instead, make the driver advertise its support upfront and allow clients to exit gracefully in case of unsupported device.
>
> Patch #1 adds the flag to the IB device, sets all existing drivers as kernel verbs providers and changes the IB clients.
> Patch #2 changes usnic driver to a non-kernel verbs provider as it offers no kernel API support.
>
> This RFC is introduced following the discussion over the EFA driver [1], which initially does not provide in-kernel API support.
>
> [1] https://patchwork.kernel.org/cover/10711629/

Having some drivers support kernel verbs and others not is confusing to Linux RDMA users. If we add a kverbs_provider flag then that means that we officially support that not all drivers support kverbs. I don't like this - I think new RDMA drivers should support both kverbs and uverbs.

What is so hard about adding kverbs support to the EFA driver?

Bart.
On Tue, Jan 01, 2019 at 08:01:39AM -0800, Bart Van Assche wrote:
> Having some drivers support kernel verbs and others not is confusing to Linux RDMA users. If we add a kverbs_provider flag then that means that we officially support that not all drivers support kverbs. I don't like this - I think new RDMA drivers should support both kverbs and uverbs.

+1, and I'm not sure that we came to any meaningful conclusion that we allow drivers without kverbs, did we?

> What is so hard about adding kverbs support to the EFA driver?

It is not hard, but they don't need it, won't test it, and probably won't care to support it either.

So, this series returned us to square one, to the discussion of whether EFA belongs in drivers/infiniband/ or not.

Thanks
On 01-Jan-19 18:01, Bart Van Assche wrote:
> Having some drivers support kernel verbs and others not is confusing to Linux RDMA users. If we add a kverbs_provider flag then that means that we officially support that not all drivers support kverbs. I don't like this - I think new RDMA drivers should support both kverbs and uverbs.
>
> What is so hard about adding kverbs support to the EFA driver?
>
> Bart.

Hi Bart,

This RFC is sent in order to help prevent the confusion; there's already a driver that doesn't support kverbs which, instead of having a proper way to advertise that, simply fails the kernel callbacks. Informing the ib clients upfront seems like healthier behavior to me.

For the RDMA users, I can add a sysfs/rdma tool indication that will make it clear whether the device supports kernel verbs or not. Does that sound reasonable?

EFA supported QP types are UD and Scalable Reliable Datagram (SRD, driver QP type). Since kernel verbs do not make use of either of those QP types we do not support in-kernel APIs initially.
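The "advertise upfront, exit gracefully" idea described above can be sketched in plain C. This is only an illustration under stated assumptions: `struct demo_ib_device` and `demo_ulp_add_one()` are hypothetical stand-ins, not the kernel's `struct ib_device` or any real client callback. Only the flag name `kverbs_provider` comes from the discussion.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device; only the flag matters here. */
struct demo_ib_device {
	const char *name;
	bool kverbs_provider;	/* flag name taken from the RFC discussion */
};

/*
 * Hypothetical ULP "add device" hook. It checks the flag first and returns
 * -EOPNOTSUPP, instead of discovering the missing kernel API later through
 * failing driver callbacks.
 */
static int demo_ulp_add_one(const struct demo_ib_device *dev)
{
	if (dev == NULL)
		return -EINVAL;
	if (!dev->kverbs_provider)
		return -EOPNOTSUPP;

	/* A real client would set up its PDs, CQs and QPs from here on. */
	return 0;
}
```

With this shape, the cost to a client is a single early check per device, which is the "additional if statement" argument made later in the thread.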
Hey,

>> Having some drivers support kernel verbs and others not is confusing to Linux RDMA users. If we add a kverbs_provider flag then that means that we officially support that not all drivers support kverbs. I don't like this - I think new RDMA drivers should support both kverbs and uverbs.
>
> +1, and I'm not sure that we came to any meaningful conclusion that we allow drivers without kverbs, did we?

I think that the discussion is a bit backwards and so is the RFC..

It's not that EFA does not support kverbs; isn't kverbs an abstract name for "what our current ulps require"?

Why is this kernel specific anyways? The exact same holds for uverbs applications that use RC, or in other words we can also see a kernel consumer that uses UD (and does not rely on IB addressing like ipoib) that can run on the EFA device...

Perhaps an rdma device needs to specify a mask of which QP types it supports such that the core or the consumer can look at it if it wants to log a meaningful error message (or it can simply fail ib_create_qp).
And, fwiw, I'm not sure I understand why a new device should support our kernel ulps (again, not kverbs, but the functionality required by our existing kernel consumers) if its users are not interested in it (if they were, it would probably be supported). Isn't it enough that something like rsockets can run on a device to justify its existence?
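The supported-QP-type mask suggested in this message could look roughly like the sketch below. Everything here is an assumption made for illustration: the enum values, the `supported_qp_types` field and the helper names are hypothetical and do not match the kernel's `enum ib_qp_type` or any existing `ib_device` member.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical QP type numbering; the kernel's enum ib_qp_type differs. */
enum demo_qp_type {
	DEMO_QPT_RC,
	DEMO_QPT_UC,
	DEMO_QPT_UD,
	DEMO_QPT_DRIVER,	/* driver-specific type, e.g. EFA's SRD */
	DEMO_QPT_MAX
};

#define DEMO_QPT_BIT(t) (UINT32_C(1) << (t))

/* Hypothetical device descriptor: one bit per supported QP type. */
struct demo_rdma_device {
	uint32_t supported_qp_types;
};

/* True if the device advertises the single given QP type. */
static bool demo_supports_qp_type(const struct demo_rdma_device *dev,
				  enum demo_qp_type type)
{
	if ((int)type < 0 || type >= DEMO_QPT_MAX)
		return false;
	return (dev->supported_qp_types & DEMO_QPT_BIT(type)) != 0;
}

/* True only if every QP type set in 'required' is advertised. */
static bool demo_supports_all(const struct demo_rdma_device *dev,
			      uint32_t required)
{
	return (dev->supported_qp_types & required) == required;
}
```

A consumer that needs RC would check `demo_supports_qp_type(dev, DEMO_QPT_RC)` and log a meaningful message on failure, while a UD-only consumer could still use a device that advertises just UD and a driver QP type.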
Hey Sagi,

On 02-Jan-19 02:27, Sagi Grimberg wrote:
> I think that the discussion is a bit backwards and so is the RFC..
>
> It's not that EFA does not support kverbs; isn't kverbs an abstract name for "what our current ulps require"?
>
> Why is this kernel specific anyways? The exact same holds for uverbs applications that use RC, or in other words we can also see a kernel consumer that uses UD (and does not rely on IB addressing like ipoib) that can run on the EFA device...

Makes sense.
> Perhaps an rdma device needs to specify a mask of which QP types it supports such that the core or the consumer can look at it if it wants to log a meaningful error message (or it can simply fail ib_create_qp).

Supported QP types for the device sounds good.
This could both move a chunk of the drivers' QP type checks to the core and help clients identify an unsupported device.

> And, fwiw, I'm not sure I understand why a new device should support our kernel ulps (again, not kverbs, but the functionality required by our existing kernel consumers) if its users are not interested in it (if they were, it would probably be supported). Isn't it enough that something like rsockets can run on a device to justify its existence?

Exactly, we just need a proper way to let the ulps know the device does not provide the required functionality.
On Wed, Jan 02, 2019 at 10:40:52AM +0200, Gal Pressman wrote:
> > Perhaps an rdma device needs to specify a mask of which QP types it supports such that the core or the consumer can look at it if it wants to log a meaningful error message (or it can simply fail ib_create_qp).
>
> Supported QP types for the device sounds good.
> This could both move a chunk of the drivers' QP type checks to the core and help clients identify an unsupported device.
>
> > And, fwiw, I'm not sure I understand why a new device should support our kernel ulps (again, not kverbs, but the functionality required by our existing kernel consumers) if its users are not interested in it (if they were, it would probably be supported). Isn't it enough that something like rsockets can run on a device to justify its existence?
>
> Exactly, we just need a proper way to let the ulps know the device does not provide the required functionality.

For me it sounds a little bit different.

I think that it will complicate ULPs and will push responsibility from driver authors and their employer companies to ULP authors, who are not supposed to be spec savvy in order to use kverbs.

Thanks
On 02-Jan-19 13:31, Leon Romanovsky wrote:
>> Exactly, we just need a proper way to let the ulps know the device does not provide the required functionality.
>
> For me it sounds a little bit different.
>
> I think that it will complicate ULPs and will push responsibility from driver authors and their employer companies to ULP authors, who are not supposed to be spec savvy in order to use kverbs.
>
> Thanks

My original suggestion doesn't really complicate the ULPs in any way; an additional if statement is not a lot of responsibility.
The QP type suggestion does impose a few more requirements on the ULPs, but it provides them with more flexibility. A ULP that can make use of the UD QP type, for example, could use devices that it couldn't have used if we went with the all-or-nothing 'kverbs_provider' flag.
On Tue, Jan 01, 2019 at 04:27:51PM -0800, Sagi Grimberg wrote:
> It's not that EFA does not support kverbs; isn't kverbs an abstract name for "what our current ulps require"?

kverbs is an abstract name for "what our current ulps require" and EFA clearly doesn't support that.. What are you trying to say?

> Why is this kernel specific anyways? The exact same holds for uverbs applications that use RC, or in other words we can also see a kernel consumer that uses UD (and does not rely on IB addressing like ipoib) that can run on the EFA device...

We could but we don't have such a thing today..

.. and I'm not sure we ever will, as UD is kind of useless on an ethernet based network. The saner thing to do is to use UDP and one of the high speed ethernet packet processing flavours available these days.

> And, fwiw, I'm not sure I understand why a new device should support our kernel ulps (again, not kverbs, but the functionality required by our existing kernel consumers) if its users are not interested in it (if they were, it would probably be supported).

So what is your standard for determining if a device is part of the RDMA subsystem or not?

If we don't have 'implements kverbs' as a requirement, and just permit UD QP verbs as the baseline requirement, is that OK? (and again note the original EFA submission didn't even support UD QP verbs)

This is where usnic is already.

The EFA device doesn't support rkeys: it *clearly* doesn't do the thing we call RDMA.

> Isn't it enough that something like rsockets can run on a device to justify its existence?

?? rsockets requires RC RDMA QPs, EFA won't support it.

Jason
>> It's not that EFA does not support kverbs; isn't kverbs an abstract name for "what our current ulps require"?
>
> kverbs is an abstract name for "what our current ulps require" and EFA clearly doesn't support that.. What are you trying to say?

This was directed at the proposal in the patch set; ib_device->kverbs_provider is really not a great design choice for an interface..

>> Why is this kernel specific anyways? The exact same holds for uverbs applications that use RC, or in other words we can also see a kernel consumer that uses UD (and does not rely on IB addressing like ipoib) that can run on the EFA device...
>
> We could but we don't have such a thing today..
>
> .. and I'm not sure we ever will, as UD is kind of useless on an ethernet based network.

I don't know either, but I was merely arguing that this matter is not kverbs specific, hence the interface should not reflect it as such.

> The saner thing to do is to use UDP and one of the high speed ethernet packet processing flavours available these days.

AFAICT we still have a long way to go before an application can actually do termination + zcopy with what's available today. Forwarding is more reasonable, I agree.

>> And, fwiw, I'm not sure I understand why a new device should support our kernel ulps (again, not kverbs, but the functionality required by our existing kernel consumers) if its users are not interested in it (if they were, it would probably be supported).
>
> So what is your standard for determining if a device is part of the RDMA subsystem or not?

I'm not sure, but it seems slightly odd to me that the vast majority of RDMA use cases are probably not our ulps (as much as I'd like them to be ;)) but we state them as the bar.

> If we don't have 'implements kverbs' as a requirement, and just permit UD QP verbs as the baseline requirement, is that OK?
> (and again note the original EFA submission didn't even support UD QP verbs)
>
> This is where usnic is already.

Well, I agree that UD is kind of a low bar, and probably most of the value of the EFA device comes from their SRD transport.

Is usnic a burden because it's not actively maintained in a subsystem that is constantly evolving, or because it implements a small subset of an RDMA device's functionality?

Personally, given that most RDMA usage lives in userland, I would think that having a uverbs provider is a more appropriate bar than supporting our kernel consumers. But I (like you and others) would be more than happy seeing both supported.

Both can be set as a bar, but one could argue that it's an unnecessarily high bar (if its user-base has no interest in running our kernel consumers).

> The EFA device doesn't support rkeys: it *clearly* doesn't do the thing we call RDMA.

A lot of applications don't use rkeys. We even have a kernel consumer that doesn't use rkeys (9p) but still uses RDMA devices.

>> Isn't it enough that something like rsockets can run on a device to justify its existence?
>
> ?? rsockets requires RC RDMA QPs, EFA won't support it.

I was referring to datagram rsockets...
On Wed, Jan 02, 2019 at 11:32:46AM -0800, Sagi Grimberg wrote:
> This was directed at the proposal in the patch set; ib_device->kverbs_provider is really not a great design choice for an interface..

Yeah, it really isn't..

> > The saner thing to do is to use UDP and one of the high speed ethernet packet processing flavours available these days.
>
> AFAICT we still have a long way to go before an application can actually do termination + zcopy with what's available today. Forwarding is more reasonable, I agree.

I thought the AF_XDP stuff was doing zcopy now?

> > So what is your standard for determining if a device is part of the RDMA subsystem or not?
>
> I'm not sure, but it seems slightly odd to me that the vast majority of RDMA use cases are probably not our ulps (as much as I'd like them to be ;)) but we state them as the bar.

Well, it is the vast majority of the in-kernel use cases, for sure :)

> Well, I agree that UD is kind of a low bar, and probably most of the value of the EFA device comes from their SRD transport.
>
> Is usnic a burden because it's not actively maintained in a subsystem that is constantly evolving, or because it implements a small subset of an RDMA device's functionality?

I think it's because every time someone wants to refactor the core APIs (like the completion stuff, or the RDMA WC stuff) they've looked into usnic to see if it would break and got all confused.

For instance, EFA did not implement create_ah properly, so anyone looking at how AH works for kverbs will become very confused by it.

I suppose this is why just blocking kverbs entirely has come up as a proposal. People working on kverbs *do not* need to care about non-kverbs drivers at all.

Part of this is how uverbs and kverbs are really roughly pushed into the same driver API, so a driver can't just say it supports uverbs only...
> Personally, given that most RDMA usage lives in userland, I would think that having a uverbs provider is a more appropriate bar than supporting our kernel consumers. But I (like you and others) would be more than happy seeing both supported.

We don't really have a compliance test or anything for uverbs, beyond the stuff in rdma-core, so if a driver doesn't support that stuff there is no way to know if the device or driver is implementing verbs correctly..

> Both can be set as a bar, but one could argue that it's an unnecessarily high bar (if its user-base has no interest in running our kernel consumers).

So how should we support non-verbs things? EFA seems particularly difficult because it is a *little* verbs like, and does seem to fit into the uverbs system somewhat OK.

> > The EFA device doesn't support rkeys: it *clearly* doesn't do the thing we call RDMA.
>
> A lot of applications don't use rkeys. We even have a kernel consumer that doesn't use rkeys (9p) but still uses RDMA devices.

It uses only SEND?

> > > Isn't it enough that something like rsockets can run on a device to justify its existence?
> >
> > ?? rsockets requires RC RDMA QPs, EFA won't support it.
>
> I was referring to datagram rsockets...

Even so, I don't think EFA has an addressing model compatible with rsockets, it doesn't use RDMA-CM either, which I think rsockets requires still for UD??

Jason
>> AFAICT we still have a long way to go before an application can actually do termination + zcopy with what's available today. Forwarding is more reasonable, I agree.
>
> I thought the AF_XDP stuff was doing zcopy now?

Apparently it does, on Intel NICs for now... cool.

>> Is usnic a burden because it's not actively maintained in a subsystem that is constantly evolving, or because it implements a small subset of an RDMA device's functionality?
>
> I think it's because every time someone wants to refactor the core APIs (like the completion stuff, or the RDMA WC stuff) they've looked into usnic to see if it would break and got all confused.

You're right, every time I swung by it, I didn't understand what that thing is doing but assumed it's broken so I didn't worry about it too much...

> For instance, EFA did not implement create_ah properly, so anyone looking at how AH works for kverbs will become very confused by it.

To an extent it cannot be fixed?

> I suppose this is why just blocking kverbs entirely has come up as a proposal. People working on kverbs *do not* need to care about non-kverbs drivers at all.
>
> Part of this is how uverbs and kverbs are really roughly pushed into the same driver API, so a driver can't just say it supports uverbs only...

Well, there are a few devices that we don't really know to support our kernel consumers (or at least I've never heard of someone who verified them), and some that are known to be broken/deprecated. The reason for this is that no one uses them to run ulps, which is most likely the case for EFA.

>> Personally, given that most RDMA usage lives in userland, I would think that having a uverbs provider is a more appropriate bar than supporting our kernel consumers. But I (like you and others) would be more than happy seeing both supported.
> We don't really have a compliance test or anything for uverbs, beyond the stuff in rdma-core, so if a driver doesn't support that stuff there is no way to know if the device or driver is implementing verbs correctly..

Sounds like a needed testing suite (even regardless of the discussion here).

>> Both can be set as a bar, but one could argue that it's an unnecessarily high bar (if its user-base has no interest in running our kernel consumers).
>
> So how should we support non-verbs things? EFA seems particularly difficult because it is a *little* verbs like, and does seem to fit into the uverbs system somewhat OK.

I didn't say support non-verbs things. I personally think that a uverbs interface is required (and I think Gal agreed to add one to his future submissions).

Anyways... it's complicated I guess.. it's hard to come up with a reasonable bar halfway in... It's just one's opinion..

>>> The EFA device doesn't support rkeys: it *clearly* doesn't do the thing we call RDMA.
>>
>> A lot of applications don't use rkeys. We even have a kernel consumer that doesn't use rkeys (9p) but still uses RDMA devices.
>
> It uses only SEND?

Yep.

>>>> Isn't it enough that something like rsockets can run on a device to justify its existence?
>>>
>>> ?? rsockets requires RC RDMA QPs, EFA won't support it.
>>
>> I was referring to datagram rsockets...
>
> Even so, I don't think EFA has an addressing model compatible with rsockets, it doesn't use RDMA-CM either, which I think rsockets requires still for UD??

I'd assume that the EFA proprietary stuff matches IPv4/v6 to their something so they can hook into ucma?

Otherwise how would addressing work at all? Perhaps Gal can share more on this...
On 02-Jan-19 23:41, Sagi Grimberg wrote:
>> For instance, EFA did not implement create_ah properly, so anyone looking at how AH works for kverbs will become very confused by it.
>
> To an extent it cannot be fixed?

Are we talking about the fact that EFA's create_ah is not atomic? That was fixed using the sleepable flag. We can also make it atomic, but as discussed, it makes no sense to do that until we actually go through the atomic flows.

>> I suppose this is why just blocking kverbs entirely has come up as a proposal. People working on kverbs *do not* need to care about non-kverbs drivers at all.
>>
>> Part of this is how uverbs and kverbs are really roughly pushed into the same driver API, so a driver can't just say it supports uverbs only...
>
> Well, there are a few devices that we don't really know to support our kernel consumers (or at least I've never heard of someone who verified them), and some that are known to be broken/deprecated.
> The reason for this is that no one uses them to run ulps, which is most likely the case for EFA.

Correct, that's the reasoning for the kverbs_provider flag.
Jason, do you prefer Sagi's suggestion to make the drivers advertise their supported QP types instead of the kverbs provider interface?
>>> >>> I was referring to datagram rsockets... >> >> Even so, I don't think EFA has an addressing model compatible with >> rsockets, it doesn't use RDMA-CM either, which I think rsockets >> requires still for UD?? > > I'd assume that EFA proprietary stuff matches IPv4/v6 to their > something so they can hook into ucma? > > otherwise how would addressing work at all? Perhaps Gal can share > more on this... Our addressing does not rely on rdmacm. Also, there is no matching netdevice (IPv4/IPv6) for the EFA IB device. Each EFA device has a 16-byte opaque GID (queried from the device) that should be specified when creating the AH. libfabric's connection manager (out of band) is used to exchange these device GIDs and destination QP numbers. Does that answer your questions?
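To make the address exchange described above concrete, here is a minimal sketch of how a 16-byte opaque GID and a destination QP number could be serialized for an out-of-band exchange. Everything in it is an assumption for illustration: the `PeerAddress` type, the `pack_address`/`unpack_address` helpers, and the wire layout (16 GID bytes followed by a 32-bit QP number in network byte order) are hypothetical and are not taken from libfabric or the EFA driver.

```python
import struct
from typing import NamedTuple

# Hypothetical wire layout: 16 opaque GID bytes followed by a 32-bit QP number,
# network byte order. An assumption for illustration, not an actual format.
_ADDR_FMT = "!16sI"
ADDR_LEN = struct.calcsize(_ADDR_FMT)


class PeerAddress(NamedTuple):
    gid: bytes  # opaque 16-byte device GID, as queried from the device
    qpn: int    # destination QP number


def pack_address(addr: PeerAddress) -> bytes:
    """Serialize an address into the fixed-size blob sent out of band."""
    if len(addr.gid) != 16:
        raise ValueError("GID must be exactly 16 bytes")
    return struct.pack(_ADDR_FMT, addr.gid, addr.qpn)


def unpack_address(blob: bytes) -> PeerAddress:
    """Parse a blob received from the peer back into GID and QP number."""
    if len(blob) != ADDR_LEN:
        raise ValueError("unexpected address blob length")
    gid, qpn = struct.unpack(_ADDR_FMT, blob)
    return PeerAddress(gid, qpn)
```

The receiver would then use the GID when creating the AH and the QP number as the destination of each send; how the blob travels (a name service, a job manager) is outside this sketch.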
>>>>>> Isn't it enough that something like rsockets can run on a device to >>>>>> justify its existence? >>>>> >>>>> ?? rsockets requires RC RDMA QPs, EFA won't support it. >>>> >>>> I was referring to datagram rsockets... >>> >>> Even so, I don't think EFA has an addressing model compatible with >>> rsockets, it doesn't use RDMA-CM either, which I think rsockets >>> requires still for UD?? >> >> I'd assume that EFA proprietary stuff matches IPv4/v6 to their >> something so they can hook into ucma? >> >> otherwise how would addressing work at all? Perhaps Gal can share >> more on this... > > Our addressing does not rely on rdmacm, also, there is no matching netdevice > (ipv4/6) for the EFA ib device. I suppose one can be made though? even in SW implementing some sort of indirection table? > Each EFA device has a 16 bytes opaque GID (queried from the device) that should > be specified when creating the AH. > > libfabric's connection manager (out of band) is used to exchange these device > GIDs and destination QP numbers. > > Does that answer your questions? Yes, but I do tend to agree with the notion that efa needs to be made to fit uverbs better. That means a libibverbs provider and a efacm component that can map to rdma_ucm. Is that feasible?
> Yes, but I do tend to agree with the notion that efa needs to be made to > fit uverbs better. That means a libibverbs provider and a efacm > component that can map to rdma_ucm. Is that feasible? EFA supports connectionless communication. Mapping to rdma_ucm doesn't make sense to me. - Sean
>> Yes, but I do tend to agree with the notion that efa needs to be made to >> fit uverbs better. That means a libibverbs provider and a efacm >> component that can map to rdma_ucm. Is that feasible? > > EFA supports connectionless communication. Mapping to rdma_ucm doesn't make sense to me. Gal said that libfabric's connection manager is used to resolve addressing... so there is some connection manager involved...
> >> Yes, but I do tend to agree with the notion that efa needs to be made to > >> fit uverbs better. That means a libibverbs provider and a efacm > >> component that can map to rdma_ucm. Is that feasible? > > > > EFA supports connectionless communication. Mapping to rdma_ucm doesn't make > sense to me. > > Gal said that address libfabric connection manager is used to resolve > addressing... so there is some connection manager involved... I haven't seen the libfabric provider yet, but libfabric has a generic out-of-band socket-based name service that can be used by providers. I'm guessing that's what Gal is referring to. The name service is primarily there to support fabtests. In realistic use cases, those providers rely on a job manager to exchange addressing, with name service support disabled. - Sean
On 1/4/19 5:09 PM, Hefty, Sean wrote: >>>> Yes, but I do tend to agree with the notion that efa needs to be made to >>>> fit uverbs better. That means a libibverbs provider and a efacm >>>> component that can map to rdma_ucm. Is that feasible? >>> >>> EFA supports connectionless communication. Mapping to rdma_ucm doesn't make >> sense to me. >> >> Gal said that address libfabric connection manager is used to resolve >> addressing... so there is some connection manager involved... > > I haven't see the libfabric provider yet, but libfabric has generic out-of-band socket-based name service that can be used by provider > I'm guessing that's what Gal is referring to. The name service is primarily there to support fabtests. > In realistic use cases, those providers rely on a job manager to exchange addressing, with name service support disabled. I think that this is what I was referring to by introducing efacm like ibcm and iwcm... Isn't it in essence the same thing?
> > I haven't see the libfabric provider yet, but libfabric has generic out-of- > band socket-based name service that can be used by provider > I'm guessing > that's what Gal is referring to. The name service is > primarily there to support fabtests. > > In realistic use cases, those providers rely on a job manager to exchange > addressing, with name service support disabled. > > I think that this is what I was referring to by introducing efacm like > ibcm and iwcm... Isn't it in essence the same thing? Not quite - this isn't running a connection protocol. The closest in tree comparison would be the IB SIDR protocol used in conjunction with IP addresses. I’m not aware of anyone using that, however. Unconnected endpoints typically have an existing out of band mechanism (e.g. PMI) that can be used for address exchange. The PSM/2 drivers make a similar assumption. - Sean
On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote: > > > I haven't see the libfabric provider yet, but libfabric has generic out-of- > > band socket-based name service that can be used by provider > I'm guessing > > that's what Gal is referring to. The name service is > > primarily there to support fabtests. > > > In realistic use cases, those providers rely on a job manager to exchange > > addressing, with name service support disabled. > > > > I think that this is what I was referring to by introducing efacm like > > ibcm and iwcm... Isn't it in essence the same thing? > > Not quite - this isn't running a connection protocol. The closest > in tree comparison would be the IB SIDR protocol used in conjunction > with IP addresses. I’m not aware of anyone using that, however. > Unconnected endpoints typically have an existing out of band > mechanism (e.g. PMI) that can be used for address exchange. The > PSM/2 drivers make a similar assumption. Dare I ask how it avoids duplicate messages without a connection protocol? Jason
On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote: >>>> I haven't see the libfabric provider yet, but libfabric has generic out-of- >>> band socket-based name service that can be used by provider > I'm guessing >>> that's what Gal is referring to. The name service is >>> primarily there to support fabtests. >>>> In realistic use cases, those providers rely on a job manager to exchange >>> addressing, with name service support disabled. >>> >>> I think that this is what I was referring to by introducing efacm like >>> ibcm and iwcm... Isn't it in essence the same thing? >> >> Not quite - this isn't running a connection protocol. The closest >> in tree comparison would be the IB SIDR protocol used in conjunction >> with IP addresses. I’m not aware of anyone using that, however. >> Unconnected endpoints typically have an existing out of band >> mechanism (e.g. PMI) that can be used for address exchange. The >> PSM/2 drivers make a similar assumption. > > Dare I ask how it avoids duplicate messages without a connection > protocol? In SRD’s case, there is a connection-like structure between any two NICs that is dynamically established as part of packet transmission. If you look at Sandia Portals (which is even further from standard VERBS, but is a well documented communication interface so worth referencing), it assumes a job configuration step that, while not establishing a connection in the VERBS sense of the word connection, does give a time period for which reliability data can be stored. Brian
On Mon, Jan 07, 2019 at 11:56:02PM +0000, Barrett, Brian wrote: > On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote: > >>>> I haven't see the libfabric provider yet, but libfabric has generic out-of- > >>> band socket-based name service that can be used by provider > I'm guessing > >>> that's what Gal is referring to. The name service is > >>> primarily there to support fabtests. > >>>> In realistic use cases, those providers rely on a job manager to exchange > >>> addressing, with name service support disabled. > >>> > >>> I think that this is what I was referring to by introducing efacm like > >>> ibcm and iwcm... Isn't it in essence the same thing? > >> > >> Not quite - this isn't running a connection protocol. The closest > >> in tree comparison would be the IB SIDR protocol used in conjunction > >> with IP addresses. I’m not aware of anyone using that, however. > >> Unconnected endpoints typically have an existing out of band > >> mechanism (e.g. PMI) that can be used for address exchange. The > >> PSM/2 drivers make a similar assumption. > > > > Dare I ask how it avoids duplicate messages without a connection > > protocol? > > In SRD’s case, there is a connection-like structure between any two > NICs that is dynamically established as part of packet transmission. > If you look at Sandia Portals (which is even further from standard > VERBS, but is a well documented communication interface so worth > referencing), it assumes a job configuration step that, while not > establishing a connection in the VERBS sense of the word connection, > does give a time period for which reliability data can be stored. Usually the reason a protocol needs an explicit exchange of connection parameters is to solve collisions with ID re-use, ie the source ID matching the 'connection-like' structure gets improperly re-used due to machine reboot, general ID recycling, or whatever. 
Does SRD inherently rely on the job-like scheme for correct operation? A mandatory job-like scheme would probably preclude using it directly in kernel ULPs in future.. Jason
> > On Jan 7, 2019, at 16:29, Jason Gunthorpe <jgg@ziepe.ca> wrote: > >> On Mon, Jan 07, 2019 at 11:56:02PM +0000, Barrett, Brian wrote: >>> On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote: >>> >>> On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote: >>>>>> I haven't see the libfabric provider yet, but libfabric has generic out-of- >>>>> band socket-based name service that can be used by provider > I'm guessing >>>>> that's what Gal is referring to. The name service is >>>>> primarily there to support fabtests. >>>>>> In realistic use cases, those providers rely on a job manager to exchange >>>>> addressing, with name service support disabled. >>>>> >>>>> I think that this is what I was referring to by introducing efacm like >>>>> ibcm and iwcm... Isn't it in essence the same thing? >>>> >>>> Not quite - this isn't running a connection protocol. The closest >>>> in tree comparison would be the IB SIDR protocol used in conjunction >>>> with IP addresses. I’m not aware of anyone using that, however. >>>> Unconnected endpoints typically have an existing out of band >>>> mechanism (e.g. PMI) that can be used for address exchange. The >>>> PSM/2 drivers make a similar assumption. >>> >>> Dare I ask how it avoids duplicate messages without a connection >>> protocol? >> >> In SRD’s case, there is a connection-like structure between any two >> NICs that is dynamically established as part of packet transmission. >> If you look at Sandia Portals (which is even further from standard >> VERBS, but is a well documented communication interface so worth >> referencing), it assumes a job configuration step that, while not >> establishing a connection in the VERBS sense of the word connection, >> does give a time period for which reliability data can be stored. 
> > Usually the reason a protocol needs an explicit exchange of connection > parameters is to solve collisions with ID re-use, ie the source ID > matching the 'connection-like' structure gets improperly re-used due > to machine reboot, general ID recycling, or whatever. > > Does SRD inherently rely on the job-like scheme for correct operation? > > A mandatory job-like scheme would probably preclude using it directly > in kernel ULPs in future.. Sorry, that wasn’t clear. No, SRD does not require any job-like indicators. It has a protocol to establish / invalidate reliability state in firmware. My point was that whether or not there’s a connection established under the covers, there’s no visible connection to the user with SRD; the usage flow is similar to UD or RD (obviously with different reliability, ordering, and performance characteristics). My experience (perhaps incorrect, but matching with Gal’s expectations) with UD and RD is that consumers of the datagram protocols don’t use a connection manager (because there isn’t a connection). If this is a bad assumption, we’ll go back and rethink our strategy. Brian
On Tue, Jan 08, 2019 at 03:53:49AM +0000, Barrett, Brian wrote: > Sorry, that wasn’t clear. No, SRD does not require any job-like > indicators. It has a protocol to establish / invalidate reliability > state in firmware. My point was that whether or not there’s a > connection established under the covers, there’s no visible > connection to the user with SRD; A hidden connection manager makes a lot more sense. But the hidden and uncontrolled resource usage is probably still not so great for anything but a job-like HPC application. Any client/server thing is going to want to control this resource more finely. > characteristics). My experience (perhaps incorrect, but matching > with Gal’s expectations) with UD and RD is that consumers of the > datagram protocols don’t use a connection manager (because there > isn’t a connection). If this is a bad assumption, we’ll go back and > rethink our strategy. Well, it sure sounds like there *is* a connection - you've just removed all visibility and control over the underlying connection resources and state from the OS and application. UD has both connected and unconnected flows that are interesting, and as soon as there is a resource and state, generally, people will eventually find a reason to need control over that (although probably not from an HPC workload perspective). For instance, most enterprise applications will want to tear down and restart their 'connection' - in the SRD perspective this means forgetting about all the connection state and setting it up again. In typical cases for other protocols this might select a different network multi-path, or sidestep some bug that was preventing forward progress. So, you can choose to hide all of this, but I wouldn't describe SRD as unconnected, more as 'automatically connected'. Jason
> From: Jason Gunthorpe <jgg@ziepe.ca> > > But still, the hidden and uncontrolled resource usage is probably still not so > great for anything but a job-like HPC application. Any client/server thing is > going to want to control this resource more finely. > Here is an excerpt from the "SRD spec" we will provide; I hope it clarifies things: SRD QPs provide reliable but out-of-order delivery without segmentation support. This allows decoupling of transport processing from QP buffer management, so that separate application flows can be multiplexed without interfering with each other. As in UD QPs, each WR includes the AH of the remote destination, allowing a process to communicate with any process on any endnode using a single QP, on both the send and receive sides. Each Address Handle is associated with an SRD context. An SRD context is used to provide reliable communication to a remote node, similar to an RD EE context, but without explicit management by a user. SRD contexts are implicitly controlled by AH and QP management operations. If a QP is destroyed, all pending Send WRs on that QP are implicitly canceled, and their transport processing is aborted, without affecting SRD processing of other WRs. If an AH is destroyed, any outstanding WRs using that AH are completed in error. Completions for Send WRs posted to SRD QPs are the same as for WRs posted to regular QPs. Success is reported after the WR is acked by the responder. In addition to local errors, new types of remote errors are returned for requests that caused the responder to send a NAK. These errors could have been caused when the destination QP either does not exist, or is in an error state, or does not have posted Recv WRs. These errors do not affect SRD context state. > > UD has both connected and unconnected flows that are interesting, and as > soon as there is a resource and state, generally, people will eventually find a > reason to need control over that.
(although probably not from a HPC > workload perspective) We can support CM (UD-style) if anybody ever needs it, but it would be used only to control QP states, not SRD transport state. > > For instance, most enterprise applications will want to tear down and restart > their 'connection' - in the SRD perspective this means forgetting about all the > connection state and setting it up again. This is how it is today because of the tight coupling of the interface and the underlying protocol; it does not have to be achieved by connection teardown. > > In typical cases for other protocols this might select a different network > multi-path, or side step some bug that was preventing forward progress. Which is exactly why we chose to design a new protocol. > > So, you can choose to hide all of this, but I wouldn't describe SRD as > unconnected, more as 'automatically connected'. Is there any difference from a user perspective? Leah
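The completion rules in the SRD excerpt quoted in this message can be summarized with a toy model. This is only a sketch of the stated semantics (ack means success, QP destroy cancels that QP's pending Send WRs, AH destroy completes that AH's outstanding WRs in error). The `ToySrdDevice`, `SendWR` and `Status` names are hypothetical, QPs and AHs are reduced to plain integers, and nothing here reflects the real device or driver interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    PENDING = "pending"
    SUCCESS = "success"    # WR was acked by the responder
    CANCELED = "canceled"  # owning QP was destroyed
    ERROR = "error"        # AH used by the WR was destroyed


@dataclass
class SendWR:
    wr_id: int
    qp: int  # hypothetical QP identifier
    ah: int  # hypothetical AH identifier
    status: Status = Status.PENDING


@dataclass
class ToySrdDevice:
    wrs: List[SendWR] = field(default_factory=list)

    def post_send(self, wr_id: int, qp: int, ah: int) -> SendWR:
        wr = SendWR(wr_id, qp, ah)
        self.wrs.append(wr)
        return wr

    def ack(self, wr_id: int) -> None:
        # Success is reported only after the responder acks the WR.
        for wr in self.wrs:
            if wr.wr_id == wr_id and wr.status is Status.PENDING:
                wr.status = Status.SUCCESS

    def destroy_qp(self, qp: int) -> None:
        # Pending Send WRs on this QP are canceled; other WRs are untouched.
        for wr in self.wrs:
            if wr.qp == qp and wr.status is Status.PENDING:
                wr.status = Status.CANCELED

    def destroy_ah(self, ah: int) -> None:
        # Outstanding WRs using this AH are completed in error.
        for wr in self.wrs:
            if wr.ah == ah and wr.status is Status.PENDING:
                wr.status = Status.ERROR
```

Note that the model keeps no per-peer object at all, which matches the point under discussion: whatever reliability state exists between NICs is not something the user creates or destroys directly.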
On Tue, Jan 08, 2019 at 09:17:53AM +0000, Shalev, Leah wrote: > > So, you can choose to hide all of this, but I wouldn't describe SRD as > > unconnected, more as 'automatically connected'. > Is there any difference from a user perspective? Unconnected implies there is no state. Automatically connected implies there is a state but it is hidden from the user. They are equivalent until the state actually matters, for example when there are bugs in the state management :) Jason
> > > So, you can choose to hide all of this, but I wouldn't describe SRD as > > > unconnected, more as 'automatically connected'. > > > Is there any difference from a user perspective? > > Unconnected implies there is no state. Automatically connected > implies there is a state but it is hidden from the user. Brian said that the state is maintained NIC to NIC, not at the QP level. He went further and suggested that the state is maintained for some period of time, and isn't necessarily tied to the lifetime of the QP. The SRD QPs are unconnected, but let's argue what the definition of a connection is now. > They are equivalent until the state actually matters, like there are > bugs in the state management :) Yes, everything that has transport offload and provides reliability maintains some sort of state. And they can all have bugs. So?
On Wed, Jan 09, 2019 at 12:25:23AM +0000, Hefty, Sean wrote: > > > > So, you can choose to hide all of this, but I wouldn't describe SRD as > > > > unconnected, more as 'automatically connected'. > > > > > Is there any difference from a user perspective? > > > > Unconnected implies there is no state. Automatically connected > > implies there is a state but it is hidden from the user. > > Brian said that the state is maintained NIC to NIC, not at the QP > level. He went further and suggested that the state is maintained > for some period of time, and isn't necessarily tied to the lifetime > of the QP. The SRD QPs are unconnected, but let's argue what the > definition of a connection is now. I didn't say QP, I said the protocol was 'automatically connected'. SRD seems similar to the IB spec concept of RD, which is described as an unconnected QP running a 'connection oriented' RD protocol. EFA wants to hide the objects related to the connection state of the protocol (i.e., in IBA RD terms, RDC and EEC); that's fine, but let's not pretend it doesn't exist, OK? Sagi wanted to know where CM was done, and now we know. It is hidden in the NIC, and managed automatically. Jason