| Message ID | 1429878230-11749-2-git-send-email-yun.wang@profitbricks.com (mailing list archive) |
|---|---|
| State | Rejected |
On 4/24/2015 8:23 AM, Michael Wang wrote:
> Add new callback query_transport() and implement for each HW.
>
> Mapping List:
> 		node-type	link-layer	old-transport	new-transport
> ...
> mlx4		IB_CA		IB/ETH		IB		IB/IBOE
> mlx5		IB_CA		IB		IB		IB
> ...
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index 57c9809..b6f2f58 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -262,6 +262,12 @@ out:
> 	return err;
> }
>
> +static enum rdma_transport_type
> +mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
> +{
> +	return RDMA_TRANSPORT_IB;
> +}
> +

Just noticed that mlx5 is not being coded as RoCE-capable like mlx4.
The mlx5 driver is for the new ConnectX-4, which implements all three
of IB, RoCE and RoCEv2, right?  Are those last two not supported?
On 04/24/2015 04:29 PM, Tom Talpey wrote:
> On 4/24/2015 8:23 AM, Michael Wang wrote:
[snip]
>> +static enum rdma_transport_type
>> +mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
>> +{
>> +	return RDMA_TRANSPORT_IB;
>> +}
>> +
>
>
> Just noticed that mlx5 is not being coded as RoCE-capable like mlx4.
> The mlx5 driver is for the new ConnectX-4, which implements all three
> of IB, RoCE and RoCEv2, right?  Are those last two not supported?

I'm not sure about the details of mlx5, but according to the current
implementation, its transport is IB without a link-layer callback,
which means it doesn't support IBoE...

And there is no method to change the port link-layer type as mlx4 did.

Regards,
Michael Wang
On 4/24/2015 10:35 AM, Michael Wang wrote:
>
>
> On 04/24/2015 04:29 PM, Tom Talpey wrote:
>> On 4/24/2015 8:23 AM, Michael Wang wrote:
> [snip]
>>> +static enum rdma_transport_type
>>> +mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
>>> +{
>>> +	return RDMA_TRANSPORT_IB;
>>> +}
>>> +
>>
>>
>> Just noticed that mlx5 is not being coded as RoCE-capable like mlx4.
>> The mlx5 driver is for the new ConnectX-4, which implements all three
>> of IB, RoCE and RoCEv2, right?  Are those last two not supported?
>
> I'm not sure about the details of mlx5, but according to the current
> implementation, its transport is IB without a link-layer callback,
> which means it doesn't support IBoE...
>
> And there is no method to change the port link-layer type as mlx4 did.

Hal, is that correct?

From the Mellanox web:
http://www.mellanox.com/related-docs/products/IB_Adapter_card_brochure_c_2_3.pdf

"ConnectX-4
ConnectX-4 adapter cards with Virtual Protocol Interconnect (VPI),
supporting EDR 100Gb/s InfiniBand and 100Gb/s Ethernet connectivity,
provide the...

"Virtual Protocol Interconnect
VPI® flexibility enables any standard networking, clustering, storage, and
management protocol to seamlessly operate over any converged network
leveraging a consolidated software stack. Each port can operate on
InfiniBand, Ethernet, or Data Center Bridging (DCB) fabrics, and supports
IP over InfiniBand (IPoIB), Ethernet over InfiniBand (EoIB) and RDMA over
Converged Ethernet (RoCE and RoCEv2).
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>
[snip]
> a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
> 65994a1..d54f91e 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -75,10 +75,13 @@ enum rdma_node_type { };
>
>  enum rdma_transport_type {
> +	/* legacy for users */
>  	RDMA_TRANSPORT_IB,
>  	RDMA_TRANSPORT_IWARP,
>  	RDMA_TRANSPORT_USNIC,
> -	RDMA_TRANSPORT_USNIC_UDP
> +	RDMA_TRANSPORT_USNIC_UDP,
> +	/* new transport */
> +	RDMA_TRANSPORT_IBOE,

Remove RDMA_TRANSPORT_IBOE - it is not a transport.
ROCE uses IBTA transport.

Any code that needs to test for ROCE should invoke a specific helper, e.g., rdma_protocol_iboe().
This is what you currently call "rdma_tech_iboe" in patch 02/26.

I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right?
So, let's change this.

The semantics are: "check that a link supports a certain wire protocol, or a set of wire protocols", where 'certain'
refers to the specific helper...
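For readers following the naming debate, this is roughly the shape of helper being asked for. The body below is only an illustration built from the query_transport() and link-layer callbacks this patch already exposes; the helper name and placement were still under discussion and are not code from the posted series:

static inline bool rdma_protocol_iboe(struct ib_device *device, u8 port_num)
{
	/* Illustration only: IBoE/RoCE means IBTA transport over an Ethernet link */
	return device->query_transport(device, port_num) == RDMA_TRANSPORT_IB &&
	       rdma_port_get_link_layer(device, port_num) ==
	       IB_LINK_LAYER_ETHERNET;
}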
On Fri, 2015-04-24 at 10:46 -0400, Tom Talpey wrote:
> On 4/24/2015 10:35 AM, Michael Wang wrote:
> >
> >
> > On 04/24/2015 04:29 PM, Tom Talpey wrote:
> >> On 4/24/2015 8:23 AM, Michael Wang wrote:
> > [snip]
> >>> +static enum rdma_transport_type
> >>> +mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
> >>> +{
> >>> +	return RDMA_TRANSPORT_IB;
> >>> +}
> >>> +
> >>
> >>
> >> Just noticed that mlx5 is not being coded as RoCE-capable like mlx4.
> >> The mlx5 driver is for the new ConnectX-4, which implements all three
> >> of IB, RoCE and RoCEv2, right?  Are those last two not supported?
> >
> > I'm not sure about the details of mlx5, but according to the current
> > implementation, its transport is IB without a link-layer callback,
> > which means it doesn't support IBoE...
> >
> > And there is no method to change the port link-layer type as mlx4 did.
>
> Hal, is that correct?

The mlx5 Ethernet driver has not yet been submitted to the linux kernel.
Until it lands, the driver is IB only.  When it does land, it will need to
update this area of code.
On 04/24/2015 05:12 PM, Liran Liss wrote:
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>>
> [snip]
>> a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
>> 65994a1..d54f91e 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -75,10 +75,13 @@ enum rdma_node_type { };
>>
>>  enum rdma_transport_type {
>> +	/* legacy for users */
>>  	RDMA_TRANSPORT_IB,
>>  	RDMA_TRANSPORT_IWARP,
>>  	RDMA_TRANSPORT_USNIC,
>> -	RDMA_TRANSPORT_USNIC_UDP
>> +	RDMA_TRANSPORT_USNIC_UDP,
>> +	/* new transport */
>> +	RDMA_TRANSPORT_IBOE,
>
> Remove RDMA_TRANSPORT_IBOE - it is not a transport.
> ROCE uses IBTA transport.
>
> Any code that needs to test for ROCE should invoke a specific helper, e.g., rdma_protocol_iboe().
> This is what you currently call "rdma_tech_iboe" in patch 02/26.
>
> I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right?
> So, let's change this.

Sure, sounds reasonable now. About the IBOE, we still need a way to
separate ports that run IB from those that run over ETH without checking
the link layer. So what about a new enum for the protocol type?

Like:

enum rdma_protocol {
	RDMA_PROTOCOL_IB,
	RDMA_PROTOCOL_IBOE,
	RDMA_PROTOCOL_IWARP,
	RDMA_PROTOCOL_USNIC_UDP
};

So we could use query_protocol() to ask the device to provide the protocol
type, and there will be no mixing with the legacy transport type anymore :-)

Regards,
Michael Wang

>
> The semantics are: "check that a link supports a certain wire protocol, or a set of wire protocols", where 'certain'
> refers to the specific helper...
>
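As a concrete illustration of what that proposal would mean for a driver, the mlx4 callback from this patch could be recast roughly as follows. The function name mlx4_ib_query_protocol() is hypothetical; the body simply mirrors the mlx4_ib_query_transport() hunk already in the series, mapped onto the proposed enum:

static enum rdma_protocol
mlx4_ib_query_protocol(struct ib_device *device, u8 port_num)
{
	struct mlx4_dev *dev = to_mdev(device)->dev;

	/* IB ports report the native protocol, Ethernet ports report IBoE */
	return dev->caps.port_mask[port_num] == MLX4_PORT_TYPE_IB ?
		RDMA_PROTOCOL_IB : RDMA_PROTOCOL_IBOE;
}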
On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote: > > > On 04/24/2015 05:12 PM, Liran Liss wrote: > >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > >> > > [snip] > >> a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index > >> 65994a1..d54f91e 100644 > >> --- a/include/rdma/ib_verbs.h > >> +++ b/include/rdma/ib_verbs.h > >> @@ -75,10 +75,13 @@ enum rdma_node_type { }; > >> > >> enum rdma_transport_type { > >> + /* legacy for users */ > >> RDMA_TRANSPORT_IB, > >> RDMA_TRANSPORT_IWARP, > >> RDMA_TRANSPORT_USNIC, > >> - RDMA_TRANSPORT_USNIC_UDP > >> + RDMA_TRANSPORT_USNIC_UDP, > >> + /* new transport */ > >> + RDMA_TRANSPORT_IBOE, > > > > Remove RDMA_TRANSPORT_IBOE - it is not a transport. > > ROCE uses IBTA transport. > > > > If any code should test for ROCE should invoke a specific helper, e.g., rdma_protocol_iboe(). > > This is what you currently call "rdma_tech_iboe" is patch 02/26. > > > > I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right? > > So, let's change this. > > Sure, sounds reasonable now, about the IBOE, we still need it to > separate the port support IB/ETH without the check on link-layer, > So what about a new enum on protocol type? > > Like: > > enum rdma_protocol { > RDMA_PROTOCOL_IB, > RDMA_PROTOCOL_IBOE, > RDMA_PROTOCOL_IWARP, > RDMA_PROTOCOL_USNIC_UDP > }; > > So we could use query_protocol() to ask device provide the protocol > type, and there will be no mixing with the legacy transport type > anymore :-) I'm ok with that. I like introducing a unique namespace which is clearly different from the previous "transport" one. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/27/2015 2:52 PM, ira.weiny wrote: > On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote: >> >> >> On 04/24/2015 05:12 PM, Liran Liss wrote: >>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- >>>> >>> [snip] >>>> a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index >>>> 65994a1..d54f91e 100644 >>>> --- a/include/rdma/ib_verbs.h >>>> +++ b/include/rdma/ib_verbs.h >>>> @@ -75,10 +75,13 @@ enum rdma_node_type { }; >>>> >>>> enum rdma_transport_type { >>>> + /* legacy for users */ >>>> RDMA_TRANSPORT_IB, >>>> RDMA_TRANSPORT_IWARP, >>>> RDMA_TRANSPORT_USNIC, >>>> - RDMA_TRANSPORT_USNIC_UDP >>>> + RDMA_TRANSPORT_USNIC_UDP, >>>> + /* new transport */ >>>> + RDMA_TRANSPORT_IBOE, >>> >>> Remove RDMA_TRANSPORT_IBOE - it is not a transport. >>> ROCE uses IBTA transport. >>> >>> If any code should test for ROCE should invoke a specific helper, e.g., rdma_protocol_iboe(). >>> This is what you currently call "rdma_tech_iboe" is patch 02/26. >>> >>> I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right? >>> So, let's change this. >> >> Sure, sounds reasonable now, about the IBOE, we still need it to >> separate the port support IB/ETH without the check on link-layer, >> So what about a new enum on protocol type? >> >> Like: >> >> enum rdma_protocol { >> RDMA_PROTOCOL_IB, >> RDMA_PROTOCOL_IBOE, >> RDMA_PROTOCOL_IWARP, >> RDMA_PROTOCOL_USNIC_UDP >> }; >> >> So we could use query_protocol() to ask device provide the protocol >> type, and there will be no mixing with the legacy transport type >> anymore :-) > > I'm ok with that. I like introducing a unique namespace which is clearly > different from the previous "transport" one. I agree the word "transport" takes things into the weeds. But on the topic of naming protocols, I've been wondering, is there some reason that "IBOE" is being used instead of "RoCE"? The IBOE protocol used to exist and is not the same as the currently standardized RoCE, right? Also wondering, why add "UDP" to USNIC, is there a different USNIC? Naming multiple layers together seems confusing and maybe in the end will create more code to deal with the differences. For example, what token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-04-27 at 17:16 -0700, Tom Talpey wrote: > On 4/27/2015 2:52 PM, ira.weiny wrote: > > On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote: > >> > >> > >> On 04/24/2015 05:12 PM, Liran Liss wrote: > >>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > >>>> > >>> [snip] > >>>> a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index > >>>> 65994a1..d54f91e 100644 > >>>> --- a/include/rdma/ib_verbs.h > >>>> +++ b/include/rdma/ib_verbs.h > >>>> @@ -75,10 +75,13 @@ enum rdma_node_type { }; > >>>> > >>>> enum rdma_transport_type { > >>>> + /* legacy for users */ > >>>> RDMA_TRANSPORT_IB, > >>>> RDMA_TRANSPORT_IWARP, > >>>> RDMA_TRANSPORT_USNIC, > >>>> - RDMA_TRANSPORT_USNIC_UDP > >>>> + RDMA_TRANSPORT_USNIC_UDP, > >>>> + /* new transport */ > >>>> + RDMA_TRANSPORT_IBOE, > >>> > >>> Remove RDMA_TRANSPORT_IBOE - it is not a transport. > >>> ROCE uses IBTA transport. > >>> > >>> If any code should test for ROCE should invoke a specific helper, e.g., rdma_protocol_iboe(). > >>> This is what you currently call "rdma_tech_iboe" is patch 02/26. > >>> > >>> I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right? > >>> So, let's change this. > >> > >> Sure, sounds reasonable now, about the IBOE, we still need it to > >> separate the port support IB/ETH without the check on link-layer, > >> So what about a new enum on protocol type? > >> > >> Like: > >> > >> enum rdma_protocol { > >> RDMA_PROTOCOL_IB, > >> RDMA_PROTOCOL_IBOE, > >> RDMA_PROTOCOL_IWARP, > >> RDMA_PROTOCOL_USNIC_UDP > >> }; > >> > >> So we could use query_protocol() to ask device provide the protocol > >> type, and there will be no mixing with the legacy transport type > >> anymore :-) > > > > I'm ok with that. I like introducing a unique namespace which is clearly > > different from the previous "transport" one. > > I agree the word "transport" takes things into the weeds. > > But on the topic of naming protocols, I've been wondering, is there > some reason that "IBOE" is being used instead of "RoCE"? Because back in the day, when RoCE was accepted into the kernel, I'm pretty sure it was prior to the IBTA's final stamp of approval and before the name was set on RoCE, so IBoE was chosen upstream as the more "correct" name because it properly denoted what it was deemed to truly be: IB Verbs over Ethernet. > The IBOE > protocol used to exist and is not the same as the currently > standardized RoCE, right? I don't believe so. To my knowledge, there was never an IBoE except in linux upstream parlance. > Also wondering, why add "UDP" to USNIC, is there a different USNIC? Yes, there are two transports, one a distinct ethertype and one that encapsulates USNIC in UDP. > Naming multiple layers together seems confusing and maybe in the end > will create more code to deal with the differences. For example, what > token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ? Uncertain as of now.
On 4/27/2015 5:36 PM, Doug Ledford wrote: > On Mon, 2015-04-27 at 17:16 -0700, Tom Talpey wrote: >> On 4/27/2015 2:52 PM, ira.weiny wrote: >>> On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote: >>>> On 04/24/2015 05:12 PM, Liran Liss wrote: >>>>> [snip] >>>> >>>> Like: >>>> >>>> enum rdma_protocol { >>>> RDMA_PROTOCOL_IB, >>>> RDMA_PROTOCOL_IBOE, >>>> RDMA_PROTOCOL_IWARP, >>>> RDMA_PROTOCOL_USNIC_UDP >>>> }; >>>> >>>> So we could use query_protocol() to ask device provide the protocol >>>> type, and there will be no mixing with the legacy transport type >>>> anymore :-) >>> >>> I'm ok with that. I like introducing a unique namespace which is clearly >>> different from the previous "transport" one. >> >> I agree the word "transport" takes things into the weeds. >> >> But on the topic of naming protocols, I've been wondering, is there >> some reason that "IBOE" is being used instead of "RoCE"? > > Because back in the day, when RoCE was accepted into the kernel, I'm > pretty sure it was prior to the IBTA's final stamp of approval and > before the name was set on RoCE, so IBoE was chosen upstream as the more > "correct" name because it properly denoted what it was deemed to truly > be: IB Verbs over Ethernet. Well history is all well and good, but it seems weird to not use the current, standard name in new code. It confuses me, anyway, because it seems like IBOE could easily mean something else. >> Also wondering, why add "UDP" to USNIC, is there a different USNIC? > > Yes, there are two transports, one a distinct ethertype and one that > encapsulates USNIC in UDP. But this new enum isn't about transport, it's about protocol. So is there one USNIC protocol, with a raw layering and a separate one with UDP? Or is it one USNIC protocol with two different framings? Seems there should be at least the USNIC protocol, without the _UDP decoration, and I don't see it in the enum. > >> Naming multiple layers together seems confusing and maybe in the end >> will create more code to deal with the differences. For example, what >> token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ? > > Uncertain as of now. Ok, but it's imminent, right? What's the preference/guidance? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote: > On 4/27/2015 5:36 PM, Doug Ledford wrote: > > On Mon, 2015-04-27 at 17:16 -0700, Tom Talpey wrote: > >> On 4/27/2015 2:52 PM, ira.weiny wrote: > >>> On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote: > >>>> On 04/24/2015 05:12 PM, Liran Liss wrote: > >>>>> [snip] > >>>> > >>>> Like: > >>>> > >>>> enum rdma_protocol { > >>>> RDMA_PROTOCOL_IB, > >>>> RDMA_PROTOCOL_IBOE, > >>>> RDMA_PROTOCOL_IWARP, > >>>> RDMA_PROTOCOL_USNIC_UDP > >>>> }; > >>>> > >>>> So we could use query_protocol() to ask device provide the protocol > >>>> type, and there will be no mixing with the legacy transport type > >>>> anymore :-) > >>> > >>> I'm ok with that. I like introducing a unique namespace which is clearly > >>> different from the previous "transport" one. > >> > >> I agree the word "transport" takes things into the weeds. > >> > >> But on the topic of naming protocols, I've been wondering, is there > >> some reason that "IBOE" is being used instead of "RoCE"? > > > > Because back in the day, when RoCE was accepted into the kernel, I'm > > pretty sure it was prior to the IBTA's final stamp of approval and > > before the name was set on RoCE, so IBoE was chosen upstream as the more > > "correct" name because it properly denoted what it was deemed to truly > > be: IB Verbs over Ethernet. > > Well history is all well and good, but it seems weird to not use the > current, standard name in new code. It confuses me, anyway, because > it seems like IBOE could easily mean something else. Having some of it refer to things as IBOE and some as ROCE would be similarly confusing, and switching existing IBOE usage to ROCE would cause pain to people with out of tree drivers (Lustre is the main one I know of). There's not a good answer here. There's only less sucky ones. > >> Also wondering, why add "UDP" to USNIC, is there a different USNIC? > > > > Yes, there are two transports, one a distinct ethertype and one that > > encapsulates USNIC in UDP. > > But this new enum isn't about transport, it's about protocol. So is > there one USNIC protocol, with a raw layering and a separate one with > UDP? Or is it one USNIC protocol with two different framings? Seems > there should be at least the USNIC protocol, without the _UDP > decoration, and I don't see it in the enum. Keep in mind that this enum was Liran's response to Michael's original patch. In the enum in Michael's patch, there was both USNIC and USNIC_UDP. > > > >> Naming multiple layers together seems confusing and maybe in the end > >> will create more code to deal with the differences. For example, what > >> token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ? > > > > Uncertain as of now. > > Ok, but it's imminent, right? What's the preference/guidance? There is a patchset from Devesh Sharma at Emulex. It added the RoCEv2 capability. As I recall, it used a new flag added to the existing port capabilities bitmask and notably did not modify either the node type or link layer that are currently used to differentiate between the different protocols. That's from memory though, so I could be mistaken. But that patchset was not written with this patchset in mind, and merging the two may well change that. In any case, there is a proposed spec to follow, so for now that's the preference/guidance (unless this rework means that we need to depart from the spec on internals for implementation reasons).
On 4/27/2015 6:24 PM, Doug Ledford wrote: > On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote: >> On 4/27/2015 5:36 PM, Doug Ledford wrote: >>> On Mon, 2015-04-27 at 17:16 -0700, Tom Talpey wrote: >>>> On 4/27/2015 2:52 PM, ira.weiny wrote: >>>>> On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote: >>>>>> On 04/24/2015 05:12 PM, Liran Liss wrote: >>>>>>> [snip] >>>>>> >>>>>> Like: >>>>>> >>>>>> enum rdma_protocol { >>>>>> RDMA_PROTOCOL_IB, >>>>>> RDMA_PROTOCOL_IBOE, >>>>>> RDMA_PROTOCOL_IWARP, >>>>>> RDMA_PROTOCOL_USNIC_UDP >>>>>> }; >>>>>> >>>>>> So we could use query_protocol() to ask device provide the protocol >>>>>> type, and there will be no mixing with the legacy transport type >>>>>> anymore :-) >>>>> >>>>> I'm ok with that. I like introducing a unique namespace which is clearly >>>>> different from the previous "transport" one. >>>> >>>> I agree the word "transport" takes things into the weeds. >>>> >>>> But on the topic of naming protocols, I've been wondering, is there >>>> some reason that "IBOE" is being used instead of "RoCE"? >>> >>> Because back in the day, when RoCE was accepted into the kernel, I'm >>> pretty sure it was prior to the IBTA's final stamp of approval and >>> before the name was set on RoCE, so IBoE was chosen upstream as the more >>> "correct" name because it properly denoted what it was deemed to truly >>> be: IB Verbs over Ethernet. >> >> Well history is all well and good, but it seems weird to not use the >> current, standard name in new code. It confuses me, anyway, because >> it seems like IBOE could easily mean something else. > > Having some of it refer to things as IBOE and some as ROCE would be > similarly confusing, and switching existing IBOE usage to ROCE would > cause pain to people with out of tree drivers (Lustre is the main one I > know of). There's not a good answer here. There's only less sucky > ones. Hrm. Well, avoiding churn is good but legacies can wear ya down. MHO it is worth doing since these are new enums/new patches. > >>>> Also wondering, why add "UDP" to USNIC, is there a different USNIC? >>> >>> Yes, there are two transports, one a distinct ethertype and one that >>> encapsulates USNIC in UDP. >> >> But this new enum isn't about transport, it's about protocol. So is >> there one USNIC protocol, with a raw layering and a separate one with >> UDP? Or is it one USNIC protocol with two different framings? Seems >> there should be at least the USNIC protocol, without the _UDP >> decoration, and I don't see it in the enum. > > Keep in mind that this enum was Liran's response to Michael's original > patch. In the enum in Michael's patch, there was both USNIC and > USNIC_UDP. Right! That's why I'm confused. Seems wrong to drop it, right? > >>> >>>> Naming multiple layers together seems confusing and maybe in the end >>>> will create more code to deal with the differences. For example, what >>>> token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ? >>> >>> Uncertain as of now. >> >> Ok, but it's imminent, right? What's the preference/guidance? > > There is a patchset from Devesh Sharma at Emulex. It added the RoCEv2 > capability. As I recall, it used a new flag added to the existing port > capabilities bitmask and notably did not modify either the node type or > link layer that are currently used to differentiate between the > different protocols. That's from memory though, so I could be mistaken. > > But that patchset was not written with this patchset in mind, and > merging the two may well change that. 
In any case, there is a proposed > spec to follow, so for now that's the preference/guidance (unless this > rework means that we need to depart from the spec on internals for > implementation reasons). Well, if RoCEv2 uses the same protocol enum, that may introduce new confusion, for example there will be some new CM handling for UDP encap, source port selection, and of course vlan/tag assignment, etc. But if there is support under way, and everyone is clear, then, ok. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Keep in mind that this enum was Liran's response to Michael's original
> > patch.  In the enum in Michael's patch, there was both USNIC and
> > USNIC_UDP.
>
> Right! That's why I'm confused. Seems wrong to drop it, right?

I think the original USNIC protocol is layered directly over Ethernet.  The protocol basically stole an Ethertype (the one used for IBoE/RoCE) and implemented a proprietary protocol instead.  I have no idea how you resolve that, but I also don't think it's used anymore.  USNIC_UDP is just UDP.

> Well, if RoCEv2 uses the same protocol enum, that may introduce new
> confusion, for example there will be some new CM handling for UDP encap,
> source port selection, and of course vlan/tag assignment, etc. But if
> there is support under way, and everyone is clear, then, ok.

RoCEv2/IBoUDP shares the same port space as UDP.  It has a similar issues as iWarp does sharing state with the main network stack.  I'm not aware of any proposal for resolving that.  Does it require using a separate IP address?  Does it use a port mapper function?  Does netdev care for UDP?  I'm not sure what USNIC does for this either, but a common solution between USNIC and IBoUDP seems reasonable.
On 04/28/2015 03:24 AM, Doug Ledford wrote:
[snip]
>>>> Also wondering, why add "UDP" to USNIC, is there a different USNIC?
>>>
>>> Yes, there are two transports, one a distinct ethertype and one that
>>> encapsulates USNIC in UDP.
>>
>> But this new enum isn't about transport, it's about protocol. So is
>> there one USNIC protocol, with a raw layering and a separate one with
>> UDP? Or is it one USNIC protocol with two different framings? Seems
>> there should be at least the USNIC protocol, without the _UDP
>> decoration, and I don't see it in the enum.
>
> Keep in mind that this enum was Liran's response to Michael's original
> patch.  In the enum in Michael's patch, there was both USNIC and
> USNIC_UDP.

Yeah, I've not added a PROTOCOL_USNIC enum since currently there is no
place that needs it...

The only three cases currently are:
1. transport IB, link layer IB		//PROTOCOL_IB
2. transport IB, link layer ETH		//PROTOCOL_IBOE
3. transport IWARP			//PROTOCOL_IWARP

Regards,
Michael Wang

>
>>>
>>>> Naming multiple layers together seems confusing and maybe in the end
>>>> will create more code to deal with the differences. For example, what
>>>> token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ?
>>>
>>> Uncertain as of now.
>>
>> Ok, but it's imminent, right? What's the preference/guidance?
>
> There is a patchset from Devesh Sharma at Emulex.  It added the RoCEv2
> capability.  As I recall, it used a new flag added to the existing port
> capabilities bitmask and notably did not modify either the node type or
> link layer that are currently used to differentiate between the
> different protocols.  That's from memory though, so I could be mistaken.
>
> But that patchset was not written with this patchset in mind, and
> merging the two may well change that.  In any case, there is a proposed
> spec to follow, so for now that's the preference/guidance (unless this
> rework means that we need to depart from the spec on internals for
> implementation reasons).
>
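Expressed as code, the three cases above map onto the proposed enum roughly like this. The helper name and the fall-through rules are purely illustrative, built on the query_transport() and link-layer callbacks this patch introduces, and are not part of the posted series:

static enum rdma_protocol rdma_port_protocol(struct ib_device *dev,
					     u8 port_num)
{
	if (dev->query_transport(dev, port_num) == RDMA_TRANSPORT_IWARP)
		return RDMA_PROTOCOL_IWARP;		/* case 3 */

	if (rdma_port_get_link_layer(dev, port_num) ==
	    IB_LINK_LAYER_ETHERNET)
		return RDMA_PROTOCOL_IBOE;		/* case 2 */

	return RDMA_PROTOCOL_IB;			/* case 1 */
}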
On Mon, Apr 27, 2015 at 09:24:35PM -0400, Doug Ledford wrote: > On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote: > Having some of it refer to things as IBOE and some as ROCE would be > similarly confusing, and switching existing IBOE usage to ROCE would > cause pain to people with out of tree drivers (Lustre is the main one I > know of). There's not a good answer here. There's only less sucky > ones. The tide has already turned, we should ditch iboe: $git grep -i roce_ drivers/infiniband/ | wc -l 91 $git grep -i iboe_ drivers/infiniband/ | wc -l 37 It isn't really mainline's role to be too concerned about out of tree things like Lustre. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 28, 2015 at 9:56 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Mon, Apr 27, 2015 at 09:24:35PM -0400, Doug Ledford wrote: >> On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote: > >> Having some of it refer to things as IBOE and some as ROCE would be >> similarly confusing, and switching existing IBOE usage to ROCE would >> cause pain to people with out of tree drivers (Lustre is the main one I >> know of). There's not a good answer here. There's only less sucky >> ones. > > The tide has already turned, we should ditch iboe: > > $git grep -i roce_ drivers/infiniband/ | wc -l > 91 > $git grep -i iboe_ drivers/infiniband/ | wc -l > 37 > > It isn't really mainline's role to be too concerned about out of tree > things like Lustre. FWIW, note that Lustre is under staging for a while, not sure how close they are for actual acceptance. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Apr 28, 2015, at 1:14 AM, Hefty, Sean <sean.hefty@intel.com> wrote: >>> Keep in mind that this enum was Liran's response to Michael's original >>> patch. In the enum in Michael's patch, there was both USNIC and >>> USNIC_UDP. >> >> Right! That's why I'm confused. Seems wrong to drop it, right? > > I think the original USNIC protocol is layered directly over Ethernet. The protocol basically stole an Ethertype (the one used for IBoE/RoCE) and implemented a proprietary protocol instead. I have no idea how you resolve that, but I also don't think it's used anymore. USNIC_UDP is just UDP. Sean is correct. The legacy RDMA_TRANSPORT_USNIC code used a proprietary protocol over plain Ethernet frames. The newer RDMA_TRANSPORT_USNIC_UDP code is just standard UDP/IP/Ethernet packets exposed to user space via the uverbs stack. The current kernel module will support both formats, it just depends on which user space requests at create_qp time. From the kernel point of view there is no common protocol between the two TRANSPORTs (other than sharing partially similar Ethernet frames at L2). I posted last week to clarify some of this: http://marc.info/?l=linux-rdma&m=142972177830718&w=2 >> Well, if RoCEv2 uses the same protocol enum, that may introduce new >> confusion, for example there will be some new CM handling for UDP encap, >> source port selection, and of course vlan/tag assignment, etc. But if >> there is support under way, and everyone is clear, then, ok. > > RoCEv2/IBoUDP shares the same port space as UDP. It has a similar issues as iWarp does sharing state with the main network stack. I'm not aware of any proposal for resolving that. Does it require using a separate IP address? Does it use a port mapper function? Does netdev care for UDP? I'm not sure what USNIC does for this either, but a common solution between USNIC and IBoUDP seems reasonable. Is the concern here about CM issues or the UDP ports used by the actual usNIC RQs? CM is not used/supported for usNIC at this time. -Dave -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Is the concern here about CM issues or the UDP ports used by the actual > usNIC RQs? UDP port space sharing -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2015-04-28 at 22:11 +0300, Or Gerlitz wrote: > On Tue, Apr 28, 2015 at 9:56 PM, Jason Gunthorpe > <jgunthorpe@obsidianresearch.com> wrote: > > On Mon, Apr 27, 2015 at 09:24:35PM -0400, Doug Ledford wrote: > >> On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote: > > > >> Having some of it refer to things as IBOE and some as ROCE would be > >> similarly confusing, and switching existing IBOE usage to ROCE would > >> cause pain to people with out of tree drivers (Lustre is the main one I > >> know of). There's not a good answer here. There's only less sucky > >> ones. > > > > The tide has already turned, we should ditch iboe: > > > > $git grep -i roce_ drivers/infiniband/ | wc -l > > 91 > > $git grep -i iboe_ drivers/infiniband/ | wc -l > > 37 > > > > It isn't really mainline's role to be too concerned about out of tree > > things like Lustre. > > FWIW, note that Lustre is under staging for a while, not sure how > close they are for actual acceptance. I thought that was just the client and didn't include the server...
On Tue, 2015-04-28 at 12:56 -0600, Jason Gunthorpe wrote: > On Mon, Apr 27, 2015 at 09:24:35PM -0400, Doug Ledford wrote: > > On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote: > > > Having some of it refer to things as IBOE and some as ROCE would be > > similarly confusing, and switching existing IBOE usage to ROCE would > > cause pain to people with out of tree drivers (Lustre is the main one I > > know of). There's not a good answer here. There's only less sucky > > ones. > > The tide has already turned, we should ditch iboe: > > $git grep -i roce_ drivers/infiniband/ | wc -l > 91 > $git grep -i iboe_ drivers/infiniband/ | wc -l > 37 > > It isn't really mainline's role to be too concerned about out of tree > things like Lustre. While I generally agree, one need not be totally callous about out of tree things either.
On Apr 28, 2015, at 2:53 PM, Hefty, Sean <sean.hefty@intel.com> wrote: >> Is the concern here about CM issues or the UDP ports used by the actual >> usNIC RQs? > > UDP port space sharing For the UDP port used by the usNIC QP, the usnic_verbs kernel driver requires user space to pass a file descriptor of a regular UDP socket down at create_qp time. The reference count on this socket is incremented to make sure that the socket can't disappear out from under us. Then an RX filter is installed in the NIC which matches UDP/IP/Ethernet packets that are destined for the UDP port to which the given socket is already bound. So there is a real UDP socket to make most of the usual things happen in the net stack, but the raw UDP/IP/Ethernet packets get delivered directly to the user space queues by the NIC. E.g., "netstat" and "lsof" show you proper addressing information, though obviously any information related to data-path statistics will not be accurate. At teardown we just reverse the steps. However, I'm not sure if that's the sort of information you were looking for. -Dave -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
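A minimal sketch of the socket-pinning step Dave describes, using only the generic kernel socket helpers; the function name here is invented and the real usnic_verbs code differs in its details (validation, RX-filter programming, teardown):

#include <linux/errno.h>
#include <linux/in.h>
#include <linux/net.h>
#include <net/sock.h>
#include <net/inet_sock.h>

/* Hypothetical helper: pin the UDP socket behind 'sock_fd' and report the
 * local UDP port it is bound to, so an RX filter can be set up to match it. */
static int example_pin_udp_socket(int sock_fd, u16 *udp_port)
{
	struct socket *sock;
	int err;

	sock = sockfd_lookup(sock_fd, &err);	/* takes a reference */
	if (!sock)
		return err;

	if (sock->sk->sk_protocol != IPPROTO_UDP) {
		sockfd_put(sock);		/* drop the reference */
		return -EINVAL;
	}

	*udp_port = ntohs(inet_sk(sock->sk)->inet_sport);

	/* ...program the NIC RX filter for *udp_port here and keep the
	 * socket reference until the QP is destroyed... */
	return 0;
}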
> For the UDP port used by the usNIC QP, the usnic_verbs kernel driver > requires user space to pass a file descriptor of a regular UDP socket down > at create_qp time. The reference count on this socket is incremented to > make sure that the socket can't disappear out from under us. Then an RX > filter is installed in the NIC which matches UDP/IP/Ethernet packets that > are destined for the UDP port to which the given socket is already bound. > So there is a real UDP socket to make most of the usual things happen in > the net stack, but the raw UDP/IP/Ethernet packets get delivered directly > to the user space queues by the NIC. E.g., "netstat" and "lsof" show you > proper addressing information, though obviously any information related to > data-path statistics will not be accurate. At teardown we just reverse > the steps. > > However, I'm not sure if that's the sort of information you were looking > for. This is more part of the RoCEv2 discussion than this thread. But, yes, this is what I was looking for. Conceptually, this is loosely similar to the port mapper functionality in iWarp, with a direct port mapping. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Hefty, Sean [mailto:sean.hefty@intel.com]
>
> > For the UDP port used by the usNIC QP, the usnic_verbs kernel driver
> > requires user space to pass a file descriptor of a regular UDP socket
> > down at create_qp time.  The reference count on this socket is
> > incremented to make sure that the socket can't disappear out from
> > under us.  Then an RX filter is installed in the NIC which matches
> > UDP/IP/Ethernet packets that are destined for the UDP port to which the
> > given socket is already bound.
> > So there is a real UDP socket to make most of the usual things happen
> > in the net stack, but the raw UDP/IP/Ethernet packets get delivered
> > directly to the user space queues by the NIC.  E.g., "netstat" and
> > "lsof" show you proper addressing information, though obviously any
> > information related to data-path statistics will not be accurate.  At
> > teardown we just reverse the steps.
> >
> > However, I'm not sure if that's the sort of information you were
> > looking for.
>
> This is more part of the RoCEv2 discussion than this thread.  But, yes, this is
> what I was looking for.  Conceptually, this is loosely similar to the port
> mapper functionality in iWarp, with a direct port mapping.  Thanks.
>

I would like to point out what seems to be a misconception regarding RoCEv2.

RoCEv2 is *not* like iWARP and USNIC in the sense of using the TCP/UDP
transport, and does *not* use any kind of port-mapping.

RoCEv2 endpoints are IBTA QP numbers; the protocol uses UDP only for
tunneling and *not* as a transport -- think VXLAN...
All RoCEv2 packets have the same dport (but possibly different dqpns).
It uses a well-known UDP port, assigned by IANA (port 4791).

So, in summary, I think that:

- The transport should refer to the transport semantics of the technology.
-- This could be determined by an organization or a vendor, for example:
--- IB, RoCEv1, and RoCEv2 use IBTA transport
--- iWARP would use the IETF iWARP transport
--- USNIC would use the USENIC transport (assuming that the semantics
    don't change for different wire protocols)

- The protocol should describe a unique, interoperable, wire-protocol stack,
  optionally crossing different layers.
-- For example:
--- RoCE ports would advertise 2 protocols:
---- RoCEv1
---- RoCEv2
--- IB ports would advertise the Infiniband protocol
--- iWARP RNIC ports would advertise the iWARP protocol stack
--- USNIC ports would advertise 2 protocols:
---- USENICv1 (Ethertype)
---- USENICv2 (UDP) protocols

- Choosing the protocol at the API level is another matter, and depends on
  the technology, e.g.,
-- RoCE selects the protocol (V1 or V2) according to the SGID index
   (according to the specification, each GID entry is associated with a
   "GID type", that determines the protocol)
-- USENIC does it using a modify QP attribute (if I understood correctly)

- As for IBOE-->ROCE name changes:
-- We should do it in a single patch that modifies all the current kernel symbols
-- I wouldn't worry about out-of-tree drivers, they will adjust as they always do:
--- From my experience, you usually have to do something in your backports
    when a new kernel is released anyway
--- Substituting symbol names is not a complicated task...
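One possible encoding of the transport/protocol split described above, shown only to make the distinction concrete; every name in this sketch is invented here, and none of it comes from the posted patches:

enum rdma_transport_type {
	RDMA_TRANSPORT_IBTA,	/* IB, RoCEv1 and RoCEv2 all use IBTA transport */
	RDMA_TRANSPORT_IWARP,	/* IETF iWARP transport */
	RDMA_TRANSPORT_USNIC,	/* usNIC transport semantics */
};

enum rdma_protocol_type {
	RDMA_PROTOCOL_IB,	/* native InfiniBand */
	RDMA_PROTOCOL_ROCE_V1,	/* IBTA transport over Ethernet (own ethertype) */
	RDMA_PROTOCOL_ROCE_V2,	/* IBTA transport tunneled in UDP, dport 4791 */
	RDMA_PROTOCOL_IWARP,	/* iWARP over TCP */
	RDMA_PROTOCOL_USNIC_V1,	/* usNIC, dedicated ethertype */
	RDMA_PROTOCOL_USNIC_V2,	/* usNIC over UDP */
};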
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..a9587c4 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
 	} mandatory_table[] = {
 		IB_MANDATORY_FUNC(query_device),
 		IB_MANDATORY_FUNC(query_port),
+		IB_MANDATORY_FUNC(query_transport),
 		IB_MANDATORY_FUNC(query_pkey),
 		IB_MANDATORY_FUNC(query_gid),
 		IB_MANDATORY_FUNC(alloc_pd),
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8d..626c9cf 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
 	if (device->get_link_layer)
 		return device->get_link_layer(device, port_num);
 
-	switch (rdma_node_get_transport(device->node_type)) {
+	switch (device->query_transport(device, port_num)) {
 	case RDMA_TRANSPORT_IB:
 		return IB_LINK_LAYER_INFINIBAND;
+	case RDMA_TRANSPORT_IBOE:
 	case RDMA_TRANSPORT_IWARP:
 	case RDMA_TRANSPORT_USNIC:
 	case RDMA_TRANSPORT_USNIC_UDP:
 		return IB_LINK_LAYER_ETHERNET;
 	default:
+		BUG();
 		return IB_LINK_LAYER_UNSPECIFIED;
 	}
 }
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index bdf3507..d46bbb0 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -99,6 +99,12 @@ static int c2_query_port(struct ib_device *ibdev,
 	return 0;
 }
 
+static enum rdma_transport_type
+c2_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IWARP;
+}
+
 static int c2_query_pkey(struct ib_device *ibdev,
 			 u8 port, u16 index, u16 * pkey)
 {
@@ -801,6 +807,7 @@ int c2_register_device(struct c2_dev *dev)
 	dev->ibdev.dma_device = &dev->pcidev->dev;
 	dev->ibdev.query_device = c2_query_device;
 	dev->ibdev.query_port = c2_query_port;
+	dev->ibdev.query_transport = c2_query_transport;
 	dev->ibdev.query_pkey = c2_query_pkey;
 	dev->ibdev.query_gid = c2_query_gid;
 	dev->ibdev.alloc_ucontext = c2_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 811b24a..09682e9e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1232,6 +1232,12 @@ static int iwch_query_port(struct ib_device *ibdev,
 	return 0;
 }
 
+static enum rdma_transport_type
+iwch_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IWARP;
+}
+
 static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
 			char *buf)
 {
@@ -1385,6 +1391,7 @@ int iwch_register_device(struct iwch_dev *dev)
 	dev->ibdev.dma_device = &(dev->rdev.rnic_info.pdev->dev);
 	dev->ibdev.query_device = iwch_query_device;
 	dev->ibdev.query_port = iwch_query_port;
+	dev->ibdev.query_transport = iwch_query_transport;
 	dev->ibdev.query_pkey = iwch_query_pkey;
 	dev->ibdev.query_gid = iwch_query_gid;
 	dev->ibdev.alloc_ucontext = iwch_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 66bd6a2..a445e0d 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -390,6 +390,12 @@ static int c4iw_query_port(struct ib_device *ibdev, u8 port,
 	return 0;
 }
 
+static enum rdma_transport_type
+c4iw_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IWARP;
+}
+
 static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
 			char *buf)
 {
@@ -506,6 +512,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
 	dev->ibdev.dma_device = &(dev->rdev.lldi.pdev->dev);
 	dev->ibdev.query_device = c4iw_query_device;
 	dev->ibdev.query_port = c4iw_query_port;
+	dev->ibdev.query_transport = c4iw_query_transport;
 	dev->ibdev.query_pkey = c4iw_query_pkey;
 	dev->ibdev.query_gid = c4iw_query_gid;
 	dev->ibdev.alloc_ucontext = c4iw_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 9ed4d25..d5a34a6 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -242,6 +242,12 @@ query_port1:
 	return ret;
 }
 
+enum rdma_transport_type
+ehca_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IB;
+}
+
 int ehca_query_sma_attr(struct ehca_shca *shca,
 			u8 port, struct ehca_sma_attr *attr)
 {
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 22f79af..cec945f 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -49,6 +49,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props);
 int ehca_query_port(struct ib_device *ibdev, u8 port,
 		    struct ib_port_attr *props);
 
+enum rdma_transport_type
+ehca_query_transport(struct ib_device *device, u8 port_num);
+
 int ehca_query_sma_attr(struct ehca_shca *shca, u8 port,
 			struct ehca_sma_attr *attr);
 
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index cd8d290..60e0a09 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -467,6 +467,7 @@ static int ehca_init_device(struct ehca_shca *shca)
 	shca->ib_device.dma_device = &shca->ofdev->dev;
 	shca->ib_device.query_device = ehca_query_device;
 	shca->ib_device.query_port = ehca_query_port;
+	shca->ib_device.query_transport = ehca_query_transport;
 	shca->ib_device.query_gid = ehca_query_gid;
 	shca->ib_device.query_pkey = ehca_query_pkey;
 	/* shca->in_device.modify_device = ehca_modify_device */
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 44ea939..58d36e3 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1638,6 +1638,12 @@ static int ipath_query_port(struct ib_device *ibdev,
 	return 0;
 }
 
+static enum rdma_transport_type
+ipath_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IB;
+}
+
 static int ipath_modify_device(struct ib_device *device,
 			       int device_modify_mask,
 			       struct ib_device_modify *device_modify)
@@ -2140,6 +2146,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd)
 	dev->query_device = ipath_query_device;
 	dev->modify_device = ipath_modify_device;
 	dev->query_port = ipath_query_port;
+	dev->query_transport = ipath_query_transport;
 	dev->modify_port = ipath_modify_port;
 	dev->query_pkey = ipath_query_pkey;
 	dev->query_gid = ipath_query_gid;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 57070c5..1e13cf9 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -420,6 +420,15 @@ static int mlx4_ib_query_port(struct ib_device *ibdev, u8 port,
 	return __mlx4_ib_query_port(ibdev, port, props, 0);
 }
 
+static enum rdma_transport_type
+mlx4_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+	struct mlx4_dev *dev = to_mdev(device)->dev;
+
+	return dev->caps.port_mask[port_num] == MLX4_PORT_TYPE_IB ?
+		RDMA_TRANSPORT_IB : RDMA_TRANSPORT_IBOE;
+}
+
 int __mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
 			union ib_gid *gid, int netw_view)
 {
@@ -2202,6 +2211,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 
 	ibdev->ib_dev.query_device = mlx4_ib_query_device;
 	ibdev->ib_dev.query_port = mlx4_ib_query_port;
+	ibdev->ib_dev.query_transport = mlx4_ib_query_transport;
 	ibdev->ib_dev.get_link_layer = mlx4_ib_port_link_layer;
 	ibdev->ib_dev.query_gid = mlx4_ib_query_gid;
 	ibdev->ib_dev.query_pkey = mlx4_ib_query_pkey;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 57c9809..b6f2f58 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -262,6 +262,12 @@ out:
 	return err;
 }
 
+static enum rdma_transport_type
+mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IB;
+}
+
 static int mlx5_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
 			     union ib_gid *gid)
 {
@@ -1244,6 +1250,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 
 	dev->ib_dev.query_device = mlx5_ib_query_device;
 	dev->ib_dev.query_port = mlx5_ib_query_port;
+	dev->ib_dev.query_transport = mlx5_ib_query_transport;
 	dev->ib_dev.query_gid = mlx5_ib_query_gid;
 	dev->ib_dev.query_pkey = mlx5_ib_query_pkey;
 	dev->ib_dev.modify_device = mlx5_ib_modify_device;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 415f8e1..67ac6a4 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -179,6 +179,12 @@ static int mthca_query_port(struct ib_device *ibdev,
 	return err;
 }
 
+static enum rdma_transport_type
+mthca_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IB;
+}
+
 static int mthca_modify_device(struct ib_device *ibdev,
 			       int mask,
 			       struct ib_device_modify *props)
@@ -1281,6 +1287,7 @@ int mthca_register_device(struct mthca_dev *dev)
 	dev->ib_dev.dma_device = &dev->pdev->dev;
 	dev->ib_dev.query_device = mthca_query_device;
 	dev->ib_dev.query_port = mthca_query_port;
+	dev->ib_dev.query_transport = mthca_query_transport;
 	dev->ib_dev.modify_device = mthca_modify_device;
 	dev->ib_dev.modify_port = mthca_modify_port;
 	dev->ib_dev.query_pkey = mthca_query_pkey;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index c0d0296..8df5b61 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -606,6 +606,11 @@ static int nes_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr
 	return 0;
 }
 
+static enum rdma_transport_type
+nes_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IWARP;
+}
 
 /**
  * nes_query_pkey
@@ -3879,6 +3884,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
 	nesibdev->ibdev.dev.parent = &nesdev->pcidev->dev;
 	nesibdev->ibdev.query_device = nes_query_device;
 	nesibdev->ibdev.query_port = nes_query_port;
+	nesibdev->ibdev.query_transport = nes_query_transport;
 	nesibdev->ibdev.query_pkey = nes_query_pkey;
 	nesibdev->ibdev.query_gid = nes_query_gid;
 	nesibdev->ibdev.alloc_ucontext = nes_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 7a2b59a..9f4d182 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -244,6 +244,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
 	/* mandatory verbs. */
 	dev->ibdev.query_device = ocrdma_query_device;
 	dev->ibdev.query_port = ocrdma_query_port;
+	dev->ibdev.query_transport = ocrdma_query_transport;
 	dev->ibdev.modify_port = ocrdma_modify_port;
 	dev->ibdev.query_gid = ocrdma_query_gid;
 	dev->ibdev.get_link_layer = ocrdma_link_layer;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 8771755..73bace4 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -187,6 +187,12 @@ int ocrdma_query_port(struct ib_device *ibdev,
 	return 0;
 }
 
+enum rdma_transport_type
+ocrdma_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IBOE;
+}
+
 int ocrdma_modify_port(struct ib_device *ibdev, u8 port,
 		       int mask, struct ib_port_modify *props)
 {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index b8f7853..4a81b63 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -41,6 +41,9 @@ int ocrdma_query_port(struct ib_device *, u8 port, struct ib_port_attr *props);
 int ocrdma_modify_port(struct ib_device *, u8 port, int mask,
 		       struct ib_port_modify *props);
 
+enum rdma_transport_type
+ocrdma_query_transport(struct ib_device *device, u8 port_num);
+
 void ocrdma_get_guid(struct ocrdma_dev *, u8 *guid);
 int ocrdma_query_gid(struct ib_device *, u8 port,
 		     int index, union ib_gid *gid);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 4a35998..caad665 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1650,6 +1650,12 @@ static int qib_query_port(struct ib_device *ibdev, u8 port,
 	return 0;
 }
 
+static enum rdma_transport_type
+qib_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_IB;
+}
+
 static int qib_modify_device(struct ib_device *device,
 			     int device_modify_mask,
 			     struct ib_device_modify *device_modify)
@@ -2184,6 +2190,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
 	ibdev->query_device = qib_query_device;
 	ibdev->modify_device = qib_modify_device;
 	ibdev->query_port = qib_query_port;
+	ibdev->query_transport = qib_query_transport;
 	ibdev->modify_port = qib_modify_port;
 	ibdev->query_pkey = qib_query_pkey;
 	ibdev->query_gid = qib_query_gid;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index 0d0f986..03ea9f3 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -360,6 +360,7 @@ static void *usnic_ib_device_add(struct pci_dev *dev)
 
 	us_ibdev->ib_dev.query_device = usnic_ib_query_device;
 	us_ibdev->ib_dev.query_port = usnic_ib_query_port;
+	us_ibdev->ib_dev.query_transport = usnic_ib_query_transport;
 	us_ibdev->ib_dev.query_pkey = usnic_ib_query_pkey;
 	us_ibdev->ib_dev.query_gid = usnic_ib_query_gid;
 	us_ibdev->ib_dev.get_link_layer = usnic_ib_port_link_layer;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 53bd6a2..ff9a5f7 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -348,6 +348,12 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
 	return 0;
 }
 
+enum rdma_transport_type
+usnic_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+	return RDMA_TRANSPORT_USNIC_UDP;
+}
+
 int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
 				int qp_attr_mask,
 				struct ib_qp_init_attr *qp_init_attr)
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index bb864f5..0b1633b 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -27,6 +27,8 @@ int usnic_ib_query_device(struct ib_device *ibdev,
 				struct ib_device_attr *props);
 int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
 				struct ib_port_attr *props);
+enum rdma_transport_type
+usnic_ib_query_transport(struct ib_device *device, u8 port_num);
 int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
 				int qp_attr_mask,
 				struct ib_qp_init_attr *qp_init_attr);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65994a1..d54f91e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -75,10 +75,13 @@ enum rdma_node_type {
 };
 
 enum rdma_transport_type {
+	/* legacy for users */
 	RDMA_TRANSPORT_IB,
 	RDMA_TRANSPORT_IWARP,
 	RDMA_TRANSPORT_USNIC,
-	RDMA_TRANSPORT_USNIC_UDP
+	RDMA_TRANSPORT_USNIC_UDP,
+	/* new transport */
+	RDMA_TRANSPORT_IBOE,
 };
 
 __attribute_const__ enum rdma_transport_type
@@ -1501,6 +1504,8 @@ struct ib_device {
 	int	(*query_port)(struct ib_device *device,
 				u8 port_num,
 				struct ib_port_attr *port_attr);
+	enum rdma_transport_type (*query_transport)(struct ib_device *device,
+						     u8 port_num);
 	enum rdma_link_layer (*get_link_layer)(struct ib_device *device,
 						u8 port_num);
 	int	(*query_gid)(struct ib_device *device,
Add new callback query_transport() and implement for each HW.

Mapping List:
		node-type	link-layer	old-transport	new-transport
nes		RNIC		ETH		IWARP		IWARP
amso1100	RNIC		ETH		IWARP		IWARP
cxgb3		RNIC		ETH		IWARP		IWARP
cxgb4		RNIC		ETH		IWARP		IWARP
usnic		USNIC_UDP	ETH		USNIC_UDP	USNIC_UDP
ocrdma		IB_CA		ETH		IB		IBOE
mlx4		IB_CA		IB/ETH		IB		IB/IBOE
mlx5		IB_CA		IB		IB		IB
ehca		IB_CA		IB		IB		IB
ipath		IB_CA		IB		IB		IB
mthca		IB_CA		IB		IB		IB
qib		IB_CA		IB		IB		IB

Cc: Hal Rosenstock <hal@dev.mellanox.co.il>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
---
 drivers/infiniband/core/device.c             |  1 +
 drivers/infiniband/core/verbs.c              |  4 +++-
 drivers/infiniband/hw/amso1100/c2_provider.c |  7 +++++++
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  7 +++++++
 drivers/infiniband/hw/cxgb4/provider.c       |  7 +++++++
 drivers/infiniband/hw/ehca/ehca_hca.c        |  6 ++++++
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |  3 +++
 drivers/infiniband/hw/ehca/ehca_main.c       |  1 +
 drivers/infiniband/hw/ipath/ipath_verbs.c    |  7 +++++++
 drivers/infiniband/hw/mlx4/main.c            | 10 ++++++++++
 drivers/infiniband/hw/mlx5/main.c            |  7 +++++++
 drivers/infiniband/hw/mthca/mthca_provider.c |  7 +++++++
 drivers/infiniband/hw/nes/nes_verbs.c        |  6 ++++++
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |  1 +
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |  6 ++++++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |  3 +++
 drivers/infiniband/hw/qib/qib_verbs.c        |  7 +++++++
 drivers/infiniband/hw/usnic/usnic_ib_main.c  |  1 +
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |  6 ++++++
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h |  2 ++
 include/rdma/ib_verbs.h                      |  7 ++++++-
 21 files changed, 104 insertions(+), 2 deletions(-)
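Taken together, the per-driver change this series asks of every provider follows one small pattern: a trivial callback returning the port's transport plus one assignment at device registration time. A generic sketch for a made-up driver "foo" (the driver name, struct foo_dev and the surrounding registration function are invented for illustration; only the query_transport plumbing comes from the patch):

static enum rdma_transport_type
foo_query_transport(struct ib_device *device, u8 port_num)
{
	/* pick the value matching the port: _IB, _IBOE, _IWARP or _USNIC_UDP */
	return RDMA_TRANSPORT_IWARP;
}

static int foo_register_device(struct foo_dev *dev)
{
	/* ...other mandatory verbs already set up here... */
	dev->ibdev.query_transport = foo_query_transport;
	/* ... */
	return ib_register_device(&dev->ibdev, NULL);
}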