Message ID | 1461765892-1285-2-git-send-email-matanb@mellanox.com (mailing list archive)
State      | Changes Requested
On Wed, 27 Apr 2016, Matan Barak wrote:

> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -271,6 +271,11 @@ struct ib_cq_init_attr {
>  	u32 flags;
>  };
>
> +struct ib_lso_caps {
> +	u32 max_lso;
> +	u32 supported_qpts; /* Use enum ib_qp_type */

So this is a bitmap of the QP types supported? That imposes a limit of 32
QP types. This needs to be documented, and you need a BUILD_BUG_ON
somewhere in case IB_QPT_MAX becomes larger than 32. The field also needs
a better comment so readers know it is a bitmap.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
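The reviewer's two asks - a documented bitmap comment and a compile-time
guard - might look roughly like the following userspace C sketch. The
IB_QPT_MAX value and the qpt_supports_lso() helper are illustrative
assumptions; in the kernel the guard would be BUILD_BUG_ON() rather than
_Static_assert, and the real enum lives in include/rdma/ib_verbs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel's IB_QPT_MAX; the real count
 * comes from enum ib_qp_type. */
#define IB_QPT_MAX 12

struct ib_lso_caps {
	uint32_t max_lso;
	/*
	 * Bitmap of QP types that support LSO, indexed by enum
	 * ib_qp_type: bit n is set iff QP type n supports LSO.
	 * The u32 width limits this to 32 QP types.
	 */
	uint32_t supported_qpts;
};

/* Compile-time guard, standing in for the kernel's BUILD_BUG_ON(). */
_Static_assert(IB_QPT_MAX <= 32, "supported_qpts bitmap too small");

/* qpt_supports_lso() - test one bit of the capability bitmap
 * (hypothetical helper, not a verbs API). */
static inline int qpt_supports_lso(const struct ib_lso_caps *caps,
				   unsigned int qpt)
{
	return (caps->supported_qpts >> qpt) & 1u;
}
```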
> > +struct ib_lso_caps {
> > +	u32 max_lso;
> > +	u32 supported_qpts; /* Use enum ib_qp_type */
>
> So this is a bitmap of the qps supported? And thus we have a limit now of
> 32 QP types. That needs to be documented and you need a BUILD_BUG_ON
> somewhere in case IB_QPT_MAX becomes larger than 32. Also needs to be
> better commented so we know this is a bitmap.

I think it would make more sense to indicate which protocol, or which
specific aspect of a protocol, is being offloaded. Some QP types support
LSO as a matter of practice.
On 28/04/2016 00:54, Hefty, Sean wrote:
>>> +struct ib_lso_caps {
>>> +	u32 max_lso;
>>> +	u32 supported_qpts; /* Use enum ib_qp_type */
>>
>> So this is a bitmap of the qps supported? And thus we have a limit now of
>> 32 QP types. That needs to be documented and you need a BUILD_BUG_ON
>> somewhere in case IB_QPT_MAX becomes larger than 32. Also needs to be
>> better commented so we know this is a bitmap.
>
> I think it would make more sense to indicate what protocol, or specific
> aspect of a protocol, are being offloaded. Some qp types support LSO as
> a matter of practice.

For a lot of applications, we want to hide the lower-level protocol. For
example, we try hard to hide whether IB or RoCE is used by an application.
Given that, is it wise to expose this in the API? A user who creates a UD
QP doesn't know or care whether the lower layer is Ethernet (RoCE v1/v2)
or IB.

I agree that max_lso could differ between QP types, and maybe probing a
QP type would be preferable to getting this information from a generic
query_device.
> For a lot of applications, we want to hide the lower level protocol. For
> example, we try hard to hide whether IB/RoCE is used by an application.
> Thus, is it wise to expose this thing in API?

IMO - yes.

An application must know what protocol it is using in order to communicate
at all. Even sockets exposes the protocol to the app. If an app truly
doesn't care, it still needs a generic way to determine whether it is
using matching protocols.

> A user creates a UD QP, he doesn't know or care whether the lower layer
> is Ethernet (RoCE v1/V2) or IB.

He cares if there's more than one choice available. There are no
guarantees that a UD QP can communicate with some other UD QP.

> I agree that the max_lso could be different for various QP types and
> maybe probing a QP type would be favorable that getting this information
> in a generic query_device.

The QP type is being used as a stand-in for the transport protocol. This
worked when the APIs were developed and the only transport was IB, but it
is not true today.

LSO ultimately applies to a specific protocol; IMO that's what needs to
be captured. Applications can still check for this in a generic fashion
by matching the protocol (value) they are using against the LSO protocol
support.

- Sean
On 28/04/2016 19:35, Hefty, Sean wrote:
>> For a lot of applications, we want to hide the lower level protocol. For
>> example, we try hard to hide whether IB/RoCE is used by an application.
>> Thus, is it wise to expose this thing in API?
>
> IMO - yes.
>
> An application must know what protocol it is using in order to
> communicate at all. Even sockets exposes the protocol to the app. If an
> app truly doesn't care, it still needs a generic way to determine if it
> is using matching protocols.

I think socket protocols resemble QP types more than the way we use
protocols in the IB world.

>> A user creates a UD QP, he doesn't know or care whether the lower layer
>> is Ethernet (RoCE v1/V2) or IB.
>
> He cares if that there's more than one choice available. There are no
> guarantees that a ud qp can communicate with some other ud qp.

One of the core questions here is where we draw the line between the
application and the administrator. For example, we could say that an
administrator configures the default protocol for the fabric, and that an
application with the right privileges can override that default. Given
that, maybe we should query a QP for its attributes only after it is
created and a protocol has somehow been assigned to it. What do you think?

>> I agree that the max_lso could be different for various QP types and
>> maybe probing a QP type would be favorable that getting this information
>> in a generic query_device.
>
> The qp type is being used as a stand-in for the transport protocol. This
> worked when the APIs were developed and the only transport was IB. But
> it is not true today.

Agreed, a QP type now hides several protocols. However, most applications
written in that IB-only verbs world work fine under other protocols. It
sometimes depends on administrator configuration and such, but still.

> LSO ultimately applies to a specific protocol. IMO that's what needs to
> be captured. Applications can still check for this in a generic fashion
> by matching which protocol (value) it is using against the LSO protocol
> support.

If a protocol is already assigned when an application queries a QP type,
then a QP type uniquely defines a protocol for that application, and
querying the QP type is enough. If that's not the case, an application
needs to somehow ask for a specific protocol (and it might need to be
privileged to do so).

So, what do you think about stating an optional protocol in create_qp
and putting LSO in query_qp? Thanks for taking a look.

> - Sean

Matan
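Matan's proposal - an optional protocol stated at create_qp time, with
LSO capabilities reported per QP via query_qp - could be sketched in
userspace C as below. Every name and value here is hypothetical
(qp_protocol, qp_init_attr, the max_lso numbers); this is a sketch of the
idea under discussion, not the verbs API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical protocol enumeration; none of this is upstream verbs. */
enum qp_protocol {
	QP_PROT_UNSPEC = 0,	/* let the admin/default policy decide */
	QP_PROT_IB,
	QP_PROT_ROCE_V1,
	QP_PROT_ROCE_V2,
	QP_PROT_IWARP,
};

struct qp_init_attr {
	int qp_type;
	enum qp_protocol protocol;	/* optional hint at create time */
};

struct qp {
	int qp_type;
	enum qp_protocol protocol;	/* resolved once the QP exists */
	uint32_t max_lso;		/* reported per QP via query_qp */
};

/* create_qp() - resolve the protocol, falling back to a fabric-wide
 * default when the caller passed QP_PROT_UNSPEC. */
static void create_qp(struct qp *qp, const struct qp_init_attr *attr,
		      enum qp_protocol fabric_default)
{
	qp->qp_type = attr->qp_type;
	qp->protocol = attr->protocol != QP_PROT_UNSPEC ?
		       attr->protocol : fabric_default;
	/* The LSO limit now depends on the resolved protocol, not the
	 * QP type alone; the values below are made up. */
	qp->max_lso = qp->protocol == QP_PROT_IB ? 0 : 65536;
}
```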
> So, what do you think about stating an optional protocol in create_qp
> and putting LSO in query_qp?

This makes sense to me, and I think it will work well as devices become
more complex (e.g. the QLogic NIC that supports both iWarp and RoCE).
On Mon, May 02, 2016 at 05:20:17PM +0000, Hefty, Sean wrote:
> > So, what do you think about stating an optional protocol in create_qp
> > and putting LSO in query_qp?
>
> This makes sense to me, and I think will work well as devices become
> more complex (e.g. the qlogic NIC that supports both iWarp and
> RoCE).

What is the actual issue here? LSO should just be programmable UD
fragmentation. It should be indistinguishable from a user app sending
multiple send WRs to the UD QP. Surely it doesn't have anything to do
with the underlying wire protocol?

That said, I haven't seen anything talking about how to program the
segmentation behavior (presumably it is assuming IP datagrams or
something?). It is absolutely necessary to specify the segmentation
method the HW is going to use; there is a lot of variety there.

Jason
> > > So, what do you think about stating an optional protocol in create_qp
> > > and putting LSO in query_qp?
> >
> > This makes sense to me, and I think will work well as devices become
> > more complex (e.g. the qlogic NIC that supports both iWarp and
> > RoCE).
>
> What is the actual issue here?

My issue is how this is being exposed. RC QPs already do LSO. What does
this change mean in that context? What is actually being offloaded? When
I see LSO, I think of TCP LSO. Is LSO the most appropriate term here?

> LSO should just be programmable UD fragmentation. It should be
> indistinguishable from a user app sending multiple send WRs to the UD
> QP. Surely it doesn't have anything to do with the underlying wire
> protocol?

IMO the fragmentation has everything to do with the underlying protocol.
Are we fragmenting into IP datagrams? IB UD datagrams? UDP packets? Is
re-assembly occurring on the opposite side (assuming no)? Is the receiver
seeing multiple messages (assuming yes)? How many messages, and where did
the breaks occur? If the wire protocol doesn't matter, why does the QP
type?

> That said, I haven't seen anything talking about how to program the
> segmentation behavior (presumably it is assuming IP datagrams or
> something?). It is absolutely necessary to specify the segmentation
> method that the HW is going to use, there is lots of variety there.

I'm hoping that the protocol conveys this information.
On Mon, May 02, 2016 at 06:13:45PM +0000, Hefty, Sean wrote:
> > > > So, what do you think about stating an optional protocol in create_qp
> > > > and putting LSO in query_qp?
> > >
> > > This makes sense to me, and I think will work well as devices become
> > > more complex (e.g. the qlogic NIC that supports both iWarp and
> > > RoCE).
> >
> > What is the actual issue here?
>
> My issue is how this is being exposed. RC QPs already do LSO. What
> does this change mean in that context? What is actually being
> offloaded?

No, you are mixing layers. LSO is only defined at the WR layer.

LSO says: 'take this WR, in HW perform a split-and-transform operation,
and create multiple SEND WRs'.

The end result is still SEND; LSO does not change anything about the
layers below the WR. A SEND on a UD, RC, or RAW QP is still exactly the
same as before LSO was involved.

Critically, the operation of LSO should be indistinguishable on the wire
from actually generating the final WRs directly by the application in
the SQ.

Thus, RC/UC QPs do not do LSO. Their message segmentation protocol is
something totally different.

> > LSO should just be programmable UD fragmentation. It should be
> > indistinguishable from a user app sending multiple send WRs to the UD
> > QP. Surely it doesn't have anything to do with the underlying wire
> > protocol?
>
> IMO - the fragmentation has everything to do with the underlying
> protocol. Are we fragmenting into IP datagrams? IB UD datagrams?
> UDP packets?

It really doesn't. It is perfectly legitimate to perform an Ethernet IPv4
TCP LSO transformation on an RC QP for IB (for instance). You'd get
Ethernet frames split up within IB RC SEND packets, e.g. for
Ethernet-over-IB type applications.

The LSO transformation requested and the underlying QP are totally
orthogonal concepts.

> Is re-assembly occurring on the opposite side (assuming no)? Is the
> receiver seeing multiple messages (assuming yes)? How many
> messages, and where did the breaks occur? If the wire protocol
> doesn't matter, why does the qp type?

Typically an IP-focused LSO will work like this:

  'SEND' WR for 32k.
  Split it into 1500 byte chunks, and make a new SEND for each chunk.
  Replicate bytes 0->NN of the header onto every new SEND.
  Edit bytes XX->YY assuming they are an IPv4 header (update lengths,
  checksum, sequence, etc).
  Edit bytes YY->ZZ assuming they are a TCP or UDP header (length,
  checksum, etc).

In all cases the LSO is simply a transformation of a single SEND into
multiple SENDs according to certain rules.

The rules *must* be specified in the API. I think this is what you are
thinking about when you say 'protocol', but this is not IB or iWarp, it
is 'LSO packet format protocol #1' or something. IIRC there is quite a
lot of variety here in Ethernet hardware, and some hardware can specify
the LSO format on each WR.

Jason
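The split step of the transformation Jason outlines (one large SEND
becoming one SEND per MSS-sized chunk) can be sketched as plain C. The
header replication and IPv4/TCP edits are omitted here, and lso_split()
is an illustrative name, not a kernel or libibverbs function.

```c
#include <stddef.h>

/*
 * One output segment of a hypothetical LSO split: the slice of the
 * original SEND payload that a generated SEND WR would carry.
 */
struct lso_segment {
	size_t payload_off;	/* offset into the original payload */
	size_t payload_len;	/* payload bytes in this segment */
};

/*
 * lso_split() - split a payload of total_len bytes into segments of at
 * most mss bytes each, filling segs[] and returning the number of SEND
 * WRs the hardware would generate. Header replication/editing is not
 * modeled.
 */
static size_t lso_split(size_t total_len, size_t mss,
			struct lso_segment *segs, size_t max_segs)
{
	size_t n = 0, off = 0;

	while (off < total_len && n < max_segs) {
		size_t len = total_len - off;

		if (len > mss)
			len = mss;
		segs[n].payload_off = off;
		segs[n].payload_len = len;
		off += len;
		n++;
	}
	return n;
}
```

For example, a 32 KiB SEND with a 1448-byte MSS (1500-byte frames minus
IPv4 and TCP headers) would be split into 23 SENDs, the last carrying the
912-byte remainder.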
> LSO says, 'take this WR, in HW perform a split and transform operation
> and create multiple SEND WRs'.
>
> The end result is still SEND, LSO does not change anything about the
> layers below the WR. A SEND on a UD, RC, or RAW QP is still exactly the
> same as before LSO was involved.

Okay - this isn't what I think about when I read LSO at all (i.e. large
segment offload, not large send [WR] offload). We have chained WRs. What
you're describing sounds like some sort of optimized version of that.

> Critically, the operation of LSO should be indistinguishable on the
> wire from actually generating the final WRs directly by the
> application in the SQ.

This concept should apply to the receive side as well. And generically, I
don't think we even need to restrict WR offload to being the same type of
operation, e.g. a WRITE_WITH_SEND type of operation.

> Typically an IP focused LSO will work like this:
>
> 'SEND' WR for 32k.
> Split it into 1500 byte chunks, and make a new SEND for each chunk
> Replicate bytes 0->NN of header onto every new SEND
> Edit bytes XX->YY assuming they are an IPv4 header (update lengths,
> checksum, sequence, etc)
> Edit bytes YY->ZZ assuming they are a TCP or UDP header
> (length,checksum,etc)
>
> In all cases the LSO is simply a transformation of a single SEND into
> multiple SEND according to certain rules.
>
> The rules *must* be specified in the API. I think this is what you are
> thinking about when you say 'protocol', but this is not IB or iWarp,
> it is 'LSO packet format protocol #1' or something.

I was under the impression that this was actually working at a protocol
level above the transport level of the QP -- i.e. TCP/UDP/IP segmentation
offload.

- Sean
On Wed, May 04, 2016 at 04:49:47PM +0000, Hefty, Sean wrote:
> Okay - this isn't what I think about when I read LSO at all
> (i.e. large segment offload, not large send [WR] offload). We have
> chained WRs. What your describing sounds like some sort of
> optimized version of that.

Yes. But critically, the typical LSO engine will have the ability to do
header replication and editing as it splits up the send.

> This concept should apply to the receive side as well.

The receive side is asymmetric in this case; LRO is a different animal.

> > The rules *must* be specified in the API. I think this is what you are
> > thinking about when you say 'protocol', but this is not IB or iWarp,
> > it is 'LSO packet format protocol #1' or something.
>
> I was under the impression that this was actually working at a
> protocol level that was above the transport level of the QP --
> i.e. TCP/UDP/IP segmentation offload.

Sort of. The typical LSO rules the NIC vendors have designed are intended
for TCP and UDP packet segmentation.

But in our verbs context, we'd talk about this process as a
transformation of 1xLSO_SEND WR into NxSEND WRs and maintain strict
layering. The use of LSO cannot do anything the user could not do just by
creating SENDs on their own.

Further, an LSO rule set should always operate the same, no matter what
sort of QP it is running on - the LSO rules don't change just because a
QP is iWarp or IB.

Matan: I think the takeaway here is that the API needs a lot more work on
specifying exactly what 'LSO' means, in terms of what it actually does,
and we almost certainly need some way to negotiate different rule sets,
as I doubt all NIC vendors agree on how this works. Even Linux net has a
few variations over time (TSO, LSO/GSO) supported by the core code. And
it is easy to predict that things like FCoX, SCTP and so forth will have
their own unique rule sets.

This should probably be done after creating the QP (and AH?), as a NIC
may not be able to do certain LSO rules with certain QP/AH
configurations.

I'd probably also use the driver-specific channel to communicate the
NIC's capability and prototype some kind of API in libibverbs.

Jason
On 04/05/2016 21:11, Jason Gunthorpe wrote:
> On Wed, May 04, 2016 at 04:49:47PM +0000, Hefty, Sean wrote:
>
>> Okay - this isn't what I think about when I read LSO at all
>> (i.e. large segment offload, not large send [WR] offload). We have
>> chained WRs. What your describing sounds like some sort of
>> optimized version of that.
>
> Yes.
>
> But critically the typical LSO engine will have the ability to do header
> replication and editing as it splits up the send.
>
>> This concept should apply to the receive side as well.
>
> The receive side is asymmetric in this case, LRO is a different
> animal.
>
>>> The rules *must* be specified in the API. I think this is what you are
>>> thinking about when you say 'protocol', but this is not IB or iWarp,
>>> it is 'LSO packet format protocol #1' or something.
>>
>> I was under the impression that this was actually working at a
>> protocol level that was above the transport level of the QP --
>> i.e. TCP/UDP/IP segmentation offload.
>
> Sort of. The typical LSO rules the NIC vendors have designed are
> intended for TCP and UDP packet segmentation.
>
> But in our verbs context, we'd talk about this process as a
> transformation of 1xLSO_SEND WRs into NxSEND WRs and maintain strict
> layering. The use of LSO can not do anything the user could not do
> just by creating SENDs on their own.
>
> Further, a LSO rule set should always operate the same, no matter what
> sort of QP it is running on - the LSO rules don't change just because a
> QP is iwarp or IB.

Agree.

> Matan: I think the take away here is that the API needs a lot more
> work on specifying exactly what 'LSO' means, in terms of what it
> actually does, and we almost certainly need some way to negotiate
> different rule-sets, as I doubt all NIC vendors agree on how this
> works. Even Linux net has a few variations over time (TSO, LSO/GSO)
> supported by the core code. And it is easy to predict that things like
> FCoX, SCTP and so forth will have their own unique rule sets.

I agree that we're lacking here, especially in the query-capabilities
area.

> This should probably be done after creating the QP (and AH?), as a NIC
> may not be able to do certain LSO rules with certain QP/AH
> configurations.

So I guess we want something where, after a user creates a QP, he can
query which protocols (IP, TCP, etc.) could be used in the NIC's
segmentation offload mechanism.

> I'd probably also use the driver-specific channel to communicate the
> NIC's capability and prototype some kind of API in libibverbs.

If we would like to expose this feature in the uAPI, why would we want
to use the vendor-specific channel instead of the standard core channel?

> Jason

Matan
On Thu, May 05, 2016 at 11:21:29AM +0300, Matan Barak wrote:
> > This should probably be done after creating the QP (and AH?), as a NIC
> > may not be able to do certain LSO rules with certain QP/AH
> > configurations.
>
> So I guess we want something that after a user creates a QP, he can query
> which protocols (IP, TCP, etc) could be used in the NIC's segmentation
> offload mechanism.

This is where I get upset with the process we are following here: without
input from hardware architects at other companies, it is hard to design
something truly common.

> > I'd probably also use the driver-specific channel to communicate the
> > NIC's capability and prototype some kind of API in libibverbs.
>
> If we would like to expose this feature in the uAPI, why would we want to
> use the vendor specific channel instead of the standard core channel?

Because we don't really know what we are doing. We have only one example
of hardware that implements this, and we can change libibverbs with a lot
less pain than changing the kernel's uAPI.

If you use the driver channel, and are smart about it, you can
generically expose everything the card can do and stop churning the
kernel every time a new QP feature comes up. How is that not better for
everyone?

When multiple vendors actually implement a verbs feature (or agree to
implement one via IBTA/IETF/etc.), then it becomes much safer to enshrine
it forever in a common kernel uAPI.

Jason
On Thu, 5 May 2016, Jason Gunthorpe wrote:
> On Thu, May 05, 2016 at 11:21:29AM +0300, Matan Barak wrote:
> > > This should probably be done after creating the QP (and AH?), as a NIC
> > > may not be able to do certain LSO rules with certain QP/AH
> > > configurations.
> >
> > So I guess we want something that after a user creates a QP, he can query
> > which protocols (IP, TCP, etc) could be used in the NIC's segmentation
> > offload mechanism.
>
> This is where I get upset with the process we are following here,
> without input from other hardware architects in other companies, it is
> hard to design something truly common.

Presumably the other vendors are listening in on this conversation. The
lack of alternate proposals and objections from them has often been
considered silent approval in the past.

> When multiple vendors actually implement a verbs feature (or agree to
> implement one via IBTA/IETF/etc) then it becomes much safer to
> enshrine it forever in a common kernel uAPI.

On the other hand, nothing is going to happen if one vendor does not push
ahead. There were multiple implementations in the IP stack as well until
things settled. We need to be able to do the same and not stifle
innovation by making a vendor wait until the competition comes up with
something similar.
On Thu, May 05, 2016 at 09:20:49PM -0500, Christoph Lameter wrote:
> > This is where I get upset with the process we are following here,
> > without input from other hardware architects in other companies,
> > it is hard to design something truly common.
>
> Presumably the other vendors are listening in on this conversation. The
> lack of alternate proposals and objections by them has often been
> considered silent approval in the past.

I know the various hardware architects who would be involved with an
IBTA process etc. do not monitor this list.

AFAIK, no other Linux community works in a way where the Linux-centric
mailing list sets hardware standards. E.g. linux-scsi doesn't drive the
T10 agenda, linux-pci doesn't drive the PCI-SIG, etc. At best they
inspire work in those other communities.

> > When multiple vendors actually implement a verbs feature (or agree to
> > implement one via IBTA/IETF/etc) then it becomes much safer to
> > enshrine it forever in a common kernel uAPI.
>
> On the other hand nothing is going to happen if one vendor does not push
> ahead. There were multiple implementations in the IP stack as well until
> things settled. We need to be able to do the same and not stifle
> innovation by making a vendor to wait until the competition comes
> up with something similar.

Agreed. I'm just saying: do it in user space and leave the common kernel
uAPI alone until a more obvious consensus is reached. That should speed
everything up.

Jason
On Fri, 6 May 2016, Jason Gunthorpe wrote:
> > > This is where I get upset with the process we are following here,
> > > without input from other hardware architects in other companies,
> > > it is hard to design something truly common.
> >
> > Presumably the other vendors are listening in on this conversation. The
> > lack of alternate proposals and objections by them has often been
> > considered silent approval in the past.
>
> I know the various hardware architects that would be involved with a
> IBTA process/etc do not monitor this list.

An IBTA process? How would that be relevant here?

> > On the other hand nothing is going to happen if one vendor does not push
> > ahead. There were multiple implementations in the IP stack as well until
> > things settled. We need to be able to do the same and not stifle
> > innovation by making a vendor to wait until the competition comes
> > up with something similar.
>
> Agreed.
>
> I'm just saying, do it in user space and leave the common kernel uAPI
> alone until a more obvious consensus is reached. That should speed
> everything up..

How would you do this in user space?
On Mon, May 09, 2016 at 09:27:32AM -0500, Christoph Lameter wrote:
> On Fri, 6 May 2016, Jason Gunthorpe wrote:
>
> > > > This is where I get upset with the process we are following here,
> > > > without input from other hardware architects in other companies,
> > > > it is hard to design something truly common.
> > >
> > > Presumably the other vendors are listening in on this conversation. The
> > > lack of alternate proposals and objections by them has often been
> > > considered silent approval in the past.
> >
> > I know the various hardware architects that would be involved with a
> > IBTA process/etc do not monitor this list.
>
> An IBTA process? How would that be relevant here?

What do you mean? The IBTA is the only multi-vendor body left working on
standardizing the hardware specification for verbs.

These various recent patches are adding new hardware features to verbs.

If you want hardware-knowledgeable people to help, then you need to go to
the forums they are active in. This isn't just a software exercise.

> > > On the other hand nothing is going to happen if one vendor does not push
> > > ahead. There were multiple implementations in the IP stack as well until
> > > things settled. We need to be able to do the same and not stifle
> > > innovation by making a vendor to wait until the competition comes
> > > up with something similar.
> >
> > Agreed.
> >
> > I'm just saying, do it in user space and leave the common kernel uAPI
> > alone until a more obvious consensus is reached. That should speed
> > everything up..
>
> How would you do this in user space?

Use udata to get the driver to do the little bits needed.

Jason
> > An IBTA process? How would that be relevant here?
>
> What do you mean? IBTA is the only multi-vendor body left working on
> standardizing the hardware specification for verbs.
>
> These various recent patches are adding new hardware features to
> verbs.
>
> If you want hardware knowledgeable people to help, then you need to go
> to the forums they are active in.
>
> This isn't just a software exercise.

Previous changes to the uABI (for example, XRC) were not accepted until
they were standardized by the IBTA. The only exception I'm aware of is
the flow specs.
On Mon, May 09, 2016 at 10:35:37AM -0600, Jason Gunthorpe wrote:
> On Mon, May 09, 2016 at 09:27:32AM -0500, Christoph Lameter wrote:
> > On Fri, 6 May 2016, Jason Gunthorpe wrote:
> >
> > > > > This is where I get upset with the process we are following here,
> > > > > without input from other hardware architects in other companies,
> > > > > it is hard to design something truly common.
> > > >
> > > > Presumably the other vendors are listening in on this conversation. The
> > > > lack of alternate proposals and objections by them has often been
> > > > considered silent approval in the past.
> > >
> > > I know the various hardware architects that would be involved with a
> > > IBTA process/etc do not monitor this list.
> >
> > An IBTA process? How would that be relevant here?
>
> What do you mean? IBTA is the only multi-vendor body left working on
> standardizing the hardware specification for verbs.
>
> These various recent patches are adding new hardware features to
> verbs.

These patches are exporting hardware capabilities, not "adding new
hardware features".

> If you want hardware knowledgeable people to help, then you need to go
> to the forums they are active in.

It doesn't make sense to chase after people who have no interest in
Linux RDMA at all.
On Mon, May 09, 2016 at 10:16:13PM +0300, Leon Romanovsky wrote:
> > If you want hardware knowledgeable people to help, then you need to go
> > to the forums they are active in.
>
> It doesn't make sense to look after these people who have no interest in
> linux RDMA at all.

Linux RDMA is only one half of the coin - most of our uAPIs are backed by
a hardware implementation, and thus defining a bad API directly limits
future hardware designs. It is very foolish not to ask the people who are
designing future hardware what they think.

This isn't like other places in the kernel where everything is just
software.

Jason
On Mon, May 09, 2016 at 07:11:23PM +0000, Hefty, Sean wrote:
> > > An IBTA process? How would that be relevant here?
> >
> > What do you mean? IBTA is the only multi-vendor body left working on
> > standardizing the hardware specification for verbs.
> >
> > These various recent patches are adding new hardware features to
> > verbs.
> >
> > If you want hardware knowledgeable people to help, then you need to go
> > to the forums they are active in.
> >
> > This isn't just a software exercise.
>
> Previous changes to the uABI (for example, XRC) were not accepted
> until they were standardized by the IBTA. An exception that I'm
> aware of are the flow specs.

We need to decide as a community what we want to do here, because this
uncertain process is getting tiring for everyone.

The Collab Summit discussions ended with a general consensus that some
things would go to the driver-specific channel, while hardware behavior
related to the common uAPI would go through at least a multi-vendor
sign-off process - like the IBTA. We didn't talk about how to expose
these features to apps in userspace once they were exposed through the
kernel.

It is pretty obvious the current 'process' isn't working. This new idea
that the common uAPI should just be a union of every vendor's unique
hardware ideas has not resulted in patches being merged any faster, and
I think the quality of the API is not as good now.

Jason
> We need to decide as a community what we want to do here, because this > uncertain process is getting tiring for everyone. Agree greatly > The Collab Summit discussions ended with a general consensus that some > things would go to the driver-specific channel, while hardware > behavior related to the common uAPI would go through at least a > multi-vendor sign off process - like IBTA. I agree with this approach. > We didn't talk about how to expose them to apps in userspace once they > were exposed through the kernel. I think this will end up being vendor specific, and I'm not sure each vendor knows what they want to do. > It is pretty obvious the current 'process' isn't working. This new > idea that the common uAPI should just be a union of every vendor's > unique hardware ideas has not resulted in patches being merged any > faster; and I think the quality of the API is not as good now.. IMO, crafting a common API means a higher level of abstraction. The lower the API, the more it needs to be driver specific. I prefer a method that: allows vendors to stay out of each other's way, and quickly allows vendors to export their features to user space. Either of these options can work for that. I don't see that a low-level, common API is suitable. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, May 09, 2016 at 01:57:34PM -0600, Jason Gunthorpe wrote: > On Mon, May 09, 2016 at 10:16:13PM +0300, Leon Romanovsky wrote: > > > > If you want hardware knowledgeable people to help, then you need to go > > > to the forums they are active in. > > > > It doesn't make sense to look after these people who have no interest in > > linux RDMA at all. > > Linux RDMA is only one half of the coin - most of our uAPIs are > backed by a hardware implementation, and thus defining a bad API > directly limits future hardware designs. It is very foolish not to ask > the people who are designing future hardware what they think. Exactly, this is why it is posted on the open mailing list where everyone has stage to express their opinion, including these mysterious hardware architecture who cares about future design, but doesn't care about its future support. The intension for submitted new uAPI to be generic from the beginning and authors are always open to adjust it to answer reviews feedback.
On Mon, May 09, 2016 at 02:05:07PM -0600, Jason Gunthorpe wrote: > On Mon, May 09, 2016 at 07:11:23PM +0000, Hefty, Sean wrote: > > > > An IBTA process? How would that be relevant here? > > > > > > What do you mean? IBTA is the only multi-vendor body left working on > > > standardizing the hardware specification for verbs. > > > > > > These various recent patches are adding new hardware features to > > > verbs. > > > > > > If you want hardware knowledgeable people to help, then you need to go > > > to the forums they are active in. > > > > > > This isn't just a software exercise. > > > > Previous changes to the uABI (for example, XRC) were not accepted > > until they were standardized by the IBTA. An exception that I'm > > aware of are the flow specs. > > We need to decide as a community what we want to do here, because this > uncertain process is getting tiring for everyone. > > The Collab Summit discussions ended with a general consensus that some > things would go to the driver-specific channel, while hardware > behavior related to the common uAPI would go through at least a > multi-vendor sign off process - like IBTA. Jason, We did ALL of what was discussed, and much more, for the RSS patches, and at the end we saw half a word from the maintainer about future plans for their acceptance. > > We didn't talk about how to expose them to apps in userspace once they > were exposed through the kernel. > > It is pretty obvious the current 'process' isn't working. A "process" is meaningless without people executing it, and right now I don't see any execution of it in the linux RDMA community (fixes and ULP changes have not been merged for a while).
> > > We need to decide as a community what we want to do here, because this > > uncertain process is getting tiring for everyone. > > Agree greatly > > > The Collab Summit discussions ended with a general consensus that some > > things would go to the driver-specific channel, while hardware > > behavior related to the common uAPI would go through at least a > > multi-vendor sign off process - like IBTA. > > I agree with this approach. > > > We didn't talk about how to expose them to apps in userspace once they > > were exposed through the kernel. > > I think this will end up being vendor specific, and I'm not sure each vendor knows > what they want to do. > > > It is pretty obvious the current 'process' isn't working. This new > > idea that the common uAPI should just be a union of every vendor's > > unique hardware ideas has not resulted in patches being merged any > > faster; and I think the quality of the API is not as good now.. > > IMO, crafting a common API means a higher level of abstraction. The lower the > API, the more it needs to be driver specific. I prefer a method that: allows vendors > to stay out of each other's way, and quickly allows vendors to export their features > to user space. Either of these options can work for that. I don't see that a low- > level, common API is suitable. Hey all, I've been trying to follow this thread, and there have been comments that other device owners should speak up. But I'm not sure what LSO would do for an iWARP RC QP connection. Can someone give me a quick summary? Because as it stands now, a ~4GB send can be posted in one work request, and FW/HW (in the iWARP DDP layer) deals with segmenting it based on the MTU for the interface, and then assembling it to deliver as a single RECV completion at the peer. What would LSO do beyond this? Sorry for the dumb question.
Steve
On Tue, May 10, 2016 at 10:03:48AM -0500, Steve Wise wrote: > I've been trying to follow this thread, and there has been comments that other > device owners should speak up. But I'm not sure what LSO would do for an iWARP > RC QP connection. Can someone give me a quick summary? It is unlikely someone would use LSO and an iWarp RC QP together. However, an iWarp NIC may want to expose the raw ethernet QP style of feature, which would probably also want to include the LSO, RSS, timestamping etc features. Most likely iWarp cards already include LSO features in their netdevice drivers? In that instance I'd say, imagine that same LSO feature works on a verbs QP and ask if the patch covers the way it works. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > I've been trying to follow this thread, and there has been comments that other > > device owners should speak up. But I'm not sure what LSO would do for an iWARP > > RC QP connection. Can someone give me a quick summary? > > It is unlikely someone would use LSO and an iWarp RC QP together. > > However, an iWarp NIC may want to expose the raw ethernet QP style of > feature, which would probably also want to include the LSO, RSS, > timestamping etc features. > Yes, that makes sense. Chelsio has something like this that is currently out-of-kernel, but there is no LSO/RSS support, and it needs a major rewrite to become acceptable for inclusion upstream. But I can review the changes with this in mind. > Most likely iWarp cards already include LSO features in their > netdevice drivers? In that instance I'd say, imagine that same LSO > feature works on a verbs QP and ask if the patch covers the way it > works. > All of Chelsio's cards support LSO features (and all the other important NIC features). I'll think about the proposed API and whether it would be sufficient for a RAW_QP type of service. Thanks for the summary, Jason! Steve.
> Exactly, this is why it is posted on the open mailing list where > everyone has stage to express their opinion, including these mysterious > hardware architecture who cares about future design, but doesn't care about > its future support. Look, if you want linux-rdma to be setting hardware standards then get a consensus from the vendors on this idea. This was tried at the Collab summit and I heard several representatives strongly say no. It is not an inherently unworkable idea. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I've been trying to follow this thread, and there has been comments that > other > device owners should speak up. But I'm not sure what LSO would do for an > iWARP > RC QP connection. Can someone give me a quick summary? Well, it's not clear to me yet what this capability actually does. Jason described it as some sort of work request 'chaining'. I tried briefly searching for details on the web, but didn't find any significant documentation. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 10, 2016 at 07:58:01AM +0300, Leon Romanovsky wrote: > > The Collab Summit discussions ended with a general consensus that some > > things would go to the driver-specific channel, while hardware > > behavior related to the common uAPI would go through at least a > > multi-vendor sign off process - like IBTA. > > We did ALL and much more from the discussed for RSS patches and at the end, > we saw half a word from the maintainer about the future plans for its acceptance. Where is the 'multi-vendor sign off' part of the process? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 10, 2016 at 11:07:39AM -0600, Jason Gunthorpe wrote: > > Exactly, this is why it is posted on the open mailing list where > > everyone has stage to express their opinion, including these mysterious > > hardware architecture who cares about future design, but doesn't care about > > its future support. > > Look, if you want linux-rdma to be setting hardware standards then get > a consensus from the vendors on this idea. I want a fruitful discussion on this mailing list which can be beneficial to everyone; right now all discussions can be summarized in the following sentence: "I don't need it, no one needs it". And to which vendors are you referring, exactly? They are all on this mailing list. > > This was tried at the Collab summit and I heard several representatives > strongly say no. I admit that I may have missed part of the discussions and that my knowledge is limited. Can you post their names and their practical suggestions? Thanks
On Tue, May 10, 2016 at 11:18:19AM -0600, Jason Gunthorpe wrote: > On Tue, May 10, 2016 at 07:58:01AM +0300, Leon Romanovsky wrote: > > > The Collab Summit discussions ended with a general consensus that some > > > things would go to the driver-specific channel, while hardware > > > behavior related to the common uAPI would go through at least a > > > multi-vendor sign off process - like IBTA. > > > > We did ALL and much more from the discussed for RSS patches and at the end, > > we saw half a word from the maintainer about the future plans for its acceptance. > > Where is the 'multi-vendor sign off' part of the process? The steps are described at the following mail [1]. [1] http://marc.info/?l=linux-rdma&m=146218340000784&w=2 > > Jason
On Tue, May 10, 2016 at 08:52:00PM +0300, Leon Romanovsky wrote: > On Tue, May 10, 2016 at 11:18:19AM -0600, Jason Gunthorpe wrote: > > On Tue, May 10, 2016 at 07:58:01AM +0300, Leon Romanovsky wrote: > > > > The Collab Summit discussions ended with a general consensus > > > > that some things would go to the driver-specific channel, > > > > while hardware behavior related to the common uAPI would go > > > > through at least a multi-vendor sign off process - like IBTA. > > > > > > We did ALL and much more from the discussed for RSS patches and > > > at the end, we saw half a word from the maintainer about the > > > future plans for its acceptance. > > > > Where is the 'multi-vendor sign off' part of the process? > > The steps are described at the following mail [1]. > > [1] http://marc.info/?l=linux-rdma&m=146218340000784&w=2 None of that is what I mean when I say multi-vendor sign off. I mean a vote showing multi-vendor participation, or active support from other vendors. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 10, 2016 at 12:18:47PM -0600, Jason Gunthorpe wrote: > On Tue, May 10, 2016 at 08:52:00PM +0300, Leon Romanovsky wrote: > > On Tue, May 10, 2016 at 11:18:19AM -0600, Jason Gunthorpe wrote: > > > On Tue, May 10, 2016 at 07:58:01AM +0300, Leon Romanovsky wrote: > > > > > The Collab Summit discussions ended with a general consensus > > > > > that some things would go to the driver-specific channel, > > > > > while hardware behavior related to the common uAPI would go > > > > > through at least a multi-vendor sign off process - like IBTA. > > > > > > > > We did ALL and much more from the discussed for RSS patches and > > > > at the end, we saw half a word from the maintainer about the > > > > future plans for its acceptance. > > > > > > Where is the 'multi-vendor sign off' part of the process? > > > > The steps are described at the following mail [1]. > > > > [1] http://marc.info/?l=linux-rdma&m=146218340000784&w=2 > > None of that is what I mean when I say multi-vendor sign off. > > I mean a vote showing multi-vendor participation, or active support > from other vendors. They participated in OFA/OFAWG/OFVWG and this ML and didn't say no. It is more than enough to be convinced. > > Jason
On Tue, 10 May 2016, Jason Gunthorpe wrote: > > Exactly, this is why it is posted on the open mailing list where > > everyone has stage to express their opinion, including these mysterious > > hardware architecture who cares about future design, but doesn't care about > > its future support. > > Look, if you want linux-rdma to be setting hardware standards then get > a consensus from the vendors on this idea. This is about exporting existing hardware features that are already available through the standard socket APIs. > This was tried at the Collab summit and I heard several representatives > strongly say no. Hardware standards? I thought we were talking about software and APIs. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 10, 2016 at 07:29:42PM -0500, Christoph Lameter wrote: > > > Exactly, this is why it is posted on the open mailing list where > > > everyone has stage to express their opinion, including these mysterious > > > hardware architecture who cares about future design, but doesn't care about > > > its future support. > > > > Look, if you want linux-rdma to be setting hardware standards then get > > a consensus from the vendors on this idea. > > This is about exporting existing hardware features that are already > available through the standard socket APIs. Eh? How is LSO or RSS available through standard socket APIs? Yes, it is indirectly through the TCP stack, but the actual hardware feature is not directly accessible??? > > This was tried at the Collab summit and I heard several representatives > > strongly say no. > > Hardware standards? I thought we were talking about software and APIs. The LSO, RSS and Timestamp patches all demand very specific hardware functionality for a device to implement the API. It is fundamentally not a 100% software API. Look at the patches. The LSO patch is so bad Sean and Steve couldn't even figure out what the heck the HW feature does. I know what it does, but I can't tell you the fine details of the actual required hardware behavior. How on earth is someone else supposed to ever implement hardware that does this? I still don't understand why this is being forced into libibverbs. dpdk would seem to be the preferred way to access these sorts of features and there is no technical reason why the mlx dpdk provider needs to use libibverbs. Talk directly to /dev/uverbs0 and use udata to enable the universe of nic specific features. For instance, the dpdk community has already done the hard work to figure out how to expose things like LSO to their users - and they actually have multiple vendors providing these features...
http://dpdk.org/ml/archives/dev/2014-May/002537.html I guess you are going to tell me you want these features on the IB UD transport and that dpdk has no support for L2s other than ethernet? Jason
On 5/10/2016 9:04 PM, Jason Gunthorpe wrote: > On Tue, May 10, 2016 at 07:29:42PM -0500, Christoph Lameter wrote: >>> This was tried at the Collab summit and I heard several representatives >>> strongly say no. >> Hardware standards? I thought we were talking about software and APIs. > The LSO, RSS and Timestamp patches all demand very specific hardware > functionality for a device to implement the API. It is fundamentally > not a 100% software API. > > Look at the patches. The LSO patch is so bad Sean and Steve couldn't > even figure out what the heck the HW feature does. I know what it > does, but I can't tell you the fine details of the actual required > hardware behavior. > > How on earth is someone else supposed to ever implement hardware that > does this? > > I still don't understand why this is being forced into > libibverbs. dpdk would seem to be the prefered way to access these > sorts of features and there is no technical reason why the mlx > dpdk provider needs to use libibverbs. Talk directly to /dev/uverbs0 > and use udata to enable the universe of nic specific features. There are other user space libraries which are built on top of libibverbs rather than DPDK, e.g. VMA. > > For instance, the dpdk community has already done the hard work to > figure out how to expose things like LSO to their users - and they > actually have multiple vendors providing these features... > > http://dpdk.org/ml/archives/dev/2014-May/002537.html > > I guess you are going to tell me you want these features on IB UD > transport and dpdk has no support for L2's other than ethernet? > > Jason > The kernel patches purely report hardware capabilities. The policy and logic for how to implement LSO live in libibverbs and hardware-related libraries (e.g., libmlx5). Reviewing these patches together might make things clearer. They haven't been pushed to the community for review yet because we still can't reach agreement on this very first step.
On Tue, May 10, 2016 at 08:04:29PM -0600, Jason Gunthorpe wrote: > Look at the patches. The LSO patch is so bad Sean and Steve couldn't > even figure out what the heck the HW feature does. I know what it > does, but I can't tell you the fine details of the actual required > hardware behavior. These people are very busy and you can't expect them to know everything. If someone doesn't know or understand, he will ask, exactly as they, you and I are doing all the time. Just to remind you that YOUR patch [1] didn't meet YOUR own quality criteria and can be called a bad one, because very respectable people asked questions which were expected to be answered in the patch itself, but weren't [2]. [1] e6bd18f57aad ("IB/security: Restrict use of the write() interface") [2] http://marc.info/?l=linux-rdma&m=146264707224127&w=2
On Tue, 10 May 2016, Jason Gunthorpe wrote: > I still don't understand why this is being forced into > libibverbs. dpdk would seem to be the prefered way to access these > sorts of features and there is no technical reason why the mlx > dpdk provider needs to use libibverbs. Talk directly to /dev/uverbs0 > and use udata to enable the universe of nic specific features. Well then the trouble is how to integrate that into QPs and the rest of the RDMA infrastructure. > > For instance, the dpdk community has already done the hard work to > figure out how to expose things like LSO to their users - and they > actually have multiple vendors providing these features... > > http://dpdk.org/ml/archives/dev/2014-May/002537.html > > I guess you are going to tell me you want these features on IB UD > transport and dpdk has no support for L2's other than ethernet? Our software is based on raw ethernet frame processing through QPs. Thus the desire to have the extras that the hardware can provide. And yes we want to have an API that can be used to interface with hardware from multiple vendors. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 11, 2016 at 09:23:30AM -0500, Christoph Lameter wrote: > On Tue, 10 May 2016, Jason Gunthorpe wrote: > > > I still don't understand why this is being forced into > > libibverbs. dpdk would seem to be the prefered way to access these > > sorts of features and there is no technical reason why the mlx > > dpdk provider needs to use libibverbs. Talk directly to /dev/uverbs0 > > and use udata to enable the universe of nic specific features. > > Well then the trouble is how to integrate that into QPs and the rest of > the RDMA infrastructure. Granted, yes, dpdk obviously hasn't had much work in that area, but it also doesn't seem all that complex. You basically want to attach a dpdk pmd to something other than ethernet hardware. E.g. an IB UD QP using ipoib. At the end of the day that is just a few more parameters when starting the pmd, and a different hwaddress length. To me that sounds much simpler than squeezing all of this endless stuff into libibverbs. Assuming that could be done, the dpdk community seems to have a much better handle on how to design and expose these IP acceleration features. All we need to do in RDMA land is just let the mlx4 pmd and mlx4 kernel driver communicate privately to do the special setup. We've already largely done that with udata. I don't think it would be contentious at all if mlx4 developed some 'dpdk' driver-specific calls. Much like how hfi1 has psm driver specific calls. Overall, dpdk looks a lot better for building IP centric apps - it has more support utilities to work with IP packets, more optimization, and more device support for this kind of function. libibverbs always seemed like a poor fit to me. And having something micro-optimized to only do IP would certainly speed up wr and wc processing in the pmd; code paths and structure members to support other opcodes can just be purged entirely. That is the cache line efficiency you were talking about in the other thread.
> Our software is based on raw ethernet frame processing through QPs. Thus > the desire to have the extras that the hardware can provide. And yes we > want to have an API that can be used to interface with hardware from > multiple vendors. Well, then you need multiple vendors to agree they could potentially build hardware that matches these proposed uAPIs. Which is more or less what I've been asking for :) Jason
On Wed, May 11, 2016 at 07:31:34AM +0300, Leon Romanovsky wrote: > On Tue, May 10, 2016 at 08:04:29PM -0600, Jason Gunthorpe wrote: > > Look at the patches. The LSO patch is so bad Sean and Steve couldn't > > even figure out what the heck the HW feature does. I know what it > > does, but I can't tell you the fine details of the actual required > > hardware behavior. > > These people are very busy and you can't expect from them to know > everything, If someone doesn't know/understand, he will ask, exactly > as they, you and me are doing all the time. > > Just to remind you, that YOUR's patch [1] didn't meet YOUR own quality > criteria and can be called bad one, just because very respectful people > asked their questions which very expected to be answered in the patch > itself, but they don't [2]. Oh come on, can you please get serious about this? Stubbornly taking a baseless and totally obstinate position is not working with the community. A security patch, which is deliberately a little obtuse to avoid telling the world how to write an exploit, is not at all the same as describing a complex new uAPI feature that is expected to be implementable by other parties. I can't believe you'd even try to use that as an excuse. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 11, 2016 at 11:36:08AM -0600, Jason Gunthorpe wrote: > On Wed, May 11, 2016 at 07:31:34AM +0300, Leon Romanovsky wrote: > > On Tue, May 10, 2016 at 08:04:29PM -0600, Jason Gunthorpe wrote: > > > Look at the patches. The LSO patch is so bad Sean and Steve couldn't > > > even figure out what the heck the HW feature does. I know what it > > > does, but I can't tell you the fine details of the actual required > > > hardware behavior. > > > > These people are very busy and you can't expect from them to know > > everything, If someone doesn't know/understand, he will ask, exactly > > as they, you and me are doing all the time. > > > > Just to remind you, that YOUR's patch [1] didn't meet YOUR own quality > > criteria and can be called bad one, just because very respectful people > > asked their questions which very expected to be answered in the patch > > itself, but they don't [2]. > > Oh come on, can you please get serious about this? Stubbornly taking a > baseless and totally obstinate position is not working with the > community. > > A security patch, which is deliberately a little obtuse to avoid > telling the world how to write an exploit, is not at all the same as > describing a complex new uAPI feature that is expected to be > implementable by other parties. > > I can't believe you'd even try to use that as an excuse. It is not an excuse and was never supposed to be. My position on this topic is very concrete: post code on the ML, start a discussion, adjust the code and merge it. It works for all other code posted on the various Linux kernel MLs and I don't see why it can't work for RDMA too. And again, all participants on this ML are open and ready to answer feedback and post adjusted code. > > Jason
On Wed, May 11, 2016 at 03:47:14AM +0000, Bodong Wang wrote: > > I still don't understand why this is being forced into > > libibverbs. dpdk would seem to be the prefered way to access these > > sorts of features and there is no technical reason why the mlx > > dpdk provider needs to use libibverbs. Talk directly to /dev/uverbs0 > > and use udata to enable the universe of nic specific features. > There are other user space libraries which are build on top of > libibverbs rather than DPDK. E.g, VMA. I looked at the VMA manual a bit, and honestly, it looks a lot like DPDK. It could hook into the kernel the same way (via udata and /dev/uverbs0) > The patches from kernel are purely for hardware capabilities > report. The policy and logic about how to implement the LSO are > living in libibverbs and hardware related libraies(e.g, > libmlx5). Review these patches together might make things more > clear. They're not pushed to community for review yet because we > still can't reach the agreement from this very first step. That seems like a great argument that this doesn't belong in the common api. If it is just driver capabilities, and they are not well defined, then put them in mlx4's udata. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 11 May 2016, Jason Gunthorpe wrote: > You basically want to attach a dpdk pmd to something other than > ethernet hardware. Eg a IB UD QP using ipoib. At the end of the day > that is just a few more parameters when starting the pmd, and a > different hwaddress length. To me that sounds much simpler than > squeezing all of this endless stuff into libibverbs. DPDK handles all network traffic for a nic. What we have with the raw ethernet QPs is the ability to offload individual Ethernet data streams to user space. That is not possible with DPDK and very much a QP based feature. > Overall, dpdk looks alot better for building IP centric apps - it has > more support utilities to work with IP packets, more optimization, and > more device support for this kind of function. libibverbs always seemed > like a poor fit to me.. It does not allow the kernel to handle the ethernet traffic in general but requires a full implementation of an IP stack with all the extras. > Well, then you need multiple vendors to agree they could potentially > build hardware that matches these proposed uAPIs. Which is more or less > what I've been more or less asking for :) I did not see disagreement from the other vendors. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 12, 2016 at 08:34:41AM -0500, Christoph Lameter wrote: > On Wed, 11 May 2016, Jason Gunthorpe wrote: > > > You basically want to attach a dpdk pmd to something other than > > ethernet hardware. Eg a IB UD QP using ipoib. At the end of the day > > that is just a few more parameters when starting the pmd, and a > > different hwaddress length. To me that sounds much simpler than > > squeezing all of this endless stuff into libibverbs. > > DPDK handles all network traffic for a nic. Eh? The DPDK documentation for the mlx4 PMD disagrees: This capability allows the PMD to coexist with kernel network interfaces which remain functional, although they stop receiving unicast packets as long as they share the same MAC address. The mlx4 PMD *is* QP based; clearly there is no fundamental obstacle to providing the dpdk API over a flow-steering QP hardware model. From that point it becomes a problem of configuring the QP's flow steering to get the exact split you need, which is exactly the same as what is being done with verbs today. Don't be confused that the typical example of dpdk uses an sr-iov hardware driver; that is just a limitation of the NIC hardware available. Better hardware like mlx4 can and does use a QP model to steer the packets instead. > > build hardware that matches these proposed uAPIs. Which is more or less > > what I've been more or less asking for :) > > I did not see disagreement from the other vendors. You don't find Steve's remarks troubling then? I view Chelsio as one of the vendors that could actually implement the hardware side of these proposed APIs with a firmware update, and he hasn't even looked at them, and has yet to understand what they even do. Jason
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 6fdc7ec..b0fb94b 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -3655,6 +3655,13 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file, resp.hca_core_clock = attr.hca_core_clock; resp.response_length += sizeof(resp.hca_core_clock); + if (ucore->outlen < resp.response_length + sizeof(resp.lso_caps)) + goto end; + + resp.lso_caps.max_lso = attr.lso_caps.max_lso; + resp.lso_caps.supported_qpts = attr.lso_caps.supported_qpts; + resp.response_length += sizeof(resp.lso_caps); + end: err = ib_copy_to_udata(ucore, &resp, resp.response_length); return err; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index fb2cef4..d5a466f 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -271,6 +271,11 @@ struct ib_cq_init_attr { u32 flags; }; +struct ib_lso_caps { + u32 max_lso; + u32 supported_qpts; /* Use enum ib_qp_type */ +}; + struct ib_device_attr { u64 fw_ver; __be64 sys_image_guid; @@ -317,6 +322,7 @@ struct ib_device_attr { struct ib_odp_caps odp_caps; uint64_t timestamp_mask; uint64_t hca_core_clock; /* in KHZ */ + struct ib_lso_caps lso_caps; }; enum ib_mtu { diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h index 8126c14..fa039f5 100644 --- a/include/uapi/rdma/ib_user_verbs.h +++ b/include/uapi/rdma/ib_user_verbs.h @@ -219,6 +219,11 @@ struct ib_uverbs_odp_caps { __u32 reserved; }; +struct ib_uverbs_lso_caps { + __u32 max_lso; + __u32 supported_qpts; /* Use enum ib_qp_type */ +}; + struct ib_uverbs_ex_query_device_resp { struct ib_uverbs_query_device_resp base; __u32 comp_mask; @@ -226,6 +231,7 @@ struct ib_uverbs_ex_query_device_resp { struct ib_uverbs_odp_caps odp_caps; __u64 timestamp_mask; __u64 hca_core_clock; /* in KHZ */ + struct ib_uverbs_lso_caps lso_caps; }; struct ib_uverbs_query_port {