Message ID | 1438788867-18332-2-git-send-email-amirv@mellanox.com (mailing list archive) |
---|---|
State | Changes Requested |
> +enum ib_csum_cap_flags {
> +	IB_CSUM_RX_TCP_UDP = 1 << 0,
> +	IB_CSUM_RX_IP_HDR = 1 << 1,
> +	IB_CSUM_TX_TCP_UDP = 1 << 2,
> +	IB_CSUM_TX_IP_HDR = 1 << 3
> +};

TCP and UDP should be separate flags.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> > +enum ib_csum_cap_flags {
> > +	IB_CSUM_RX_TCP_UDP = 1 << 0,
> > +	IB_CSUM_RX_IP_HDR = 1 << 1,
> > +	IB_CSUM_TX_TCP_UDP = 1 << 2,
> > +	IB_CSUM_TX_IP_HDR = 1 << 3
> > +};
>
> TCP and UDP should be separate flags.

Can you explain why?
What we are advertising here is offloads for L3 and L4 checksums, why
should it be per protocol?
> > > +enum ib_csum_cap_flags {
> > > +	IB_CSUM_RX_TCP_UDP = 1 << 0,
> > > +	IB_CSUM_RX_IP_HDR = 1 << 1,
> > > +	IB_CSUM_TX_TCP_UDP = 1 << 2,
> > > +	IB_CSUM_TX_IP_HDR = 1 << 3
> > > +};
> >
> > TCP and UDP should be separate flags.
>
> Can you explain why?

For the same reason that you didn't include *all* L4 headers.

> What we are advertising here is offloads for L3 and L4 checksums, why
> should it be per protocol?

Because UDP and TCP have different headers, and it's entirely possible for a NIC (e.g. usnic) to support offloading one but not the other.
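For illustration only, here is a minimal sketch of what per-protocol flags could look like. The enum name, the flag names and the helper `csum_caps_has()` are hypothetical; nothing like this was posted in the thread. The sketch only shows that one bit per protocol and direction lets a device advertise UDP offload without also claiming TCP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: one bit per protocol and direction, so a device that
 * offloads only UDP checksums (or only TCP) can report exactly that. */
enum ib_csum_cap_flags_split {
	IB_CSUM_RX_TCP    = 1 << 0,
	IB_CSUM_RX_UDP    = 1 << 1,
	IB_CSUM_RX_IP_HDR = 1 << 2,
	IB_CSUM_TX_TCP    = 1 << 3,
	IB_CSUM_TX_UDP    = 1 << 4,
	IB_CSUM_TX_IP_HDR = 1 << 5
};

/* Returns nonzero when every bit in `wanted` is also set in `caps`. */
static inline int csum_caps_has(uint32_t caps, uint32_t wanted)
{
	return (caps & wanted) == wanted;
}
```

A consumer would then test, for example, `csum_caps_has(caps, IB_CSUM_TX_UDP)` before relying on transmit-side UDP checksum offload.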
On Wed, Aug 05, 2015 at 06:34:26PM +0300, Amir Vadai wrote:
>  struct ib_uverbs_ex_query_device {
>  	__u32 comp_mask;
> +	__u32 csum_caps;
>  	__u32 reserved;
>  };

Uh no.

> @@ -221,6 +222,7 @@ struct ib_uverbs_odp_caps {
>  struct ib_uverbs_ex_query_device_resp {
>  	struct ib_uverbs_query_device_resp base;
>  	__u32 comp_mask;
> +	__u32 csum_caps;
>  	__u32 response_length;
>  	struct ib_uverbs_odp_caps odp_caps;
>  	__u64 timestamp_mask;

Also totally wrong.

Or, WTF? why is something so screwed up like this being sent to the list?

NAK

Jason
On Wed, Aug 5, 2015 at 8:16 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> On Wed, Aug 05, 2015 at 06:34:26PM +0300, Amir Vadai wrote:
>>  struct ib_uverbs_ex_query_device {
>>  	__u32 comp_mask;
>> +	__u32 csum_caps;
>>  	__u32 reserved;
>>  };
>
> Uh no.
>
>> @@ -221,6 +222,7 @@ struct ib_uverbs_odp_caps {
>>  struct ib_uverbs_ex_query_device_resp {
>>  	struct ib_uverbs_query_device_resp base;
>>  	__u32 comp_mask;
>> +	__u32 csum_caps;
>>  	__u32 response_length;
>>  	struct ib_uverbs_odp_caps odp_caps;
>>  	__u64 timestamp_mask;
>
> Also totally wrong.
>
> Or, WTF? why is something so screwed up like this being sent to the list?
> NAK

Jason,

So -- shit happens, I am trying to figure out if an internal review has been done, b/c we do have some folks who terribly master the extended uverbs framework, right...?

if this is wrong it's good that you rejected the patch and even put it in WTF response, but again, I see that the author is someone new to the upstream rdma stack and we should encourage more people to participate... so you sent him back home to do HW, let's see what this will yield.

Or.
On Wed, Aug 5, 2015 at 7:17 PM, Hefty, Sean <sean.hefty@intel.com> wrote:
> TCP and UDP should be separate flags.
Sean,
I don't think we should over-complex things vs. what the network stack
does for many (since kernel 2.4?!) years. They have basically three
flags
NETIF_F_IP_CSUM - device can checksum TCP/UDP over IPv4
NETIF_F_IP6_CSUM - device can checksum TCP/UDP over IPv6
NETIF_F_HW_CSUM - device can checksum all packets
I would say that it makes sense to consider the IB_DEVICE_UD_IP_CSUM
dev cap we have had for ~10 years to be equivalent to NETIF_F_IP_CSUM and
to apply it also to devices that support RAW PACKET QPs.
Not sure if we need to add IPv6 support now too, unless there are
real offload consumers; ditto for the HW checksum. The focus had better be
on building user space code that supports the IPv4 device cap well.
Or.
> I don't think we should over-complex things vs. what the network stack
> does for many (since kernel 2.4?!) years. They have basically three
> flags
>
> NETIF_F_IP_CSUM - device can checksum TCP/UDP over IPv4
> NETIF_F_IP6_CSUM - device can checksum TCP/UDP over IPv6
> NETIF_F_HW_CSUM - device can checksum all packets

I fail to see how defining a flag to mean one thing complicates things. Plus, the proposed patch doesn't even follow what the network stack does.
On Thu, Aug 06, 2015 at 01:16:17AM +0300, Or Gerlitz wrote:
> So -- shit happens, I am trying to figure out if an internal review
> has been done, b/c we do have some folks who terribly master the
> extended uverbs framework, right...?

You and Matan had no problem doing the timestamp stuff properly in this area, so I assume so..

I would *strongly* encourage either of you to publicly do the proper review and explain why the above patches are wrong. Then there is some record someone in future can find to discuss all this..

> to the upstream rdma stack and we should encourage more people to
> participate... so you sent him back home to do HW, let's see what
> this will yield.

The participating we are sorely lacking right now is on the review
side, which is like most of the kernel, unfortunately.

Jason
On Thu, Aug 6, 2015 at 3:00 AM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
[...]
> The participating we are sorely lacking right now is on the review
> side, which is like most of the kernel, unfortunately.

I agree, if a proper internal review had taken place here, it wouldn't have been sent this way. But you know, it can happen, and it's good that you pointed this out so that we take note and try to make sure we do things right.
On Thu, Aug 6, 2015 at 2:23 AM, Hefty, Sean <sean.hefty@intel.com> wrote:
>> I don't think we should over-complex things vs. what the network stack
>> does for many (since kernel 2.4?!) years. They have basically three
>> flags
>>
>> NETIF_F_IP_CSUM - device can checksum TCP/UDP over IPv4
>> NETIF_F_IP6_CSUM - device can checksum TCP/UDP over IPv6
>> NETIF_F_HW_CSUM - device can checksum all packets
>
> I fail to see how defining a flag to mean one thing complicates things.

b/c we are talking about offloading IP, TCP, UDP and friends checksums -- something the networking stack has done since 1991 or so, and I don't see the point in going any further than where they are.

> Plus, the proposed patch doesn't even follow what the network stack does.

To make it clear, I didn't say that the proposed patch did what I sketched; I provided a response to you and a reviewer comment in the same reply...
On Wednesday, August 5, 2015 8:16 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> On Wed, Aug 05, 2015 at 06:34:26PM +0300, Amir Vadai wrote:
>>  struct ib_uverbs_ex_query_device {
>>  	__u32 comp_mask;
>> +	__u32 csum_caps;
>>  	__u32 reserved;
>>  };
>
> Uh no.

This is the struct of the command, not the response. There is no need to extend it. The command is designed to always return as much information as possible, so the user space code doesn't need to pass anything for it to work.

Even if you did want to extend it, you would need to replace the reserved word. The structs in this header file must be made in such way that they have the same size on 32-bit systems and on 64-bit systems (see the comment at the beginning of the header file). This is why the reserved word is there.

>
>> @@ -221,6 +222,7 @@ struct ib_uverbs_odp_caps {
>>  struct ib_uverbs_ex_query_device_resp {
>>  	struct ib_uverbs_query_device_resp base;
>>  	__u32 comp_mask;
>> +	__u32 csum_caps;
>>  	__u32 response_length;
>>  	struct ib_uverbs_odp_caps odp_caps;
>>  	__u64 timestamp_mask;
>
> Also totally wrong.

The response struct must maintain backward compatibility. You cannot change the order of the existing fields. The only valid way of extending it is at the end. Here too, you must make sure that the struct has the same size on 32-bit systems, so you would need to add a 32-bit reserved word at the end.

Haggai
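The layout rules described above can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the corrected patch: `base_resp` and `odp_caps` are simplified stand-ins for the real uverbs structs, `uint32_t`/`uint64_t` stand in for `__u32`/`__u64`, and only the fields visible in the quoted hunk are reproduced. The new field is appended after the existing ones and paired with a 32-bit reserved word so the struct size stays a multiple of 8 bytes on both 32-bit and 64-bit builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct ib_uverbs_query_device_resp (the real one is larger). */
struct base_resp {
	uint64_t fw_ver;
};

/* Stand-in for struct ib_uverbs_odp_caps; layout assumed for this sketch. */
struct odp_caps {
	uint64_t general_caps;
	uint32_t per_transport_caps[3];
	uint32_t reserved;
};

struct ex_query_device_resp {
	struct base_resp base;
	uint32_t comp_mask;
	uint32_t response_length;	/* existing fields keep their order */
	struct odp_caps odp_caps;
	uint64_t timestamp_mask;
	/* ...any further existing fields would stay here, untouched... */
	uint32_t csum_caps;		/* new field, appended at the end */
	uint32_t reserved;		/* explicit pad to a multiple of 8 bytes */
};

/* With explicit padding the size does not depend on how the ABI aligns u64. */
_Static_assert(sizeof(struct ex_query_device_resp) % 8 == 0,
	       "response struct must be a multiple of 8 bytes");
_Static_assert(offsetof(struct ex_query_device_resp, csum_caps) >
	       offsetof(struct ex_query_device_resp, timestamp_mask),
	       "new field must come after all existing fields");
```

Because `response_length` keeps its original offset, an older user-space binary that only knows the earlier fields still reads them from the same place.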
On Thu, Aug 6, 2015 at 4:30 PM, Haggai Eran <haggaie@mellanox.com> wrote:
> On Wednesday, August 5, 2015 8:16 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
>> On Wed, Aug 05, 2015 at 06:34:26PM +0300, Amir Vadai wrote:
>>>  struct ib_uverbs_ex_query_device {
>>>  	__u32 comp_mask;
>>> +	__u32 csum_caps;
>>>  	__u32 reserved;
>>>  };
>>
>> Uh no.
> This is the struct of the command, not the response. There is no need to extend it. The command is designed to always return as much information as possible, so the user space code doesn't need to pass anything for it to work.
>
> Even if you did want to extend it, you would need to replace the reserved word. The structs in this header file must be made in such way that they have the same size on 32-bit systems and on 64-bit systems (see the comment at the beginning of the header file). This is why the reserved word is there.
>
>>
>>> @@ -221,6 +222,7 @@ struct ib_uverbs_odp_caps {
>>>  struct ib_uverbs_ex_query_device_resp {
>>>  	struct ib_uverbs_query_device_resp base;
>>>  	__u32 comp_mask;
>>> +	__u32 csum_caps;
>>>  	__u32 response_length;
>>>  	struct ib_uverbs_odp_caps odp_caps;
>>>  	__u64 timestamp_mask;
>>
>> Also totally wrong.
>
> The response struct must maintain backward compatibility. You cannot change the order of the existing fields. The only valid way of extending it is at the end. Here too, you must make sure that the struct has the same size on 32-bit systems, so you would need to add a 32-bit reserved word at the end.
>

As struct ib_uverbs_ex_query_device_resp captures extended capabilities, does it make sense to have few more reserved words defined as part of this patch?
So that later on those reserved can be defined in future for additional features.
This way for every new feature we dont need to bump structure size of ABI, not we need to define new set of ABI calls.
Its hard to say how much more is sufficient, but was thinking of 8 32-bit words.
On 08/06/2015 02:18 PM, Parav Pandit wrote:
> On Thu, Aug 6, 2015 at 4:30 PM, Haggai Eran <haggaie@mellanox.com> wrote:
>
>     On Wednesday, August 5, 2015 8:16 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
>     > On Wed, Aug 05, 2015 at 06:34:26PM +0300, Amir Vadai wrote:
>     >>  struct ib_uverbs_ex_query_device {
>     >>  	__u32 comp_mask;
>     >> +	__u32 csum_caps;
>     >>  	__u32 reserved;
>     >>  };
>     >
>     > Uh no.
>     This is the struct of the command, not the response. There is no
>     need to extend it. The command is designed to always return as much
>     information as possible, so the user space code doesn't need to pass
>     anything for it to work.
>
>     Even if you did want to extend it, you would need to replace the
>     reserved word. The structs in this header file must be made in such
>     way that they have the same size on 32-bit systems and on 64-bit
>     systems (see the comment at the beginning of the header file). This
>     is why the reserved word is there.
>
>     >
>     >> @@ -221,6 +222,7 @@ struct ib_uverbs_odp_caps {
>     >>  struct ib_uverbs_ex_query_device_resp {
>     >>  	struct ib_uverbs_query_device_resp base;
>     >>  	__u32 comp_mask;
>     >> +	__u32 csum_caps;
>     >>  	__u32 response_length;
>     >>  	struct ib_uverbs_odp_caps odp_caps;
>     >>  	__u64 timestamp_mask;
>     >
>     > Also totally wrong.
>
>     The response struct must maintain backward compatibility. You cannot
>     change the order of the existing fields. The only valid way of
>     extending it is at the end. Here too, you must make sure that the
>     struct has the same size on 32-bit systems, so you would need to add
>     a 32-bit reserved word at the end.
>
>     Haggai
>
> As struct ib_uverbs_ex_query_device_resp captures extended capabilities,
> does it make sense to have few more reserved words defined as part of
> this patch?
> So that later on those reserved can be defined in future for additional
> features.
> This way for every new feature we dont need to bump structure size of
> ABI, not we need to define new set of ABI calls.
> Its hard to say how much more is sufficient, but was thinking of 8
> 32-bit words.
>

I don't see how increasing the size now would get you anything that changing the returned response_length field wouldn't. I'm not sure what you consider an ABI change. Doesn't adding new meaning to reserved fields count as a change? In any case, increasing the response length doesn't require adding new calls. The kernel code will agree to fill only the fields that fit in the buffer provided by the user-space caller.

Haggai
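The "fill only what fits" behaviour can be sketched as a small user-space simulation. It mirrors the `outlen` / `response_length` checks visible in the patch hunk, but `struct resp` and `fill_resp()` are hypothetical stand-ins written for this example, not kernel code. The new field is checked after every existing field, so callers with an older, shorter buffer still get exactly the bytes they expect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an extensible response struct. */
struct resp {
	uint32_t comp_mask;
	uint32_t response_length;
	uint64_t timestamp_mask;	/* existing extension */
	uint32_t csum_caps;		/* hypothetical new extension */
	uint32_t reserved;		/* keeps the size a multiple of 8 */
};

/*
 * Copy as many trailing fields as fit into a caller buffer of `outlen` bytes.
 * Returns the number of bytes written, or 0 if even the fixed header
 * (comp_mask + response_length) does not fit.
 */
static size_t fill_resp(void *out, size_t outlen, uint64_t ts_mask,
			uint32_t csum)
{
	struct resp r;

	memset(&r, 0, sizeof(r));
	r.response_length = offsetof(struct resp, timestamp_mask);
	if (outlen < r.response_length)
		return 0;

	/* Existing extension first, in struct order. */
	if (outlen < r.response_length + sizeof(r.timestamp_mask))
		goto end;
	r.timestamp_mask = ts_mask;
	r.response_length += sizeof(r.timestamp_mask);

	/* New extension last, together with its reserved pad word. */
	if (outlen < r.response_length + sizeof(r.csum_caps) + sizeof(r.reserved))
		goto end;
	r.csum_caps = csum;
	r.response_length += sizeof(r.csum_caps) + sizeof(r.reserved);

end:
	memcpy(out, &r, r.response_length);
	return r.response_length;
}
```

A caller built before `csum_caps` existed passes a 16-byte buffer and gets 16 bytes back; a newer caller passes the full struct and reads `response_length` to learn which fields are valid.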
On Thu, Aug 6, 2015 at 10:20 PM, Haggai Eran <haggaie@mellanox.com> wrote:
> On 08/06/2015 02:18 PM, Parav Pandit wrote:
>> On Thu, Aug 6, 2015 at 4:30 PM, Haggai Eran <haggaie@mellanox.com> wrote:
>>
>>     On Wednesday, August 5, 2015 8:16 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
>>     > On Wed, Aug 05, 2015 at 06:34:26PM +0300, Amir Vadai wrote:
>>     >>  struct ib_uverbs_ex_query_device {
>>     >>  	__u32 comp_mask;
>>     >> +	__u32 csum_caps;
>>     >>  	__u32 reserved;
>>     >>  };
>>     >
>>     > Uh no.
>>     This is the struct of the command, not the response. There is no
>>     need to extend it. The command is designed to always return as much
>>     information as possible, so the user space code doesn't need to pass
>>     anything for it to work.
>>
>>     Even if you did want to extend it, you would need to replace the
>>     reserved word. The structs in this header file must be made in such
>>     way that they have the same size on 32-bit systems and on 64-bit
>>     systems (see the comment at the beginning of the header file). This
>>     is why the reserved word is there.
>>
>>     >
>>     >> @@ -221,6 +222,7 @@ struct ib_uverbs_odp_caps {
>>     >>  struct ib_uverbs_ex_query_device_resp {
>>     >>  	struct ib_uverbs_query_device_resp base;
>>     >>  	__u32 comp_mask;
>>     >> +	__u32 csum_caps;
>>     >>  	__u32 response_length;
>>     >>  	struct ib_uverbs_odp_caps odp_caps;
>>     >>  	__u64 timestamp_mask;
>>     >
>>     > Also totally wrong.
>>
>>     The response struct must maintain backward compatibility. You cannot
>>     change the order of the existing fields. The only valid way of
>>     extending it is at the end. Here too, you must make sure that the
>>     struct has the same size on 32-bit systems, so you would need to add
>>     a 32-bit reserved word at the end.
>>
>>     Haggai
>>
>> As struct ib_uverbs_ex_query_device_resp captures extended capabilities,
>> does it make sense to have few more reserved words defined as part of
>> this patch?
>> So that later on those reserved can be defined in future for additional
>> features.
>> This way for every new feature we dont need to bump structure size of
>> ABI, not we need to define new set of ABI calls.
>> Its hard to say how much more is sufficient, but was thinking of 8
>> 32-bit words.
>>
>
> I don't see how increasing the size now would get you anything that
> changing the returned response_length field wouldn't.

It won't. Eventually this code will have a switch-case for the various response lengths to support backward compatibility. I was trying to avoid adding such a switch-case. Instead, based on the supported kernel version, it will fill in the information.

> I'm not sure what
> you consider an ABI change. Doesn't adding new meaning to reserved
> fields count as a change? In any case, increasing the response length
> doesn't require adding new calls.

Right, it doesn't. I don't see an issue with increasing the response length; it solves the problem. I was considering a solution where we don't have to keep doing that.

> The kernel code will agree to fill only the fields that fit in the buffer provided by the user-space caller.
>
> Haggai
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index bbb02ff..c3d7c54 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3435,6 +3435,12 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	copy_query_dev_fields(file, &resp.base, &attr);
 	resp.comp_mask = 0;
 
+	if (ucore->outlen < resp.response_length + sizeof(resp.csum_caps))
+		goto end;
+
+	resp.csum_caps = attr.csum_caps;
+	resp.response_length += sizeof(resp.csum_caps);
+
 	if (ucore->outlen < resp.response_length + sizeof(resp.odp_caps))
 		goto end;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0940051..582483e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -190,6 +190,13 @@ struct ib_cq_init_attr {
 	u32 flags;
 };
 
+enum ib_csum_cap_flags {
+	IB_CSUM_RX_TCP_UDP = 1 << 0,
+	IB_CSUM_RX_IP_HDR = 1 << 1,
+	IB_CSUM_TX_TCP_UDP = 1 << 2,
+	IB_CSUM_TX_IP_HDR = 1 << 3
+};
+
 struct ib_device_attr {
 	u64 fw_ver;
 	__be64 sys_image_guid;
@@ -236,6 +243,7 @@ struct ib_device_attr {
 	struct ib_odp_caps odp_caps;
 	uint64_t timestamp_mask;
 	uint64_t hca_core_clock; /* in KHZ */
+	u32 csum_caps;
 };
 
 enum ib_mtu {
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 978841e..5eb7de1 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -205,6 +205,7 @@ struct ib_uverbs_query_device_resp {
 
 struct ib_uverbs_ex_query_device {
 	__u32 comp_mask;
+	__u32 csum_caps;
 	__u32 reserved;
 };
 
@@ -221,6 +222,7 @@ struct ib_uverbs_odp_caps {
 struct ib_uverbs_ex_query_device_resp {
 	struct ib_uverbs_query_device_resp base;
 	__u32 comp_mask;
+	__u32 csum_caps;
 	__u32 response_length;
 	struct ib_uverbs_odp_caps odp_caps;
 	__u64 timestamp_mask;