Message ID | 24ceb1fc5b2b6563532e5776b1b2320b1ae37543.1422553023.git.ydroneaud@opteya.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote: > As specified in "Extending Verbs API" presentation [1] by Tzahi Oved > during OFA International Developer Workshop 2013, the request's > comp_mask should describe the request data: it's describe the > availability of extended fields in the request. > Conversely, the response's comp_mask should describe the presence > of extended fields in the response. Roland: I agree with Yann, these patches need to go in, or the ODP patches reverted. > #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING > - if (cmd.comp_mask & IB_USER_VERBS_EX_QUERY_DEVICE_ODP) { Absolutely, a query output should never depend on the input comp_mask. > - resp.odp_caps.general_caps = attr.odp_caps.general_caps; > - resp.odp_caps.per_transport_caps.rc_odp_caps = > - attr.odp_caps.per_transport_caps.rc_odp_caps; > - resp.odp_caps.per_transport_caps.uc_odp_caps = > - attr.odp_caps.per_transport_caps.uc_odp_caps; > - resp.odp_caps.per_transport_caps.ud_odp_caps = > - attr.odp_caps.per_transport_caps.ud_odp_caps; > - resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; > - } > + resp.odp_caps.general_caps = attr.odp_caps.general_caps; > + resp.odp_caps.per_transport_caps.rc_odp_caps = > + attr.odp_caps.per_transport_caps.rc_odp_caps; > + resp.odp_caps.per_transport_caps.uc_odp_caps = > + attr.odp_caps.per_transport_caps.uc_odp_caps; > + resp.odp_caps.per_transport_caps.ud_odp_caps = > + attr.odp_caps.per_transport_caps.ud_odp_caps; > #endif > + resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; Not sure about this - if 0 is a valid null answer for all the _caps then it is fine, and the comp_mask bit should just be removed as the size alone should be enough. This looks wrong however: > err = ib_copy_to_udata(ucore, &resp, sizeof(resp)); > if (err) > return err; > return 0; Why isn't this returning the filled structure size to userspace? This would seem to be very urgently wrong to me? Am I missing something? Patch 3 probably should gain: - return 0; + return resp_len; This patch looks like an improvement to me: Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Le jeudi 29 janvier 2015 à 11:28 -0700, Jason Gunthorpe a écrit : > On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote: > > As specified in "Extending Verbs API" presentation [1] by Tzahi Oved > > during OFA International Developer Workshop 2013, the request's > > comp_mask should describe the request data: it's describe the > > availability of extended fields in the request. > > Conversely, the response's comp_mask should describe the presence > > of extended fields in the response. > > Roland: I agree with Yann, these patches need to go in, or the ODP > patches reverted. > > > #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING > > - if (cmd.comp_mask & IB_USER_VERBS_EX_QUERY_DEVICE_ODP) { > > Absolutely, a query output should never depend on the input comp_mask. > We should keep this in mind then. > > - resp.odp_caps.general_caps = attr.odp_caps.general_caps; > > - resp.odp_caps.per_transport_caps.rc_odp_caps = > > - attr.odp_caps.per_transport_caps.rc_odp_caps; > > - resp.odp_caps.per_transport_caps.uc_odp_caps = > > - attr.odp_caps.per_transport_caps.uc_odp_caps; > > - resp.odp_caps.per_transport_caps.ud_odp_caps = > > - attr.odp_caps.per_transport_caps.ud_odp_caps; > > - resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; > > - } > > + resp.odp_caps.general_caps = attr.odp_caps.general_caps; > > + resp.odp_caps.per_transport_caps.rc_odp_caps = > > + attr.odp_caps.per_transport_caps.rc_odp_caps; > > + resp.odp_caps.per_transport_caps.uc_odp_caps = > > + attr.odp_caps.per_transport_caps.uc_odp_caps; > > + resp.odp_caps.per_transport_caps.ud_odp_caps = > > + attr.odp_caps.per_transport_caps.ud_odp_caps; > > #endif > > + resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; > > Not sure about this - if 0 is a valid null answer for all the _caps > then it is fine, and the comp_mask bit should just be removed as the > size alone should be enough. > That's difficult to say. But I hope 0 are valids answers in this case. Anyway, the response's comp_mask is needed, as it's the sole way for the userspace to know the size of the response. See below. > This looks wrong however: > > > err = ib_copy_to_udata(ucore, &resp, sizeof(resp)); > > if (err) > > return err; > > return 0; > > Why isn't this returning the filled structure size to userspace? This > would seem to be very urgently wrong to me? Am I missing something? > > Patch 3 probably should gain: > > - return 0; > + return resp_len; > The write() syscall must return the size buffer passed to it, or less, but in such case it would ask for trouble as userspace would be allowed to write() the remaining bytes. Returning a size bigger than the one passed to write() is not acceptable and would break any expectation. > This patch looks like an improvement to me: > > Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> > Thanks a lot. Regards.
On Thu, Jan 29, 2015 at 07:43:29PM +0100, Yann Droneaud wrote: > The write() syscall must return the size buffer passed to it, or > less, but in such case it would ask for trouble as userspace would > be allowed to write() the remaining bytes. Returning a size bigger > than the one passed to write() is not acceptable and would break any > expectation. By that logic the 0 return is still wrong, and it should be ucore->in_len But I think we can return less without risking anything breaking, it already violates the invariant associated with write() - it mutates the buffer passed in! Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 29 janvier 2015 à 19:43 +0100, Yann Droneaud a écrit : > Le jeudi 29 janvier 2015 à 11:28 -0700, Jason Gunthorpe a écrit : > > On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote: > > > - resp.odp_caps.general_caps = attr.odp_caps.general_caps; > > > - resp.odp_caps.per_transport_caps.rc_odp_caps = > > > - attr.odp_caps.per_transport_caps.rc_odp_caps; > > > - resp.odp_caps.per_transport_caps.uc_odp_caps = > > > - attr.odp_caps.per_transport_caps.uc_odp_caps; > > > - resp.odp_caps.per_transport_caps.ud_odp_caps = > > > - attr.odp_caps.per_transport_caps.ud_odp_caps; > > > - resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; > > > - } > > > + resp.odp_caps.general_caps = attr.odp_caps.general_caps; > > > + resp.odp_caps.per_transport_caps.rc_odp_caps = > > > + attr.odp_caps.per_transport_caps.rc_odp_caps; > > > + resp.odp_caps.per_transport_caps.uc_odp_caps = > > > + attr.odp_caps.per_transport_caps.uc_odp_caps; > > > + resp.odp_caps.per_transport_caps.ud_odp_caps = > > > + attr.odp_caps.per_transport_caps.ud_odp_caps; > > > #endif > > > + resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; > > > > Not sure about this - if 0 is a valid null answer for all the _caps > > then it is fine, and the comp_mask bit should just be removed as the > > size alone should be enough. > > > > That's difficult to say. But I hope 0 are valids answers in this case. > Hum, according to include/rdma/ib_verbs.h, there's a bit set for ODP support: 148 enum ib_odp_general_cap_bits { 149 IB_ODP_SUPPORT = 1 << 0, 150 }; So it should be safe to set everything to 0 in struct ib_uverbs_odp_caps. Regards.
Hi, Le jeudi 29 janvier 2015 à 12:18 -0700, Jason Gunthorpe a écrit : > On Thu, Jan 29, 2015 at 07:43:29PM +0100, Yann Droneaud wrote: > > > The write() syscall must return the size buffer passed to it, or > > less, but in such case it would ask for trouble as userspace would > > be allowed to write() the remaining bytes. Returning a size bigger > > than the one passed to write() is not acceptable and would break any > > expectation. > > By that logic the 0 return is still wrong, and it should be ucore->in_len > This is handled by ib_uverbs_write() in drivers/infiniband/core/uverbs_main.c: 709 if (err) 710 return err; 711 712 return written_count; > But I think we can return less without risking anything breaking, it > already violates the invariant associated with write() - it mutates > the buffer passed in! > I don't think so, as only the response buffer is written to and the response buffer pointer is provided in the buffer given to write(). AFAIK, no uverbs ever change the content of the input buffer (eg. the request): I've managed to declare the various input buffers "const" so it would surprising to find it use for writing to userspace. Anyway, I recognize that uverb way of abusing write() syscall is borderline (at best) regarding other Linux subsystems and Unix paradigm in general. But it's not enough to screw it more. Regards.
On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote: > Anyway, I recognize that uverb way of abusing write() syscall is > borderline (at best) regarding other Linux subsystems and Unix paradigm > in general. But it's not enough to screw it more. Then we must return the correct output size explicitly in the struct. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 29/01/2015 20:28, Jason Gunthorpe wrote: > On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote: >> > - resp.odp_caps.general_caps = attr.odp_caps.general_caps; >> > - resp.odp_caps.per_transport_caps.rc_odp_caps = >> > - attr.odp_caps.per_transport_caps.rc_odp_caps; >> > - resp.odp_caps.per_transport_caps.uc_odp_caps = >> > - attr.odp_caps.per_transport_caps.uc_odp_caps; >> > - resp.odp_caps.per_transport_caps.ud_odp_caps = >> > - attr.odp_caps.per_transport_caps.ud_odp_caps; >> > - resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; >> > - } >> > + resp.odp_caps.general_caps = attr.odp_caps.general_caps; >> > + resp.odp_caps.per_transport_caps.rc_odp_caps = >> > + attr.odp_caps.per_transport_caps.rc_odp_caps; >> > + resp.odp_caps.per_transport_caps.uc_odp_caps = >> > + attr.odp_caps.per_transport_caps.uc_odp_caps; >> > + resp.odp_caps.per_transport_caps.ud_odp_caps = >> > + attr.odp_caps.per_transport_caps.ud_odp_caps; >> > #endif >> > + resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; > Not sure about this - if 0 is a valid null answer for all the _caps > then it is fine, and the comp_mask bit should just be removed as the > size alone should be enough. Zero is indeed a valid answer. There the IB_ODP_SUPPORT bit in the general_caps field that says whether or not ODP is supported in general. The per transport capabilities are also default to not supported. However, I think we should keep the comp_mask field for future extensions. The current code doesn't report the size of the response to user space, and in addition, comp_mask being a bit mask has the advantage of allowing only part of the structure to be marked valid. Regards, Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote: > > > Anyway, I recognize that uverb way of abusing write() syscall is > > borderline (at best) regarding other Linux subsystems and Unix > > paradigm in general. But it's not enough to screw it more. > > Then we must return the correct output size explicitly in the struct. I was thinking this very same thing as I read through this thread. I too would like to avoid the use of comp_masks if at all possible. The query call seems to be a verb where the structure size is all you really need to know the set of values returned. As Jason says, other calls may require the comp_mask where 0 is not sufficient to indicate "missing". Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/02/2015 04:54, Weiny, Ira wrote: >> >> On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote: >> >>> Anyway, I recognize that uverb way of abusing write() syscall is >>> borderline (at best) regarding other Linux subsystems and Unix >>> paradigm in general. But it's not enough to screw it more. >> >> Then we must return the correct output size explicitly in the struct. > > I was thinking this very same thing as I read through this thread. > > I too would like to avoid the use of comp_masks if at all possible. The query call seems to be a verb where the structure size is all you really need to know the set of values returned. > > As Jason says, other calls may require the comp_mask where 0 is not sufficient to indicate "missing". Would it be okay to return it in the ib_uverbs_cmd_hdr.out_words? That would further abuse the write() syscall by writing to the input buffer. However, the only other alternative I see is to add it explicitly to every uverb response struct. Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > On 05/02/2015 04:54, Weiny, Ira wrote: > >> > >> On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote: > >> > >>> Anyway, I recognize that uverb way of abusing write() syscall is > >>> borderline (at best) regarding other Linux subsystems and Unix > >>> paradigm in general. But it's not enough to screw it more. > >> > >> Then we must return the correct output size explicitly in the struct. > > > > I was thinking this very same thing as I read through this thread. > > > > I too would like to avoid the use of comp_masks if at all possible. The query > call seems to be a verb where the structure size is all you really need to know > the set of values returned. > > > > As Jason says, other calls may require the comp_mask where 0 is not > sufficient to indicate "missing". > > Would it be okay to return it in the ib_uverbs_cmd_hdr.out_words? That would > further abuse the write() syscall by writing to the input buffer. I don't think that is such a great idea. > However, the only other alternative I see is to add it explicitly to every uverb > response struct. > I think this is the best solution. There is a 32 bit reserved field in ib_uverbs_ex_query_device_resp. Could we use all or part of that to be the size? For other extended commands I'm not sure what to do. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/02/2015 02:52, Weiny, Ira wrote: >> >> On 05/02/2015 04:54, Weiny, Ira wrote: >>>> >>>> On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote: >>>> >>>>> Anyway, I recognize that uverb way of abusing write() syscall is >>>>> borderline (at best) regarding other Linux subsystems and Unix >>>>> paradigm in general. But it's not enough to screw it more. >>>> >>>> Then we must return the correct output size explicitly in the struct. >>> >>> I was thinking this very same thing as I read through this thread. >>> >>> I too would like to avoid the use of comp_masks if at all possible. The query >> call seems to be a verb where the structure size is all you really need to know >> the set of values returned. >>> >>> As Jason says, other calls may require the comp_mask where 0 is not >> sufficient to indicate "missing". >> ... > >> However, the only other alternative I see is to add it explicitly to every uverb >> response struct. >> > > I think this is the best solution. There is a 32 bit reserved field in ib_uverbs_ex_query_device_resp. Could we use all or part of that to be the size? Yes, I think 32-bit for the response length are more than enough. I will send patches for 3.20 to re-introduce ib_uverbs_ex_query_device with the response size instead of the reserved field, and with Yann's changes. Thanks, Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 532d8eba8b02..6ef06a9b4362 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -3324,17 +3324,15 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file, resp.comp_mask = 0; #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING - if (cmd.comp_mask & IB_USER_VERBS_EX_QUERY_DEVICE_ODP) { - resp.odp_caps.general_caps = attr.odp_caps.general_caps; - resp.odp_caps.per_transport_caps.rc_odp_caps = - attr.odp_caps.per_transport_caps.rc_odp_caps; - resp.odp_caps.per_transport_caps.uc_odp_caps = - attr.odp_caps.per_transport_caps.uc_odp_caps; - resp.odp_caps.per_transport_caps.ud_odp_caps = - attr.odp_caps.per_transport_caps.ud_odp_caps; - resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; - } + resp.odp_caps.general_caps = attr.odp_caps.general_caps; + resp.odp_caps.per_transport_caps.rc_odp_caps = + attr.odp_caps.per_transport_caps.rc_odp_caps; + resp.odp_caps.per_transport_caps.uc_odp_caps = + attr.odp_caps.per_transport_caps.uc_odp_caps; + resp.odp_caps.per_transport_caps.ud_odp_caps = + attr.odp_caps.per_transport_caps.ud_odp_caps; #endif + resp.comp_mask |= IB_USER_VERBS_EX_QUERY_DEVICE_ODP; err = ib_copy_to_udata(ucore, &resp, sizeof(resp)); if (err)
As specified in "Extending Verbs API" presentation [1] by Tzahi Oved during OFA International Developer Workshop 2013, the request's comp_mask should describe the request data: it's describe the availability of extended fields in the request. Conversely, the response's comp_mask should describe the presence of extended fields in the response. So this patch changes ib_uverbs_ex_query_device() function to always return the maximum supported features, currently only IB_USER_VERBS_EX_QUERY_DEVICE_ODP per commit 860f10a799c8 ("IB/core: Add flags for on demand paging support"), regardless of the value of request's comp_mask. [1] https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf Link: http://mid.gmane.org/cover.1422553023.git.ydroneaud@opteya.com Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Eli Cohen <eli@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- drivers/infiniband/core/uverbs_cmd.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)