Message ID | 1422568741.3133.247.camel@opteya.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Thu, Jan 29, 2015 at 1:59 PM, Yann Droneaud <ydroneaud@opteya.com> wrote: >> Roland: I agree with Yann, these patches need to go in, or the ODP >> patches reverted. > Reverting all On Demand Paging patches seems overkill: > if something as to be reverted it should be commit 5a77abf9a97a > ("IB/core: Add support for extended query device caps") and the part of > commit 860f10a799c8 ("IB/core: Add flags for on demand paging support") > which modify ib_uverbs_ex_query_device(). Thank you and Jason for taking on this interface. At this point I feel like I do about the IPoIB changes -- we should revert the broken stuff and get it right for 3.20. If we revert the two things you describe above, is everything else OK to leave in 3.19 with respect to ABI? Thanks, Roland -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Le jeudi 29 janvier 2015 à 15:17 -0800, Roland Dreier a écrit : > On Thu, Jan 29, 2015 at 1:59 PM, Yann Droneaud <ydroneaud@opteya.com> wrote: > >> Roland: I agree with Yann, these patches need to go in, or the ODP > >> patches reverted. > > > Reverting all On Demand Paging patches seems overkill: > > if something as to be reverted it should be commit 5a77abf9a97a > > ("IB/core: Add support for extended query device caps") and the part of > > commit 860f10a799c8 ("IB/core: Add flags for on demand paging support") > > which modify ib_uverbs_ex_query_device(). > > Thank you and Jason for taking on this interface. > > At this point I feel like I do about the IPoIB changes -- we should > revert the broken stuff and get it right for 3.20. > > If we revert the two things you describe above, is everything else OK > to leave in 3.19 with respect to ABI? > I've tried to review every changes since v3.18 on drivers/infiniband include/rdma and include/uapi/rdma with respect to ABI issues. I've noticed no other issue, but I have to admit I've not well reviewed the drivers (hw/) internal changes. If the IB_USER_VERBS_EX_CMD_QUERY_DEVICE and ib_uverbs_ex_query_device changes are going to be reverted for v3.19, the on-demand-paging feature will be available (IB_DEVICE_ON_DEMAND_PAGING will be set device_cap_flags in response to non extended QUERY_DEVICE for mlx5 HCA and IB_ACCESS_ON_DEMAND access flag will be effective for REG_MR uverbs), but its parameters won't be. I don't know if it's a no-go for the usage of on-demand paging by userspace: I have not the chance of owning HCA with the support for this feature, nor the patches libibverbs / libmlx5 ... (anyway I would not have the time to test). I've hoped people from Mellanox would have commented on the revert option too. Regards.
On Fri, Jan 30, 2015, Yann Droneaud <ydroneaud@opteya.com> wrote: > [..] I have not the chance > of owning HCA with the support for this feature, nor the patches > libibverbs / libmlx5 ... (anyway I would not have the time to test). [...] Yann, have you ever run Linux over RDMA capable HW (IB, RoCE, iWARP or alike?) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 29/01/2015 23:59, Yann Droneaud wrote: > But I wonder about this part of commit 860f10a799c8: > > diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c > index c7a43624c96b..f9326ccda4b5 100644 > --- a/drivers/infiniband/core/uverbs_cmd.c > +++ b/drivers/infiniband/core/uverbs_cmd.c > @@ -953,6 +953,18 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file, > goto err_free; > } > > + if (cmd.access_flags & IB_ACCESS_ON_DEMAND) { > + struct ib_device_attr attr; > + > + ret = ib_query_device(pd->device, &attr); > + if (ret || !(attr.device_cap_flags & > + IB_DEVICE_ON_DEMAND_PAGING)) { > + pr_debug("ODP support not available\n"); > + ret = -EINVAL; > + goto err_put; > + } > + } > + > > AFAICT (1 << 6) bit from struct ib_uverbs_reg_mr access_flags field > was not enforced to be 0 previously, as ib_check_mr_access() only check > for some coherency between a subset of the bits (it's not a function > dedicated to check flags provided by userspace): > > include/rdma/ib_verbs.h: > > 1094 enum ib_access_flags { > 1095 IB_ACCESS_LOCAL_WRITE = 1, > 1096 IB_ACCESS_REMOTE_WRITE = (1<<1), > 1097 IB_ACCESS_REMOTE_READ = (1<<2), > 1098 IB_ACCESS_REMOTE_ATOMIC = (1<<3), > 1099 IB_ACCESS_MW_BIND = (1<<4), > 1100 IB_ZERO_BASED = (1<<5), > 1101 IB_ACCESS_ON_DEMAND = (1<<6), > 1102 }; > > drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr() > > 961 ret = ib_check_mr_access(cmd.access_flags); > 962 if (ret) > 963 return ret; > > include/rdma/ib_verbs.h: > > 2643 static inline int ib_check_mr_access(int flags) > 2644 { > 2645 /* > 2646 * Local write permission is required if remote write or > 2647 * remote atomic permission is also requested. > 2648 */ > 2649 if (flags & (IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_REMOTE_WRITE) && > 2650 !(flags & IB_ACCESS_LOCAL_WRITE)) > 2651 return -EINVAL; > 2652 > 2653 return 0; > 2654 } > > drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr() > > 990 mr = pd->device->reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va, > 991 cmd.access_flags, &udata); > > reg_user_mr() functions may call ib_umem_get() and pass access_flags to > it: > > drivers/infiniband/core/umem.c: ib_umem_get() > > 114 /* > 115 * We ask for writable memory if any of the following > 116 * access flags are set. "Local write" and "remote write" > 117 * obviously require write access. "Remote atomic" can do > 118 * things like fetch and add, which will modify memory, and > 119 * "MW bind" can change permissions by binding a window. > 120 */ > 121 umem->writable = !!(access & > 122 (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE | > 123 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND)); > 124 > 125 if (access & IB_ACCESS_ON_DEMAND) { > 126 ret = ib_umem_odp_get(context, umem); > 127 if (ret) { > 128 kfree(umem); > 129 return ERR_PTR(ret); > 130 } > 131 return umem; > 132 } > > > As you can see only a few bits in access_flags are checked in the end, > so it may exist a very unlikely possibility that an existing userspace > program is setting this IB_ACCESS_ON_DEMAND bit without the intention > of enabling on demand paging as this would be unnoticed by older kernel. > > In the other hand, a newer program built with on-demand-paging in mind > will set the bit, but when run on older kernel, it will be a no-op, > allowing the program to continue, perhaps thinking on-demand-paging > is available. This is why we added a method for userspace to check the kernel capabilities both when adding the memory windows support and with ODP. User-space can still send garbage to the kernel, but if it does so without checking the kernel supports its request, it's user-space's problem. > That should be avoided as much as possible. > > Unfortunately, I think this cannot be fixed as it's was long since > IB_ZERO_BASED was added by commit 7083e42ee2 ("IB/core: Add "type 2" > memory windows support"). > Anyway there was no check for IB_ACCESS_REMOTE_READ, nor > IB_ACCESS_MW_BIND in the uverb layer either. I think it would be perfectly fine to add a check today that validates the MR access flags input is known. Because the newer feature require user-space to check capabilities, I don't think it would matter if we returned an error on newer kernels that do not support these features. Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 30/01/2015 19:26, Yann Droneaud wrote: > Hi, > > Le jeudi 29 janvier 2015 à 15:17 -0800, Roland Dreier a écrit : >> On Thu, Jan 29, 2015 at 1:59 PM, Yann Droneaud <ydroneaud@opteya.com> wrote: >>>> Roland: I agree with Yann, these patches need to go in, or the ODP >>>> patches reverted. >> >>> Reverting all On Demand Paging patches seems overkill: >>> if something as to be reverted it should be commit 5a77abf9a97a >>> ("IB/core: Add support for extended query device caps") and the part of >>> commit 860f10a799c8 ("IB/core: Add flags for on demand paging support") >>> which modify ib_uverbs_ex_query_device(). >> >> Thank you and Jason for taking on this interface. >> >> At this point I feel like I do about the IPoIB changes -- we should >> revert the broken stuff and get it right for 3.20. >> >> If we revert the two things you describe above, is everything else OK >> to leave in 3.19 with respect to ABI? >> > > I've tried to review every changes since v3.18 on drivers/infiniband > include/rdma and include/uapi/rdma with respect to ABI issues. > > I've noticed no other issue, but I have to admit I've not well reviewed > the drivers (hw/) internal changes. > > If the IB_USER_VERBS_EX_CMD_QUERY_DEVICE and ib_uverbs_ex_query_device > changes are going to be reverted for v3.19, the on-demand-paging > feature will be available (IB_DEVICE_ON_DEMAND_PAGING will be set > device_cap_flags in response to non extended QUERY_DEVICE for mlx5 HCA > and IB_ACCESS_ON_DEMAND access flag will be effective for REG_MR > uverbs), but its parameters won't be. For user-space to make use of on demand paging, it should verify the specific transport and operation is supported. If they don't, they will encounter errors when a page fault occurs. > I don't know if it's a no-go for > the usage of on-demand paging by userspace: I have not the chance > of owning HCA with the support for this feature, nor the patches > libibverbs / libmlx5 ... (anyway I would not have the time to test). > I've hoped people from Mellanox would have commented on the revert > option too. I would prefer it if Yann's patches are accepted. I understand it is very late, but they are quite short, and I think they provide the right semantic for this new verb. As a second option, in case you prefer to revert the extended query device patch, I will send a patch shortly to do that. Regards, Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Or, Le samedi 31 janvier 2015 à 22:09 +0200, Or Gerlitz a écrit : > On Fri, Jan 30, 2015, Yann Droneaud <ydroneaud@opteya.com> wrote: > > [..] I have not the chance > > of owning HCA with the support for this feature, nor the patches > > libibverbs / libmlx5 ... (anyway I would not have the time to test). [...] > > Yann, have you ever run Linux over RDMA capable HW (IB, RoCE, iWARP or alike?) That was a silly question. So I've took a little time to consider answering it, perhaps you were gently kidding me. Anyway, please take the following links as an answer to your question: "rdma_create_qp() and max_send_wr" <http://mid.gmane.org/1303404264.2243.19.camel@deela.quest-ce.net> "ibv_create_qp() and max_inline_data behavior" <http://mid.gmane.org/d7685e13978f93890b2939c5ac2d5c7f@meuh.org> "[PATCH librdmacm 0/3] no IBV_SEND_INLINE thus memory registration required on QLogic/Intel HCA" <http://mid.gmane.org/cover.1376746185.git.ydroneaud@opteya.com> "[PATCH 00/22] infiniband: improve userspace input check" <http://mid.gmane.org/cover.1376847403.git.ydroneaud@opteya.com> I hope this would be enough to join the gang^Wclub. Where're the by-laws ? Regards.
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index c7a43624c96b..f9326ccda4b5 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -953,6 +953,18 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file, goto err_free; } + if (cmd.access_flags & IB_ACCESS_ON_DEMAND) { + struct ib_device_attr attr; + + ret = ib_query_device(pd->device, &attr); + if (ret || !(attr.device_cap_flags & + IB_DEVICE_ON_DEMAND_PAGING)) { + pr_debug("ODP support not available\n"); + ret = -EINVAL; + goto err_put; + } + } + AFAICT (1 << 6) bit from struct ib_uverbs_reg_mr access_flags field was not enforced to be 0 previously, as ib_check_mr_access() only check