Message ID | 20190117191128.19140-1-shamir.rabinovitch@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Series | IB/{hw,sw}: remove 'uobject->context' dependency APIs |
>Subject: [PATCH for-next 1/1] IB/{hw,sw}: remove 'uobject->context' dependency in >object creation APIs > >Now when we have the udata passed to all the ib_xxx object creation APIs and the >additional function 'rdma_get_ucontext' to get the ib_ucontext from ib_udata stored >in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers >in the ib_xxx->uobject->context. > >Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> >--- > drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++-- > drivers/infiniband/hw/cxgb3/iwch_provider.c | 4 +- > drivers/infiniband/hw/cxgb4/qp.c | 8 +- > drivers/infiniband/hw/hns/hns_roce_qp.c | 15 +++- > drivers/infiniband/hw/i40iw/i40iw_verbs.c | 15 +++- > drivers/infiniband/hw/mlx4/doorbell.c | 6 ++ > drivers/infiniband/hw/mlx4/mr.c | 9 +- > drivers/infiniband/hw/mlx4/qp.c | 92 +++++++++++++------- > drivers/infiniband/hw/mlx4/srq.c | 13 ++- > drivers/infiniband/hw/mlx5/qp.c | 68 +++++++++++---- > drivers/infiniband/hw/mlx5/srq.c | 13 ++- > drivers/infiniband/hw/mthca/mthca_provider.c | 28 ++++-- > drivers/infiniband/hw/mthca/mthca_qp.c | 19 ++-- > drivers/infiniband/hw/mthca/mthca_srq.c | 25 ++++-- > drivers/infiniband/hw/nes/nes_verbs.c | 13 ++- > drivers/infiniband/hw/qedr/verbs.c | 8 +- > drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 7 +- > drivers/infiniband/sw/rdmavt/qp.c | 10 ++- > drivers/infiniband/sw/rdmavt/srq.c | 10 ++- > drivers/infiniband/sw/rxe/rxe_qp.c | 5 +- > drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +- > 21 files changed, 287 insertions(+), 119 deletions(-) > [....] >diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c >b/drivers/infiniband/hw/i40iw/i40iw_verbs.c >index 12b31a8440be..194cd911c9de 100644 >--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c >+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c >@@ -580,11 +580,16 @@ static struct ib_qp *i40iw_create_qp(struct ib_pd *ibpd, > struct i40iw_create_qp_info *qp_info; > struct i40iw_cqp_request *cqp_request; > struct cqp_commands_info *cqp_info; >+ struct ib_ucontext *ib_ucontext; > > struct i40iw_qp_host_ctx_info *ctx_info; > struct i40iwarp_offload_info *iwarp_info; > unsigned long flags; > >+ ib_ucontext = rdma_get_ucontext(udata); >+ if (udata && IS_ERR(ib_ucontext)) >+ return ERR_CAST(ib_ucontext); >+ > if (iwdev->closing) > return ERR_PTR(-ENODEV); > >@@ -674,7 +679,7 @@ static struct ib_qp *i40iw_create_qp(struct ib_pd *ibpd, > } > iwqp->ctx_info.qp_compl_ctx = req.user_compl_ctx; > iwqp->user_mode = 1; >- ucontext = to_ucontext(ibpd->uobject->context); >+ ucontext = to_ucontext(ib_ucontext); > > if (req.user_wqe_buffers) { > struct i40iw_pbl *iwpbl; >@@ -1832,6 +1837,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, > struct i40iw_pd *iwpd = to_iwpd(pd); > struct i40iw_device *iwdev = to_iwdev(pd->device); > struct i40iw_ucontext *ucontext; >+ struct ib_ucontext *ib_ucontext; > struct i40iw_pble_alloc *palloc; > struct i40iw_pbl *iwpbl; > struct i40iw_mr *iwmr; >@@ -1847,6 +1853,12 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, > int ret; > int pg_shift; > >+ ib_ucontext = rdma_get_ucontext(udata); >+ if (IS_ERR(ib_ucontext)) >+ return ERR_CAST(ib_ucontext); >+ >+ ucontext = to_ucontext(ib_ucontext); >+ > if (iwdev->closing) > return ERR_PTR(-ENODEV); > >@@ -1872,7 +1884,6 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, > iwmr->region = region; > iwmr->ibmr.pd = pd; > iwmr->ibmr.device = pd->device; >- ucontext = to_ucontext(pd->uobject->context); > > iwmr->page_size = PAGE_SIZE; > iwmr->page_msk = PAGE_MASK; QP. 
> */ > if (new_state == IB_QPS_RESET) { >- if (!ibuobject) { >+ if (!udata) { > mlx4_ib_cq_clean(recv_cq, qp->mqp.qpn, > ibsrq ? to_msrq(ibsrq) : NULL); > if (send_cq != recv_cq) I think you missed one in i40iw_dereg_mr? https://elixir.bootlin.com/linux/v5.0-rc2/source/drivers/infiniband/hw/i40iw/i40iw_verbs.c#L2097 Shiraz
On 1/17/2019 1:11 PM, Shamir Rabinovitch wrote: > Now when we have the udata passed to all the ib_xxx object creation APIs > and the additional function 'rdma_get_ucontext' to get the ib_ucontext > from ib_udata stored in uverbs_attr_bundle, we can finally start to remove > the dependency of the drivers in the ib_xxx->uobject->context. > > Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> > --- > drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++-- > drivers/infiniband/hw/cxgb3/iwch_provider.c | 4 +- > drivers/infiniband/hw/cxgb4/qp.c | 8 +- > drivers/infiniband/hw/hns/hns_roce_qp.c | 15 +++- > drivers/infiniband/hw/i40iw/i40iw_verbs.c | 15 +++- > drivers/infiniband/hw/mlx4/doorbell.c | 6 ++ > drivers/infiniband/hw/mlx4/mr.c | 9 +- > drivers/infiniband/hw/mlx4/qp.c | 92 +++++++++++++------- > drivers/infiniband/hw/mlx4/srq.c | 13 ++- > drivers/infiniband/hw/mlx5/qp.c | 68 +++++++++++---- > drivers/infiniband/hw/mlx5/srq.c | 13 ++- > drivers/infiniband/hw/mthca/mthca_provider.c | 28 ++++-- > drivers/infiniband/hw/mthca/mthca_qp.c | 19 ++-- > drivers/infiniband/hw/mthca/mthca_srq.c | 25 ++++-- > drivers/infiniband/hw/nes/nes_verbs.c | 13 ++- > drivers/infiniband/hw/qedr/verbs.c | 8 +- > drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 7 +- > drivers/infiniband/sw/rdmavt/qp.c | 10 ++- > drivers/infiniband/sw/rdmavt/srq.c | 10 ++- > drivers/infiniband/sw/rxe/rxe_qp.c | 5 +- > drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +- > 21 files changed, 287 insertions(+), 119 deletions(-) ... > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c > index 07c20cd07f33..b692cd2bbe92 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c > @@ -796,6 +796,7 @@ static struct ib_qp *iwch_create_qp(struct ib_pd *pd, > struct iwch_cq *schp; > struct iwch_cq *rchp; > struct iwch_create_qp_resp uresp; > + struct ib_ucontext *ib_ucontext; > int wqsize, sqsize, rqsize; > struct iwch_ucontext *ucontext; > > @@ -836,7 +837,8 @@ static struct ib_qp *iwch_create_qp(struct ib_pd *pd, > * Kernel users need more wq space for fastreg WRs which can take > * 2 WR fragments. > */ > - ucontext = udata ? to_iwch_ucontext(pd->uobject->context) : NULL; > + ib_ucontext = rdma_get_ucontext(udata); > + ucontext = IS_ERR(ib_ucontext) ? NULL : to_iwch_ucontext(ib_ucontext); > if (!ucontext && wqsize < (rqsize + (2 * sqsize))) > wqsize = roundup_pow_of_two(rqsize + > roundup_pow_of_two(attrs->cap.max_send_wr * 2)); > diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c > index 03f4c66c2659..d2c452b4282c 100644 > --- a/drivers/infiniband/hw/cxgb4/qp.c > +++ b/drivers/infiniband/hw/cxgb4/qp.c > @@ -2137,6 +2137,7 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs, > struct c4iw_create_qp_resp uresp; > unsigned int sqsize, rqsize = 0; > struct c4iw_ucontext *ucontext; > + struct ib_ucontext *ib_ucontext; > int ret; > struct c4iw_mm_entry *sq_key_mm, *rq_key_mm = NULL, *sq_db_key_mm; > struct c4iw_mm_entry *rq_db_key_mm = NULL, *ma_sync_key_mm = NULL; > @@ -2170,7 +2171,8 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs, > if (sqsize < 8) > sqsize = 8; > > - ucontext = udata ? to_c4iw_ucontext(pd->uobject->context) : NULL; > + ib_ucontext = rdma_get_ucontext(udata); > + ucontext = IS_ERR(ib_ucontext) ? 
NULL : to_c4iw_ucontext(ib_ucontext); > > qhp = kzalloc(sizeof(*qhp), GFP_KERNEL); > if (!qhp) > @@ -2697,6 +2699,7 @@ struct ib_srq *c4iw_create_srq(struct ib_pd *pd, struct ib_srq_init_attr *attrs, > struct c4iw_create_srq_resp uresp; > struct c4iw_ucontext *ucontext; > struct c4iw_mm_entry *srq_key_mm, *srq_db_key_mm; > + struct ib_ucontext *ib_ucontext; > int rqsize; > int ret; > int wr_len; > @@ -2719,7 +2722,8 @@ struct ib_srq *c4iw_create_srq(struct ib_pd *pd, struct ib_srq_init_attr *attrs, > rqsize = attrs->attr.max_wr + 1; > rqsize = roundup_pow_of_two(max_t(u16, rqsize, 16)); > > - ucontext = udata ? to_c4iw_ucontext(pd->uobject->context) : NULL; > + ib_ucontext = rdma_get_ucontext(udata); > + ucontext = IS_ERR(ib_ucontext) ? NULL : to_c4iw_ucontext(ib_ucontext); > > srq = kzalloc(sizeof(*srq), GFP_KERNEL); > if (!srq) For cxgb*, looks ok. Reviewed-by: Steve Wise <swise@opengridcomputing.com>
On Thu, Jan 17, 2019 at 09:11:12PM +0200, Shamir Rabinovitch wrote: > Now when we have the udata passed to all the ib_xxx object creation APIs > and the additional function 'rdma_get_ucontext' to get the ib_ucontext > from ib_udata stored in uverbs_attr_bundle, we can finally start to remove > the dependency of the drivers in the ib_xxx->uobject->context. > > Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> > drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++-- > drivers/infiniband/hw/cxgb3/iwch_provider.c | 4 +- > drivers/infiniband/hw/cxgb4/qp.c | 8 +- > drivers/infiniband/hw/hns/hns_roce_qp.c | 15 +++- > drivers/infiniband/hw/i40iw/i40iw_verbs.c | 15 +++- > drivers/infiniband/hw/mlx4/doorbell.c | 6 ++ > drivers/infiniband/hw/mlx4/mr.c | 9 +- > drivers/infiniband/hw/mlx4/qp.c | 92 +++++++++++++------- > drivers/infiniband/hw/mlx4/srq.c | 13 ++- > drivers/infiniband/hw/mlx5/qp.c | 68 +++++++++++---- > drivers/infiniband/hw/mlx5/srq.c | 13 ++- > drivers/infiniband/hw/mthca/mthca_provider.c | 28 ++++-- > drivers/infiniband/hw/mthca/mthca_qp.c | 19 ++-- > drivers/infiniband/hw/mthca/mthca_srq.c | 25 ++++-- > drivers/infiniband/hw/nes/nes_verbs.c | 13 ++- > drivers/infiniband/hw/qedr/verbs.c | 8 +- > drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 7 +- > drivers/infiniband/sw/rdmavt/qp.c | 10 ++- > drivers/infiniband/sw/rdmavt/srq.c | 10 ++- > drivers/infiniband/sw/rxe/rxe_qp.c | 5 +- > drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +- > 21 files changed, 287 insertions(+), 119 deletions(-) > > diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c > index 9bc637e49faa..d811efe49e77 100644 > +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c > @@ -733,11 +733,16 @@ struct ib_ah *bnxt_re_create_ah(struct ib_pd *ib_pd, > > /* Write AVID to shared page. */ > if (udata) { > - struct ib_ucontext *ib_uctx = ib_pd->uobject->context; > + struct ib_ucontext *ib_uctx; > struct bnxt_re_ucontext *uctx; > unsigned long flag; > u32 *wrptr; > > + ib_uctx = rdma_get_ucontext(udata); > + if (IS_ERR(ib_uctx)) { > + rc = PTR_ERR(ib_uctx); > + goto fail; > + } Hurm.. I think if you are going to use this API so widely then we need to arrange for it to not fail. If udata is not null, and we are in a driver callback with an object, then the ucontext should be guaranteed available. Otherwise we have weird cases where udata and !rdma_get_ucontext() can be true, which is really horrible to reason about.. The way to do this is to have the core code places that have a uobject copy the ucontext from the uobject into the udata, I think I mentioned this before, and it is a fixme in the code.. 
For instance something like this would take care of things for the ioctl path: diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c index 8c81ff69805276..9e9240645fbe9b 100644 --- a/drivers/infiniband/core/uverbs_ioctl.c +++ b/drivers/infiniband/core/uverbs_ioctl.c @@ -198,6 +198,7 @@ static int uverbs_process_idrs_array(struct bundle_priv *pbundle, ret = PTR_ERR(attr->uobjects[i]); break; } + pbundle->bundle.ucontext = attr->ubojects[i].ucontext; } attr->len = i; @@ -316,6 +317,7 @@ static int uverbs_process_attr(struct bundle_priv *pbundle, if (IS_ERR(o_attr->uobject)) return PTR_ERR(o_attr->uobject); __set_bit(attr_bkey, pbundle->uobj_finalize); + pbundle->bundle.ucontext = o_attr->uoboject.ucontext; if (spec->u.obj.access == UVERBS_ACCESS_NEW) { unsigned int uattr_idx = uattr - pbundle->uattrs; And the cmd path needs some editing of its alloc/get macros > uctx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx); > spin_lock_irqsave(&uctx->sh_lock, flag); > wrptr = (u32 *)(uctx->shpg + BNXT_RE_AVID_OFFT); > @@ -883,10 +888,15 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd, > struct bnxt_qplib_qp *qplib_qp = &qp->qplib_qp; > struct ib_umem *umem; > int bytes = 0; > - struct ib_ucontext *context = pd->ib_pd.uobject->context; > - struct bnxt_re_ucontext *cntx = container_of(context, > - struct bnxt_re_ucontext, > - ib_uctx); > + struct bnxt_re_ucontext *cntx; > + struct ib_ucontext *context; > + > + context = rdma_get_ucontext(udata); > + if (IS_ERR(context)) > + return PTR_ERR(context); Like here, this is getting really long.. And this should probably follow the pattern of the new rdma_device_to_drv_device() to make it simpler.. Just add something like this at the start of functions: struct bnxt_re_ucontext *cntx = rdma_udata_to_drv_context(udata, struct bnxt_re_ucontext, ib_uctx); returns NULL if kernel mode. And then use 'if (cntx)' instead of 'if (udata)' in that function to test for user/kernel mode.. Jason
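For reference, a minimal sketch of what a helper along these lines could look like, assuming the core code stores the ib_ucontext in the uverbs_attr_bundle that wraps the driver udata (the 'ucontext' field name and the exact macro shape are assumptions for illustration, not a final API):

	/*
	 * Sketch only: resolve the driver-private ucontext from udata, or
	 * return NULL for a kernel-mode caller (udata == NULL).  Relies on
	 * the bundle carrying a valid ucontext pointer, per the core-code
	 * change suggested above.
	 */
	#define rdma_udata_to_drv_context(udata, drv_dev_struct, member)       \
		((udata) ? container_of(container_of(udata,                    \
						     struct uverbs_attr_bundle, \
						     driver_udata)->ucontext,   \
					drv_dev_struct, member) :               \
			   (drv_dev_struct *)NULL)

A driver callback would then open with something like:

	struct bnxt_re_ucontext *cntx =
		rdma_udata_to_drv_context(udata, struct bnxt_re_ucontext,
					  ib_uctx);

	if (cntx) {
		/* user verbs path: per-context driver state is available */
	} else {
		/* kernel verbs consumer: no ucontext */
	}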
Hi Shamir, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on rdma/for-next] [also build test WARNING on next-20190116] [cannot apply to v5.0-rc2] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Shamir-Rabinovitch/IB-hw-sw-remove-uobject-context-dependency-in-object-creation-APIs/20190118-132047 base: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next config: x86_64-allmodconfig (attached as .config) compiler: gcc-8 (Debian 8.2.0-14) 8.2.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): drivers/infiniband//sw/rdmavt/qp.c: In function 'rvt_create_qp': >> drivers/infiniband//sw/rdmavt/qp.c:1133:9: warning: assignment to 'struct ib_qp *' from 'long int' makes pointer from integer without a cast [-Wint-conversion] ret = PTR_ERR(context); ^ vim +1133 drivers/infiniband//sw/rdmavt/qp.c 932 933 /** 934 * rvt_create_qp - create a queue pair for a device 935 * @ibpd: the protection domain who's device we create the queue pair for 936 * @init_attr: the attributes of the queue pair 937 * @udata: user data for libibverbs.so 938 * 939 * Queue pair creation is mostly an rvt issue. However, drivers have their own 940 * unique idea of what queue pair numbers mean. For instance there is a reserved 941 * range for PSM. 942 * 943 * Return: the queue pair on success, otherwise returns an errno. 944 * 945 * Called by the ib_create_qp() core verbs function. 946 */ 947 struct ib_qp *rvt_create_qp(struct ib_pd *ibpd, 948 struct ib_qp_init_attr *init_attr, 949 struct ib_udata *udata) 950 { 951 struct ib_ucontext *context; 952 struct rvt_qp *qp; 953 int err; 954 struct rvt_swqe *swq = NULL; 955 size_t sz; 956 size_t sg_list_sz; 957 struct ib_qp *ret = ERR_PTR(-ENOMEM); 958 struct rvt_dev_info *rdi = ib_to_rvt(ibpd->device); 959 void *priv = NULL; 960 size_t sqsize; 961 962 if (!rdi) 963 return ERR_PTR(-EINVAL); 964 965 if (init_attr->cap.max_send_sge > rdi->dparms.props.max_send_sge || 966 init_attr->cap.max_send_wr > rdi->dparms.props.max_qp_wr || 967 init_attr->create_flags) 968 return ERR_PTR(-EINVAL); 969 970 /* Check receive queue parameters if no SRQ is specified. 
*/ 971 if (!init_attr->srq) { 972 if (init_attr->cap.max_recv_sge > 973 rdi->dparms.props.max_recv_sge || 974 init_attr->cap.max_recv_wr > rdi->dparms.props.max_qp_wr) 975 return ERR_PTR(-EINVAL); 976 977 if (init_attr->cap.max_send_sge + 978 init_attr->cap.max_send_wr + 979 init_attr->cap.max_recv_sge + 980 init_attr->cap.max_recv_wr == 0) 981 return ERR_PTR(-EINVAL); 982 } 983 sqsize = 984 init_attr->cap.max_send_wr + 1 + 985 rdi->dparms.reserved_operations; 986 switch (init_attr->qp_type) { 987 case IB_QPT_SMI: 988 case IB_QPT_GSI: 989 if (init_attr->port_num == 0 || 990 init_attr->port_num > ibpd->device->phys_port_cnt) 991 return ERR_PTR(-EINVAL); 992 /* fall through */ 993 case IB_QPT_UC: 994 case IB_QPT_RC: 995 case IB_QPT_UD: 996 sz = sizeof(struct rvt_sge) * 997 init_attr->cap.max_send_sge + 998 sizeof(struct rvt_swqe); 999 swq = vzalloc_node(array_size(sz, sqsize), rdi->dparms.node); 1000 if (!swq) 1001 return ERR_PTR(-ENOMEM); 1002 1003 sz = sizeof(*qp); 1004 sg_list_sz = 0; 1005 if (init_attr->srq) { 1006 struct rvt_srq *srq = ibsrq_to_rvtsrq(init_attr->srq); 1007 1008 if (srq->rq.max_sge > 1) 1009 sg_list_sz = sizeof(*qp->r_sg_list) * 1010 (srq->rq.max_sge - 1); 1011 } else if (init_attr->cap.max_recv_sge > 1) 1012 sg_list_sz = sizeof(*qp->r_sg_list) * 1013 (init_attr->cap.max_recv_sge - 1); 1014 qp = kzalloc_node(sz + sg_list_sz, GFP_KERNEL, 1015 rdi->dparms.node); 1016 if (!qp) 1017 goto bail_swq; 1018 1019 RCU_INIT_POINTER(qp->next, NULL); 1020 if (init_attr->qp_type == IB_QPT_RC) { 1021 qp->s_ack_queue = 1022 kcalloc_node(rvt_max_atomic(rdi), 1023 sizeof(*qp->s_ack_queue), 1024 GFP_KERNEL, 1025 rdi->dparms.node); 1026 if (!qp->s_ack_queue) 1027 goto bail_qp; 1028 } 1029 /* initialize timers needed for rc qp */ 1030 timer_setup(&qp->s_timer, rvt_rc_timeout, 0); 1031 hrtimer_init(&qp->s_rnr_timer, CLOCK_MONOTONIC, 1032 HRTIMER_MODE_REL); 1033 qp->s_rnr_timer.function = rvt_rc_rnr_retry; 1034 1035 /* 1036 * Driver needs to set up it's private QP structure and do any 1037 * initialization that is needed. 1038 */ 1039 priv = rdi->driver_f.qp_priv_alloc(rdi, qp); 1040 if (IS_ERR(priv)) { 1041 ret = priv; 1042 goto bail_qp; 1043 } 1044 qp->priv = priv; 1045 qp->timeout_jiffies = 1046 usecs_to_jiffies((4096UL * (1UL << qp->timeout)) / 1047 1000UL); 1048 if (init_attr->srq) { 1049 sz = 0; 1050 } else { 1051 qp->r_rq.size = init_attr->cap.max_recv_wr + 1; 1052 qp->r_rq.max_sge = init_attr->cap.max_recv_sge; 1053 sz = (sizeof(struct ib_sge) * qp->r_rq.max_sge) + 1054 sizeof(struct rvt_rwqe); 1055 if (udata) 1056 qp->r_rq.wq = vmalloc_user( 1057 sizeof(struct rvt_rwq) + 1058 qp->r_rq.size * sz); 1059 else 1060 qp->r_rq.wq = vzalloc_node( 1061 sizeof(struct rvt_rwq) + 1062 qp->r_rq.size * sz, 1063 rdi->dparms.node); 1064 if (!qp->r_rq.wq) 1065 goto bail_driver_priv; 1066 } 1067 1068 /* 1069 * ib_create_qp() will initialize qp->ibqp 1070 * except for qp->ibqp.qp_num. 
1071 */ 1072 spin_lock_init(&qp->r_lock); 1073 spin_lock_init(&qp->s_hlock); 1074 spin_lock_init(&qp->s_lock); 1075 spin_lock_init(&qp->r_rq.lock); 1076 atomic_set(&qp->refcount, 0); 1077 atomic_set(&qp->local_ops_pending, 0); 1078 init_waitqueue_head(&qp->wait); 1079 INIT_LIST_HEAD(&qp->rspwait); 1080 qp->state = IB_QPS_RESET; 1081 qp->s_wq = swq; 1082 qp->s_size = sqsize; 1083 qp->s_avail = init_attr->cap.max_send_wr; 1084 qp->s_max_sge = init_attr->cap.max_send_sge; 1085 if (init_attr->sq_sig_type == IB_SIGNAL_REQ_WR) 1086 qp->s_flags = RVT_S_SIGNAL_REQ_WR; 1087 1088 err = alloc_qpn(rdi, &rdi->qp_dev->qpn_table, 1089 init_attr->qp_type, 1090 init_attr->port_num); 1091 if (err < 0) { 1092 ret = ERR_PTR(err); 1093 goto bail_rq_wq; 1094 } 1095 qp->ibqp.qp_num = err; 1096 qp->port_num = init_attr->port_num; 1097 rvt_init_qp(rdi, qp, init_attr->qp_type); 1098 if (rdi->driver_f.qp_priv_init) { 1099 err = rdi->driver_f.qp_priv_init(rdi, qp, init_attr); 1100 if (err) { 1101 ret = ERR_PTR(err); 1102 goto bail_rq_wq; 1103 } 1104 } 1105 break; 1106 1107 default: 1108 /* Don't support raw QPs */ 1109 return ERR_PTR(-EINVAL); 1110 } 1111 1112 init_attr->cap.max_inline_data = 0; 1113 1114 /* 1115 * Return the address of the RWQ as the offset to mmap. 1116 * See rvt_mmap() for details. 1117 */ 1118 if (udata && udata->outlen >= sizeof(__u64)) { 1119 if (!qp->r_rq.wq) { 1120 __u64 offset = 0; 1121 1122 err = ib_copy_to_udata(udata, &offset, 1123 sizeof(offset)); 1124 if (err) { 1125 ret = ERR_PTR(err); 1126 goto bail_qpn; 1127 } 1128 } else { 1129 u32 s = sizeof(struct rvt_rwq) + qp->r_rq.size * sz; 1130 1131 context = rdma_get_ucontext(udata); 1132 if (IS_ERR(context)) { > 1133 ret = PTR_ERR(context); 1134 goto bail_qpn; 1135 } 1136 1137 qp->ip = rvt_create_mmap_info(rdi, s, context, 1138 qp->r_rq.wq); 1139 if (!qp->ip) { 1140 ret = ERR_PTR(-ENOMEM); 1141 goto bail_qpn; 1142 } 1143 1144 err = ib_copy_to_udata(udata, &qp->ip->offset, 1145 sizeof(qp->ip->offset)); 1146 if (err) { 1147 ret = ERR_PTR(err); 1148 goto bail_ip; 1149 } 1150 } 1151 qp->pid = current->pid; 1152 } 1153 1154 spin_lock(&rdi->n_qps_lock); 1155 if (rdi->n_qps_allocated == rdi->dparms.props.max_qp) { 1156 spin_unlock(&rdi->n_qps_lock); 1157 ret = ERR_PTR(-ENOMEM); 1158 goto bail_ip; 1159 } 1160 1161 rdi->n_qps_allocated++; 1162 /* 1163 * Maintain a busy_jiffies variable that will be added to the timeout 1164 * period in mod_retry_timer and add_retry_timer. This busy jiffies 1165 * is scaled by the number of rc qps created for the device to reduce 1166 * the number of timeouts occurring when there is a large number of 1167 * qps. busy_jiffies is incremented every rc qp scaling interval. 1168 * The scaling interval is selected based on extensive performance 1169 * evaluation of targeted workloads. 1170 */ 1171 if (init_attr->qp_type == IB_QPT_RC) { 1172 rdi->n_rc_qps++; 1173 rdi->busy_jiffies = rdi->n_rc_qps / RC_QP_SCALING_INTERVAL; 1174 } 1175 spin_unlock(&rdi->n_qps_lock); 1176 1177 if (qp->ip) { 1178 spin_lock_irq(&rdi->pending_lock); 1179 list_add(&qp->ip->pending_mmaps, &rdi->pending_mmaps); 1180 spin_unlock_irq(&rdi->pending_lock); 1181 } 1182 1183 ret = &qp->ibqp; 1184 1185 /* 1186 * We have our QP and its good, now keep track of what types of opcodes 1187 * can be processed on this QP. We do this by keeping track of what the 1188 * 3 high order bits of the opcode are. 
1189 */ 1190 switch (init_attr->qp_type) { 1191 case IB_QPT_SMI: 1192 case IB_QPT_GSI: 1193 case IB_QPT_UD: 1194 qp->allowed_ops = IB_OPCODE_UD; 1195 break; 1196 case IB_QPT_RC: 1197 qp->allowed_ops = IB_OPCODE_RC; 1198 break; 1199 case IB_QPT_UC: 1200 qp->allowed_ops = IB_OPCODE_UC; 1201 break; 1202 default: 1203 ret = ERR_PTR(-EINVAL); 1204 goto bail_ip; 1205 } 1206 1207 return ret; 1208 1209 bail_ip: 1210 if (qp->ip) 1211 kref_put(&qp->ip->ref, rvt_release_mmap_info); 1212 1213 bail_qpn: 1214 rvt_free_qpn(&rdi->qp_dev->qpn_table, qp->ibqp.qp_num); 1215 1216 bail_rq_wq: 1217 if (!qp->ip) 1218 vfree(qp->r_rq.wq); 1219 1220 bail_driver_priv: 1221 rdi->driver_f.qp_priv_free(rdi, qp); 1222 1223 bail_qp: 1224 kfree(qp->s_ack_queue); 1225 kfree(qp); 1226 1227 bail_swq: 1228 vfree(swq); 1229 1230 return ret; 1231 } 1232 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
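The warning flagged above is a genuine type mismatch: 'ret' is a 'struct ib_qp *', while PTR_ERR() returns a long. A minimal sketch of the likely fix (not taken from a posted respin) is to propagate the error pointer with ERR_CAST() instead:

			context = rdma_get_ucontext(udata);
			if (IS_ERR(context)) {
				/* convert the error pointer rather than
				 * assigning a long to 'struct ib_qp *ret'
				 */
				ret = ERR_CAST(context);
				goto bail_qpn;
			}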
Hi Shamir, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on rdma/for-next] [also build test WARNING on next-20190116] [cannot apply to v5.0-rc2] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Shamir-Rabinovitch/IB-hw-sw-remove-uobject-context-dependency-in-object-creation-APIs/20190118-132047 base: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next config: xtensa-allmodconfig (attached as .config) compiler: xtensa-linux-gcc (GCC) 8.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=8.2.0 make.cross ARCH=xtensa Note: it may well be a FALSE warning. FWIW you are at least aware of it now. http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings All warnings (new ones prefixed by >>): drivers/infiniband/hw/mlx4/srq.c: In function 'mlx4_ib_create_srq': >> drivers/infiniband/hw/mlx4/srq.c:212:3: warning: 'ib_ucontext' may be used uninitialized in this function [-Wmaybe-uninitialized] mlx4_ib_db_unmap_user(to_mucontext(ib_ucontext), &srq->db); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ vim +/ib_ucontext +212 drivers/infiniband/hw/mlx4/srq.c 70 71 struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, 72 struct ib_srq_init_attr *init_attr, 73 struct ib_udata *udata) 74 { 75 struct mlx4_ib_dev *dev = to_mdev(pd->device); 76 struct mlx4_ib_srq *srq; 77 struct mlx4_wqe_srq_next_seg *next; 78 struct mlx4_wqe_data_seg *scatter; 79 struct ib_ucontext *ib_ucontext; 80 u32 cqn; 81 u16 xrcdn; 82 int desc_size; 83 int buf_size; 84 int err; 85 int i; 86 87 /* Sanity check SRQ size before proceeding */ 88 if (init_attr->attr.max_wr >= dev->dev->caps.max_srq_wqes || 89 init_attr->attr.max_sge > dev->dev->caps.max_srq_sge) 90 return ERR_PTR(-EINVAL); 91 92 srq = kmalloc(sizeof *srq, GFP_KERNEL); 93 if (!srq) 94 return ERR_PTR(-ENOMEM); 95 96 mutex_init(&srq->mutex); 97 spin_lock_init(&srq->lock); 98 srq->msrq.max = roundup_pow_of_two(init_attr->attr.max_wr + 1); 99 srq->msrq.max_gs = init_attr->attr.max_sge; 100 101 desc_size = max(32UL, 102 roundup_pow_of_two(sizeof (struct mlx4_wqe_srq_next_seg) + 103 srq->msrq.max_gs * 104 sizeof (struct mlx4_wqe_data_seg))); 105 srq->msrq.wqe_shift = ilog2(desc_size); 106 107 buf_size = srq->msrq.max * desc_size; 108 109 if (udata) { 110 struct mlx4_ib_create_srq ucmd; 111 112 ib_ucontext = rdma_get_ucontext(udata); 113 if (IS_ERR(ib_ucontext)) { 114 err = PTR_ERR(ib_ucontext); 115 goto err_srq; 116 } 117 118 if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) { 119 err = -EFAULT; 120 goto err_srq; 121 } 122 123 srq->umem = ib_umem_get(udata, ucmd.buf_addr, buf_size, 0, 0); 124 if (IS_ERR(srq->umem)) { 125 err = PTR_ERR(srq->umem); 126 goto err_srq; 127 } 128 129 err = mlx4_mtt_init(dev->dev, ib_umem_page_count(srq->umem), 130 srq->umem->page_shift, &srq->mtt); 131 if (err) 132 goto err_buf; 133 134 err = mlx4_ib_umem_write_mtt(dev, &srq->mtt, srq->umem); 135 if (err) 136 goto err_mtt; 137 138 err = mlx4_ib_db_map_user(to_mucontext(ib_ucontext), udata, 139 ucmd.db_addr, &srq->db); 140 if (err) 141 goto err_mtt; 142 } else { 143 err = mlx4_db_alloc(dev->dev, &srq->db, 0); 144 if (err) 145 goto err_srq; 146 147 *srq->db.db = 0; 148 149 if (mlx4_buf_alloc(dev->dev, buf_size, PAGE_SIZE * 2, 150 &srq->buf)) { 151 err = -ENOMEM; 152 goto err_db; 153 } 154 155 srq->head = 0; 
156 srq->tail = srq->msrq.max - 1; 157 srq->wqe_ctr = 0; 158 159 for (i = 0; i < srq->msrq.max; ++i) { 160 next = get_wqe(srq, i); 161 next->next_wqe_index = 162 cpu_to_be16((i + 1) & (srq->msrq.max - 1)); 163 164 for (scatter = (void *) (next + 1); 165 (void *) scatter < (void *) next + desc_size; 166 ++scatter) 167 scatter->lkey = cpu_to_be32(MLX4_INVALID_LKEY); 168 } 169 170 err = mlx4_mtt_init(dev->dev, srq->buf.npages, srq->buf.page_shift, 171 &srq->mtt); 172 if (err) 173 goto err_buf; 174 175 err = mlx4_buf_write_mtt(dev->dev, &srq->mtt, &srq->buf); 176 if (err) 177 goto err_mtt; 178 179 srq->wrid = kvmalloc_array(srq->msrq.max, 180 sizeof(u64), GFP_KERNEL); 181 if (!srq->wrid) { 182 err = -ENOMEM; 183 goto err_mtt; 184 } 185 } 186 187 cqn = ib_srq_has_cq(init_attr->srq_type) ? 188 to_mcq(init_attr->ext.cq)->mcq.cqn : 0; 189 xrcdn = (init_attr->srq_type == IB_SRQT_XRC) ? 190 to_mxrcd(init_attr->ext.xrc.xrcd)->xrcdn : 191 (u16) dev->dev->caps.reserved_xrcds; 192 err = mlx4_srq_alloc(dev->dev, to_mpd(pd)->pdn, cqn, xrcdn, &srq->mtt, 193 srq->db.dma, &srq->msrq); 194 if (err) 195 goto err_wrid; 196 197 srq->msrq.event = mlx4_ib_srq_event; 198 srq->ibsrq.ext.xrc.srq_num = srq->msrq.srqn; 199 200 if (udata) 201 if (ib_copy_to_udata(udata, &srq->msrq.srqn, sizeof (__u32))) { 202 err = -EFAULT; 203 goto err_wrid; 204 } 205 206 init_attr->attr.max_wr = srq->msrq.max - 1; 207 208 return &srq->ibsrq; 209 210 err_wrid: 211 if (udata) > 212 mlx4_ib_db_unmap_user(to_mucontext(ib_ucontext), &srq->db); 213 else 214 kvfree(srq->wrid); 215 216 err_mtt: 217 mlx4_mtt_cleanup(dev->dev, &srq->mtt); 218 219 err_buf: 220 if (srq->umem) 221 ib_umem_release(srq->umem); 222 else 223 mlx4_buf_free(dev->dev, buf_size, &srq->buf); 224 225 err_db: 226 if (!udata) 227 mlx4_db_free(dev->dev, &srq->db); 228 229 err_srq: 230 kfree(srq); 231 232 return ERR_PTR(err); 233 } 234 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
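As the report itself notes, this one is likely a false positive: on the udata path 'ib_ucontext' is assigned by rdma_get_ucontext() before any goto that can reach 'err_wrid'. A minimal sketch of one way to silence the warning without changing behaviour (an assumption about intent, not a posted fix) is to initialize the pointer at its declaration:

	struct ib_ucontext *ib_ucontext = NULL;	/* only set on the udata path */

Alternatively, the error unwind could re-derive the context from udata instead of relying on the local variable.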
On Thu, Jan 17, 2019 at 07:31:26PM +0000, Saleem, Shiraz wrote: > >Subject: [PATCH for-next 1/1] IB/{hw,sw}: remove 'uobject->context' dependency in > >object creation APIs > > > >Now when we have the udata passed to all the ib_xxx object creation APIs and the > >additional function 'rdma_get_ucontext' to get the ib_ucontext from ib_udata stored > >in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers > >in the ib_xxx->uobject->context. > > > >Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> > >--- > > drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++-- > > drivers/infiniband/hw/cxgb3/iwch_provider.c | 4 +- > > drivers/infiniband/hw/cxgb4/qp.c | 8 +- > > drivers/infiniband/hw/hns/hns_roce_qp.c | 15 +++- > > drivers/infiniband/hw/i40iw/i40iw_verbs.c | 15 +++- > > drivers/infiniband/hw/mlx4/doorbell.c | 6 ++ > > drivers/infiniband/hw/mlx4/mr.c | 9 +- > > drivers/infiniband/hw/mlx4/qp.c | 92 +++++++++++++------- > > drivers/infiniband/hw/mlx4/srq.c | 13 ++- > > drivers/infiniband/hw/mlx5/qp.c | 68 +++++++++++---- > > drivers/infiniband/hw/mlx5/srq.c | 13 ++- > > drivers/infiniband/hw/mthca/mthca_provider.c | 28 ++++-- > > drivers/infiniband/hw/mthca/mthca_qp.c | 19 ++-- > > drivers/infiniband/hw/mthca/mthca_srq.c | 25 ++++-- > > drivers/infiniband/hw/nes/nes_verbs.c | 13 ++- > > drivers/infiniband/hw/qedr/verbs.c | 8 +- > > drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 7 +- > > drivers/infiniband/sw/rdmavt/qp.c | 10 ++- > > drivers/infiniband/sw/rdmavt/srq.c | 10 ++- > > drivers/infiniband/sw/rxe/rxe_qp.c | 5 +- > > drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +- > > 21 files changed, 287 insertions(+), 119 deletions(-) > > > [....] > > >diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c > >b/drivers/infiniband/hw/i40iw/i40iw_verbs.c > >index 12b31a8440be..194cd911c9de 100644 > >--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c > >+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c > >@@ -580,11 +580,16 @@ static struct ib_qp *i40iw_create_qp(struct ib_pd *ibpd, > > struct i40iw_create_qp_info *qp_info; > > struct i40iw_cqp_request *cqp_request; > > struct cqp_commands_info *cqp_info; > >+ struct ib_ucontext *ib_ucontext; > > > > struct i40iw_qp_host_ctx_info *ctx_info; > > struct i40iwarp_offload_info *iwarp_info; > > unsigned long flags; > > > >+ ib_ucontext = rdma_get_ucontext(udata); > >+ if (udata && IS_ERR(ib_ucontext)) > >+ return ERR_CAST(ib_ucontext); > >+ > > if (iwdev->closing) > > return ERR_PTR(-ENODEV); > > > >@@ -674,7 +679,7 @@ static struct ib_qp *i40iw_create_qp(struct ib_pd *ibpd, > > } > > iwqp->ctx_info.qp_compl_ctx = req.user_compl_ctx; > > iwqp->user_mode = 1; > >- ucontext = to_ucontext(ibpd->uobject->context); > >+ ucontext = to_ucontext(ib_ucontext); > > > > if (req.user_wqe_buffers) { > > struct i40iw_pbl *iwpbl; > >@@ -1832,6 +1837,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, > > struct i40iw_pd *iwpd = to_iwpd(pd); > > struct i40iw_device *iwdev = to_iwdev(pd->device); > > struct i40iw_ucontext *ucontext; > >+ struct ib_ucontext *ib_ucontext; > > struct i40iw_pble_alloc *palloc; > > struct i40iw_pbl *iwpbl; > > struct i40iw_mr *iwmr; > >@@ -1847,6 +1853,12 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, > > int ret; > > int pg_shift; > > > >+ ib_ucontext = rdma_get_ucontext(udata); > >+ if (IS_ERR(ib_ucontext)) > >+ return ERR_CAST(ib_ucontext); > >+ > >+ ucontext = to_ucontext(ib_ucontext); > >+ > > if (iwdev->closing) > > return ERR_PTR(-ENODEV); > > > >@@ -1872,7 +1884,6 @@ static struct 
ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, > > iwmr->region = region; > > iwmr->ibmr.pd = pd; > > iwmr->ibmr.device = pd->device; > >- ucontext = to_ucontext(pd->uobject->context); > > > > iwmr->page_size = PAGE_SIZE; > > iwmr->page_msk = PAGE_MASK; > QP. > > */ > > if (new_state == IB_QPS_RESET) { > >- if (!ibuobject) { > >+ if (!udata) { > > mlx4_ib_cq_clean(recv_cq, qp->mqp.qpn, > > ibsrq ? to_msrq(ibsrq) : NULL); > > if (send_cq != recv_cq) > > I think you missed one in i40iw_dereg_mr? Thanks for the review Shiraz. This patch (see description) only try to solve the problem with regard to APIs that *create* objects. The reason is that Jason said in previous iterations that the APIs that *destroy* objects should not assume the existence of ib_udata. So the i40iw_dereg_mr will be handled in the next patch and it will not have same solution as you see here. > > https://elixir.bootlin.com/linux/v5.0-rc2/source/drivers/infiniband/hw/i40iw/i40iw_verbs.c#L2097 > > Shiraz
On Thu, Jan 17, 2019 at 01:46:46PM -0600, Steve Wise wrote: > > > On 1/17/2019 1:11 PM, Shamir Rabinovitch wrote: > > Now when we have the udata passed to all the ib_xxx object creation APIs > > and the additional function 'rdma_get_ucontext' to get the ib_ucontext > > from ib_udata stored in uverbs_attr_bundle, we can finally start to remove > > the dependency of the drivers in the ib_xxx->uobject->context. > > > > Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> > > --- [...] > > For cxgb*, looks ok. > > Reviewed-by: Steve Wise <swise@opengridcomputing.com> Thanks for the review Steve.
On Thu, Jan 17, 2019 at 09:11:12PM +0200, Shamir Rabinovitch wrote: > Now when we have the udata passed to all the ib_xxx object creation APIs > and the additional function 'rdma_get_ucontext' to get the ib_ucontext > from ib_udata stored in uverbs_attr_bundle, we can finally start to remove > the dependency of the drivers in the ib_xxx->uobject->context. > > Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> > --- > drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++-- > drivers/infiniband/hw/cxgb3/iwch_provider.c | 4 +- > drivers/infiniband/hw/cxgb4/qp.c | 8 +- > drivers/infiniband/hw/hns/hns_roce_qp.c | 15 +++- > drivers/infiniband/hw/i40iw/i40iw_verbs.c | 15 +++- > drivers/infiniband/hw/mlx4/doorbell.c | 6 ++ > drivers/infiniband/hw/mlx4/mr.c | 9 +- > drivers/infiniband/hw/mlx4/qp.c | 92 +++++++++++++------- > drivers/infiniband/hw/mlx4/srq.c | 13 ++- > drivers/infiniband/hw/mlx5/qp.c | 68 +++++++++++---- > drivers/infiniband/hw/mlx5/srq.c | 13 ++- > drivers/infiniband/hw/mthca/mthca_provider.c | 28 ++++-- > drivers/infiniband/hw/mthca/mthca_qp.c | 19 ++-- > drivers/infiniband/hw/mthca/mthca_srq.c | 25 ++++-- > drivers/infiniband/hw/nes/nes_verbs.c | 13 ++- > drivers/infiniband/hw/qedr/verbs.c | 8 +- > drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 7 +- > drivers/infiniband/sw/rdmavt/qp.c | 10 ++- > drivers/infiniband/sw/rdmavt/srq.c | 10 ++- > drivers/infiniband/sw/rxe/rxe_qp.c | 5 +- > drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +- > 21 files changed, 287 insertions(+), 119 deletions(-) > > diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c > index 9bc637e49faa..d811efe49e77 100644 > --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c > +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c > @@ -733,11 +733,16 @@ struct ib_ah *bnxt_re_create_ah(struct ib_pd *ib_pd, > > /* Write AVID to shared page. 
*/ > if (udata) { > - struct ib_ucontext *ib_uctx = ib_pd->uobject->context; > + struct ib_ucontext *ib_uctx; > struct bnxt_re_ucontext *uctx; > unsigned long flag; > u32 *wrptr; > > + ib_uctx = rdma_get_ucontext(udata); > + if (IS_ERR(ib_uctx)) { > + rc = PTR_ERR(ib_uctx); > + goto fail; > + } > uctx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx); > spin_lock_irqsave(&uctx->sh_lock, flag); > wrptr = (u32 *)(uctx->shpg + BNXT_RE_AVID_OFFT); > @@ -883,10 +888,15 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd, > struct bnxt_qplib_qp *qplib_qp = &qp->qplib_qp; > struct ib_umem *umem; > int bytes = 0; > - struct ib_ucontext *context = pd->ib_pd.uobject->context; > - struct bnxt_re_ucontext *cntx = container_of(context, > - struct bnxt_re_ucontext, > - ib_uctx); > + struct bnxt_re_ucontext *cntx; > + struct ib_ucontext *context; > + > + context = rdma_get_ucontext(udata); > + if (IS_ERR(context)) > + return PTR_ERR(context); > + > + cntx = container_of(context, struct bnxt_re_ucontext, ib_uctx); > + > if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) > return -EFAULT; > > @@ -1360,10 +1370,15 @@ static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev, > struct bnxt_qplib_srq *qplib_srq = &srq->qplib_srq; > struct ib_umem *umem; > int bytes = 0; > - struct ib_ucontext *context = pd->ib_pd.uobject->context; > - struct bnxt_re_ucontext *cntx = container_of(context, > - struct bnxt_re_ucontext, > - ib_uctx); > + struct bnxt_re_ucontext *cntx; > + struct ib_ucontext *context; > + > + context = rdma_get_ucontext(udata); > + if (IS_ERR(context)) > + return PTR_ERR(context); > + > + cntx = container_of(context, struct bnxt_re_ucontext, ib_uctx); > + > if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) > return -EFAULT; > > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c > index 07c20cd07f33..b692cd2bbe92 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c > @@ -796,6 +796,7 @@ static struct ib_qp *iwch_create_qp(struct ib_pd *pd, > struct iwch_cq *schp; > struct iwch_cq *rchp; > struct iwch_create_qp_resp uresp; > + struct ib_ucontext *ib_ucontext; > int wqsize, sqsize, rqsize; > struct iwch_ucontext *ucontext; > > @@ -836,7 +837,8 @@ static struct ib_qp *iwch_create_qp(struct ib_pd *pd, > * Kernel users need more wq space for fastreg WRs which can take > * 2 WR fragments. > */ > - ucontext = udata ? to_iwch_ucontext(pd->uobject->context) : NULL; > + ib_ucontext = rdma_get_ucontext(udata); > + ucontext = IS_ERR(ib_ucontext) ? 
NULL : to_iwch_ucontext(ib_ucontext); > if (!ucontext && wqsize < (rqsize + (2 * sqsize))) > wqsize = roundup_pow_of_two(rqsize + > roundup_pow_of_two(attrs->cap.max_send_wr * 2)); > diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c > index 03f4c66c2659..d2c452b4282c 100644 > --- a/drivers/infiniband/hw/cxgb4/qp.c > +++ b/drivers/infiniband/hw/cxgb4/qp.c > @@ -2137,6 +2137,7 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs, > struct c4iw_create_qp_resp uresp; > unsigned int sqsize, rqsize = 0; > struct c4iw_ucontext *ucontext; > + struct ib_ucontext *ib_ucontext; > int ret; > struct c4iw_mm_entry *sq_key_mm, *rq_key_mm = NULL, *sq_db_key_mm; > struct c4iw_mm_entry *rq_db_key_mm = NULL, *ma_sync_key_mm = NULL; > @@ -2170,7 +2171,8 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs, > if (sqsize < 8) > sqsize = 8; > > - ucontext = udata ? to_c4iw_ucontext(pd->uobject->context) : NULL; > + ib_ucontext = rdma_get_ucontext(udata); > + ucontext = IS_ERR(ib_ucontext) ? NULL : to_c4iw_ucontext(ib_ucontext); > > qhp = kzalloc(sizeof(*qhp), GFP_KERNEL); > if (!qhp) > @@ -2697,6 +2699,7 @@ struct ib_srq *c4iw_create_srq(struct ib_pd *pd, struct ib_srq_init_attr *attrs, > struct c4iw_create_srq_resp uresp; > struct c4iw_ucontext *ucontext; > struct c4iw_mm_entry *srq_key_mm, *srq_db_key_mm; > + struct ib_ucontext *ib_ucontext; > int rqsize; > int ret; > int wr_len; > @@ -2719,7 +2722,8 @@ struct ib_srq *c4iw_create_srq(struct ib_pd *pd, struct ib_srq_init_attr *attrs, > rqsize = attrs->attr.max_wr + 1; > rqsize = roundup_pow_of_two(max_t(u16, rqsize, 16)); > > - ucontext = udata ? to_c4iw_ucontext(pd->uobject->context) : NULL; > + ib_ucontext = rdma_get_ucontext(udata); > + ucontext = IS_ERR(ib_ucontext) ? 
NULL : to_c4iw_ucontext(ib_ucontext); > > srq = kzalloc(sizeof(*srq), GFP_KERNEL); > if (!srq) > diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c > index accf9ce1507d..b9c25eaf2d75 100644 > --- a/drivers/infiniband/hw/hns/hns_roce_qp.c > +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c > @@ -542,12 +542,19 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, > struct device *dev = hr_dev->dev; > struct hns_roce_ib_create_qp ucmd; > struct hns_roce_ib_create_qp_resp resp = {}; > + struct ib_ucontext *ucontext; > unsigned long qpn = 0; > int ret = 0; > u32 page_shift; > u32 npages; > int i; > > + ucontext = rdma_get_ucontext(udata); > + if (IS_ERR(ucontext)) { > + ret = PTR_ERR(ucontext); > + goto err_out; > + } > + > mutex_init(&hr_qp->mutex); > spin_lock_init(&hr_qp->sq.lock); > spin_lock_init(&hr_qp->rq.lock); > @@ -653,7 +660,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, > (udata->outlen >= sizeof(resp)) && > hns_roce_qp_has_sq(init_attr)) { > ret = hns_roce_db_map_user( > - to_hr_ucontext(ib_pd->uobject->context), udata, > + to_hr_ucontext(ucontext), udata, > ucmd.sdb_addr, &hr_qp->sdb); > if (ret) { > dev_err(dev, "sq record doorbell map failed!\n"); > @@ -669,7 +676,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, > (udata->outlen >= sizeof(resp)) && > hns_roce_qp_has_rq(init_attr)) { > ret = hns_roce_db_map_user( > - to_hr_ucontext(ib_pd->uobject->context), udata, > + to_hr_ucontext(ucontext), udata, > ucmd.db_addr, &hr_qp->rdb); > if (ret) { > dev_err(dev, "rq record doorbell map failed!\n"); > @@ -815,7 +822,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, > (udata->outlen >= sizeof(resp)) && > hns_roce_qp_has_rq(init_attr)) > hns_roce_db_unmap_user( > - to_hr_ucontext(ib_pd->uobject->context), > + to_hr_ucontext(ucontext), > &hr_qp->rdb); > } else { > kfree(hr_qp->sq.wrid); > @@ -829,7 +836,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, > (udata->outlen >= sizeof(resp)) && > hns_roce_qp_has_sq(init_attr)) > hns_roce_db_unmap_user( > - to_hr_ucontext(ib_pd->uobject->context), > + to_hr_ucontext(ucontext), > &hr_qp->sdb); > > err_mtt: > diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c > index 12b31a8440be..194cd911c9de 100644 > --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c > +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c > @@ -580,11 +580,16 @@ static struct ib_qp *i40iw_create_qp(struct ib_pd *ibpd, > struct i40iw_create_qp_info *qp_info; > struct i40iw_cqp_request *cqp_request; > struct cqp_commands_info *cqp_info; > + struct ib_ucontext *ib_ucontext; > > struct i40iw_qp_host_ctx_info *ctx_info; > struct i40iwarp_offload_info *iwarp_info; > unsigned long flags; > > + ib_ucontext = rdma_get_ucontext(udata); > + if (udata && IS_ERR(ib_ucontext)) > + return ERR_CAST(ib_ucontext); > + > if (iwdev->closing) > return ERR_PTR(-ENODEV); > > @@ -674,7 +679,7 @@ static struct ib_qp *i40iw_create_qp(struct ib_pd *ibpd, > } > iwqp->ctx_info.qp_compl_ctx = req.user_compl_ctx; > iwqp->user_mode = 1; > - ucontext = to_ucontext(ibpd->uobject->context); > + ucontext = to_ucontext(ib_ucontext); > > if (req.user_wqe_buffers) { > struct i40iw_pbl *iwpbl; > @@ -1832,6 +1837,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, > struct i40iw_pd *iwpd = to_iwpd(pd); > struct i40iw_device *iwdev = to_iwdev(pd->device); > struct i40iw_ucontext *ucontext; > + struct ib_ucontext *ib_ucontext; > 
struct i40iw_pble_alloc *palloc; > struct i40iw_pbl *iwpbl; > struct i40iw_mr *iwmr; > @@ -1847,6 +1853,12 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, > int ret; > int pg_shift; > > + ib_ucontext = rdma_get_ucontext(udata); > + if (IS_ERR(ib_ucontext)) > + return ERR_CAST(ib_ucontext); > + > + ucontext = to_ucontext(ib_ucontext); > + > if (iwdev->closing) > return ERR_PTR(-ENODEV); > > @@ -1872,7 +1884,6 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, > iwmr->region = region; > iwmr->ibmr.pd = pd; > iwmr->ibmr.device = pd->device; > - ucontext = to_ucontext(pd->uobject->context); > > iwmr->page_size = PAGE_SIZE; > iwmr->page_msk = PAGE_MASK; > diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c > index 3aab71b29ce8..1a4c6d3f8078 100644 > --- a/drivers/infiniband/hw/mlx4/doorbell.c > +++ b/drivers/infiniband/hw/mlx4/doorbell.c > @@ -48,6 +48,9 @@ int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, > struct mlx4_ib_user_db_page *page; > int err = 0; > > + if (!context) > + return -EINVAL; > + > mutex_lock(&context->db_page_mutex); > > list_for_each_entry(page, &context->db_page_list, list) > @@ -84,6 +87,9 @@ int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, > > void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db) > { > + if (WARN_ON(!context)) > + return; I wonder, how did you come to need of those checks? Aren't our flows protected now and we are safe to have context != NULL? Thanks
On Sun, Jan 20, 2019 at 02:56:27PM +0200, Leon Romanovsky wrote: > On Thu, Jan 17, 2019 at 09:11:12PM +0200, Shamir Rabinovitch wrote: > > Now when we have the udata passed to all the ib_xxx object creation APIs > > and the additional function 'rdma_get_ucontext' to get the ib_ucontext > > from ib_udata stored in uverbs_attr_bundle, we can finally start to remove > > the dependency of the drivers in the ib_xxx->uobject->context. > > > > Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> > > --- [...] > > > > void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db) > > { > > + if (WARN_ON(!context)) > > + return; > > I wonder, how did you come to need of those checks? > Aren't our flows protected now and we are safe to have context != NULL? > > Thanks Hi Leon, In this specific case of the above function I think you are right and the check is not needed. But this issue is wider than this specific function, as Jason wrote (see below). I'll try to come up with a better patch following what Jason said. I still have issues with drivers that store data on the user context because, as Jason said, we can't assume the user context will be available in the destroy path. So I think that for those drivers the patch that modifies the destroy path will also have to modify the create path, and this information will have to find another placeholder... " Hurm.. I think if you are going to use this API so widely then we need to arrange for it to not fail. If udata is not null, and we are in a driver callback with an object, then the ucontext should be guaranteed available. " BR, Shamir
On Thu, Jan 17, 2019 at 01:38:20PM -0700, Jason Gunthorpe wrote: > On Thu, Jan 17, 2019 at 09:11:12PM +0200, Shamir Rabinovitch wrote: > > Now when we have the udata passed to all the ib_xxx object creation APIs > > and the additional function 'rdma_get_ucontext' to get the ib_ucontext > > from ib_udata stored in uverbs_attr_bundle, we can finally start to remove > > the dependency of the drivers in the ib_xxx->uobject->context. > > [...] > > Hurm.. I think if you are going to use this API so widely then we need > to arrange for it to not fail. If udata is not null, and we are in a > driver callback with an object, then the ucontext should be guaranteed > available. > > Otherwise we have weird cases where udata and !rdma_get_ucontext() can > be true, which is really horrible to reason about.. > > The way to do this is to have the core code places that have a uobject > copy the ucontext from the uobject into the udata, I think I mentioned > this before, and it is a fixme in the code.. > > For instance something like this would take care of things for the > ioctl path: > > diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c > index 8c81ff69805276..9e9240645fbe9b 100644 > --- a/drivers/infiniband/core/uverbs_ioctl.c > +++ b/drivers/infiniband/core/uverbs_ioctl.c > @@ -198,6 +198,7 @@ static int uverbs_process_idrs_array(struct bundle_priv *pbundle, > ret = PTR_ERR(attr->uobjects[i]); > break; > } > + pbundle->bundle.ucontext = attr->ubojects[i].ucontext; > } > > attr->len = i; > @@ -316,6 +317,7 @@ static int uverbs_process_attr(struct bundle_priv *pbundle, > if (IS_ERR(o_attr->uobject)) > return PTR_ERR(o_attr->uobject); > __set_bit(attr_bkey, pbundle->uobj_finalize); > + pbundle->bundle.ucontext = o_attr->uoboject.ucontext; > > if (spec->u.obj.access == UVERBS_ACCESS_NEW) { > unsigned int uattr_idx = uattr - pbundle->uattrs; > > And the cmd path needs some editing of its alloc/get macros OK. If I take ib_uverbs_alloc_pd as example I see: ib_uverbs_write -> ib_uverbs_alloc_pd -> uobj_alloc -> __uobj_alloc -> rdma_alloc_begin_uobject -> alloc_begin_idr_uobject -> alloc_uobj -> ib_uverbs_get_ucontext_file So it seems that the best place to pass the ib_ucontext to uverbs_attr_bundle is in __uobj_alloc in this call chain. I have this question about ib_ucontext in ib_uobject: I see this comment above ib_uverbs_get_ucontext_file: " * Must be called with the ufile->device->disassociate_srcu held, and the lock * must be held until use of the ucontext is finished. " Surly, when ib_uobject has pointer to ib_ucontext the above lock is not held for all the duration. What prevent the use of the ib_ucontext from ib_uobject after device has been disassociated ? Is it the fact that ib_uobject are only accessed via ib_uverbs_write for commands which hold the disassociate_srcu and that uverbs_disassociate_api_pre disable the commands for uverbs_api for that ib_uverbs_device ? Are there any paths that touch ib_uobject but not with in ib_uverbs_write protection ? Is this related to the point that you do not want to assume ib_udata (&context) in the ib_xxx destroy path ? If so, will it be acceptable to move the ib_xxx driver related data hanged on the ib_ucontext to the ib_xxx driver object and prevent sharing of this specific object in this specific driver to remove the dependency in the destroy path in the ib_ucontext ? 
Will it be acceptable to have each driver decide which objects are shareable and which are not (those with data hung on the ib_ucontext are not shareable)? > > + context = rdma_get_ucontext(udata); > > + if (IS_ERR(context)) > > + return PTR_ERR(context); > > Like here, this is getting really long.. > > And this should probably follow the pattern of the new > rdma_device_to_drv_device() to make it simpler.. > > Just add something like this at the start of functions: > > struct bnxt_re_ucontext *cntx = rdma_udata_to_drv_context(udata, struct bnxt_re_ucontext, ib_uctx); > > returns NULL if kernel mode. > > And then use 'if (cntx)' instead of 'if (udata)' in that function to > test for user/kernel mode.. > > Jason OK.
On Tue, Jan 22, 2019 at 05:43:42PM +0200, Shamir Rabinovitch wrote: > > @@ -316,6 +317,7 @@ static int uverbs_process_attr(struct bundle_priv *pbundle, > > if (IS_ERR(o_attr->uobject)) > > return PTR_ERR(o_attr->uobject); > > __set_bit(attr_bkey, pbundle->uobj_finalize); > > + pbundle->bundle.ucontext = o_attr->uoboject.ucontext; > > > > if (spec->u.obj.access == UVERBS_ACCESS_NEW) { > > unsigned int uattr_idx = uattr - pbundle->uattrs; > > > > And the cmd path needs some editing of its alloc/get macros > > > OK. If I take ib_uverbs_alloc_pd as example I see: > > ib_uverbs_write -> ib_uverbs_alloc_pd -> uobj_alloc -> __uobj_alloc -> > rdma_alloc_begin_uobject -> alloc_begin_idr_uobject -> alloc_uobj -> > ib_uverbs_get_ucontext_file > > So it seems that the best place to pass the ib_ucontext to > uverbs_attr_bundle is in __uobj_alloc in this call chain. Probably yes > I have this question about ib_ucontext in ib_uobject: > > I see this comment above ib_uverbs_get_ucontext_file: > " > * Must be called with the ufile->device->disassociate_srcu held, and the lock > * must be held until use of the ucontext is finished. > " > > Surly, when ib_uobject has pointer to ib_ucontext the above lock is > not held for all the duration. The comment is only for ib_uverbs_get_ucontext_file() callers. But for the ucontext in the ib_uobject - all accesses to it have to be covered under a srcu & UVERBS_LOOKUP_READ/WRITE or the hw_destroy_rwsem to make sure the pointer is still alive. uverbs_destroy_uobject() sets the ucontext to NULL during disassociate, so every reader must have locking against that set. > Is it the fact that ib_uobject are only accessed via ib_uverbs_write > for commands which hold the disassociate_srcu and that > uverbs_disassociate_api_pre disable the commands for uverbs_api for > that ib_uverbs_device ? Running under the commands is what provides the srcu lock, but this test here in rdma_lookup_get_uobject: /* * If we have been disassociated block every command except for * DESTROY based commands. */ if (mode != UVERBS_LOOKUP_DESTROY && !srcu_dereference(ufile->device->ib_dev, &ufile->device->disassociate_srcu)) { ret = uverbs_try_lock_object(uobj, mode); Is what is excluding parallel dis-association. Once the per-uobject read/write lock is obtained under the srcu the ucontext pointer cannot be null'd. > Are there any paths that touch ib_uobject but not with in > ib_uverbs_write protection ? Everything should use this system. > Is this related to the point that you do not want to assume ib_udata > (&context) in the ib_xxx destroy path ? Not really.. the destroy path will still have a ucontext since ucontext cannot be destroyed until all of the ubojects are destroyed, but under sharing it might not be the same ucontext that was used to create the object - so what use could destroy have of it? > If so, will it be acceptable to move the ib_xxx driver related data > hanged on the ib_ucontext to the ib_xxx driver object and prevent > sharing of this specific object in this specific driver to remove > the dependency in the destroy path in the ib_ucontext ? I don't know.. We'd see what that looks like Maybe the driver should retain a kref on the ucontext inside the objects that need it, but that would be a bit tricky to manage lifetime too. > Will it be acceptable to have each driver to decide what objects are > shareable and what are not (those with data hanged on ib_ucontext > are not shareable)? I think that is what we will end up with, as revising all the drivers does not seem feasible. Jason
On Tue, Jan 22, 2019 at 09:02:18AM -0700, Jason Gunthorpe wrote: > On Tue, Jan 22, 2019 at 05:43:42PM +0200, Shamir Rabinovitch wrote: > > > @@ -316,6 +317,7 @@ static int uverbs_process_attr(struct bundle_priv *pbundle, > > > if (IS_ERR(o_attr->uobject)) > > > return PTR_ERR(o_attr->uobject); > > > __set_bit(attr_bkey, pbundle->uobj_finalize); > > > + pbundle->bundle.ucontext = o_attr->uoboject.ucontext; > > > > > > if (spec->u.obj.access == UVERBS_ACCESS_NEW) { > > > unsigned int uattr_idx = uattr - pbundle->uattrs; > > > > > > And the cmd path needs some editing of its alloc/get macros > > > > > > OK. If I take ib_uverbs_alloc_pd as example I see: > > > > ib_uverbs_write -> ib_uverbs_alloc_pd -> uobj_alloc -> __uobj_alloc -> > > rdma_alloc_begin_uobject -> alloc_begin_idr_uobject -> alloc_uobj -> > > ib_uverbs_get_ucontext_file > > > > So it seems that the best place to pass the ib_ucontext to > > uverbs_attr_bundle is in __uobj_alloc in this call chain. > > Probably yes > > > I have this question about ib_ucontext in ib_uobject: > > > > I see this comment above ib_uverbs_get_ucontext_file: > > " > > * Must be called with the ufile->device->disassociate_srcu held, and the lock > > * must be held until use of the ucontext is finished. > > " > > > > Surly, when ib_uobject has pointer to ib_ucontext the above lock is > > not held for all the duration. > > The comment is only for ib_uverbs_get_ucontext_file() callers. > > But for the ucontext in the ib_uobject - all accesses to it have to be > covered under a srcu & UVERBS_LOOKUP_READ/WRITE or the > hw_destroy_rwsem to make sure the pointer is still > alive. uverbs_destroy_uobject() sets the ucontext to NULL during > disassociate, so every reader must have locking against that set. > > > Is it the fact that ib_uobject are only accessed via ib_uverbs_write > > for commands which hold the disassociate_srcu and that > > uverbs_disassociate_api_pre disable the commands for uverbs_api for > > that ib_uverbs_device ? > > Running under the commands is what provides the srcu lock, but this > test here in rdma_lookup_get_uobject: > > /* > * If we have been disassociated block every command except for > * DESTROY based commands. > */ > if (mode != UVERBS_LOOKUP_DESTROY && > !srcu_dereference(ufile->device->ib_dev, > &ufile->device->disassociate_srcu)) { > > ret = uverbs_try_lock_object(uobj, mode); > > Is what is excluding parallel dis-association. Once the per-uobject > read/write lock is obtained under the srcu the ucontext pointer cannot be > null'd. > > > Are there any paths that touch ib_uobject but not with in > > ib_uverbs_write protection ? > > Everything should use this system. > > > Is this related to the point that you do not want to assume ib_udata > > (&context) in the ib_xxx destroy path ? > > Not really.. the destroy path will still have a ucontext since > ucontext cannot be destroyed until all of the ubojects are destroyed, > but under sharing it might not be the same ucontext that was used to > create the object - so what use could destroy have of it? Create path can have same issue... If driver object hold information on the context and we want to share it to another context we have problem that we cannot duplicate per context information (we might have more then 2 context at that point so who is the up-to-date..?) So in general I think: 1. Any ib_x driver object that use ib_ucontext to store data is not shareable 2. 
The main thing object sharing needs is that no ib_x object will hold an ib_uobject pointer. I think we need to make the same change for the destroy APIs as you requested for the create APIs, i.e. eliminate the core's dependency on the ib_uobject embedded in the ib_x objects (mainly used to obtain the ib_ucontext pointer). Will it be acceptable to eliminate the use of ib_uobject in the destroy APIs via the ib_udata embedded in the attr bundle as a next step? > > > If so, will it be acceptable to move the ib_xxx driver related data > > hanged on the ib_ucontext to the ib_xxx driver object and prevent > > sharing of this specific object in this specific driver to remove > > the dependency in the destroy path in the ib_ucontext ? > > I don't know.. We'd see what that looks like I think we need some way for rdma & uverbs to query the driver and ask whether a specific object type can be shared within that driver. Sounds like each driver would need to supply this: bool is_shareable(enum uverbs_default_objects object) At the same time I think the whole uverbs - rdma - driver interface needs to support sharing, meaning we need a way to duplicate an existing driver object and return the driver specific information to the user. For this to happen we need the ability to send the original driver object down to the driver so it can find what it needs to create the second copy. Is it acceptable to modify the rdma - driver interface for every create API so it will have the missing information? > > Maybe the driver should retain a kref on the ucontext inside the objects that need it, but that would be a bit tricky to manage lifetime too. Probably if a driver wants to enable sharing of some object it first needs to move all the related data of this object out of the ib_ucontext, then add the code that duplicates the object, and only then enable the sharing via 'is_shareable'. > > > Will it be acceptable to have each driver to decide what objects are shareable and what are not (those with data hanged on ib_ucontext are not shareable)? > > I think that is what we will end up with, as revising all the drivers does not seem feasible. > > Jason
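[Editorial note] A sketch of the per-driver hook proposed in the message above, using the signature given there. Which object types a given driver could actually mark shareable (the choices below) is purely an assumption for illustration.

#include <linux/types.h>
#include <uapi/rdma/ib_user_ioctl_verbs.h>

/*
 * Hypothetical driver op: report whether a uverbs object type may be
 * shared between ucontexts.  A driver that still keeps per-ucontext state
 * for an object type (doorbell pages, wqn ranges, ...) must answer false
 * for it until that state is moved into the object itself.
 */
static bool sample_is_shareable(enum uverbs_default_objects object)
{
	switch (object) {
	case UVERBS_OBJECT_PD:
	case UVERBS_OBJECT_MR:
		return true;	/* assumed: no per-ucontext data left */
	default:
		return false;	/* e.g. QPs/WQs with user doorbells */
	}
}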
On Mon, Jan 28, 2019 at 07:03:47PM +0200, Shamir Rabinovitch wrote: > On Tue, Jan 22, 2019 at 09:02:18AM -0700, Jason Gunthorpe wrote: > > On Tue, Jan 22, 2019 at 05:43:42PM +0200, Shamir Rabinovitch wrote: > > > > @@ -316,6 +317,7 @@ static int uverbs_process_attr(struct bundle_priv *pbundle, > > > > if (IS_ERR(o_attr->uobject)) > > > > return PTR_ERR(o_attr->uobject); > > > > __set_bit(attr_bkey, pbundle->uobj_finalize); > > > > + pbundle->bundle.ucontext = o_attr->uoboject.ucontext; > > > > > > > > if (spec->u.obj.access == UVERBS_ACCESS_NEW) { > > > > unsigned int uattr_idx = uattr - pbundle->uattrs; > > > > > > > > And the cmd path needs some editing of its alloc/get macros > > > > > > > > > OK. If I take ib_uverbs_alloc_pd as example I see: > > > > > > ib_uverbs_write -> ib_uverbs_alloc_pd -> uobj_alloc -> __uobj_alloc -> > > > rdma_alloc_begin_uobject -> alloc_begin_idr_uobject -> alloc_uobj -> > > > ib_uverbs_get_ucontext_file > > > > > > So it seems that the best place to pass the ib_ucontext to > > > uverbs_attr_bundle is in __uobj_alloc in this call chain. > > > > Probably yes > > > > > I have this question about ib_ucontext in ib_uobject: > > > > > > I see this comment above ib_uverbs_get_ucontext_file: > > > " > > > * Must be called with the ufile->device->disassociate_srcu held, and the lock > > > * must be held until use of the ucontext is finished. > > > " > > > > > > Surly, when ib_uobject has pointer to ib_ucontext the above lock is > > > not held for all the duration. > > > > The comment is only for ib_uverbs_get_ucontext_file() callers. > > > > But for the ucontext in the ib_uobject - all accesses to it have to be > > covered under a srcu & UVERBS_LOOKUP_READ/WRITE or the > > hw_destroy_rwsem to make sure the pointer is still > > alive. uverbs_destroy_uobject() sets the ucontext to NULL during > > disassociate, so every reader must have locking against that set. > > > > > Is it the fact that ib_uobject are only accessed via ib_uverbs_write > > > for commands which hold the disassociate_srcu and that > > > uverbs_disassociate_api_pre disable the commands for uverbs_api for > > > that ib_uverbs_device ? > > > > Running under the commands is what provides the srcu lock, but this > > test here in rdma_lookup_get_uobject: > > > > /* > > * If we have been disassociated block every command except for > > * DESTROY based commands. > > */ > > if (mode != UVERBS_LOOKUP_DESTROY && > > !srcu_dereference(ufile->device->ib_dev, > > &ufile->device->disassociate_srcu)) { > > > > ret = uverbs_try_lock_object(uobj, mode); > > > > Is what is excluding parallel dis-association. Once the per-uobject > > read/write lock is obtained under the srcu the ucontext pointer cannot be > > null'd. > > > > > Are there any paths that touch ib_uobject but not with in > > > ib_uverbs_write protection ? > > > > Everything should use this system. > > > > > Is this related to the point that you do not want to assume ib_udata > > > (&context) in the ib_xxx destroy path ? > > > > Not really.. the destroy path will still have a ucontext since > > ucontext cannot be destroyed until all of the ubojects are destroyed, > > but under sharing it might not be the same ucontext that was used to > > create the object - so what use could destroy have of it? > > Create path can have same issue... 
If driver object hold information on > the context and we want to share it to another context we have problem > that we cannot duplicate per context information (we might have more > then 2 context at that point so who is the up-to-date..?) Well, a driver might be able to get by if it was symmetric (i.e. always release the resource against the allocating PD) but it complicates ucontext destruction. > Will it be acceptable to eliminate the use of ib_uobject in destroy APIs > via the use of ib_udata embedded in attr bundle as next step? Yes, we can do that.. For non-sharable objects it would be no change in behavior. > I think we need some way the rdma & uverbs could query the driver and > ask if specific object type can be shared or not within that driver. > Sound like that each driver need to supply this: > > bool is_shareable(enum uverbs_default_objects object) I think drivers will need to have a flag that says what objects they can allow for sharing. Each driver will have to be audited before it can support sharing, unfortunately. > At same time I think that all the interface between uverbs - rdma - > driver need to support sharing. The meaning is that we need a way to > duplicate existing driver object and return the driver specific > information to user. I don't think we should be duplicating the driver object; sharing should be a uverbs thing where it just creates a new uobject and points to the same driver object. Somehow this has to work out a lifetime too. > Probably if driver want to enable share of some object it first need to > move all the related data of this object out of the ib_ucontext and then > add the code that duplicate the object and only then enable the sharing > via 'is_shareable'. Most likely Jason
On Mon, Jan 28, 2019 at 12:20:42PM -0700, Jason Gunthorpe wrote: > On Mon, Jan 28, 2019 at 07:03:47PM +0200, Shamir Rabinovitch wrote: > > On Tue, Jan 22, 2019 at 09:02:18AM -0700, Jason Gunthorpe wrote: > > > On Tue, Jan 22, 2019 at 05:43:42PM +0200, Shamir Rabinovitch wrote: > > > > > @@ -316,6 +317,7 @@ static int uverbs_process_attr(struct bundle_priv *pbundle, > > > > > if (IS_ERR(o_attr->uobject)) > > > > > return PTR_ERR(o_attr->uobject); > > > > > __set_bit(attr_bkey, pbundle->uobj_finalize); > > > > > + pbundle->bundle.ucontext = o_attr->uoboject.ucontext; > > > > > > > > > > if (spec->u.obj.access == UVERBS_ACCESS_NEW) { > > > > > unsigned int uattr_idx = uattr - pbundle->uattrs; > > > > > > > > > > And the cmd path needs some editing of its alloc/get macros > > > > > > > > > > > > OK. If I take ib_uverbs_alloc_pd as example I see: > > > > > > > > ib_uverbs_write -> ib_uverbs_alloc_pd -> uobj_alloc -> __uobj_alloc -> > > > > rdma_alloc_begin_uobject -> alloc_begin_idr_uobject -> alloc_uobj -> > > > > ib_uverbs_get_ucontext_file > > > > > > > > So it seems that the best place to pass the ib_ucontext to > > > > uverbs_attr_bundle is in __uobj_alloc in this call chain. > > > > > > Probably yes > > > > > > > I have this question about ib_ucontext in ib_uobject: > > > > > > > > I see this comment above ib_uverbs_get_ucontext_file: > > > > " > > > > * Must be called with the ufile->device->disassociate_srcu held, and the lock > > > > * must be held until use of the ucontext is finished. > > > > " > > > > > > > > Surly, when ib_uobject has pointer to ib_ucontext the above lock is > > > > not held for all the duration. > > > > > > The comment is only for ib_uverbs_get_ucontext_file() callers. > > > > > > But for the ucontext in the ib_uobject - all accesses to it have to be > > > covered under a srcu & UVERBS_LOOKUP_READ/WRITE or the > > > hw_destroy_rwsem to make sure the pointer is still > > > alive. uverbs_destroy_uobject() sets the ucontext to NULL during > > > disassociate, so every reader must have locking against that set. > > > > > > > Is it the fact that ib_uobject are only accessed via ib_uverbs_write > > > > for commands which hold the disassociate_srcu and that > > > > uverbs_disassociate_api_pre disable the commands for uverbs_api for > > > > that ib_uverbs_device ? > > > > > > Running under the commands is what provides the srcu lock, but this > > > test here in rdma_lookup_get_uobject: > > > > > > /* > > > * If we have been disassociated block every command except for > > > * DESTROY based commands. > > > */ > > > if (mode != UVERBS_LOOKUP_DESTROY && > > > !srcu_dereference(ufile->device->ib_dev, > > > &ufile->device->disassociate_srcu)) { > > > > > > ret = uverbs_try_lock_object(uobj, mode); > > > > > > Is what is excluding parallel dis-association. Once the per-uobject > > > read/write lock is obtained under the srcu the ucontext pointer cannot be > > > null'd. > > > > > > > Are there any paths that touch ib_uobject but not with in > > > > ib_uverbs_write protection ? > > > > > > Everything should use this system. > > > > > > > Is this related to the point that you do not want to assume ib_udata > > > > (&context) in the ib_xxx destroy path ? > > > > > > Not really.. the destroy path will still have a ucontext since > > > ucontext cannot be destroyed until all of the ubojects are destroyed, > > > but under sharing it might not be the same ucontext that was used to > > > create the object - so what use could destroy have of it? > > > > Create path can have same issue... 
If driver object hold information on > > the context and we want to share it to another context we have problem > > that we cannot duplicate per context information (we might have more > > then 2 context at that point so who is the up-to-date..?) > > Well, a driver might be able to get by if it was syemmtric (ie always release > the resource against the allocating PD) but it complicates ucontext > destruction. > > > Will it be acceptable to eliminate the use of ib_uobject in destroy APIs > > via the use of ib_udata embedded in attr bundle as next step? > > Yes, we can do that.. For non-sharable objects it would be no change > in behavior. > > > I think we need some way the rdma & uverbs could query the driver and > > ask if specific object type can be shared or not within that driver. > > Sound like that each driver need to supply this: > > > > bool is_shareable(enum uverbs_default_objects object) > > I think drivers will need to have a flags that says what objects they > can allow for sharing. Each driver will have to be audited before they > can support sharing unfortunately. > > > At same time I think that all the interface between uverbs - rdma - > > driver need to support sharing. The meaning is that we need a way to > > duplicate existing driver object and return the driver specific > > information to user. > > I don't think we should be duplicating the driver object, sharing > should be a uverbs thing where it just creates a new uobject and > points to the same driver object. Somehow this has to work out > a lifetime too. I think I was misunderstood here... Duplicating the object does not mean duplicating the ib_x HW object, but rather letting the driver copy the related information from the existing ib_x object to the user-space library. An example is the pdn number in mlx4 ib_pd creation. For this to happen, the create API (alloc_pd) needs to convey the existing ib_pd down to the mlx4 driver when the PD is shared. I think it's best to modify all the APIs between rdma-core and drivers so that an existing ib_x object can be attached to each creation API, so the interface supports sharing even if we decide not to implement it because no driver currently supports it. This involves a wide API change which I think is best done in a single commit. > > > Probably if driver want to enable share of some object it first need to > > move all the related data of this object out of the ib_ucontext and then > > add the code that duplicate the object and only then enable the sharing > > via 'is_shareable'. > > Most likely > > Jason
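[Editorial note] Purely as an illustration of the interface change floated in the message above, and not code from the series: a creation verb that also receives the already-existing object when the user asks to share it, so a driver like mlx4 could report the existing pdn back through udata instead of allocating a second HW PD. All names below are hypothetical.

#include <linux/errno.h>
#include <linux/kernel.h>
#include <rdma/ib_verbs.h>

/* Hypothetical driver-private PD, standing in for e.g. mlx4_ib_pd. */
struct sample_pd {
	struct ib_pd ibpd;
	u32 pdn;
};

/* Hypothetical response layout returned to the user-space library. */
struct sample_alloc_pd_resp {
	u32 pdn;
};

/*
 * Hypothetical creation verb that also receives the already-existing PD
 * (NULL when nothing is shared): on share, reuse the HW object and only
 * report its pdn back through udata instead of allocating a second HW PD.
 */
static int sample_alloc_or_share_pd(struct ib_pd *shared_pd,
				    struct ib_udata *udata)
{
	struct sample_alloc_pd_resp resp = {};

	if (shared_pd) {
		resp.pdn = container_of(shared_pd, struct sample_pd, ibpd)->pdn;
		return ib_copy_to_udata(udata, &resp, sizeof(resp));
	}

	/* non-shared path: allocate a fresh HW PD (omitted from this sketch) */
	return -EOPNOTSUPP;
}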
On Sun, Feb 03, 2019 at 06:43:31PM +0200, Shamir Rabinovitch wrote: > > I don't think we should be duplicating the driver object, sharing > > should be a uverbs thing where it just creates a new uobject and > > points to the same driver object. Somehow this has to work out > > a lifetime too. > > I think I was miss understood here... Duplicating the object is not > duplicating the ib_x HW object but rather letting the driver to copy the > related information from the existing ib_x object to the user-space > library. Example is the pdn number in mlx4 ib_bd creation. We should have a query API for drivers to call if they want to re-attach to shared objects. The query can return any driver specific information that might be needed. Maybe drivers that don't implement the query can't do the share? Jason
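[Editorial note] A sketch of the query/re-attach idea suggested here; the op name, argument list and -EOPNOTSUPP policy are assumptions, not an API that exists in the tree.

#include <linux/errno.h>
#include <rdma/ib_verbs.h>

/*
 * Hypothetical driver op: invoked when a second ucontext attaches to an
 * already-created object.  The driver fills the udata response with the
 * driver-specific details the new user-space library needs (e.g. the mlx4
 * pdn) without creating a new HW object.
 */
struct sample_share_ops {
	int (*query_shared_object)(struct ib_uobject *uobj,
				   struct ib_udata *udata);
};

/* Drivers that do not implement the query simply cannot share. */
static int sample_attach_shared_object(const struct sample_share_ops *ops,
				       struct ib_uobject *uobj,
				       struct ib_udata *udata)
{
	if (!ops->query_shared_object)
		return -EOPNOTSUPP;

	return ops->query_shared_object(uobj, udata);
}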
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c index 9bc637e49faa..d811efe49e77 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c @@ -733,11 +733,16 @@ struct ib_ah *bnxt_re_create_ah(struct ib_pd *ib_pd, /* Write AVID to shared page. */ if (udata) { - struct ib_ucontext *ib_uctx = ib_pd->uobject->context; + struct ib_ucontext *ib_uctx; struct bnxt_re_ucontext *uctx; unsigned long flag; u32 *wrptr; + ib_uctx = rdma_get_ucontext(udata); + if (IS_ERR(ib_uctx)) { + rc = PTR_ERR(ib_uctx); + goto fail; + } uctx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx); spin_lock_irqsave(&uctx->sh_lock, flag); wrptr = (u32 *)(uctx->shpg + BNXT_RE_AVID_OFFT); @@ -883,10 +888,15 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd, struct bnxt_qplib_qp *qplib_qp = &qp->qplib_qp; struct ib_umem *umem; int bytes = 0; - struct ib_ucontext *context = pd->ib_pd.uobject->context; - struct bnxt_re_ucontext *cntx = container_of(context, - struct bnxt_re_ucontext, - ib_uctx); + struct bnxt_re_ucontext *cntx; + struct ib_ucontext *context; + + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) + return PTR_ERR(context); + + cntx = container_of(context, struct bnxt_re_ucontext, ib_uctx); + if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) return -EFAULT; @@ -1360,10 +1370,15 @@ static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev, struct bnxt_qplib_srq *qplib_srq = &srq->qplib_srq; struct ib_umem *umem; int bytes = 0; - struct ib_ucontext *context = pd->ib_pd.uobject->context; - struct bnxt_re_ucontext *cntx = container_of(context, - struct bnxt_re_ucontext, - ib_uctx); + struct bnxt_re_ucontext *cntx; + struct ib_ucontext *context; + + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) + return PTR_ERR(context); + + cntx = container_of(context, struct bnxt_re_ucontext, ib_uctx); + if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) return -EFAULT; diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 07c20cd07f33..b692cd2bbe92 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -796,6 +796,7 @@ static struct ib_qp *iwch_create_qp(struct ib_pd *pd, struct iwch_cq *schp; struct iwch_cq *rchp; struct iwch_create_qp_resp uresp; + struct ib_ucontext *ib_ucontext; int wqsize, sqsize, rqsize; struct iwch_ucontext *ucontext; @@ -836,7 +837,8 @@ static struct ib_qp *iwch_create_qp(struct ib_pd *pd, * Kernel users need more wq space for fastreg WRs which can take * 2 WR fragments. */ - ucontext = udata ? to_iwch_ucontext(pd->uobject->context) : NULL; + ib_ucontext = rdma_get_ucontext(udata); + ucontext = IS_ERR(ib_ucontext) ? 
NULL : to_iwch_ucontext(ib_ucontext); if (!ucontext && wqsize < (rqsize + (2 * sqsize))) wqsize = roundup_pow_of_two(rqsize + roundup_pow_of_two(attrs->cap.max_send_wr * 2)); diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c index 03f4c66c2659..d2c452b4282c 100644 --- a/drivers/infiniband/hw/cxgb4/qp.c +++ b/drivers/infiniband/hw/cxgb4/qp.c @@ -2137,6 +2137,7 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs, struct c4iw_create_qp_resp uresp; unsigned int sqsize, rqsize = 0; struct c4iw_ucontext *ucontext; + struct ib_ucontext *ib_ucontext; int ret; struct c4iw_mm_entry *sq_key_mm, *rq_key_mm = NULL, *sq_db_key_mm; struct c4iw_mm_entry *rq_db_key_mm = NULL, *ma_sync_key_mm = NULL; @@ -2170,7 +2171,8 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs, if (sqsize < 8) sqsize = 8; - ucontext = udata ? to_c4iw_ucontext(pd->uobject->context) : NULL; + ib_ucontext = rdma_get_ucontext(udata); + ucontext = IS_ERR(ib_ucontext) ? NULL : to_c4iw_ucontext(ib_ucontext); qhp = kzalloc(sizeof(*qhp), GFP_KERNEL); if (!qhp) @@ -2697,6 +2699,7 @@ struct ib_srq *c4iw_create_srq(struct ib_pd *pd, struct ib_srq_init_attr *attrs, struct c4iw_create_srq_resp uresp; struct c4iw_ucontext *ucontext; struct c4iw_mm_entry *srq_key_mm, *srq_db_key_mm; + struct ib_ucontext *ib_ucontext; int rqsize; int ret; int wr_len; @@ -2719,7 +2722,8 @@ struct ib_srq *c4iw_create_srq(struct ib_pd *pd, struct ib_srq_init_attr *attrs, rqsize = attrs->attr.max_wr + 1; rqsize = roundup_pow_of_two(max_t(u16, rqsize, 16)); - ucontext = udata ? to_c4iw_ucontext(pd->uobject->context) : NULL; + ib_ucontext = rdma_get_ucontext(udata); + ucontext = IS_ERR(ib_ucontext) ? NULL : to_c4iw_ucontext(ib_ucontext); srq = kzalloc(sizeof(*srq), GFP_KERNEL); if (!srq) diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c index accf9ce1507d..b9c25eaf2d75 100644 --- a/drivers/infiniband/hw/hns/hns_roce_qp.c +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c @@ -542,12 +542,19 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, struct device *dev = hr_dev->dev; struct hns_roce_ib_create_qp ucmd; struct hns_roce_ib_create_qp_resp resp = {}; + struct ib_ucontext *ucontext; unsigned long qpn = 0; int ret = 0; u32 page_shift; u32 npages; int i; + ucontext = rdma_get_ucontext(udata); + if (IS_ERR(ucontext)) { + ret = PTR_ERR(ucontext); + goto err_out; + } + mutex_init(&hr_qp->mutex); spin_lock_init(&hr_qp->sq.lock); spin_lock_init(&hr_qp->rq.lock); @@ -653,7 +660,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, (udata->outlen >= sizeof(resp)) && hns_roce_qp_has_sq(init_attr)) { ret = hns_roce_db_map_user( - to_hr_ucontext(ib_pd->uobject->context), udata, + to_hr_ucontext(ucontext), udata, ucmd.sdb_addr, &hr_qp->sdb); if (ret) { dev_err(dev, "sq record doorbell map failed!\n"); @@ -669,7 +676,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, (udata->outlen >= sizeof(resp)) && hns_roce_qp_has_rq(init_attr)) { ret = hns_roce_db_map_user( - to_hr_ucontext(ib_pd->uobject->context), udata, + to_hr_ucontext(ucontext), udata, ucmd.db_addr, &hr_qp->rdb); if (ret) { dev_err(dev, "rq record doorbell map failed!\n"); @@ -815,7 +822,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, (udata->outlen >= sizeof(resp)) && hns_roce_qp_has_rq(init_attr)) hns_roce_db_unmap_user( - to_hr_ucontext(ib_pd->uobject->context), + to_hr_ucontext(ucontext), &hr_qp->rdb); } else { 
kfree(hr_qp->sq.wrid); @@ -829,7 +836,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, (udata->outlen >= sizeof(resp)) && hns_roce_qp_has_sq(init_attr)) hns_roce_db_unmap_user( - to_hr_ucontext(ib_pd->uobject->context), + to_hr_ucontext(ucontext), &hr_qp->sdb); err_mtt: diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c index 12b31a8440be..194cd911c9de 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c @@ -580,11 +580,16 @@ static struct ib_qp *i40iw_create_qp(struct ib_pd *ibpd, struct i40iw_create_qp_info *qp_info; struct i40iw_cqp_request *cqp_request; struct cqp_commands_info *cqp_info; + struct ib_ucontext *ib_ucontext; struct i40iw_qp_host_ctx_info *ctx_info; struct i40iwarp_offload_info *iwarp_info; unsigned long flags; + ib_ucontext = rdma_get_ucontext(udata); + if (udata && IS_ERR(ib_ucontext)) + return ERR_CAST(ib_ucontext); + if (iwdev->closing) return ERR_PTR(-ENODEV); @@ -674,7 +679,7 @@ static struct ib_qp *i40iw_create_qp(struct ib_pd *ibpd, } iwqp->ctx_info.qp_compl_ctx = req.user_compl_ctx; iwqp->user_mode = 1; - ucontext = to_ucontext(ibpd->uobject->context); + ucontext = to_ucontext(ib_ucontext); if (req.user_wqe_buffers) { struct i40iw_pbl *iwpbl; @@ -1832,6 +1837,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, struct i40iw_pd *iwpd = to_iwpd(pd); struct i40iw_device *iwdev = to_iwdev(pd->device); struct i40iw_ucontext *ucontext; + struct ib_ucontext *ib_ucontext; struct i40iw_pble_alloc *palloc; struct i40iw_pbl *iwpbl; struct i40iw_mr *iwmr; @@ -1847,6 +1853,12 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, int ret; int pg_shift; + ib_ucontext = rdma_get_ucontext(udata); + if (IS_ERR(ib_ucontext)) + return ERR_CAST(ib_ucontext); + + ucontext = to_ucontext(ib_ucontext); + if (iwdev->closing) return ERR_PTR(-ENODEV); @@ -1872,7 +1884,6 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, iwmr->region = region; iwmr->ibmr.pd = pd; iwmr->ibmr.device = pd->device; - ucontext = to_ucontext(pd->uobject->context); iwmr->page_size = PAGE_SIZE; iwmr->page_msk = PAGE_MASK; diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c index 3aab71b29ce8..1a4c6d3f8078 100644 --- a/drivers/infiniband/hw/mlx4/doorbell.c +++ b/drivers/infiniband/hw/mlx4/doorbell.c @@ -48,6 +48,9 @@ int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_user_db_page *page; int err = 0; + if (!context) + return -EINVAL; + mutex_lock(&context->db_page_mutex); list_for_each_entry(page, &context->db_page_list, list) @@ -84,6 +87,9 @@ int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db) { + if (WARN_ON(!context)) + return; + mutex_lock(&context->db_page_mutex); if (!--db->u.user_page->refcnt) { diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index 56639ecd53ad..17744bb1b7a0 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -367,8 +367,7 @@ int mlx4_ib_umem_calc_optimal_mtt_size(struct ib_umem *umem, u64 start_va, return block_shift; } -static struct ib_umem *mlx4_get_umem_mr(struct ib_ucontext *context, - struct ib_udata *udata, u64 start, +static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, u64 length, u64 virt_addr, int access_flags) { @@ -416,7 +415,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, if 
(!mr) return ERR_PTR(-ENOMEM); - mr->umem = mlx4_get_umem_mr(pd->uobject->context, udata, start, length, + mr->umem = mlx4_get_umem_mr(udata, start, length, virt_addr, access_flags); if (IS_ERR(mr->umem)) { err = PTR_ERR(mr->umem); @@ -507,8 +506,8 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags, mlx4_mr_rereg_mem_cleanup(dev->dev, &mmr->mmr); ib_umem_release(mmr->umem); mmr->umem = - mlx4_get_umem_mr(mr->uobject->context, udata, start, - length, virt_addr, mr_access_flags); + mlx4_get_umem_mr(udata, start, length, virt_addr, + mr_access_flags); if (IS_ERR(mmr->umem)) { err = PTR_ERR(mmr->umem); /* Prevent mlx4_ib_dereg_mr from free'ing invalid pointer */ diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index e38bab50cecf..91bb55dd93af 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -52,7 +52,8 @@ static void mlx4_ib_lock_cqs(struct mlx4_ib_cq *send_cq, struct mlx4_ib_cq *recv_cq); static void mlx4_ib_unlock_cqs(struct mlx4_ib_cq *send_cq, struct mlx4_ib_cq *recv_cq); -static int _mlx4_ib_modify_wq(struct ib_wq *ibwq, enum ib_wq_state new_state); +static int _mlx4_ib_modify_wq(struct ib_wq *ibwq, enum ib_wq_state new_statem, + struct ib_udata *udata); enum { MLX4_IB_ACK_REQ_FREQ = 8, @@ -778,10 +779,15 @@ static struct ib_qp *_mlx4_ib_create_qp_rss(struct ib_pd *pd, static int mlx4_ib_alloc_wqn(struct mlx4_ib_ucontext *context, struct mlx4_ib_qp *qp, int range_size, int *wqn) { - struct mlx4_ib_dev *dev = to_mdev(context->ibucontext.device); struct mlx4_wqn_range *range; + struct mlx4_ib_dev *dev; int err = 0; + if (!context) + return -EINVAL; + + dev = to_mdev(context->ibucontext.device); + mutex_lock(&context->wqn_ranges_mutex); range = list_first_entry_or_null(&context->wqn_ranges_list, @@ -828,8 +834,13 @@ static int mlx4_ib_alloc_wqn(struct mlx4_ib_ucontext *context, static void mlx4_ib_release_wqn(struct mlx4_ib_ucontext *context, struct mlx4_ib_qp *qp, bool dirty_release) { - struct mlx4_ib_dev *dev = to_mdev(context->ibucontext.device); struct mlx4_wqn_range *range; + struct mlx4_ib_dev *dev; + + if (!WARN_ON(context)) + return; + + dev = to_mdev(context->ibucontext.device); mutex_lock(&context->wqn_ranges_mutex); @@ -867,6 +878,11 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, struct mlx4_ib_cq *mcq; unsigned long flags; int range_size = 0; + struct mlx4_ib_ucontext *context; + struct ib_ucontext *ib_ucontext; + + ib_ucontext = rdma_get_ucontext(udata); + context = IS_ERR(ib_ucontext) ? NULL : to_mucontext(ib_ucontext); /* When tunneling special qps, we use a plain UD qp */ if (sqpn) { @@ -1038,7 +1054,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, if (qp_has_rq(init_attr)) { err = mlx4_ib_db_map_user( - to_mucontext(pd->uobject->context), udata, + context, udata, (src == MLX4_IB_QP_SRC) ? 
ucmd.qp.db_addr : ucmd.wq.db_addr, &qp->db); @@ -1112,8 +1128,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } } } else if (src == MLX4_IB_RWQ_SRC) { - err = mlx4_ib_alloc_wqn(to_mucontext(pd->uobject->context), qp, - range_size, &qpn); + err = mlx4_ib_alloc_wqn(context, qp, range_size, &qpn); if (err) goto err_wrid; } else { @@ -1184,8 +1199,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, if (qp->flags & MLX4_IB_QP_NETIF) mlx4_ib_steer_qp_free(dev, qpn, 1); else if (src == MLX4_IB_RWQ_SRC) - mlx4_ib_release_wqn(to_mucontext(pd->uobject->context), - qp, 0); + mlx4_ib_release_wqn(context, qp, 0); else mlx4_qp_release_range(dev->dev, qpn, 1); } @@ -1195,7 +1209,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, err_wrid: if (udata) { if (qp_has_rq(init_attr)) - mlx4_ib_db_unmap_user(to_mucontext(pd->uobject->context), &qp->db); + mlx4_ib_db_unmap_user(context, &qp->db); } else { kvfree(qp->sq.wrid); kvfree(qp->rq.wrid); @@ -1942,7 +1956,8 @@ static u8 gid_type_to_qpc(enum ib_gid_type gid_type) * Go over all RSS QP's childes (WQs) and apply their HW state according to * their logic state if the RSS QP is the first RSS QP associated for the WQ. */ -static int bringup_rss_rwqs(struct ib_rwq_ind_table *ind_tbl, u8 port_num) +static int bringup_rss_rwqs(struct ib_rwq_ind_table *ind_tbl, u8 port_num, + struct ib_udata *udata) { int err = 0; int i; @@ -1966,7 +1981,7 @@ static int bringup_rss_rwqs(struct ib_rwq_ind_table *ind_tbl, u8 port_num) } wq->port = port_num; if ((wq->rss_usecnt == 0) && (ibwq->state == IB_WQS_RDY)) { - err = _mlx4_ib_modify_wq(ibwq, IB_WQS_RDY); + err = _mlx4_ib_modify_wq(ibwq, IB_WQS_RDY, udata); if (err) { mutex_unlock(&wq->mutex); break; @@ -1988,7 +2003,8 @@ static int bringup_rss_rwqs(struct ib_rwq_ind_table *ind_tbl, u8 port_num) if ((wq->rss_usecnt == 1) && (ibwq->state == IB_WQS_RDY)) - if (_mlx4_ib_modify_wq(ibwq, IB_WQS_RESET)) + if (_mlx4_ib_modify_wq(ibwq, IB_WQS_RESET, + udata)) pr_warn("failed to reverse WQN=0x%06x\n", ibwq->wq_num); wq->rss_usecnt--; @@ -2000,7 +2016,8 @@ static int bringup_rss_rwqs(struct ib_rwq_ind_table *ind_tbl, u8 port_num) return err; } -static void bring_down_rss_rwqs(struct ib_rwq_ind_table *ind_tbl) +static void bring_down_rss_rwqs(struct ib_rwq_ind_table *ind_tbl, + struct ib_udata *udata) { int i; @@ -2011,7 +2028,7 @@ static void bring_down_rss_rwqs(struct ib_rwq_ind_table *ind_tbl) mutex_lock(&wq->mutex); if ((wq->rss_usecnt == 1) && (ibwq->state == IB_WQS_RDY)) - if (_mlx4_ib_modify_wq(ibwq, IB_WQS_RESET)) + if (_mlx4_ib_modify_wq(ibwq, IB_WQS_RESET, udata)) pr_warn("failed to reverse WQN=%x\n", ibwq->wq_num); wq->rss_usecnt--; @@ -2043,12 +2060,14 @@ static void fill_qp_rss_context(struct mlx4_qp_context *context, static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, const struct ib_qp_attr *attr, int attr_mask, - enum ib_qp_state cur_state, enum ib_qp_state new_state) + enum ib_qp_state cur_state, + enum ib_qp_state new_state, + struct ib_udata *udata) { - struct ib_uobject *ibuobject; struct ib_srq *ibsrq; const struct ib_gid_attr *gid_attr = NULL; struct ib_rwq_ind_table *rwq_ind_tbl; + struct ib_ucontext *ibucontext; enum ib_qp_type qp_type; struct mlx4_ib_dev *dev; struct mlx4_ib_qp *qp; @@ -2065,7 +2084,6 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, struct ib_wq *ibwq; ibwq = (struct ib_wq *)src; - ibuobject = ibwq->uobject; ibsrq = NULL; rwq_ind_tbl = NULL; qp_type = 
IB_QPT_RAW_PACKET; @@ -2076,7 +2094,6 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, struct ib_qp *ibqp; ibqp = (struct ib_qp *)src; - ibuobject = ibqp->uobject; ibsrq = ibqp->srq; rwq_ind_tbl = ibqp->rwq_ind_tbl; qp_type = ibqp->qp_type; @@ -2161,12 +2178,17 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, context->param3 |= cpu_to_be32(1 << 30); } - if (ibuobject) + if (udata) { + ibucontext = rdma_get_ucontext(udata); + if (IS_ERR(ibucontext)) { + err = PTR_ERR(ibucontext); + goto out; + } context->usr_page = cpu_to_be32( mlx4_to_hw_uar_index(dev->dev, - to_mucontext(ibuobject->context) + to_mucontext(ibucontext) ->uar.index)); - else + } else context->usr_page = cpu_to_be32( mlx4_to_hw_uar_index(dev->dev, dev->priv_uar.index)); @@ -2297,7 +2319,7 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, context->cqn_recv = cpu_to_be32(recv_cq->mcq.cqn); /* Set "fast registration enabled" for all kernel QPs */ - if (!ibuobject) + if (!udata) context->params1 |= cpu_to_be32(1 << 11); if (attr_mask & IB_QP_RNR_RETRY) { @@ -2434,7 +2456,7 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, else sqd_event = 0; - if (!ibuobject && + if (!udata && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) context->rlkey_roce_mode |= (1 << 4); @@ -2445,7 +2467,7 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, * headroom is stamped so that the hardware doesn't start * processing stale work requests. */ - if (!ibuobject && + if (!udata && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) { struct mlx4_wqe_ctrl_seg *ctrl; @@ -2509,7 +2531,7 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, * entries and reinitialize the QP. */ if (new_state == IB_QPS_RESET) { - if (!ibuobject) { + if (!udata) { mlx4_ib_cq_clean(recv_cq, qp->mqp.qpn, ibsrq ? 
to_msrq(ibsrq) : NULL); if (send_cq != recv_cq) @@ -2735,16 +2757,17 @@ static int _mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, } if (ibqp->rwq_ind_tbl && (new_state == IB_QPS_INIT)) { - err = bringup_rss_rwqs(ibqp->rwq_ind_tbl, attr->port_num); + err = bringup_rss_rwqs(ibqp->rwq_ind_tbl, attr->port_num, + udata); if (err) goto out; } err = __mlx4_ib_modify_qp(ibqp, MLX4_IB_QP_SRC, attr, attr_mask, - cur_state, new_state); + cur_state, new_state, udata); if (ibqp->rwq_ind_tbl && err) - bring_down_rss_rwqs(ibqp->rwq_ind_tbl); + bring_down_rss_rwqs(ibqp->rwq_ind_tbl, udata); if (mlx4_is_bonded(dev->dev) && (attr_mask & IB_QP_PORT)) attr->port_num = 1; @@ -4122,7 +4145,8 @@ static int ib_wq2qp_state(enum ib_wq_state state) } } -static int _mlx4_ib_modify_wq(struct ib_wq *ibwq, enum ib_wq_state new_state) +static int _mlx4_ib_modify_wq(struct ib_wq *ibwq, enum ib_wq_state new_state, + struct ib_udata *udata) { struct mlx4_ib_qp *qp = to_mqp((struct ib_qp *)ibwq); enum ib_qp_state qp_cur_state; @@ -4146,7 +4170,8 @@ static int _mlx4_ib_modify_wq(struct ib_wq *ibwq, enum ib_wq_state new_state) attr_mask = IB_QP_PORT; err = __mlx4_ib_modify_qp(ibwq, MLX4_IB_RWQ_SRC, &attr, - attr_mask, IB_QPS_RESET, IB_QPS_INIT); + attr_mask, IB_QPS_RESET, IB_QPS_INIT, + udata); if (err) { pr_debug("WQN=0x%06x failed to apply RST->INIT on the HW QP\n", ibwq->wq_num); @@ -4158,12 +4183,13 @@ static int _mlx4_ib_modify_wq(struct ib_wq *ibwq, enum ib_wq_state new_state) attr_mask = 0; err = __mlx4_ib_modify_qp(ibwq, MLX4_IB_RWQ_SRC, NULL, attr_mask, - qp_cur_state, qp_new_state); + qp_cur_state, qp_new_state, udata); if (err && (qp_cur_state == IB_QPS_INIT)) { qp_new_state = IB_QPS_RESET; if (__mlx4_ib_modify_qp(ibwq, MLX4_IB_RWQ_SRC, NULL, - attr_mask, IB_QPS_INIT, IB_QPS_RESET)) { + attr_mask, IB_QPS_INIT, IB_QPS_RESET, + udata)) { pr_warn("WQN=0x%06x failed with reverting HW's resources failure\n", ibwq->wq_num); qp_new_state = IB_QPS_INIT; @@ -4226,7 +4252,7 @@ int mlx4_ib_modify_wq(struct ib_wq *ibwq, struct ib_wq_attr *wq_attr, * WQ, so we can apply its port on the WQ. 
*/ if (qp->rss_usecnt) - err = _mlx4_ib_modify_wq(ibwq, new_state); + err = _mlx4_ib_modify_wq(ibwq, new_state, udata); if (!err) ibwq->state = new_state; diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index 498588eac051..0551e6732d22 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -76,6 +76,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, struct mlx4_ib_srq *srq; struct mlx4_wqe_srq_next_seg *next; struct mlx4_wqe_data_seg *scatter; + struct ib_ucontext *ib_ucontext; u32 cqn; u16 xrcdn; int desc_size; @@ -108,6 +109,12 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, if (udata) { struct mlx4_ib_create_srq ucmd; + ib_ucontext = rdma_get_ucontext(udata); + if (IS_ERR(ib_ucontext)) { + err = PTR_ERR(ib_ucontext); + goto err_srq; + } + if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) { err = -EFAULT; goto err_srq; @@ -128,8 +135,8 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, if (err) goto err_mtt; - err = mlx4_ib_db_map_user(to_mucontext(pd->uobject->context), - udata, ucmd.db_addr, &srq->db); + err = mlx4_ib_db_map_user(to_mucontext(ib_ucontext), udata, + ucmd.db_addr, &srq->db); if (err) goto err_mtt; } else { @@ -202,7 +209,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, err_wrid: if (udata) - mlx4_ib_db_unmap_user(to_mucontext(pd->uobject->context), &srq->db); + mlx4_ib_db_unmap_user(to_mucontext(ib_ucontext), &srq->db); else kvfree(srq->wrid); diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 529e76f67cb6..ccaa88a4eb9f 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -696,12 +696,17 @@ static int create_user_rq(struct mlx5_ib_dev *dev, struct ib_pd *pd, struct ib_udata *udata, struct mlx5_ib_rwq *rwq, struct mlx5_ib_create_wq *ucmd) { + struct ib_ucontext *ib_ucontext; int page_shift = 0; int npages; u32 offset = 0; int ncont = 0; int err; + ib_ucontext = rdma_get_ucontext(udata); + if (IS_ERR(ib_ucontext)) + return PTR_ERR(ib_ucontext); + if (!ucmd->buf_addr) return -EINVAL; @@ -730,7 +735,7 @@ static int create_user_rq(struct mlx5_ib_dev *dev, struct ib_pd *pd, (unsigned long long)ucmd->buf_addr, rwq->buf_size, npages, page_shift, ncont, offset); - err = mlx5_ib_db_map_user(to_mucontext(pd->uobject->context), udata, + err = mlx5_ib_db_map_user(to_mucontext(ib_ucontext), udata, ucmd->db_addr, &rwq->db); if (err) { mlx5_ib_dbg(dev, "map failed\n"); @@ -759,6 +764,7 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd, struct mlx5_ib_create_qp_resp *resp, int *inlen, struct mlx5_ib_qp_base *base) { + struct ib_ucontext *ib_ucontext; struct mlx5_ib_ucontext *context; struct mlx5_ib_create_qp ucmd; struct mlx5_ib_ubuffer *ubuffer = &base->ubuffer; @@ -773,13 +779,17 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd, int err; u16 uid; + ib_ucontext = rdma_get_ucontext(udata); + if (IS_ERR(ib_ucontext)) + return PTR_ERR(ib_ucontext); + err = ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)); if (err) { mlx5_ib_dbg(dev, "copy failed\n"); return err; } - context = to_mucontext(pd->uobject->context); + context = to_mucontext(ib_ucontext); if (ucmd.flags & MLX5_QP_FLAG_BFREG_INDEX) { uar_index = bfregn_to_uar_index(dev, &context->bfregi, ucmd.bfreg_index, true); @@ -1818,6 +1828,7 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd, int inlen = MLX5_ST_SZ_BYTES(create_qp_in); struct mlx5_core_dev *mdev = dev->mdev; struct mlx5_ib_create_qp_resp resp = {}; + 
struct ib_ucontext *context; struct mlx5_ib_cq *send_cq; struct mlx5_ib_cq *recv_cq; unsigned long flags; @@ -1918,8 +1929,12 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd, MLX5_QP_FLAG_PACKET_BASED_CREDIT_MODE)) return -EINVAL; - err = get_qp_user_index(to_mucontext(pd->uobject->context), - &ucmd, udata->inlen, &uidx); + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) + return PTR_ERR(context); + + err = get_qp_user_index(to_mucontext(context), &ucmd, + udata->inlen, &uidx); if (err) return err; @@ -2403,8 +2418,10 @@ static const char *ib_qp_type_str(enum ib_qp_type type) static struct ib_qp *mlx5_ib_create_dct(struct ib_pd *pd, struct ib_qp_init_attr *attr, - struct mlx5_ib_create_qp *ucmd) + struct mlx5_ib_create_qp *ucmd, + struct ib_udata *udata) { + struct ib_ucontext *context; struct mlx5_ib_qp *qp; int err = 0; u32 uidx = MLX5_IB_DEFAULT_UIDX; @@ -2413,8 +2430,12 @@ static struct ib_qp *mlx5_ib_create_dct(struct ib_pd *pd, if (!attr->srq || !attr->recv_cq) return ERR_PTR(-EINVAL); - err = get_qp_user_index(to_mucontext(pd->uobject->context), - ucmd, sizeof(*ucmd), &uidx); + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) + return ERR_CAST(context); + + err = get_qp_user_index(to_mucontext(context), ucmd, sizeof(*ucmd), + &uidx); if (err) return ERR_PTR(err); @@ -2490,6 +2511,7 @@ struct ib_qp *mlx5_ib_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *verbs_init_attr, struct ib_udata *udata) { + struct ib_ucontext *context; struct mlx5_ib_dev *dev; struct mlx5_ib_qp *qp; u16 xrcdn = 0; @@ -2504,9 +2526,16 @@ struct ib_qp *mlx5_ib_create_qp(struct ib_pd *pd, if (!udata) { mlx5_ib_dbg(dev, "Raw Packet QP is not supported for kernel consumers\n"); return ERR_PTR(-EINVAL); - } else if (!to_mucontext(pd->uobject->context)->cqe_version) { - mlx5_ib_dbg(dev, "Raw Packet QP is only supported for CQE version > 0\n"); - return ERR_PTR(-EINVAL); + } else { + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) + return ERR_CAST(context); + + if (!to_mucontext(context)-> + cqe_version) { + mlx5_ib_dbg(dev, "Raw Packet QP is only supported for CQE version > 0\n"); + return ERR_PTR(-EINVAL); + } } } } else { @@ -2536,7 +2565,7 @@ struct ib_qp *mlx5_ib_create_qp(struct ib_pd *pd, return ERR_PTR(-EINVAL); } } else { - return mlx5_ib_create_dct(pd, init_attr, &ucmd); + return mlx5_ib_create_dct(pd, init_attr, &ucmd, udata); } } @@ -3174,13 +3203,16 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, static unsigned int get_tx_affinity(struct mlx5_ib_dev *dev, struct mlx5_ib_pd *pd, struct mlx5_ib_qp_base *qp_base, - u8 port_num) + u8 port_num, + struct ib_udata *udata) { struct mlx5_ib_ucontext *ucontext = NULL; unsigned int tx_port_affinity; + struct ib_ucontext *context; - if (pd && pd->ibpd.uobject && pd->ibpd.uobject->context) - ucontext = to_mucontext(pd->ibpd.uobject->context); + context = rdma_get_ucontext(udata); + if (!IS_ERR(context)) + ucontext = to_mucontext(context); if (ucontext) { tx_port_affinity = (unsigned int)atomic_add_return( @@ -3205,7 +3237,8 @@ static unsigned int get_tx_affinity(struct mlx5_ib_dev *dev, static int __mlx5_ib_modify_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr, int attr_mask, enum ib_qp_state cur_state, enum ib_qp_state new_state, - const struct mlx5_ib_modify_qp *ucmd) + const struct mlx5_ib_modify_qp *ucmd, + struct ib_udata *udata) { static const u16 optab[MLX5_QP_NUM_STATE][MLX5_QP_NUM_STATE] = { [MLX5_QP_STATE_RST] = { @@ -3296,7 +3329,8 @@ static int 
__mlx5_ib_modify_qp(struct ib_qp *ibqp, (ibqp->qp_type == IB_QPT_XRC_TGT)) { if (dev->lag_active) { u8 p = mlx5_core_native_port_num(dev->mdev); - tx_affinity = get_tx_affinity(dev, pd, base, p); + tx_affinity = get_tx_affinity(dev, pd, base, p, + udata); context->flags |= cpu_to_be32(tx_affinity << 24); } } @@ -3779,7 +3813,7 @@ int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, } err = __mlx5_ib_modify_qp(ibqp, attr, attr_mask, cur_state, - new_state, &ucmd); + new_state, &ucmd, udata); out: mutex_unlock(&qp->mutex); diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c index 22bd774e0b4e..827e58c729a6 100644 --- a/drivers/infiniband/hw/mlx5/srq.c +++ b/drivers/infiniband/hw/mlx5/srq.c @@ -47,6 +47,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq, { struct mlx5_ib_dev *dev = to_mdev(pd->device); struct mlx5_ib_create_srq ucmd = {}; + struct ib_ucontext *context; size_t ucmdlen; int err; int npages; @@ -55,6 +56,10 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq, u32 offset; u32 uidx = MLX5_IB_DEFAULT_UIDX; + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) + return PTR_ERR(context); + ucmdlen = min(udata->inlen, sizeof(ucmd)); if (ib_copy_from_udata(&ucmd, udata, ucmdlen)) { @@ -71,8 +76,8 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq, return -EINVAL; if (in->type != IB_SRQT_BASIC) { - err = get_srq_user_index(to_mucontext(pd->uobject->context), - &ucmd, udata->inlen, &uidx); + err = get_srq_user_index(to_mucontext(context), &ucmd, + udata->inlen, &uidx); if (err) return err; } @@ -103,8 +108,8 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq, mlx5_ib_populate_pas(dev, srq->umem, page_shift, in->pas, 0); - err = mlx5_ib_db_map_user(to_mucontext(pd->uobject->context), udata, - ucmd.db_addr, &srq->db); + err = mlx5_ib_db_map_user(to_mucontext(context), udata, ucmd.db_addr, + &srq->db); if (err) { mlx5_ib_dbg(dev, "map doorbell failed\n"); goto err_in; diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index 63003b4d2485..4d6a17bda8b4 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -446,6 +446,7 @@ static struct ib_srq *mthca_create_srq(struct ib_pd *pd, { struct mthca_create_srq ucmd; struct mthca_ucontext *context = NULL; + struct ib_ucontext *ib_ucontext; struct mthca_srq *srq; int err; @@ -457,7 +458,12 @@ static struct ib_srq *mthca_create_srq(struct ib_pd *pd, return ERR_PTR(-ENOMEM); if (udata) { - context = to_mucontext(pd->uobject->context); + ib_ucontext = rdma_get_ucontext(udata); + if (IS_ERR(ib_ucontext)) { + err = PTR_ERR(ib_ucontext); + goto err_free; + } + context = to_mucontext(ib_ucontext); if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) { err = -EFAULT; @@ -520,6 +526,7 @@ static struct ib_qp *mthca_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *init_attr, struct ib_udata *udata) { + struct ib_ucontext *ib_ucontext; struct mthca_create_qp ucmd; struct mthca_qp *qp; int err; @@ -532,14 +539,17 @@ static struct ib_qp *mthca_create_qp(struct ib_pd *pd, case IB_QPT_UC: case IB_QPT_UD: { - struct mthca_ucontext *context; + struct mthca_ucontext *context = NULL; qp = kmalloc(sizeof *qp, GFP_KERNEL); if (!qp) return ERR_PTR(-ENOMEM); if (udata) { - context = to_mucontext(pd->uobject->context); + ib_ucontext = rdma_get_ucontext(udata); + if (IS_ERR(ib_ucontext)) + return ERR_CAST(ib_ucontext); + context = 
to_mucontext(ib_ucontext); if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) { kfree(qp); @@ -577,9 +587,7 @@ static struct ib_qp *mthca_create_qp(struct ib_pd *pd, init_attr->qp_type, init_attr->sq_sig_type, &init_attr->cap, qp, udata); - if (err && udata) { - context = to_mucontext(pd->uobject->context); - + if (err && context) { mthca_unmap_user_db(to_mdev(pd->device), &context->uar, context->db_tab, @@ -907,6 +915,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt, int acc, struct ib_udata *udata) { struct mthca_dev *dev = to_mdev(pd->device); + struct ib_ucontext *context; struct scatterlist *sg; struct mthca_mr *mr; struct mthca_reg_mr ucmd; @@ -917,12 +926,15 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, int write_mtt_size; if (udata->inlen < sizeof ucmd) { - if (!to_mucontext(pd->uobject->context)->reg_mr_warned) { + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) + return ERR_CAST(context); + if (!to_mucontext(context)->reg_mr_warned) { mthca_warn(dev, "Process '%s' did not pass in MR attrs.\n", current->comm); mthca_warn(dev, " Update libmthca to fix this.\n"); } - ++to_mucontext(pd->uobject->context)->reg_mr_warned; + ++to_mucontext(context)->reg_mr_warned; ucmd.mr_attrs = 0; } else if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) return ERR_PTR(-EFAULT); diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 4e5b5cc17f1d..ea0ee6b5572c 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -554,7 +554,9 @@ static int mthca_path_set(struct mthca_dev *dev, const struct rdma_ah_attr *ah, static int __mthca_modify_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr, int attr_mask, - enum ib_qp_state cur_state, enum ib_qp_state new_state) + enum ib_qp_state cur_state, + enum ib_qp_state new_state, + struct ib_udata *udata) { struct mthca_dev *dev = to_mdev(ibqp->device); struct mthca_qp *qp = to_mqp(ibqp); @@ -563,6 +565,7 @@ static int __mthca_modify_qp(struct ib_qp *ibqp, struct mthca_qp_context *qp_context; u32 sqd_event = 0; int err = -EINVAL; + struct ib_ucontext *context; mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL); if (IS_ERR(mailbox)) { @@ -618,10 +621,15 @@ static int __mthca_modify_qp(struct ib_qp *ibqp, /* leave arbel_sched_queue as 0 */ - if (qp->ibqp.uobject) + if (udata) { + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) { + err = PTR_ERR(context); + goto out_mailbox; + } qp_context->usr_page = - cpu_to_be32(to_mucontext(qp->ibqp.uobject->context)->uar.index); - else + cpu_to_be32(to_mucontext(context)->uar.index); + } else qp_context->usr_page = cpu_to_be32(dev->driver_uar.index); qp_context->local_qpn = cpu_to_be32(qp->qpn); if (attr_mask & IB_QP_DEST_QPN) { @@ -913,7 +921,8 @@ int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, goto out; } - err = __mthca_modify_qp(ibqp, attr, attr_mask, cur_state, new_state); + err = __mthca_modify_qp(ibqp, attr, attr_mask, cur_state, new_state, + udata); out: mutex_unlock(&qp->mutex); diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c index b8333c79e3fa..04979b0ee7fd 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -96,18 +96,23 @@ static void mthca_tavor_init_srq_context(struct mthca_dev *dev, struct mthca_pd *pd, struct mthca_srq *srq, struct mthca_tavor_srq_context *context, - bool is_user) + struct ib_udata 
*udata) { + struct ib_ucontext *ib_ucontext; + memset(context, 0, sizeof *context); context->wqe_base_ds = cpu_to_be64(1 << (srq->wqe_shift - 4)); context->state_pd = cpu_to_be32(pd->pd_num); context->lkey = cpu_to_be32(srq->mr.ibmr.lkey); - if (is_user) + if (udata) { + ib_ucontext = rdma_get_ucontext(udata); + if (WARN_ON(IS_ERR(ib_ucontext))) + return; context->uar = - cpu_to_be32(to_mucontext(pd->ibpd.uobject->context)->uar.index); - else + cpu_to_be32(to_mucontext(ib_ucontext)->uar.index); + } else context->uar = cpu_to_be32(dev->driver_uar.index); } @@ -115,8 +120,9 @@ static void mthca_arbel_init_srq_context(struct mthca_dev *dev, struct mthca_pd *pd, struct mthca_srq *srq, struct mthca_arbel_srq_context *context, - bool is_user) + struct ib_udata *udata) { + struct ib_ucontext *ib_ucontext; int logsize, max; memset(context, 0, sizeof *context); @@ -131,10 +137,13 @@ static void mthca_arbel_init_srq_context(struct mthca_dev *dev, context->lkey = cpu_to_be32(srq->mr.ibmr.lkey); context->db_index = cpu_to_be32(srq->db_index); context->logstride_usrpage = cpu_to_be32((srq->wqe_shift - 4) << 29); - if (is_user) + if (udata) { + ib_ucontext = rdma_get_ucontext(udata); + if (WARN_ON(IS_ERR(ib_ucontext))) + return; context->logstride_usrpage |= - cpu_to_be32(to_mucontext(pd->ibpd.uobject->context)->uar.index); - else + cpu_to_be32(to_mucontext(ib_ucontext)->uar.index); + } else context->logstride_usrpage |= cpu_to_be32(dev->driver_uar.index); context->eq_pd = cpu_to_be32(MTHCA_EQ_ASYNC << 24 | pd->pd_num); } diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 034156f7e9ed..a33b3fdae682 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -983,6 +983,7 @@ static struct ib_qp *nes_create_qp(struct ib_pd *ibpd, struct nes_vnic *nesvnic = to_nesvnic(ibpd->device); struct nes_device *nesdev = nesvnic->nesdev; struct nes_adapter *nesadapter = nesdev->nesadapter; + struct ib_ucontext *context; struct nes_qp *nesqp; struct nes_cq *nescq; struct nes_ucontext *nes_ucontext; @@ -1066,9 +1067,10 @@ static struct ib_qp *nes_create_qp(struct ib_pd *ibpd, } if (req.user_qp_buffer) nesqp->nesuqp_addr = req.user_qp_buffer; - if (udata && (ibpd->uobject->context)) { + context = rdma_get_ucontext(udata); + if (!IS_ERR(context)) { nesqp->user_mode = 1; - nes_ucontext = to_nesucontext(ibpd->uobject->context); + nes_ucontext = to_nesucontext(context); if (virt_wqs) { err = 1; list_for_each_entry(nespbl, &nes_ucontext->qp_reg_mem_list, list) { @@ -1089,7 +1091,6 @@ static struct ib_qp *nes_create_qp(struct ib_pd *ibpd, } } - nes_ucontext = to_nesucontext(ibpd->uobject->context); nesqp->mmap_sq_db_index = find_next_zero_bit(nes_ucontext->allocated_wqs, NES_MAX_USER_WQ_REGIONS, nes_ucontext->first_free_wq); @@ -2111,6 +2112,7 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, struct ib_mr *ibmr = ERR_PTR(-EINVAL); struct scatterlist *sg; struct nes_ucontext *nes_ucontext; + struct ib_ucontext *context; struct nes_pbl *nespbl; struct nes_mr *nesmr; struct ib_umem *region; @@ -2382,8 +2384,11 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, kfree(nespbl); return ERR_PTR(-ENOMEM); } + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) + return ERR_CAST(context); nesmr->region = region; - nes_ucontext = to_nesucontext(pd->uobject->context); + nes_ucontext = to_nesucontext(context); pbl_depth = region->length >> 12; pbl_depth += (region->length & (4096-1)) ? 
1 : 0; nespbl->pbl_size = pbl_depth*sizeof(u64); diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c index b27ff9408507..e676a36b6b5d 100644 --- a/drivers/infiniband/hw/qedr/verbs.c +++ b/drivers/infiniband/hw/qedr/verbs.c @@ -1433,7 +1433,6 @@ struct ib_srq *qedr_create_srq(struct ib_pd *ibpd, struct qedr_pd *pd = get_qedr_pd(ibpd); struct qedr_create_srq_ureq ureq = {}; u64 pbl_base_addr, phy_prod_pair_addr; - struct ib_ucontext *ib_ctx = NULL; struct qedr_srq_hwq_info *hw_srq; u32 page_cnt, page_size; struct qedr_srq *srq; @@ -1458,9 +1457,7 @@ struct ib_srq *qedr_create_srq(struct ib_pd *ibpd, hw_srq->max_wr = init_attr->attr.max_wr; hw_srq->max_sges = init_attr->attr.max_sge; - if (udata && ibpd->uobject && ibpd->uobject->context) { - ib_ctx = ibpd->uobject->context; - + if (udata && !IS_ERR(rdma_get_ucontext(udata))) { if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) { DP_ERR(dev, "create srq: problem copying data from user space\n"); @@ -1698,13 +1695,10 @@ static int qedr_create_user_qp(struct qedr_dev *dev, struct qed_rdma_create_qp_in_params in_params; struct qed_rdma_create_qp_out_params out_params; struct qedr_pd *pd = get_qedr_pd(ibpd); - struct ib_ucontext *ib_ctx = NULL; struct qedr_create_qp_ureq ureq; int alloc_and_init = rdma_protocol_roce(&dev->ibdev, 1); int rc = -EINVAL; - ib_ctx = ibpd->uobject->context; - memset(&ureq, 0, sizeof(ureq)); rc = ib_copy_from_udata(&ureq, udata, sizeof(ureq)); if (rc) { diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c index 432e6f6599fa..f471e7f270c0 100644 --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c @@ -501,10 +501,15 @@ struct ib_qp *usnic_ib_create_qp(struct ib_pd *pd, struct usnic_vnic_res_spec res_spec; struct usnic_ib_create_qp_cmd cmd; struct usnic_transport_spec trans_spec; + struct ib_ucontext *context; usnic_dbg("\n"); - ucontext = to_uucontext(pd->uobject->context); + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) + return ERR_CAST(context); + + ucontext = to_uucontext(context); us_ibdev = to_usdev(pd->device); if (init_attr->create_flags) diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c index a1bd8cfc2c25..ace8e640aee1 100644 --- a/drivers/infiniband/sw/rdmavt/qp.c +++ b/drivers/infiniband/sw/rdmavt/qp.c @@ -948,6 +948,7 @@ struct ib_qp *rvt_create_qp(struct ib_pd *ibpd, struct ib_qp_init_attr *init_attr, struct ib_udata *udata) { + struct ib_ucontext *context; struct rvt_qp *qp; int err; struct rvt_swqe *swq = NULL; @@ -1127,8 +1128,13 @@ struct ib_qp *rvt_create_qp(struct ib_pd *ibpd, } else { u32 s = sizeof(struct rvt_rwq) + qp->r_rq.size * sz; - qp->ip = rvt_create_mmap_info(rdi, s, - ibpd->uobject->context, + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) { + ret = PTR_ERR(context); + goto bail_qpn; + } + + qp->ip = rvt_create_mmap_info(rdi, s, context, qp->r_rq.wq); if (!qp->ip) { ret = ERR_PTR(-ENOMEM); diff --git a/drivers/infiniband/sw/rdmavt/srq.c b/drivers/infiniband/sw/rdmavt/srq.c index 78e06fc456c5..fa4c6bc32ec6 100644 --- a/drivers/infiniband/sw/rdmavt/srq.c +++ b/drivers/infiniband/sw/rdmavt/srq.c @@ -77,6 +77,7 @@ struct ib_srq *rvt_create_srq(struct ib_pd *ibpd, struct ib_udata *udata) { struct rvt_dev_info *dev = ib_to_rvt(ibpd->device); + struct ib_ucontext *context; struct rvt_srq *srq; u32 sz; struct ib_srq *ret; @@ -118,9 +119,14 @@ struct ib_srq *rvt_create_srq(struct ib_pd *ibpd, int err; u32 s = 
sizeof(struct rvt_rwq) + srq->rq.size * sz; + context = rdma_get_ucontext(udata); + if (IS_ERR(context)) { + ret = ERR_CAST(context); + goto bail_wq; + } + srq->ip = - rvt_create_mmap_info(dev, s, ibpd->uobject->context, - srq->rq.wq); + rvt_create_mmap_info(dev, s, context, srq->rq.wq); if (!srq->ip) { ret = ERR_PTR(-ENOMEM); goto bail_wq; diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c index fd86fd2fbb26..a6a5f223ffb3 100644 --- a/drivers/infiniband/sw/rxe/rxe_qp.c +++ b/drivers/infiniband/sw/rxe/rxe_qp.c @@ -343,7 +343,10 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd, struct rxe_cq *rcq = to_rcq(init->recv_cq); struct rxe_cq *scq = to_rcq(init->send_cq); struct rxe_srq *srq = init->srq ? to_rsrq(init->srq) : NULL; - struct ib_ucontext *context = udata ? ibpd->uobject->context : NULL; + struct ib_ucontext *context = rdma_get_ucontext(udata); + + if (IS_ERR(context)) + context = NULL; rxe_add_ref(pd); rxe_add_ref(rcq); diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 3d01247a28db..0d6e5af21797 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -331,9 +331,12 @@ static struct ib_srq *rxe_create_srq(struct ib_pd *ibpd, struct rxe_dev *rxe = to_rdev(ibpd->device); struct rxe_pd *pd = to_rpd(ibpd); struct rxe_srq *srq; - struct ib_ucontext *context = udata ? ibpd->uobject->context : NULL; + struct ib_ucontext *context = rdma_get_ucontext(udata); struct rxe_create_srq_resp __user *uresp = NULL; + if (IS_ERR(context)) + context = NULL; + if (udata) { if (udata->outlen < sizeof(*uresp)) return ERR_PTR(-EINVAL);
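[Editorial note] Condensing the conversion above: after this patch, drivers follow one of two recurring patterns around rdma_get_ucontext(). A generic sketch of both follows; the function names are illustrative, the bodies mirror the i40iw/bnxt_re and cxgb/rxe hunks respectively, and the helper is assumed to be declared via ib_verbs.h as in this series.

#include <linux/err.h>
#include <rdma/ib_verbs.h>

/* Pattern 1: user-only verb - a failed lookup is fatal and is returned. */
static int sample_user_only_verb(struct ib_udata *udata)
{
	struct ib_ucontext *ucontext = rdma_get_ucontext(udata);

	if (IS_ERR(ucontext))
		return PTR_ERR(ucontext);

	/* ... container_of(ucontext, ...) to reach the driver ucontext ... */
	return 0;
}

/*
 * Pattern 2: verb shared by kernel and user consumers - fall back to the
 * kernel path (NULL ucontext) when there is no udata/ucontext available.
 */
static int sample_user_or_kernel_verb(struct ib_udata *udata)
{
	struct ib_ucontext *ib_uctx = rdma_get_ucontext(udata);
	struct ib_ucontext *ucontext = IS_ERR(ib_uctx) ? NULL : ib_uctx;

	if (!ucontext) {
		/* kernel consumer path */
		return 0;
	}

	/* user consumer path */
	return 0;
}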