Message ID | 20151110134147.GA12814@infradead.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Tue, Nov 10, 2015 at 05:41:47AM -0800, Christoph Hellwig wrote: > FYI, this is the API I'd aim for (only SRP and no HW driver converted > yet): > n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents, > - dev->mr_page_size); > + dev->mr_page_size, > + /* > + * XXX: add a bool write argument to this function, > + * so that we only need to open up the required > + * permissions. > + */ > + IB_MR_REMOTE | IB_MR_RDMA_READ | > IB_MR_RDMA_WRITE); I would call it IB_RDMA_LKEY and IB_RDMA_RKEY. We have other places in the API where lkey/rkey is used and it makes a lot more sense to think about a MR as being either a lkey or rkey usable MR - since this is effectively what we are doing here with these ops. IB_MR_RKEY | IB_MR_RDMA_READ == The resulting key can be used in a wr.rdma.rkey field IB_MR_LKEY | IB_MR_SEND == The key can be used in wr.sg_list[].lkey etc It is an error to use a IB_MR_LKEY in a rkey field, etc.. > +enum ib_mr_flags { > + /* scope: either remote or local */ > + IB_MR_REMOTE, > + IB_MR_LOCAL, > + > + /* direction: one or both can be ORed into the scope above */ > + IB_MR_RDMA_READ = (1 << 10), > + IB_MR_RDMA_WRITE = (1 << 11) Don't forget SEND too. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 10, 2015 at 11:36:27AM -0700, Jason Gunthorpe wrote: > > n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents, > > - dev->mr_page_size); > > + dev->mr_page_size, > > + /* > > + * XXX: add a bool write argument to this function, > > + * so that we only need to open up the required > > + * permissions. > > + */ > > + IB_MR_REMOTE | IB_MR_RDMA_READ | > > IB_MR_RDMA_WRITE); > > I would call it IB_RDMA_LKEY and IB_RDMA_RKEY. We have other places in > the API where lkey/rkey is used and it makes a lot more sense to think > about a MR as being either a lkey or rkey usable MR - since this is > effectively what we are doing here with these ops. Hmm, I really hate these suport short names, but if there is consensus I can fix it up. > > +enum ib_mr_flags { > > + /* scope: either remote or local */ > > + IB_MR_REMOTE, > > + IB_MR_LOCAL, > > + > > + /* direction: one or both can be ORed into the scope above */ > > + IB_MR_RDMA_READ = (1 << 10), > > + IB_MR_RDMA_WRITE = (1 << 11) > > Don't forget SEND too. I don't think we're ever using that in the kernel, but it's an easy addition for completeless. Especially once we start exposing these flags to the drivers. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/11/2015 15:41, Christoph Hellwig wrote: > FYI, this is the API I'd aim for (only SRP and no HW driver converted > yet): This looks fine, although personally I find scope and direction flags more confusing than access_flags (but maybe it's just me). I think that the real issue here is the server/target side which needs extra logic for rdma_read and memory registration. If we can put this logic in the core and give ULPs an API that looks something like: - ib_rdma_read() - ib_rdma_write() then maybe we don't need to change our flags? The only reason why I posted this series was a pre-step for this type of API so we can avoid the is_iwarp() condition in the hot path. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 11, 2015 at 11:07:14AM +0200, Sagi Grimberg wrote: > > > On 10/11/2015 15:41, Christoph Hellwig wrote: > >FYI, this is the API I'd aim for (only SRP and no HW driver converted > >yet): > > This looks fine, although personally I find scope and direction flags > more confusing than access_flags (but maybe it's just me). Looking at the drivers the current flags seem to confuse programmes hard, given that the choises seem to be random values that just happen to work: rkeys: iser: lwrite | rwrite | rread srp: lwrite | rwrite | rread rds: lwrite | rread | rwrite rds: rwrite xprdtrdma (wr): rwrite | lwrite xprdtrdma (rd): rread lkeys: isert: lwrite svcrdma: lwrite | rwrite moving to a model where the consumer clearly says what they intend to do with the MR seem much better than that. > I think that the real issue here is the server/target side which needs > extra logic for rdma_read and memory registration. If we can put this > logic in the core and give ULPs an API that looks something like: > - ib_rdma_read() > - ib_rdma_write() > > then maybe we don't need to change our flags? The current flags don't make any sense for someone trying to use the API. They're pointless leakage of badly designed protocol specs. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 0e21367..7ea695c 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1484,14 +1484,15 @@ EXPORT_SYMBOL(ib_check_mr_status); int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, - unsigned int page_size) + unsigned int page_size, + unsigned int flags) { if (unlikely(!mr->device->map_mr_sg)) return -ENOSYS; mr->page_size = page_size; - return mr->device->map_mr_sg(mr, sg, sg_nents); + return mr->device->map_mr_sg(mr, sg, sg_nents, flags); } EXPORT_SYMBOL(ib_map_mr_sg); diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 62b6cba..d77a5b4 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1314,7 +1314,6 @@ static int srp_map_finish_fr(struct srp_map_state *state, struct srp_target_port *target = ch->target; struct srp_device *dev = target->srp_host->srp_dev; struct ib_send_wr *bad_wr; - struct ib_reg_wr wr; struct srp_fr_desc *desc; u32 rkey; int n, err; @@ -1342,20 +1341,17 @@ static int srp_map_finish_fr(struct srp_map_state *state, ib_update_fast_reg_key(desc->mr, rkey); n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents, - dev->mr_page_size); + dev->mr_page_size, + /* + * XXX: add a bool write argument to this function, + * so that we only need to open up the required + * permissions. + */ + IB_MR_REMOTE | IB_MR_RDMA_READ | IB_MR_RDMA_WRITE); if (unlikely(n < 0)) return n; - wr.wr.next = NULL; - wr.wr.opcode = IB_WR_REG_MR; - wr.wr.wr_id = FAST_REG_WR_ID_MASK; - wr.wr.num_sge = 0; - wr.wr.send_flags = 0; - wr.mr = desc->mr; - wr.key = desc->mr->rkey; - wr.access = (IB_ACCESS_LOCAL_WRITE | - IB_ACCESS_REMOTE_READ | - IB_ACCESS_REMOTE_WRITE); + desc->mr->wr.wr_id = FAST_REG_WR_ID_MASK; *state->fr.next++ = desc; state->nmdesc++; @@ -1363,7 +1359,7 @@ static int srp_map_finish_fr(struct srp_map_state *state, srp_map_desc(state, desc->mr->iova, desc->mr->length, desc->mr->rkey); - err = ib_post_send(ch->qp, &wr.wr, &bad_wr); + err = ib_post_send(ch->qp, &desc->mr->wr, &bad_wr); if (unlikely(err)) return err; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 83d6ee8..b168b3a 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1105,18 +1105,6 @@ static inline struct ib_ud_wr *ud_wr(struct ib_send_wr *wr) return container_of(wr, struct ib_ud_wr, wr); } -struct ib_reg_wr { - struct ib_send_wr wr; - struct ib_mr *mr; - u32 key; - int access; -}; - -static inline struct ib_reg_wr *reg_wr(struct ib_send_wr *wr) -{ - return container_of(wr, struct ib_reg_wr, wr); -} - struct ib_bind_mw_wr { struct ib_send_wr wr; struct ib_mw *mw; @@ -1314,7 +1302,18 @@ struct ib_qp { enum ib_qp_type qp_type; }; +enum ib_mr_flags { + /* scope: either remote or local */ + IB_MR_REMOTE, + IB_MR_LOCAL, + + /* direction: one or both can be ORed into the scope above */ + IB_MR_RDMA_READ = (1 << 10), + IB_MR_RDMA_WRITE = (1 << 11) +}; + struct ib_mr { + struct ib_send_wr wr; struct ib_device *device; struct ib_pd *pd; struct ib_uobject *uobject; @@ -1326,6 +1325,11 @@ struct ib_mr { atomic_t usecnt; /* count number of MWs */ }; +static inline struct ib_mr *wr_to_mr(struct ib_send_wr *wr) +{ + return container_of(wr, struct ib_mr, wr); +} + struct ib_mw { struct ib_device *device; struct ib_pd *pd; @@ -1706,7 +1710,8 @@ struct ib_device { u32 max_num_sg); int (*map_mr_sg)(struct ib_mr *mr, struct scatterlist *sg, - int sg_nents); + int sg_nents, + unsigned int flags); int (*rereg_phys_mr)(struct ib_mr *mr, int mr_rereg_mask, struct ib_pd *pd, @@ -3022,17 +3027,19 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port, int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, - unsigned int page_size); + unsigned int page_size, + unsigned int flags); static inline int ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, - unsigned int page_size) + unsigned int page_size, + unsigned int flags) { int n; - n = ib_map_mr_sg(mr, sg, sg_nents, page_size); + n = ib_map_mr_sg(mr, sg, sg_nents, page_size, flags); mr->iova = 0; return n;