Message ID | 20150629213618.4188.50574.stgit@build.ogc.int (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
On 30/06/2015 00:36, Steve Wise wrote: > /** > + * rdma_mr_roles - possible roles an RDMA MR will be used for > + * > + * This allows a transport independent RDMA application to > + * create MRs that are usable for all the desired roles w/o > + * having to understand which access rights are needed. > + */ > +enum { > + > + /* lkey used in a ib_recv_wr sge */ > + RDMA_MRR_RECV = 1, > + > + /* lkey used for a IB_WR_SEND in the ib_send_wr sge */ > + RDMA_MRR_SEND = (1<<1), Perhaps you should mention that this covers all the IB_WR_SEND* opcodes (SEND_WITH_IMM and SEND_WITH_INV). READ, WRITE and ATOMICs also have several variants. > + > + /* rkey used for a IB_WR_RDMA_READ in ib_send_wr wr.rdma.rkey */ > + RDMA_MRR_READ_SOURCE = (1<<2), > + > + /* lkey used for a IB_WR_RDMA_READ in the ib_send_wr sge */ > + RDMA_MRR_READ_DEST = (1<<3), > + > + /* lkey used for a IB_WR_RDMA_WRITE in the ib_send_wr sge */ > + RDMA_MRR_WRITE_SOURCE = (1<<4), > + > + /* rkey used for a IB_WR_RDMA_WRITE in ib_send_wr wr.rdma.rkey */ > + RDMA_MRR_WRITE_DEST = (1<<5), > + > + /* > + * rkey used for a IB_WR_ATOMIC/MASKED_ATOMIC in ib_send_wr > + * wr.atomic.rkey > + */ > + RDMA_MRR_ATOMIC = (1<<6), What about using as an lkey in an IB_WR_ATOMIC/MASKED_ATOMIC in the ib_send_wr sge? Do you want that to be covered by RDMA_MRR_SEND? > + > + /* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */ > + RDMA_MRR_MW_BIND = (1<<7), > +}; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/30/2015 12:36 AM, Steve Wise wrote: > The semantics for MR access flags are not consistent across RDMA > protocols. So rather than have applications try and glean what they > need, have them pass in the intended roles and attributes for the MR to > be allocated and let the RDMA core select the appropriate access flags > given the roles, attributes, and device capabilities. wait, we have NFSoRDMA (net/sunrpc/xprtrdma/*) in the kernel for years and it works on top of both IB/RoCE and iWARP.I know they have there 5-6 memory registration methods.. but if we stick to their mode that uses fast registration ala your upstream commit 0f7ec3 "RDMA/core: Add memory management extensions support" -- what's missing or how come it work w/o the enhancement suggested here?Added Chuck. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/30/2015 12:36 AM, Steve Wise wrote: > The semantics for MR access flags are not consistent across RDMA > protocols. So rather than have applications try and glean what they > need, have them pass in the intended roles and attributes for the MR to > be allocated and let the RDMA core select the appropriate access flags > given the roles, attributes, and device capabilities. > > We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the > possible roles and attributes for a MR. These are documented in the > enums themselves. > > New services exported: > > rdma_device_access_flags() - given the intended MR roles and attributes > passed in, return the ib_access_flags bits for the device. > > rdma_get_dma_mr() - allocate a dma mr using the applications intended > MR roles and MR attributes. This uses rdma_device_access_flags(). > > rdma_fast_reg_access_flags() - return the ib_access_flags bits needed > for a fast register WR given the applications intended MR roles and > MR attributes. This uses rdma_device_access_flags(). > > Signed-off-by: Steve Wise <swise@opengridcomputing.com> > --- > drivers/infiniband/core/verbs.c | 30 ++++++++++++ > include/rdma/ib_verbs.h | 101 +++++++++++++++++++++++++++++++++++++++ > 2 files changed, 131 insertions(+), 0 deletions(-) > > diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c > index bac3fb4..2aa7c92 100644 > --- a/drivers/infiniband/core/verbs.c > +++ b/drivers/infiniband/core/verbs.c > @@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags) > } > EXPORT_SYMBOL(ib_get_dma_mr); > > +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs) > +{ > + int access_flags = attrs; > + > + if (roles & RDMA_MRR_RECV) > + access_flags |= IB_ACCESS_LOCAL_WRITE; > + > + if (roles & RDMA_MRR_WRITE_DEST) > + access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE; > + > + if (roles & RDMA_MRR_READ_DEST) { > + access_flags |= IB_ACCESS_LOCAL_WRITE; > + if (rdma_protocol_iwarp(pd->device, > + rdma_start_port(pd->device))) > + access_flags |= IB_ACCESS_REMOTE_WRITE; > + } > + > + if (roles & RDMA_MRR_READ_SOURCE) > + access_flags |= IB_ACCESS_REMOTE_READ; > + > + if (roles & RDMA_MRR_ATOMIC) > + access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC; > + > + if (roles & RDMA_MRR_MW_BIND) > + access_flags |= IB_ACCESS_MW_BIND; > + > + return access_flags; > +} > +EXPORT_SYMBOL(rdma_device_access_flags); > + > struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, > struct ib_phys_buf *phys_buf_array, > int num_phys_buf, > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h > index 986fddb..135592d 100644 > --- a/include/rdma/ib_verbs.h > +++ b/include/rdma/ib_verbs.h > @@ -2494,6 +2494,107 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt) > struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags); > > /** > + * rdma_mr_roles - possible roles an RDMA MR will be used for > + * > + * This allows a transport independent RDMA application to > + * create MRs that are usable for all the desired roles w/o > + * having to understand which access rights are needed. > + */ > +enum { > + > + /* lkey used in a ib_recv_wr sge */ > + RDMA_MRR_RECV = 1, > + > + /* lkey used for a IB_WR_SEND in the ib_send_wr sge */ > + RDMA_MRR_SEND = (1<<1), > + > + /* rkey used for a IB_WR_RDMA_READ in ib_send_wr wr.rdma.rkey */ > + RDMA_MRR_READ_SOURCE = (1<<2), > + > + /* lkey used for a IB_WR_RDMA_READ in the ib_send_wr sge */ > + RDMA_MRR_READ_DEST = (1<<3), > + > + /* lkey used for a IB_WR_RDMA_WRITE in the ib_send_wr sge */ > + RDMA_MRR_WRITE_SOURCE = (1<<4), > + > + /* rkey used for a IB_WR_RDMA_WRITE in ib_send_wr wr.rdma.rkey */ > + RDMA_MRR_WRITE_DEST = (1<<5), > + > + /* > + * rkey used for a IB_WR_ATOMIC/MASKED_ATOMIC in ib_send_wr > + * wr.atomic.rkey > + */ > + RDMA_MRR_ATOMIC = (1<<6), > + > + /* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */ > + RDMA_MRR_MW_BIND = (1<<7), > +}; > + > +/** > + * rdma_mr_attributes - attributes for rdma memory regions > + */ > +enum { > + RDMA_MRA_ZERO_BASED = IB_ZERO_BASED, > + RDMA_MRA_ACCESS_ON_DEMAND = IB_ACCESS_ON_DEMAND, > +}; > + > +/** > + * rdma_device_access_flags - Returns the device-specific MR access flags. > + * @pd: The protection domain associated with the memory region. > + * @roles: The intended roles of the MR > + * @attrs: The desired attributes of the MR > + * > + * Use the intended roles from @roles along with @attrs and the device > + * capabilities to generate the needed access rights. > + * > + * Return: the ib_access_flags value needed to allocate the MR. > + */ > +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs); > + > +/** > + * rdma_get_dma_mr - Returns a memory region for system memory that is > + * usable for DMA. > + * @pd: The protection domain associated with the memory region. > + * @roles: The intended roles of the MR > + * @attrs: The desired attributes of the MR > + * > + * Use the intended roles from @roles along with @attrs and the device > + * capabilities to define the needed access rights, and call > + * ib_get_dma_mr() to allocate the MR. > + * > + * Note that the ib_dma_*() functions defined below must be used > + * to create/destroy addresses used with the Lkey or Rkey returned > + * by ib_get_dma_mr(). > + * > + * Return: struct ib_mr pointer upon success, or a pointer encoded errno upon > + * failure. > + */ > +static inline struct ib_mr *rdma_get_dma_mr(struct ib_pd *pd, int roles, > + int attrs) > +{ > + return ib_get_dma_mr(pd, rdma_device_access_flags(pd, roles, attrs)); > +} Do we really need the rdma_get_dma_mr() wrapper? I suggest to start consolidating to ib_create_mr() that receives an extensible ib_mr_init_attr and additional attributes can be mr_roles and mr_attrs. I have no problem with renaming it to rdma_create_mr() if people really want to. See my comment in: http://marc.info/?l=linux-rdma&m=143539761710553&w=2 Thoughts? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Haggai Eran [mailto:haggaie@mellanox.com] > Sent: Tuesday, June 30, 2015 2:25 AM > To: Steve Wise; dledford@redhat.com > Cc: roid@mellanox.com; sagig@mellanox.com; linux-rdma@vger.kernel.org; jgunthorpe@obsidianresearch.com; infinipath@intel.com; > eli@mellanox.com; ogerlitz@mellanox.com; sean.hefty@intel.com > Subject: Re: [PATCH V2 3/5] RDMA/core: transport-independent access flags > > On 30/06/2015 00:36, Steve Wise wrote: > > /** > > + * rdma_mr_roles - possible roles an RDMA MR will be used for > > + * > > + * This allows a transport independent RDMA application to > > + * create MRs that are usable for all the desired roles w/o > > + * having to understand which access rights are needed. > > + */ > > +enum { > > + > > + /* lkey used in a ib_recv_wr sge */ > > + RDMA_MRR_RECV = 1, > > + > > + /* lkey used for a IB_WR_SEND in the ib_send_wr sge */ > > + RDMA_MRR_SEND = (1<<1), > Perhaps you should mention that this covers all the IB_WR_SEND* opcodes > (SEND_WITH_IMM and SEND_WITH_INV). READ, WRITE and ATOMICs also have > several variants. > Agreed. > > + > > + /* rkey used for a IB_WR_RDMA_READ in ib_send_wr wr.rdma.rkey */ > > + RDMA_MRR_READ_SOURCE = (1<<2), > > + > > + /* lkey used for a IB_WR_RDMA_READ in the ib_send_wr sge */ > > + RDMA_MRR_READ_DEST = (1<<3), > > + > > + /* lkey used for a IB_WR_RDMA_WRITE in the ib_send_wr sge */ > > + RDMA_MRR_WRITE_SOURCE = (1<<4), > > + > > + /* rkey used for a IB_WR_RDMA_WRITE in ib_send_wr wr.rdma.rkey */ > > + RDMA_MRR_WRITE_DEST = (1<<5), > > + > > + /* > > + * rkey used for a IB_WR_ATOMIC/MASKED_ATOMIC in ib_send_wr > > + * wr.atomic.rkey > > + */ > > + RDMA_MRR_ATOMIC = (1<<6), > What about using as an lkey in an IB_WR_ATOMIC/MASKED_ATOMIC in the > ib_send_wr sge? Do you want that to be covered by RDMA_MRR_SEND? > Ah yes. Perhaps we need ATOMIC_SOURCE and ATOMIC_DEST. I wouldn't include it in the SEND. > > + > > + /* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */ > > + RDMA_MRR_MW_BIND = (1<<7), > > +}; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Or Gerlitz [mailto:ogerlitz@mellanox.com] > Sent: Tuesday, June 30, 2015 4:04 AM > To: Steve Wise; Chuck Lever > Cc: dledford@redhat.com; roid@mellanox.com; sagig@mellanox.com; linux-rdma@vger.kernel.org; jgunthorpe@obsidianresearch.com; > infinipath@intel.com; eli@mellanox.com; sean.hefty@intel.com > Subject: Re: [PATCH V2 3/5] RDMA/core: transport-independent access flags > > On 6/30/2015 12:36 AM, Steve Wise wrote: > > The semantics for MR access flags are not consistent across RDMA > > protocols. So rather than have applications try and glean what they > > need, have them pass in the intended roles and attributes for the MR to > > be allocated and let the RDMA core select the appropriate access flags > > given the roles, attributes, and device capabilities. > > wait, we have NFSoRDMA (net/sunrpc/xprtrdma/*) in the kernel for years > and it works on top of both IB/RoCE and iWARP.I know they have there 5-6 > memory registration methods.. but if we stick to their mode that uses > fast registration ala your upstream commit 0f7ec3 "RDMA/core: Add memory > management extensions support" -- what's missing or how come it work > w/o the enhancement suggested here?Added Chuck. NFSRDMA currently checks the transport type to decide how to set the access flags for memory registration. With the new services exported in this series, we can change/simplify NFSRDMA to not have to know the transport type. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Sagi Grimberg [mailto:sagig@dev.mellanox.co.il] > Sent: Tuesday, June 30, 2015 4:22 AM > To: Steve Wise; dledford@redhat.com > Cc: roid@mellanox.com; sagig@mellanox.com; linux-rdma@vger.kernel.org; jgunthorpe@obsidianresearch.com; infinipath@intel.com; > eli@mellanox.com; ogerlitz@mellanox.com; sean.hefty@intel.com > Subject: Re: [PATCH V2 3/5] RDMA/core: transport-independent access flags > > On 6/30/2015 12:36 AM, Steve Wise wrote: > > The semantics for MR access flags are not consistent across RDMA > > protocols. So rather than have applications try and glean what they > > need, have them pass in the intended roles and attributes for the MR to > > be allocated and let the RDMA core select the appropriate access flags > > given the roles, attributes, and device capabilities. > > > > We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the > > possible roles and attributes for a MR. These are documented in the > > enums themselves. > > > > New services exported: > > > > rdma_device_access_flags() - given the intended MR roles and attributes > > passed in, return the ib_access_flags bits for the device. > > > > rdma_get_dma_mr() - allocate a dma mr using the applications intended > > MR roles and MR attributes. This uses rdma_device_access_flags(). > > > > rdma_fast_reg_access_flags() - return the ib_access_flags bits needed > > for a fast register WR given the applications intended MR roles and > > MR attributes. This uses rdma_device_access_flags(). > > > > Signed-off-by: Steve Wise <swise@opengridcomputing.com> > > --- > > drivers/infiniband/core/verbs.c | 30 ++++++++++++ > > include/rdma/ib_verbs.h | 101 +++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 131 insertions(+), 0 deletions(-) > > > > diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c > > index bac3fb4..2aa7c92 100644 > > --- a/drivers/infiniband/core/verbs.c > > +++ b/drivers/infiniband/core/verbs.c > > @@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags) > > } > > EXPORT_SYMBOL(ib_get_dma_mr); > > > > +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs) > > +{ > > + int access_flags = attrs; > > + > > + if (roles & RDMA_MRR_RECV) > > + access_flags |= IB_ACCESS_LOCAL_WRITE; > > + > > + if (roles & RDMA_MRR_WRITE_DEST) > > + access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE; > > + > > + if (roles & RDMA_MRR_READ_DEST) { > > + access_flags |= IB_ACCESS_LOCAL_WRITE; > > + if (rdma_protocol_iwarp(pd->device, > > + rdma_start_port(pd->device))) > > + access_flags |= IB_ACCESS_REMOTE_WRITE; > > + } > > + > > + if (roles & RDMA_MRR_READ_SOURCE) > > + access_flags |= IB_ACCESS_REMOTE_READ; > > + > > + if (roles & RDMA_MRR_ATOMIC) > > + access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC; > > + > > + if (roles & RDMA_MRR_MW_BIND) > > + access_flags |= IB_ACCESS_MW_BIND; > > + > > + return access_flags; > > +} > > +EXPORT_SYMBOL(rdma_device_access_flags); > > + > > struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, > > struct ib_phys_buf *phys_buf_array, > > int num_phys_buf, > > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h > > index 986fddb..135592d 100644 > > --- a/include/rdma/ib_verbs.h > > +++ b/include/rdma/ib_verbs.h > > @@ -2494,6 +2494,107 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt) > > struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags); > > > > /** > > + * rdma_mr_roles - possible roles an RDMA MR will be used for > > + * > > + * This allows a transport independent RDMA application to > > + * create MRs that are usable for all the desired roles w/o > > + * having to understand which access rights are needed. > > + */ > > +enum { > > + > > + /* lkey used in a ib_recv_wr sge */ > > + RDMA_MRR_RECV = 1, > > + > > + /* lkey used for a IB_WR_SEND in the ib_send_wr sge */ > > + RDMA_MRR_SEND = (1<<1), > > + > > + /* rkey used for a IB_WR_RDMA_READ in ib_send_wr wr.rdma.rkey */ > > + RDMA_MRR_READ_SOURCE = (1<<2), > > + > > + /* lkey used for a IB_WR_RDMA_READ in the ib_send_wr sge */ > > + RDMA_MRR_READ_DEST = (1<<3), > > + > > + /* lkey used for a IB_WR_RDMA_WRITE in the ib_send_wr sge */ > > + RDMA_MRR_WRITE_SOURCE = (1<<4), > > + > > + /* rkey used for a IB_WR_RDMA_WRITE in ib_send_wr wr.rdma.rkey */ > > + RDMA_MRR_WRITE_DEST = (1<<5), > > + > > + /* > > + * rkey used for a IB_WR_ATOMIC/MASKED_ATOMIC in ib_send_wr > > + * wr.atomic.rkey > > + */ > > + RDMA_MRR_ATOMIC = (1<<6), > > + > > + /* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */ > > + RDMA_MRR_MW_BIND = (1<<7), > > +}; > > + > > +/** > > + * rdma_mr_attributes - attributes for rdma memory regions > > + */ > > +enum { > > + RDMA_MRA_ZERO_BASED = IB_ZERO_BASED, > > + RDMA_MRA_ACCESS_ON_DEMAND = IB_ACCESS_ON_DEMAND, > > +}; > > + > > +/** > > + * rdma_device_access_flags - Returns the device-specific MR access flags. > > + * @pd: The protection domain associated with the memory region. > > + * @roles: The intended roles of the MR > > + * @attrs: The desired attributes of the MR > > + * > > + * Use the intended roles from @roles along with @attrs and the device > > + * capabilities to generate the needed access rights. > > + * > > + * Return: the ib_access_flags value needed to allocate the MR. > > + */ > > +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs); > > + > > +/** > > + * rdma_get_dma_mr - Returns a memory region for system memory that is > > + * usable for DMA. > > + * @pd: The protection domain associated with the memory region. > > + * @roles: The intended roles of the MR > > + * @attrs: The desired attributes of the MR > > + * > > + * Use the intended roles from @roles along with @attrs and the device > > + * capabilities to define the needed access rights, and call > > + * ib_get_dma_mr() to allocate the MR. > > + * > > + * Note that the ib_dma_*() functions defined below must be used > > + * to create/destroy addresses used with the Lkey or Rkey returned > > + * by ib_get_dma_mr(). > > + * > > + * Return: struct ib_mr pointer upon success, or a pointer encoded errno upon > > + * failure. > > + */ > > +static inline struct ib_mr *rdma_get_dma_mr(struct ib_pd *pd, int roles, > > + int attrs) > > +{ > > + return ib_get_dma_mr(pd, rdma_device_access_flags(pd, roles, attrs)); > > +} > > Do we really need the rdma_get_dma_mr() wrapper? > > I suggest to start consolidating to ib_create_mr() that receives an > extensible ib_mr_init_attr and additional attributes can be mr_roles > and mr_attrs. > > I have no problem with renaming it to rdma_create_mr() if people really > want to. > > See my comment in: http://marc.info/?l=linux-rdma&m=143539761710553&w=2 > > Thoughts? I'm not opposed to your suggestion to consolidate. I'd like to hear what others think? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Jun 30, 2015, at 10:29 AM, Steve Wise <swise@opengridcomputing.com> wrote: > > >> -----Original Message----- >> From: Or Gerlitz [mailto:ogerlitz@mellanox.com] >> Sent: Tuesday, June 30, 2015 4:04 AM >> To: Steve Wise; Chuck Lever >> Cc: dledford@redhat.com; roid@mellanox.com; sagig@mellanox.com; linux-rdma@vger.kernel.org; jgunthorpe@obsidianresearch.com; >> infinipath@intel.com; eli@mellanox.com; sean.hefty@intel.com >> Subject: Re: [PATCH V2 3/5] RDMA/core: transport-independent access flags >> >> On 6/30/2015 12:36 AM, Steve Wise wrote: >>> The semantics for MR access flags are not consistent across RDMA >>> protocols. So rather than have applications try and glean what they >>> need, have them pass in the intended roles and attributes for the MR to >>> be allocated and let the RDMA core select the appropriate access flags >>> given the roles, attributes, and device capabilities. >> >> wait, we have NFSoRDMA (net/sunrpc/xprtrdma/*) in the kernel for years >> and it works on top of both IB/RoCE and iWARP.I know they have there 5-6 >> memory registration methods.. but if we stick to their mode that uses >> fast registration ala your upstream commit 0f7ec3 "RDMA/core: Add memory >> management extensions support" -- what's missing or how come it work >> w/o the enhancement suggested here?Added Chuck. > > NFSRDMA currently checks the transport type to decide how to set the access flags for memory registration. With the new services exported in this series, we can change/simplify NFSRDMA to not have to know the transport type. I was planning to look at this more closely soon, but if you have patches I’d happily consider them, or you can just point me to what needs to be changed and I can put it together for 4.3. -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Chuck Lever > Sent: Tuesday, June 30, 2015 9:42 AM > To: Steve Wise > Cc: Or Gerlitz; dledford@redhat.com; roid@mellanox.com; sagig@mellanox.com; linux-rdma@vger.kernel.org; > jgunthorpe@obsidianresearch.com; infinipath@intel.com; eli@mellanox.com; sean.hefty@intel.com > Subject: Re: [PATCH V2 3/5] RDMA/core: transport-independent access flags > > > On Jun 30, 2015, at 10:29 AM, Steve Wise <swise@opengridcomputing.com> wrote: > > > > > > >> -----Original Message----- > >> From: Or Gerlitz [mailto:ogerlitz@mellanox.com] > >> Sent: Tuesday, June 30, 2015 4:04 AM > >> To: Steve Wise; Chuck Lever > >> Cc: dledford@redhat.com; roid@mellanox.com; sagig@mellanox.com; linux-rdma@vger.kernel.org; jgunthorpe@obsidianresearch.com; > >> infinipath@intel.com; eli@mellanox.com; sean.hefty@intel.com > >> Subject: Re: [PATCH V2 3/5] RDMA/core: transport-independent access flags > >> > >> On 6/30/2015 12:36 AM, Steve Wise wrote: > >>> The semantics for MR access flags are not consistent across RDMA > >>> protocols. So rather than have applications try and glean what they > >>> need, have them pass in the intended roles and attributes for the MR to > >>> be allocated and let the RDMA core select the appropriate access flags > >>> given the roles, attributes, and device capabilities. > >> > >> wait, we have NFSoRDMA (net/sunrpc/xprtrdma/*) in the kernel for years > >> and it works on top of both IB/RoCE and iWARP.I know they have there 5-6 > >> memory registration methods.. but if we stick to their mode that uses > >> fast registration ala your upstream commit 0f7ec3 "RDMA/core: Add memory > >> management extensions support" -- what's missing or how come it work > >> w/o the enhancement suggested here?Added Chuck. > > > > NFSRDMA currently checks the transport type to decide how to set the access flags for memory registration. With the new services > exported in this series, we can change/simplify NFSRDMA to not have to know the transport type. > > I was planning to look at this more closely soon, but if you > have patches I'd happily consider them, or you can just point > me to what needs to be changed and I can put it together for 4.3. > Thanks. I suggest you hold off on NFSRDMA changes until we get closure on the new core services. Once it stabilizes, I'll ping ya. (and I'll CC you on subsequent versions of this). Stevo -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 29, 2015 at 04:36:18PM -0500, Steve Wise wrote: > +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs) > +{ > + int access_flags = attrs; No RDMA_MRR_SEND ? > + if (roles & RDMA_MRR_RECV) > + access_flags |= IB_ACCESS_LOCAL_WRITE; > + > + if (roles & RDMA_MRR_WRITE_DEST) > + access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE; Is IB_ACCESS_LOCAL_WRITE needed? > + if (roles & RDMA_MRR_READ_DEST) { > + access_flags |= IB_ACCESS_LOCAL_WRITE; > + if (rdma_protocol_iwarp(pd->device, > + rdma_start_port(pd->device))) > + access_flags |= IB_ACCESS_REMOTE_WRITE; > + } So on iWarp if I want to issue a RDMA_READ then I have to allow the far side uncontrolled write access to the same memory? Is there something else protecting it? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> NFSRDMA currently checks the transport type to decide how to set the > access flags for memory registration. With the new services > exported in this series, we can change/simplify NFSRDMA to not have > to know the transport type. It would be excellent if this series actually went through and got rid of all the now deprecated users. This would confirm we have the right API here and prune off the old stuff. This is fairly trivial to do, I think? The goal is to make things simpler, maintaining two kernel APIs is not simpler :) Regarding Sagi's comments .. We don't have to do everything at once, a series that focuses only on the access flags seems sane to me. Sagi's idea makes a lot of sense, but maybe it should be explored along the direction Christoph has been talking about? I suggested the wrapper because it makes it very easy to force the old API out (just remove the old function call), but maybe we could move the old style access flags into a private header or something ? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I suggest to start consolidating to ib_create_mr() that receives an > extensible ib_mr_init_attr and additional attributes can be mr_roles > and mr_attrs. I think this makes sense, but does it really help? If the end result is that the app and providers basically end up switching on mr_attr::type, we end up reducing the number of APIs, but the code complexity remains the same.
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > Sent: Tuesday, June 30, 2015 11:21 AM > To: Steve Wise > Cc: dledford@redhat.com; roid@mellanox.com; sagig@mellanox.com; linux-rdma@vger.kernel.org; infinipath@intel.com; > eli@mellanox.com; ogerlitz@mellanox.com; sean.hefty@intel.com > Subject: Re: [PATCH V2 3/5] RDMA/core: transport-independent access flags > > On Mon, Jun 29, 2015 at 04:36:18PM -0500, Steve Wise wrote: > > +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs) > > +{ > > + int access_flags = attrs; > > No RDMA_MRR_SEND ? > Send need no explicit access flags. There is no LOCAL_READ access flag. > > + if (roles & RDMA_MRR_RECV) > > + access_flags |= IB_ACCESS_LOCAL_WRITE; > > + > > + if (roles & RDMA_MRR_WRITE_DEST) > > + access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE; > > Is IB_ACCESS_LOCAL_WRITE needed? > Yes. You must have LOCAL_WRITE if you set REMOTE_WRITE. > > + if (roles & RDMA_MRR_READ_DEST) { > > + access_flags |= IB_ACCESS_LOCAL_WRITE; > > + if (rdma_protocol_iwarp(pd->device, > > + rdma_start_port(pd->device))) > > + access_flags |= IB_ACCESS_REMOTE_WRITE; > > + } > > So on iWarp if I want to issue a RDMA_READ then I have to allow the > far side uncontrolled write access to the same memory? Is there > something else protecting it? > Only the fact that the rkey of your MR setup for MRR_READ_DEST doesn't have to be advertised to the peer application. So the peer app doesn't know what the rkey is. But the fact is that an MR with REMOTE_WRITE access flags can be used as the destination of RDMA reads and RDMA writes for iWARP. Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/30/2015 7:42 PM, Jason Gunthorpe wrote: >> NFSRDMA currently checks the transport type to decide how to set the >> >access flags for memory registration. With the new services >> >exported in this series, we can change/simplify NFSRDMA to not have >> >to know the transport type. > It would be excellent if this series actually went through and got rid > of all the now deprecated users. This would confirm we have the right > API here and prune off the old stuff. This is fairly trivial to do, I think? > > The goal is to make things simpler, maintaining two kernel APIs is not > simpler:) Agree. Steve, you should 1st go and port NFSoRDMA to your proposal (patch) for IB core changes, so we can see the change in action for a code which is working today on multiple transports, iser should be 2nd. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/30/2015 8:10 PM, Hefty, Sean wrote: >> I suggest to start consolidating to ib_create_mr() that receives an >> extensible ib_mr_init_attr and additional attributes can be mr_roles >> and mr_attrs. > > I think this makes sense, but does it really help? > If the end result is that the app and providers basically > end up switching on mr_attr::type, we end up reducing the > number of APIs, but the code complexity remains the same. > I think it will reduce code complexity. First, the callers will have a single API to allocate a memory key. I'm not sure why you say that app needs to switch on mr::type, it knows which type it wants. As for drivers, From what I've seen, usually there is a single memory key allocation interface to the HW and for each API and drivers call it with a slightly different attributes and maybe do some extra driver specific logic. Yes, probably drivers will need to switch on mr_attr::type, but maybe just just to determine the correct HW interface and hopefully not to a completely different logic. I think that drivers can benefit in reduced code duplication in total. As a first step, we can do a naive switch on mr_attr::type in the drivers but I'd expect driver developers to change this logic in their driver. Thoughts? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 7/2/2015 1:22 AM, Sagi Grimberg wrote: > On 6/30/2015 8:10 PM, Hefty, Sean wrote: >>> I suggest to start consolidating to ib_create_mr() that receives an >>> extensible ib_mr_init_attr and additional attributes can be mr_roles >>> and mr_attrs. >> >> I think this makes sense, but does it really help? >> If the end result is that the app and providers basically >> end up switching on mr_attr::type, we end up reducing the >> number of APIs, but the code complexity remains the same. >> > > I think it will reduce code complexity. First, the callers > will have a single API to allocate a memory key. I'm not sure why you > say that app needs to switch on mr::type, it knows which type it wants. > I dont see how doing this is less complex: attr = FASTREG mr = create_mr(attr) vs: mr = lloc_fast_reg_mr() > As for drivers, From what I've seen, usually there is a single memory > key allocation interface to the HW and for each API and drivers call it > with a slightly different attributes and maybe do some extra driver > specific logic. > > Yes, probably drivers will need to switch on mr_attr::type, but maybe > just just to determine the correct HW interface and hopefully not to > a completely different logic. I think that drivers can benefit in > reduced code duplication in total. > There isn't much code duplication now because as you mentioned above, most drivers have common services that get called from the various entry points: alloc_fast_reg_mr(), get_dma_mr(), reg_user_mr(), etc... > As a first step, we can do a naive switch on mr_attr::type in the > drivers but I'd expect driver developers to change this logic in their > driver. > > Thoughts? I don't see any reduction in complexity of the application nor driver/core code. Perhaps a reduction in API complexity by having a single entry point...but that entry point would be the place I would fan out and call the driver functions vs changing the core<->driver interface... My 2 centimes... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 7/2/2015 4:17 PM, Steve Wise wrote: > On 7/2/2015 1:22 AM, Sagi Grimberg wrote: >> On 6/30/2015 8:10 PM, Hefty, Sean wrote: >>>> I suggest to start consolidating to ib_create_mr() that receives an >>>> extensible ib_mr_init_attr and additional attributes can be mr_roles >>>> and mr_attrs. >>> >>> I think this makes sense, but does it really help? >>> If the end result is that the app and providers basically >>> end up switching on mr_attr::type, we end up reducing the >>> number of APIs, but the code complexity remains the same. >>> >> >> I think it will reduce code complexity. First, the callers >> will have a single API to allocate a memory key. I'm not sure why you >> say that app needs to switch on mr::type, it knows which type it wants. >> > > I dont see how doing this is less complex: > > attr = FASTREG > mr = create_mr(attr) > > vs: > > mr = lloc_fast_reg_mr() The simplification is that you can facilitate changes like your mr_roles and any other things we may want to add to mr allocation easily instead of suggesting wrappers over wrappers over wrappers. This is why I suggested it in the first place. Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Sagi Grimberg > Sent: Thursday, July 02, 2015 8:23 AM > To: Steve Wise; Hefty, Sean; dledford@redhat.com > Cc: roid@mellanox.com; sagig@mellanox.com; linux-rdma@vger.kernel.org; jgunthorpe@obsidianresearch.com; infinipath; > eli@mellanox.com; ogerlitz@mellanox.com > Subject: Re: [PATCH V2 3/5] RDMA/core: transport-independent access flags > > On 7/2/2015 4:17 PM, Steve Wise wrote: > > On 7/2/2015 1:22 AM, Sagi Grimberg wrote: > >> On 6/30/2015 8:10 PM, Hefty, Sean wrote: > >>>> I suggest to start consolidating to ib_create_mr() that receives an > >>>> extensible ib_mr_init_attr and additional attributes can be mr_roles > >>>> and mr_attrs. > >>> > >>> I think this makes sense, but does it really help? > >>> If the end result is that the app and providers basically > >>> end up switching on mr_attr::type, we end up reducing the > >>> number of APIs, but the code complexity remains the same. > >>> > >> > >> I think it will reduce code complexity. First, the callers > >> will have a single API to allocate a memory key. I'm not sure why you > >> say that app needs to switch on mr::type, it knows which type it wants. > >> > > > > I dont see how doing this is less complex: > > > > attr = FASTREG > > mr = create_mr(attr) > > > > vs: > > > > mr = lloc_fast_reg_mr() > > The simplification is that you can facilitate changes like your > mr_roles and any other things we may want to add to mr allocation > easily instead of suggesting wrappers over wrappers over wrappers. > > This is why I suggested it in the first place. > Regardless of have one create_mr() or having create_sig_mr(), create_fast_reg_mr(), get_dma_mr(), we'll still need to define the roles/attributes to make them transport independent, and we'll still need to have an entry point that will compute access_flags given roles and attributes to be used when posting frmrs... I don't see wrappers over wrappers over wrappers. I'm not really opposed to creating a single entry point, but rather I'm not really seeing vast utility of doing so. Anybody else have an opinion? Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index bac3fb4..2aa7c92 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags) } EXPORT_SYMBOL(ib_get_dma_mr); +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs) +{ + int access_flags = attrs; + + if (roles & RDMA_MRR_RECV) + access_flags |= IB_ACCESS_LOCAL_WRITE; + + if (roles & RDMA_MRR_WRITE_DEST) + access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE; + + if (roles & RDMA_MRR_READ_DEST) { + access_flags |= IB_ACCESS_LOCAL_WRITE; + if (rdma_protocol_iwarp(pd->device, + rdma_start_port(pd->device))) + access_flags |= IB_ACCESS_REMOTE_WRITE; + } + + if (roles & RDMA_MRR_READ_SOURCE) + access_flags |= IB_ACCESS_REMOTE_READ; + + if (roles & RDMA_MRR_ATOMIC) + access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC; + + if (roles & RDMA_MRR_MW_BIND) + access_flags |= IB_ACCESS_MW_BIND; + + return access_flags; +} +EXPORT_SYMBOL(rdma_device_access_flags); + struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, struct ib_phys_buf *phys_buf_array, int num_phys_buf, diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 986fddb..135592d 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -2494,6 +2494,107 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt) struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags); /** + * rdma_mr_roles - possible roles an RDMA MR will be used for + * + * This allows a transport independent RDMA application to + * create MRs that are usable for all the desired roles w/o + * having to understand which access rights are needed. + */ +enum { + + /* lkey used in a ib_recv_wr sge */ + RDMA_MRR_RECV = 1, + + /* lkey used for a IB_WR_SEND in the ib_send_wr sge */ + RDMA_MRR_SEND = (1<<1), + + /* rkey used for a IB_WR_RDMA_READ in ib_send_wr wr.rdma.rkey */ + RDMA_MRR_READ_SOURCE = (1<<2), + + /* lkey used for a IB_WR_RDMA_READ in the ib_send_wr sge */ + RDMA_MRR_READ_DEST = (1<<3), + + /* lkey used for a IB_WR_RDMA_WRITE in the ib_send_wr sge */ + RDMA_MRR_WRITE_SOURCE = (1<<4), + + /* rkey used for a IB_WR_RDMA_WRITE in ib_send_wr wr.rdma.rkey */ + RDMA_MRR_WRITE_DEST = (1<<5), + + /* + * rkey used for a IB_WR_ATOMIC/MASKED_ATOMIC in ib_send_wr + * wr.atomic.rkey + */ + RDMA_MRR_ATOMIC = (1<<6), + + /* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */ + RDMA_MRR_MW_BIND = (1<<7), +}; + +/** + * rdma_mr_attributes - attributes for rdma memory regions + */ +enum { + RDMA_MRA_ZERO_BASED = IB_ZERO_BASED, + RDMA_MRA_ACCESS_ON_DEMAND = IB_ACCESS_ON_DEMAND, +}; + +/** + * rdma_device_access_flags - Returns the device-specific MR access flags. + * @pd: The protection domain associated with the memory region. + * @roles: The intended roles of the MR + * @attrs: The desired attributes of the MR + * + * Use the intended roles from @roles along with @attrs and the device + * capabilities to generate the needed access rights. + * + * Return: the ib_access_flags value needed to allocate the MR. + */ +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs); + +/** + * rdma_get_dma_mr - Returns a memory region for system memory that is + * usable for DMA. + * @pd: The protection domain associated with the memory region. + * @roles: The intended roles of the MR + * @attrs: The desired attributes of the MR + * + * Use the intended roles from @roles along with @attrs and the device + * capabilities to define the needed access rights, and call + * ib_get_dma_mr() to allocate the MR. + * + * Note that the ib_dma_*() functions defined below must be used + * to create/destroy addresses used with the Lkey or Rkey returned + * by ib_get_dma_mr(). + * + * Return: struct ib_mr pointer upon success, or a pointer encoded errno upon + * failure. + */ +static inline struct ib_mr *rdma_get_dma_mr(struct ib_pd *pd, int roles, + int attrs) +{ + return ib_get_dma_mr(pd, rdma_device_access_flags(pd, roles, attrs)); +} + +/** + * rdma_fast_reg_access_flags - Returns the access flags needed for a fast + * register operation. + * @pd: The protection domain associated with the memory region. + * @roles: The intended roles of the MR + * @attrs: The desired attributes of the MR + * + * Use the intended roles from @roles along with @attrs and the device + * capabilities to define the needed access rights for a fast registration + * work request. + * + * Return: the ib_access_flags value needed for fast registration. + */ +static inline int rdma_fast_reg_access_flags(struct ib_pd *pd, int roles, + int attrs) +{ + return rdma_device_access_flags(pd, roles, attrs); +} + +/** * ib_dma_mapping_error - check a DMA addr for error * @dev: The device for which the dma_addr was created * @dma_addr: The DMA address to check
The semantics for MR access flags are not consistent across RDMA protocols. So rather than have applications try and glean what they need, have them pass in the intended roles and attributes for the MR to be allocated and let the RDMA core select the appropriate access flags given the roles, attributes, and device capabilities. We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the possible roles and attributes for a MR. These are documented in the enums themselves. New services exported: rdma_device_access_flags() - given the intended MR roles and attributes passed in, return the ib_access_flags bits for the device. rdma_get_dma_mr() - allocate a dma mr using the applications intended MR roles and MR attributes. This uses rdma_device_access_flags(). rdma_fast_reg_access_flags() - return the ib_access_flags bits needed for a fast register WR given the applications intended MR roles and MR attributes. This uses rdma_device_access_flags(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> --- drivers/infiniband/core/verbs.c | 30 ++++++++++++ include/rdma/ib_verbs.h | 101 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 131 insertions(+), 0 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html