Message ID | 20190527150004.21191-1-yuval.shaia@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [rdma-core] verbs: Introduce a new reg_mr API for virtual address space | expand |
On Mon, May 27, 2019 at 06:00:04PM +0300, Yuval Shaia wrote: > The virtual address that is registered is used as a base for any address > used later in post_recv and post_send operations. > > On a virtualised environment this is not correct. > > A guest cannot register its memory so hypervisor maps the guest physical > address to a host virtual address and register it with the HW. Later on, > at datapath phase, the guest fills the SGEs with addresses from its > address space. > Since HW cannot access guest virtual address space an extra translation > is needed to map those addresses to be based on the host virtual address > that was registered with the HW. > > To avoid this, a logical separation between the address that is > registered and the address that is used as a offset at datapath phase is > needed. > > This separation is already implemented in the lower layer part > (ibv_cmd_reg_mr) but blocked at the API level. > > Fix it by introducing a new API function that accepts a address from > guest virtual address space as well. > > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> > libibverbs/verbs.c | 24 ++++++++++++++++++++++++ > libibverbs/verbs.h | 6 ++++++ > 2 files changed, 30 insertions(+) > > diff --git a/libibverbs/verbs.c b/libibverbs/verbs.c > index 1766b9f5..9ad74ee0 100644 > +++ b/libibverbs/verbs.c > @@ -324,6 +324,30 @@ LATEST_SYMVER_FUNC(ibv_reg_mr, 1_1, "IBVERBS_1.1", > return mr; > } > > +LATEST_SYMVER_FUNC(ibv_reg_mr_virt_as, 1_1, "IBVERBS_1.1", > + struct ibv_mr *, > + struct ibv_pd *pd, void *addr, size_t length, > + uint64_t hca_va, int access) > +{ Doesn't need this macro since it doesn't have a compat version > + struct verbs_mr *vmr; > + struct ibv_reg_mr cmd; > + struct ib_uverbs_reg_mr_resp resp; > + int ret; > + > + vmr = malloc(sizeof(*vmr)); > + if (!vmr) > + return NULL; > + > + ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, vmr, &cmd, > + sizeof(cmd), &resp, sizeof(resp)); It seems problematic not to call the driver, several of the drivers are wrappering mr in their own type (ie bnxt_re_mr) and we can't just allocate the wrong size of memory here. What you should do is modify the existing driver callback to accept another argument and go and fix all the drivers to pass that argument into their ibv_cmd_reg_mr as the hca_va above. This looks pretty trivial. Jason
On Mon, May 27, 2019 at 03:22:20PM -0300, Jason Gunthorpe wrote: > On Mon, May 27, 2019 at 06:00:04PM +0300, Yuval Shaia wrote: > > The virtual address that is registered is used as a base for any address > > used later in post_recv and post_send operations. > > > > On a virtualised environment this is not correct. > > > > A guest cannot register its memory so hypervisor maps the guest physical > > address to a host virtual address and register it with the HW. Later on, > > at datapath phase, the guest fills the SGEs with addresses from its > > address space. > > Since HW cannot access guest virtual address space an extra translation > > is needed to map those addresses to be based on the host virtual address > > that was registered with the HW. > > > > To avoid this, a logical separation between the address that is > > registered and the address that is used as a offset at datapath phase is > > needed. > > > > This separation is already implemented in the lower layer part > > (ibv_cmd_reg_mr) but blocked at the API level. > > > > Fix it by introducing a new API function that accepts a address from > > guest virtual address space as well. > > > > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> > > libibverbs/verbs.c | 24 ++++++++++++++++++++++++ > > libibverbs/verbs.h | 6 ++++++ > > 2 files changed, 30 insertions(+) > > > > diff --git a/libibverbs/verbs.c b/libibverbs/verbs.c > > index 1766b9f5..9ad74ee0 100644 > > +++ b/libibverbs/verbs.c > > @@ -324,6 +324,30 @@ LATEST_SYMVER_FUNC(ibv_reg_mr, 1_1, "IBVERBS_1.1", > > return mr; > > } > > > > +LATEST_SYMVER_FUNC(ibv_reg_mr_virt_as, 1_1, "IBVERBS_1.1", > > + struct ibv_mr *, > > + struct ibv_pd *pd, void *addr, size_t length, > > + uint64_t hca_va, int access) > > +{ > > Doesn't need this macro since it doesn't have a compat version That is weird, without this it fails in link stage. /usr/bin/ld: hw/rdma/rdma_backend.o: in function `rdma_backend_create_mr': rdma_backend.c:(.text+0x260a): undefined reference to `ibv_reg_mr_virt_as' collect2: error: ld returned 1 exit status > > > + struct verbs_mr *vmr; > > + struct ibv_reg_mr cmd; > > + struct ib_uverbs_reg_mr_resp resp; > > + int ret; > > + > > + vmr = malloc(sizeof(*vmr)); > > + if (!vmr) > > + return NULL; > > + > > + ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, vmr, &cmd, > > + sizeof(cmd), &resp, sizeof(resp)); > > It seems problematic not to call the driver, several of the drivers > are wrappering mr in their own type (ie bnxt_re_mr) and we can't just > allocate the wrong size of memory here. > > What you should do is modify the existing driver callback to accept > another argument and go and fix all the drivers to pass that argument > into their ibv_cmd_reg_mr as the hca_va above. This looks pretty > trivial. So back to what was proposed in the RFC besides the addition of new arg instead of new callback, right? > > Jason
On Tue, May 28, 2019 at 09:30:56AM +0300, Yuval Shaia wrote: > On Mon, May 27, 2019 at 03:22:20PM -0300, Jason Gunthorpe wrote: > > On Mon, May 27, 2019 at 06:00:04PM +0300, Yuval Shaia wrote: > > > The virtual address that is registered is used as a base for any address > > > used later in post_recv and post_send operations. > > > > > > On a virtualised environment this is not correct. > > > > > > A guest cannot register its memory so hypervisor maps the guest physical > > > address to a host virtual address and register it with the HW. Later on, > > > at datapath phase, the guest fills the SGEs with addresses from its > > > address space. > > > Since HW cannot access guest virtual address space an extra translation > > > is needed to map those addresses to be based on the host virtual address > > > that was registered with the HW. > > > > > > To avoid this, a logical separation between the address that is > > > registered and the address that is used as a offset at datapath phase is > > > needed. > > > > > > This separation is already implemented in the lower layer part > > > (ibv_cmd_reg_mr) but blocked at the API level. > > > > > > Fix it by introducing a new API function that accepts a address from > > > guest virtual address space as well. > > > > > > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> > > > libibverbs/verbs.c | 24 ++++++++++++++++++++++++ > > > libibverbs/verbs.h | 6 ++++++ > > > 2 files changed, 30 insertions(+) > > > > > > diff --git a/libibverbs/verbs.c b/libibverbs/verbs.c > > > index 1766b9f5..9ad74ee0 100644 > > > +++ b/libibverbs/verbs.c > > > @@ -324,6 +324,30 @@ LATEST_SYMVER_FUNC(ibv_reg_mr, 1_1, "IBVERBS_1.1", > > > return mr; > > > } > > > > > > +LATEST_SYMVER_FUNC(ibv_reg_mr_virt_as, 1_1, "IBVERBS_1.1", > > > + struct ibv_mr *, > > > + struct ibv_pd *pd, void *addr, size_t length, > > > + uint64_t hca_va, int access) > > > +{ > > > > Doesn't need this macro since it doesn't have a compat version > > That is weird, without this it fails in link stage. > > /usr/bin/ld: hw/rdma/rdma_backend.o: in function `rdma_backend_create_mr': > rdma_backend.c:(.text+0x260a): undefined reference to `ibv_reg_mr_virt_as' > collect2: error: ld returned 1 exit status You do have to update the map file in any case > > > > > + struct verbs_mr *vmr; > > > + struct ibv_reg_mr cmd; > > > + struct ib_uverbs_reg_mr_resp resp; > > > + int ret; > > > + > > > + vmr = malloc(sizeof(*vmr)); > > > + if (!vmr) > > > + return NULL; > > > + > > > + ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, vmr, &cmd, > > > + sizeof(cmd), &resp, sizeof(resp)); > > > > It seems problematic not to call the driver, several of the drivers > > are wrappering mr in their own type (ie bnxt_re_mr) and we can't just > > allocate the wrong size of memory here. > > > > What you should do is modify the existing driver callback to accept > > another argument and go and fix all the drivers to pass that argument > > into their ibv_cmd_reg_mr as the hca_va above. This looks pretty > > trivial. > > So back to what was proposed in the RFC besides the addition of new arg > instead of new callback, right? Well, half and half, keep the normal function for the entry point and change the sigature of the callback in driver.h Jason
diff --git a/libibverbs/verbs.c b/libibverbs/verbs.c index 1766b9f5..9ad74ee0 100644 --- a/libibverbs/verbs.c +++ b/libibverbs/verbs.c @@ -324,6 +324,30 @@ LATEST_SYMVER_FUNC(ibv_reg_mr, 1_1, "IBVERBS_1.1", return mr; } +LATEST_SYMVER_FUNC(ibv_reg_mr_virt_as, 1_1, "IBVERBS_1.1", + struct ibv_mr *, + struct ibv_pd *pd, void *addr, size_t length, + uint64_t hca_va, int access) +{ + struct verbs_mr *vmr; + struct ibv_reg_mr cmd; + struct ib_uverbs_reg_mr_resp resp; + int ret; + + vmr = malloc(sizeof(*vmr)); + if (!vmr) + return NULL; + + ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, vmr, &cmd, + sizeof(cmd), &resp, sizeof(resp)); + if (ret) { + free(vmr); + return NULL; + } + + return &vmr->ibv_mr; +} + LATEST_SYMVER_FUNC(ibv_rereg_mr, 1_1, "IBVERBS_1.1", int, struct ibv_mr *mr, int flags, diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h index cb2d8439..7b3e12a5 100644 --- a/libibverbs/verbs.h +++ b/libibverbs/verbs.h @@ -2372,6 +2372,12 @@ static inline int ibv_close_xrcd(struct ibv_xrcd *xrcd) struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access); +/** + * ibv_reg_mr_virt_as - Register a memory region with address from virtual + * address space + */ +struct ibv_mr *ibv_reg_mr_virt_as(struct ibv_pd *pd, void *addr, size_t length, + uint64_t hca_va, int access); enum ibv_rereg_mr_err_code { /* Old MR is valid, invalid input */
The virtual address that is registered is used as a base for any address used later in post_recv and post_send operations. On a virtualised environment this is not correct. A guest cannot register its memory so hypervisor maps the guest physical address to a host virtual address and register it with the HW. Later on, at datapath phase, the guest fills the SGEs with addresses from its address space. Since HW cannot access guest virtual address space an extra translation is needed to map those addresses to be based on the host virtual address that was registered with the HW. To avoid this, a logical separation between the address that is registered and the address that is used as a offset at datapath phase is needed. This separation is already implemented in the lower layer part (ibv_cmd_reg_mr) but blocked at the API level. Fix it by introducing a new API function that accepts a address from guest virtual address space as well. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> --- libibverbs/verbs.c | 24 ++++++++++++++++++++++++ libibverbs/verbs.h | 6 ++++++ 2 files changed, 30 insertions(+)