diff mbox series

[rdma-core] verbs: Introduce a new reg_mr API for virtual address space

Message ID 20190527150004.21191-1-yuval.shaia@oracle.com (mailing list archive)
State Superseded
Headers show
Series [rdma-core] verbs: Introduce a new reg_mr API for virtual address space | expand

Commit Message

Yuval Shaia May 27, 2019, 3 p.m. UTC
The virtual address that is registered is used as a base for any address
used later in post_recv and post_send operations.

On a virtualised environment this is not correct.

A guest cannot register its memory so hypervisor maps the guest physical
address to a host virtual address and register it with the HW. Later on,
at datapath phase, the guest fills the SGEs with addresses from its
address space.
Since HW cannot access guest virtual address space an extra translation
is needed to map those addresses to be based on the host virtual address
that was registered with the HW.

To avoid this, a logical separation between the address that is
registered and the address that is used as a offset at datapath phase is
needed.

This separation is already implemented in the lower layer part
(ibv_cmd_reg_mr) but blocked at the API level.

Fix it by introducing a new API function that accepts a address from
guest virtual address space as well.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 libibverbs/verbs.c | 24 ++++++++++++++++++++++++
 libibverbs/verbs.h |  6 ++++++
 2 files changed, 30 insertions(+)

Comments

Jason Gunthorpe May 27, 2019, 6:22 p.m. UTC | #1
On Mon, May 27, 2019 at 06:00:04PM +0300, Yuval Shaia wrote:
> The virtual address that is registered is used as a base for any address
> used later in post_recv and post_send operations.
> 
> On a virtualised environment this is not correct.
> 
> A guest cannot register its memory so hypervisor maps the guest physical
> address to a host virtual address and register it with the HW. Later on,
> at datapath phase, the guest fills the SGEs with addresses from its
> address space.
> Since HW cannot access guest virtual address space an extra translation
> is needed to map those addresses to be based on the host virtual address
> that was registered with the HW.
> 
> To avoid this, a logical separation between the address that is
> registered and the address that is used as a offset at datapath phase is
> needed.
> 
> This separation is already implemented in the lower layer part
> (ibv_cmd_reg_mr) but blocked at the API level.
> 
> Fix it by introducing a new API function that accepts a address from
> guest virtual address space as well.
> 
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
>  libibverbs/verbs.c | 24 ++++++++++++++++++++++++
>  libibverbs/verbs.h |  6 ++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/libibverbs/verbs.c b/libibverbs/verbs.c
> index 1766b9f5..9ad74ee0 100644
> +++ b/libibverbs/verbs.c
> @@ -324,6 +324,30 @@ LATEST_SYMVER_FUNC(ibv_reg_mr, 1_1, "IBVERBS_1.1",
>  	return mr;
>  }
>  
> +LATEST_SYMVER_FUNC(ibv_reg_mr_virt_as, 1_1, "IBVERBS_1.1",
> +		   struct ibv_mr *,
> +		   struct ibv_pd *pd, void *addr, size_t length,
> +		   uint64_t hca_va, int access)
> +{

Doesn't need this macro since it doesn't have a compat version

> +	struct verbs_mr *vmr;
> +	struct ibv_reg_mr cmd;
> +	struct ib_uverbs_reg_mr_resp resp;
> +	int ret;
> +
> +	vmr = malloc(sizeof(*vmr));
> +	if (!vmr)
> +		return NULL;
> +
> +	ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, vmr, &cmd,
> +			     sizeof(cmd), &resp, sizeof(resp));

It seems problematic not to call the driver, several of the drivers
are wrappering mr in their own type (ie bnxt_re_mr) and we can't just
allocate the wrong size of memory here.

What you should do is modify the existing driver callback to accept
another argument and go and fix all the drivers to pass that argument
into their ibv_cmd_reg_mr as the hca_va above. This looks pretty
trivial.

Jason
Yuval Shaia May 28, 2019, 6:30 a.m. UTC | #2
On Mon, May 27, 2019 at 03:22:20PM -0300, Jason Gunthorpe wrote:
> On Mon, May 27, 2019 at 06:00:04PM +0300, Yuval Shaia wrote:
> > The virtual address that is registered is used as a base for any address
> > used later in post_recv and post_send operations.
> > 
> > On a virtualised environment this is not correct.
> > 
> > A guest cannot register its memory so hypervisor maps the guest physical
> > address to a host virtual address and register it with the HW. Later on,
> > at datapath phase, the guest fills the SGEs with addresses from its
> > address space.
> > Since HW cannot access guest virtual address space an extra translation
> > is needed to map those addresses to be based on the host virtual address
> > that was registered with the HW.
> > 
> > To avoid this, a logical separation between the address that is
> > registered and the address that is used as a offset at datapath phase is
> > needed.
> > 
> > This separation is already implemented in the lower layer part
> > (ibv_cmd_reg_mr) but blocked at the API level.
> > 
> > Fix it by introducing a new API function that accepts a address from
> > guest virtual address space as well.
> > 
> > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> >  libibverbs/verbs.c | 24 ++++++++++++++++++++++++
> >  libibverbs/verbs.h |  6 ++++++
> >  2 files changed, 30 insertions(+)
> > 
> > diff --git a/libibverbs/verbs.c b/libibverbs/verbs.c
> > index 1766b9f5..9ad74ee0 100644
> > +++ b/libibverbs/verbs.c
> > @@ -324,6 +324,30 @@ LATEST_SYMVER_FUNC(ibv_reg_mr, 1_1, "IBVERBS_1.1",
> >  	return mr;
> >  }
> >  
> > +LATEST_SYMVER_FUNC(ibv_reg_mr_virt_as, 1_1, "IBVERBS_1.1",
> > +		   struct ibv_mr *,
> > +		   struct ibv_pd *pd, void *addr, size_t length,
> > +		   uint64_t hca_va, int access)
> > +{
> 
> Doesn't need this macro since it doesn't have a compat version

That is weird, without this it fails in link stage.

/usr/bin/ld: hw/rdma/rdma_backend.o: in function `rdma_backend_create_mr':
rdma_backend.c:(.text+0x260a): undefined reference to `ibv_reg_mr_virt_as'
collect2: error: ld returned 1 exit status

> 
> > +	struct verbs_mr *vmr;
> > +	struct ibv_reg_mr cmd;
> > +	struct ib_uverbs_reg_mr_resp resp;
> > +	int ret;
> > +
> > +	vmr = malloc(sizeof(*vmr));
> > +	if (!vmr)
> > +		return NULL;
> > +
> > +	ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, vmr, &cmd,
> > +			     sizeof(cmd), &resp, sizeof(resp));
> 
> It seems problematic not to call the driver, several of the drivers
> are wrappering mr in their own type (ie bnxt_re_mr) and we can't just
> allocate the wrong size of memory here.
> 
> What you should do is modify the existing driver callback to accept
> another argument and go and fix all the drivers to pass that argument
> into their ibv_cmd_reg_mr as the hca_va above. This looks pretty
> trivial.

So back to what was proposed in the RFC besides the addition of new arg
instead of new callback, right?

> 
> Jason
Jason Gunthorpe May 28, 2019, 11:26 a.m. UTC | #3
On Tue, May 28, 2019 at 09:30:56AM +0300, Yuval Shaia wrote:
> On Mon, May 27, 2019 at 03:22:20PM -0300, Jason Gunthorpe wrote:
> > On Mon, May 27, 2019 at 06:00:04PM +0300, Yuval Shaia wrote:
> > > The virtual address that is registered is used as a base for any address
> > > used later in post_recv and post_send operations.
> > > 
> > > On a virtualised environment this is not correct.
> > > 
> > > A guest cannot register its memory so hypervisor maps the guest physical
> > > address to a host virtual address and register it with the HW. Later on,
> > > at datapath phase, the guest fills the SGEs with addresses from its
> > > address space.
> > > Since HW cannot access guest virtual address space an extra translation
> > > is needed to map those addresses to be based on the host virtual address
> > > that was registered with the HW.
> > > 
> > > To avoid this, a logical separation between the address that is
> > > registered and the address that is used as a offset at datapath phase is
> > > needed.
> > > 
> > > This separation is already implemented in the lower layer part
> > > (ibv_cmd_reg_mr) but blocked at the API level.
> > > 
> > > Fix it by introducing a new API function that accepts a address from
> > > guest virtual address space as well.
> > > 
> > > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> > >  libibverbs/verbs.c | 24 ++++++++++++++++++++++++
> > >  libibverbs/verbs.h |  6 ++++++
> > >  2 files changed, 30 insertions(+)
> > > 
> > > diff --git a/libibverbs/verbs.c b/libibverbs/verbs.c
> > > index 1766b9f5..9ad74ee0 100644
> > > +++ b/libibverbs/verbs.c
> > > @@ -324,6 +324,30 @@ LATEST_SYMVER_FUNC(ibv_reg_mr, 1_1, "IBVERBS_1.1",
> > >  	return mr;
> > >  }
> > >  
> > > +LATEST_SYMVER_FUNC(ibv_reg_mr_virt_as, 1_1, "IBVERBS_1.1",
> > > +		   struct ibv_mr *,
> > > +		   struct ibv_pd *pd, void *addr, size_t length,
> > > +		   uint64_t hca_va, int access)
> > > +{
> > 
> > Doesn't need this macro since it doesn't have a compat version
> 
> That is weird, without this it fails in link stage.
> 
> /usr/bin/ld: hw/rdma/rdma_backend.o: in function `rdma_backend_create_mr':
> rdma_backend.c:(.text+0x260a): undefined reference to `ibv_reg_mr_virt_as'
> collect2: error: ld returned 1 exit status

You do have to update the map file in any case

> > 
> > > +	struct verbs_mr *vmr;
> > > +	struct ibv_reg_mr cmd;
> > > +	struct ib_uverbs_reg_mr_resp resp;
> > > +	int ret;
> > > +
> > > +	vmr = malloc(sizeof(*vmr));
> > > +	if (!vmr)
> > > +		return NULL;
> > > +
> > > +	ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, vmr, &cmd,
> > > +			     sizeof(cmd), &resp, sizeof(resp));
> > 
> > It seems problematic not to call the driver, several of the drivers
> > are wrappering mr in their own type (ie bnxt_re_mr) and we can't just
> > allocate the wrong size of memory here.
> > 
> > What you should do is modify the existing driver callback to accept
> > another argument and go and fix all the drivers to pass that argument
> > into their ibv_cmd_reg_mr as the hca_va above. This looks pretty
> > trivial.
> 
> So back to what was proposed in the RFC besides the addition of new arg
> instead of new callback, right?

Well, half and half, keep the normal function for the entry point and
change the sigature of the callback in driver.h

Jason
diff mbox series

Patch

diff --git a/libibverbs/verbs.c b/libibverbs/verbs.c
index 1766b9f5..9ad74ee0 100644
--- a/libibverbs/verbs.c
+++ b/libibverbs/verbs.c
@@ -324,6 +324,30 @@  LATEST_SYMVER_FUNC(ibv_reg_mr, 1_1, "IBVERBS_1.1",
 	return mr;
 }
 
+LATEST_SYMVER_FUNC(ibv_reg_mr_virt_as, 1_1, "IBVERBS_1.1",
+		   struct ibv_mr *,
+		   struct ibv_pd *pd, void *addr, size_t length,
+		   uint64_t hca_va, int access)
+{
+	struct verbs_mr *vmr;
+	struct ibv_reg_mr cmd;
+	struct ib_uverbs_reg_mr_resp resp;
+	int ret;
+
+	vmr = malloc(sizeof(*vmr));
+	if (!vmr)
+		return NULL;
+
+	ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, vmr, &cmd,
+			     sizeof(cmd), &resp, sizeof(resp));
+	if (ret) {
+		free(vmr);
+		return NULL;
+	}
+
+	return &vmr->ibv_mr;
+}
+
 LATEST_SYMVER_FUNC(ibv_rereg_mr, 1_1, "IBVERBS_1.1",
 		   int,
 		   struct ibv_mr *mr, int flags,
diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h
index cb2d8439..7b3e12a5 100644
--- a/libibverbs/verbs.h
+++ b/libibverbs/verbs.h
@@ -2372,6 +2372,12 @@  static inline int ibv_close_xrcd(struct ibv_xrcd *xrcd)
 struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr,
 			  size_t length, int access);
 
+/**
+ * ibv_reg_mr_virt_as - Register a memory region with address from virtual
+ * address space
+ */
+struct ibv_mr *ibv_reg_mr_virt_as(struct ibv_pd *pd, void *addr, size_t length,
+				  uint64_t hca_va, int access);
 
 enum ibv_rereg_mr_err_code {
 	/* Old MR is valid, invalid input */