Message ID | 1589202728-12365-1-git-send-email-yishaih@mellanox.com (mailing list archive) |
---|---|
State | RFC |
Series | [RFC,rdma-core] Verbs: Introduce import verbs for device, PD, MR |
On 11/05/2020 16:12, Yishai Hadas wrote:
> Introduce import verbs for device, PD, MR, it enables processes to share
> their ibv_context and then share PD and MR that is associated with.
>
> A process is creating a device and then uses some of the Linux systems
> calls to dup its 'cmd_fd' member which lets other process to obtain
> owning on.
>
> Once other process obtains the 'cmd_fd' it can call ibv_import_device()
> which returns an ibv_context on the original RDMA device.
>
> On the imported device there is an option to import PD(s) and MR(s) to
> achieve a sharing on those objects.
>
> This is the responsibility of the application to coordinate between all
> ibv_context(s) that use the imported objects, such that once destroy is
> done no other process can touch the object except for unimport. All
> users of the context must collaborate to ensure this.
>
> A matching unimport verbs were introduced for PD and MR, for the device
> the ibv_close_device() API should be used.
>
> Detailed man pages are introduced as part of this RFC patch to clarify
> the expected usage and notes.
>
> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>

Hi Yishai,

A few questions:

Can you please explain the use case? I remember there was a discussion on the
previous shared PD kernel submission (by Yuval and Shamir) but I'm not sure if
there was a conclusion.

Could you please elaborate more on how the process cleanup flow (e.g. killed
process) is going to change? I know it's a very broad question but I'm just
trying to get the general idea.

What's expected to happen in a case where we have two processes P1 & P2, both
use a shared PD, but separate MRs and QPs (created under the same shared PD).
Now when an RDMA read request arrives at P2's QP, but refers to an MR of P1
(which was not imported, but under the same PD), how would you expect the device
to handle that?
On 5/11/2020 5:31 PM, Gal Pressman wrote:
> Hi Yishai,
>
> A few questions:
> Can you please explain the use case? I remember there was a discussion on the
> previous shared PD kernel submission (by Yuval and Shamir) but I'm not sure if
> there was a conclusion.
>

The expected flow and use case are as follows.

One process creates an ibv_context by calling ibv_open_device() and then shares
ownership of its 'cmd_fd' with other processes through some Linux system call
(see the man page in this RFC for some alternatives). Any other process that
owns this 'cmd_fd' can then get its own ibv_context for the same RDMA device
by calling ibv_import_device().

At that point those processes really work on the same kernel context, and PD(s),
MR(s) and potentially other objects in the future can be shared by calling
ibv_import_pd()/ibv_import_mr(), assuming that the initiator process lets the
other ones know the kernel handle value.

Once a PD and an MR that points to this PD are shared, memory that was
registered by one process can be used by the others, with the matching
lkey/rkey, for RDMA operations.

> Could you please elaborate more how the process cleanup flow (e.g killed
> process) is going to change? I know it's a very broad question but I'm just
> trying to get the general idea.
>

For now the model in the suggested APIs is that cleanup is done either
explicitly, by calling the relevant destroy command, or implicitly once all
processes that own the cmd_fd have closed it.

On the kernel side there is only one object, and its ref count is not increased
by the import_xxx() functions; see the notes on this point in the man pages.

> What's expected to happen in a case where we have two processes P1 & P2, both
> use a shared PD, but separate MRs and QPs (created under the same shared PD).
> Now when an RDMA read request arrives at P2's QP, but refers to an MR of P1
> (which was not imported, but under the same PD), how would you expect the device
> to handle that?
>

The processes behave almost like two threads that each have a QP and an MR; if
you mix them up it will work just like any buggy software.
In this case I would expect the device to scatter to the MR that was pointed to
by the RDMA read request. Is there any reason it would behave differently?

Yishai
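The RFC man page mentions SCM_RIGHTS as one way to hand the 'cmd_fd' to another process. The following is a minimal sketch of that mechanism only, using plain POSIX sockets; the helper names `send_fd`, `recv_fd` and `demo_fd_passing` are illustrative and not part of the patch, and the demo passes a pipe descriptor instead of a real 'cmd_fd' so that it runs without RDMA hardware. In a real application the receiver would presumably pass the received descriptor to ibv_import_device().

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one descriptor over a connected AF_UNIX socket; 0 on success. */
static int send_fd(int sock, int fd)
{
	char dummy = 'F';	/* one byte of ordinary data carries the control message */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd (a kernel-made duplicate) or -1. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass the read end of a pipe across a socketpair and read through the copy. */
int demo_fd_passing(void)
{
	int sv[2], p[2], got;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) || pipe(p))
		return -1;
	if (send_fd(sv[0], p[0]))
		return -1;
	got = recv_fd(sv[1]);
	if (got < 0)
		return -1;
	if (write(p[1], "x", 1) != 1 || read(got, &c, 1) != 1)
		return -1;

	close(got);
	close(p[0]);
	close(p[1]);
	close(sv[0]);
	close(sv[1]);
	return c == 'x' ? 0 : -1;
}
```

The demo runs both ends in one process for brevity; between two processes the socket would typically be a named or inherited AF_UNIX socket, and the PD/MR handle values would have to be communicated separately, as described above.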
On 11/05/2020 18:35, Yishai Hadas wrote:
> The expected flow and use case are as follows.
>
> One process creates an ibv_context by calling ibv_open_device() and then enables
> owning of its 'cmd_fd' with other processes by some Linux system call, (see man
> page as part of this RFC for some alternatives). Then other process that owns
> this 'cmd_fd' will be able to have its own ibv_context for the same RDMA device
> by calling ibv_import_device().
>
> At that point those processes really work on same kernel context and PD(s),
> MR(s) and potentially other objects in the future can be shared by calling
> ibv_import_pd()/mr() assuming that the initiator process lets the other ones
> know the kernel handle value.
>
> Once a PD and MR which points to this PD were shared it enables a memory that
> was registered by one process to be used by others with the matching lkey/rkey
> for RDMA operations.

Thanks Yishai.
Which type of applications need this kind of functionality?

> For now the model in those suggested APIs is that cleanup will be done or
> explicitly by calling the relevant destroy command or alternatively once all
> processes that own the cmd_fd will be closed.
>
> From kernel side there is only one object and its ref count is not increased as
> part of the import_xxx() functions, see in the man pages some notes regarding
> this point.

ACK.

> The processes are behaving almost like 2 threads each have a QP and an MR, if
> you mix them around it will work just like any buggy software.
> In this case I would expect the device to scatter to the MR that was pointed by
> the RDMA read request, any reason that it will behave differently ?

I meant that the process is the RDMA read responder, not the requester (although
it's very similar). Are we OK with one process accessing memory of a different
process even though the MR isn't exported?

I'm wondering whether there are any assumptions about the "security" model of
this feature, or are both processes considered exactly the same? Especially
since both the kernel and the device aren't aware of the shared resources.

It's a bit confusing that some of the resources are shared while others aren't,
even though all are created using the same PD.
On Tue, May 12, 2020 at 11:24 AM Gal Pressman <galpress@amazon.com> wrote:
>
> Thanks Yishai.
> Which type of applications need this kind of functionality?

Any solution that implements a single business logic with a multi-process
design needs this.
Examples include NGINX with TCP load balancing, sharing the RSS
indirection table with an RQ per process, and HPC frameworks with a
multi-rank (process) solution on a single host. UCX can share IB
resources using the shared PD and can help dispatch data to multiple
processes/MRs in a single RDMA operation.
Also, we have solutions in which the primary process registers a
large shared memory range, and each spawned worker process creates
a private QP on the shared PD and uses the shared MR to save the
per-process registration time.

> I meant that the process is the RDMA read responder, not requester (although
> it's very similar), are we OK with one process accessing memory of a different
> process even though the MR isn't exported?
>
> I'm wondering whether there are any assumption about the "security" model of
> this feature, or are both processes considered exactly the same. Especially
> since both the kernel and the device aren't aware of the shared resources.

The RDMA security model is bound to the protection domain, so once the
application logic has shared its PD (via the 'handle') it has extended
the security scope.

> It's a bit confusing that some of the resources are shared while others aren't
> though all created using the same PD.

In this RFC, the shared resources are only stateless resources. Just
import the resource, based on its handle, and you have access.
The current design doesn't add any shared state for resources running in
different process memory spaces. Objects like QP and CQ need shared
user-space state to be really usable between processes ... hopefully some
day we'll get there.

Alex
On 12/05/2020 13:51, Alex Rosenbaum wrote:
> The RDMA security model is bound to the protection domain, so once the
> application logic shared its PD (via the 'handle') it shared extended
> the security scope.
>
> In this RFC, the shared resource are only stateless resource. Just
> import the resource, based on handle, and you have access.
> Current design doesn't add any shared state for resources running on
> different process memory spaces, objects like QP, CQ, need user-space
> state shared to be really usable between processes ... hopefully some
> days we'll get there.

Thanks Alex.

Let me know if I'm missing anything, but assuming I'm importing an MR, I realise
that the address and length fields aren't going to be valid, but still the MR
points to physical memory that probably isn't in my address space.
So the process has access to post operations on the MR, but can't access its data?

What is the implementation of the new callbacks going to look like?
It sounds like this feature doesn't involve the device at all; in that case I
assume it won't involve the providers? Is it going to be a generic libibverbs
implementation?
On Tue, May 12, 2020 at 02:44:54PM +0300, Gal Pressman wrote:

> Let me know if I'm missing anything but assuming I'm importing an MR, I realise
> that the address and length fields aren't going to be valid, but
> still the MR

The length can probably be made valid.

> points to physical memory that probably isn't in my address space.
> So the process has access to post operations on the MR, but can't access its data?

Right, unless the app takes other measures to share the pages.

> How's the implementation of the new callbacks going to look like?
> It sounds like this feature doesn't involve the device at all, in that case I
> assume it won't involve the providers? Is it going to be a generic libibverbs
> implementation?

In the ibverbs model drivers always have to build their driver-specific
objects, so driver involvement is required, though it may be trivial
for some drivers and some objects.

Jason
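As a rough illustration of the "other measures to share the pages" mentioned above, one option an application might take is to back the buffer with a shared mapping so that several processes see the same pages. The sketch below shows only that page-sharing step with standard POSIX calls and no verbs calls; the function name `demo_shared_pages` is made up for this example, and whether such a buffer is then registered as an MR by the primary process is left to the application.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map one page as MAP_SHARED, fork, let the child write into it and
 * check that the parent observes the data. Returns 0 on success.
 */
int demo_shared_pages(void)
{
	const size_t len = 4096;
	const char *text = "written by child";
	char *buf;
	pid_t pid;
	int status, ok;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	pid = fork();
	if (pid < 0) {
		munmap(buf, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: the mapping refers to the same physical pages. */
		strcpy(buf, text);
		_exit(0);
	}

	/* Parent: wait for the child, then read what it wrote. */
	if (waitpid(pid, &status, 0) != pid) {
		munmap(buf, len);
		return -1;
	}
	ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
	     strcmp(buf, text) == 0;

	munmap(buf, len);
	return ok ? 0 : -1;
}
```

Unrelated processes (not parent and child) would need a named object instead, for example a file or POSIX shared memory object mapped with MAP_SHARED in each process.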
diff --git a/libibverbs/man/CMakeLists.txt b/libibverbs/man/CMakeLists.txt
index e1d5edf8..9ebfeaac 100644
--- a/libibverbs/man/CMakeLists.txt
+++ b/libibverbs/man/CMakeLists.txt
@@ -36,6 +36,9 @@ rdma_man_pages(
   ibv_get_device_name.3.md
   ibv_get_pkey_index.3.md
   ibv_get_srq_num.3.md
+  ibv_import_device.3.md
+  ibv_import_mr.3.md
+  ibv_import_pd.3.md
   ibv_inc_rkey.3.md
   ibv_modify_qp.3
   ibv_modify_qp_rate_limit.3
@@ -97,6 +100,8 @@ rdma_alias_man_pages(
   ibv_get_async_event.3 ibv_ack_async_event.3
   ibv_get_cq_event.3 ibv_ack_cq_events.3
   ibv_get_device_list.3 ibv_free_device_list.3
+  ibv_import_mr.3 ibv_unimport_mr.3
+  ibv_import_pd.3 ibv_unimport_pd.3
   ibv_open_device.3 ibv_close_device.3
   ibv_open_xrcd.3 ibv_close_xrcd.3
   ibv_rate_to_mbps.3 mbps_to_ibv_rate.3
diff --git a/libibverbs/man/ibv_import_device.3.md b/libibverbs/man/ibv_import_device.3.md
new file mode 100644
index 00000000..601b50a8
--- /dev/null
+++ b/libibverbs/man/ibv_import_device.3.md
@@ -0,0 +1,48 @@
+---
+date: 2020-5-3
+footer: libibverbs
+header: "Libibverbs Programmer's Manual"
+layout: page
+license: 'Licensed under the OpenIB.org BSD license (FreeBSD Variant) - See COPYING.md'
+section: 3
+title: ibv_import_device
+---
+
+# NAME
+
+ibv_import_device - import a device from a given command FD
+
+# SYNOPSIS
+
+```c
+#include <infiniband/verbs.h>
+
+struct ibv_context *ibv_import_device(int cmd_fd);
+
+```
+
+
+# DESCRIPTION
+
+**ibv_import_device()** returns an *ibv_context* pointer that is associated with the given
+*cmd_fd*.
+
+The *cmd_fd* is obtained from the ibv_context cmd_fd member, which must be dup'd (e.g. by dup(), SCM_RIGHTS, etc.)
+before being passed to ibv_import_device().
+
+Once the *ibv_context* is no longer in use, *ibv_close_device()* should be called.
+This call may clean up whatever is needed/opposite of the import, including closing the command FD.
+
+# RETURN VALUE
+
+**ibv_import_device()** returns a pointer to the allocated RDMA context, or NULL if the request fails.
+
+# SEE ALSO
+
+**ibv_open_device**(3),
+**ibv_close_device**(3),
+
+# AUTHOR
+
+Yishai Hadas <yishaih@mellanox.com>
+
diff --git a/libibverbs/man/ibv_import_mr.3.md b/libibverbs/man/ibv_import_mr.3.md
new file mode 100644
index 00000000..ca698a96
--- /dev/null
+++ b/libibverbs/man/ibv_import_mr.3.md
@@ -0,0 +1,63 @@
+---
+date: 2020-5-3
+footer: libibverbs
+header: "Libibverbs Programmer's Manual"
+layout: page
+license: 'Licensed under the OpenIB.org BSD license (FreeBSD Variant) - See COPYING.md'
+section: 3
+title: ibv_import_mr ibv_unimport_mr
+---
+
+# NAME
+
+ibv_import_mr - import an MR from a given ibv_context
+ibv_unimport_mr - unimport an MR
+
+# SYNOPSIS
+
+```c
+#include <infiniband/verbs.h>
+
+struct ibv_mr *ibv_import_mr(struct ibv_pd *pd, uint32_t handle);
+void ibv_unimport_mr(struct ibv_mr *mr);
+
+```
+
+
+# DESCRIPTION
+
+**ibv_import_mr()** returns a memory region (MR) that is associated with the given
+*handle* in the RDMA context that is associated with the given *pd*.
+
+The input *handle* value must be a valid kernel handle for an MR object in the given *context*.
+The returned *ibv_mr* can be used in all verbs that use an MR.
+
+**ibv_unimport_mr()** unimports the MR.
+Once the MR is no longer in use, ibv_dereg_mr() or ibv_unimport_mr() should be called.
+The first one will go to the kernel to destroy the object, while the second one may clean up
+whatever is needed/opposite of the import without calling the kernel.
+
+It is the responsibility of the application to coordinate between all ibv_context(s) that use this MR.
+Once destroy is done no other process can touch the object except for unimport. All users of the context must
+collaborate to ensure this.
+
+# RETURN VALUE
+
+**ibv_import_mr()** returns a pointer to the allocated MR, or NULL if the request fails.
+
+# NOTES
+
+The *addr* and the *length* fields in the imported MR are not applicable; NULL and zero are expected.
+
+# SEE ALSO
+
+**ibv_reg_mr**(3),
+**ibv_reg_dm_mr**(3),
+**ibv_reg_mr_iova**(3),
+**ibv_reg_mr_iova2**(3),
+**ibv_dereg_mr**(3),
+
+# AUTHOR
+
+Yishai Hadas <yishaih@mellanox.com>
+
diff --git a/libibverbs/man/ibv_import_pd.3.md b/libibverbs/man/ibv_import_pd.3.md
new file mode 100644
index 00000000..be1d079f
--- /dev/null
+++ b/libibverbs/man/ibv_import_pd.3.md
@@ -0,0 +1,56 @@
+---
+date: 2020-5-3
+footer: libibverbs
+header: "Libibverbs Programmer's Manual"
+layout: page
+license: 'Licensed under the OpenIB.org BSD license (FreeBSD Variant) - See COPYING.md'
+section: 3
+title: ibv_import_pd, ibv_unimport_pd
+---
+
+# NAME
+
+ibv_import_pd - import a PD from a given ibv_context
+ibv_unimport_pd - unimport a PD
+
+# SYNOPSIS
+
+```c
+#include <infiniband/verbs.h>
+
+struct ibv_pd *ibv_import_pd(struct ibv_context *context, uint32_t handle);
+void ibv_unimport_pd(struct ibv_pd *pd);
+
+```
+
+
+# DESCRIPTION
+
+**ibv_import_pd()** returns a protection domain (PD) that is associated with the given
+*handle* in the given *context*.
+
+The input *handle* value must be a valid kernel handle for a PD object in the given *context*.
+The returned *ibv_pd* can be used in all verbs that get a protection domain.
+
+**ibv_unimport_pd()** unimports the PD.
+Once the PD is no longer in use, ibv_dealloc_pd() or ibv_unimport_pd() should be called.
+The first one will go to the kernel to destroy the object, while the second one may clean up
+whatever is needed/opposite of the import without calling the kernel.
+
+It is the responsibility of the application to coordinate between all ibv_context(s) that use this PD.
+Once destroy is done no other process can touch the object except for unimport. All users of the context must
+collaborate to ensure this.
+
+# RETURN VALUE
+
+**ibv_import_pd()** returns a pointer to the allocated PD, or NULL if the request fails.
+
+# SEE ALSO
+
+**ibv_alloc_pd**(3),
+**ibv_dealloc_pd**(3),
+
+# AUTHOR
+
+Yishai Hadas <yishaih@mellanox.com>
+
diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h
index 288985d5..8548a7dd 100644
--- a/libibverbs/verbs.h
+++ b/libibverbs/verbs.h
@@ -2033,6 +2033,12 @@ struct ibv_values_ex {
 
 struct verbs_context {
 	/* "grows up" - new fields go here */
+	void (*unimport_pd)(struct ibv_pd *pd);
+	struct ibv_pd *(*import_pd)(struct ibv_context *context,
+				    uint32_t pd_handle);
+	void (*unimport_mr)(struct ibv_mr *mr);
+	struct ibv_mr *(*import_mr)(struct ibv_pd *pd,
+				    uint32_t mr_handle);
 	int (*query_port)(struct ibv_context *context, uint8_t port_num,
 			  struct ibv_port_attr *port_attr,
 			  size_t port_attr_len);
@@ -2217,6 +2223,12 @@ struct ibv_context *ibv_open_device(struct ibv_device *device);
  */
 int ibv_close_device(struct ibv_context *context);
 
+/**
+ * ibv_import_device - Import device
+ */
+struct ibv_context *ibv_import_device(int cmd_fd);
+
+
 /**
  * ibv_get_async_event - Get next async event
  * @event: Pointer to use to return async event
@@ -2546,6 +2558,50 @@ static inline int ibv_advise_mr(struct ibv_pd *pd,
 	return vctx->advise_mr(pd, advice, flags, sg_list, num_sge);
 }
 
+static inline struct ibv_mr *ibv_import_mr(struct ibv_pd *pd,
+					   uint32_t mr_handle)
+{
+	struct verbs_context *vctx;
+
+	vctx = verbs_get_ctx_op(pd->context, import_mr);
+	if (!vctx) {
+		errno = EOPNOTSUPP;
+		return NULL;
+	}
+
+	return vctx->import_mr(pd, mr_handle);
+}
+
+static inline void ibv_unimport_mr(struct ibv_mr *mr)
+{
+	struct verbs_context *vctx;
+
+	vctx = verbs_get_ctx_op(mr->context, unimport_mr);
+	vctx->unimport_mr(mr);
+}
+
+static inline struct ibv_pd *ibv_import_pd(struct ibv_context *context,
+					   uint32_t pd_handle)
+{
+	struct verbs_context *vctx;
+
+	vctx = verbs_get_ctx_op(context, import_pd);
+	if (!vctx) {
+		errno = EOPNOTSUPP;
+		return NULL;
+	}
+
+	return vctx->import_pd(context, pd_handle);
+}
+
+static inline void ibv_unimport_pd(struct ibv_pd *pd)
+{
+	struct verbs_context *vctx;
+
+	vctx = verbs_get_ctx_op(pd->context, unimport_pd);
+	vctx->unimport_pd(pd);
+}
+
 /**
  * ibv_alloc_dm - Allocate device memory
  * @context - Context DM will be attached to
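The import wrappers in the verbs.h hunk above look up an optional driver callback and fail with EOPNOTSUPP when the provider does not implement it. The following toy model sketches that dispatch pattern in isolation; `toy_ctx`, `toy_pd`, `toy_import_pd` and the other names are hypothetical stand-ins, not libibverbs types, and the lookup is simplified to a plain NULL check on a function pointer rather than the real verbs_get_ctx_op() macro.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for a PD object and a context with one optional op. */
struct toy_pd {
	uint32_t handle;
};

struct toy_ctx {
	struct toy_pd *(*import_pd)(struct toy_ctx *ctx, uint32_t handle);
	struct toy_pd slot;	/* storage the toy driver hands back */
};

/* A "driver" implementation: builds its object around the given handle. */
static struct toy_pd *toy_driver_import_pd(struct toy_ctx *ctx, uint32_t handle)
{
	ctx->slot.handle = handle;
	return &ctx->slot;
}

/* Generic wrapper: report EOPNOTSUPP if the driver left the op unset. */
struct toy_pd *toy_import_pd(struct toy_ctx *ctx, uint32_t handle)
{
	if (!ctx->import_pd) {
		errno = EOPNOTSUPP;
		return NULL;
	}
	return ctx->import_pd(ctx, handle);
}

/* Exercise both paths; returns 0 if each behaves as described. */
int demo_optional_op(void)
{
	struct toy_ctx without = { 0 };
	struct toy_ctx with = { .import_pd = toy_driver_import_pd };
	struct toy_pd *pd;

	errno = 0;
	if (toy_import_pd(&without, 7) != NULL || errno != EOPNOTSUPP)
		return -1;

	pd = toy_import_pd(&with, 7);
	return (pd && pd->handle == 7) ? 0 : -1;
}
```

This mirrors only the import side; the unimport wrappers in the patch call the op without such a check, presumably on the assumption that an object could not have been imported unless the driver supports both operations.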
Introduce import verbs for device, PD and MR. They enable processes to share
their ibv_context and then share the PDs and MRs associated with it.

A process opens a device and then uses one of the Linux system calls to dup
its 'cmd_fd' member, which lets another process obtain ownership of it.

Once the other process obtains the 'cmd_fd' it can call ibv_import_device(),
which returns an ibv_context on the original RDMA device.

On the imported device there is an option to import PD(s) and MR(s) to
share those objects.

It is the responsibility of the application to coordinate between all
ibv_context(s) that use the imported objects, such that once destroy is
done no other process can touch the object except for unimport. All
users of the context must collaborate to ensure this.

Matching unimport verbs were introduced for PD and MR; for the device,
the ibv_close_device() API should be used.

Detailed man pages are introduced as part of this RFC patch to clarify
the expected usage and notes.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
---
 libibverbs/man/CMakeLists.txt         |  5 +++
 libibverbs/man/ibv_import_device.3.md | 48 ++++++++++++++++++++
 libibverbs/man/ibv_import_mr.3.md     | 63 +++++++++++++++++++++++++++
 libibverbs/man/ibv_import_pd.3.md     | 56 ++++++++++++++++++++++++
 libibverbs/verbs.h                    | 56 ++++++++++++++++++++++++
 5 files changed, 228 insertions(+)
 create mode 100644 libibverbs/man/ibv_import_device.3.md
 create mode 100644 libibverbs/man/ibv_import_mr.3.md
 create mode 100644 libibverbs/man/ibv_import_pd.3.md