[0/6] Enable P2PDMA in Userspace RDMA

Message ID 20240605192934.742369-1-martin.oliveira@eideticom.com (mailing list archive)

Message

Martin Oliveira June 5, 2024, 7:29 p.m. UTC
This patch series enables P2PDMA memory to be used in userspace RDMA
transfers. With this series, P2PDMA memory mmapped into userspace (i.e.,
only NVMe CMBs at the moment) can then be used with ibv_reg_mr() (or
similar) interfaces. This can be tested by passing a sysfs p2pmem
allocator to the --mmap flag of the perftest tools.
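
For readers who want to see what this looks like from an application, here
is a minimal userspace sketch. It is not taken from the series or from
perftest. The helper names map_p2pmem() and register_p2pmem() and the
USE_VERBS build switch are made up for illustration, and the sketch assumes
the allocator file is mapped shared at offset 0.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifdef USE_VERBS            /* hypothetical build switch, not from the series */
#include <infiniband/verbs.h>
#endif

/*
 * Map `len` bytes of `path` as a shared, read/write mapping at offset 0.
 * Returns NULL on failure. For P2PDMA memory, `path` would be a p2pmem
 * allocator such as /sys/bus/pci/devices/<BDF>/p2pmem/allocate.
 */
void *map_p2pmem(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping stays valid after close() */
    return buf == MAP_FAILED ? NULL : buf;
}

#ifdef USE_VERBS
/* Register the mapped buffer with an already allocated protection domain. */
struct ibv_mr *register_p2pmem(struct ibv_pd *pd, void *buf, size_t len)
{
    return ibv_reg_mr(pd, buf, len,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_READ |
                      IBV_ACCESS_REMOTE_WRITE);
}
#endif
```

Without this series, the registration step would be expected to fail for
such a buffer, since the long-term pin it needs is refused for P2PDMA pages.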

This requires addressing three issues:

* Stop exporting the P2PDMA VMAs with page_mkwrite, which is incompatible
with FOLL_LONGTERM.

* Fix folio_fast_pin_allowed() path to take into account ZONE_DEVICE pages.

* Remove the restriction on FOLL_LONGTERM with FOLL_PCI_P2PDMA, which was
initially put in place out of caution, on the assumption that P2PDMA
would have similar problems to fsdax with unmap_mapping_range(). Since
P2PDMA only uses unmap_mapping_range() on device unbind and immediately
waits for all page reference counts to go to zero after calling it, it
is believed to be safe from reuse and user access faults. See
[1] for more discussion.
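
The effect of the third item can be illustrated with a small standalone
model. This is not the code in mm/gup.c: the MODEL_* flag values and both
function names are stand-ins invented for this sketch. It only shows the
change in which flag combinations are accepted.

```c
#include <stdbool.h>

/* Stand-in flag bits for illustration only; not the kernel's real values. */
enum {
    MODEL_FOLL_PIN        = 1u << 0,
    MODEL_FOLL_LONGTERM   = 1u << 1,
    MODEL_FOLL_PCI_P2PDMA = 1u << 2,
};

/* Model of the old behaviour: long-term pins of P2PDMA pages are refused. */
bool model_gup_flags_ok_before(unsigned int flags)
{
    if ((flags & MODEL_FOLL_LONGTERM) && (flags & MODEL_FOLL_PCI_P2PDMA))
        return false;
    return true;
}

/* Model of the new behaviour: the combination is no longer rejected. */
bool model_gup_flags_ok_after(unsigned int flags)
{
    (void)flags;            /* no restriction left in this simplified model */
    return true;
}
```

In this model, a request carrying both MODEL_FOLL_LONGTERM and
MODEL_FOLL_PCI_P2PDMA is rejected by the first function and accepted by the
second, which is the combination that memory registration of P2PDMA memory
relies on.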

This was tested using a Mellanox ConnectX-6 SmartNIC (MT28908 Family),
using the mlx5_core driver, as well as an NVMe CMB.

Thanks,
Martin

[1]: https://lore.kernel.org/linux-mm/87cypuvh2i.fsf@nvdebian.thelocal/T/

Martin Oliveira (6):
  kernfs: create vm_operations_struct without page_mkwrite()
  sysfs: add mmap_allocates parameter to struct bin_attribute
  PCI/P2PDMA: create VMA without page_mkwrite() operator
  mm/gup: handle ZONE_DEVICE pages in folio_fast_pin_allowed()
  mm/gup: allow FOLL_LONGTERM & FOLL_PCI_P2PDMA
  RDMA/umem: add support for P2P RDMA

 drivers/infiniband/core/umem.c |  3 +++
 drivers/pci/p2pdma.c           |  1 +
 fs/kernfs/file.c               | 15 ++++++++++++++-
 fs/sysfs/file.c                | 25 +++++++++++++++++++------
 include/linux/kernfs.h         |  7 +++++++
 include/linux/sysfs.h          |  1 +
 mm/gup.c                       |  9 ++++-----
 7 files changed, 49 insertions(+), 12 deletions(-)


base-commit: c3f38fa61af77b49866b006939479069cd451173

Comments

Zhu Yanjun June 6, 2024, 8:53 a.m. UTC | #1
On 05.06.24 21:29, Martin Oliveira wrote:
> This patch series enables P2PDMA memory to be used in userspace RDMA
> transfers. With this series, P2PDMA memory mmapped into userspace (i.e.,
> only NVMe CMBs at the moment) can then be used with ibv_reg_mr() (or
> similar) interfaces. This can be tested by passing a sysfs p2pmem
> allocator to the --mmap flag of the perftest tools.

Do you mean the following --mmap flag?
"
--mmap=file  Use an mmap'd file as the buffer for testing P2P transfers.
"
I am interested in this. Can you provide the full steps to make tests 
with this patch series?

Thanks a lot.
Zhu Yanjun

Martin Oliveira June 6, 2024, 9:32 p.m. UTC | #2
On 2024-06-06 02:53, Zhu Yanjun wrote:
> On 05.06.24 21:29, Martin Oliveira wrote:
>> This patch series enables P2PDMA memory to be used in userspace RDMA
>> transfers. With this series, P2PDMA memory mmapped into userspace (i.e.,
>> only NVMe CMBs at the moment) can then be used with ibv_reg_mr() (or
>> similar) interfaces. This can be tested by passing a sysfs p2pmem
>> allocator to the --mmap flag of the perftest tools.
> 
> Do you mean the following --mmap flag?
> "
> --mmap=file  Use an mmap'd file as the buffer for testing P2P transfers.
> "

Yes

> I am interested in this. Can you provide the full steps to make tests
> with this patch series?

First start the server with:

ib_read_bw

Then run a client with something like this:

ib_read_bw --mmap /sys/bus/pci/devices/0000\:c5\:00.0/p2pmem/allocate <host>

Martin