[0/4] cover-letter: Allow MMIO regions to be exported through dmabuf

Message ID 20241216095920.237117-1-wguay@fb.com

Message

Wei Lin Guay Dec. 16, 2024, 9:59 a.m. UTC
From: Wei Lin Guay <wguay@meta.com>

This is another attempt to revive the patches posted by Jason
Gunthorpe and Vivek Kasireddy, at
https://patchwork.kernel.org/project/linux-media/cover/0-v2-472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com/
https://lwn.net/Articles/970751/

In addition to the initial proposal by Jason, another promising
application is exposing memory from an AI accelerator (bound to VFIO)
to an RDMA device. This would allow the RDMA device to directly access
the accelerator's memory, thereby facilitating direct data
transactions between the RDMA device and the accelerator.

Below is the text/motivation from the original cover letter.

dma-buf has become a way to safely acquire a handle to non-struct page
memory that can still have lifetime controlled by the exporter. Notably
RDMA can now import dma-buf FDs and build them into MRs which allows for
PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
from PCI device BARs.
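
To make the importer side concrete, here is a minimal, illustrative
userspace sketch (not part of this series) that registers such a dmabuf fd
as an RDMA MR via rdma-core's ibv_reg_dmabuf_mr(). The helper name and the
way the fd and length are obtained are assumptions for illustration only:

  #include <infiniband/verbs.h>

  /*
   * Illustrative only: wrap a dmabuf fd exported by vfio-pci into an MR so
   * that the RDMA device can reach the underlying BAR memory via PCI P2P.
   */
  static struct ibv_mr *mr_from_vfio_dmabuf(struct ibv_pd *pd, int dmabuf_fd,
                                            size_t len)
  {
          /* offset 0, iova 0: expose the whole exported range */
          return ibv_reg_dmabuf_mr(pd, 0, len, 0, dmabuf_fd,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
  }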

This series supports a use case for SPDK where an NVMe device will be owned
by SPDK through VFIO while interacting with an RDMA device. The RDMA device
may directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.

However, as a general mechanism, it can support many other scenarios with
VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
generic and safe P2P mappings.

This series goes after the "Break up ioctl dispatch functions to one
function per ioctl" series.

v2:
 - Name the new file dma_buf.c
 - Restore orig_nents before freeing
 - Fix reversed logic around priv->revoked
 - Set priv->index
 - Rebased on v2 "Break up ioctl dispatch functions"
v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-vfio_dma_buf_jgg@nvidia.com
Cc: linux-rdma@vger.kernel.org
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Maor Gottlieb <maorg@nvidia.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (3):
  vfio: Add vfio_device_get()
  dma-buf: Add dma_buf_try_get()
  vfio/pci: Allow MMIO regions to be exported through dma-buf

Wei Lin Guay (1):
  vfio/pci: Allow export dmabuf without move_notify from importer

 drivers/vfio/pci/Makefile          |   1 +
 drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_config.c |   8 +-
 drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
 drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
 drivers/vfio/vfio_main.c           |   1 +
 include/linux/dma-buf.h            |  13 ++
 include/linux/vfio.h               |   6 +
 include/linux/vfio_pci_core.h      |   1 +
 include/uapi/linux/vfio.h          |  18 ++
 10 files changed, 405 insertions(+), 8 deletions(-)
 create mode 100644 drivers/vfio/pci/dma_buf.c

--
2.43.5

Comments

Wei Lin Guay Dec. 17, 2024, 12:19 p.m. UTC | #1
Hi Vivek, 

> On 16 Dec 2024, at 18:34, Kasireddy, Vivek <vivek.kasireddy@intel.com> wrote:
> 
> > 
> Hi Wei Lin,
> 
>> Subject: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
>> through dmabuf
>> 
>> From: Wei Lin Guay <wguay@meta.com>
>> 
>> This is another attempt to revive the patches posted by Jason
>> Gunthorpe and Vivek Kasireddy, at
>> https://patchwork.kernel.org/project/linux-media/cover/0-v2-472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com/
>> https://lwn.net/Articles/970751/
> v2: https://lore.kernel.org/dri-devel/20240624065552.1572580-1-vivek.kasireddy@intel.com/
> addresses review comments from Alex and Jason and also includes the ability
> to create the dmabuf from multiple ranges. This is really needed to future-proof
> the feature. 

The good thing is that your patch series addressed Christian’s concerns about adding dma_buf_try_get().

However, I have several questions regarding your v2 patch:
- Does the dma-buf still support the redundant mmap handler? (This was one of the review comments on v1?)
- Rather than handling different regions within a single dma-buf, would having vfio-user open multiple distinct file descriptors work?
For our specific use case, we don't require multiple regions and prefer Jason’s original patch.

> 
> Also, my understanding is that this patchset cannot proceed until Leon's series is merged:
> https://lore.kernel.org/kvm/cover.1733398913.git.leon@kernel.org/

Thanks for the pointer. 
I will rebase my local patch series on top of that. 

Thanks,
Wei Lin

> 
> 
> Thanks,
> Vivek
> 
>> 
>> In addition to the initial proposal by Jason, another promising
>> application is exposing memory from an AI accelerator (bound to VFIO)
>> to an RDMA device. This would allow the RDMA device to directly access
>> the accelerator's memory, thereby facilitating direct data
>> transactions between the RDMA device and the accelerator.
>> 
>> Below is the text/motivation from the original cover letter.
>> 
>> dma-buf has become a way to safely acquire a handle to non-struct page
>> memory that can still have lifetime controlled by the exporter. Notably
>> RDMA can now import dma-buf FDs and build them into MRs which allows for
>> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
>> from PCI device BARs.
>> 
>> This series supports a use case for SPDK where an NVMe device will be owned
>> by SPDK through VFIO while interacting with an RDMA device. The RDMA device
>> may directly access the NVMe CMB or directly manipulate the NVMe device's
>> doorbell using PCI P2P.
>> 
>> However, as a general mechanism, it can support many other scenarios with
>> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
>> generic and safe P2P mappings.
>> 
>> This series goes after the "Break up ioctl dispatch functions to one
>> function per ioctl" series.
>> 
>> v2:
>> - Name the new file dma_buf.c
>> - Restore orig_nents before freeing
>> - Fix reversed logic around priv->revoked
>> - Set priv->index
>> - Rebased on v2 "Break up ioctl dispatch functions"
>> v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-
>> vfio_dma_buf_jgg@nvidia.com
>> Cc: linux-rdma@vger.kernel.org
>> Cc: Oded Gabbay <ogabbay@kernel.org>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Leon Romanovsky <leon@kernel.org>
>> Cc: Maor Gottlieb <maorg@nvidia.com>
>> Cc: dri-devel@lists.freedesktop.org
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> 
>> Jason Gunthorpe (3):
>>  vfio: Add vfio_device_get()
>>  dma-buf: Add dma_buf_try_get()
>>  vfio/pci: Allow MMIO regions to be exported through dma-buf
>> 
>> Wei Lin Guay (1):
>>  vfio/pci: Allow export dmabuf without move_notify from importer
>> 
>> drivers/vfio/pci/Makefile          |   1 +
>> drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
>> drivers/vfio/pci/vfio_pci_config.c |   8 +-
>> drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
>> drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
>> drivers/vfio/vfio_main.c           |   1 +
>> include/linux/dma-buf.h            |  13 ++
>> include/linux/vfio.h               |   6 +
>> include/linux/vfio_pci_core.h      |   1 +
>> include/uapi/linux/vfio.h          |  18 ++
>> 10 files changed, 405 insertions(+), 8 deletions(-)
>> create mode 100644 drivers/vfio/pci/dma_buf.c
>> 
>> --
>> 2.43.5
Kasireddy, Vivek Dec. 18, 2024, 7:02 a.m. UTC | #2
Hi Wei Lin,

> Subject: Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
> through dmabuf
> >>
> >> From: Wei Lin Guay <wguay@meta.com>
> >>
> >> This is another attempt to revive the patches posted by Jason
> >> Gunthorpe and Vivek Kasireddy, at
> >> https://patchwork.kernel.org/project/linux-media/cover/0-v2-472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com/
> >> https://lwn.net/Articles/970751/
> > v2: https://lore.kernel.org/dri-devel/20240624065552.1572580-1-vivek.kasireddy@intel.com/
> > addresses review comments from Alex and Jason and also includes the ability
> > to create the dmabuf from multiple ranges. This is really needed to future-proof
> > the feature.
> 
> The good thing is that your patch series addressed Christian’s concerns about
> adding dma_buf_try_get().
No, it did not address his concern. In v2, instead of adding a new function, we
are now calling get_file_rcu() directly, which was suggested by Christian in the
original review.

> 
> However, I have several questions regarding your v2 patch:
> - Does the dma-buf still support the redundant mmap handler? (This was one of
> the review comments on v1?)
Yeah, the mmap handler is really needed as a debugging tool given that the
importer would not be able to provide access to the dmabuf's underlying
memory via the CPU in any other way.
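
(For reference, a minimal illustration of the kind of CPU debug access this
enables, assuming the exporter implements the dmabuf mmap op; error handling
and the surrounding setup are omitted, and the names are illustrative only:)

  #include <stdint.h>
  #include <sys/mman.h>

  /* Illustrative only: CPU peek at an exported BAR through the dmabuf fd */
  static uint32_t debug_peek(int dmabuf_fd, size_t len)
  {
          uint32_t val = 0;
          void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                           dmabuf_fd, 0);

          if (map != MAP_FAILED) {
                  val = *(volatile uint32_t *)map;  /* read first dword */
                  munmap(map, len);
          }
          return val;
  }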

> - Rather than handling different regions within a single dma-buf, would having
> vfio-user open multiple distinct file descriptors work?
> For our specific use case, we don't require multiple regions and prefer
> Jason’s original patch.
Restricting the dmabuf to a single region (or having to create multiple dmabufs
to represent multiple regions/ranges associated with a single scattered buffer)
would not be feasible or ideal in all cases. For instance, in my use case, I am
sharing a large framebuffer (FB) located in the GPU's VRAM, and allocating a large
FB contiguously (nr_ranges = 1) in VRAM is not possible when there is memory
pressure.

Furthermore, since we are adding a new UAPI with this patch/feature, we cannot
go back and tweak it later (to add support for nr_ranges > 1) should the need
arise, whereas you can always pass nr_ranges = 1 with the more general UAPI.
That is why it makes sense to be flexible in terms of the number of
ranges/regions, as sketched below.
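
(To illustrate the point, a rough sketch of what such a flexible UAPI could
look like; the struct and field names here are illustrative rather than the
exact v2 layout, see the v2 posting for the authoritative definition:)

  #include <linux/types.h>

  struct vfio_region_dma_range {
          __u64   offset;         /* offset into the selected region */
          __u64   length;         /* length of this range */
  };

  struct vfio_device_feature_dma_buf {
          __u32   region_index;   /* which BAR/region to export from */
          __u32   open_flags;     /* flags for the returned dmabuf fd */
          __u32   flags;
          __u32   nr_ranges;      /* number of entries in dma_ranges[] */
          struct vfio_region_dma_range dma_ranges[];
  };

A caller that only needs a single region simply passes nr_ranges = 1 with one
{offset, length} pair, so the more general UAPI costs the simple case nothing.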

> 
> >
> > Also, my understanding is that this patchset cannot proceed until Leon's
> series is merged:
> > https://lore.kernel.org/kvm/cover.1733398913.git.leon@kernel.org/
> 
> Thanks for the pointer.
> I will rebase my local patch series on top of that.
AFAIK, Leon's work includes new mechanisms/APIs for doing P2P DMA, which we
should be using in this patchset. I think he is also planning to use the
new APIs to augment dmabuf usage so that it is not forced to use the scatter-gather
list, particularly in cases where the underlying memory is not backed by struct page.

I was just waiting for all of this to happen, before posting a v3.

Thanks,
Vivek

> 
> Thanks,
> Wei Lin
> 
> >
> >
> > Thanks,
> > Vivek
> >
> >>
> >> In addition to the initial proposal by Jason, another promising
> >> application is exposing memory from an AI accelerator (bound to VFIO)
> >> to an RDMA device. This would allow the RDMA device to directly access
> >> the accelerator's memory, thereby facilitating direct data
> >> transactions between the RDMA device and the accelerator.
> >>
> >> Below is the text/motivation from the original cover letter.
> >>
> >> dma-buf has become a way to safely acquire a handle to non-struct page
> >> memory that can still have lifetime controlled by the exporter. Notably
> >> RDMA can now import dma-buf FDs and build them into MRs which allows for
> >> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> >> from PCI device BARs.
> >>
> >> This series supports a use case for SPDK where an NVMe device will be owned
> >> by SPDK through VFIO while interacting with an RDMA device. The RDMA device
> >> may directly access the NVMe CMB or directly manipulate the NVMe device's
> >> doorbell using PCI P2P.
> >>
> >> However, as a general mechanism, it can support many other scenarios with
> >> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> >> generic and safe P2P mappings.
> >>
> >> This series goes after the "Break up ioctl dispatch functions to one
> >> function per ioctl" series.
> >>
> >> v2:
> >> - Name the new file dma_buf.c
> >> - Restore orig_nents before freeing
> >> - Fix reversed logic around priv->revoked
> >> - Set priv->index
> >> - Rebased on v2 "Break up ioctl dispatch functions"
> >> v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-
> >> vfio_dma_buf_jgg@nvidia.com
> >> Cc: linux-rdma@vger.kernel.org
> >> Cc: Oded Gabbay <ogabbay@kernel.org>
> >> Cc: Christian König <christian.koenig@amd.com>
> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> Cc: Leon Romanovsky <leon@kernel.org>
> >> Cc: Maor Gottlieb <maorg@nvidia.com>
> >> Cc: dri-devel@lists.freedesktop.org
> >> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >>
> >> Jason Gunthorpe (3):
> >>  vfio: Add vfio_device_get()
> >>  dma-buf: Add dma_buf_try_get()
> >>  vfio/pci: Allow MMIO regions to be exported through dma-buf
> >>
> >> Wei Lin Guay (1):
> >>  vfio/pci: Allow export dmabuf without move_notify from importer
> >>
> >> drivers/vfio/pci/Makefile          |   1 +
> >> drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
> >> drivers/vfio/pci/vfio_pci_config.c |   8 +-
> >> drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
> >> drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
> >> drivers/vfio/vfio_main.c           |   1 +
> >> include/linux/dma-buf.h            |  13 ++
> >> include/linux/vfio.h               |   6 +
> >> include/linux/vfio_pci_core.h      |   1 +
> >> include/uapi/linux/vfio.h          |  18 ++
> >> 10 files changed, 405 insertions(+), 8 deletions(-)
> >> create mode 100644 drivers/vfio/pci/dma_buf.c
> >>
> >> --
> >> 2.43.5