Message ID | 20201106170036.18713-16-logang@deltatee.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Userspace P2PDMA with O_DIRECT NVMe devices | expand |
On Fri, Nov 06, 2020 at 10:00:36AM -0700, Logan Gunthorpe wrote: > Allow userspace to obtain CMB memory by mmaping the controller's > char device. The mmap call allocates and returns a hunk of CMB memory, > (the offset is ignored) so userspace does not have control over the > address within the CMB. > > A VMA allocated in this way will only be usable by drivers that set > FOLL_PCI_P2PDMA when calling GUP. And inter-device support will be > checked the first time the pages are mapped for DMA. > > Currently this is only supported by O_DIRECT to an PCI NVMe device > or through the NVMe passthrough IOCTL. > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > drivers/nvme/host/core.c | 11 +++++++++++ > drivers/nvme/host/nvme.h | 1 + > drivers/nvme/host/pci.c | 9 +++++++++ > 3 files changed, 21 insertions(+) > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index f14316c9b34a..fc642aba671d 100644 > +++ b/drivers/nvme/host/core.c > @@ -3240,12 +3240,23 @@ static long nvme_dev_ioctl(struct file *file, unsigned int cmd, > } > } > > +static int nvme_dev_mmap(struct file *file, struct vm_area_struct *vma) > +{ > + struct nvme_ctrl *ctrl = file->private_data; > + > + if (!ctrl->ops->mmap_cmb) > + return -ENODEV; > + > + return ctrl->ops->mmap_cmb(ctrl, vma); > +} This needs to ensure that the VMA created is destroyed before the driver is unprobed - ie the struct pages backing the BAR memory is destroyed. I don't see anything that synchronizes this in the nvme_dev_release()? Many places do this by putting all the VMAs into an address space and zaping the address space when unprobing the driver to revoke the pages, but there is a tricky race here :\ https://lore.kernel.org/dri-devel/20201021125030.GK36674@ziepe.ca/ Jason
On 2020-11-06 10:39 a.m., Jason Gunthorpe wrote: > On Fri, Nov 06, 2020 at 10:00:36AM -0700, Logan Gunthorpe wrote: >> Allow userspace to obtain CMB memory by mmaping the controller's >> char device. The mmap call allocates and returns a hunk of CMB memory, >> (the offset is ignored) so userspace does not have control over the >> address within the CMB. >> >> A VMA allocated in this way will only be usable by drivers that set >> FOLL_PCI_P2PDMA when calling GUP. And inter-device support will be >> checked the first time the pages are mapped for DMA. >> >> Currently this is only supported by O_DIRECT to an PCI NVMe device >> or through the NVMe passthrough IOCTL. >> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> >> drivers/nvme/host/core.c | 11 +++++++++++ >> drivers/nvme/host/nvme.h | 1 + >> drivers/nvme/host/pci.c | 9 +++++++++ >> 3 files changed, 21 insertions(+) >> >> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c >> index f14316c9b34a..fc642aba671d 100644 >> +++ b/drivers/nvme/host/core.c >> @@ -3240,12 +3240,23 @@ static long nvme_dev_ioctl(struct file *file, unsigned int cmd, >> } >> } >> >> +static int nvme_dev_mmap(struct file *file, struct vm_area_struct *vma) >> +{ >> + struct nvme_ctrl *ctrl = file->private_data; >> + >> + if (!ctrl->ops->mmap_cmb) >> + return -ENODEV; >> + >> + return ctrl->ops->mmap_cmb(ctrl, vma); >> +} > > This needs to ensure that the VMA created is destroyed before the > driver is unprobed - ie the struct pages backing the BAR memory is > destroyed. > > I don't see anything that synchronizes this in the nvme_dev_release()? Yup, looks like something that needs to be fixed. Though I'd probably do it in the pci_p2pdma helper code instead. Logan
On Fri, Nov 06, 2020 at 10:00:36AM -0700, Logan Gunthorpe wrote: > Allow userspace to obtain CMB memory by mmaping the controller's > char device. The mmap call allocates and returns a hunk of CMB memory, > (the offset is ignored) so userspace does not have control over the > address within the CMB. > > A VMA allocated in this way will only be usable by drivers that set > FOLL_PCI_P2PDMA when calling GUP. And inter-device support will be > checked the first time the pages are mapped for DMA. > > Currently this is only supported by O_DIRECT to an PCI NVMe device > or through the NVMe passthrough IOCTL. Rather than make this be specific to nvme, could pci p2pdma create an mmap'able file for any resource registered with it?
On 2020-11-09 8:03 a.m., Keith Busch wrote: > On Fri, Nov 06, 2020 at 10:00:36AM -0700, Logan Gunthorpe wrote: >> Allow userspace to obtain CMB memory by mmaping the controller's >> char device. The mmap call allocates and returns a hunk of CMB memory, >> (the offset is ignored) so userspace does not have control over the >> address within the CMB. >> >> A VMA allocated in this way will only be usable by drivers that set >> FOLL_PCI_P2PDMA when calling GUP. And inter-device support will be >> checked the first time the pages are mapped for DMA. >> >> Currently this is only supported by O_DIRECT to an PCI NVMe device >> or through the NVMe passthrough IOCTL. > > Rather than make this be specific to nvme, could pci p2pdma create an > mmap'able file for any resource registered with it? It's certainly possible. However, other people have been arguing that more of this should be specific to NVMe as some use cases do not want to use the genalloc inside p2pdma. Logan
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index f14316c9b34a..fc642aba671d 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3240,12 +3240,23 @@ static long nvme_dev_ioctl(struct file *file, unsigned int cmd, } } +static int nvme_dev_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct nvme_ctrl *ctrl = file->private_data; + + if (!ctrl->ops->mmap_cmb) + return -ENODEV; + + return ctrl->ops->mmap_cmb(ctrl, vma); +} + static const struct file_operations nvme_dev_fops = { .owner = THIS_MODULE, .open = nvme_dev_open, .release = nvme_dev_release, .unlocked_ioctl = nvme_dev_ioctl, .compat_ioctl = compat_ptr_ioctl, + .mmap = nvme_dev_mmap, }; static ssize_t nvme_sysfs_reset(struct device *dev, diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a0bfe8709cfc..3d790d849701 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -473,6 +473,7 @@ struct nvme_ctrl_ops { void (*delete_ctrl)(struct nvme_ctrl *ctrl); int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size); bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl); + int (*mmap_cmb)(struct nvme_ctrl *ctrl, struct vm_area_struct *vma); }; #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 26976bdf4af0..9c61573111ea 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -2687,6 +2687,14 @@ static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl) return dma_pci_p2pdma_supported(dev->dev); } +static int nvme_pci_mmap_cmb(struct nvme_ctrl *ctrl, + struct vm_area_struct *vma) +{ + struct pci_dev *pdev = to_pci_dev(to_nvme_dev(ctrl)->dev); + + return pci_mmap_p2pmem(pdev, vma); +} + static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = { .name = "pcie", .module = THIS_MODULE, @@ -2698,6 +2706,7 @@ static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = { .submit_async_event = nvme_pci_submit_async_event, .get_address = nvme_pci_get_address, .supports_pci_p2pdma = nvme_pci_supports_pci_p2pdma, + .mmap_cmb = nvme_pci_mmap_cmb, }; static int nvme_dev_map(struct nvme_dev *dev)
Allow userspace to obtain CMB memory by mmaping the controller's char device. The mmap call allocates and returns a hunk of CMB memory, (the offset is ignored) so userspace does not have control over the address within the CMB. A VMA allocated in this way will only be usable by drivers that set FOLL_PCI_P2PDMA when calling GUP. And inter-device support will be checked the first time the pages are mapped for DMA. Currently this is only supported by O_DIRECT to an PCI NVMe device or through the NVMe passthrough IOCTL. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- drivers/nvme/host/core.c | 11 +++++++++++ drivers/nvme/host/nvme.h | 1 + drivers/nvme/host/pci.c | 9 +++++++++ 3 files changed, 21 insertions(+)