
[v7,10/13] nvme-pci: Add support for P2P memory in requests

Message ID 20180925162231.4354-11-logang@deltatee.com (mailing list archive)
State: New, archived
Series: Copy Offload in NVMe Fabrics with P2P PCI Memory

Commit Message

Logan Gunthorpe Sept. 25, 2018, 4:22 p.m. UTC
For P2P requests, we must use the pci_p2pdma_map_sg() function
instead of the dma_map_sg functions.

With that, we can then indicate PCI_P2PDMA support in the request
queue. For this, we create an NVME_F_PCI_P2PDMA flag which tells the
core to set QUEUE_FLAG_PCI_P2PDMA in the request queue.
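
In nvme_map_data(), the mapping call then becomes a simple dispatch
on the page type (condensed from the pci.c hunk below):

	if (is_pci_p2pdma_page(sg_page(iod->sg)))
		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg,
					      iod->nents, dma_dir);
	else
		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg,
					     iod->nents, dma_dir,
					     DMA_ATTR_NO_WARN);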

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c |  4 ++++
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  | 17 +++++++++++++----
 3 files changed, 18 insertions(+), 4 deletions(-)

Comments

Keith Busch Sept. 25, 2018, 5:11 p.m. UTC | #1
On Tue, Sep 25, 2018 at 10:22:28AM -0600, Logan Gunthorpe wrote:
> For P2P requests, we must use the pci_p2pdma_map_sg() function
> instead of the dma_map_sg functions.

Sorry if this was already discussed. Is there a reason the following
pattern is not pushed to the generic dma_map_sg_attrs?

	if (is_pci_p2pdma_page(sg_page(sg)))
		pci_p2pdma_map_sg(dev, sg, nents, dma_dir);

Beyond that, series looks good.
Logan Gunthorpe Sept. 25, 2018, 5:41 p.m. UTC | #2
Hey,

On 2018-09-25 11:11 a.m., Keith Busch wrote:
> Sorry if this was already discussed. Is there a reason the following
> pattern is not pushed to the generic dma_map_sg_attrs?
> 
> 	if (is_pci_p2pdma_page(sg_page(sg)))
> 		pci_p2pdma_map_sg(dev, sg, nents, dma_dir);
> 
> Beyond that, series looks good.

Yes, this has been discussed. It comes down to a few reasons:

1) Intrusiveness on other systems: ie. not making every single
dma_map_sg() call pay the cost of the P2P check.

2) Consistency: we can add the check to dma_map_sg(), but adding
similar functionality to dma_map_page(), etc. is difficult, seeing as
it's hard for the unmap operation to detect whether a dma_addr_t was
P2P memory to begin with (see the prototypes sketched below).

3) Safety for developers trying to use P2P memory: right now,
developers must be careful with P2P pages and ensure they aren't
mapped by other means (ie. dma_map_page()). Having them audit the
drivers that handle the pages, to ensure the appropriate map function
is always used and that P2P pages aren't mixed with regular pages, is
better than having developers rely on magic in dma_map_sg() and
getting things wrong.
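
To illustrate point 2, compare the prototypes from
include/linux/dma-mapping.h (just a sketch, not part of this series):

	/*
	 * Map side: we get a struct page, so a check like
	 * is_pci_p2pdma_page(page) would be possible here.
	 */
	dma_addr_t dma_map_page(struct device *dev, struct page *page,
				size_t offset, size_t size,
				enum dma_data_direction dir);

	/*
	 * Unmap side: only a dma_addr_t is left, with no page to
	 * test, so a hidden P2P branch taken at map time can't be
	 * undone symmetrically.
	 */
	void dma_unmap_page(struct device *dev, dma_addr_t addr,
			    size_t size, enum dma_data_direction dir);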

That being said, I think in the future everyone would like to move in
that direction but it means we will have to solve some difficult
problems with the existing infrastructure.

Logan
Keith Busch Sept. 25, 2018, 5:48 p.m. UTC | #3
On Tue, Sep 25, 2018 at 11:41:44AM -0600, Logan Gunthorpe wrote:
> [... points 1-3 trimmed; see Logan's message above ...]
>
> That being said, I think in the future everyone would like to move in
> that direction but it means we will have to solve some difficult
> problems with the existing infrastructure.

Gotchya, thanks for jogging my memory.

Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dd8ec1dd9219..6033ce2fd3e9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3051,7 +3051,11 @@  static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	ns->queue = blk_mq_init_queue(ctrl->tagset);
 	if (IS_ERR(ns->queue))
 		goto out_free_ns;
+
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
+	if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
+		blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
+
 	ns->queue->queuedata = ns;
 	ns->ctrl = ctrl;
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bb4a2003c097..4030743c90aa 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -343,6 +343,7 @@  struct nvme_ctrl_ops {
 	unsigned int flags;
 #define NVME_F_FABRICS			(1 << 0)
 #define NVME_F_METADATA_SUPPORTED	(1 << 1)
+#define NVME_F_PCI_P2PDMA		(1 << 2)
 	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
 	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
 	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f434706a04e8..0d6c41bc2b35 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -745,8 +745,13 @@  static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		goto out;
 
 	ret = BLK_STS_RESOURCE;
-	nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
-			DMA_ATTR_NO_WARN);
+
+	if (is_pci_p2pdma_page(sg_page(iod->sg)))
+		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents,
+					  dma_dir);
+	else
+		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
+					     dma_dir,  DMA_ATTR_NO_WARN);
 	if (!nr_mapped)
 		goto out;
 
@@ -788,7 +793,10 @@  static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 			DMA_TO_DEVICE : DMA_FROM_DEVICE;
 
 	if (iod->nents) {
-		dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+		/* P2PDMA requests do not need to be unmapped */
+		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
+			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+
 		if (blk_integrity_rq(req))
 			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
 	}
@@ -2400,7 +2408,8 @@  static int nvme_pci_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.name			= "pcie",
 	.module			= THIS_MODULE,
-	.flags			= NVME_F_METADATA_SUPPORTED,
+	.flags			= NVME_F_METADATA_SUPPORTED |
+				  NVME_F_PCI_P2PDMA,
 	.reg_read32		= nvme_pci_reg_read32,
 	.reg_write32		= nvme_pci_reg_write32,
 	.reg_read64		= nvme_pci_reg_read64,