[net-next,16/19] net/mlx5: Implement dma ops and params for mediated device

Message ID 20191107160834.21087-16-parav@mellanox.com
State New
Series
  • Mellanox, mlx5 sub function support

Commit Message

Parav Pandit Nov. 7, 2019, 4:08 p.m. UTC
Implement a dma ops wrapper that diverts DMA operations to the parent PCI
device, because the Intel IOMMU (and maybe other IOMMUs) is limited to PCI devices.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/meddev/sf.c   | 151 ++++++++++++++++++
 1 file changed, 151 insertions(+)

Comments

Jakub Kicinski Nov. 7, 2019, 8:42 p.m. UTC | #1
On Thu,  7 Nov 2019 10:08:31 -0600, Parav Pandit wrote:
> Implement dma ops wrapper to divert dma ops to its parent PCI device
> because Intel IOMMU (and may be other IOMMU) is limited to PCI devices.
> 
> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Parav Pandit <parav@mellanox.com>

Isn't this supposed to use PASID or whatnot? Could you explain a
little? This mdev stuff is pretty new to networking folks..
Parav Pandit Nov. 7, 2019, 9:30 p.m. UTC | #2
> -----Original Message-----
> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On Behalf
> Of Jakub Kicinski
> Sent: Thursday, November 7, 2019 2:43 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: alex.williamson@redhat.com; davem@davemloft.net;
> kvm@vger.kernel.org; netdev@vger.kernel.org; Saeed Mahameed
> <saeedm@mellanox.com>; kwankhede@nvidia.com; leon@kernel.org;
> cohuck@redhat.com; Jiri Pirko <jiri@mellanox.com>; linux-
> rdma@vger.kernel.org
> Subject: Re: [PATCH net-next 16/19] net/mlx5: Implement dma ops and params
> for mediated device
> 
> On Thu,  7 Nov 2019 10:08:31 -0600, Parav Pandit wrote:
> > Implement dma ops wrapper to divert dma ops to its parent PCI device
> > because Intel IOMMU (and may be other IOMMU) is limited to PCI devices.
> >
> > Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> 
> Isn't this supposed to use PASSID or whatnot? Could you explain a little? This
> mdev stuff is pretty new to networking folks..

The current series doesn't support PCI PASID.
While doing DMA mapping, the Intel IOMMU expects the dma device to be a PCI device in a few code paths, such as find_or_alloc_domain().
Since the mdev bus is not a PCI bus, DMA mapping needs to go through the parent PCI device.
Otherwise dma ops on mdev devices fail; as far as I can tell, the IOMMU cannot work out how to perform the translations
(it doesn't seem to consult the parent device).
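The diversion Parav describes can be pictured with a small userspace sketch (the `toy_*` types are hypothetical stand-ins, not kernel code): an IOMMU that only understands PCI devices rejects the mdev child directly, so the child's op forwards to its PCI parent instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct device and a DMA mapping op. */
struct toy_device {
	struct toy_device *parent;	/* PCI parent for an mdev child */
	int is_pci;			/* what dev_is_pci() would report */
	unsigned long iova_base;	/* per-device IOVA offset, for illustration */
};

/* An IOMMU that, like the Intel driver discussed here, only handles
 * PCI devices: a non-PCI device has no domain and the map fails. */
static long toy_iommu_map(struct toy_device *dev, unsigned long phys)
{
	if (!dev->is_pci)
		return -1;		/* find_or_alloc_domain() would fail */
	return (long)(dev->iova_base + phys);
}

/* The wrapper pattern from the patch: divert the op to dev->parent. */
static long toy_sf_dma_map(struct toy_device *dev, unsigned long phys)
{
	return toy_iommu_map(dev->parent, phys);
}
```

Mapping through the child fails, while the wrapped op that forwards to the PCI parent succeeds, which is the whole point of the dma ops in this patch.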
Jakub Kicinski Nov. 8, 2019, 1:16 a.m. UTC | #3
On Thu, 7 Nov 2019 21:30:41 +0000, Parav Pandit wrote:
> > -----Original Message-----
> > From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On Behalf
> > Of Jakub Kicinski
> > Sent: Thursday, November 7, 2019 2:43 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: alex.williamson@redhat.com; davem@davemloft.net;
> > kvm@vger.kernel.org; netdev@vger.kernel.org; Saeed Mahameed
> > <saeedm@mellanox.com>; kwankhede@nvidia.com; leon@kernel.org;
> > cohuck@redhat.com; Jiri Pirko <jiri@mellanox.com>; linux-
> > rdma@vger.kernel.org
> > Subject: Re: [PATCH net-next 16/19] net/mlx5: Implement dma ops and params
> > for mediated device

Please try to avoid generating those headers; you're not an occasional
contributor. They're annoying and a waste of space :(

> > On Thu,  7 Nov 2019 10:08:31 -0600, Parav Pandit wrote:  
> > > Implement dma ops wrapper to divert dma ops to its parent PCI device
> > > because Intel IOMMU (and may be other IOMMU) is limited to PCI devices.
> > >
> > > Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
> > > Signed-off-by: Parav Pandit <parav@mellanox.com>  
> > 
> > Isn't this supposed to use PASSID or whatnot? Could you explain a little? This
> > mdev stuff is pretty new to networking folks..  
> 
> Currently series doesn't support PCI PASID.
> While doing dma mapping, Intel IOMMU expects dma device to be PCI device in few function traces like, find_or_alloc_domain(), 
> Since mdev bus is not a PCI bus, DMA mapping needs to go through its parent PCI device.
> Otherwise dma ops on mdev devices fails, as I think it fails to identify how to perform the translations.
> (It doesn't seem to consult its parent device).

What's missing for PASID to work? HW support? FW support? IOMMU
plumbing? mdev plumbing? mlx5 plumbing?
Christoph Hellwig Nov. 8, 2019, 6:37 a.m. UTC | #4
On Thu, Nov 07, 2019 at 10:08:31AM -0600, Parav Pandit wrote:
> Implement dma ops wrapper to divert dma ops to its parent PCI device
> because Intel IOMMU (and may be other IOMMU) is limited to PCI devices.

Yikes.  I've been trying hard to get rid of pointless dma_map_ops
instances.  What upper layers use these child devices, and why can't
they just use the parent device for dma mapping directly?
Parav Pandit Nov. 8, 2019, 3:29 p.m. UTC | #5
Hi Christoph,

> On Thu, Nov 07, 2019 at 10:08:31AM -0600, Parav Pandit wrote:
> > Implement dma ops wrapper to divert dma ops to its parent PCI device
> > because Intel IOMMU (and may be other IOMMU) is limited to PCI devices.
> 
> Yikes.  I've been trying hard to get rid of pointless dma_map_ops instance.
> What upper layers use these child devices, and why can't they just use the
> parent device for dma mapping directly?

I would certainly like to get rid of the dma_ops. Please let me know if there is a better way. More details below.

A few of the upper layers that I know of are (a) NVMe, because these child devices are RDMA devices, and (b) TCP, because a child device is a netdevice.
Without the dma ops setup, ULPs on top of the RDMA device will not be able to make use of it.

Modifying non-RDMA ULPs to refer to the parent just because this child device is an mdev is obviously a non-starter.
On the netdev side, the mlx5_core driver can always do DMA mapping to the parent PCI device.

However, I wanted to avoid such an implementation in the mlx5_core driver,
especially since this case is already handled when the IOMMU is disabled.
When the IOMMU is enabled, find_domain() fails during dma_alloc_coherent() through intel_alloc_coherent() for the child devices.
I couldn't figure out what I missed in the device setup that leads to this failure.
dev.archdata.iommu is NULL for the mdev device.
Further code reading hints that the IOMMU code branches on dev_is_pci().
Until that is fixed:
1. we can get rid of the dma ops and let mlx5_core refer to the parent PCI device;
2. RDMA will refer to the parent anyway, and the ULPs are fine.
Or do you have any input on how to debug this further?
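The alternative Christoph suggests, i.e. skipping the per-device ops table and having the driver pick the mapping device itself, can be sketched the same way (again with hypothetical `md_*` types, not kernel code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; this is not kernel code. */
struct md_dev {
	struct md_dev *parent;	/* PCI parent for an mdev child */
	int is_pci;		/* what dev_is_pci() would report */
};

/* No dma_map_ops wrapper: the driver resolves the device to map with,
 * walking up to the PCI parent when the device itself is not PCI. */
static struct md_dev *md_dma_dev(struct md_dev *dev)
{
	return dev->is_pci ? dev : dev->parent;
}
```

Every call site in the driver would then pass `md_dma_dev(dev)` to the regular DMA API, which is what option 1 above amounts to for mlx5_core.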

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/meddev/sf.c b/drivers/net/ethernet/mellanox/mlx5/core/meddev/sf.c
index cfbbb2d32aca..4b0718418bc5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/meddev/sf.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/meddev/sf.c
@@ -207,6 +207,156 @@  u16 mlx5_core_max_sfs(const struct mlx5_core_dev *dev,
 	return mlx5_core_is_sf_supported(dev) ? sf_table->max_sfs : 0;
 }
 
+static void *mlx5_sf_dma_alloc(struct device *dev, size_t size,
+			       dma_addr_t *dma_handle, gfp_t gfp,
+			       unsigned long attrs)
+{
+	return dma_alloc_attrs(dev->parent, size, dma_handle, gfp, attrs);
+}
+
+static void
+mlx5_sf_dma_free(struct device *dev, size_t size,
+		 void *vaddr, dma_addr_t dma_handle,
+		 unsigned long attrs)
+{
+	dma_free_attrs(dev->parent, size, vaddr, dma_handle, attrs);
+}
+
+static int
+mlx5_sf_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		 unsigned long attrs)
+{
+	return dma_mmap_attrs(dev->parent, vma, cpu_addr,
+			      dma_addr, size, attrs);
+}
+
+static int
+mlx5_sf_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
+			void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			unsigned long attrs)
+{
+	return dma_get_sgtable_attrs(dev->parent, sgt, cpu_addr,
+				     dma_addr, size, attrs);
+}
+
+static dma_addr_t
+mlx5_sf_dma_map_page(struct device *dev, struct page *page,
+		     unsigned long offset, size_t size,
+		     enum dma_data_direction dir,
+		     unsigned long attrs)
+{
+	return dma_map_page_attrs(dev->parent, page, offset, size, dir, attrs);
+}
+
+static void
+mlx5_sf_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+		       size_t size, enum dma_data_direction dir,
+		       unsigned long attrs)
+{
+	dma_unmap_page_attrs(dev->parent, dma_handle, size, dir, attrs);
+}
+
+static int
+mlx5_sf_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		   int nents, enum dma_data_direction dir,
+		   unsigned long attrs)
+{
+	return dma_map_sg_attrs(dev->parent, sg, nents, dir, attrs);
+}
+
+static void
+mlx5_sf_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		     enum dma_data_direction dir, unsigned long attrs)
+{
+	dma_unmap_sg_attrs(dev->parent, sg, nents, dir, attrs);
+}
+
+static dma_addr_t
+mlx5_sf_dma_map_resource(struct device *dev, phys_addr_t phys_addr,
+			 size_t size, enum dma_data_direction dir,
+			 unsigned long attrs)
+{
+	return dma_map_resource(dev->parent, phys_addr, size, dir, attrs);
+}
+
+static void
+mlx5_sf_dma_unmap_resource(struct device *dev, dma_addr_t dma_handle,
+			   size_t size, enum dma_data_direction dir,
+			   unsigned long attrs)
+{
+	dma_unmap_resource(dev->parent, dma_handle, size, dir, attrs);
+}
+
+static void
+mlx5_sf_dma_sync_single_for_cpu(struct device *dev,
+				dma_addr_t dma_handle, size_t size,
+				enum dma_data_direction dir)
+{
+	dma_sync_single_for_cpu(dev->parent, dma_handle, size, dir);
+}
+
+static void
+mlx5_sf_dma_sync_single_for_device(struct device *dev,
+				   dma_addr_t dma_handle, size_t size,
+				   enum dma_data_direction dir)
+{
+	dma_sync_single_for_device(dev->parent, dma_handle, size, dir);
+}
+
+static void
+mlx5_sf_dma_sync_sg_for_cpu(struct device *dev,
+			    struct scatterlist *sg, int nents,
+			    enum dma_data_direction dir)
+{
+	dma_sync_sg_for_cpu(dev->parent, sg, nents, dir);
+}
+
+static void
+mlx5_sf_dma_sync_sg_for_device(struct device *dev,
+			       struct scatterlist *sg, int nents,
+			       enum dma_data_direction dir)
+{
+	dma_sync_sg_for_device(dev->parent, sg, nents, dir);
+}
+
+static void
+mlx5_sf_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+		       enum dma_data_direction dir)
+{
+	dma_cache_sync(dev->parent, vaddr, size, dir);
+}
+
+static const struct dma_map_ops mlx5_sf_dma_ops = {
+	.alloc = mlx5_sf_dma_alloc,
+	.free = mlx5_sf_dma_free,
+	.mmap = mlx5_sf_dma_mmap,
+	.get_sgtable = mlx5_sf_dma_get_sgtable,
+	.map_page = mlx5_sf_dma_map_page,
+	.unmap_page = mlx5_sf_dma_unmap_page,
+	.map_sg = mlx5_sf_dma_map_sg,
+	.unmap_sg = mlx5_sf_dma_unmap_sg,
+	.map_resource = mlx5_sf_dma_map_resource,
+	.unmap_resource = mlx5_sf_dma_unmap_resource,
+	.sync_single_for_cpu = mlx5_sf_dma_sync_single_for_cpu,
+	.sync_sg_for_cpu = mlx5_sf_dma_sync_sg_for_cpu,
+	.sync_sg_for_device = mlx5_sf_dma_sync_sg_for_device,
+	.sync_single_for_device = mlx5_sf_dma_sync_single_for_device,
+	.cache_sync = mlx5_sf_dma_cache_sync,
+};
+
+static void
+set_dma_params(struct device *dev, const struct mlx5_core_dev *coredev)
+{
+	struct pci_dev *pdev = coredev->pdev;
+
+	dev->dma_ops = &mlx5_sf_dma_ops;
+	dev->dma_mask = pdev->dev.dma_mask;
+	dev->dma_parms = pdev->dev.dma_parms;
+	dma_set_coherent_mask(dev, pdev->dev.coherent_dma_mask);
+	dma_set_max_seg_size(dev, dma_get_max_seg_size(&pdev->dev));
+}
+
 int mlx5_sf_load(struct mlx5_sf *sf, struct device *device,
 		 const struct mlx5_core_dev *parent_dev)
 {
@@ -231,6 +381,7 @@  int mlx5_sf_load(struct mlx5_sf *sf, struct device *device,
 		err = -ENOMEM;
 		goto remap_err;
 	}
+	set_dma_params(dev->device, parent_dev);
 
 	err = mlx5_mdev_init(dev, MLX5_DEFAULT_PROF);
 	if (err) {
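Beyond installing the ops table, set_dma_params() in the patch copies the parent's DMA addressing limits onto the child so the two behave identically for mapping purposes. A minimal userspace sketch of that inheritance (hypothetical `toy_dma_params` type, not the kernel structures):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields set_dma_params() propagates. */
struct toy_dma_params {
	uint64_t dma_mask;	/* streaming DMA addressing limit */
	uint64_t coherent_mask;	/* coherent allocation limit */
	size_t max_seg_size;	/* largest allowed scatterlist segment */
};

/* The child inherits the parent's constraints wholesale, so any
 * mapping done "through" the child obeys the PCI device's limits. */
static void toy_set_dma_params(struct toy_dma_params *child,
			       const struct toy_dma_params *parent)
{
	child->dma_mask = parent->dma_mask;
	child->coherent_mask = parent->coherent_mask;
	child->max_seg_size = parent->max_seg_size;
}
```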