[2/9] iommu/dma-iommu: Add function to flush any cached not present IOTLB entries

Message ID 20190411184741.27540-3-tmurphy@arista.com
State New
Series
  • iommu/amd: Convert the AMD iommu driver to the dma-iommu api

Commit Message

Tom Murphy April 11, 2019, 6:47 p.m. UTC
Both the AMD and Intel drivers can cache not-present IOTLB entries. To
convert these drivers to the dma-iommu API we need a generic way to
flush the NP cache. IOMMU drivers which have an NP cache can implement
the .flush_np_cache function in the iommu_ops struct. I will implement
.flush_np_cache for both the Intel and AMD drivers in later patches.

The Intel np-cache is described here:
https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf#G7.66452

And the AMD np-cache is described here:
https://developer.amd.com/wordpress/media/2012/10/34434-IOMMU-Rev_1.26_2-11-09.pdf#page=63

Signed-off-by: Tom Murphy <tmurphy@arista.com>
---
 drivers/iommu/dma-iommu.c | 10 ++++++++++
 include/linux/iommu.h     |  3 +++
 2 files changed, 13 insertions(+)

Comments

Robin Murphy April 16, 2019, 2 p.m. UTC | #1
On 11/04/2019 19:47, Tom Murphy wrote:
> Both the AMD and Intel drivers can cache not-present IOTLB entries. To
> convert these drivers to the dma-iommu API we need a generic way to
> flush the NP cache. IOMMU drivers which have an NP cache can implement
> the .flush_np_cache function in the iommu_ops struct. I will implement
> .flush_np_cache for both the Intel and AMD drivers in later patches.
> 
> The Intel np-cache is described here:
> https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf#G7.66452
> 
> And the AMD np-cache is described here:
> https://developer.amd.com/wordpress/media/2012/10/34434-IOMMU-Rev_1.26_2-11-09.pdf#page=63

Callers expect that once iommu_map() returns successfully, the mapping 
exists and is ready to use - if these drivers aren't handling this 
flushing internally, how are they not already broken for e.g. VFIO?

> Signed-off-by: Tom Murphy <tmurphy@arista.com>
> ---
>   drivers/iommu/dma-iommu.c | 10 ++++++++++
>   include/linux/iommu.h     |  3 +++
>   2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 1a4bff3f8427..cc5da30d6e58 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -594,6 +594,9 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
>   			< size)
>   		goto out_free_sg;
>   
> +	if (domain->ops->flush_np_cache)
> +		domain->ops->flush_np_cache(domain, iova, size);
> +

This doesn't scale. At the very least, it should be internal to 
iommu_map(), rather than exposed as the responsibility of every 
external caller now and forever after.

That said, I've now gone and looked and AFAICS both the Intel and AMD 
drivers *do* appear to handle this in their iommu_ops::map callbacks 
already, so the whole patch does indeed seem bogus. What might be 
worthwhile, though, is seeing if there's scope to refactor those drivers 
to push some of it into an iommu_ops::iotlb_sync_map callback to 
optimise the flushing for multi-page mappings.

Robin.

>   	*handle = iova;
>   	sg_free_table(&sgt);
>   	return pages;
> @@ -652,6 +655,10 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>   		iommu_dma_free_iova(cookie, iova, size);
>   		return DMA_MAPPING_ERROR;
>   	}
> +
> +	if (domain->ops->flush_np_cache)
> +		domain->ops->flush_np_cache(domain, iova, size);
> +
>   	return iova + iova_off;
>   }
>   
> @@ -812,6 +819,9 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   	if (iommu_map_sg_atomic(domain, iova, sg, nents, prot) < iova_len)
>   		goto out_free_iova;
>   
> +	if (domain->ops->flush_np_cache)
> +		domain->ops->flush_np_cache(domain, iova, iova_len);
> +
>   	return __finalise_sg(dev, sg, nents, iova);
>   
>   out_free_iova:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 75559918d9bd..47ff8d731d6a 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -173,6 +173,7 @@ struct iommu_resv_region {
>    * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
>    * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
>    *            queue
> + * @flush_np_cache: Flush the non present entry cache
>    * @iova_to_phys: translate iova to physical address
>    * @add_device: add device to iommu grouping
>    * @remove_device: remove device from iommu grouping
> @@ -209,6 +210,8 @@ struct iommu_ops {
>   				unsigned long iova, size_t size);
>   	void (*iotlb_sync_map)(struct iommu_domain *domain);
>   	void (*iotlb_sync)(struct iommu_domain *domain);
> +	void (*flush_np_cache)(struct iommu_domain *domain,
> +				unsigned long iova, size_t size);
>   	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
>   	int (*add_device)(struct device *dev);
>   	void (*remove_device)(struct device *dev);
>
Tom Murphy April 16, 2019, 4:40 p.m. UTC | #2
>That said, I've now gone and looked and AFAICS both the Intel...
Ah, I missed that, you're right.

>...and AMD
It doesn't look like it. On AMD the cache is flushed during
iommu_ops::map only if there are page table pages to free (if
we're allocating a large page and freeing the sub-pages), right?
I guess this is a bug in the AMD IOMMU driver; as you said, it should
be handled in iommu_map(). I'll submit another patch to flush the NP
cache on a call to the AMD iommu_ops::map if amd_iommu_np_cache is set.

>What might be
>worthwhile, though, is seeing if there's scope to refactor those drivers
>to push some of it into an iommu_ops::iotlb_sync_map callback to
>optimise the flushing for multi-page mappings.
I am working on a different patch series to improve the flushing and
page table page freeing for both AMD and Intel.


Patch

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1a4bff3f8427..cc5da30d6e58 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -594,6 +594,9 @@  struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 			< size)
 		goto out_free_sg;
 
+	if (domain->ops->flush_np_cache)
+		domain->ops->flush_np_cache(domain, iova, size);
+
 	*handle = iova;
 	sg_free_table(&sgt);
 	return pages;
@@ -652,6 +655,10 @@  static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 		iommu_dma_free_iova(cookie, iova, size);
 		return DMA_MAPPING_ERROR;
 	}
+
+	if (domain->ops->flush_np_cache)
+		domain->ops->flush_np_cache(domain, iova, size);
+
 	return iova + iova_off;
 }
 
@@ -812,6 +819,9 @@  int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	if (iommu_map_sg_atomic(domain, iova, sg, nents, prot) < iova_len)
 		goto out_free_iova;
 
+	if (domain->ops->flush_np_cache)
+		domain->ops->flush_np_cache(domain, iova, iova_len);
+
 	return __finalise_sg(dev, sg, nents, iova);
 
 out_free_iova:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 75559918d9bd..47ff8d731d6a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -173,6 +173,7 @@  struct iommu_resv_region {
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *            queue
+ * @flush_np_cache: Flush the non present entry cache
  * @iova_to_phys: translate iova to physical address
  * @add_device: add device to iommu grouping
  * @remove_device: remove device from iommu grouping
@@ -209,6 +210,8 @@  struct iommu_ops {
 				unsigned long iova, size_t size);
 	void (*iotlb_sync_map)(struct iommu_domain *domain);
 	void (*iotlb_sync)(struct iommu_domain *domain);
+	void (*flush_np_cache)(struct iommu_domain *domain,
+				unsigned long iova, size_t size);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
 	int (*add_device)(struct device *dev);
 	void (*remove_device)(struct device *dev);