[v3,03/13] iommu/dma: Force bouncing if the size is not cacheline-aligned

Message ID 20221106220143.2129263-4-catalin.marinas@arm.com (mailing list archive)
State New, archived
Series mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8

Commit Message

Catalin Marinas Nov. 6, 2022, 10:01 p.m. UTC
Similarly to the direct DMA path, bounce small allocations as they may have
originated from a kmalloc() cache that is not safe for DMA. Unlike the direct
DMA path, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
non-coherent devices as this would break some cases where the iova is
expected to be contiguous (dmabuf). Instead, scan the scatterlist for
any small sizes and only take the swiotlb path if any element of the list
needs bouncing (note that iommu_dma_map_page() would still only bounce
those buffers which are not DMA-aligned).

To avoid scanning the scatterlist on the 'sync' operations, introduce a
SG_DMA_BOUNCED flag set during the iommu_dma_map_sg() call (suggested by
Robin Murphy).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---

Not entirely sure about this approach but here it is. And it needs
better testing.

 drivers/iommu/dma-iommu.c   | 12 ++++++++----
 include/linux/dma-map-ops.h | 23 +++++++++++++++++++++++
 include/linux/scatterlist.h | 27 ++++++++++++++++++++++++---
 3 files changed, 55 insertions(+), 7 deletions(-)
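
The scan-then-flag scheme described in the commit message can be modelled in
userspace roughly as follows. This is only a sketch under stated assumptions:
every model_* name is hypothetical, the 64-byte cache line is an assumed value,
and the per-element rule (bounce any length that is not cacheline-aligned) is
inferred from the patch subject rather than copied from
dma_kmalloc_needs_bounce().

```c
#include <stdbool.h>

/* Userspace model only: these names are hypothetical, not kernel definitions. */
#define MODEL_CACHELINE		64u		/* assumed DMA cache alignment */
#define MODEL_SG_DMA_BOUNCED	(1u << 1)	/* same bit as SG_DMA_BOUNCED in the patch */

enum model_dir { MODEL_BIDIRECTIONAL, MODEL_TO_DEVICE, MODEL_FROM_DEVICE };

struct model_sg {
	unsigned int length;
	unsigned int dma_flags;
};

/* Assumed per-element rule: bounce any length that is not cacheline-aligned. */
static bool model_kmalloc_needs_bounce(unsigned int length)
{
	return (length % MODEL_CACHELINE) != 0;
}

/* One walk over the list; true as soon as any element needs bouncing. */
static bool model_sg_needs_bounce(const struct model_sg *sg, int nents,
				  enum model_dir dir, bool coherent)
{
	int i;

	if (dir == MODEL_TO_DEVICE || coherent)
		return false;

	for (i = 0; i < nents; i++) {
		if (model_kmalloc_needs_bounce(sg[i].length))
			return true;
	}
	return false;
}

/* Map time: record the decision on the first element; true if bounced. */
static bool model_map_sg(struct model_sg *sg, int nents,
			 enum model_dir dir, bool coherent)
{
	if (nents <= 0)
		return false;
	if (model_sg_needs_bounce(sg, nents, dir, coherent))
		sg[0].dma_flags |= MODEL_SG_DMA_BOUNCED;
	return (sg[0].dma_flags & MODEL_SG_DMA_BOUNCED) != 0;
}

/* Sync time: a single flag test replaces a second walk over the list. */
static bool model_sync_uses_bounce_path(const struct model_sg *sg)
{
	return (sg[0].dma_flags & MODEL_SG_DMA_BOUNCED) != 0;
}
```

In this model a list of { 4096, 24 } byte elements is flagged at map time, and
a later sync only tests the flag on the first element, while a list of
{ 4096, 128 } byte elements is left on the non-bounced path.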

Comments

Christoph Hellwig Nov. 7, 2022, 9:46 a.m. UTC | #1
> +static inline bool dma_sg_kmalloc_needs_bounce(struct device *dev,
> +					       struct scatterlist *sg, int nents,
> +					       enum dma_data_direction dir)
> +{
> +	struct scatterlist *s;
> +	int i;
> +
> +	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) ||
> +	    dir == DMA_TO_DEVICE || dev_is_dma_coherent(dev))
> +		return false;

This part should be shared with dma-direct in a well documented helper.

> +	for_each_sg(sg, s, nents, i) {
> +		if (dma_kmalloc_needs_bounce(dev, s->length, dir))
> +			return true;
> +	}

And for this loop iteration I'd much prefer it to be out of line, and
also not available in a global helper.

But maybe someone can come up with a nice tweak to the dma-iommu
code to not require the extra sglist walk anyway.
Catalin Marinas Nov. 7, 2022, 10:54 a.m. UTC | #2
On Mon, Nov 07, 2022 at 10:46:03AM +0100, Christoph Hellwig wrote:
> > +static inline bool dma_sg_kmalloc_needs_bounce(struct device *dev,
> > +					       struct scatterlist *sg, int nents,
> > +					       enum dma_data_direction dir)
> > +{
> > +	struct scatterlist *s;
> > +	int i;
> > +
> > +	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) ||
> > +	    dir == DMA_TO_DEVICE || dev_is_dma_coherent(dev))
> > +		return false;
> 
> This part should be shared with dma-direct in a well documented helper.
> 
> > +	for_each_sg(sg, s, nents, i) {
> > +		if (dma_kmalloc_needs_bounce(dev, s->length, dir))
> > +			return true;
> > +	}
> 
> And for this loop iteration I'd much prefer it to be out of line, and
> also not available in a global helper.
> 
> But maybe someone can come up with a nice tweak to the dma-iommu
> code to not require the extra sglist walk anyway.

An idea: we could add another member to struct scatterlist to track the
bounced address. We can then do the bouncing in a similar way to
iommu_dma_map_sg_swiotlb() but without the iova allocation. The latter
would be a common path for both the bounced and non-bounced cases.
Robin Murphy Nov. 7, 2022, 1:26 p.m. UTC | #3
On 2022-11-07 10:54, Catalin Marinas wrote:
> On Mon, Nov 07, 2022 at 10:46:03AM +0100, Christoph Hellwig wrote:
>>> +static inline bool dma_sg_kmalloc_needs_bounce(struct device *dev,
>>> +					       struct scatterlist *sg, int nents,
>>> +					       enum dma_data_direction dir)
>>> +{
>>> +	struct scatterlist *s;
>>> +	int i;
>>> +
>>> +	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) ||
>>> +	    dir == DMA_TO_DEVICE || dev_is_dma_coherent(dev))
>>> +		return false;
>>
>> This part should be shared with dma-direct in a well documented helper.
>>
>>> +	for_each_sg(sg, s, nents, i) {
>>> +		if (dma_kmalloc_needs_bounce(dev, s->length, dir))
>>> +			return true;
>>> +	}
>>
>> And for this loop iteration I'd much prefer it to be out of line, and
>> also not available in a global helper.
>>
>> But maybe someone can come up with a nice tweak to the dma-iommu
>> code to not require the extra sglist walk anyway.
> 
> An idea: we could add another member to struct scatterlist to track the
> bounced address. We can then do the bouncing in a similar way to
> iommu_dma_map_sg_swiotlb() but without the iova allocation. The latter
> would be a common path for both the bounced and non-bounced cases.

FWIW I spent a little time looking at this as well; I'm pretty confident
it can be done without the extra walk if the iommu-dma bouncing is
completely refactored (and it might want a SWIOTLB helper to retrieve
the original page from a bounced address). That's going to be a bigger
job than I'll be able to finish this cycle, and I concluded that this
in-between approach wouldn't be worth posting for its own sake, but as
part of this series I think it's a reasonable compromise. What we have
here is effectively a pretty specialist config that trades DMA mapping
performance for memory efficiency, so trading a little more performance
initially for the sake of keeping it manageable seems fair to me.

The one thing I did get as far as writing up is the patch below, which
I'll share as an indirect review comment on this patch - feel free to
pick it up or squash it in if you think it's worthwhile.

Thanks,
Robin.

----->8-----
From: Robin Murphy <robin.murphy@arm.com>
Date: Wed, 2 Nov 2022 17:35:09 +0000
Subject: [PATCH] scatterlist: Add dedicated config for DMA flags

The DMA flags field will be useful for users beyond PCI P2P, so upgrade
to its own dedicated config option.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
  drivers/pci/Kconfig         | 1 +
  include/linux/scatterlist.h | 4 ++--
  kernel/dma/Kconfig          | 3 +++
  3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 55c028af4bd9..0303604d9de9 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -173,6 +173,7 @@ config PCI_P2PDMA
  	#
  	depends on 64BIT
  	select GENERIC_ALLOCATOR
+	select NEED_SG_DMA_FLAGS
  	help
  	  Enables drivers to do PCI peer-to-peer transactions to and from
  	  BARs that are exposed in other devices that are the part of
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 375a5e90d86a..87aaf8b5cdb4 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -16,7 +16,7 @@ struct scatterlist {
  #ifdef CONFIG_NEED_SG_DMA_LENGTH
  	unsigned int	dma_length;
  #endif
-#ifdef CONFIG_PCI_P2PDMA
+#ifdef CONFIG_NEED_SG_DMA_FLAGS
  	unsigned int    dma_flags;
  #endif
  };
@@ -249,7 +249,7 @@ static inline void sg_unmark_end(struct scatterlist *sg)
  }
  
  /*
- * CONFGI_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
+ * CONFIG_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
   * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
   * Use this padding for DMA flags bits to indicate when a specific
   * dma address is a bus address.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 56866aaa2ae1..48016c4f67ac 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -24,6 +24,9 @@ config DMA_OPS_BYPASS
  config ARCH_HAS_DMA_MAP_DIRECT
  	bool
  
+config NEED_SG_DMA_FLAGS
+	bool
+
  config NEED_SG_DMA_LENGTH
  	bool
Christoph Hellwig Nov. 8, 2022, 7:50 a.m. UTC | #4
On Mon, Nov 07, 2022 at 10:54:36AM +0000, Catalin Marinas wrote:
> An idea: we could add another member to struct scatterlist to track the
> bounced address. We can then do the bouncing in a similar way to
> iommu_dma_map_sg_swiotlb() but without the iova allocation. The latter
> would be a common path for both the bounced and non-bounced cases.

That would be a pretty massive memory overhead for an unusual case,
so I'd rather avoid it.  In addition to the long term plan of doing
DMA mappings without a scatterlist.
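
One way to put a rough number on the per-entry cost is a userspace model of
the struct layout. This is a sketch, not the kernel definition: it assumes a
64-bit build with a 64-bit dma_addr_t and both dma_length and dma_flags
enabled, and the bounce_address member and all model_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of struct scatterlist on a 64-bit build (assumed layout). */
struct model_scatterlist {
	uint64_t page_link;
	uint32_t offset;
	uint32_t length;
	uint64_t dma_address;
	uint32_t dma_length;
	uint32_t dma_flags;
};

/* Same layout plus a hypothetical member tracking the bounced address. */
struct model_scatterlist_bounce {
	uint64_t page_link;
	uint32_t offset;
	uint32_t length;
	uint64_t dma_address;
	uint32_t dma_length;
	uint32_t dma_flags;
	uint64_t bounce_address;	/* hypothetical extra member */
};

/* Extra bytes a list of nents entries would need with the added member. */
static size_t model_extra_bytes(size_t nents)
{
	return nents * (sizeof(struct model_scatterlist_bounce) -
			sizeof(struct model_scatterlist));
}
```

Under these assumptions each entry grows from 32 to 40 bytes, and every
scatterlist pays for it whether or not anything is ever bounced.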
Catalin Marinas Nov. 8, 2022, 10:51 a.m. UTC | #5
On Mon, Nov 07, 2022 at 01:26:21PM +0000, Robin Murphy wrote:
> On 2022-11-07 10:54, Catalin Marinas wrote:
> > On Mon, Nov 07, 2022 at 10:46:03AM +0100, Christoph Hellwig wrote:
> > > > +static inline bool dma_sg_kmalloc_needs_bounce(struct device *dev,
> > > > +					       struct scatterlist *sg, int nents,
> > > > +					       enum dma_data_direction dir)
> > > > +{
> > > > +	struct scatterlist *s;
> > > > +	int i;
> > > > +
> > > > +	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) ||
> > > > +	    dir == DMA_TO_DEVICE || dev_is_dma_coherent(dev))
> > > > +		return false;
> > > 
> > > This part should be shared with dma-direct in a well documented helper.
> > > 
> > > > +	for_each_sg(sg, s, nents, i) {
> > > > +		if (dma_kmalloc_needs_bounce(dev, s->length, dir))
> > > > +			return true;
> > > > +	}
> > > 
> > > And for this loop iteration I'd much prefer it to be out of line, and
> > > also not available in a global helper.
> > > 
> > > But maybe someone can come up with a nice tweak to the dma-iommu
> > > code to not require the extra sglist walk anyway.
> > 
> > An idea: we could add another member to struct scatterlist to track the
> > bounced address. We can then do the bouncing in a similar way to
> > iommu_dma_map_sg_swiotlb() but without the iova allocation. The latter
> > would be a common path for both the bounced and non-bounced cases.
> 
> FWIW I spent a little time looking at this as well; I'm pretty confident
> it can be done without the extra walk if the iommu-dma bouncing is
> completely refactored (and it might want a SWIOTLB helper to retrieve
> the original page from a bounced address).

Doesn't sg_page() provide the original page already? Either way, the
swiotlb knows it as it needs to do the copying between buffers.

> That's going to be a bigger
> job than I'll be able to finish this cycle, and I concluded that this
> in-between approach wouldn't be worth posting for its own sake, but as
> part of this series I think it's a reasonable compromise.

I'll drop my hack once you have something. Happy to carry it as part of
this series.

> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 375a5e90d86a..87aaf8b5cdb4 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -16,7 +16,7 @@ struct scatterlist {
>  #ifdef CONFIG_NEED_SG_DMA_LENGTH
>  	unsigned int	dma_length;
>  #endif
> -#ifdef CONFIG_PCI_P2PDMA
> +#ifdef CONFIG_NEED_SG_DMA_FLAGS
>  	unsigned int    dma_flags;
>  #endif

I initially had something similar but I decided it's overkill for a
patch that I expected to be NAK'ed.

I'll include your patch in my series in the meantime.
Robin Murphy Nov. 8, 2022, 11:40 a.m. UTC | #6
On 2022-11-08 10:51, Catalin Marinas wrote:
> On Mon, Nov 07, 2022 at 01:26:21PM +0000, Robin Murphy wrote:
>> On 2022-11-07 10:54, Catalin Marinas wrote:
>>> On Mon, Nov 07, 2022 at 10:46:03AM +0100, Christoph Hellwig wrote:
>>>>> +static inline bool dma_sg_kmalloc_needs_bounce(struct device *dev,
>>>>> +					       struct scatterlist *sg, int nents,
>>>>> +					       enum dma_data_direction dir)
>>>>> +{
>>>>> +	struct scatterlist *s;
>>>>> +	int i;
>>>>> +
>>>>> +	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) ||
>>>>> +	    dir == DMA_TO_DEVICE || dev_is_dma_coherent(dev))
>>>>> +		return false;
>>>>
>>>> This part should be shared with dma-direct in a well documented helper.
>>>>
>>>>> +	for_each_sg(sg, s, nents, i) {
>>>>> +		if (dma_kmalloc_needs_bounce(dev, s->length, dir))
>>>>> +			return true;
>>>>> +	}
>>>>
>>>> And for this loop iteration I'd much prefer it to be out of line, and
>>>> also not available in a global helper.
>>>>
>>>> But maybe someone can come up with a nice tweak to the dma-iommu
>>>> code to not require the extra sglist walk anyway.
>>>
>>> An idea: we could add another member to struct scatterlist to track the
>>> bounced address. We can then do the bouncing in a similar way to
>>> iommu_dma_map_sg_swiotlb() but without the iova allocation. The latter
>>> would be a common path for both the bounced and non-bounced cases.
>>
>> FWIW I spent a little time looking at this as well; I'm pretty confident
>> it can be done without the extra walk if the iommu-dma bouncing is
>> completely refactored (and it might want a SWIOTLB helper to retrieve
>> the original page from a bounced address).
> 
> Doesn't sg_page() provide the original page already? Either way, the
> swiotlb knows it as it needs to do the copying between buffers.

For the part where we temporarily rewrite the offsets and lengths to 
pass to iommu_map_sg(), we'd also have to swizzle any relevant page 
pointers so that that picks up the physical addresses of the bounce 
buffer slots rather than the original pages, but then we need to put 
them back straight afterwards. Since SWIOTLB keeps track of that 
internally, it'll be a lot neater and more efficient to simply ask for 
it than to allocate more temporary storage to remember it independently 
(like I did for that horrible erratum thing to keep it self-contained).

>> That's going to be a bigger
>> job than I'll be able to finish this cycle, and I concluded that this
>> in-between approach wouldn't be worth posting for its own sake, but as
>> part of this series I think it's a reasonable compromise.
> 
> I'll drop my hack once you have something. Happy to carry it as part of
> this series.

Cool, I can't promise how soon I'll get there, but like I said if all 
the other objections are worked out in the meantime I have no issue with 
landing this approach and improving on it later.

Thanks,
Robin.

>> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
>> index 375a5e90d86a..87aaf8b5cdb4 100644
>> --- a/include/linux/scatterlist.h
>> +++ b/include/linux/scatterlist.h
>> @@ -16,7 +16,7 @@ struct scatterlist {
>>   #ifdef CONFIG_NEED_SG_DMA_LENGTH
>>   	unsigned int	dma_length;
>>   #endif
>> -#ifdef CONFIG_PCI_P2PDMA
>> +#ifdef CONFIG_NEED_SG_DMA_FLAGS
>>   	unsigned int    dma_flags;
>>   #endif
> 
> I initially had something similar but I decided it's overkill for a
> patch that I expected to be NAK'ed.
> 
> I'll include your patch in my series in the meantime.
>
Isaac Manjarres Nov. 14, 2022, 11:23 p.m. UTC | #7
On Sun, Nov 06, 2022 at 10:01:33PM +0000, Catalin Marinas wrote:
> @@ -1202,7 +1203,10 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  			goto out;
>  	}
>  
> -	if (dev_use_swiotlb(dev))
> +	if (dma_sg_kmalloc_needs_bounce(dev, sg, nents, dir))
> +		sg_dma_mark_bounced(sg);
> +
> +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sg))
>  		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);

Shouldn't you add a similar check in the iommu_dma_unmap_sg() path to
free any SWIOTLB memory that may have been allocated to bounce a scatter gather
list?
Catalin Marinas Nov. 15, 2022, 11:48 a.m. UTC | #8
On Mon, Nov 14, 2022 at 03:23:54PM -0800, Isaac Manjarres wrote:
> On Sun, Nov 06, 2022 at 10:01:33PM +0000, Catalin Marinas wrote:
> > @@ -1202,7 +1203,10 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
> >  			goto out;
> >  	}
> >  
> > -	if (dev_use_swiotlb(dev))
> > +	if (dma_sg_kmalloc_needs_bounce(dev, sg, nents, dir))
> > +		sg_dma_mark_bounced(sg);
> > +
> > +	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sg))
> >  		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
> 
> Shouldn't you add a similar check in the iommu_dma_unmap_sg() path to
> free any SWIOTLB memory that may have been allocated to bounce a scatter gather
> list?

Good point, not sure how I missed this. The sync'ing works fine as
iommu_dma_sync_sg_for_cpu() has the check but the swiotlb buffer won't
be freed.

Thanks.
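
The imbalance can be illustrated with a small userspace model. This is a
sketch only: the model_* names and the slot counter are hypothetical stand-ins
for swiotlb bookkeeping, and the "as posted" unmap is an assumption that the
unmap path tests nothing but dev_use_swiotlb().

```c
#include <stdbool.h>

#define MODEL_SG_DMA_BOUNCED	(1u << 1)	/* same bit as SG_DMA_BOUNCED in the patch */

struct model_sgent {
	unsigned int length;
	unsigned int dma_flags;
};

/* Hypothetical counter standing in for swiotlb slots currently allocated. */
static int model_slots_in_use;

/* Map: take the bounce path if the device requires it or the list is flagged. */
static void model_map_sg(struct model_sgent *sg, int nents,
			 bool dev_use_swiotlb, bool needs_bounce)
{
	if (needs_bounce)
		sg[0].dma_flags |= MODEL_SG_DMA_BOUNCED;
	if (dev_use_swiotlb || (sg[0].dma_flags & MODEL_SG_DMA_BOUNCED))
		model_slots_in_use += nents;
}

/* Unmap without the flag check: flagged lists never release their slots. */
static void model_unmap_sg_as_posted(struct model_sgent *sg, int nents,
				     bool dev_use_swiotlb)
{
	(void)sg;
	if (dev_use_swiotlb)
		model_slots_in_use -= nents;
}

/* Unmap mirroring the map-side condition: slots are released again. */
static void model_unmap_sg_with_flag_check(struct model_sgent *sg, int nents,
					   bool dev_use_swiotlb)
{
	if (dev_use_swiotlb || (sg[0].dma_flags & MODEL_SG_DMA_BOUNCED))
		model_slots_in_use -= nents;
}
```

In this model, mapping a flagged three-element list on a device that does not
otherwise use swiotlb leaves three slots allocated after the first unmap
variant and none after the second.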

Patch

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9297b741f5e8..8c80dffe0337 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -948,7 +948,7 @@  static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
 						      sg->length, dir);
@@ -964,7 +964,7 @@  static void iommu_dma_sync_sg_for_device(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_use_swiotlb(dev))
+	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sgl))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_device(dev,
 							 sg_dma_address(sg),
@@ -990,7 +990,8 @@  static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	 * If both the physical buffer start address and size are
 	 * page aligned, we don't need to use a bounce page.
 	 */
-	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
+	if ((dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) ||
+	    dma_kmalloc_needs_bounce(dev, size, dir)) {
 		void *padding_start;
 		size_t padding_size, aligned_size;
 
@@ -1202,7 +1203,10 @@  static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 			goto out;
 	}
 
-	if (dev_use_swiotlb(dev))
+	if (dma_sg_kmalloc_needs_bounce(dev, sg, nents, dir))
+		sg_dma_mark_bounced(sg);
+
+	if (dev_use_swiotlb(dev) || sg_is_dma_bounced(sg))
 		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 785f7aa90f57..e747a46261d4 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -302,6 +302,29 @@  static inline bool dma_kmalloc_needs_bounce(struct device *dev, size_t size,
 	return true;
 }
 
+/*
+ * Return true if any of the scatterlist elements needs bouncing due to
+ * potentially originating from a small kmalloc() cache.
+ */
+static inline bool dma_sg_kmalloc_needs_bounce(struct device *dev,
+					       struct scatterlist *sg, int nents,
+					       enum dma_data_direction dir)
+{
+	struct scatterlist *s;
+	int i;
+
+	if (!IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) ||
+	    dir == DMA_TO_DEVICE || dev_is_dma_coherent(dev))
+		return false;
+
+	for_each_sg(sg, s, nents, i) {
+		if (dma_kmalloc_needs_bounce(dev, s->length, dir))
+			return true;
+	}
+
+	return false;
+}
+
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 375a5e90d86a..f16cf040fe2c 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -16,7 +16,7 @@  struct scatterlist {
 #ifdef CONFIG_NEED_SG_DMA_LENGTH
 	unsigned int	dma_length;
 #endif
-#ifdef CONFIG_PCI_P2PDMA
+#if defined(CONFIG_PCI_P2PDMA) || defined(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC)
 	unsigned int    dma_flags;
 #endif
 };
@@ -248,6 +248,29 @@  static inline void sg_unmark_end(struct scatterlist *sg)
 	sg->page_link &= ~SG_END;
 }
 
+#define SG_DMA_BUS_ADDRESS	(1 << 0)
+#define SG_DMA_BOUNCED		(1 << 1)
+
+#ifdef CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC
+static inline bool sg_is_dma_bounced(struct scatterlist *sg)
+{
+	return sg->dma_flags & SG_DMA_BOUNCED;
+}
+
+static inline void sg_dma_mark_bounced(struct scatterlist *sg)
+{
+	sg->dma_flags |= SG_DMA_BOUNCED;
+}
+#else
+static inline bool sg_is_dma_bounced(struct scatterlist *sg)
+{
+	return false;
+}
+static inline void sg_dma_mark_bounced(struct scatterlist *sg)
+{
+}
+#endif
+
 /*
  * CONFGI_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes
  * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set).
@@ -256,8 +279,6 @@  static inline void sg_unmark_end(struct scatterlist *sg)
  */
 #ifdef CONFIG_PCI_P2PDMA
 
-#define SG_DMA_BUS_ADDRESS (1 << 0)
-
 /**
  * sg_dma_is_bus address - Return whether a given segment was marked
  *			   as a bus address