
[2/2] arm: use swiotlb for bounce buffer on LPAE configs

Message ID 20190709142011.24984-3-hch@lst.de (mailing list archive)
State New, archived
Series [1/2] dma-mapping: check pfn validity in dma_common_{mmap, get_sgtable}

Commit Message

Christoph Hellwig July 9, 2019, 2:20 p.m. UTC
The DMA API requires that 32-bit DMA masks are always supported, but on
arm LPAE configs they do not currently work when memory is present
above 4GB.  Wire up the swiotlb code like for all other architectures
to provide the bounce buffering in that case.

Fixes: 21e07dba9fb11 ("scsi: reduce use of block bounce buffers").
Reported-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/include/asm/dma-mapping.h |  4 +-
 arch/arm/mm/Kconfig                |  5 +++
 arch/arm/mm/dma-mapping.c          | 61 ++++++++++++++++++++++++++++++
 arch/arm/mm/init.c                 |  5 +++
 4 files changed, 74 insertions(+), 1 deletion(-)
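
For context, the crux of the arm change is to stop returning the legacy
arm_dma_ops on LPAE kernels so that the core falls back to
dma-direct/swiotlb. Roughly (a simplified sketch, not the full diff):

static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
{
	/*
	 * With CONFIG_ARM_LPAE, physical addresses can extend above
	 * 32 bits.  Returning NULL here makes the core use the generic
	 * dma-direct / swiotlb code, which provides bounce buffering
	 * for devices that only support 32-bit DMA.
	 */
	if (IS_ENABLED(CONFIG_ARM_LPAE))
		return NULL;

	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
}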

Comments

Nicolas Saenz Julienne July 24, 2019, 5:23 p.m. UTC | #1
On Tue, 2019-07-09 at 07:20 -0700, Christoph Hellwig wrote:
> The DMA API requires that 32-bit DMA masks are always supported, but on
> arm LPAE configs they do not currently work when memory is present
> above 4GB.  Wire up the swiotlb code like for all other architectures
> to provide the bounce buffering in that case.
> 
> Fixes: 21e07dba9fb11 ("scsi: reduce use of block bounce buffers").
> Reported-by: Roger Quadros <rogerq@ti.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Hi Christoph,
Out of curiosity, what is the reason stopping us from using dma-direct/swiotlb
instead of arm_dma_ops altogether?

Regards,
Nicolas
Christoph Hellwig July 24, 2019, 5:55 p.m. UTC | #2
On Wed, Jul 24, 2019 at 07:23:50PM +0200, Nicolas Saenz Julienne wrote:
> Out of curiosity, what is the reason stopping us from using dma-direct/swiotlb
> instead of arm_dma_ops altogether?

Nothing fundamental.  We just need to do a very careful piecemeal
migration as the arm code handles a lot of interesting corner cases and
we need to ensure we don't break that.  I have various WIP patches
for the easier bits and we can work from there.
Peter Ujfalusi Dec. 19, 2019, 1:10 p.m. UTC | #3
Hi,

On 09/07/2019 17.20, Christoph Hellwig wrote:
> The DMA API requires that 32-bit DMA masks are always supported, but on
> arm LPAE configs they do not currently work when memory is present
> above 4GB.  Wire up the swiotlb code like for all other architectures
> to provide the bounce buffering in that case.

A bisect pointed me to this commit as the reason why EDMA fails to probe and sdhci falls back to PIO mode (not using its built-in DMA).

In both cases the reason is that
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
fails because dma_direct_supported() returns false.


Prints inside dma_direct_supported():
sdhci-omap 23000000.mmc: max_pfn: 880000
sdhci-omap 23000000.mmc: min_mask #1: ffffff
sdhci-omap 23000000.mmc: min_mask #2: ffffff
sdhci-omap 23000000.mmc: dev->dma_pfn_offset: 780000
sdhci-omap 23000000.mmc: PAGE_SHIFT: 12
sdhci-omap 23000000.mmc: __phys_to_dma(dev, min_mask): ff880ffffff
sdhci-omap 23000000.mmc: mask: ffffffff

Print in dma_supported() after returning from dma_direct_supported():
sdhci-omap 23000000.mmc: dma_is_direct, ret = 0

sdhci-omap 23100000.mmc: DMA is not supported for the device

keystone-k2g has this in its soc0 node:
dma-ranges = <0x80000000 0x8 0x00000000 0x80000000>;

DDR starts at 0x8 0000 0000 (32G) and 2G of it is aliased at 0x8000 0000.

This gives a dma_pfn_offset of 0x780000 for all devices underneath it.

The DMA_BIT_MASK(24) is passed to __phys_to_dma() because CONFIG_ZONE_DMA is enabled.

SWIOTLB is enabled in kconfig.
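
Putting those numbers together, the failing check is roughly the following
(a simplified sketch of kernel/dma/direct.c and include/linux/dma-direct.h
from this era, not the verbatim source):

int dma_direct_supported(struct device *dev, u64 mask)
{
	/* CONFIG_ZONE_DMA is enabled, so start from a 24-bit mask */
	u64 min_mask = DMA_BIT_MASK(24);	/* 0xffffff */

	/* clamp to the top of memory; still 0xffffff here */
	min_mask = min_t(u64, min_mask, (u64)(max_pfn - 1) << PAGE_SHIFT);

	/*
	 * __phys_to_dma() subtracts dma_pfn_offset << PAGE_SHIFT
	 * (0x780000 << 12 = 0x7_8000_0000) from the 0xffffff
	 * "physical" address, which underflows and wraps to the huge
	 * value printed above, so no 32-bit mask can ever pass.
	 */
	return mask >= __phys_to_dma(dev, min_mask);
}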

I'm not sure how to correctly fix it, but the following patch makes things work:

From b682a61776f0861755c9d54e5ebccf8471d85bfd Mon Sep 17 00:00:00 2001
From: Peter Ujfalusi <peter.ujfalusi@ti.com>
Date: Thu, 19 Dec 2019 15:07:25 +0200
Subject: [PATCH] arm: mm: dma-mapping: Fix dma_supported() when
 dev->dma_pfn_offset is not 0

When LPAE is enabled we can only use the direct mapping if dma_pfn_offset
is 0; otherwise valid dma_masks will be rejected and DMA support will be
denied for peripherals or DMA drivers.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 arch/arm/mm/dma-mapping.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7d042d5c43e3..bf199b1e82bd 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1100,15 +1100,6 @@ int arm_dma_supported(struct device *dev, u64 mask)
 
 static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
-	/*
-	 * When CONFIG_ARM_LPAE is set, physical address can extend above
-	 * 32-bits, which then can't be addressed by devices that only support
-	 * 32-bit DMA.
-	 * Use the generic dma-direct / swiotlb ops code in that case, as that
-	 * handles bounce buffering for us.
-	 */
-	if (IS_ENABLED(CONFIG_ARM_LPAE))
-		return NULL;
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
 }
 
@@ -2309,6 +2300,15 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 
 	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
 		dma_ops = arm_get_iommu_dma_map_ops(coherent);
+	else if (IS_ENABLED(CONFIG_ARM_LPAE) && !dev->dma_pfn_offset)
+		/*
+		 * When CONFIG_ARM_LPAE is set, physical address can extend
+		 * above 32-bits, which then can't be addressed by devices
+		 * that only support 32-bit DMA.
+		 * Use the generic dma-direct / swiotlb ops code in that case,
+		 * as that handles bounce buffering for us.
+		 */
+		dma_ops = NULL;
 	else
 		dma_ops = arm_get_dma_map_ops(coherent);
Christoph Hellwig Dec. 19, 2019, 3:02 p.m. UTC | #4
Hi Peter,

can you try the patch below (it will need to be split into two):

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e822af0d9219..30b9c6786ce3 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -221,7 +221,8 @@ EXPORT_SYMBOL(arm_coherent_dma_ops);
 
 static int __dma_supported(struct device *dev, u64 mask, bool warn)
 {
-	unsigned long max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);
+	unsigned long max_dma_pfn =
+		min_t(unsigned long, max_pfn, zone_dma_limit >> PAGE_SHIFT);
 
 	/*
 	 * Translate the device's DMA mask to a PFN limit.  This
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 3ef204137e73..dd0e169a1bb1 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -19,6 +19,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/dma-contiguous.h>
+#include <linux/dma-direct.h>
 #include <linux/sizes.h>
 #include <linux/stop_machine.h>
 #include <linux/swiotlb.h>
@@ -84,15 +85,6 @@ static void __init find_limits(unsigned long *min, unsigned long *max_low,
 phys_addr_t arm_dma_zone_size __read_mostly;
 EXPORT_SYMBOL(arm_dma_zone_size);
 
-/*
- * The DMA mask corresponding to the maximum bus address allocatable
- * using GFP_DMA.  The default here places no restriction on DMA
- * allocations.  This must be the smallest DMA mask in the system,
- * so a successful GFP_DMA allocation will always satisfy this.
- */
-phys_addr_t arm_dma_limit;
-unsigned long arm_dma_pfn_limit;
-
 static void __init arm_adjust_dma_zone(unsigned long *size, unsigned long *hole,
 	unsigned long dma_size)
 {
@@ -108,14 +100,14 @@ static void __init arm_adjust_dma_zone(unsigned long *size, unsigned long *hole,
 
 void __init setup_dma_zone(const struct machine_desc *mdesc)
 {
-#ifdef CONFIG_ZONE_DMA
-	if (mdesc->dma_zone_size) {
+	if (!IS_ENABLED(CONFIG_ZONE_DMA)) {
+		zone_dma_limit = ((phys_addr_t)~0);
+	} else if (mdesc->dma_zone_size) {
 		arm_dma_zone_size = mdesc->dma_zone_size;
-		arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
-	} else
-		arm_dma_limit = 0xffffffff;
-	arm_dma_pfn_limit = arm_dma_limit >> PAGE_SHIFT;
-#endif
+		zone_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
+	} else {
+		zone_dma_limit = 0xffffffff;
+	}
 }
 
 static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
@@ -279,7 +271,7 @@ void __init arm_memblock_init(const struct machine_desc *mdesc)
 	early_init_fdt_scan_reserved_mem();
 
 	/* reserve memory for DMA contiguous allocations */
-	dma_contiguous_reserve(arm_dma_limit);
+	dma_contiguous_reserve(zone_dma_limit);
 
 	arm_memblock_steal_permitted = false;
 	memblock_dump_all();
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index 88c121ac14b3..7dbd77554273 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -82,14 +82,6 @@ extern __init void add_static_vm_early(struct static_vm *svm);
 
 #endif
 
-#ifdef CONFIG_ZONE_DMA
-extern phys_addr_t arm_dma_limit;
-extern unsigned long arm_dma_pfn_limit;
-#else
-#define arm_dma_limit ((phys_addr_t)~0)
-#define arm_dma_pfn_limit (~0ul >> PAGE_SHIFT)
-#endif
-
 extern phys_addr_t arm_lowmem_limit;
 
 void __init bootmem_init(void);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index b65dffdfb201..7a7501acd763 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -441,7 +441,7 @@ void __init arm64_memblock_init(void)
 	early_init_fdt_scan_reserved_mem();
 
 	if (IS_ENABLED(CONFIG_ZONE_DMA)) {
-		zone_dma_bits = ARM64_ZONE_DMA_BITS;
+		zone_dma_limit = DMA_BIT_MASK(ARM64_ZONE_DMA_BITS);
 		arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
 	}
 
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 9488b63dfc87..337ace03d3f0 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -223,7 +223,7 @@ static int __init mark_nonram_nosave(void)
  * everything else. GFP_DMA32 page allocations automatically fall back to
  * ZONE_DMA.
  *
- * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
+ * By using 31-bit unconditionally, we can exploit zone_dma_limit to inform the
  * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
  * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
  * ZONE_DMA.
@@ -257,18 +257,20 @@ void __init paging_init(void)
 	printk(KERN_DEBUG "Memory hole size: %ldMB\n",
 	       (long int)((top_of_ram - total_ram) >> 20));
 
+#ifdef CONFIG_ZONE_DMA
 	/*
 	 * Allow 30-bit DMA for very limited Broadcom wifi chips on many
 	 * powerbooks.
 	 */
-	if (IS_ENABLED(CONFIG_PPC32))
-		zone_dma_bits = 30;
-	else
-		zone_dma_bits = 31;
-
-#ifdef CONFIG_ZONE_DMA
-	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
-				      1UL << (zone_dma_bits - PAGE_SHIFT));
+	if (IS_ENABLED(CONFIG_PPC32)) {
+		zone_dma_limit = DMA_BIT_MASK(30);
+		max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
+					      1UL << (30 - PAGE_SHIFT));
+	} else {
+		zone_dma_limit = DMA_BIT_MASK(31);
+		max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
+					      1UL << (31 - PAGE_SHIFT));
+	}
 #endif
 	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index f0ce22220565..c403f61cb56b 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -118,7 +118,7 @@ void __init paging_init(void)
 
 	sparse_memory_present_with_active_regions(MAX_NUMNODES);
 	sparse_init();
-	zone_dma_bits = 31;
+	zone_dma_limit = DMA_BIT_MASK(31);
 	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
 	max_zone_pfns[ZONE_DMA] = PFN_DOWN(MAX_DMA_ADDRESS);
 	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 24b8684aa21d..20d56d597506 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -6,7 +6,7 @@
 #include <linux/memblock.h> /* for min_low_pfn */
 #include <linux/mem_encrypt.h>
 
-extern unsigned int zone_dma_bits;
+extern phys_addr_t zone_dma_limit;
 
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include <asm/dma-direct.h>
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 6af7ae83c4ad..5ea1bed2ba6f 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -21,7 +21,7 @@
  * it for entirely different regions. In that case the arch code needs to
  * override the variable below for dma-direct to work properly.
  */
-unsigned int zone_dma_bits __ro_after_init = 24;
+phys_addr_t zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
 
 static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
@@ -74,7 +74,7 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
 	 * Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding
 	 * zones.
 	 */
-	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
+	if (*phys_limit <= zone_dma_limit)
 		return GFP_DMA;
 	if (*phys_limit <= DMA_BIT_MASK(32))
 		return GFP_DMA32;
@@ -483,7 +483,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 	u64 min_mask;
 
 	if (IS_ENABLED(CONFIG_ZONE_DMA))
-		min_mask = DMA_BIT_MASK(zone_dma_bits);
+		min_mask = zone_dma_limit;
 	else
 		min_mask = DMA_BIT_MASK(32);
Peter Ujfalusi Dec. 19, 2019, 3:20 p.m. UTC | #5
Hi Christoph,

On 19/12/2019 17.02, Christoph Hellwig wrote:
> Hi Peter,
> 
> can you try the patch below (it will need to be split into two):

Thank you!

Unfortunately it does not help:
[    0.596208] edma: probe of 2700000.edma failed with error -5
[    0.596626] edma: probe of 2728000.edma failed with error -5
...
[    2.108602] sdhci-omap 23000000.mmc: Got CD GPIO
[    2.113899] mmc0: Failed to set 32-bit DMA mask.
[    2.118592] mmc0: No suitable DMA available - falling back to PIO
[    2.159038] mmc0: SDHCI controller on 23000000.mmc [23000000.mmc] using PIO
[    2.167531] mmc1: Failed to set 32-bit DMA mask.
[    2.172192] mmc1: No suitable DMA available - falling back to PIO
[    2.213841] mmc1: SDHCI controller on 23100000.mmc [23100000.mmc] using PIO

- Péter


> [... quoted patch snipped ...]
Peter Ujfalusi Jan. 8, 2020, 8:28 a.m. UTC | #6
Hi Christoph,

On 19/12/2019 17.20, Peter Ujfalusi wrote:
> Hi Christoph,
> 
> On 19/12/2019 17.02, Christoph Hellwig wrote:
>> Hi Peter,
>>
>> can you try the patch below (it will need to be split into two):
> 
> Thank you!
> 
> Unfortunately it does not help:
> [    0.596208] edma: probe of 2700000.edma failed with error -5
> [    0.596626] edma: probe of 2728000.edma failed with error -5
> ...
> [    2.108602] sdhci-omap 23000000.mmc: Got CD GPIO
> [    2.113899] mmc0: Failed to set 32-bit DMA mask.
> [    2.118592] mmc0: No suitable DMA available - falling back to PIO
> [    2.159038] mmc0: SDHCI controller on 23000000.mmc [23000000.mmc] using PIO
> [    2.167531] mmc1: Failed to set 32-bit DMA mask.
> [    2.172192] mmc1: No suitable DMA available - falling back to PIO
> [    2.213841] mmc1: SDHCI controller on 23100000.mmc [23100000.mmc] using PIO

Do you have any idea how to fix this in a proper way?

IMHO when drivers are setting the dma_mask and coherent_mask the
dma_pfn_offset should not be applied to the mask at all.

If I understand it correctly for EDMA as example:

I set dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
since it can only address memory in this range.

It does not matter if dma_pfn_offset is 0 or not 0 (like in k2g, where
it is 0x780000) the EDMA still can only address within 32 bits.

The dma_pfn_offset will tell us that the memory location's physical
address is seen by the DMA at (phys_pfn - dma_pfn_offset) -> dma_pfn.

The dma_mask should be checked against the dma_pfn.
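
A quick example with the k2g numbers from above:

	phys_pfn = 0x800000;		/* DDR start 0x8_0000_0000 >> 12 */
	dma_pfn  = phys_pfn - 0x780000;	/* 0x80000, i.e. bus address 0x8000_0000 */

0x8000_0000 fits comfortably in DMA_BIT_MASK(32), so a check against
dma_pfn would succeed where the current check against the offset-adjusted
mask fails.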

We can not 'move' the dma_mask with dma_pfn_offset when setting the mask
since it is not correct. The DMA can access in 32 bits range and we have
the peripherals under 0x8000 0000.

I might be missing something, but it looks to me that the way we set the
dma_mask and the coherent_mask is the place where this can be fixed.

Best regards,
- Péter

> [... quoted patch snipped ...]
Robin Murphy Jan. 8, 2020, 12:21 p.m. UTC | #7
On 08/01/2020 8:28 am, Peter Ujfalusi via iommu wrote:
> Hi Christoph,
> 
> On 19/12/2019 17.20, Peter Ujfalusi wrote:
>> Hi Christoph,
>>
>> On 19/12/2019 17.02, Christoph Hellwig wrote:
>>> Hi Peter,
>>>
>>> can you try the patch below (it will need to be split into two):
>>
>> Thank you!
>>
>> Unfortunately it does not help:
>> [    0.596208] edma: probe of 2700000.edma failed with error -5
>> [    0.596626] edma: probe of 2728000.edma failed with error -5
>> ...
>> [    2.108602] sdhci-omap 23000000.mmc: Got CD GPIO
>> [    2.113899] mmc0: Failed to set 32-bit DMA mask.
>> [    2.118592] mmc0: No suitable DMA available - falling back to PIO
>> [    2.159038] mmc0: SDHCI controller on 23000000.mmc [23000000.mmc] using PIO
>> [    2.167531] mmc1: Failed to set 32-bit DMA mask.
>> [    2.172192] mmc1: No suitable DMA available - falling back to PIO
>> [    2.213841] mmc1: SDHCI controller on 23100000.mmc [23100000.mmc] using PIO
> 
> Do you have any idea how to fix this in a proper way?
> 
> IMHO when drivers are setting the dma_mask and coherent_mask the
> dma_pfn_offset should not be applied to the mask at all.
> 
> If I understand it correctly for EDMA as example:
> 
> I set dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> since it can only address memory in this range.
> 
> It does not matter if dma_pfn_offset is 0 or not 0 (like in k2g, where
> it is 0x780000) the EDMA still can only address within 32 bits.
> 
> The dma_pfn_offset will tell us that the memory location's physical
> address is seen by the DMA at (phys_pfn - dma_pfn_offset) -> dma_pfn.
> 
> The dma_mask should be checked against the dma_pfn.

That's exactly what dma_direct_supported() does, though. What it doesn't 
do, AFAICS, is account for weird cases where the DMA zone *starts* way, 
way above 32 physical address bits because of an implicit assumption 
that *all* devices have a dma_pfn_offset equal to min_low_pfn. I'm not 
sure how possible it is to cope with that generically.

Robin.

> We can not 'move' the dma_mask with dma_pfn_offset when setting the mask
> since it is not correct. The DMA can access in 32 bits range and we have
> the peripherals under 0x8000 0000.
> 
> I might be missing something, but it looks to me that the way we set the
> dma_mask and the coherent_mask is the place where this can be fixed.
> 
> Best regards,
> - Péter
> [... quoted patch snipped ...]
Peter Ujfalusi Jan. 8, 2020, 2 p.m. UTC | #8
Robin,

On 08/01/2020 14.21, Robin Murphy wrote:
> On 08/01/2020 8:28 am, Peter Ujfalusi via iommu wrote:
>> Hi Christoph,
>>
>> On 19/12/2019 17.20, Peter Ujfalusi wrote:
>>> Hi Christoph,
>>>
>>> On 19/12/2019 17.02, Christoph Hellwig wrote:
>>>> Hi Peter,
>>>>
>>>> can you try the patch below (it will need to be split into two):
>>>
>>> Thank you!
>>>
>>> Unfortunately it does not help:
>>> [    0.596208] edma: probe of 2700000.edma failed with error -5
>>> [    0.596626] edma: probe of 2728000.edma failed with error -5
>>> ...
>>> [    2.108602] sdhci-omap 23000000.mmc: Got CD GPIO
>>> [    2.113899] mmc0: Failed to set 32-bit DMA mask.
>>> [    2.118592] mmc0: No suitable DMA available - falling back to PIO
>>> [    2.159038] mmc0: SDHCI controller on 23000000.mmc [23000000.mmc] using PIO
>>> [    2.167531] mmc1: Failed to set 32-bit DMA mask.
>>> [    2.172192] mmc1: No suitable DMA available - falling back to PIO
>>> [    2.213841] mmc1: SDHCI controller on 23100000.mmc [23100000.mmc] using PIO
>>
>> Do you have any idea how to fix this in a proper way?
>>
>> IMHO when drivers are setting the dma_mask and coherent_mask the
>> dma_pfn_offset should not be applied to the mask at all.
>>
>> If I understand it correctly for EDMA as example:
>>
>> I set dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>> since it can only address memory in this range.
>>
>> It does not matter if dma_pfn_offset is 0 or not 0 (like in k2g, where
>> it is 0x780000) the EDMA still can only address within 32 bits.
>>
>> The dma_pfn_offset will tell us that the memory location's physical
>> address is seen by the DMA at (phys_pfn - dma_pfn_offset) -> dma_pfn.
>>
>> The dma_mask should be checked against the dma_pfn.
> 
> That's exactly what dma_direct_supported() does, though.

Yes, this is my understanding as well.

> What it doesn't
> do, AFAICS, is account for weird cases where the DMA zone *starts* way,
> way above 32 physical address bits because of an implicit assumption
> that *all* devices have a dma_pfn_offset equal to min_low_pfn.

The problem - I think - is that the DMA_BIT_MASK(32) from
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) is treated as a physical
address along the call path so the dma_pfn_offset is applied to it and
the check will fail, saying that DMA_BIT_MASK(32) cannot be supported.
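
For reference, the offset is applied by the generic helper, which at the
time looked more or less like this (paraphrased from
include/linux/dma-direct.h for the !CONFIG_ARCH_HAS_PHYS_TO_DMA case):

static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
{
	dma_addr_t dev_addr = (dma_addr_t)paddr;

	/* a mask value fed in here is shifted down by the pfn offset */
	return dev_addr - ((dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
}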

FYI, this is what I have gathered on k2g via prints:

dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));

Prints inside dma_direct_supported():
sdhci-omap 23000000.mmc: max_pfn: 880000
sdhci-omap 23000000.mmc: min_mask #1: ffffff
sdhci-omap 23000000.mmc: min_mask #2: ffffff
sdhci-omap 23000000.mmc: dev->dma_pfn_offset: 780000
sdhci-omap 23000000.mmc: PAGE_SHIFT: 12
sdhci-omap 23000000.mmc: __phys_to_dma(dev, min_mask): ff880ffffff
sdhci-omap 23000000.mmc: mask: ffffffff

Print in dma_supported() after returning from dma_direct_supported():
sdhci-omap 23000000.mmc: dma_is_direct, ret = 0

sdhci-omap 23100000.mmc: DMA is not supported for the device

keystone-k2g has this in its soc0 node:
dma-ranges = <0x80000000 0x8 0x00000000 0x80000000>;

DDR starts at 0x8 0000 0000 (32G) and 2G of it is aliased at 0x8000 0000.

This gives a dma_pfn_offset of 0x780000 for all devices underneath it.

The DMA_BIT_MASK(24) is passed to __phys_to_dma() because
CONFIG_ZONE_DMA is enabled.

SWIOTLB is enabled in kconfig.

> I'm not sure how possible it is to cope with that generically.

Me neither, but k2g is failing since v5.3-rc3 and the bisect pointed to
this commit.

> [... remainder of quoted thread snipped ...]

- Péter

Robin Murphy Jan. 8, 2020, 3:20 p.m. UTC | #9
On 08/01/2020 2:00 pm, Peter Ujfalusi wrote:
> Robin,
> 
> On 08/01/2020 14.21, Robin Murphy wrote:
>> On 08/01/2020 8:28 am, Peter Ujfalusi via iommu wrote:
>>> Hi Christoph,
>>>
>>> On 19/12/2019 17.20, Peter Ujfalusi wrote:
>>>> Hi Christoph,
>>>>
>>>> On 19/12/2019 17.02, Christoph Hellwig wrote:
>>>>> Hi Peter,
>>>>>
>>>>> can you try the patch below (it will need to be split into two):
>>>>
>>>> Thank you!
>>>>
>>>> Unfortunately it does not help:
>>>> [    0.596208] edma: probe of 2700000.edma failed with error -5
>>>> [    0.596626] edma: probe of 2728000.edma failed with error -5
>>>> ...
>>>> [    2.108602] sdhci-omap 23000000.mmc: Got CD GPIO
>>>> [    2.113899] mmc0: Failed to set 32-bit DMA mask.
>>>> [    2.118592] mmc0: No suitable DMA available - falling back to PIO
>>>> [    2.159038] mmc0: SDHCI controller on 23000000.mmc [23000000.mmc]
>>>> using PIO
>>>> [    2.167531] mmc1: Failed to set 32-bit DMA mask.
>>>> [    2.172192] mmc1: No suitable DMA available - falling back to PIO
>>>> [    2.213841] mmc1: SDHCI controller on 23100000.mmc [23100000.mmc]
>>>> using PIO
>>>
>>> Do you have idea on how to fix this in a proper way?
>>>
>>> IMHO when drivers are setting the dma_mask and coherent_mask the
>>> dma_pfn_offset should not be applied to the mask at all.
>>>
>>> If I understand it correctly for EDMA as example:
>>>
>>> I set dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>>> since it can only address memory in this range.
>>>
>>> It does not matter if dma_pfn_offset is 0 or not 0 (like in k2g, where
>>> it is 0x780000) the EDMA still can only address within 32 bits.
>>>
>>> The dma_pfn_offset will tell us that the memory location's physical
>>> address is seen by the DMA at (phys_pfn - dma_pfn_offset) -> dma_pfn.
>>>
>>> The dma_mask should be checked against the dma_pfn.
>>
>> That's exactly what dma_direct_supported() does, though.
> 
> Yes, this is my understanding as well.
> 
>> What it doesn't
>> do, AFAICS, is account for weird cases where the DMA zone *starts* way,
>> way above 32 physical address bits because of an implicit assumption
>> that *all* devices have a dma_pfn_offset equal to min_low_pfn.
> 
> The problem - I think - is that the DMA_BIT_MASK(32) from
> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) is treated as physical
> address along the call path so the dma_pfn_offset is applied to it and
> the check will fail, saying that DMA_BIT_MASK(32) can not be supported.

But that's the thing - in isolation, that is entirely correct. 
Considering ZONE_DMA32 for simplicity, in general the zone is expected 
to cover the physical address range 0x0000_0000 - 0xffff_ffff (because 
DMA offsets are relatively rare), and a device with a dma_pfn_offset of 
more than (0x1_0000_0000 >> PAGE_SHIFT) *cannot* support that range with 
any mask, because the DMA address itself would have to be negative.

The problem is that platforms with esoteric memory maps have no right 
thing to do. If the base of RAM is at 0x1_0000_0000 or higher, the 
"correct" ZONE_DMA32 would be empty while ZONE_NORMAL above it would 
not, and last time I looked that makes the page allocator break badly. 
So the standard bodge on such platforms is to make ZONE_DMA32 cover not 
the first 4GB of *PA space*, but the first 4GB of *RAM*, wherever that 
happens to be. That then brings different problems - now the page 
allocator is happy and successfully returns GFP_DMA32 allocations from 
the range 0x8_0000_0000 - 0x8_ffff_ffff that are utterly useless to 
32-bit devices with zero dma_pfn_offset - see the AMD Seattle SoC for 
the prime example of that. If on the other hand all devices are 
guaranteed to have a dma_pfn_offset that puts the base of RAM at DMA 
address 0 then GFP_DMA32 allocations do end up working as expected, but 
now the original assumption of where ZONE_DMA32 actually sits is broken, 
so generic code unaware of the platform/architecture-specific bodge will 
be misled - that's the case you're running into.
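
To make the failure concrete, here is a standalone sketch using the k2g
numbers from the debug prints quoted below. It assumes the arm32
pfn-based __phys_to_dma(), where the pfn is a 32-bit unsigned long that
wraps around when dma_pfn_offset is subtracted; this is an illustration
of the arithmetic, not kernel code:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12

int main(void)
{
	uint32_t dma_pfn_offset = 0x780000;	/* from the k2g dma-ranges */
	uint64_t min_mask = 0xffffff;		/* DMA_BIT_MASK(24), ZONE_DMA */
	uint64_t mask = 0xffffffff;		/* the driver asks for 32 bits */

	/* pfn_to_dma(): the pfn wraps around as a 32-bit unsigned long */
	uint32_t pfn = (uint32_t)(min_mask >> PAGE_SHIFT) - dma_pfn_offset;
	uint64_t dma_limit = ((uint64_t)pfn << PAGE_SHIFT) |
			     (min_mask & ((1u << PAGE_SHIFT) - 1));

	/* prints ff880ffffff, matching the log */
	printf("__phys_to_dma(dev, min_mask) = %llx\n",
	       (unsigned long long)dma_limit);

	/* dma_direct_supported() then checks mask >= dma_limit */
	printf("32-bit mask %s\n",
	       mask >= dma_limit ? "supported" : "rejected");
	return 0;
}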

Having thought this far, if there's a non-hacky way to reach in and grab 
ZONE_DMA{32} such that dma_direct_supported() could use zone_end_pfn() 
instead of trying to assume either way, that might be the most robust 
general solution.
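
Such a variant might look roughly like the sketch below - untested and
single-node only, and it assumes zone_end_pfn() and the node_zones array
are usable from kernel/dma/direct.c:

int dma_direct_supported(struct device *dev, u64 mask)
{
	u64 min_mask;

	if (IS_ENABLED(CONFIG_ZONE_DMA))
		/* where ZONE_DMA actually ends, not where it is assumed to */
		min_mask = PFN_PHYS(zone_end_pfn(
				&NODE_DATA(0)->node_zones[ZONE_DMA])) - 1;
	else
		min_mask = DMA_BIT_MASK(32);

	min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);

	return mask >= __phys_to_dma(dev, min_mask);
}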

Robin.

> Fyi, this is what I have gathered on k2g via prints:
> 
> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> 
> Prints inside dma_direct_supported():
> sdhci-omap 23000000.mmc: max_pfn: 880000
> sdhci-omap 23000000.mmc: min_mask #1: ffffff
> sdhci-omap 23000000.mmc: min_mask #2: ffffff
> sdhci-omap 23000000.mmc: dev->dma_pfn_offset: 780000
> sdhci-omap 23000000.mmc: PAGE_SHIFT: 12
> sdhci-omap 23000000.mmc: __phys_to_dma(dev, min_mask): ff880ffffff
> sdhci-omap 23000000.mmc: mask: ffffffff
> 
> Print in dma_supported() after returning from dma_direct_supported():
> sdhci-omap 23000000.mmc: dma_is_direct, ret = 0
> 
> sdhci-omap 23100000.mmc: DMA is not supported for the device
> 
> keystone-k2g have this in soc0 node:
> dma-ranges = <0x80000000 0x8 0x00000000 0x80000000>;
> 
> DDR starts at 0x8 0000 0000 (8G) and 2G is aliased at 0x8000 0000.
> 
> This gives the 0x780000 for dma_pfn_offset for all devices underneath it.
> 
> The DMA_BIT_MASK(24) is passed to __phys_to_dma() because
> CONFIG_ZONE_DMA is enabled.
> 
> SWIOTLB is enabled in kconfig.
> 
>> I'm not sure how possible it is to cope with that generically.
> 
> Me neither, but k2g is failing since v5.3-rc3 and the bisect pointed to
> this commit.
> 
>>
>> Robin.
>>
>>> We can not 'move' the dma_mask with dma_pfn_offset when setting the mask
>>> since it is not correct. The DMA can access in 32 bits range and we have
>>> the peripherals under 0x8000 0000.
>>>
>>> I might be missing something, but it looks to me that the way we set the
>>> dma_mask and the coherent_mask is the place where this can be fixed.
>>>
>>> Best regards,
>>> - Péter
>>>
>>>>
>>>> - Péter
>>>>
>>>>
>>>>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>>>>> index e822af0d9219..30b9c6786ce3 100644
>>>>> --- a/arch/arm/mm/dma-mapping.c
>>>>> +++ b/arch/arm/mm/dma-mapping.c
>>>>> @@ -221,7 +221,8 @@ EXPORT_SYMBOL(arm_coherent_dma_ops);
>>>>>      static int __dma_supported(struct device *dev, u64 mask, bool warn)
>>>>>    {
>>>>> -    unsigned long max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);
>>>>> +    unsigned long max_dma_pfn =
>>>>> +        min_t(unsigned long, max_pfn, zone_dma_limit >> PAGE_SHIFT);
>>>>>          /*
>>>>>         * Translate the device's DMA mask to a PFN limit.  This
>>>>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>>>>> index 3ef204137e73..dd0e169a1bb1 100644
>>>>> --- a/arch/arm/mm/init.c
>>>>> +++ b/arch/arm/mm/init.c
>>>>> @@ -19,6 +19,7 @@
>>>>>    #include <linux/gfp.h>
>>>>>    #include <linux/memblock.h>
>>>>>    #include <linux/dma-contiguous.h>
>>>>> +#include <linux/dma-direct.h>
>>>>>    #include <linux/sizes.h>
>>>>>    #include <linux/stop_machine.h>
>>>>>    #include <linux/swiotlb.h>
>>>>> @@ -84,15 +85,6 @@ static void __init find_limits(unsigned long
>>>>> *min, unsigned long *max_low,
>>>>>    phys_addr_t arm_dma_zone_size __read_mostly;
>>>>>    EXPORT_SYMBOL(arm_dma_zone_size);
>>>>>    -/*
>>>>> - * The DMA mask corresponding to the maximum bus address allocatable
>>>>> - * using GFP_DMA.  The default here places no restriction on DMA
>>>>> - * allocations.  This must be the smallest DMA mask in the system,
>>>>> - * so a successful GFP_DMA allocation will always satisfy this.
>>>>> - */
>>>>> -phys_addr_t arm_dma_limit;
>>>>> -unsigned long arm_dma_pfn_limit;
>>>>> -
>>>>>    static void __init arm_adjust_dma_zone(unsigned long *size,
>>>>> unsigned long *hole,
>>>>>        unsigned long dma_size)
>>>>>    {
>>>>> @@ -108,14 +100,14 @@ static void __init
>>>>> arm_adjust_dma_zone(unsigned long *size, unsigned long *hole,
>>>>>      void __init setup_dma_zone(const struct machine_desc *mdesc)
>>>>>    {
>>>>> -#ifdef CONFIG_ZONE_DMA
>>>>> -    if (mdesc->dma_zone_size) {
>>>>> +    if (!IS_ENABLED(CONFIG_ZONE_DMA)) {
>>>>> +        zone_dma_limit = ((phys_addr_t)~0);
>>>>> +    } else if (mdesc->dma_zone_size) {
>>>>>            arm_dma_zone_size = mdesc->dma_zone_size;
>>>>> -        arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
>>>>> -    } else
>>>>> -        arm_dma_limit = 0xffffffff;
>>>>> -    arm_dma_pfn_limit = arm_dma_limit >> PAGE_SHIFT;
>>>>> -#endif
>>>>> +        zone_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
>>>>> +    } else {
>>>>> +        zone_dma_limit = 0xffffffff;
>>>>> +    }
>>>>>    }
>>>>>      static void __init zone_sizes_init(unsigned long min, unsigned
>>>>> long max_low,
>>>>> @@ -279,7 +271,7 @@ void __init arm_memblock_init(const struct
>>>>> machine_desc *mdesc)
>>>>>        early_init_fdt_scan_reserved_mem();
>>>>>          /* reserve memory for DMA contiguous allocations */
>>>>> -    dma_contiguous_reserve(arm_dma_limit);
>>>>> +    dma_contiguous_reserve(zone_dma_limit);
>>>>>          arm_memblock_steal_permitted = false;
>>>>>        memblock_dump_all();
>>>>> diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
>>>>> index 88c121ac14b3..7dbd77554273 100644
>>>>> --- a/arch/arm/mm/mm.h
>>>>> +++ b/arch/arm/mm/mm.h
>>>>> @@ -82,14 +82,6 @@ extern __init void add_static_vm_early(struct
>>>>> static_vm *svm);
>>>>>      #endif
>>>>>    -#ifdef CONFIG_ZONE_DMA
>>>>> -extern phys_addr_t arm_dma_limit;
>>>>> -extern unsigned long arm_dma_pfn_limit;
>>>>> -#else
>>>>> -#define arm_dma_limit ((phys_addr_t)~0)
>>>>> -#define arm_dma_pfn_limit (~0ul >> PAGE_SHIFT)
>>>>> -#endif
>>>>> -
>>>>>    extern phys_addr_t arm_lowmem_limit;
>>>>>      void __init bootmem_init(void);
>>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>>>> index b65dffdfb201..7a7501acd763 100644
>>>>> --- a/arch/arm64/mm/init.c
>>>>> +++ b/arch/arm64/mm/init.c
>>>>> @@ -441,7 +441,7 @@ void __init arm64_memblock_init(void)
>>>>>        early_init_fdt_scan_reserved_mem();
>>>>>          if (IS_ENABLED(CONFIG_ZONE_DMA)) {
>>>>> -        zone_dma_bits = ARM64_ZONE_DMA_BITS;
>>>>> +        zone_dma_limit = DMA_BIT_MASK(ARM64_ZONE_DMA_BITS);
>>>>>            arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
>>>>>        }
>>>>>    diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>>>>> index 9488b63dfc87..337ace03d3f0 100644
>>>>> --- a/arch/powerpc/mm/mem.c
>>>>> +++ b/arch/powerpc/mm/mem.c
>>>>> @@ -223,7 +223,7 @@ static int __init mark_nonram_nosave(void)
>>>>>     * everything else. GFP_DMA32 page allocations automatically fall
>>>>> back to
>>>>>     * ZONE_DMA.
>>>>>     *
>>>>> - * By using 31-bit unconditionally, we can exploit zone_dma_bits to
>>>>> inform the
>>>>> + * By using 31-bit unconditionally, we can exploit zone_dma_limit
>>>>> to inform the
>>>>>     * generic DMA mapping code.  32-bit only devices (if not handled
>>>>> by an IOMMU
>>>>>     * anyway) will take a first dip into ZONE_NORMAL and get
>>>>> otherwise served by
>>>>>     * ZONE_DMA.
>>>>> @@ -257,18 +257,20 @@ void __init paging_init(void)
>>>>>        printk(KERN_DEBUG "Memory hole size: %ldMB\n",
>>>>>               (long int)((top_of_ram - total_ram) >> 20));
>>>>>    +#ifdef CONFIG_ZONE_DMA
>>>>>        /*
>>>>>         * Allow 30-bit DMA for very limited Broadcom wifi chips on many
>>>>>         * powerbooks.
>>>>>         */
>>>>> -    if (IS_ENABLED(CONFIG_PPC32))
>>>>> -        zone_dma_bits = 30;
>>>>> -    else
>>>>> -        zone_dma_bits = 31;
>>>>> -
>>>>> -#ifdef CONFIG_ZONE_DMA
>>>>> -    max_zone_pfns[ZONE_DMA]    = min(max_low_pfn,
>>>>> -                      1UL << (zone_dma_bits - PAGE_SHIFT));
>>>>> +    if (IS_ENABLED(CONFIG_PPC32)) {
>>>>> +        zone_dma_limit = DMA_BIT_MASK(30);
>>>>> +        max_zone_pfns[ZONE_DMA]    = min(max_low_pfn,
>>>>> +                          1UL << (30 - PAGE_SHIFT));
>>>>> +    } else {
>>>>> +        zone_dma_limit = DMA_BIT_MASK(31);
>>>>> +        max_zone_pfns[ZONE_DMA]    = min(max_low_pfn,
>>>>> +                          1UL << (31 - PAGE_SHIFT));
>>>>> +    }
>>>>>    #endif
>>>>>        max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
>>>>>    #ifdef CONFIG_HIGHMEM
>>>>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>>>>> index f0ce22220565..c403f61cb56b 100644
>>>>> --- a/arch/s390/mm/init.c
>>>>> +++ b/arch/s390/mm/init.c
>>>>> @@ -118,7 +118,7 @@ void __init paging_init(void)
>>>>>          sparse_memory_present_with_active_regions(MAX_NUMNODES);
>>>>>        sparse_init();
>>>>> -    zone_dma_bits = 31;
>>>>> +    zone_dma_limit = DMA_BIT_MASK(31);
>>>>>        memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
>>>>>        max_zone_pfns[ZONE_DMA] = PFN_DOWN(MAX_DMA_ADDRESS);
>>>>>        max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
>>>>> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
>>>>> index 24b8684aa21d..20d56d597506 100644
>>>>> --- a/include/linux/dma-direct.h
>>>>> +++ b/include/linux/dma-direct.h
>>>>> @@ -6,7 +6,7 @@
>>>>>    #include <linux/memblock.h> /* for min_low_pfn */
>>>>>    #include <linux/mem_encrypt.h>
>>>>>    -extern unsigned int zone_dma_bits;
>>>>> +extern phys_addr_t zone_dma_limit;
>>>>>      #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
>>>>>    #include <asm/dma-direct.h>
>>>>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>>>>> index 6af7ae83c4ad..5ea1bed2ba6f 100644
>>>>> --- a/kernel/dma/direct.c
>>>>> +++ b/kernel/dma/direct.c
>>>>> @@ -21,7 +21,7 @@
>>>>>     * it for entirely different regions. In that case the arch code
>>>>> needs to
>>>>>     * override the variable below for dma-direct to work properly.
>>>>>     */
>>>>> -unsigned int zone_dma_bits __ro_after_init = 24;
>>>>> +phys_addr_t zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
>>>>>      static void report_addr(struct device *dev, dma_addr_t dma_addr,
>>>>> size_t size)
>>>>>    {
>>>>> @@ -74,7 +74,7 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct
>>>>> device *dev, u64 dma_mask,
>>>>>         * Note that GFP_DMA32 and GFP_DMA are no ops without the
>>>>> corresponding
>>>>>         * zones.
>>>>>         */
>>>>> -    if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
>>>>> +    if (*phys_limit <= zone_dma_limit)
>>>>>            return GFP_DMA;
>>>>>        if (*phys_limit <= DMA_BIT_MASK(32))
>>>>>            return GFP_DMA32;
>>>>> @@ -483,7 +483,7 @@ int dma_direct_supported(struct device *dev, u64
>>>>> mask)
>>>>>        u64 min_mask;
>>>>>          if (IS_ENABLED(CONFIG_ZONE_DMA))
>>>>> -        min_mask = DMA_BIT_MASK(zone_dma_bits);
>>>>> +        min_mask = zone_dma_limit;
>>>>>        else
>>>>>            min_mask = DMA_BIT_MASK(32);
>>>>>   
> 
> - Péter
> 
>
Christoph Hellwig Jan. 9, 2020, 2:49 p.m. UTC | #10
On Wed, Jan 08, 2020 at 03:20:07PM +0000, Robin Murphy wrote:
>> The problem - I think - is that the DMA_BIT_MASK(32) from
>> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) is treated as physical
>> address along the call path so the dma_pfn_offset is applied to it and
>> the check will fail, saying that DMA_BIT_MASK(32) can not be supported.
>
> But that's the thing - in isolation, that is entirely correct. Considering 
> ZONE_DMA32 for simplicity, in general the zone is expected to cover the 
> physical address range 0x0000_0000 - 0xffff_ffff (because DMA offsets are 
> relatively rare), and a device with a dma_pfn_offset of more than 
> (0x1_0000_0000 >> PAGE_SHIFT) *cannot* support that range with any mask, 
> because the DMA address itself would have to be negative.

Note that ZONE_DMA32 is irrelevant in this particular case, as we are
talking about arm32.  But with ZONE_DMA instead this roughly makes sense.

> The problem is that platforms with esoteric memory maps have no right thing 
> to do. If the base of RAM is at 0x1_0000_0000 or higher, the "correct" 
> ZONE_DMA32 would be empty while ZONE_NORMAL above it would not, and last 
> time I looked that makes the page allocator break badly. So the standard 
> bodge on such platforms is to make ZONE_DMA32 cover not the first 4GB of 
> *PA space*, but the first 4GB of *RAM*, wherever that happens to be. That 
> then brings different problems - now the page allocator is happy and 
> successfully returns GFP_DMA32 allocations from the range 0x8_0000_0000 - 
> 0x8_ffff_ffff that are utterly useless to 32-bit devices with zero 
> dma_pfn_offset - see the AMD Seattle SoC for the prime example of that. If 
> on the other hand all devices are guaranteed to have a dma_pfn_offset that 
> puts the base of RAM at DMA address 0 then GFP_DMA32 allocations do end up 
> working as expected, but now the original assumption of where ZONE_DMA32 
> actually is is broken, so generic code unaware of the 
> platform/architecture-specific bodge will be misled - that's the case 
> you're running into.
>
> Having thought this far, if there's a non-hacky way to reach in and grab 
> ZONE_DMA{32} such that dma_direct_supported() could use zone_end_pfn() 
> instead of trying to assume either way, that might be the most robust 
> general solution.

zone_dma_bits is our somewhat ugly way to try to poke into this
information, although the way it is done right now sucks pretty badly.

The patch I sent to Peter in December was trying to convey that
information in a way similar to what the arm32 legacy dma code does, but
it didn't work, so I'll need to find some time to sit down and figure out
why.
Peter Ujfalusi Jan. 14, 2020, 10:43 a.m. UTC | #11
Christoph, Robin,

On 09/01/2020 16.49, Christoph Hellwig wrote:
> On Wed, Jan 08, 2020 at 03:20:07PM +0000, Robin Murphy wrote:
>>> The problem - I think - is that the DMA_BIT_MASK(32) from
>>> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) is treated as physical
>>> address along the call path so the dma_pfn_offset is applied to it and
>>> the check will fail, saying that DMA_BIT_MASK(32) can not be supported.
>>
>> But that's the thing - in isolation, that is entirely correct. Considering 
>> ZONE_DMA32 for simplicity, in general the zone is expected to cover the 
>> physical address range 0x0000_0000 - 0xffff_ffff (because DMA offsets are 
>> relatively rare), and a device with a dma_pfn_offset of more than 
>> (0x1_0000_0000 >> PAGE_SHIFT) *cannot* support that range with any mask, 
>> because the DMA address itself would have to be negative.
> 
> Note that ZONE_DMA32 is irrelevant in this particular case, as we are
> talking about arm32.  But with ZONE_DMA instead this roughly makes sense.
> 
>> The problem is that platforms with esoteric memory maps have no right thing 
>> to do. If the base of RAM is at 0x1_0000_0000 or higher, the "correct" 
>> ZONE_DMA32 would be empty while ZONE_NORMAL above it would not, and last 
>> time I looked that makes the page allocator break badly. So the standard 
>> bodge on such platforms is to make ZONE_DMA32 cover not the first 4GB of 
>> *PA space*, but the first 4GB of *RAM*, wherever that happens to be. That 
>> then brings different problems - now the page allocator is happy and 
>> successfully returns GFP_DMA32 allocations from the range 0x8_0000_0000 - 
>> 0x8_ffff_ffff that are utterly useless to 32-bit devices with zero 
>> dma_pfn_offset - see the AMD Seattle SoC for the prime example of that. If 
>> on the other hand all devices are guaranteed to have a dma_pfn_offset that 
>> puts the base of RAM at DMA address 0 then GFP_DMA32 allocations do end up 
>> working as expected, but now the original assumption of where ZONE_DMA32 
>> actually sits is broken, so generic code unaware of the 
>> platform/architecture-specific bodge will be misled - that's the case 
>> you're running into.
>>
>> Having thought this far, if there's a non-hacky way to reach in and grab 
>> ZONE_DMA{32} such that dma_direct_supported() could use zone_end_pfn() 
>> instead of trying to assume either way, that might be the most robust 
>> general solution.
> 
> zone_dma_bits is our somewhat ugly way to try to poke into this
> information, although the way it is done right now sucks pretty badly.

In my view the handling of dma_pfn_offset is simply incorrect, as it is applied to _any_ address.
According to the DT specification, dma-ranges is:
"Value type: <empty> or <prop-encoded-array> encoded as an arbitrary
number of (child-bus-address, parent-bus-address, length) triplets."

Yet in drivers/of/ we only take the _first_ triplet and ignore the rest.

The dma_pfn_offset should only be applied to a paddr in the range
parent-bus-address to parent-bus-address+length;
for anything outside of this the dma_pfn_offset is 0.

Conversion back from dma to paddr should apply the offset in the range
child-bus-address to child-bus-address+length,
and 0 for everything outside of this.

To correctly handle the dma-ranges we would need something like this in device.h:
+struct dma_ranges {
+       u64 paddr;
+       u64 dma_addr;
+       u64 size;
+       unsigned long pfn_offset;
+};
+

struct device {
	...
-	unsigned long	dma_pfn_offset;
+       struct dma_ranges *dma_ranges;
+       int dma_ranges_cnt;
	...
};

Then, wherever we currently use dma_pfn_offset, we would have:

unsigned long __phys_to_dma_pfn_offset(struct device *dev, phys_addr_t paddr)
{
	int i;

	if (!dev->dma_ranges)
		return 0;

	for (i = 0; i < dev->dma_ranges_cnt; i++) {
		struct dma_ranges *range = &dev->dma_ranges[i];
		if (paddr >= range->paddr &&
		    paddr < (range->paddr + range->size))
			return range->pfn_offset;
	}

	return 0;
}

unsigned long __dma_to_phys_pfn_offset(struct device *dev, dma_addr_t dma_addr)
{
	int i;

	if (!dev->dma_ranges)
		return 0;

	for (i = 0; i < dev->dma_ranges_cnt; i++) {
		struct dma_ranges *range = &dev->dma_ranges[i];
		if (dma_addr >= range->dma_addr &&
		    dma_addr < (range->dma_addr + range->size))
			return range->pfn_offset;
	}

	return 0;
}
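
As an illustration, with the k2g dma-ranges triplet quoted earlier the
table would have a single entry (hypothetical values spelled out by
hand, not taken from an actual implementation):

/* dma-ranges = <0x80000000 0x8 0x00000000 0x80000000>; */
static struct dma_ranges k2g_ranges[] = {
	{
		.paddr      = 0x800000000ULL,	/* parent-bus-address */
		.dma_addr   = 0x80000000ULL,	/* child-bus-address */
		.size       = 0x80000000ULL,	/* length: the 2G alias */
		.pfn_offset = 0x780000,		/* (paddr - dma_addr) >> PAGE_SHIFT */
	},
};

/*
 * With dev->dma_ranges = k2g_ranges and dev->dma_ranges_cnt = 1:
 *
 * __phys_to_dma_pfn_offset(dev, 0x800000000) -> 0x780000 (DDR, in range)
 * __phys_to_dma_pfn_offset(dev, 0x23000000)  -> 0 (a peripheral below
 *						    0x80000000, outside it)
 */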

For existing drivers/archs setting dma_pfn_offset we can:
if (dev->dma_ranges_cnt == 1 && dev->dma_ranges[0].pfn_offset && !dev->dma_ranges[0].size)
	return dev->dma_ranges[0].pfn_offset;

and they would have to set up one struct dma_ranges.

One of the issues with this is that the struct dma_ranges would need to be allocated for
all devices, so some clever way would need to be invented to use pointers
as much as we can.

> The patch I sent to Peter in December was trying to convey that
> information in a way similar to what the arm32 legacy dma code does, but
> it didn't work, so I'll need to find some time to sit down and figure out
> why.

But while we work out a proper solution, can we get the following patch in to fix the regression?
Basically we are falling back to what works (and was used before commit ad3c7b18c5b362be5dbd0f2c0bcf1fd5fd659315).

commit 8c3c36b377c139603a9dff5c58dac59865f1ac0f
Author: Peter Ujfalusi <peter.ujfalusi@ti.com>
Date:   Thu Dec 19 15:07:25 2019 +0200

    arm: mm: dma-mapping: Fix dma_supported() when dev->dma_pfn_offset is not 0
    
    We can only use direct mapping when LPAE is enabled if the dma_pfn_offset
    is 0, otherwise valid dma_masks will be rejected and the DMA support is
    going to be denied for peripherals, or DMA drivers.
    
    Cc: Stable <stable@vger.kernel.org> #v5.3+
    Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 9414d72f664b..e07ec1ea3865 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1100,15 +1100,6 @@ int arm_dma_supported(struct device *dev, u64 mask)
 
 static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
-	/*
-	 * When CONFIG_ARM_LPAE is set, physical address can extend above
-	 * 32-bits, which then can't be addressed by devices that only support
-	 * 32-bit DMA.
-	 * Use the generic dma-direct / swiotlb ops code in that case, as that
-	 * handles bounce buffering for us.
-	 */
-	if (IS_ENABLED(CONFIG_ARM_LPAE))
-		return NULL;
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
 }
 
@@ -2313,6 +2304,15 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 
 	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
 		dma_ops = arm_get_iommu_dma_map_ops(coherent);
+	else if (IS_ENABLED(CONFIG_ARM_LPAE) && !dev->dma_pfn_offset)
+		/*
+		 * When CONFIG_ARM_LPAE is set, physical address can extend
+		 * above 32-bits, which then can't be addressed by devices
+		 * that only support 32-bit DMA.
+		 * Use the generic dma-direct / swiotlb ops code in that case,
+		 * as that handles bounce buffering for us.
+		 */
+		dma_ops = NULL;
 	else
 		dma_ops = arm_get_dma_map_ops(coherent);

--- 
- Péter

diff mbox series

Patch

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 03ba90ffc0f8..054119cd7757 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -18,7 +18,9 @@  extern const struct dma_map_ops arm_coherent_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : NULL;
+	if (IS_ENABLED(CONFIG_MMU) && !IS_ENABLED(CONFIG_ARM_LPAE))
+		return &arm_dma_ops;
+	return NULL;
 }
 
 #ifdef __arch_page_to_dma
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index b169e580bf82..2dd36183d0e6 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -663,6 +663,11 @@  config ARM_LPAE
 	depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
 		!CPU_32v4 && !CPU_32v3
 	select PHYS_ADDR_T_64BIT
+	select SWIOTLB
+	select ARCH_HAS_DMA_COHERENT_TO_PFN
+	select ARCH_HAS_DMA_MMAP_PGPROT
+	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	help
 	  Say Y if you have an ARMv7 processor supporting the LPAE page
 	  table format and you would like to access memory beyond the
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index bdf0d236aaee..01a5b96d76a7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -18,6 +18,7 @@ 
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
 #include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
 #include <linux/memblock.h>
@@ -1129,6 +1130,19 @@  int arm_dma_supported(struct device *dev, u64 mask)
 
 static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
+	/*
+	 * When CONFIG_ARM_LPAE is set, physical address can extend above
+	 * 32-bits, which then can't be addressed by devices that only support
+	 * 32-bit DMA.
+	 * Use the generic dma-direct / swiotlb ops code in that case, as that
+	 * handles bounce buffering for us.
+	 *
+	 * Note: this checks CONFIG_ARM_LPAE instead of CONFIG_SWIOTLB as the
+	 * latter is also selected by the Xen code, but that code for now relies
+	 * on non-NULL dev_dma_ops.  To be cleaned up later.
+	 */
+	if (IS_ENABLED(CONFIG_ARM_LPAE))
+		return NULL;
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
 }
 
@@ -2333,6 +2347,9 @@  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	const struct dma_map_ops *dma_ops;
 
 	dev->archdata.dma_coherent = coherent;
+#ifdef CONFIG_SWIOTLB
+	dev->dma_coherent = coherent;
+#endif
 
 	/*
 	 * Don't override the dma_ops if they have already been set. Ideally
@@ -2367,3 +2384,47 @@  void arch_teardown_dma_ops(struct device *dev)
 	/* Let arch_setup_dma_ops() start again from scratch upon re-probe */
 	set_dma_ops(dev, NULL);
 }
+
+#ifdef CONFIG_SWIOTLB
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+		size_t size, enum dma_data_direction dir)
+{
+	__dma_page_cpu_to_dev(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
+			      size, dir);
+}
+
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+		size_t size, enum dma_data_direction dir)
+{
+	__dma_page_dev_to_cpu(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
+			      size, dir);
+}
+
+long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
+		dma_addr_t dma_addr)
+{
+	return dma_to_pfn(dev, dma_addr);
+}
+
+pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
+		unsigned long attrs)
+{
+	if (!dev_is_dma_coherent(dev))
+		return __get_dma_pgprot(attrs, prot);
+	return prot;
+}
+
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		gfp_t gfp, unsigned long attrs)
+{
+	return __dma_alloc(dev, size, dma_handle, gfp,
+			   __get_dma_pgprot(attrs, PAGE_KERNEL), false,
+			   attrs, __builtin_return_address(0));
+}
+
+void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
+		dma_addr_t dma_handle, unsigned long attrs)
+{
+	__arm_dma_free(dev, size, cpu_addr, dma_handle, attrs, false);
+}
+#endif /* CONFIG_SWIOTLB */
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index be0b42937888..64541be15d43 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -24,6 +24,7 @@ 
 #include <linux/dma-contiguous.h>
 #include <linux/sizes.h>
 #include <linux/stop_machine.h>
+#include <linux/swiotlb.h>
 
 #include <asm/cp15.h>
 #include <asm/mach-types.h>
@@ -456,6 +457,10 @@  void __init mem_init(void)
 	extern u32 itcm_end;
 #endif
 
+#ifdef CONFIG_ARM_LPAE
+	swiotlb_init(1);
+#endif
+
 	set_max_mapnr(pfn_to_page(max_pfn) - mem_map);
 
 	/* this will put all unused low memory onto the freelists */