diff mbox

[v5,4/7] drivers: dma-coherent: Introduce default DMA pool

Message ID 1495621472-9323-5-git-send-email-vladimir.murzin@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Vladimir Murzin May 24, 2017, 10:24 a.m. UTC
This patch introduces default coherent DMA pool similar to default CMA
area concept. To keep other users safe code kept under CONFIG_ARM.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Andras Szemzo <sza@esh.hu>
Tested-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
---
 .../bindings/reserved-memory/reserved-memory.txt   |  3 ++
 drivers/base/dma-coherent.c                        | 59 +++++++++++++++++++---
 2 files changed, 55 insertions(+), 7 deletions(-)

Comments

Christoph Hellwig June 20, 2017, 1:49 p.m. UTC | #1
On Wed, May 24, 2017 at 11:24:29AM +0100, Vladimir Murzin wrote:
> This patch introduces default coherent DMA pool similar to default CMA
> area concept. To keep other users safe code kept under CONFIG_ARM.

I don't see a CONFIG_ARM in the code, although parts of it are added
under CONFIG_OF_RESERVED_MEM.

But overall this code look a bit odd to me.  As far as I can tell
the dma-coherent.c code is for the case where we have a special
piece of coherent memory close to a device.

If you're allocating out of the global allocator the memory should
come from the normal dma_ops ->alloc allocator - and also take
the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or
DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory)
Robin Murphy June 20, 2017, 2:24 p.m. UTC | #2
On 20/06/17 14:49, Christoph Hellwig wrote:
> On Wed, May 24, 2017 at 11:24:29AM +0100, Vladimir Murzin wrote:
>> This patch introduces default coherent DMA pool similar to default CMA
>> area concept. To keep other users safe code kept under CONFIG_ARM.
> 
> I don't see a CONFIG_ARM in the code, although parts of it are added
> under CONFIG_OF_RESERVED_MEM.

It's in rmem_dma_setup() (line 325) currently enforcing no-map for ARM.

> But overall this code look a bit odd to me.  As far as I can tell
> the dma-coherent.c code is for the case where we have a special
> piece of coherent memory close to a device.

True, but the case here is where we need a special piece of coherent
memory for *all* devices, and it was more complicated *not* to reuse the
existing infrastructure. This would already be achievable by specifying
a separate rmem carveout per device, but the shared pool just makes life
easier, and mirrors the functionality dma-contiguous already supports.

> If you're allocating out of the global allocator the memory should
> come from the normal dma_ops ->alloc allocator - and also take
> the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or
> DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory)

The context here is noMMU but with caches - the problem being that the
normal allocator will give back kernel memory, and there's no way to
make that coherent with devices short of not enabling the caches in the
first place, which is obviously undesirable. The trick is that RAM is
aliased (in hardware) at two addresses, one of which makes CPU accesses
non-cacheable, so by only ever accessing the RAM set aside for the
coherent DMA pool using the non-cacheable alias (represented by the
dma_pfn_offset) we can achieve DMA coherency.

It perhaps seems a bit backwards, but we do actually end up honouring
DMA_ATTR_NON_CONSISTENT to a degree in patch #5, as such requests are
the only ones allowed to fall back to the normal dma_ops allocator.

Robin.
Vladimir Murzin June 22, 2017, 1:18 p.m. UTC | #3
On 20/06/17 14:49, Christoph Hellwig wrote:
> On Wed, May 24, 2017 at 11:24:29AM +0100, Vladimir Murzin wrote:
>> This patch introduces default coherent DMA pool similar to default CMA
>> area concept. To keep other users safe code kept under CONFIG_ARM.
> 
> I don't see a CONFIG_ARM in the code, although parts of it are added
> under CONFIG_OF_RESERVED_MEM.

It should look like that:

#ifdef CONFIG_ARM
	if (!of_get_flat_dt_prop(node, "no-map", NULL)) {
		pr_err("Reserved memory: regions without no-map are not yet supported\n");
		return -EINVAL;
	}
+
+	if (of_get_flat_dt_prop(node, "linux,dma-default", NULL)) {
+		WARN(dma_reserved_default_memory,
+		     "Reserved memory: region for default DMA coherent area is redefined\n");
+		dma_reserved_default_memory = rmem;
+	}
#endif


> 
> But overall this code look a bit odd to me.  As far as I can tell
> the dma-coherent.c code is for the case where we have a special
> piece of coherent memory close to a device.
> 
> If you're allocating out of the global allocator the memory should
> come from the normal dma_ops ->alloc allocator - and also take
> the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or
> DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory)
> 

It is how it has been started [1] - defining memory which is not cacheable
(i.e. suitable for coherent allocations) and building custom allocator on top
of it, like it was done for c6x and blackfin. The annoying thing was that we
needed to advertise such memory via command line parameter plus some "mem="
adjustment to hide coherent memory from buddy allocator. So it was suggested
to use reserved memory and this makes things look much better, but on the
other hand require changes on dts side to "bind" devices with reserved memory
- default DMA pool removes such drawback.

[1] https://marc.info/?l=linux-arm-kernel&m=148163694824475&w=2

Cheers
Vladimir
Christoph Hellwig June 26, 2017, 9:42 a.m. UTC | #4
On Tue, Jun 20, 2017 at 03:24:21PM +0100, Robin Murphy wrote:
> True, but the case here is where we need a special piece of coherent
> memory for *all* devices, and it was more complicated *not* to reuse the
> existing infrastructure. This would already be achievable by specifying
> a separate rmem carveout per device, but the shared pool just makes life
> easier, and mirrors the functionality dma-contiguous already supports.

І'm really worried about the code in dma-coherent.c - the original
version clearly intends to have a coherent pool per device, declared
in the driver.  Then Marek added the reserved_mem interface, and
now we get another variant of it.  Conceptually the per-device
and global pool are very different, and to me it seems like the
reserved mem should be a different interface.

> > If you're allocating out of the global allocator the memory should
> > come from the normal dma_ops ->alloc allocator - and also take
> > the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or
> > DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory)
> 
> The context here is noMMU but with caches - the problem being that the
> normal allocator will give back kernel memory, and there's no way to
> make that coherent with devices short of not enabling the caches in the
> first place, which is obviously undesirable. The trick is that RAM is
> aliased (in hardware) at two addresses, one of which makes CPU accesses
> non-cacheable, so by only ever accessing the RAM set aside for the
> coherent DMA pool using the non-cacheable alias (represented by the
> dma_pfn_offset) we can achieve DMA coherency.

Yes, and I think this is something we already have to deal with
for example on mips.  A simple genalloc allocator from your pool
in the normal dma_ops implementation should do the work just fine.
Christoph Hellwig June 26, 2017, 9:44 a.m. UTC | #5
On Thu, Jun 22, 2017 at 02:18:48PM +0100, Vladimir Murzin wrote:
> It is how it has been started [1] - defining memory which is not cacheable
> (i.e. suitable for coherent allocations) and building custom allocator on top
> of it, like it was done for c6x and blackfin. The annoying thing was that we
> needed to advertise such memory via command line parameter plus some "mem="
> adjustment to hide coherent memory from buddy allocator. So it was suggested
> to use reserved memory and this makes things look much better, but on the
> other hand require changes on dts side to "bind" devices with reserved memory
> - default DMA pool removes such drawback.

I like the idea in general, I'm just worried about the overlap with the
per-device coherent memory, especially when we have slight semantic
mismatches like the one about the physical (or rather dma) address
earlier.
Vladimir Murzin June 26, 2017, 2:08 p.m. UTC | #6
On 26/06/17 10:42, Christoph Hellwig wrote:
> On Tue, Jun 20, 2017 at 03:24:21PM +0100, Robin Murphy wrote:
>> True, but the case here is where we need a special piece of coherent
>> memory for *all* devices, and it was more complicated *not* to reuse the
>> existing infrastructure. This would already be achievable by specifying
>> a separate rmem carveout per device, but the shared pool just makes life
>> easier, and mirrors the functionality dma-contiguous already supports.
> 
> І'm really worried about the code in dma-coherent.c - the original
> version clearly intends to have a coherent pool per device, declared
> in the driver.  Then Marek added the reserved_mem interface, and
> now we get another variant of it.  Conceptually the per-device
> and global pool are very different, and to me it seems like the
> reserved mem should be a different interface.
> 
>>> If you're allocating out of the global allocator the memory should
>>> come from the normal dma_ops ->alloc allocator - and also take
>>> the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or
>>> DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory)
>>
>> The context here is noMMU but with caches - the problem being that the
>> normal allocator will give back kernel memory, and there's no way to
>> make that coherent with devices short of not enabling the caches in the
>> first place, which is obviously undesirable. The trick is that RAM is
>> aliased (in hardware) at two addresses, one of which makes CPU accesses
>> non-cacheable, so by only ever accessing the RAM set aside for the
>> coherent DMA pool using the non-cacheable alias (represented by the
>> dma_pfn_offset) we can achieve DMA coherency.
> 
> Yes, and I think this is something we already have to deal with
> for example on mips.  A simple genalloc allocator from your pool
> in the normal dma_ops implementation should do the work just fine.
> 

Are you proposing keeping pool handling under arch?

Cheers
Vladimir
Robin Murphy June 27, 2017, 2:36 p.m. UTC | #7
On 26/06/17 10:42, Christoph Hellwig wrote:
> On Tue, Jun 20, 2017 at 03:24:21PM +0100, Robin Murphy wrote:
>> True, but the case here is where we need a special piece of coherent
>> memory for *all* devices, and it was more complicated *not* to reuse the
>> existing infrastructure. This would already be achievable by specifying
>> a separate rmem carveout per device, but the shared pool just makes life
>> easier, and mirrors the functionality dma-contiguous already supports.
> 
> І'm really worried about the code in dma-coherent.c - the original
> version clearly intends to have a coherent pool per device, declared
> in the driver.  Then Marek added the reserved_mem interface, and
> now we get another variant of it.  Conceptually the per-device
> and global pool are very different, and to me it seems like the
> reserved mem should be a different interface.

Per-device reserved mem is still a private per-device pool though, it's
just discovered and declared by common firmware code rather than in some
device-specific way by driver code - once it's assigned there's no
distinction. The global/per-device issue is essentially entirely
orthogonal, and has the dubious pleasure of being a massive conceptual
difference yet a much smaller implementation difference.

>>> If you're allocating out of the global allocator the memory should
>>> come from the normal dma_ops ->alloc allocator - and also take
>>> the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or
>>> DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory)
>>
>> The context here is noMMU but with caches - the problem being that the
>> normal allocator will give back kernel memory, and there's no way to
>> make that coherent with devices short of not enabling the caches in the
>> first place, which is obviously undesirable. The trick is that RAM is
>> aliased (in hardware) at two addresses, one of which makes CPU accesses
>> non-cacheable, so by only ever accessing the RAM set aside for the
>> coherent DMA pool using the non-cacheable alias (represented by the
>> dma_pfn_offset) we can achieve DMA coherency.
> 
> Yes, and I think this is something we already have to deal with
> for example on mips.  A simple genalloc allocator from your pool
> in the normal dma_ops implementation should do the work just fine.

I admit I'm almost in agreement, were it not for the fact that
dma-contiguous already supports all four combinations of both per-device
and global pools, and both reserved mem and direct declarations from
arch/platform code, all through the same interface to boot, and nobody's
complaining about that. The only real difference for dma-coherent seems
to be the way it's baked into the existing API.

If it is just a matter of interfaces, I'd have no objection to exporting
a separate e.g. dma_alloc_from_global_coherent() or somesuch as a
conceptually separate interface to dma_coherent_default_memory, which
the arch code can then call from ->alloc in the same manner they
currently call dma_alloc_from_contiguous(). That seems like a reasonable
way to keep the per-device and global pools conceptually distinct
without needlessly duplicating implementations. In fact, I'm now
wondering if the regular arm/arm64 atomic pools couldn't also make use
of such a thing as well...

Robin.
Christoph Hellwig June 27, 2017, 3:22 p.m. UTC | #8
On Tue, Jun 27, 2017 at 03:36:16PM +0100, Robin Murphy wrote:
> I admit I'm almost in agreement, were it not for the fact that
> dma-contiguous already supports all four combinations of both per-device
> and global pools, and both reserved mem and direct declarations from
> arch/platform code, all through the same interface to boot, and nobody's
> complaining about that. The only real difference for dma-coherent seems
> to be the way it's baked into the existing API.
> 
> If it is just a matter of interfaces, I'd have no objection to exporting
> a separate e.g. dma_alloc_from_global_coherent() or somesuch as a
> conceptually separate interface to dma_coherent_default_memory, which
> the arch code can then call from ->alloc in the same manner they
> currently call dma_alloc_from_contiguous(). That seems like a reasonable
> way to keep the per-device and global pools conceptually distinct
> without needlessly duplicating implementations. In fact, I'm now
> wondering if the regular arm/arm64 atomic pools couldn't also make use
> of such a thing as well...

Ok.  I think I'll just go ahead with the current patches, and then
we'll try to come up with something better later.  I really don't
want it in actual arch code, but I want it controlled from the
dma_map_ops instance instead of from generic code.  There will be
a lot of churn in this area if my plans go ahead, so I think we can
handle it then.
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index 3da0ebd..16291f2 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -68,6 +68,9 @@  Linux implementation note:
 - If a "linux,cma-default" property is present, then Linux will use the
   region for the default pool of the contiguous memory allocator.
 
+- If a "linux,dma-default" property is present, then Linux will use the
+  region for the default pool of the consistent DMA allocator.
+
 Device node references to reserved memory
 -----------------------------------------
 Regions in the /reserved-memory node may be referenced by other device
diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
index 99c9695..2ae24c2 100644
--- a/drivers/base/dma-coherent.c
+++ b/drivers/base/dma-coherent.c
@@ -19,6 +19,15 @@  struct dma_coherent_mem {
 	bool		use_dev_dma_pfn_offset;
 };
 
+static struct dma_coherent_mem *dma_coherent_default_memory __ro_after_init;
+
+static inline struct dma_coherent_mem *dev_get_coherent_memory(struct device *dev)
+{
+	if (dev && dev->dma_mem)
+		return dev->dma_mem;
+	return dma_coherent_default_memory;
+}
+
 static inline dma_addr_t dma_get_device_base(struct device *dev,
 					     struct dma_coherent_mem * mem)
 {
@@ -93,6 +102,9 @@  static void dma_release_coherent_memory(struct dma_coherent_mem *mem)
 static int dma_assign_coherent_memory(struct device *dev,
 				      struct dma_coherent_mem *mem)
 {
+	if (!dev)
+		return -ENODEV;
+
 	if (dev->dma_mem)
 		return -EBUSY;
 
@@ -171,15 +183,12 @@  EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
 int dma_alloc_from_coherent(struct device *dev, ssize_t size,
 				       dma_addr_t *dma_handle, void **ret)
 {
-	struct dma_coherent_mem *mem;
+	struct dma_coherent_mem *mem = dev_get_coherent_memory(dev);
 	int order = get_order(size);
 	unsigned long flags;
 	int pageno;
 	int dma_memory_map;
 
-	if (!dev)
-		return 0;
-	mem = dev->dma_mem;
 	if (!mem)
 		return 0;
 
@@ -233,7 +242,7 @@  EXPORT_SYMBOL(dma_alloc_from_coherent);
  */
 int dma_release_from_coherent(struct device *dev, int order, void *vaddr)
 {
-	struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
+	struct dma_coherent_mem *mem = dev_get_coherent_memory(dev);
 
 	if (mem && vaddr >= mem->virt_base && vaddr <
 		   (mem->virt_base + (mem->size << PAGE_SHIFT))) {
@@ -267,7 +276,7 @@  EXPORT_SYMBOL(dma_release_from_coherent);
 int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma,
 			   void *vaddr, size_t size, int *ret)
 {
-	struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
+	struct dma_coherent_mem *mem = dev_get_coherent_memory(dev);
 
 	if (mem && vaddr >= mem->virt_base && vaddr + size <=
 		   (mem->virt_base + (mem->size << PAGE_SHIFT))) {
@@ -297,6 +306,8 @@  EXPORT_SYMBOL(dma_mmap_from_coherent);
 #include <linux/of_fdt.h>
 #include <linux/of_reserved_mem.h>
 
+static struct reserved_mem *dma_reserved_default_memory __initdata;
+
 static int rmem_dma_device_init(struct reserved_mem *rmem, struct device *dev)
 {
 	struct dma_coherent_mem *mem = rmem->priv;
@@ -318,7 +329,8 @@  static int rmem_dma_device_init(struct reserved_mem *rmem, struct device *dev)
 static void rmem_dma_device_release(struct reserved_mem *rmem,
 				    struct device *dev)
 {
-	dev->dma_mem = NULL;
+	if (dev)
+		dev->dma_mem = NULL;
 }
 
 static const struct reserved_mem_ops rmem_dma_ops = {
@@ -338,6 +350,12 @@  static int __init rmem_dma_setup(struct reserved_mem *rmem)
 		pr_err("Reserved memory: regions without no-map are not yet supported\n");
 		return -EINVAL;
 	}
+
+	if (of_get_flat_dt_prop(node, "linux,dma-default", NULL)) {
+		WARN(dma_reserved_default_memory,
+		     "Reserved memory: region for default DMA coherent area is redefined\n");
+		dma_reserved_default_memory = rmem;
+	}
 #endif
 
 	rmem->ops = &rmem_dma_ops;
@@ -345,5 +363,32 @@  static int __init rmem_dma_setup(struct reserved_mem *rmem)
 		&rmem->base, (unsigned long)rmem->size / SZ_1M);
 	return 0;
 }
+
+static int __init dma_init_reserved_memory(void)
+{
+	const struct reserved_mem_ops *ops;
+	int ret;
+
+	if (!dma_reserved_default_memory)
+		return -ENOMEM;
+
+	ops = dma_reserved_default_memory->ops;
+
+	/*
+	 * We rely on rmem_dma_device_init() does not propagate error of
+	 * dma_assign_coherent_memory() for "NULL" device.
+	 */
+	ret = ops->device_init(dma_reserved_default_memory, NULL);
+
+	if (!ret) {
+		dma_coherent_default_memory = dma_reserved_default_memory->priv;
+		pr_info("DMA: default coherent area is set\n");
+	}
+
+	return ret;
+}
+
+core_initcall(dma_init_reserved_memory);
+
 RESERVEDMEM_OF_DECLARE(dma, "shared-dma-pool", rmem_dma_setup);
 #endif