
[1/8] ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops

Message ID 1308556213-24970-2-git-send-email-m.szyprowski@samsung.com (mailing list archive)
State New, archived

Commit Message

Marek Szyprowski June 20, 2011, 7:50 a.m. UTC
This patch removes the need for offset parameter in dma bounce
functions. This is required to let dma-mapping framework on ARM
architecture use common, generic dma-mapping helpers.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/common/dmabounce.c        |   13 ++++++--
 arch/arm/include/asm/dma-mapping.h |   63 +++++++++++++++++------------------
 arch/arm/mm/dma-mapping.c          |    4 +-
 3 files changed, 43 insertions(+), 37 deletions(-)

Comments

Marek Szyprowski June 20, 2011, 10:46 a.m. UTC | #1
Hello,

On Monday, June 20, 2011 10:36 AM Michal Nazarewicz wrote:

> On Mon, 20 Jun 2011 09:50:06 +0200, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
> > +static inline void dma_sync_single_for_cpu(struct device *dev,
> 
> I wouldn't really put inline here or in the function below.
> 
> > +		dma_addr_t handle, size_t size, enum dma_data_direction dir)
> > +{
> > +	BUG_ON(!valid_dma_direction(dir));
> > +
> > +	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
> > +
> > +	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
> > +		return;
> > +
> > +	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
> 
> I know it is just copy'n'paste but how about:

This patch is just about moving code between files, and I wanted to show
what's being changed and how. There is a final cleanup in a separate
patch anyway. All these patches are meant to start a discussion about
how the dma mapping code can be redesigned to allow further extensions,
such as generic iommu support.

> 
> 	if (dmabounce_sync_for_cpu(dev, handle, size, dir))
> 		__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);

The above lines will be removed by the next patches in this series, so I
really see no point in changing this.

(snipped)

Best regards
Russell King - ARM Linux July 3, 2011, 3:28 p.m. UTC | #2
On Mon, Jun 20, 2011 at 09:50:06AM +0200, Marek Szyprowski wrote:
> This patch removes the need for offset parameter in dma bounce
> functions. This is required to let dma-mapping framework on ARM
> architecture use common, generic dma-mapping helpers.

I really don't like this.  Really as in hate.  Why?  I've said in the past
that the whole idea of getting rid of the sub-range functions is idiotic.

If you have to deal with cache coherence, what you _really_ want is an
API which tells you the size of the original buffer and the section of
that buffer which you want to handle - because the edges of the buffer
need special handling.

Let's say that you have a buffer which is 256 bytes long, misaligned to
half a cache line.  Let's first look at the sequence for whole-buffer:

1. You map it for DMA from the device.  This means you write back the
   first and last cache lines to preserve any data shared in the
   overlapping cache line.  The remainder you can just invalidate.

2. You want to access the buffer, so you use the sync_for_cpu function.
   If your CPU doesn't do any speculative prefetching, then you don't
   need to do anything.  If you do, you have to invalidate the buffer,
   but you must preserve the overlapping cache lines which again must
   be written back.

3. You transfer ownership back to the device using sync_for_device.
   As you may have caused cache lines to be read in, again you need to
   invalidate, and the overlapping cache lines must be written back.

Now, if you ask for a sub-section of the buffer to be sync'd, you can
actually eliminate those writebacks which are potentially troublesome,
and which could corrupt neighbouring data.

If you get rid of the sub-buffer functions and start using the whole
buffer functions for that purpose, you no longer know whether the
partial cache lines are part of the buffer or not, so you have to write
those back every time too.

So far, we haven't had any reports of corruption of this type (maybe
folk using the sync functions are rare on ARM - thankfully) but getting
rid of the range sync functions means that solving this becomes a lot
more difficult because we've lost the information to make the decision.

So I've always believed - and continue to do so - that those who want
to get rid of the range sync functions are misguided and are storing up
problems for the future.
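
A minimal sketch of the edge handling described above, assuming
hypothetical cache_wb_range()/cache_inv_range() helpers that stand in
for the real per-CPU cache maintenance operations:

#define CACHE_LINE	32UL

extern void cache_wb_range(unsigned long start, unsigned long len);
extern void cache_inv_range(unsigned long start, unsigned long len);

/* Map a possibly misaligned buffer for DMA from the device. */
static void sketch_map_for_device(unsigned long start, unsigned long size)
{
	unsigned long end = start + size;
	unsigned long first_full = (start + CACHE_LINE - 1) & ~(CACHE_LINE - 1);
	unsigned long last_full = end & ~(CACHE_LINE - 1);

	/* Partial lines at the edges may share data with neighbouring
	 * buffers, so they must be written back, not just discarded. */
	if (start & (CACHE_LINE - 1))
		cache_wb_range(start & ~(CACHE_LINE - 1), CACHE_LINE);
	if (end & (CACHE_LINE - 1))
		cache_wb_range(last_full, CACHE_LINE);

	/* Full lines strictly inside the buffer can simply be invalidated. */
	if (first_full < last_full)
		cache_inv_range(first_full, last_full - first_full);
}

With only the whole-buffer address and size, both edge lines must be
written back on every sync; with the sub-range variants, a sync that
does not touch the edges can skip those writebacks entirely.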
Marek Szyprowski July 26, 2011, 12:56 p.m. UTC | #3
Hello,

On Sunday, July 03, 2011 5:28 PM Russell King wrote:

> On Mon, Jun 20, 2011 at 09:50:06AM +0200, Marek Szyprowski wrote:
> > This patch removes the need for offset parameter in dma bounce
> > functions. This is required to let dma-mapping framework on ARM
> > architecture use common, generic dma-mapping helpers.
> 
> I really don't like this.  Really as in hate.  Why?  I've said in the past
> that the whole idea of getting rid of the sub-range functions is idiotic.
> 
> If you have to deal with cache coherence, what you _really_ want is an
> API which tells you the size of the original buffer and the section of
> that buffer which you want to handle - because the edges of the buffer
> need special handling.
> 
> Let's say that you have a buffer which is 256 bytes long, misaligned to
> half a cache line.  Let's first look at the sequence for whole-buffer:
> 
> 1. You map it for DMA from the device.  This means you write back the
>    first and last cache lines to preserve any data shared in the
>    overlapping cache line.  The remainder you can just invalidate.
> 
> 2. You want to access the buffer, so you use the sync_for_cpu function.
>    If your CPU doesn't do any speculative prefetching, then you don't
>    need to do anything.  If you do, you have to invalidate the buffer,
>    but you must preserve the overlapping cache lines which again must
>    be written back.
> 
> 3. You transfer ownership back to the device using sync_for_device.
>    As you may have caused cache lines to be read in, again you need to
>    invalidate, and the overlapping cache lines must be written back.
> 
> Now, if you ask for a sub-section of the buffer to be sync'd, you can
> actually eliminate those writebacks which are potentially troublesome,
> and which could corrupt neighbouring data.
> 
> If you get rid of the sub-buffer functions and start using the whole
> buffer functions for that purpose, you no longer know whether the
> partial cache lines are part of the buffer or not, so you have to write
> those back every time too.
> 
> So far, we haven't had any reports of corruption of this type (maybe
> folk using the sync functions are rare on ARM - thankfully) but getting
> rid of the range sync functions means that solving this becomes a lot
> more difficult because we've lost the information to make the decision.

Well, right now I haven't heard of anyone who wants to remove
dma_sync_single_range_for_{cpu,device}. All this is about the internal
implementation and dma_map_ops, which uses the simplified calls; none
of this is exposed to drivers or any public API.

I also see no reason why we lose the information. All drivers are still
required to call dma_map_{single,page} to acquire a dma address first.
This way the DMA mapping subsystem knows perfectly well that the range
from the returned dma_addr to dma_addr+size has been used for dma
operations. Every dma_sync_single_* call takes dma_addr as one of its
arguments, so there is no problem checking which dma range a particular
sync operation falls into.
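
For illustration, a minimal driver-side sequence (the 4096-byte buffer,
example_rx() and its arguments are hypothetical) showing why the range
is still known to the implementation:

static void example_rx(struct device *dev, void *buf)
{
	dma_addr_t handle = dma_map_single(dev, buf, 4096, DMA_FROM_DEVICE);

	/* ... let the device write into the buffer ... */

	/*
	 * Sync only 64 bytes at offset 128.  Internally this is just
	 * dma_sync_single_for_cpu(dev, handle + 128, 64, DMA_FROM_DEVICE);
	 * handle + 128 still falls inside [handle, handle + 4096), so
	 * find_safe_buffer() can locate the mapping and recover the
	 * offset as addr - buf->safe_dma_addr.
	 */
	dma_sync_single_range_for_cpu(dev, handle, 128, 64, DMA_FROM_DEVICE);

	dma_unmap_single(dev, handle, 4096, DMA_FROM_DEVICE);
}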

In my patch I have shown that it is perfectly possible to use the common
dma_map_ops structure on ARM and unify the dma mapping implementation a
bit with other architectures.

IMHO this is the right way. There is a need for custom dma mapping
implementations (mainly related to taking advantage of the iommu
controllers available on newer SoCs), and I would really like to avoid
another set of ifdefs or sequences of "if (iommu_supported())" all over
the dma-mapping code. Even now all this code is hard to understand on
first read (due to the coherent/non-coherent sub-architectures and the
dmabounce code mixed in).
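
As a sketch of where this leads (the function and structure names below
are hypothetical; the callback signatures follow the generic struct
dma_map_ops, whose sync callbacks take no offset parameter, which is
the reason for this patch):

static void arm_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
		size_t size, enum dma_data_direction dir)
{
	if (dmabounce_sync_for_cpu(dev, handle, size, dir))
		__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
}

static void arm_sync_single_for_device(struct device *dev, dma_addr_t handle,
		size_t size, enum dma_data_direction dir)
{
	if (dmabounce_sync_for_device(dev, handle, size, dir))
		__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
}

static struct dma_map_ops arm_dma_ops = {
	.sync_single_for_cpu	= arm_sync_single_for_cpu,
	.sync_single_for_device	= arm_sync_single_for_device,
	/* .map_page, .unmap_page, .map_sg, ... */
};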

> So I've always believed - and continue to do so - that those who want
> to get rid of the range sync functions are misguided and are storing up
> problems for the future.

I never said that I want to remove these operations from the driver API.

Best regards

Patch

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index e568163..f7b330f 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -171,7 +171,8 @@  find_safe_buffer(struct dmabounce_device_info *device_info, dma_addr_t safe_dma_
 	read_lock_irqsave(&device_info->lock, flags);
 
 	list_for_each_entry(b, &device_info->safe_buffers, node)
-		if (b->safe_dma_addr == safe_dma_addr) {
+		if (b->safe_dma_addr <= safe_dma_addr &&
+		    b->safe_dma_addr + b->size > safe_dma_addr) {
 			rb = b;
 			break;
 		}
@@ -391,9 +392,10 @@  void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
 EXPORT_SYMBOL(__dma_unmap_page);
 
 int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;
 
 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
@@ -402,6 +404,8 @@  int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 	if (!buf)
 		return 1;
 
+	off = addr - buf->safe_dma_addr;
+
 	BUG_ON(buf->direction != dir);
 
 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
@@ -420,9 +424,10 @@  int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 EXPORT_SYMBOL(dmabounce_sync_for_cpu);
 
 int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;
 
 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
@@ -431,6 +436,8 @@  int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 	if (!buf)
 		return 1;
 
+	off = addr - buf->safe_dma_addr;
+
 	BUG_ON(buf->direction != dir);
 
 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 4fff837..ca920aa 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -310,10 +310,8 @@  extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
 /*
  * Private functions
  */
-int dmabounce_sync_for_cpu(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
-int dmabounce_sync_for_device(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
+int dmabounce_sync_for_cpu(struct device *, dma_addr_t, size_t, enum dma_data_direction);
+int dmabounce_sync_for_device(struct device *, dma_addr_t, size_t, enum dma_data_direction);
 #else
 static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
 	unsigned long offset, size_t size, enum dma_data_direction dir)
@@ -454,6 +452,33 @@  static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
 	__dma_unmap_page(dev, handle, size, dir);
 }
 
+
+static inline void dma_sync_single_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	BUG_ON(!valid_dma_direction(dir));
+
+	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
+
+	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
+}
+
+static inline void dma_sync_single_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	BUG_ON(!valid_dma_direction(dir));
+
+	debug_dma_sync_single_for_device(dev, handle, size, dir);
+
+	if (!dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
+}
+
 /**
  * dma_sync_single_range_for_cpu
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -476,40 +501,14 @@  static inline void dma_sync_single_range_for_cpu(struct device *dev,
 		dma_addr_t handle, unsigned long offset, size_t size,
 		enum dma_data_direction dir)
 {
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_cpu(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_cpu(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir);
+	dma_sync_single_for_cpu(dev, handle + offset, size, dir);
 }
 
 static inline void dma_sync_single_range_for_device(struct device *dev,
 		dma_addr_t handle, unsigned long offset, size_t size,
 		enum dma_data_direction dir)
 {
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_device(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_device(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir);
-}
-
-static inline void dma_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_cpu(dev, handle, 0, size, dir);
-}
-
-static inline void dma_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_device(dev, handle, 0, size, dir);
+	dma_sync_single_for_device(dev, handle + offset, size, dir);
 }
 
 /*
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82a093c..c11f234 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -619,7 +619,7 @@  void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s), 0,
+		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s),
 					    sg_dma_len(s), dir))
 			continue;
 
@@ -645,7 +645,7 @@  void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_device(dev, sg_dma_address(s), 0,
+		if (!dmabounce_sync_for_device(dev, sg_dma_address(s),
 					sg_dma_len(s), dir))
 			continue;