[2/3] define ARM-specific dma_coherent_write_sync

Message ID 1314826214-22428-3-git-send-email-msalter@redhat.com (mailing list archive)
State New, archived
Commit Message

Mark Salter Aug. 31, 2011, 9:30 p.m. UTC
For ARM kernels using CONFIG_ARM_DMA_MEM_BUFFERABLE, this patch adds an
ARM-specific dma_coherent_write_sync() to override the default version. This
routine forces out any data sitting in a write buffer between the CPU and
memory.

Signed-off-by: Mark Salter <msalter@redhat.com>
---
 arch/arm/include/asm/dma-mapping.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)
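
The override mechanism described in the commit message can be sketched in userspace C. This is only an illustration under stated assumptions: the generic no-op fallback is a guess at what the rest of the series provides (it is not shown in this patch), the counter fake_arch_sync_calls is hypothetical, and a C11 fence stands in for the dsb()/outer_sync() pair, which only exist inside the ARM kernel.

```c
#include <stdatomic.h>

/* --- modelled architecture header --- */
/* The ARM patch defines this macro to announce its own implementation. */
#define ARCH_HAS_DMA_COHERENT_WRITE_SYNC

/* Hypothetical counter, present only so the sketch can be observed. */
static int fake_arch_sync_calls;

static inline void dma_coherent_write_sync(void)
{
	/* Stand-in for dsb(); outer_sync(); in the real patch. */
	atomic_thread_fence(memory_order_seq_cst);
	fake_arch_sync_calls++;
}

/* --- modelled generic header (assumed, not part of this patch) --- */
#ifndef ARCH_HAS_DMA_COHERENT_WRITE_SYNC
/* Architectures without a write buffer to flush get a no-op. */
static inline void dma_coherent_write_sync(void)
{
}
#endif
```

With the macro defined, the generic no-op is compiled out and callers get the architecture version; removing the #define would select the fallback instead.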

Comments

Catalin Marinas Sept. 6, 2011, 2:32 p.m. UTC | #1
On 31 August 2011 22:30, Mark Salter <msalter@redhat.com> wrote:
> For ARM kernels using CONFIG_ARM_DMA_MEM_BUFFERABLE, this patch adds an ARM
> specific dma_coherent_write_sync() to override the default version. This
> routine forces out any data sitting in a write buffer between the CPU and
> memory.
>
> Signed-off-by: Mark Salter <msalter@redhat.com>
> ---
>  arch/arm/include/asm/dma-mapping.h |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index 7a21d0b..e99562b 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -206,6 +206,16 @@ int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
>                void *, dma_addr_t, size_t);
>
>
> +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> +#define ARCH_HAS_DMA_COHERENT_WRITE_SYNC
> +
> +static inline void dma_coherent_write_sync(void)
> +{
> +       dsb();
> +       outer_sync();
> +}

That's what mb() and wmb() do already, at least on ARM. Why do we need
another API? IIRC from past discussions on linux-arch around barriers,
the mb() should be sufficient in the case of DMA coherent buffers.
That's why macros like writel() on ARM have the mb() added by default
(for cases where you start the DMA transfer by writing to a device
register).
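
The pattern Catalin describes, where the barrier inside writel() makes earlier descriptor stores visible before the register write that starts the transfer, might look roughly like this userspace model. All fake_* names are hypothetical, the doorbell is an ordinary variable rather than a device register, and the C11 fence is only a stand-in for the barrier that ARM's writel() issues.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor that would live in DMA-coherent memory. */
struct fake_desc {
	uint32_t addr;
	uint32_t len;
};

/* Stand-in for a memory-mapped doorbell register. */
static volatile uint32_t fake_doorbell;

/* Userspace stand-in for wmb(); not the ARM implementation. */
static inline void fake_wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Models a writel() that issues a barrier before the register store. */
static inline void fake_writel(uint32_t val, volatile uint32_t *reg)
{
	fake_wmb();
	*reg = val;
}

/* Fill in the descriptor, then kick the device through the doorbell. */
static void fake_start_dma(struct fake_desc *d, uint32_t addr, uint32_t len)
{
	d->addr = addr;
	d->len = len;
	/* The barrier inside fake_writel() orders the stores above first. */
	fake_writel(1, &fake_doorbell);
}
```

In this model the driver needs no explicit barrier of its own, because the register write that starts the transfer already carries one.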
Mark Salter Sept. 6, 2011, 2:37 p.m. UTC | #2
On Tue, 2011-09-06 at 15:32 +0100, Catalin Marinas wrote:
> That's what mb() and wmb() do already, at least on ARM. Why do we need
> another API? IIRC from past discussions on linux-arch around barriers,
> the mb() should be sufficient in the case of DMA coherent buffers.
> That's why macros like writel() on ARM have the mb() added by default
> (for cases where you start the DMA transfer by writing to a device
> register). 

For USB EHCI, the driver does not necessarily write to a register after
writing to DMA coherent memory. In some cases, the controller polls for
information written by the driver.

--Mark
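
The polled case Mark mentions, where no register write follows the update to coherent memory, can be sketched as below. This is a userspace model, not EHCI driver code: struct fake_qtd, FAKE_QTD_ACTIVE and fake_dma_coherent_write_sync() are hypothetical names, and C11 fences stand in for the kernel barriers and for the proposed write-buffer flush.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical "active" flag that the modelled controller polls for. */
#define FAKE_QTD_ACTIVE (1u << 7)

/* Hypothetical transfer descriptor in DMA-coherent memory. */
struct fake_qtd {
	uint32_t buffer;
	volatile uint32_t token;
};

/* Stand-in for the proposed dma_coherent_write_sync(). */
static inline void fake_dma_coherent_write_sync(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Hand a descriptor to the controller without touching any register. */
static void fake_activate_qtd(struct fake_qtd *qtd, uint32_t buffer)
{
	qtd->buffer = buffer;
	/* Ordering: the buffer pointer must be visible before the flag. */
	atomic_thread_fence(memory_order_release);
	qtd->token |= FAKE_QTD_ACTIVE;
	/*
	 * No register write follows, so nothing else would push the
	 * store out of a write buffer; ask for that explicitly.
	 */
	fake_dma_coherent_write_sync();
}
```

The first fence is about ordering, while the final call is about how soon the store becomes visible to the poller, which is the distinction discussed later in the thread.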
Catalin Marinas Sept. 6, 2011, 2:48 p.m. UTC | #3
On 6 September 2011 15:37, Mark Salter <msalter@redhat.com> wrote:
> On Tue, 2011-09-06 at 15:32 +0100, Catalin Marinas wrote:
>> That's what mb() and wmb() do already, at least on ARM. Why do we need
>> another API? IIRC from past discussions on linux-arch around barriers,
>> the mb() should be sufficient in the case of DMA coherent buffers.
>> That's why macros like writel() on ARM have the mb() added by default
>> (for cases where you start the DMA transfer by writing to a device
>> register).
>
> For USB EHCI, the driver does not necessarily write to a register after
> writing to DMA coherent memory. In some cases, the controller polls for
> information written by the driver.

So, as I understand it, you would like to force the eviction from the
write buffer rather than waiting for it to be drained. On ARM, the
write buffer is eventually flushed, so there is no strict timing
guarantee. It could take longer if the processor immediately starts
polling some memory location for example, but in this case a simple
barrier would do.
Mark Salter Sept. 6, 2011, 3:02 p.m. UTC | #4
On Tue, 2011-09-06 at 15:48 +0100, Catalin Marinas wrote:
> On 6 September 2011 15:37, Mark Salter <msalter@redhat.com> wrote:
> > On Tue, 2011-09-06 at 15:32 +0100, Catalin Marinas wrote:
> >> That's what mb() and wmb() do already, at least on ARM. Why do we need
> >> another API? IIRC from past discussions on linux-arch around barriers,
> >> the mb() should be sufficient in the case of DMA coherent buffers.
> >> That's why macros like writel() on ARM have the mb() added by default
> >> (for cases where you start the DMA transfer by writing to a device
> >> register).
> >
> > For USB EHCI, the driver does not necessarily write to a register after
> > writing to DMA coherent memory. In some cases, the controller polls for
> > information written by the driver.
> 
> So as I understand, you would like to force the eviction from the
> write buffer rather than waiting for it to be drained. On ARM, the
> write buffer is eventually flushed, so there is no strict timing
> guarantee. It could take longer if the processor immediately starts
> polling some memory location for example, but in this case a simple
> barrier would do.

Yes, a memory barrier would have the same effect on ARM, but the
purpose of a barrier is to guarantee ordering. What the patch does
is add an interface to force a write buffer flush for performance,
not ordering. If a memory barrier is used, it could have a negative
impact on other arches.

In any case, the current thinking is that the original problem with
the USB performance seen on multicore Cortex-A9 is probably something
more than just write buffer delays. Once the original problem is better
understood, we can take another look at this patch if it is still
needed.

--Mark
Patch

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 7a21d0b..e99562b 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -206,6 +206,16 @@  int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
 		void *, dma_addr_t, size_t);
 
 
+#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
+#define ARCH_HAS_DMA_COHERENT_WRITE_SYNC
+
+static inline void dma_coherent_write_sync(void)
+{
+	dsb();
+	outer_sync();
+}
+#endif
+
 #ifdef CONFIG_DMABOUNCE
 /*
  * For SA-1111, IXP425, and ADI systems  the dma-mapping functions are "magic"