diff mbox

Unnecessary cache-line flush on page table updates ?

Message ID 20110711164919.GB18871@e102109-lin.cambridge.arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Catalin Marinas July 11, 2011, 4:49 p.m. UTC
On Wed, Jul 06, 2011 at 07:08:14PM +0100, Russell King - ARM Linux wrote:
> Okay, so I can say with confidence then that how things stand in my tree,
> which limits BTC invalidation to:
> 
> 1. For kernel mappings, flush_icache_range() which must be called prior
>    to code placed in them being executed.
> 
> 2. For user mappings, __sync_icache_dcache's icache flush, which is
>    called before a non-zero user PTE is inserted.

What about:

flush_cache_user_range()
flush_ptrace_access()

They are fine as long as you haven't removed the BTC invalidation from
__cpuc_coherent_(kern|user)_range.

> The area which needs more to focus some further work is
> __sync_icache_dcache(), which is fairly over-zealous about all the
> flushing.

Another thing that could be optimised is not to clean and invalidate the
D-cache but only clean to the PoU. The only problem is that
(flush|invalidate)_kernel_vmap_area, functions that seem to used only in
a single place. The semantics in cachetlb.txt claim to be used for I/O,
which means that they are already broken since we don't handle the L2
cache.

8<----------------------

From e7ad36d5866a93fb182e8edfc8ae03b5911cc3e6 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Mon, 11 Jul 2011 17:38:57 +0100
Subject: [PATCH 1/1] ARM: Only clean the D-cache to the PoU in v7_flush_kern_dcache_area()

With PIPT caches (like ARMv7), there is no need to clean&invalidate the
D-cache via the __cpuc_flush_dcache_area() function as there are no
aliases.  The D-cache only needs to be cleaned for the coherency with
the I-cache.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/mm/cache-v7.S |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

Russell King - ARM Linux July 11, 2011, 5:01 p.m. UTC | #1
On Mon, Jul 11, 2011 at 05:49:20PM +0100, Catalin Marinas wrote:
> On Wed, Jul 06, 2011 at 07:08:14PM +0100, Russell King - ARM Linux wrote:
> > Okay, so I can say with confidence then that how things stand in my tree,
> > which limits BTC invalidation to:
> > 
> > 1. For kernel mappings, flush_icache_range() which must be called prior
> >    to code placed in them being executed.
> > 
> > 2. For user mappings, __sync_icache_dcache's icache flush, which is
> >    called before a non-zero user PTE is inserted.
> 
> What about:
> 
> flush_cache_user_range()
> flush_ptrace_access()
> 
> They are fine as long as you haven't removed the BTC invalidation from
> __cpuc_coherent_(kern|user)_range.

That's the basis of flush_icache_range(), so that still has the BTC
invalidate.

> > The area which needs more to focus some further work is
> > __sync_icache_dcache(), which is fairly over-zealous about all the
> > flushing.
> 
> Another thing that could be optimised is not to clean and invalidate the
> D-cache but only clean to the PoU. The only problem is that
> (flush|invalidate)_kernel_vmap_area, functions that seem to used only in
> a single place. The semantics in cachetlb.txt claim to be used for I/O,
> which means that they are already broken since we don't handle the L2
> cache.

Those are newly introduced to cope with XFS wanting DMA to vmap'd areas
to work.  They're there to handle the vmalloc-space alias of the pages.
The DMA API sorts out the kernel direct-mapped plus L2 for non-virtually
tagged L2 caches.

So they're just an additional pre-flush and post-invalidate calls around
the DMA API to cope with vmalloc space aliases.  So I don't think they're
broken.
Catalin Marinas July 12, 2011, 1:09 p.m. UTC | #2
On Mon, Jul 11, 2011 at 06:01:41PM +0100, Russell King - ARM Linux wrote:
> On Mon, Jul 11, 2011 at 05:49:20PM +0100, Catalin Marinas wrote:
> > On Wed, Jul 06, 2011 at 07:08:14PM +0100, Russell King - ARM Linux wrote:
> > > The area which needs more to focus some further work is
> > > __sync_icache_dcache(), which is fairly over-zealous about all the
> > > flushing.
> > 
> > Another thing that could be optimised is not to clean and invalidate the
> > D-cache but only clean to the PoU. The only problem is that
> > (flush|invalidate)_kernel_vmap_area, functions that seem to used only in
> > a single place. The semantics in cachetlb.txt claim to be used for I/O,
> > which means that they are already broken since we don't handle the L2
> > cache.
> 
> Those are newly introduced to cope with XFS wanting DMA to vmap'd areas
> to work.  They're there to handle the vmalloc-space alias of the pages.
> The DMA API sorts out the kernel direct-mapped plus L2 for non-virtually
> tagged L2 caches.
> 
> So they're just an additional pre-flush and post-invalidate calls around
> the DMA API to cope with vmalloc space aliases.  So I don't think they're
> broken.

OK, so in this case these functions need to go to the point of
coherency. We could also optimise them to do pre-cleaning and
post-invalidation rather than always clean&invalidate.

Can we not use dmac_flush_range() (or dma_clean_range and dma_inv_range
via dmac_(map|unmap)_area) instead of __cpuc_flush_dcache_area?
Russell King - ARM Linux July 15, 2011, 4:24 p.m. UTC | #3
On Tue, Jul 12, 2011 at 02:09:07PM +0100, Catalin Marinas wrote:
> On Mon, Jul 11, 2011 at 06:01:41PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Jul 11, 2011 at 05:49:20PM +0100, Catalin Marinas wrote:
> > > On Wed, Jul 06, 2011 at 07:08:14PM +0100, Russell King - ARM Linux wrote:
> > > > The area which needs more to focus some further work is
> > > > __sync_icache_dcache(), which is fairly over-zealous about all the
> > > > flushing.
> > > 
> > > Another thing that could be optimised is not to clean and invalidate the
> > > D-cache but only clean to the PoU. The only problem is that
> > > (flush|invalidate)_kernel_vmap_area, functions that seem to used only in
> > > a single place. The semantics in cachetlb.txt claim to be used for I/O,
> > > which means that they are already broken since we don't handle the L2
> > > cache.
> > 
> > Those are newly introduced to cope with XFS wanting DMA to vmap'd areas
> > to work.  They're there to handle the vmalloc-space alias of the pages.
> > The DMA API sorts out the kernel direct-mapped plus L2 for non-virtually
> > tagged L2 caches.
> > 
> > So they're just an additional pre-flush and post-invalidate calls around
> > the DMA API to cope with vmalloc space aliases.  So I don't think they're
> > broken.
> 
> OK, so in this case these functions need to go to the point of
> coherency. We could also optimise them to do pre-cleaning and
> post-invalidation rather than always clean&invalidate.
> 
> Can we not use dmac_flush_range() (or dma_clean_range and dma_inv_range
> via dmac_(map|unmap)_area) instead of __cpuc_flush_dcache_area?

We got rid of the clean and invalidate interfaces because they weren't
suitable for cross-CPU cache handling (they were defined by implementation
rather than purpose.)

We don't have enough information here to be able to call the map/unmap
functions (the DMA direction would be a nonsense) so I think these
should be new callbacks into the CPU cache code, and we do whatever
is necessary in the low level stuff, rather than trying to overload
existing functions.
diff mbox

Patch

diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index d32f02b..b4faccd 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -224,7 +224,7 @@  ENTRY(v7_flush_kern_dcache_area)
 	sub	r3, r2, #1
 	bic	r0, r0, r3
 1:
-	mcr	p15, 0, r0, c7, c14, 1		@ clean & invalidate D line / unified line
+	mcr	p15, 0, r0, c7, c11, 1		@ clean D line to the point of unification
 	add	r0, r0, r2
 	cmp	r0, r1
 	blo	1b