Message ID | 20110711164919.GB18871@e102109-lin.cambridge.arm.com (mailing list archive)
---|---
State | New, archived
On Mon, Jul 11, 2011 at 05:49:20PM +0100, Catalin Marinas wrote:
> On Wed, Jul 06, 2011 at 07:08:14PM +0100, Russell King - ARM Linux wrote:
> > Okay, so I can say with confidence then that, as things stand in my
> > tree, BTC invalidation is limited to:
> >
> > 1. For kernel mappings, flush_icache_range(), which must be called
> >    prior to code placed in them being executed.
> >
> > 2. For user mappings, __sync_icache_dcache's icache flush, which is
> >    called before a non-zero user PTE is inserted.
>
> What about:
>
> flush_cache_user_range()
> flush_ptrace_access()
>
> They are fine as long as you haven't removed the BTC invalidation from
> __cpuc_coherent_(kern|user)_range.

That's the basis of flush_icache_range(), so that still has the BTC
invalidate.

> > The area which needs some further work is __sync_icache_dcache(),
> > which is fairly over-zealous about all the flushing.
>
> Another thing that could be optimised is not to clean and invalidate
> the D-cache but only clean to the PoU. The only problem is with
> (flush|invalidate)_kernel_vmap_area, functions that seem to be used
> only in a single place. The semantics in cachetlb.txt claim they are
> used for I/O, which means that they are already broken since we don't
> handle the L2 cache.

Those are newly introduced to cope with XFS wanting DMA to vmap'd areas
to work. They're there to handle the vmalloc-space alias of the pages.
The DMA API sorts out the kernel direct mapping plus the L2 for
non-virtually tagged L2 caches.

So they're just additional pre-flush and post-invalidate calls around
the DMA API to cope with vmalloc space aliases. So I don't think they're
broken.
On Mon, Jul 11, 2011 at 06:01:41PM +0100, Russell King - ARM Linux wrote:
> On Mon, Jul 11, 2011 at 05:49:20PM +0100, Catalin Marinas wrote:
> > On Wed, Jul 06, 2011 at 07:08:14PM +0100, Russell King - ARM Linux wrote:
> > > The area which needs some further work is __sync_icache_dcache(),
> > > which is fairly over-zealous about all the flushing.
> >
> > Another thing that could be optimised is not to clean and invalidate
> > the D-cache but only clean to the PoU. The only problem is with
> > (flush|invalidate)_kernel_vmap_area, functions that seem to be used
> > only in a single place. The semantics in cachetlb.txt claim they are
> > used for I/O, which means that they are already broken since we don't
> > handle the L2 cache.
>
> Those are newly introduced to cope with XFS wanting DMA to vmap'd areas
> to work. They're there to handle the vmalloc-space alias of the pages.
> The DMA API sorts out the kernel direct mapping plus the L2 for
> non-virtually tagged L2 caches.
>
> So they're just additional pre-flush and post-invalidate calls around
> the DMA API to cope with vmalloc space aliases. So I don't think
> they're broken.

OK, so in this case these functions need to go to the point of
coherency. We could also optimise them to do pre-cleaning and
post-invalidation rather than always clean & invalidate.

Can we not use dmac_flush_range() (or dma_clean_range and dma_inv_range
via dmac_(map|unmap)_area) instead of __cpuc_flush_dcache_area?
On Tue, Jul 12, 2011 at 02:09:07PM +0100, Catalin Marinas wrote:
> On Mon, Jul 11, 2011 at 06:01:41PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Jul 11, 2011 at 05:49:20PM +0100, Catalin Marinas wrote:
> > > On Wed, Jul 06, 2011 at 07:08:14PM +0100, Russell King - ARM Linux wrote:
> > > > The area which needs some further work is __sync_icache_dcache(),
> > > > which is fairly over-zealous about all the flushing.
> > >
> > > Another thing that could be optimised is not to clean and invalidate
> > > the D-cache but only clean to the PoU. The only problem is with
> > > (flush|invalidate)_kernel_vmap_area, functions that seem to be used
> > > only in a single place. The semantics in cachetlb.txt claim they are
> > > used for I/O, which means that they are already broken since we
> > > don't handle the L2 cache.
> >
> > Those are newly introduced to cope with XFS wanting DMA to vmap'd
> > areas to work. They're there to handle the vmalloc-space alias of the
> > pages. The DMA API sorts out the kernel direct mapping plus the L2
> > for non-virtually tagged L2 caches.
> >
> > So they're just additional pre-flush and post-invalidate calls around
> > the DMA API to cope with vmalloc space aliases. So I don't think
> > they're broken.
>
> OK, so in this case these functions need to go to the point of
> coherency. We could also optimise them to do pre-cleaning and
> post-invalidation rather than always clean & invalidate.
>
> Can we not use dmac_flush_range() (or dma_clean_range and dma_inv_range
> via dmac_(map|unmap)_area) instead of __cpuc_flush_dcache_area?

We got rid of the clean and invalidate interfaces because they weren't
suitable for cross-CPU cache handling (they were defined by
implementation rather than purpose.)
We don't have enough information here to be able to call the map/unmap functions (the DMA direction would be a nonsense) so I think these should be new callbacks into the CPU cache code, and we do whatever is necessary in the low level stuff, rather than trying to overload existing functions.
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index d32f02b..b4faccd 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -224,7 +224,7 @@ ENTRY(v7_flush_kern_dcache_area)
 	sub	r3, r2, #1
 	bic	r0, r0, r3
 1:
-	mcr	p15, 0, r0, c7, c14, 1		@ clean & invalidate D line / unified line
+	mcr	p15, 0, r0, c7, c11, 1		@ clean D line to the point of unification
 	add	r0, r0, r2
 	cmp	r0, r1
 	blo	1b