[RESEND] ARM: Handle user space mapped pages in flush_kernel_dcache_page

Message ID 1343464914-31084-1-git-send-email-gmbnomis@gmail.com (mailing list archive)
State New, archived

Commit Message

Simon Baatz July 28, 2012, 8:41 a.m. UTC
Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
it needs to handle are kernel mapped only.  However, for example when doing
direct I/O, pages with user space mappings may occur.

Thus, continue to do lazy flushing if there are no user space mappings.
Otherwise, flush the kernel cache lines directly.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
---

Hi,

a while ago I sent the patch above to fix a data corruption problem
on VIVT architectures (and probably VIPT aliasing).  There has been a
bit of discussion with Catalin, but there was no real conclusion on
how to proceed.  (See
http://www.spinics.net/lists/arm-kernel/msg176913.html for the
original post)

Apparently, the case is not hit too often; the ingredients are a
PIO(-like) driver, use of flush_kernel_dcache_page(), and direct I/O.
However, there is at least one real world example (running lvm2 on
top of an encrypted block device using mv_cesa on Kirkwood) that does
not work at all because of this problem.

- Simon


 arch/arm/include/asm/cacheflush.h |    4 ++++
 arch/arm/mm/flush.c               |   22 ++++++++++++++++++++++
 2 files changed, 26 insertions(+)

Comments

Russell King - ARM Linux July 28, 2012, 11:27 a.m. UTC | #1
On Sat, Jul 28, 2012 at 10:41:54AM +0200, Simon Baatz wrote:
> a while ago I sent the patch above to fix a data corruption problem
> on VIVT architectures (and probably VIPT aliasing).  There has been a
> bit of discussion with Catalin, but there was no real conclusion on
> how to proceed.  (See
> http://www.spinics.net/lists/arm-kernel/msg176913.html for the
> original post)

Going back to that post:

 However, this assumption is not true for direct IO or SG_IO (see
 e.g. [1] and [2]). This is why vgchange fails with mv_cesa as reported
 by me in [3]. (Btw., flush_kernel_dcache_page is also called from
 "copy_strings" in fs/exec.c when copying env/arg strings between
 processes. Thus, its use is not restricted to device drivers.)

The calls from copy_strings are not a problem - because the newly allocated
pages will have PG_arch_1 clear, and when the page is passed to set_pte(),
it will be flushed.

We _certainly_ do not want to make flush_kernel_dcache_page() do what you're
doing, because it will mean we're over-flushing *all* pages on VIVT caches.
Not only will we be flushing them for DMA, but we'll then do it again
when flush_kernel_dcache_page() is invoked, and then possibly again when
the page eventually ends up being visible to userspace.

> The case is not hit too often apparently; the ingredients are PIO
> (like) driver, use of flush_kernel_dcache_page(), and direct I/O. 

Right, so we need to analyse the direct IO paths and work out whether
they're using the flush_* interfaces correctly, and even whether they're
using the correct interfaces.

Note that flush_*dcache_page() are supposed to only be concerned with
page cache pages, not anonymous pages.  flush_anon_page() is all about
anonymous pages only.
Simon Baatz July 28, 2012, 8:55 p.m. UTC | #2
Hi Russell,

On Sat, Jul 28, 2012 at 12:27:07PM +0100, Russell King - ARM Linux wrote:
> On Sat, Jul 28, 2012 at 10:41:54AM +0200, Simon Baatz wrote:
> > a while ago I sent the patch above to fix a data corruption problem
> > on VIVT architectures (and probably VIPT aliasing).  There has been a
> > bit of discussion with Catalin, but there was no real conclusion on
> > how to proceed.  (See
> > http://www.spinics.net/lists/arm-kernel/msg176913.html for the
> > original post)
> 
> Going back to that post:
> 
>  However, this assumption is not true for direct IO or SG_IO (see
>  e.g. [1] and [2]). This is why vgchange fails with mv_cesa as reported
>  by me in [3]. (Btw., flush_kernel_dcache_page is also called from
>  "copy_strings" in fs/exec.c when copying env/arg strings between
>  processes. Thus, its use is not restricted to device drivers.)
> 
> The calls from copy_strings is not a problem - because the newly allocated
> pages will have PG_arch_1 clear, and when the page is passed to set_pte(),
> it will be flushed.

I was not implying that copy_strings has a problem. But, from
copy_strings()'s comment:

/*
 * 'copy_strings()' copies argument/environment strings from the old
 * processes's memory to the new process's stack. ...


You said below that flush_kernel_dcache_page() is supposed to be
called for page cache pages only.  Hmm, in the new process's stack?

 
> We _certainly_ do not want to make flush_kernel_dcache_page() do what you're
> doing, because it will mean we're over-flushing *all* pages on VIVT caches.
> Not only will we be flushing them for DMA, but we'll then do it again
> when flush_kernel_dcache_page() is invoked, and then possibly again when
> the page eventually ends up being visible to userspace.

Why should flush_kernel_dcache_page() be invoked at all for DMAed
pages?  If you state that this patch over-flushes *all* pages, I
assume that you _certainly_ do not understand the mapping == NULL
case.  ;-)

Can a page in the page cache have page->mapping == NULL?  If not,
page_mapping() only returns NULL in the anon case.

I found this strange myself, but this is how I thought
flush_dcache_page() handles it.  But now I realize it's probably
just a bug in that function, because flush_dcache_page() is not
supposed to handle anon pages at all.  (However, it currently
flushes the kernel mapping, since mapping == NULL for these pages.)

If we find out that flush_kernel_dcache_page() needs to handle
anonymous pages, it should do this explicitly.

> > The case is not hit too often apparently; the ingredients are PIO
> > (like) driver, use of flush_kernel_dcache_page(), and direct I/O. 
> 
> Right, so we need to analyse the direct IO paths and work out whether
> they're using the flush_* interfaces correctly, and even whether they're
> using the correct interfaces.

I agree. I am not claiming that my patch is necessarily correct; this
is a tangled mess for me.  But hey, it got the discussion finally
going.
 
> Note that flush_*dcache_page() are supposed to only be concerned with
> page cache pages, not anonymous pages.  flush_anon_page() is all about
> anonymous pages only.

Maybe, maybe not. From Documentation/cachetlb.txt:

  void flush_dcache_page(struct page *page)

	Any time the kernel writes to a page cache page, _OR_
	the kernel is about to read from a page cache page and
	user space shared/writable mappings of this page potentially
	exist, this routine is called.

  void flush_kernel_dcache_page(struct page *page)
	When the kernel needs to modify a user page it has obtained
	with kmap, it calls this function after all modifications are
	complete (but before kunmapping it) to bring the underlying
	page up to date.


- Simon
Simon Baatz Aug. 13, 2012, 8:15 p.m. UTC | #3
On Sat, Jul 28, 2012 at 10:55:09PM +0200, Simon Baatz wrote:
> Hi Russel,
> 
> On Sat, Jul 28, 2012 at 12:27:07PM +0100, Russell King - ARM Linux wrote:
> > On Sat, Jul 28, 2012 at 10:41:54AM +0200, Simon Baatz wrote:
> > > a while ago I sent the patch above to fix a data corruption problem
> > > on VIVT architectures (and probably VIPT aliasing).  There has been a
> > > bit of discussion with Catalin, but there was no real conclusion on
> > > how to proceed.  (See
> > > http://www.spinics.net/lists/arm-kernel/msg176913.html for the
> > > original post)
> > 
> > Going back to that post:
> > 
> >  However, this assumption is not true for direct IO or SG_IO (see
> >  e.g. [1] and [2]). This is why vgchange fails with mv_cesa as reported
> >  by me in [3]. (Btw., flush_kernel_dcache_page is also called from
> >  "copy_strings" in fs/exec.c when copying env/arg strings between
> >  processes. Thus, its use is not restricted to device drivers.)
> > 
> > The calls from copy_strings is not a problem - because the newly allocated
> > pages will have PG_arch_1 clear, and when the page is passed to set_pte(),
> > it will be flushed.
> 
> I was not implying that copy_strings has a problem. But, from
> copy_string's comment:
> 
> /*
>  * 'copy_strings()' copies argument/environment strings from the old
>  * processes's memory to the new process's stack. ...
> 
> 
> You said below that flush_kernel_dcache_page() is supposed to be
> called for page cache pages only.  Hmm, in the new process's stack?
> 
>  
> > We _certainly_ do not want to make flush_kernel_dcache_page() do what you're
> > doing, because it will mean we're over-flushing *all* pages on VIVT caches.
> > Not only will we be flushing them for DMA, but we'll then do it again
> > when flush_kernel_dcache_page() is invoked, and then possibly again when
> > the page eventually ends up being visible to userspace.
> 
> Why should flush_kernel_dcache_page() be invoked at all for DMAed
> pages?  If you state that this patch over-flushes *all* pages, I
> assume that you _certainly_ do not understand the mapping == NULL
> case.  ;-)
> 
> Can a page in the page cache have page->mapping == NULL? If not
> page_mapping() only returns NULL in the anon case.
> 
> I found this strange myself, but this is the way I thought
> flush_dcache_page() handles this.  But now I realized it's probably
> just a bug in that function, because flush_dcache_page() is not
> supposed to handle anon pages at all.  (However, it flushes the
> kernel mapping currently, since mapping == NULL for these pages.)
> 
> If we find out that flush_kernel_dcache_page() needs to handle
> anonymous pages, it should do this explicitly.
> 
> > > The case is not hit too often apparently; the ingredients are PIO
> > > (like) driver, use of flush_kernel_dcache_page(), and direct I/O. 
> > 
> > Right, so we need to analyse the direct IO paths and work out whether
> > they're using the flush_* interfaces correctly, and even whether they're
> > using the correct interfaces.

OK, I dug a little, but came to the same conclusion as before:

   O_DIRECT uses "get_user_pages()" pages (user mapped pages) to feed
   the block layer with rather than pagecache pages.  However
   pagecache pages can also be user mapped, so I'm thinking there
   should be enough cache flushing APIs to be able to handle either
   case.

(from https://lkml.org/lkml/2008/11/20/20)
 
flush_anon_page() and flush_kernel_dcache_page() were introduced
in the same patch set:

   Currently, get_user_pages() returns fully coherent pages to the
   kernel for anything other than anonymous pages.  This is a problem
   for things like fuse and the SCSI generic ioctl SG_IO which can
   potentially wish to do DMA to anonymous pages passed in by users.

   The fix is to add a new memory management API: flush_anon_page()
   which is used in get_user_pages() to make anonymous pages
   coherent.

(from https://lkml.org/lkml/2006/3/21/363)

   We have a problem in a lot of emulated storage in that it takes a
   page from get_user_pages() and does something like

   kmap_atomic(page)
   modify page
   kunmap_atomic(page)

   However, nothing has flushed the kernel cache view of the page
   before the kunmap.  We need a lightweight API to do this, so this
   new API would specifically be for flushing the kernel cache view
   of a user page which the kernel has modified.  The driver would
   need to add flush_kernel_dcache_page(page) before the final
   kunmap.

(from https://lkml.org/lkml/2006/3/21/364)

I have seen much discussion about whether this makes sense, but no
changes to this approach.

Thus:

- When using O_DIRECT, the block layer sees pages from get_user_pages()

- get_user_pages() is supposed to handle anonymous pages as well as
  user mapped page cache pages.

-> the block layer (and thus device drivers) will see these types of
   pages in this case.

- flush_kernel_dcache_page()'s intention is to handle these user pages.


As said before, I tried to be as unintrusive as possible with my
patch: leave the common case (a page cache page with no user
mappings) completely untouched, i.e. don't flush and don't touch
PG_dcache_clean.

However, in case of an anonymous page or a page cache page with user
mappings, just flush the page. I don't think setting PG_dcache_clean
is of much value here (the page already has user mappings).

However, if the patch should be more explicit, I can change that of
course, i.e. mimic flush_dcache_page, but only for the kernel
mapping. 


A different story is whether PIO device drivers (as opposed to
emulated storage ones) should use flush_kernel_dcache_page() or
flush_dcache_page().  In almost all cases, they seem to use
flush_dcache_page() today, which is not supposed to handle anonymous
pages (but does, at least for the kernel mapping).

- Simon


> I agree. I am not claiming that my patch is necessarily correct; this
> is a tangled mess for me.  But hey, it got the discussion finally
> going.
>  
> > Note that flush_*dcache_page() are supposed to only be concerned with
> > page cache pages, not anonymous pages.  flush_anon_page() is all about
> > anonymous pages only.
> 
> May be, may be not. From Documentation/cachetlb.txt:
> 
>   void flush_dcache_page(struct page *page)
> 
> 	Any time the kernel writes to a page cache page, _OR_
> 	the kernel is about to read from a page cache page and
> 	user space shared/writable mappings of this page potentially
> 	exist, this routine is called.
> 
>   void flush_kernel_dcache_page(struct page *page)
> 	When the kernel needs to modify a user page is has obtained
> 	with kmap, it calls this function after all modifications are
> 	complete (but before kunmapping it) to bring the underlying
> 	page up to date.
>

Patch

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 004c1bc..91ddc70 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -303,6 +303,10 @@  static inline void flush_anon_page(struct vm_area_struct *vma,
 #define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
 static inline void flush_kernel_dcache_page(struct page *page)
 {
+	extern void __flush_kernel_dcache_page(struct page *);
+	/* highmem pages are always flushed upon kunmap already */
+	if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
+		__flush_kernel_dcache_page(page);
 }
 
 #define flush_dcache_mmap_lock(mapping) \
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7745854..bcba3a9 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -192,6 +192,28 @@  void __flush_dcache_page(struct address_space *mapping, struct page *page)
 				page->index << PAGE_CACHE_SHIFT);
 }
 
+/*
+ * Ensure cache coherency for kernel mapping of this page.
+ *
+ * If the page only exists in the page cache and there are no user
+ * space mappings, this is a no-op since the page was already marked
+ * dirty at creation.  Otherwise, we need to flush the dirty kernel
+ * cache lines directly.
+ *
+ * We can assume that the page is not a highmem page; see
+ * flush_kernel_dcache_page().
+ */
+void __flush_kernel_dcache_page(struct page *page)
+{
+	struct address_space *mapping;
+
+	mapping = page_mapping(page);
+
+	if (!mapping || mapping_mapped(mapping))
+		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
+}
+EXPORT_SYMBOL(__flush_kernel_dcache_page);
+
 static void __flush_dcache_aliases(struct address_space *mapping, struct page *page)
 {
 	struct mm_struct *mm = current->active_mm;