diff mbox

[V3,2/2] ARM: Handle user space mapped pages in flush_kernel_dcache_page

Message ID 1349609352-6408-3-git-send-email-gmbnomis@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Simon Baatz Oct. 7, 2012, 11:29 a.m. UTC
Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that
the pages it needs to handle are kernel mapped only.  However, for
example when doing direct I/O, pages with user space mappings may
occur.

Thus, do lazy flushing like in flush_dcache_page() if there are no user
space mappings.  Otherwise, flush the kernel cache lines directly.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
---

Changes:

in V3:
- Followed Catalin's suggestion to reverse the order of the patches

in V2:
- flush_kernel_dcache_page() follows flush_dcache_page() now, except that it
  does not flush the user mappings

 arch/arm/include/asm/cacheflush.h |    4 ++++
 arch/arm/mm/flush.c               |   42 +++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

Comments

Catalin Marinas Oct. 8, 2012, 5:44 p.m. UTC | #1
On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote:
> Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that
> the pages it needs to handle are kernel mapped only.  However, for
> example when doing direct I/O, pages with user space mappings may
> occur.
> 
> Thus, do lazy flushing like in flush_dcache_page() if there are no user
> space mappings.  Otherwise, flush the kernel cache lines directly.

Do you need to fix the VIPT non-aliasing case as well? Does
flush_kernel_dcache_page() need to handle I-cache?
Simon Baatz Oct. 8, 2012, 8:02 p.m. UTC | #2
Hi Catalin,

On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote:
> On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote:
> > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that
> > the pages it needs to handle are kernel mapped only.  However, for
> > example when doing direct I/O, pages with user space mappings may
> > occur.
> > 
> > Thus, do lazy flushing like in flush_dcache_page() if there are no user
> > space mappings.  Otherwise, flush the kernel cache lines directly.
> 
> Do you need to fix the VIPT non-aliasing case as well? Does
> flush_kernel_dcache_page() need to handle I-cache?

Good question. My previous version of the patch did not handle it,
but after our discussion on the arm64 case, I came to the conclusion
that we probably need to handle it.

flush_dcache_page() and flush_kernel_dcache_page() are both used when
a page was modified via its kernel mapping.  For example,
crypto/scatterwalk.c uses flush_dcache_page() whereas
lib/scatterlist.c uses flush_kernel_dcache_page().  

Thus, the reasoning is that if flush_dcache_page() needs to handle
I-cache in the case there is no hook later (already user-mapped page
cache page) then flush_kernel_dcache_page() needs to do the same. 

With respect to clearing the flag in the VIPT non-aliasing case with
anon pages: Declaring the pages dirty by default may already be
sufficient in this case, at least that is what the current
implementation assumes.  On the other hand, if we clear the
PG_dcache_clean flag in a function that is not even supposed to
handle anon pages, then why shouldn't we do it in a function that is? 
I wanted to play safe here and apply the same logic in both
functions.


- Simon
Russell King - ARM Linux Oct. 8, 2012, 8:20 p.m. UTC | #3
On Mon, Oct 08, 2012 at 10:02:16PM +0200, Simon Baatz wrote:
> Hi Catalin,
> 
> On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote:
> > On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote:
> > > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that
> > > the pages it needs to handle are kernel mapped only.  However, for
> > > example when doing direct I/O, pages with user space mappings may
> > > occur.
> > > 
> > > Thus, do lazy flushing like in flush_dcache_page() if there are no user
> > > space mappings.  Otherwise, flush the kernel cache lines directly.
> > 
> > Do you need to fix the VIPT non-aliasing case as well? Does
> > flush_kernel_dcache_page() need to handle I-cache?
> 
> Good question. My previous version of the patch did not handle it,
> but after our discussion on the arm64 case, I came to the conclusion
> that we probably need to handle it.
> 
> flush_dcache_page() and flush_kernel_dcache_page() are both used when
> a page was modified via its kernel mapping.  For example,
> crypto/scatterwalk.c uses flush_dcache_page() whereas
> lib/scatterlist.c uses flush_kernel_dcache_page().  

It's likely that this stuff is incredibly buggy - and where we suspect
there's problems (like using the wrong flush_dcache_page() vs
flush_kernel_dcache_page() call) that needs to be fixed.

I suspect crypto/scatterwalk.c was created long before
flush_kernel_dcache_page() came into existence, and it probably needs
to be fixed.

> Thus, the reasoning is that if flush_dcache_page() needs to handle
> I-cache in the case there is no hook later (already user-mapped page
> cache page) then flush_kernel_dcache_page() needs to do the same. 

This sounds like we're heading in the direction that flush_kernel_dcache_page()
and flush_dcache_page() end up being virtually identical.  This isn't
supposed to be - they _are_ supposed to be different.  Again, maybe
that's because the users have chosen the wrong function.

Remember that these functions only get used on so-called "minority"
architectures where the caches are a hinderance - to put that a
different way, most people don't care about these calls because x86
just works.

> With respect to clearing the flag in the VIPT non-aliasing case with
> anon pages: Declaring the pages dirty by default may already be
> sufficient in this case, at least that is what the current
> implementation assumes.

PG_arch_1 has no meaning what so ever for anon pages.  Don't even try
to make it have meaning there, it will cause lots of pain if you do.
The reason is that anon pages have no mapping, and so trying to do the
PG_arch_1 trick will result in most places deciding "there is no
userspace mapping we can skip that".
Simon Baatz Oct. 8, 2012, 11:07 p.m. UTC | #4
Hi Russell,

On Mon, Oct 08, 2012 at 09:20:40PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 10:02:16PM +0200, Simon Baatz wrote:
> > Hi Catalin,
> > 
> > On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote:
> > > On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote:
> > > > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that
> > > > the pages it needs to handle are kernel mapped only.  However, for
> > > > example when doing direct I/O, pages with user space mappings may
> > > > occur.
> > > > 
> > > > Thus, do lazy flushing like in flush_dcache_page() if there are no user
> > > > space mappings.  Otherwise, flush the kernel cache lines directly.
> > > 
> > > Do you need to fix the VIPT non-aliasing case as well? Does
> > > flush_kernel_dcache_page() need to handle I-cache?
> > 
> > Good question. My previous version of the patch did not handle it,
> > but after our discussion on the arm64 case, I came to the conclusion
> > that we probably need to handle it.
> > 
> > flush_dcache_page() and flush_kernel_dcache_page() are both used when
> > a page was modified via its kernel mapping.  For example,
> > crypto/scatterwalk.c uses flush_dcache_page() whereas
> > lib/scatterlist.c uses flush_kernel_dcache_page().  
> 
> It's likely that this stuff is incredibly buggy - and where we suspect
> there's problems (like using the wrong flush_dcache_page() vs
> flush_kernel_dcache_page() call) that needs to be fixed.
> 
> I suspect crypto/scatterwalk.c was created long before
> flush_kernel_dcache_page() came into existence, and it probably needs
> to be fixed.
>
> > Thus, the reasoning is that if flush_dcache_page() needs to handle
> > I-cache in the case there is no hook later (already user-mapped page
> > cache page) then flush_kernel_dcache_page() needs to do the same. 
> 
> This sounds like we're heading in the direction that flush_kernel_dcache_page()
> and flush_dcache_page() end up being virtually identical.  This isn't
> supposed to be - they _are_ supposed to be different.  Again, maybe
> that's because the users have chosen the wrong function.
> ...

Yes, it is a mess and may be incredibly buggy. 

According to documentation flush_kernel_dcache_page() is supposed to
handle user pages.  It can assume that potential user mappings have
been flushed.  However, the documentation does not mention the case
of page cache pages with no user mappings explicitly.

flush_dcache_page() is supposed to deal with the user and kernel
mappings. However, the documentation does explicitly state that it is
not supposed to handle anon pages (but the arm implementation does
flush the kernel mapping).

Now, what should crypto/scatterwalk.c do? In the write to page path,
it can see page cache pages with and without user mappings and anon
pages.  If there are user mappings, it can assume that they have been
flushed.  I would argue it should use flush_kernel_dcache_page(). 
But if so, our current implementation does not handle two of these
three cases at all.

And the proposed flush_kernel_dcache_page() is different. It only
flushes the kernel mapping in the relevant cases whereas
flush_dcache_page() needs to care about user mappings as well. 
According to documentation we could remove the anon page case in
flush_dcache_page(), but uses like in crypto/scatterwalk.c currently
prevent that.

> > With respect to clearing the flag in the VIPT non-aliasing case with
> > anon pages: Declaring the pages dirty by default may already be
> > sufficient in this case, at least that is what the current
> > implementation assumes.
> 
> PG_arch_1 has no meaning what so ever for anon pages.  Don't even try
> to make it have meaning there, it will cause lots of pain if you do.
> The reason is that anon pages have no mapping, and so trying to do the
> PG_arch_1 trick will result in most places deciding "there is no
> userspace mapping we can skip that".

Hmm, now that you say it, it's quite obvious. There is the following
in __sync_icache_dcache():

        if (cache_is_vipt_aliasing())
                mapping = page_mapping(page);
        else
                mapping = NULL;

        if (!test_and_set_bit(PG_dcache_clean, &page->flags))
                __flush_dcache_page(mapping, page);

        if (pte_exec(pteval))
                __flush_icache_all();

At least this looks strange to me given that PG_dcache_clean has no
meaning for anon pages.  Is this correct?

- Simon
Jason Cooper Nov. 18, 2012, 9:10 p.m. UTC | #5
Russell, Catalin, Simon,

Any progress on this?  At worst, I'd like to see this in fixes for v3.8.
Assuming, of course, we've gotten to the bottom of it.

thx,

Jason.

On Tue, Oct 09, 2012 at 01:07:34AM +0200, Simon Baatz wrote:
> Hi Russell,
> 
> On Mon, Oct 08, 2012 at 09:20:40PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 10:02:16PM +0200, Simon Baatz wrote:
> > > Hi Catalin,
> > > 
> > > On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote:
> > > > On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote:
> > > > > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that
> > > > > the pages it needs to handle are kernel mapped only.  However, for
> > > > > example when doing direct I/O, pages with user space mappings may
> > > > > occur.
> > > > > 
> > > > > Thus, do lazy flushing like in flush_dcache_page() if there are no user
> > > > > space mappings.  Otherwise, flush the kernel cache lines directly.
> > > > 
> > > > Do you need to fix the VIPT non-aliasing case as well? Does
> > > > flush_kernel_dcache_page() need to handle I-cache?
> > > 
> > > Good question. My previous version of the patch did not handle it,
> > > but after our discussion on the arm64 case, I came to the conclusion
> > > that we probably need to handle it.
> > > 
> > > flush_dcache_page() and flush_kernel_dcache_page() are both used when
> > > a page was modified via its kernel mapping.  For example,
> > > crypto/scatterwalk.c uses flush_dcache_page() whereas
> > > lib/scatterlist.c uses flush_kernel_dcache_page().  
> > 
> > It's likely that this stuff is incredibly buggy - and where we suspect
> > there's problems (like using the wrong flush_dcache_page() vs
> > flush_kernel_dcache_page() call) that needs to be fixed.
> > 
> > I suspect crypto/scatterwalk.c was created long before
> > flush_kernel_dcache_page() came into existence, and it probably needs
> > to be fixed.
> >
> > > Thus, the reasoning is that if flush_dcache_page() needs to handle
> > > I-cache in the case there is no hook later (already user-mapped page
> > > cache page) then flush_kernel_dcache_page() needs to do the same. 
> > 
> > This sounds like we're heading in the direction that flush_kernel_dcache_page()
> > and flush_dcache_page() end up being virtually identical.  This isn't
> > supposed to be - they _are_ supposed to be different.  Again, maybe
> > that's because the users have chosen the wrong function.
> > ...
> 
> Yes, it is a mess and may be incredibly buggy. 
> 
> According to documentation flush_kernel_dcache_page() is supposed to
> handle user pages.  It can assume that potential user mappings have
> been flushed.  However, the documentation does not mention the case
> of page cache pages with no user mappings explicitly.
> 
> flush_dcache_page() is supposed to deal with the user and kernel
> mappings. However, the documentation does explicitly state that it is
> not supposed to handle anon pages (but the arm implementation does
> flush the kernel mapping).
> 
> Now, what should crypto/scatterwalk.c do? In the write to page path,
> it can see page cache pages with and without user mappings and anon
> pages.  If there are user mappings, it can assume that they have been
> flushed.  I would argue it should use flush_kernel_dcache_page(). 
> But if so, our current implementation does not handle two of these
> three cases at all.
> 
> And the proposed flush_kernel_dcache_page() is different. It only
> flushes the kernel mapping in the relevant cases whereas
> flush_dcache_page() needs to care about user mappings as well. 
> According to documentation we could remove the anon page case in
> flush_dcache_page(), but uses like in crypto/scatterwalk.c currently
> prevent that.
> 
> > > With respect to clearing the flag in the VIPT non-aliasing case with
> > > anon pages: Declaring the pages dirty by default may already be
> > > sufficient in this case, at least that is what the current
> > > implementation assumes.
> > 
> > PG_arch_1 has no meaning what so ever for anon pages.  Don't even try
> > to make it have meaning there, it will cause lots of pain if you do.
> > The reason is that anon pages have no mapping, and so trying to do the
> > PG_arch_1 trick will result in most places deciding "there is no
> > userspace mapping we can skip that".
> 
> Hmm, now that you say it, it's quite obvious. There is the following
> in __sync_icache_dcache():
> 
>         if (cache_is_vipt_aliasing())
>                 mapping = page_mapping(page);
>         else
>                 mapping = NULL;
> 
>         if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>                 __flush_dcache_page(mapping, page);
> 
>         if (pte_exec(pteval))
>                 __flush_icache_all();
> 
> At least this looks strange to me given that PG_dcache_clean has no
> meaning for anon pages.  Is this correct?
> 
> - Simon
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Jason Cooper April 18, 2013, 11:16 a.m. UTC | #6
Russell, Catalin, Simon,

I'm digging through my 'onice' mail folder and found this.  I've
personally experienced this bug in the past (workaround: don't use LVM
on ARM systems :-( ).

Is there any interest in reviving this series?  I can dig up the patches
if needed.

thx,

Jason.


On Sun, Nov 18, 2012 at 04:10:05PM -0500, Jason Cooper wrote:
> Russell, Catalin, Simon,
> 
> Any progress on this?  At worst, I'd like to see this in fixes for v3.8.
> Assuming, of course, we've gotten to the bottom of it.


-- Here's the original thread:

> On Tue, Oct 09, 2012 at 01:07:34AM +0200, Simon Baatz wrote:
> > Hi Russell,
> > 
> > On Mon, Oct 08, 2012 at 09:20:40PM +0100, Russell King - ARM Linux wrote:
> > > On Mon, Oct 08, 2012 at 10:02:16PM +0200, Simon Baatz wrote:
> > > > Hi Catalin,
> > > > 
> > > > On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote:
> > > > > On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote:
> > > > > > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that
> > > > > > the pages it needs to handle are kernel mapped only.  However, for
> > > > > > example when doing direct I/O, pages with user space mappings may
> > > > > > occur.
> > > > > > 
> > > > > > Thus, do lazy flushing like in flush_dcache_page() if there are no user
> > > > > > space mappings.  Otherwise, flush the kernel cache lines directly.
> > > > > 
> > > > > Do you need to fix the VIPT non-aliasing case as well? Does
> > > > > flush_kernel_dcache_page() need to handle I-cache?
> > > > 
> > > > Good question. My previous version of the patch did not handle it,
> > > > but after our discussion on the arm64 case, I came to the conclusion
> > > > that we probably need to handle it.
> > > > 
> > > > flush_dcache_page() and flush_kernel_dcache_page() are both used when
> > > > a page was modified via its kernel mapping.  For example,
> > > > crypto/scatterwalk.c uses flush_dcache_page() whereas
> > > > lib/scatterlist.c uses flush_kernel_dcache_page().  
> > > 
> > > It's likely that this stuff is incredibly buggy - and where we suspect
> > > there's problems (like using the wrong flush_dcache_page() vs
> > > flush_kernel_dcache_page() call) that needs to be fixed.
> > > 
> > > I suspect crypto/scatterwalk.c was created long before
> > > flush_kernel_dcache_page() came into existence, and it probably needs
> > > to be fixed.
> > >
> > > > Thus, the reasoning is that if flush_dcache_page() needs to handle
> > > > I-cache in the case there is no hook later (already user-mapped page
> > > > cache page) then flush_kernel_dcache_page() needs to do the same. 
> > > 
> > > This sounds like we're heading in the direction that flush_kernel_dcache_page()
> > > and flush_dcache_page() end up being virtually identical.  This isn't
> > > supposed to be - they _are_ supposed to be different.  Again, maybe
> > > that's because the users have chosen the wrong function.
> > > ...
> > 
> > Yes, it is a mess and may be incredibly buggy. 
> > 
> > According to documentation flush_kernel_dcache_page() is supposed to
> > handle user pages.  It can assume that potential user mappings have
> > been flushed.  However, the documentation does not mention the case
> > of page cache pages with no user mappings explicitly.
> > 
> > flush_dcache_page() is supposed to deal with the user and kernel
> > mappings. However, the documentation does explicitly state that it is
> > not supposed to handle anon pages (but the arm implementation does
> > flush the kernel mapping).
> > 
> > Now, what should crypto/scatterwalk.c do? In the write to page path,
> > it can see page cache pages with and without user mappings and anon
> > pages.  If there are user mappings, it can assume that they have been
> > flushed.  I would argue it should use flush_kernel_dcache_page(). 
> > But if so, our current implementation does not handle two of these
> > three cases at all.
> > 
> > And the proposed flush_kernel_dcache_page() is different. It only
> > flushes the kernel mapping in the relevant cases whereas
> > flush_dcache_page() needs to care about user mappings as well. 
> > According to documentation we could remove the anon page case in
> > flush_dcache_page(), but uses like in crypto/scatterwalk.c currently
> > prevent that.
> > 
> > > > With respect to clearing the flag in the VIPT non-aliasing case with
> > > > anon pages: Declaring the pages dirty by default may already be
> > > > sufficient in this case, at least that is what the current
> > > > implementation assumes.
> > > 
> > > PG_arch_1 has no meaning what so ever for anon pages.  Don't even try
> > > to make it have meaning there, it will cause lots of pain if you do.
> > > The reason is that anon pages have no mapping, and so trying to do the
> > > PG_arch_1 trick will result in most places deciding "there is no
> > > userspace mapping we can skip that".
> > 
> > Hmm, now that you say it, it's quite obvious. There is the following
> > in __sync_icache_dcache():
> > 
> >         if (cache_is_vipt_aliasing())
> >                 mapping = page_mapping(page);
> >         else
> >                 mapping = NULL;
> > 
> >         if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> >                 __flush_dcache_page(mapping, page);
> > 
> >         if (pte_exec(pteval))
> >                 __flush_icache_all();
> > 
> > At least this looks strange to me given that PG_dcache_clean has no
> > meaning for anon pages.  Is this correct?
> > 
> > - Simon
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Russell King - ARM Linux April 18, 2013, 11:22 a.m. UTC | #7
On Thu, Apr 18, 2013 at 07:16:08AM -0400, Jason Cooper wrote:
> Russell, Catalin, Simon,
> 
> I'm digging through my 'onice' mail folder and found this.  I've
> personally experienced this bug in the past (workaround: don't use LVM
> on ARM systems :-( ).
> 
> Is there any interest in reviving this series?  I can dig up the patches
> if needed.

None what so ever.  flush_kernel_dcache_page() is not supposed to touch
userspace pages.

  void flush_kernel_dcache_page(struct page *page)
        When the kernel needs to modify a user page is has obtained
        with kmap, it calls this function after all modifications are
        complete (but before kunmapping it) to bring the underlying
        page up to date.  It is assumed here that the user has no
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        incoherent cached copies (i.e. the original page was obtained
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        from a mechanism like get_user_pages()).  The default
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        implementation is a nop and should remain so on all coherent
        architectures.  On incoherent architectures, this should flush
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        the kernel cache for page (using page_address(page)).
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first underlined sentence means that it's assumed that something _else_
has already dealt with the userspace aliases (in other words, get_user_pages()
via flush_dcache_page() or flush_anon_page()).  The second is absolutely
clear that _only_ the kernel side should be flushed.

flush_kernel_dcache_page() was invented explicitly to separate out the
kernel side flushing from flush_dcache_page(), as a lighter weight version
for block drivers to use.

Also see the commit text, which explicitly states that this is a lighter
weight API, and that this _only_ touches the kernel view of the page:

commit 5a3a5a98b6422d05c39eaa32c8b3f83840c7b768
Author: James Bottomley <James.Bottomley@SteelEye.com>
Date:   Sun Mar 26 01:36:59 2006 -0800

    [PATCH] Add flush_kernel_dcache_page() API

    We have a problem in a lot of emulated storage in that it takes a page from
    get_user_pages() and does something like

    kmap_atomic(page)
    modify page
    kunmap_atomic(page)

    However, nothing has flushed the kernel cache view of the page before the
    kunmap.  We need a lightweight API to do this, so this new API would
    specifically be for flushing the kernel cache view of a user page which the
    kernel has modified.  The driver would need to add
    flush_kernel_dcache_page(page) before the final kunmap.

    Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
    Cc: Russell King <rmk@arm.linux.org.uk>
    Cc: "David S. Miller" <davem@davemloft.net>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jason Cooper April 18, 2013, 11:40 a.m. UTC | #8
On Thu, Apr 18, 2013 at 12:22:01PM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 18, 2013 at 07:16:08AM -0400, Jason Cooper wrote:
> > Russell, Catalin, Simon,
> > 
> > I'm digging through my 'onice' mail folder and found this.  I've
> > personally experienced this bug in the past (workaround: don't use LVM
> > on ARM systems :-( ).
> > 
> > Is there any interest in reviving this series?  I can dig up the patches
> > if needed.
> 
> None what so ever.  flush_kernel_dcache_page() is not supposed to touch
> userspace pages.
> 
>   void flush_kernel_dcache_page(struct page *page)
>         When the kernel needs to modify a user page is has obtained
>         with kmap, it calls this function after all modifications are
>         complete (but before kunmapping it) to bring the underlying
>         page up to date.  It is assumed here that the user has no
>                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         incoherent cached copies (i.e. the original page was obtained
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         from a mechanism like get_user_pages()).  The default
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         implementation is a nop and should remain so on all coherent
>         architectures.  On incoherent architectures, this should flush
>                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         the kernel cache for page (using page_address(page)).
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> The first underlined sentence means that it's assumed that something _else_
> has already dealt with the userspace aliases (in other words, get_user_pages()
> via flush_dcache_page() or flush_anon_page()).  The second is absolutely
> clear that _only_ the kernel side should be flushed.
> 
> flush_kernel_dcache_page() was invented explicitly to separate out the
> kernel side flushing from flush_dcache_page(), as a lighter weight version
> for block drivers to use.
> 
> Also see the commit text, which explicitly states that this is a lighter
> weight API, and that this _only_ touches the kernel view of the page:
> 
> commit 5a3a5a98b6422d05c39eaa32c8b3f83840c7b768
> Author: James Bottomley <James.Bottomley@SteelEye.com>
> Date:   Sun Mar 26 01:36:59 2006 -0800
> 
>     [PATCH] Add flush_kernel_dcache_page() API
> 
>     We have a problem in a lot of emulated storage in that it takes a page from
>     get_user_pages() and does something like
> 
>     kmap_atomic(page)
>     modify page
>     kunmap_atomic(page)
> 
>     However, nothing has flushed the kernel cache view of the page before the
>     kunmap.  We need a lightweight API to do this, so this new API would
>     specifically be for flushing the kernel cache view of a user page which the
>     kernel has modified.  The driver would need to add
>     flush_kernel_dcache_page(page) before the final kunmap.
> 
>     Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
>     Cc: Russell King <rmk@arm.linux.org.uk>
>     Cc: "David S. Miller" <davem@davemloft.net>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Ok, got it.  I should have been more explicit.  LVM doesn't work on ARM.
iirc, Simon had a demo of dm-crypt also faulting on ARM.  This patch was
not the correct approach.  Is there an interest (particularly Simon) in
fixing the problem?

I'm willing to take a look at it, but LVM on ARM is a "nice to have" for
me, not a critical need.  The job would be easier if Simon was willing
to assist and lend his experience with the issue.

thx,

Jason.
Catalin Marinas April 18, 2013, 1:51 p.m. UTC | #9
On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
> Ok, got it.  I should have been more explicit.  LVM doesn't work on ARM.
> iirc, Simon had a demo of dm-crypt also faulting on ARM.  This patch was
> not the correct approach.  Is there an interest (particularly Simon) in
> fixing the problem?

I think fixing this for ARM is useful but I don't have any time to
allocate. I think I acked the first patch in the series but I don't
fully remember the details behind the second one.

As Russell said, flush_kernel_dcache_page() is not the right API.
flush_dcache_page() is not supposed to be used on anonymous pages. What
we have for such pages is flush_anon_page() which is a no-op for VIPT
non-aliasing pages. I can see that __get_user_pages() calls both
flush_anon_page() flush_dcache_page().

Is the problem that you have related to I-D cache coherency? Is
flush_anon_page() the right place for this?
Simon Baatz April 18, 2013, 7 p.m. UTC | #10
Hi Russel,

On Thu, Apr 18, 2013 at 12:22:01PM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 18, 2013 at 07:16:08AM -0400, Jason Cooper wrote:
> > Russell, Catalin, Simon,
> > 
> > I'm digging through my 'onice' mail folder and found this.  I've
> > personally experienced this bug in the past (workaround: don't use LVM
> > on ARM systems :-( ).
> > 
> > Is there any interest in reviving this series?  I can dig up the patches
> > if needed.
> 
> None what so ever.  flush_kernel_dcache_page() is not supposed to touch
> userspace pages.

I don't know if we have a wording issue here. The header of the patch
says "user space mapped pages".  Thus, pages for which a user space
mapping exists.  As the documentation you cite below states, these
pages ("user pages") should be handled by flush_kernel_dcache_page(). 
Of course, flush_kernel_dcache_page() has only to care about the
dirty kernel mapping, but this is what the proposed patches do.  I
never proposed to touch the user space mappings.
 
>   void flush_kernel_dcache_page(struct page *page)
>         When the kernel needs to modify a user page is has obtained
>         with kmap, it calls this function after all modifications are
>         complete (but before kunmapping it) to bring the underlying
>         page up to date.  It is assumed here that the user has no
>                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         incoherent cached copies (i.e. the original page was obtained
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         from a mechanism like get_user_pages()).  The default
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         implementation is a nop and should remain so on all coherent
>         architectures.  On incoherent architectures, this should flush
>                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         the kernel cache for page (using page_address(page)).
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> The first underlined sentence means that it's assumed that something _else_
> has already dealt with the userspace aliases (in other words, get_user_pages()
> via flush_dcache_page() or flush_anon_page()).  The second is absolutely
> clear that _only_ the kernel side should be flushed.
> 
> flush_kernel_dcache_page() was invented explicitly to separate out the
> kernel side flushing from flush_dcache_page(), as a lighter weight version
> for block drivers to use.

Yes, exactly. Apart from one thing: flush_kernel_dcache_page() is
supposed to handle all user pages. Thus, it is a more lightweight
version of flush_dcache_page() and flush_anon_page().

But the current implementation only handles page cache pages without a
user mapping (it's a nop because we can do lazy flushing here). It
does not handle page cache pages with a user space mapping and anon
pages.

My first patch (see [1]) was the "minimalistic" approach to do this. Just
flush the _kernel_ mapping if we have user space mappings. Continue to
be lazy otherwise.

But then the question is, whether this is enough. Let's take commit
65b6ecc:

+++ b/kernel/events/uprobes.c
@@ -1199,6 +1199,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe, unsigned long slot
        vaddr = kmap_atomic(area->page);
        memcpy(vaddr + offset, uprobe->arch.insn, MAX_UINSN_BYTES);
        kunmap_atomic(vaddr);
+       /*
+        * We probably need flush_icache_user_range() but it needs
vma.
+        * This should work on supported architectures too.
+        */
+       flush_dcache_page(area->page);

Wouldn't that be a use for flush_kernel_dcache_page() (before
kunmap())?  But then, we need to care about I/D-cache coherency, i.e. 
flush_kernel_dcache_page() very much looks like flush_dcache_page()
but without the "flush user mappings" part.  That's the idea behind
the second version (see [2]).  If that's not a relevant case, I am
more than happy to go back to [1].

The third version combined this with a slight optimization, but this
had problems as you pointed out at the time and we can forget about
that one for the time being.  Still, I consider the older versions
[1] or [2] to be relevant.

- Simon

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-May/101794.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-September/122874.html
Simon Baatz April 18, 2013, 7:39 p.m. UTC | #11
Hi Jason,

On Thu, Apr 18, 2013 at 07:40:16AM -0400, Jason Cooper wrote:
> On Thu, Apr 18, 2013 at 12:22:01PM +0100, Russell King - ARM Linux wrote:
> > On Thu, Apr 18, 2013 at 07:16:08AM -0400, Jason Cooper wrote:
> > > Russell, Catalin, Simon,
> > > 
> > > I'm digging through my 'onice' mail folder and found this.  I've
> > > personally experienced this bug in the past (workaround: don't use LVM
> > > on ARM systems :-( ).
> > > 
> > > Is there any interest in reviving this series?  I can dig up the patches
> > > if needed.
...
> 
> Ok, got it.  I should have been more explicit.  LVM doesn't work on ARM.
> iirc, Simon had a demo of dm-crypt also faulting on ARM.  This patch was
> not the correct approach.  Is there an interest (particularly Simon) in
> fixing the problem?
> 
> I'm willing to take a look at it, but LVM on ARM is a "nice to have" for
> me, not a critical need.  The job would be easier if Simon was willing
> to assist and lend his experience with the issue.

Just to be precise here, the problem is much more specific: Drivers
which use flush_kernel_dcache_page() on VIVT or aliasing VIPT ARM
architectures may have problems with direct I/O.

In general, LVM should work perfectly fine on ARM. But if you have
this particular combination (in my case Kirkwood ARMv5 with VIVT
D-Cache and the mv_cesa driver), you can't put a physical volume on
such a device.  LVM user space tools will just read garbage when
trying to read from the device using direct I/O.

- Simon
Simon Baatz April 21, 2013, 10:06 p.m. UTC | #12
Hi Catalin,

On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
> > Ok, got it.  I should have been more explicit.  LVM doesn't work on ARM.
> > iirc, Simon had a demo of dm-crypt also faulting on ARM.  This patch was
> > not the correct approach.  Is there an interest (particularly Simon) in
> > fixing the problem?
> 
> I think fixing this for ARM is useful but I don't have any time to
> allocate. I think I acked the first patch in the series but I don't
> fully remember the details behind the second one.
> 
> As Russell said, flush_kernel_dcache_page() is not the right API.

It is not the driver itself which is using the API, it is the
generic scatterlist memory iterator. And I don't think that this is
wrong, as I have tried to explain in [1].


> flush_dcache_page() is not supposed to be used on anonymous pages. What
> we have for such pages is flush_anon_page() which is a no-op for VIPT
> non-aliasing pages. I can see that __get_user_pages() calls both
> flush_anon_page() flush_dcache_page().

Yes. But I think that is not relevant here. Although the naming
suggest that the function is just a light-weight version of
flush_dcache_page(), both the documentation and the commit text say
that flush_kernel_dcache_page() has to handle pages that were
obtained through get_user_pages, which includes anon pages.


Btw. many drivers which modify pages via the kernel mapping currently
use flush_dcache_page() instead.  This is not supposed to work in
e.g.  the direct I/O case where a driver sees anon pages.  This works
just because flush_dcache_page() flushes the kernel mapping also for
anon pages.  Something is is clearly not supposed to do according to
documentation.

> Is the problem that you have related to I-D cache coherency? Is
> flush_anon_page() the right place for this?

This particular problem is not related to I/D cache coherency. 
However, as far as I know, flush_dcache_page() is expected to ensure
also I/D cache coherency.  Also, we flush the I-cache in certain
situations in this function.  As said in my response to Russel, I
don't know whether this is expected from flush_kernel_dcache_page()
as well.  I would say yes, but I also have posted a version that only
flushes the D-cache kernel mapping.

- Simon

[1] http://permalink.gmane.org/gmane.linux.ports.arm.kernel/181703
Catalin Marinas April 30, 2013, 11:22 a.m. UTC | #13
On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote:
> On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
> > > Ok, got it.  I should have been more explicit.  LVM doesn't work on ARM.
> > > iirc, Simon had a demo of dm-crypt also faulting on ARM.  This patch was
> > > not the correct approach.  Is there an interest (particularly Simon) in
> > > fixing the problem?
> > 
> > I think fixing this for ARM is useful but I don't have any time to
> > allocate. I think I acked the first patch in the series but I don't
> > fully remember the details behind the second one.
> > 
> > As Russell said, flush_kernel_dcache_page() is not the right API.
> 
> It is not the driver itself which is using the API, it is the
> generic scatterlist memory iterator. And I don't think that this is
> wrong, as I have tried to explain in [1].

Trying to remember what we've discussed over the past months on this
topic. It looks like sg_miter_stop() does the right thing in calling
flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove
superfluous flush_kernel_dcache_page()) removed this function entirely.
The code previously had this comment - /* highmem pages are always
flushed upon kunmap already */ which I think it wasn't fully correct
either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so
I suspect we only get the flushing if SG_MITER_ATOMIC.

So it looks to me like flush_kernel_dcache_page() should be implemented
even for highmem pages (with VIVT or aliasing VIPT, at least for non
kmap_atomic addresses by checking for FIXADDR_START). If highmem is
disabled, I suspect we still need this function since the calling code
doesn't care whether kmap/kunmap was a no-op. But can we keep it as a
simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)?
Simon Baatz April 30, 2013, 9:04 p.m. UTC | #14
Hi Catalin,

On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote:
> On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote:
> > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
...
> > 
> > It is not the driver itself which is using the API, it is the
> > generic scatterlist memory iterator. And I don't think that this is
> > wrong, as I have tried to explain in [1].
> 
> Trying to remember what we've discussed over the past months on this
> topic. It looks like sg_miter_stop() does the right thing in calling
> flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove
> superfluous flush_kernel_dcache_page()) removed this function entirely.
> The code previously had this comment - /* highmem pages are always
> flushed upon kunmap already */ which I think it wasn't fully correct
> either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so
> I suspect we only get the flushing if SG_MITER_ATOMIC.
> 
> So it looks to me like flush_kernel_dcache_page() should be implemented
> even for highmem pages (with VIVT or aliasing VIPT, at least for non
> kmap_atomic addresses by checking for FIXADDR_START). If highmem is
> disabled, I suspect we still need this function since the calling code
> doesn't care whether kmap/kunmap was a no-op. But can we keep it as a
> simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)?

My first version ([1]) had:

	if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
		__flush_kernel_dcache_page(page);

If I understand this correctly, you are proposing to remove the
highmem exclusion.

And then in __flush_kernel_dcache_page():

	mapping = page_mapping(page);

	if (!mapping || mapping_mapped(mapping))
		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);

I still prefer to have this condition here to avoid the flush when
there is no user mapping at all.  This is handled by lazy flushing
and is probably the most common case anyway (given how many people
seem to be affected by this problem).

Additionally, although we can assume that the page is kmapped,
page_address(page) can still be NULL for a highmem page, right?

- Simon

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-May/101794.html
Catalin Marinas May 1, 2013, 2:22 p.m. UTC | #15
On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote:
> On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote:
> > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote:
> > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
> ...
> > > 
> > > It is not the driver itself which is using the API, it is the
> > > generic scatterlist memory iterator. And I don't think that this is
> > > wrong, as I have tried to explain in [1].
> > 
> > Trying to remember what we've discussed over the past months on this
> > topic. It looks like sg_miter_stop() does the right thing in calling
> > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove
> > superfluous flush_kernel_dcache_page()) removed this function entirely.
> > The code previously had this comment - /* highmem pages are always
> > flushed upon kunmap already */ which I think it wasn't fully correct
> > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so
> > I suspect we only get the flushing if SG_MITER_ATOMIC.
> > 
> > So it looks to me like flush_kernel_dcache_page() should be implemented
> > even for highmem pages (with VIVT or aliasing VIPT, at least for non
> > kmap_atomic addresses by checking for FIXADDR_START). If highmem is
> > disabled, I suspect we still need this function since the calling code
> > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a
> > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)?
> 
> My first version ([1]) had:
> 
> 	if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
> 		__flush_kernel_dcache_page(page);
> 
> If I understand this correctly, you are proposing to remove the
> highmem exclusion.

The highmem exclusion may have been there originally because of a
comment suggesting that kunmap() does the flushing. This is the case
only for kunmap_atomic() AFAICT (and maybe we could remove that as well
and rely on flush_kernel_dcache_page() being called).

> And then in __flush_kernel_dcache_page():
> 
> 	mapping = page_mapping(page);
> 
> 	if (!mapping || mapping_mapped(mapping))
> 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> 
> I still prefer to have this condition here to avoid the flush when
> there is no user mapping at all.  This is handled by lazy flushing
> and is probably the most common case anyway (given how many people
> seem to be affected by this problem).

Looking at the old thread, you said there is a case when this condition
is not true (O_DIRECT case). If that's for a page cache page, then we
can handle it lazily (for anonymous pages I don't think we can rely on
lazy flushing since the kernel does not guarantee the clearing of the
PG_arch_1 bit).

> Additionally, although we can assume that the page is kmapped,
> page_address(page) can still be NULL for a highmem page, right?

It looks like kmap() always sets page_address(page) but I'm not sure
about kmap_atomic(), it doesn't seem to.
Simon Baatz May 1, 2013, 7:04 p.m. UTC | #16
On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote:
> On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote:
> > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote:
> > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote:
> > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
> > ...
> > > > 
> > > > It is not the driver itself which is using the API, it is the
> > > > generic scatterlist memory iterator. And I don't think that this is
> > > > wrong, as I have tried to explain in [1].
> > > 
> > > Trying to remember what we've discussed over the past months on this
> > > topic. It looks like sg_miter_stop() does the right thing in calling
> > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove
> > > superfluous flush_kernel_dcache_page()) removed this function entirely.
> > > The code previously had this comment - /* highmem pages are always
> > > flushed upon kunmap already */ which I think it wasn't fully correct
> > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so
> > > I suspect we only get the flushing if SG_MITER_ATOMIC.
> > > 
> > > So it looks to me like flush_kernel_dcache_page() should be implemented
> > > even for highmem pages (with VIVT or aliasing VIPT, at least for non
> > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is
> > > disabled, I suspect we still need this function since the calling code
> > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a
> > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)?
> > 
> > My first version ([1]) had:
> > 
> > 	if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
> > 		__flush_kernel_dcache_page(page);
> > 
> > If I understand this correctly, you are proposing to remove the
> > highmem exclusion.
> 
> The highmem exclusion may have been there originally because of a
> comment suggesting that kunmap() does the flushing. This is the case
> only for kunmap_atomic() AFAICT (and maybe we could remove that as well
> and rely on flush_kernel_dcache_page() being called).
> 
> > And then in __flush_kernel_dcache_page():
> > 
> > 	mapping = page_mapping(page);
> > 
> > 	if (!mapping || mapping_mapped(mapping))
> > 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > 
> > I still prefer to have this condition here to avoid the flush when
> > there is no user mapping at all.  This is handled by lazy flushing
> > and is probably the most common case anyway (given how many people
> > seem to be affected by this problem).
> 
> Looking at the old thread, you said there is a case when this condition
> is not true (O_DIRECT case). If that's for a page cache page, then we
> can handle it lazily (for anonymous pages I don't think we can rely on
> lazy flushing since the kernel does not guarantee the clearing of the
> PG_arch_1 bit).

As Russel pointed out in a comment to a later version of the patch,
PG_arch_1 makes only sense for page cache pages. The condition above
is ok from my point of view (it is based on what flush_dcache_page()
uses):

- Page cache page without user space mapping:
mapping != NULL and mapping_mapped() == 0

-> no flush here; lazy flush based on PG_arch_1 later if needed (we
   rely on the proper initialization of the page to "dirty" here.)


- Page cache page with user space mapping:
mapping != NULL and mapping_mapped() != 0

-> kernel mapping flushed here (user mapping can be assumed to be clean)


- Anonymous page:
mapping == NULL

-> kernel mapping flushed here (user mapping can be assumed to be clean)

 
> > Additionally, although we can assume that the page is kmapped,
> > page_address(page) can still be NULL for a highmem page, right?
> 
> It looks like kmap() always sets page_address(page) but I'm not sure
> about kmap_atomic(), it doesn't seem to.

Hmm, in __flush_dcache_page() we have the following code to flush the
kernel mapping:

void __flush_dcache_page(struct address_space *mapping, struct page *page)
{
	/*
	 * Writeback any data associated with the kernel mapping of this
	 * page.  This ensures that data in the physical page is mutually
	 * coherent with the kernels mapping.
	 */
	if (!PageHighMem(page)) {
		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
	} else {
		void *addr = kmap_high_get(page);
		if (addr) {
			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
			kunmap_high(page);
		} else if (cache_is_vipt()) {
			/* unmapped pages might still be cached */
			addr = kmap_atomic(page);
			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
			kunmap_atomic(addr);
		}
	}
...


We probably should reuse it in flush_kernel_dcache_page() to flush
the kernel mapping. (The last else clause looks strange though)


- Simon
Catalin Marinas May 2, 2013, 9:54 a.m. UTC | #17
On Wed, May 01, 2013 at 08:04:41PM +0100, Simon Baatz wrote:
> On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote:
> > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote:
> > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote:
> > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote:
> > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
> > > ...
> > > > > 
> > > > > It is not the driver itself which is using the API, it is the
> > > > > generic scatterlist memory iterator. And I don't think that this is
> > > > > wrong, as I have tried to explain in [1].
> > > > 
> > > > Trying to remember what we've discussed over the past months on this
> > > > topic. It looks like sg_miter_stop() does the right thing in calling
> > > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove
> > > > superfluous flush_kernel_dcache_page()) removed this function entirely.
> > > > The code previously had this comment - /* highmem pages are always
> > > > flushed upon kunmap already */ which I think it wasn't fully correct
> > > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so
> > > > I suspect we only get the flushing if SG_MITER_ATOMIC.
> > > > 
> > > > So it looks to me like flush_kernel_dcache_page() should be implemented
> > > > even for highmem pages (with VIVT or aliasing VIPT, at least for non
> > > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is
> > > > disabled, I suspect we still need this function since the calling code
> > > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a
> > > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)?
> > > 
> > > My first version ([1]) had:
> > > 
> > > 	if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
> > > 		__flush_kernel_dcache_page(page);
> > > 
> > > If I understand this correctly, you are proposing to remove the
> > > highmem exclusion.
> > 
> > The highmem exclusion may have been there originally because of a
> > comment suggesting that kunmap() does the flushing. This is the case
> > only for kunmap_atomic() AFAICT (and maybe we could remove that as well
> > and rely on flush_kernel_dcache_page() being called).
> > 
> > > And then in __flush_kernel_dcache_page():
> > > 
> > > 	mapping = page_mapping(page);
> > > 
> > > 	if (!mapping || mapping_mapped(mapping))
> > > 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > > 
> > > I still prefer to have this condition here to avoid the flush when
> > > there is no user mapping at all.  This is handled by lazy flushing
> > > and is probably the most common case anyway (given how many people
> > > seem to be affected by this problem).
> > 
> > Looking at the old thread, you said there is a case when this condition
> > is not true (O_DIRECT case). If that's for a page cache page, then we
> > can handle it lazily (for anonymous pages I don't think we can rely on
> > lazy flushing since the kernel does not guarantee the clearing of the
> > PG_arch_1 bit).
> 
> As Russel pointed out in a comment to a later version of the patch,
> PG_arch_1 makes only sense for page cache pages. The condition above
> is ok from my point of view (it is based on what flush_dcache_page()
> uses):
> 
> - Page cache page without user space mapping:
> mapping != NULL and mapping_mapped() == 0
> 
> -> no flush here; lazy flush based on PG_arch_1 later if needed (we
>    rely on the proper initialization of the page to "dirty" here.)

Indeed.

> - Page cache page with user space mapping:
> mapping != NULL and mapping_mapped() != 0
> 
> -> kernel mapping flushed here (user mapping can be assumed to be clean)

We had similar thoughts for AArch64 here and decided it's not needed, it
can just clear the PG_arch_1 bit and do it lazily (patches not pushed
yet, need more testing). This assumes that even if mapping_mapped(), the
page is not actually mapped in user space and we eventually get a
set_pte_at() call. That's what powerpc is doing.

> - Anonymous page:
> mapping == NULL
> 
> -> kernel mapping flushed here (user mapping can be assumed to be clean)

I don't think it should care about anonymous pages at all. I put a
WARN_ON(!mapping) in flush_dcache_page() and it hasn't triggered yet,
though not intensive testing.

> > > Additionally, although we can assume that the page is kmapped,
> > > page_address(page) can still be NULL for a highmem page, right?
> > 
> > It looks like kmap() always sets page_address(page) but I'm not sure
> > about kmap_atomic(), it doesn't seem to.
> 
> Hmm, in __flush_dcache_page() we have the following code to flush the
> kernel mapping:
> 
> void __flush_dcache_page(struct address_space *mapping, struct page *page)
> {
> 	/*
> 	 * Writeback any data associated with the kernel mapping of this
> 	 * page.  This ensures that data in the physical page is mutually
> 	 * coherent with the kernels mapping.
> 	 */
> 	if (!PageHighMem(page)) {
> 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> 	} else {
> 		void *addr = kmap_high_get(page);
> 		if (addr) {
> 			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> 			kunmap_high(page);
> 		} else if (cache_is_vipt()) {
> 			/* unmapped pages might still be cached */
> 			addr = kmap_atomic(page);
> 			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> 			kunmap_atomic(addr);
> 		}
> 	}
> ...
> 
> 
> We probably should reuse it in flush_kernel_dcache_page() to flush
> the kernel mapping. (The last else clause looks strange though)

I think it makes sense to reuse this logic in
flush_kernel_dcache_page(). If the page is already mapped with kmap,
then kmap_high_get() should return the actual address. If it was mapped
with kmap_atomic, kmap_high_get() would return NULL, hence the 'else'
clause and the additional kmap_atomic(). The cache_is_vipt() check is
useful because kunmap_atomic() would flush VIVT caches anyway.

As for __flush_dcache_page() called from other places like
flush_dcache_page(), because of this 'else if' clause it looks like it
misses flushing unmapped highmem pages on VIVT cache.
Simon Baatz May 2, 2013, 7:38 p.m. UTC | #18
Hi Catalin,

On Thu, May 02, 2013 at 10:54:31AM +0100, Catalin Marinas wrote:
> On Wed, May 01, 2013 at 08:04:41PM +0100, Simon Baatz wrote:
> > On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote:
> > > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote:
> > > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote:
> > > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote:
> > > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> > > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
> > > > ...
> > > > > > 
> > > > > > It is not the driver itself which is using the API, it is the
> > > > > > generic scatterlist memory iterator. And I don't think that this is
> > > > > > wrong, as I have tried to explain in [1].
> > > > > 
> > > > > Trying to remember what we've discussed over the past months on this
> > > > > topic. It looks like sg_miter_stop() does the right thing in calling
> > > > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove
> > > > > superfluous flush_kernel_dcache_page()) removed this function entirely.
> > > > > The code previously had this comment - /* highmem pages are always
> > > > > flushed upon kunmap already */ which I think it wasn't fully correct
> > > > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so
> > > > > I suspect we only get the flushing if SG_MITER_ATOMIC.
> > > > > 
> > > > > So it looks to me like flush_kernel_dcache_page() should be implemented
> > > > > even for highmem pages (with VIVT or aliasing VIPT, at least for non
> > > > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is
> > > > > disabled, I suspect we still need this function since the calling code
> > > > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a
> > > > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)?
> > > > 
> > > > My first version ([1]) had:
> > > > 
> > > > 	if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
> > > > 		__flush_kernel_dcache_page(page);
> > > > 
> > > > If I understand this correctly, you are proposing to remove the
> > > > highmem exclusion.
> > > 
> > > The highmem exclusion may have been there originally because of a
> > > comment suggesting that kunmap() does the flushing. This is the case
> > > only for kunmap_atomic() AFAICT (and maybe we could remove that as well
> > > and rely on flush_kernel_dcache_page() being called).
> > > 
> > > > And then in __flush_kernel_dcache_page():
> > > > 
> > > > 	mapping = page_mapping(page);
> > > > 
> > > > 	if (!mapping || mapping_mapped(mapping))
> > > > 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > > > 
> > > > I still prefer to have this condition here to avoid the flush when
> > > > there is no user mapping at all.  This is handled by lazy flushing
> > > > and is probably the most common case anyway (given how many people
> > > > seem to be affected by this problem).
> > > 
> > > Looking at the old thread, you said there is a case when this condition
> > > is not true (O_DIRECT case). If that's for a page cache page, then we
> > > can handle it lazily (for anonymous pages I don't think we can rely on
> > > lazy flushing since the kernel does not guarantee the clearing of the
> > > PG_arch_1 bit).
> > 
> > As Russel pointed out in a comment to a later version of the patch,
> > PG_arch_1 makes only sense for page cache pages. The condition above
> > is ok from my point of view (it is based on what flush_dcache_page()
> > uses):
> > 
> > - Page cache page without user space mapping:
> > mapping != NULL and mapping_mapped() == 0
> > 
> > -> no flush here; lazy flush based on PG_arch_1 later if needed (we
> >    rely on the proper initialization of the page to "dirty" here.)
> 
> Indeed.
> 
> > - Page cache page with user space mapping:
> > mapping != NULL and mapping_mapped() != 0
> > 
> > -> kernel mapping flushed here (user mapping can be assumed to be clean)
> 
> We had similar thoughts for AArch64 here and decided it's not needed, it
> can just clear the PG_arch_1 bit and do it lazily (patches not pushed
> yet, need more testing). This assumes that even if mapping_mapped(), the
> page is not actually mapped in user space and we eventually get a
> set_pte_at() call. That's what powerpc is doing.

For arm64 only I-/D-cache coherency is relevant, right? In this case,
that may be right (but you may want to have a look at what
xol_get_insn_slot() in kernel/events/uprobes.c does.  I don't know
what kind of pages it handles, but this looks suspicious. And I think
uprobes are also available for powerpc).

However, if you can have aliases in D-cache this is needed.  See
below to trigger this on pages that are already mapped
to user space (direct I/O into a memory mapped file).
 
> > - Anonymous page:
> > mapping == NULL
> > 
> > -> kernel mapping flushed here (user mapping can be assumed to be clean)
> 
> I don't think it should care about anonymous pages at all. I put a
> WARN_ON(!mapping) in flush_dcache_page() and it hasn't triggered yet,
> though not intensive testing.

I assume that you inhibited the call to flush_dcache_page() in
__get_user_pages() for anon pages.  Otherwise, you will be flooded
with warnings.

DIY steps to produce both cases:
(We will need software encryption, thus I need to remove the hardware
encryption module mv_cesa on my hardware first)

# rmmod mv_cesa     
# wget -O mapping_tests.c http://ubuntuone.com/0jF9fNsguN8BvI1Z8fsBVI
# gcc -o mapping_tests mapping_tests.c
# dd if=/dev/urandom of=map.out bs=4k count=10
# dd if=/dev/urandom of=cdevice.img bs=4k count=10
# losetup /dev/loop0 cdevice.img 
# cryptsetup -c aes create c_sda2 /dev/loop0
<enter any passphrase>
# ./mapping_tests

cryptsetup will already trigger the warning and mapping_tests should
flood your console.
 
Since flush_kernel_dcache_page() is supposed to handle user space
pages, it must flush the kernel mapping in these cases.  Running the
above on a device driver using an unpatched
flush_kernel_dcache_page() yields:

# ./mapping_tests 
Anonymous page: differs!
User space mapped page: differs!

It is even needed in flush_dcache_page() as long as everybody
continues to use flush_dcache_page() instead of
flush_kernel_dcache_page() when appropriate...
(This is probably the main reason why the problem I reported is so
uncommon: Everybody seems to use flush_dcache_page() and since it
flushes the kernel mapping in these cases, everything is fine.)

> > > > Additionally, although we can assume that the page is kmapped,
> > > > page_address(page) can still be NULL for a highmem page, right?
> > > 
> > > It looks like kmap() always sets page_address(page) but I'm not sure
> > > about kmap_atomic(), it doesn't seem to.
> > 
> > Hmm, in __flush_dcache_page() we have the following code to flush the
> > kernel mapping:
> > 
> > void __flush_dcache_page(struct address_space *mapping, struct page *page)
> > {
> > 	/*
> > 	 * Writeback any data associated with the kernel mapping of this
> > 	 * page.  This ensures that data in the physical page is mutually
> > 	 * coherent with the kernels mapping.
> > 	 */
> > 	if (!PageHighMem(page)) {
> > 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > 	} else {
> > 		void *addr = kmap_high_get(page);
> > 		if (addr) {
> > 			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > 			kunmap_high(page);
> > 		} else if (cache_is_vipt()) {
> > 			/* unmapped pages might still be cached */
> > 			addr = kmap_atomic(page);
> > 			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > 			kunmap_atomic(addr);
> > 		}
> > 	}
> > ...
> > 
> > 
> > We probably should reuse it in flush_kernel_dcache_page() to flush
> > the kernel mapping. (The last else clause looks strange though)
> 
> I think it makes sense to reuse this logic in
> flush_kernel_dcache_page(). If the page is already mapped with kmap,
> then kmap_high_get() should return the actual address. If it was mapped
> with kmap_atomic, kmap_high_get() would return NULL, hence the 'else'
> clause and the additional kmap_atomic(). The cache_is_vipt() check is
> useful because kunmap_atomic() would flush VIVT caches anyway.

Yes, but why don't we flush for VIPT _aliasing_ already in
kunmap_atomic? Why do we need to do anything for non-aliasing VIPT
here?
 
> As for __flush_dcache_page() called from other places like
> flush_dcache_page(), because of this 'else if' clause it looks like it
> misses flushing unmapped highmem pages on VIVT cache.

How many users do we have with VIVT D-caches using highmem? (This is
not a rhetorical questions, I have no idea.  However, I would assume
that this use case is almost non-existent.)

- Simon
Catalin Marinas May 3, 2013, 10:02 a.m. UTC | #19
On Thu, May 02, 2013 at 08:38:36PM +0100, Simon Baatz wrote:
> On Thu, May 02, 2013 at 10:54:31AM +0100, Catalin Marinas wrote:
> > On Wed, May 01, 2013 at 08:04:41PM +0100, Simon Baatz wrote:
> > > On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote:
> > > > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote:
> > > > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote:
> > > > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote:
> > > > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> > > > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
> > > > > ...
> > > > > > > 
> > > > > > > It is not the driver itself which is using the API, it is the
> > > > > > > generic scatterlist memory iterator. And I don't think that this is
> > > > > > > wrong, as I have tried to explain in [1].
> > > > > > 
> > > > > > Trying to remember what we've discussed over the past months on this
> > > > > > topic. It looks like sg_miter_stop() does the right thing in calling
> > > > > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove
> > > > > > superfluous flush_kernel_dcache_page()) removed this function entirely.
> > > > > > The code previously had this comment - /* highmem pages are always
> > > > > > flushed upon kunmap already */ which I think it wasn't fully correct
> > > > > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so
> > > > > > I suspect we only get the flushing if SG_MITER_ATOMIC.
> > > > > > 
> > > > > > So it looks to me like flush_kernel_dcache_page() should be implemented
> > > > > > even for highmem pages (with VIVT or aliasing VIPT, at least for non
> > > > > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is
> > > > > > disabled, I suspect we still need this function since the calling code
> > > > > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a
> > > > > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)?
> > > > > 
> > > > > My first version ([1]) had:
> > > > > 
> > > > > 	if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
> > > > > 		__flush_kernel_dcache_page(page);
> > > > > 
> > > > > If I understand this correctly, you are proposing to remove the
> > > > > highmem exclusion.
> > > > 
> > > > The highmem exclusion may have been there originally because of a
> > > > comment suggesting that kunmap() does the flushing. This is the case
> > > > only for kunmap_atomic() AFAICT (and maybe we could remove that as well
> > > > and rely on flush_kernel_dcache_page() being called).
> > > > 
> > > > > And then in __flush_kernel_dcache_page():
> > > > > 
> > > > > 	mapping = page_mapping(page);
> > > > > 
> > > > > 	if (!mapping || mapping_mapped(mapping))
> > > > > 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > > > > 
> > > > > I still prefer to have this condition here to avoid the flush when
> > > > > there is no user mapping at all.  This is handled by lazy flushing
> > > > > and is probably the most common case anyway (given how many people
> > > > > seem to be affected by this problem).
> > > > 
> > > > Looking at the old thread, you said there is a case when this condition
> > > > is not true (O_DIRECT case). If that's for a page cache page, then we
> > > > can handle it lazily (for anonymous pages I don't think we can rely on
> > > > lazy flushing since the kernel does not guarantee the clearing of the
> > > > PG_arch_1 bit).
> > > 
> > > As Russel pointed out in a comment to a later version of the patch,
> > > PG_arch_1 makes only sense for page cache pages. The condition above
> > > is ok from my point of view (it is based on what flush_dcache_page()
> > > uses):
> > > 
> > > - Page cache page without user space mapping:
> > > mapping != NULL and mapping_mapped() == 0
> > > 
> > > -> no flush here; lazy flush based on PG_arch_1 later if needed (we
> > >    rely on the proper initialization of the page to "dirty" here.)
> > 
> > Indeed.
> > 
> > > - Page cache page with user space mapping:
> > > mapping != NULL and mapping_mapped() != 0
> > > 
> > > -> kernel mapping flushed here (user mapping can be assumed to be clean)
> > 
> > We had similar thoughts for AArch64 here and decided it's not needed, it
> > can just clear the PG_arch_1 bit and do it lazily (patches not pushed
> > yet, need more testing). This assumes that even if mapping_mapped(), the
> > page is not actually mapped in user space and we eventually get a
> > set_pte_at() call. That's what powerpc is doing.
> 
> For arm64 only I-/D-cache coherency is relevant, right?

Yes. Same for 32-bit ARMv7 or non-aliasing D-cache on ARMv6.

> In this case,
> that may be right (but you may want to have a look at what
> xol_get_insn_slot() in kernel/events/uprobes.c does.  I don't know
> what kind of pages it handles, but this looks suspicious. And I think
> uprobes are also available for powerpc).

It is suspicious indeed and even the author is unsure about using the
right API. Should it rather use copy_to_user_page? I'm not familiar with
uprobes.

> However, if you can have aliases in D-cache this is needed.  See
> below to trigger this on pages that are already mapped
> to user space (direct I/O into a memory mapped file).

You are probably right, for direct I/O and aliases in the D-cache it
needs to flush. The documentation for flush_dcache_page() states that it
can also be called when the kernel is about to read from a page cache
page (potentially mapped into user space).

> > > - Anonymous page:
> > > mapping == NULL
> > > 
> > > -> kernel mapping flushed here (user mapping can be assumed to be clean)
> > 
> > I don't think it should care about anonymous pages at all. I put a
> > WARN_ON(!mapping) in flush_dcache_page() and it hasn't triggered yet,
> > though not intensive testing.
> 
> I assume that you inhibited the call to flush_dcache_page() in
> __get_user_pages() for anon pages.  Otherwise, you will be flooded
> with warnings.

I haven't done any stress testing so I don't think I hit this code path,
so no warning. But yes, it should have triggered. Anyway, in this case
flush_dcache_page() should have just ignored (clearing PG_arch_1 is
harmless anyway if we also ignore this bit in __sync_icache_dcache for
non-aliasing caches).

> DIY steps to produce both cases:
> (We will need software encryption, thus I need to remove the hardware
> encryption module mv_cesa on my hardware first)
> 
> # rmmod mv_cesa     
> # wget -O mapping_tests.c http://ubuntuone.com/0jF9fNsguN8BvI1Z8fsBVI
> # gcc -o mapping_tests mapping_tests.c
> # dd if=/dev/urandom of=map.out bs=4k count=10
> # dd if=/dev/urandom of=cdevice.img bs=4k count=10
> # losetup /dev/loop0 cdevice.img 
> # cryptsetup -c aes create c_sda2 /dev/loop0
> <enter any passphrase>
> # ./mapping_tests
> 
> cryptsetup will already trigger the warning and mapping_tests should
> flood your console.
>  
> Since flush_kernel_dcache_page() is supposed to handle user space
> pages, it must flush the kernel mapping in these cases.  Running the
> above on a device driver using an unpatched
> flush_kernel_dcache_page() yields:
> 
> # ./mapping_tests 
> Anonymous page: differs!
> User space mapped page: differs!

I haven't run the tests but I can see  why it fails without
flush_kernel_dcache_page(). So I think this function needs to be
implemented for aliasing VIPT or VIVT caches.

> It is even needed in flush_dcache_page() as long as everybody
> continues to use flush_dcache_page() instead of
> flush_kernel_dcache_page() when appropriate...
> (This is probably the main reason why the problem I reported is so
> uncommon: Everybody seems to use flush_dcache_page() and since it
> flushes the kernel mapping in these cases, everything is fine.)

That's why for non-aliasing VIPT we could make it just a clear_bit() and
let the callers fix their API usage.

> > > > > Additionally, although we can assume that the page is kmapped,
> > > > > page_address(page) can still be NULL for a highmem page, right?
> > > > 
> > > > It looks like kmap() always sets page_address(page) but I'm not sure
> > > > about kmap_atomic(), it doesn't seem to.
> > > 
> > > Hmm, in __flush_dcache_page() we have the following code to flush the
> > > kernel mapping:
> > > 
> > > void __flush_dcache_page(struct address_space *mapping, struct page *page)
> > > {
> > > 	/*
> > > 	 * Writeback any data associated with the kernel mapping of this
> > > 	 * page.  This ensures that data in the physical page is mutually
> > > 	 * coherent with the kernels mapping.
> > > 	 */
> > > 	if (!PageHighMem(page)) {
> > > 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > > 	} else {
> > > 		void *addr = kmap_high_get(page);
> > > 		if (addr) {
> > > 			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > > 			kunmap_high(page);
> > > 		} else if (cache_is_vipt()) {
> > > 			/* unmapped pages might still be cached */
> > > 			addr = kmap_atomic(page);
> > > 			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > > 			kunmap_atomic(addr);
> > > 		}
> > > 	}
> > > ...
> > > 
> > > 
> > > We probably should reuse it in flush_kernel_dcache_page() to flush
> > > the kernel mapping. (The last else clause looks strange though)
> > 
> > I think it makes sense to reuse this logic in
> > flush_kernel_dcache_page(). If the page is already mapped with kmap,
> > then kmap_high_get() should return the actual address. If it was mapped
> > with kmap_atomic, kmap_high_get() would return NULL, hence the 'else'
> > clause and the additional kmap_atomic(). The cache_is_vipt() check is
> > useful because kunmap_atomic() would flush VIVT caches anyway.
> 
> Yes, but why don't we flush for VIPT _aliasing_ already in
> kunmap_atomic?

Some comment from Nico on the list, it seems that highmem does not work
with aliasing VIPT caches:

http://article.gmane.org/gmane.linux.ports.arm.kernel/85114

> Why do we need to do anything for non-aliasing VIPT here?

Do you mean __flush_dcache_page()? I don't think we need.

> > As for __flush_dcache_page() called from other places like
> > flush_dcache_page(), because of this 'else if' clause it looks like it
> > misses flushing unmapped highmem pages on VIVT cache.
> 
> How many users do we have with VIVT D-caches using highmem? (This is
> not a rhetorical questions, I have no idea.  However, I would assume
> that this use case is almost non-existent.)

I don't really know. Maybe Nico has more info.

So my proposal:

1. Non-aliasing VIPT - defer any D-cache flushing until
   __sync_icache_dcache() (and fix callers like uprobes)
2. Aliasing VIPT or VIVT - flush the aliases as before in
   flush_dcache_page()
3. Aliasing VIPT or VIVT - implement flush_kernel_dcache_page() for all
   pages (don't check for PageHighmem)
4. VIVT - remove cache flushing from kunmap_atomic and rely on
   flush_kernel_dcache_page()

I'm not entirely sure about 4 but at the moment kunmap() doesn't flush
caches, only kunmap_atomic() so it looks like an inconsistency.
Simon Baatz May 5, 2013, 10:26 p.m. UTC | #20
On Fri, May 03, 2013 at 11:02:42AM +0100, Catalin Marinas wrote:
> On Thu, May 02, 2013 at 08:38:36PM +0100, Simon Baatz wrote:
> > On Thu, May 02, 2013 at 10:54:31AM +0100, Catalin Marinas wrote:
> > > On Wed, May 01, 2013 at 08:04:41PM +0100, Simon Baatz wrote:
> > > > On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote:
> > > > > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote:
> > > > > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote:
> > > > > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote:
> > > > > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> > > > > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
...
> 
> I haven't run the tests but I can see  why it fails without
> flush_kernel_dcache_page(). So I think this function needs to be
> implemented for aliasing VIPT or VIVT caches.
> 
> > It is even needed in flush_dcache_page() as long as everybody
> > continues to use flush_dcache_page() instead of
> > flush_kernel_dcache_page() when appropriate...
> > (This is probably the main reason why the problem I reported is so
> > uncommon: Everybody seems to use flush_dcache_page() and since it
> > flushes the kernel mapping in these cases, everything is fine.)
> 
> That's why for non-aliasing VIPT we could make it just a clear_bit() and
> let the callers fix their API usage.

Do you mean the unnecessary flush for anon pages or also the D/I
flush for user space mapped page cache pages?

For anon pages, that was the content of the patch you acked ([1])
once.  However, Russel did not like the idea to use the
PG_dcache_clean bit also for anon pages.

I have a newer version that does the following: Do nothing for
mapping == NULL on non-aliasing VIPT in flush_dcache_page().  In
__sync_icache_dcache() flush if mapping == NULL (always on aliasing
VIPT, only if pte_exec() on non-aliasing VIPT).
 
> > > > > > Additionally, although we can assume that the page is kmapped,
> > > > > > page_address(page) can still be NULL for a highmem page, right?
> > > > > 
> > > > > It looks like kmap() always sets page_address(page) but I'm not sure
> > > > > about kmap_atomic(), it doesn't seem to.
> > > > 
> > > > Hmm, in __flush_dcache_page() we have the following code to flush the
> > > > kernel mapping:
> > > > 
> > > > void __flush_dcache_page(struct address_space *mapping, struct page *page)
> > > > {
> > > > 	/*
> > > > 	 * Writeback any data associated with the kernel mapping of this
> > > > 	 * page.  This ensures that data in the physical page is mutually
> > > > 	 * coherent with the kernels mapping.
> > > > 	 */
> > > > 	if (!PageHighMem(page)) {
> > > > 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > > > 	} else {
> > > > 		void *addr = kmap_high_get(page);
> > > > 		if (addr) {
> > > > 			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > > > 			kunmap_high(page);
> > > > 		} else if (cache_is_vipt()) {
> > > > 			/* unmapped pages might still be cached */
> > > > 			addr = kmap_atomic(page);
> > > > 			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > > > 			kunmap_atomic(addr);
> > > > 		}
> > > > 	}
> > > > ...
> > > > 
> > > > 
> > > > We probably should reuse it in flush_kernel_dcache_page() to flush
> > > > the kernel mapping. (The last else clause looks strange though)
> > > 
> > > I think it makes sense to reuse this logic in
> > > flush_kernel_dcache_page(). If the page is already mapped with kmap,
> > > then kmap_high_get() should return the actual address. If it was mapped
> > > with kmap_atomic, kmap_high_get() would return NULL, hence the 'else'
> > > clause and the additional kmap_atomic(). The cache_is_vipt() check is
> > > useful because kunmap_atomic() would flush VIVT caches anyway.
> > 
> > Yes, but why don't we flush for VIPT _aliasing_ already in
> > kunmap_atomic?
> 
> Some comment from Nico on the list, it seems that highmem does not work
> with aliasing VIPT caches:
> 
> http://article.gmane.org/gmane.linux.ports.arm.kernel/85114
> 
> > Why do we need to do anything for non-aliasing VIPT here?
> 
> Do you mean __flush_dcache_page()? I don't think we need.
> 
> > > As for __flush_dcache_page() called from other places like
> > > flush_dcache_page(), because of this 'else if' clause it looks like it
> > > misses flushing unmapped highmem pages on VIVT cache.
> > 
> > How many users do we have with VIVT D-caches using highmem? (This is
> > not a rhetorical questions, I have no idea.  However, I would assume
> > that this use case is almost non-existent.)
> 
> I don't really know. Maybe Nico has more info.
> 
> So my proposal:
> 
> 1. Non-aliasing VIPT - defer any D-cache flushing until
>    __sync_icache_dcache() (and fix callers like uprobes)

See above, not sure whether we can also get rid of the I-cache flush.

> 2. Aliasing VIPT or VIVT - flush the aliases as before in
>    flush_dcache_page()

Yes.

> 3. Aliasing VIPT or VIVT - implement flush_kernel_dcache_page() for all
>    pages (don't check for PageHighmem)

Ok, I will update the first version of my patch.

> 4. VIVT - remove cache flushing from kunmap_atomic and rely on
>    flush_kernel_dcache_page()
> 
> I'm not entirely sure about 4 but at the moment kunmap() doesn't flush
> caches, only kunmap_atomic() so it looks like an inconsistency.

Not sure either. 

- Simon

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-October/124291.html
Catalin Marinas May 8, 2013, 3:28 p.m. UTC | #21
On Sun, May 05, 2013 at 11:26:47PM +0100, Simon Baatz wrote:
> On Fri, May 03, 2013 at 11:02:42AM +0100, Catalin Marinas wrote:
> > On Thu, May 02, 2013 at 08:38:36PM +0100, Simon Baatz wrote:
> ...
> > I haven't run the tests but I can see  why it fails without
> > flush_kernel_dcache_page(). So I think this function needs to be
> > implemented for aliasing VIPT or VIVT caches.
> > 
> > > It is even needed in flush_dcache_page() as long as everybody
> > > continues to use flush_dcache_page() instead of
> > > flush_kernel_dcache_page() when appropriate...
> > > (This is probably the main reason why the problem I reported is so
> > > uncommon: Everybody seems to use flush_dcache_page() and since it
> > > flushes the kernel mapping in these cases, everything is fine.)
> > 
> > That's why for non-aliasing VIPT we could make it just a clear_bit() and
> > let the callers fix their API usage.
> 
> Do you mean the unnecessary flush for anon pages or also the D/I
> flush for user space mapped page cache pages?

flush_dcache_page() should just ignore anonymous pages. The D/I
coherency would be handled in __sync_icache_dcache() later. Do we have
cases where a page is already mapped in user space (pte valid) and the
kernel writes any code to it? I don't think we have. PowerPC have a
simplified flush_dcache_page() and they haven't seen any issues.

> For anon pages, that was the content of the patch you acked ([1])
> once.  However, Russel did not like the idea to use the
> PG_dcache_clean bit also for anon pages.

I think Russell was right, the kernel does not have any guarantees about
the PG_arch_1 bit on anonymous pages. Do we could remove the !mapping
code path in flush_dcache_page().

> I have a newer version that does the following: Do nothing for
> mapping == NULL on non-aliasing VIPT in flush_dcache_page().  

Can this mapping == NULL be generalised for aliasing caches?

> In __sync_icache_dcache() flush if mapping == NULL (always on aliasing
> VIPT, only if pte_exec() on non-aliasing VIPT).

If the page is anonymous, do we expect user code to execute from a
clear page? When the page is first allocated, clear_user_highpage should
take care of the aliases already.
diff mbox

Patch

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index e4448e1..eca955f 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -307,6 +307,10 @@  static inline void flush_anon_page(struct vm_area_struct *vma,
 #define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
 static inline void flush_kernel_dcache_page(struct page *page)
 {
+	extern void __flush_kernel_dcache_page(struct page *);
+	/* highmem pages are always flushed upon kunmap already */
+	if (!PageHighMem(page))
+		__flush_kernel_dcache_page(page);
 }
 
 #define flush_dcache_mmap_lock(mapping) \
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 5c474a1..59ad4fc 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -192,6 +192,48 @@  void __flush_dcache_page(struct address_space *mapping, struct page *page)
 				page->index << PAGE_CACHE_SHIFT);
 }
 
+/*
+ * Ensure cache coherency for the kernel mapping of this page.
+ *
+ * If the page only exists in the page cache and there are no user
+ * space mappings, we can be lazy and remember that we may have dirty
+ * kernel cache lines for later.  Otherwise, we need to flush the
+ * dirty kernel cache lines directly.
+ *
+ * Note that we disable the lazy flush for SMP configurations where
+ * the cache maintenance operations are not automatically broadcasted.
+ *
+ * We can assume that the page is no high mem page, see
+ * flush_kernel_dcache_page.
+ */
+void __flush_kernel_dcache_page(struct page *page)
+{
+	struct address_space *mapping;
+
+	/*
+	 * The zero page is never written to, so never has any dirty
+	 * cache lines, and therefore never needs to be flushed.
+	 */
+	if (page == ZERO_PAGE(0))
+		return;
+
+	mapping = page_mapping(page);
+
+	if (!cache_ops_need_broadcast()) {
+		if ((mapping && !mapping_mapped(mapping)) ||
+		    (!mapping && cache_is_vipt_nonaliasing())) {
+			clear_bit(PG_dcache_clean, &page->flags);
+			return;
+		}
+	}
+
+	__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
+	if (mapping && !cache_is_vivt())
+		__flush_icache_all();
+	set_bit(PG_dcache_clean, &page->flags);
+}
+EXPORT_SYMBOL(__flush_kernel_dcache_page);
+
 static void __flush_dcache_aliases(struct address_space *mapping, struct page *page)
 {
 	struct mm_struct *mm = current->active_mm;