Message ID | 1349609352-6408-3-git-send-email-gmbnomis@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote: > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that > the pages it needs to handle are kernel mapped only. However, for > example when doing direct I/O, pages with user space mappings may > occur. > > Thus, do lazy flushing like in flush_dcache_page() if there are no user > space mappings. Otherwise, flush the kernel cache lines directly. Do you need to fix the VIPT non-aliasing case as well? Does flush_kernel_dcache_page() need to handle I-cache?
Hi Catalin, On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote: > On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote: > > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that > > the pages it needs to handle are kernel mapped only. However, for > > example when doing direct I/O, pages with user space mappings may > > occur. > > > > Thus, do lazy flushing like in flush_dcache_page() if there are no user > > space mappings. Otherwise, flush the kernel cache lines directly. > > Do you need to fix the VIPT non-aliasing case as well? Does > flush_kernel_dcache_page() need to handle I-cache? Good question. My previous version of the patch did not handle it, but after our discussion on the arm64 case, I came to the conclusion that we probably need to handle it. flush_dcache_page() and flush_kernel_dcache_page() are both used when a page was modified via its kernel mapping. For example, crypto/scatterwalk.c uses flush_dcache_page() whereas lib/scatterlist.c uses flush_kernel_dcache_page(). Thus, the reasoning is that if flush_dcache_page() needs to handle I-cache in the case there is no hook later (already user-mapped page cache page) then flush_kernel_dcache_page() needs to do the same. With respect to clearing the flag in the VIPT non-aliasing case with anon pages: Declaring the pages dirty by default may already be sufficient in this case, at least that is what the current implementation assumes. On the other hand, if we clear the PG_dcache_clean flag in a function that is not even supposed to handle anon pages, then why shouldn't we do it in a function that is? I wanted to play safe here and apply the same logic in both functions. - Simon
On Mon, Oct 08, 2012 at 10:02:16PM +0200, Simon Baatz wrote: > Hi Catalin, > > On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote: > > On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote: > > > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that > > > the pages it needs to handle are kernel mapped only. However, for > > > example when doing direct I/O, pages with user space mappings may > > > occur. > > > > > > Thus, do lazy flushing like in flush_dcache_page() if there are no user > > > space mappings. Otherwise, flush the kernel cache lines directly. > > > > Do you need to fix the VIPT non-aliasing case as well? Does > > flush_kernel_dcache_page() need to handle I-cache? > > Good question. My previous version of the patch did not handle it, > but after our discussion on the arm64 case, I came to the conclusion > that we probably need to handle it. > > flush_dcache_page() and flush_kernel_dcache_page() are both used when > a page was modified via its kernel mapping. For example, > crypto/scatterwalk.c uses flush_dcache_page() whereas > lib/scatterlist.c uses flush_kernel_dcache_page(). It's likely that this stuff is incredibly buggy - and where we suspect there's problems (like using the wrong flush_dcache_page() vs flush_kernel_dcache_page() call) that needs to be fixed. I suspect crypto/scatterwalk.c was created long before flush_kernel_dcache_page() came into existence, and it probably needs to be fixed. > Thus, the reasoning is that if flush_dcache_page() needs to handle > I-cache in the case there is no hook later (already user-mapped page > cache page) then flush_kernel_dcache_page() needs to do the same. This sounds like we're heading in the direction that flush_kernel_dcache_page() and flush_dcache_page() end up being virtually identical. This isn't supposed to be - they _are_ supposed to be different. Again, maybe that's because the users have chosen the wrong function. Remember that these functions only get used on so-called "minority" architectures where the caches are a hinderance - to put that a different way, most people don't care about these calls because x86 just works. > With respect to clearing the flag in the VIPT non-aliasing case with > anon pages: Declaring the pages dirty by default may already be > sufficient in this case, at least that is what the current > implementation assumes. PG_arch_1 has no meaning what so ever for anon pages. Don't even try to make it have meaning there, it will cause lots of pain if you do. The reason is that anon pages have no mapping, and so trying to do the PG_arch_1 trick will result in most places deciding "there is no userspace mapping we can skip that".
Hi Russell, On Mon, Oct 08, 2012 at 09:20:40PM +0100, Russell King - ARM Linux wrote: > On Mon, Oct 08, 2012 at 10:02:16PM +0200, Simon Baatz wrote: > > Hi Catalin, > > > > On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote: > > > On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote: > > > > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that > > > > the pages it needs to handle are kernel mapped only. However, for > > > > example when doing direct I/O, pages with user space mappings may > > > > occur. > > > > > > > > Thus, do lazy flushing like in flush_dcache_page() if there are no user > > > > space mappings. Otherwise, flush the kernel cache lines directly. > > > > > > Do you need to fix the VIPT non-aliasing case as well? Does > > > flush_kernel_dcache_page() need to handle I-cache? > > > > Good question. My previous version of the patch did not handle it, > > but after our discussion on the arm64 case, I came to the conclusion > > that we probably need to handle it. > > > > flush_dcache_page() and flush_kernel_dcache_page() are both used when > > a page was modified via its kernel mapping. For example, > > crypto/scatterwalk.c uses flush_dcache_page() whereas > > lib/scatterlist.c uses flush_kernel_dcache_page(). > > It's likely that this stuff is incredibly buggy - and where we suspect > there's problems (like using the wrong flush_dcache_page() vs > flush_kernel_dcache_page() call) that needs to be fixed. > > I suspect crypto/scatterwalk.c was created long before > flush_kernel_dcache_page() came into existence, and it probably needs > to be fixed. > > > Thus, the reasoning is that if flush_dcache_page() needs to handle > > I-cache in the case there is no hook later (already user-mapped page > > cache page) then flush_kernel_dcache_page() needs to do the same. > > This sounds like we're heading in the direction that flush_kernel_dcache_page() > and flush_dcache_page() end up being virtually identical. This isn't > supposed to be - they _are_ supposed to be different. Again, maybe > that's because the users have chosen the wrong function. > ... Yes, it is a mess and may be incredibly buggy. According to documentation flush_kernel_dcache_page() is supposed to handle user pages. It can assume that potential user mappings have been flushed. However, the documentation does not mention the case of page cache pages with no user mappings explicitly. flush_dcache_page() is supposed to deal with the user and kernel mappings. However, the documentation does explicitly state that it is not supposed to handle anon pages (but the arm implementation does flush the kernel mapping). Now, what should crypto/scatterwalk.c do? In the write to page path, it can see page cache pages with and without user mappings and anon pages. If there are user mappings, it can assume that they have been flushed. I would argue it should use flush_kernel_dcache_page(). But if so, our current implementation does not handle two of these three cases at all. And the proposed flush_kernel_dcache_page() is different. It only flushes the kernel mapping in the relevant cases whereas flush_dcache_page() needs to care about user mappings as well. According to documentation we could remove the anon page case in flush_dcache_page(), but uses like in crypto/scatterwalk.c currently prevent that. > > With respect to clearing the flag in the VIPT non-aliasing case with > > anon pages: Declaring the pages dirty by default may already be > > sufficient in this case, at least that is what the current > > implementation assumes. > > PG_arch_1 has no meaning what so ever for anon pages. Don't even try > to make it have meaning there, it will cause lots of pain if you do. > The reason is that anon pages have no mapping, and so trying to do the > PG_arch_1 trick will result in most places deciding "there is no > userspace mapping we can skip that". Hmm, now that you say it, it's quite obvious. There is the following in __sync_icache_dcache(): if (cache_is_vipt_aliasing()) mapping = page_mapping(page); else mapping = NULL; if (!test_and_set_bit(PG_dcache_clean, &page->flags)) __flush_dcache_page(mapping, page); if (pte_exec(pteval)) __flush_icache_all(); At least this looks strange to me given that PG_dcache_clean has no meaning for anon pages. Is this correct? - Simon
Russell, Catalin, Simon, Any progress on this? At worst, I'd like to see this in fixes for v3.8. Assuming, of course, we've gotten to the bottom of it. thx, Jason. On Tue, Oct 09, 2012 at 01:07:34AM +0200, Simon Baatz wrote: > Hi Russell, > > On Mon, Oct 08, 2012 at 09:20:40PM +0100, Russell King - ARM Linux wrote: > > On Mon, Oct 08, 2012 at 10:02:16PM +0200, Simon Baatz wrote: > > > Hi Catalin, > > > > > > On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote: > > > > On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote: > > > > > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that > > > > > the pages it needs to handle are kernel mapped only. However, for > > > > > example when doing direct I/O, pages with user space mappings may > > > > > occur. > > > > > > > > > > Thus, do lazy flushing like in flush_dcache_page() if there are no user > > > > > space mappings. Otherwise, flush the kernel cache lines directly. > > > > > > > > Do you need to fix the VIPT non-aliasing case as well? Does > > > > flush_kernel_dcache_page() need to handle I-cache? > > > > > > Good question. My previous version of the patch did not handle it, > > > but after our discussion on the arm64 case, I came to the conclusion > > > that we probably need to handle it. > > > > > > flush_dcache_page() and flush_kernel_dcache_page() are both used when > > > a page was modified via its kernel mapping. For example, > > > crypto/scatterwalk.c uses flush_dcache_page() whereas > > > lib/scatterlist.c uses flush_kernel_dcache_page(). > > > > It's likely that this stuff is incredibly buggy - and where we suspect > > there's problems (like using the wrong flush_dcache_page() vs > > flush_kernel_dcache_page() call) that needs to be fixed. > > > > I suspect crypto/scatterwalk.c was created long before > > flush_kernel_dcache_page() came into existence, and it probably needs > > to be fixed. > > > > > Thus, the reasoning is that if flush_dcache_page() needs to handle > > > I-cache in the case there is no hook later (already user-mapped page > > > cache page) then flush_kernel_dcache_page() needs to do the same. > > > > This sounds like we're heading in the direction that flush_kernel_dcache_page() > > and flush_dcache_page() end up being virtually identical. This isn't > > supposed to be - they _are_ supposed to be different. Again, maybe > > that's because the users have chosen the wrong function. > > ... > > Yes, it is a mess and may be incredibly buggy. > > According to documentation flush_kernel_dcache_page() is supposed to > handle user pages. It can assume that potential user mappings have > been flushed. However, the documentation does not mention the case > of page cache pages with no user mappings explicitly. > > flush_dcache_page() is supposed to deal with the user and kernel > mappings. However, the documentation does explicitly state that it is > not supposed to handle anon pages (but the arm implementation does > flush the kernel mapping). > > Now, what should crypto/scatterwalk.c do? In the write to page path, > it can see page cache pages with and without user mappings and anon > pages. If there are user mappings, it can assume that they have been > flushed. I would argue it should use flush_kernel_dcache_page(). > But if so, our current implementation does not handle two of these > three cases at all. > > And the proposed flush_kernel_dcache_page() is different. It only > flushes the kernel mapping in the relevant cases whereas > flush_dcache_page() needs to care about user mappings as well. > According to documentation we could remove the anon page case in > flush_dcache_page(), but uses like in crypto/scatterwalk.c currently > prevent that. > > > > With respect to clearing the flag in the VIPT non-aliasing case with > > > anon pages: Declaring the pages dirty by default may already be > > > sufficient in this case, at least that is what the current > > > implementation assumes. > > > > PG_arch_1 has no meaning what so ever for anon pages. Don't even try > > to make it have meaning there, it will cause lots of pain if you do. > > The reason is that anon pages have no mapping, and so trying to do the > > PG_arch_1 trick will result in most places deciding "there is no > > userspace mapping we can skip that". > > Hmm, now that you say it, it's quite obvious. There is the following > in __sync_icache_dcache(): > > if (cache_is_vipt_aliasing()) > mapping = page_mapping(page); > else > mapping = NULL; > > if (!test_and_set_bit(PG_dcache_clean, &page->flags)) > __flush_dcache_page(mapping, page); > > if (pte_exec(pteval)) > __flush_icache_all(); > > At least this looks strange to me given that PG_dcache_clean has no > meaning for anon pages. Is this correct? > > - Simon > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Russell, Catalin, Simon, I'm digging through my 'onice' mail folder and found this. I've personally experienced this bug in the past (workaround: don't use LVM on ARM systems :-( ). Is there any interest in reviving this series? I can dig up the patches if needed. thx, Jason. On Sun, Nov 18, 2012 at 04:10:05PM -0500, Jason Cooper wrote: > Russell, Catalin, Simon, > > Any progress on this? At worst, I'd like to see this in fixes for v3.8. > Assuming, of course, we've gotten to the bottom of it. -- Here's the original thread: > On Tue, Oct 09, 2012 at 01:07:34AM +0200, Simon Baatz wrote: > > Hi Russell, > > > > On Mon, Oct 08, 2012 at 09:20:40PM +0100, Russell King - ARM Linux wrote: > > > On Mon, Oct 08, 2012 at 10:02:16PM +0200, Simon Baatz wrote: > > > > Hi Catalin, > > > > > > > > On Mon, Oct 08, 2012 at 06:44:28PM +0100, Catalin Marinas wrote: > > > > > On Sun, Oct 07, 2012 at 12:29:12PM +0100, Simon Baatz wrote: > > > > > > Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that > > > > > > the pages it needs to handle are kernel mapped only. However, for > > > > > > example when doing direct I/O, pages with user space mappings may > > > > > > occur. > > > > > > > > > > > > Thus, do lazy flushing like in flush_dcache_page() if there are no user > > > > > > space mappings. Otherwise, flush the kernel cache lines directly. > > > > > > > > > > Do you need to fix the VIPT non-aliasing case as well? Does > > > > > flush_kernel_dcache_page() need to handle I-cache? > > > > > > > > Good question. My previous version of the patch did not handle it, > > > > but after our discussion on the arm64 case, I came to the conclusion > > > > that we probably need to handle it. > > > > > > > > flush_dcache_page() and flush_kernel_dcache_page() are both used when > > > > a page was modified via its kernel mapping. For example, > > > > crypto/scatterwalk.c uses flush_dcache_page() whereas > > > > lib/scatterlist.c uses flush_kernel_dcache_page(). > > > > > > It's likely that this stuff is incredibly buggy - and where we suspect > > > there's problems (like using the wrong flush_dcache_page() vs > > > flush_kernel_dcache_page() call) that needs to be fixed. > > > > > > I suspect crypto/scatterwalk.c was created long before > > > flush_kernel_dcache_page() came into existence, and it probably needs > > > to be fixed. > > > > > > > Thus, the reasoning is that if flush_dcache_page() needs to handle > > > > I-cache in the case there is no hook later (already user-mapped page > > > > cache page) then flush_kernel_dcache_page() needs to do the same. > > > > > > This sounds like we're heading in the direction that flush_kernel_dcache_page() > > > and flush_dcache_page() end up being virtually identical. This isn't > > > supposed to be - they _are_ supposed to be different. Again, maybe > > > that's because the users have chosen the wrong function. > > > ... > > > > Yes, it is a mess and may be incredibly buggy. > > > > According to documentation flush_kernel_dcache_page() is supposed to > > handle user pages. It can assume that potential user mappings have > > been flushed. However, the documentation does not mention the case > > of page cache pages with no user mappings explicitly. > > > > flush_dcache_page() is supposed to deal with the user and kernel > > mappings. However, the documentation does explicitly state that it is > > not supposed to handle anon pages (but the arm implementation does > > flush the kernel mapping). > > > > Now, what should crypto/scatterwalk.c do? In the write to page path, > > it can see page cache pages with and without user mappings and anon > > pages. If there are user mappings, it can assume that they have been > > flushed. I would argue it should use flush_kernel_dcache_page(). > > But if so, our current implementation does not handle two of these > > three cases at all. > > > > And the proposed flush_kernel_dcache_page() is different. It only > > flushes the kernel mapping in the relevant cases whereas > > flush_dcache_page() needs to care about user mappings as well. > > According to documentation we could remove the anon page case in > > flush_dcache_page(), but uses like in crypto/scatterwalk.c currently > > prevent that. > > > > > > With respect to clearing the flag in the VIPT non-aliasing case with > > > > anon pages: Declaring the pages dirty by default may already be > > > > sufficient in this case, at least that is what the current > > > > implementation assumes. > > > > > > PG_arch_1 has no meaning what so ever for anon pages. Don't even try > > > to make it have meaning there, it will cause lots of pain if you do. > > > The reason is that anon pages have no mapping, and so trying to do the > > > PG_arch_1 trick will result in most places deciding "there is no > > > userspace mapping we can skip that". > > > > Hmm, now that you say it, it's quite obvious. There is the following > > in __sync_icache_dcache(): > > > > if (cache_is_vipt_aliasing()) > > mapping = page_mapping(page); > > else > > mapping = NULL; > > > > if (!test_and_set_bit(PG_dcache_clean, &page->flags)) > > __flush_dcache_page(mapping, page); > > > > if (pte_exec(pteval)) > > __flush_icache_all(); > > > > At least this looks strange to me given that PG_dcache_clean has no > > meaning for anon pages. Is this correct? > > > > - Simon > > > > _______________________________________________ > > linux-arm-kernel mailing list > > linux-arm-kernel@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Thu, Apr 18, 2013 at 07:16:08AM -0400, Jason Cooper wrote: > Russell, Catalin, Simon, > > I'm digging through my 'onice' mail folder and found this. I've > personally experienced this bug in the past (workaround: don't use LVM > on ARM systems :-( ). > > Is there any interest in reviving this series? I can dig up the patches > if needed. None what so ever. flush_kernel_dcache_page() is not supposed to touch userspace pages. void flush_kernel_dcache_page(struct page *page) When the kernel needs to modify a user page is has obtained with kmap, it calls this function after all modifications are complete (but before kunmapping it) to bring the underlying page up to date. It is assumed here that the user has no ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ incoherent cached copies (i.e. the original page was obtained ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ from a mechanism like get_user_pages()). The default ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ implementation is a nop and should remain so on all coherent architectures. On incoherent architectures, this should flush ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the kernel cache for page (using page_address(page)). ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The first underlined sentence means that it's assumed that something _else_ has already dealt with the userspace aliases (in other words, get_user_pages() via flush_dcache_page() or flush_anon_page()). The second is absolutely clear that _only_ the kernel side should be flushed. flush_kernel_dcache_page() was invented explicitly to separate out the kernel side flushing from flush_dcache_page(), as a lighter weight version for block drivers to use. Also see the commit text, which explicitly states that this is a lighter weight API, and that this _only_ touches the kernel view of the page: commit 5a3a5a98b6422d05c39eaa32c8b3f83840c7b768 Author: James Bottomley <James.Bottomley@SteelEye.com> Date: Sun Mar 26 01:36:59 2006 -0800 [PATCH] Add flush_kernel_dcache_page() API We have a problem in a lot of emulated storage in that it takes a page from get_user_pages() and does something like kmap_atomic(page) modify page kunmap_atomic(page) However, nothing has flushed the kernel cache view of the page before the kunmap. We need a lightweight API to do this, so this new API would specifically be for flushing the kernel cache view of a user page which the kernel has modified. The driver would need to add flush_kernel_dcache_page(page) before the final kunmap. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On Thu, Apr 18, 2013 at 12:22:01PM +0100, Russell King - ARM Linux wrote: > On Thu, Apr 18, 2013 at 07:16:08AM -0400, Jason Cooper wrote: > > Russell, Catalin, Simon, > > > > I'm digging through my 'onice' mail folder and found this. I've > > personally experienced this bug in the past (workaround: don't use LVM > > on ARM systems :-( ). > > > > Is there any interest in reviving this series? I can dig up the patches > > if needed. > > None what so ever. flush_kernel_dcache_page() is not supposed to touch > userspace pages. > > void flush_kernel_dcache_page(struct page *page) > When the kernel needs to modify a user page is has obtained > with kmap, it calls this function after all modifications are > complete (but before kunmapping it) to bring the underlying > page up to date. It is assumed here that the user has no > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > incoherent cached copies (i.e. the original page was obtained > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > from a mechanism like get_user_pages()). The default > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > implementation is a nop and should remain so on all coherent > architectures. On incoherent architectures, this should flush > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > the kernel cache for page (using page_address(page)). > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > The first underlined sentence means that it's assumed that something _else_ > has already dealt with the userspace aliases (in other words, get_user_pages() > via flush_dcache_page() or flush_anon_page()). The second is absolutely > clear that _only_ the kernel side should be flushed. > > flush_kernel_dcache_page() was invented explicitly to separate out the > kernel side flushing from flush_dcache_page(), as a lighter weight version > for block drivers to use. > > Also see the commit text, which explicitly states that this is a lighter > weight API, and that this _only_ touches the kernel view of the page: > > commit 5a3a5a98b6422d05c39eaa32c8b3f83840c7b768 > Author: James Bottomley <James.Bottomley@SteelEye.com> > Date: Sun Mar 26 01:36:59 2006 -0800 > > [PATCH] Add flush_kernel_dcache_page() API > > We have a problem in a lot of emulated storage in that it takes a page from > get_user_pages() and does something like > > kmap_atomic(page) > modify page > kunmap_atomic(page) > > However, nothing has flushed the kernel cache view of the page before the > kunmap. We need a lightweight API to do this, so this new API would > specifically be for flushing the kernel cache view of a user page which the > kernel has modified. The driver would need to add > flush_kernel_dcache_page(page) before the final kunmap. > > Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> > Cc: Russell King <rmk@arm.linux.org.uk> > Cc: "David S. Miller" <davem@davemloft.net> > Signed-off-by: Andrew Morton <akpm@osdl.org> > Signed-off-by: Linus Torvalds <torvalds@osdl.org> Ok, got it. I should have been more explicit. LVM doesn't work on ARM. iirc, Simon had a demo of dm-crypt also faulting on ARM. This patch was not the correct approach. Is there an interest (particularly Simon) in fixing the problem? I'm willing to take a look at it, but LVM on ARM is a "nice to have" for me, not a critical need. The job would be easier if Simon was willing to assist and lend his experience with the issue. thx, Jason.
On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: > Ok, got it. I should have been more explicit. LVM doesn't work on ARM. > iirc, Simon had a demo of dm-crypt also faulting on ARM. This patch was > not the correct approach. Is there an interest (particularly Simon) in > fixing the problem? I think fixing this for ARM is useful but I don't have any time to allocate. I think I acked the first patch in the series but I don't fully remember the details behind the second one. As Russell said, flush_kernel_dcache_page() is not the right API. flush_dcache_page() is not supposed to be used on anonymous pages. What we have for such pages is flush_anon_page() which is a no-op for VIPT non-aliasing pages. I can see that __get_user_pages() calls both flush_anon_page() flush_dcache_page(). Is the problem that you have related to I-D cache coherency? Is flush_anon_page() the right place for this?
Hi Russel, On Thu, Apr 18, 2013 at 12:22:01PM +0100, Russell King - ARM Linux wrote: > On Thu, Apr 18, 2013 at 07:16:08AM -0400, Jason Cooper wrote: > > Russell, Catalin, Simon, > > > > I'm digging through my 'onice' mail folder and found this. I've > > personally experienced this bug in the past (workaround: don't use LVM > > on ARM systems :-( ). > > > > Is there any interest in reviving this series? I can dig up the patches > > if needed. > > None what so ever. flush_kernel_dcache_page() is not supposed to touch > userspace pages. I don't know if we have a wording issue here. The header of the patch says "user space mapped pages". Thus, pages for which a user space mapping exists. As the documentation you cite below states, these pages ("user pages") should be handled by flush_kernel_dcache_page(). Of course, flush_kernel_dcache_page() has only to care about the dirty kernel mapping, but this is what the proposed patches do. I never proposed to touch the user space mappings. > void flush_kernel_dcache_page(struct page *page) > When the kernel needs to modify a user page is has obtained > with kmap, it calls this function after all modifications are > complete (but before kunmapping it) to bring the underlying > page up to date. It is assumed here that the user has no > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > incoherent cached copies (i.e. the original page was obtained > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > from a mechanism like get_user_pages()). The default > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > implementation is a nop and should remain so on all coherent > architectures. On incoherent architectures, this should flush > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > the kernel cache for page (using page_address(page)). > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > The first underlined sentence means that it's assumed that something _else_ > has already dealt with the userspace aliases (in other words, get_user_pages() > via flush_dcache_page() or flush_anon_page()). The second is absolutely > clear that _only_ the kernel side should be flushed. > > flush_kernel_dcache_page() was invented explicitly to separate out the > kernel side flushing from flush_dcache_page(), as a lighter weight version > for block drivers to use. Yes, exactly. Apart from one thing: flush_kernel_dcache_page() is supposed to handle all user pages. Thus, it is a more lightweight version of flush_dcache_page() and flush_anon_page(). But the current implementation only handles page cache pages without a user mapping (it's a nop because we can do lazy flushing here). It does not handle page cache pages with a user space mapping and anon pages. My first patch (see [1]) was the "minimalistic" approach to do this. Just flush the _kernel_ mapping if we have user space mappings. Continue to be lazy otherwise. But then the question is, whether this is enough. Let's take commit 65b6ecc: +++ b/kernel/events/uprobes.c @@ -1199,6 +1199,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe, unsigned long slot vaddr = kmap_atomic(area->page); memcpy(vaddr + offset, uprobe->arch.insn, MAX_UINSN_BYTES); kunmap_atomic(vaddr); + /* + * We probably need flush_icache_user_range() but it needs vma. + * This should work on supported architectures too. + */ + flush_dcache_page(area->page); Wouldn't that be a use for flush_kernel_dcache_page() (before kunmap())? But then, we need to care about I/D-cache coherency, i.e. flush_kernel_dcache_page() very much looks like flush_dcache_page() but without the "flush user mappings" part. That's the idea behind the second version (see [2]). If that's not a relevant case, I am more than happy to go back to [1]. The third version combined this with a slight optimization, but this had problems as you pointed out at the time and we can forget about that one for the time being. Still, I consider the older versions [1] or [2] to be relevant. - Simon [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-May/101794.html [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-September/122874.html
Hi Jason, On Thu, Apr 18, 2013 at 07:40:16AM -0400, Jason Cooper wrote: > On Thu, Apr 18, 2013 at 12:22:01PM +0100, Russell King - ARM Linux wrote: > > On Thu, Apr 18, 2013 at 07:16:08AM -0400, Jason Cooper wrote: > > > Russell, Catalin, Simon, > > > > > > I'm digging through my 'onice' mail folder and found this. I've > > > personally experienced this bug in the past (workaround: don't use LVM > > > on ARM systems :-( ). > > > > > > Is there any interest in reviving this series? I can dig up the patches > > > if needed. ... > > Ok, got it. I should have been more explicit. LVM doesn't work on ARM. > iirc, Simon had a demo of dm-crypt also faulting on ARM. This patch was > not the correct approach. Is there an interest (particularly Simon) in > fixing the problem? > > I'm willing to take a look at it, but LVM on ARM is a "nice to have" for > me, not a critical need. The job would be easier if Simon was willing > to assist and lend his experience with the issue. Just to be precise here, the problem is much more specific: Drivers which use flush_kernel_dcache_page() on VIVT or aliasing VIPT ARM architectures may have problems with direct I/O. In general, LVM should work perfectly fine on ARM. But if you have this particular combination (in my case Kirkwood ARMv5 with VIVT D-Cache and the mv_cesa driver), you can't put a physical volume on such a device. LVM user space tools will just read garbage when trying to read from the device using direct I/O. - Simon
Hi Catalin, On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote: > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: > > Ok, got it. I should have been more explicit. LVM doesn't work on ARM. > > iirc, Simon had a demo of dm-crypt also faulting on ARM. This patch was > > not the correct approach. Is there an interest (particularly Simon) in > > fixing the problem? > > I think fixing this for ARM is useful but I don't have any time to > allocate. I think I acked the first patch in the series but I don't > fully remember the details behind the second one. > > As Russell said, flush_kernel_dcache_page() is not the right API. It is not the driver itself which is using the API, it is the generic scatterlist memory iterator. And I don't think that this is wrong, as I have tried to explain in [1]. > flush_dcache_page() is not supposed to be used on anonymous pages. What > we have for such pages is flush_anon_page() which is a no-op for VIPT > non-aliasing pages. I can see that __get_user_pages() calls both > flush_anon_page() flush_dcache_page(). Yes. But I think that is not relevant here. Although the naming suggest that the function is just a light-weight version of flush_dcache_page(), both the documentation and the commit text say that flush_kernel_dcache_page() has to handle pages that were obtained through get_user_pages, which includes anon pages. Btw. many drivers which modify pages via the kernel mapping currently use flush_dcache_page() instead. This is not supposed to work in e.g. the direct I/O case where a driver sees anon pages. This works just because flush_dcache_page() flushes the kernel mapping also for anon pages. Something is is clearly not supposed to do according to documentation. > Is the problem that you have related to I-D cache coherency? Is > flush_anon_page() the right place for this? This particular problem is not related to I/D cache coherency. However, as far as I know, flush_dcache_page() is expected to ensure also I/D cache coherency. Also, we flush the I-cache in certain situations in this function. As said in my response to Russel, I don't know whether this is expected from flush_kernel_dcache_page() as well. I would say yes, but I also have posted a version that only flushes the D-cache kernel mapping. - Simon [1] http://permalink.gmane.org/gmane.linux.ports.arm.kernel/181703
On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote: > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote: > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: > > > Ok, got it. I should have been more explicit. LVM doesn't work on ARM. > > > iirc, Simon had a demo of dm-crypt also faulting on ARM. This patch was > > > not the correct approach. Is there an interest (particularly Simon) in > > > fixing the problem? > > > > I think fixing this for ARM is useful but I don't have any time to > > allocate. I think I acked the first patch in the series but I don't > > fully remember the details behind the second one. > > > > As Russell said, flush_kernel_dcache_page() is not the right API. > > It is not the driver itself which is using the API, it is the > generic scatterlist memory iterator. And I don't think that this is > wrong, as I have tried to explain in [1]. Trying to remember what we've discussed over the past months on this topic. It looks like sg_miter_stop() does the right thing in calling flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove superfluous flush_kernel_dcache_page()) removed this function entirely. The code previously had this comment - /* highmem pages are always flushed upon kunmap already */ which I think it wasn't fully correct either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so I suspect we only get the flushing if SG_MITER_ATOMIC. So it looks to me like flush_kernel_dcache_page() should be implemented even for highmem pages (with VIVT or aliasing VIPT, at least for non kmap_atomic addresses by checking for FIXADDR_START). If highmem is disabled, I suspect we still need this function since the calling code doesn't care whether kmap/kunmap was a no-op. But can we keep it as a simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)?
Hi Catalin, On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote: > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote: > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote: > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: ... > > > > It is not the driver itself which is using the API, it is the > > generic scatterlist memory iterator. And I don't think that this is > > wrong, as I have tried to explain in [1]. > > Trying to remember what we've discussed over the past months on this > topic. It looks like sg_miter_stop() does the right thing in calling > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove > superfluous flush_kernel_dcache_page()) removed this function entirely. > The code previously had this comment - /* highmem pages are always > flushed upon kunmap already */ which I think it wasn't fully correct > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so > I suspect we only get the flushing if SG_MITER_ATOMIC. > > So it looks to me like flush_kernel_dcache_page() should be implemented > even for highmem pages (with VIVT or aliasing VIPT, at least for non > kmap_atomic addresses by checking for FIXADDR_START). If highmem is > disabled, I suspect we still need this function since the calling code > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)? My first version ([1]) had: if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page)) __flush_kernel_dcache_page(page); If I understand this correctly, you are proposing to remove the highmem exclusion. And then in __flush_kernel_dcache_page(): mapping = page_mapping(page); if (!mapping || mapping_mapped(mapping)) __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); I still prefer to have this condition here to avoid the flush when there is no user mapping at all. This is handled by lazy flushing and is probably the most common case anyway (given how many people seem to be affected by this problem). Additionally, although we can assume that the page is kmapped, page_address(page) can still be NULL for a highmem page, right? - Simon [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-May/101794.html
On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote: > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote: > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote: > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote: > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: > ... > > > > > > It is not the driver itself which is using the API, it is the > > > generic scatterlist memory iterator. And I don't think that this is > > > wrong, as I have tried to explain in [1]. > > > > Trying to remember what we've discussed over the past months on this > > topic. It looks like sg_miter_stop() does the right thing in calling > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove > > superfluous flush_kernel_dcache_page()) removed this function entirely. > > The code previously had this comment - /* highmem pages are always > > flushed upon kunmap already */ which I think it wasn't fully correct > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so > > I suspect we only get the flushing if SG_MITER_ATOMIC. > > > > So it looks to me like flush_kernel_dcache_page() should be implemented > > even for highmem pages (with VIVT or aliasing VIPT, at least for non > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is > > disabled, I suspect we still need this function since the calling code > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)? > > My first version ([1]) had: > > if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page)) > __flush_kernel_dcache_page(page); > > If I understand this correctly, you are proposing to remove the > highmem exclusion. The highmem exclusion may have been there originally because of a comment suggesting that kunmap() does the flushing. This is the case only for kunmap_atomic() AFAICT (and maybe we could remove that as well and rely on flush_kernel_dcache_page() being called). > And then in __flush_kernel_dcache_page(): > > mapping = page_mapping(page); > > if (!mapping || mapping_mapped(mapping)) > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > > I still prefer to have this condition here to avoid the flush when > there is no user mapping at all. This is handled by lazy flushing > and is probably the most common case anyway (given how many people > seem to be affected by this problem). Looking at the old thread, you said there is a case when this condition is not true (O_DIRECT case). If that's for a page cache page, then we can handle it lazily (for anonymous pages I don't think we can rely on lazy flushing since the kernel does not guarantee the clearing of the PG_arch_1 bit). > Additionally, although we can assume that the page is kmapped, > page_address(page) can still be NULL for a highmem page, right? It looks like kmap() always sets page_address(page) but I'm not sure about kmap_atomic(), it doesn't seem to.
On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote: > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote: > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote: > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote: > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote: > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: > > ... > > > > > > > > It is not the driver itself which is using the API, it is the > > > > generic scatterlist memory iterator. And I don't think that this is > > > > wrong, as I have tried to explain in [1]. > > > > > > Trying to remember what we've discussed over the past months on this > > > topic. It looks like sg_miter_stop() does the right thing in calling > > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove > > > superfluous flush_kernel_dcache_page()) removed this function entirely. > > > The code previously had this comment - /* highmem pages are always > > > flushed upon kunmap already */ which I think it wasn't fully correct > > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so > > > I suspect we only get the flushing if SG_MITER_ATOMIC. > > > > > > So it looks to me like flush_kernel_dcache_page() should be implemented > > > even for highmem pages (with VIVT or aliasing VIPT, at least for non > > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is > > > disabled, I suspect we still need this function since the calling code > > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a > > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)? > > > > My first version ([1]) had: > > > > if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page)) > > __flush_kernel_dcache_page(page); > > > > If I understand this correctly, you are proposing to remove the > > highmem exclusion. > > The highmem exclusion may have been there originally because of a > comment suggesting that kunmap() does the flushing. This is the case > only for kunmap_atomic() AFAICT (and maybe we could remove that as well > and rely on flush_kernel_dcache_page() being called). > > > And then in __flush_kernel_dcache_page(): > > > > mapping = page_mapping(page); > > > > if (!mapping || mapping_mapped(mapping)) > > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > > > > I still prefer to have this condition here to avoid the flush when > > there is no user mapping at all. This is handled by lazy flushing > > and is probably the most common case anyway (given how many people > > seem to be affected by this problem). > > Looking at the old thread, you said there is a case when this condition > is not true (O_DIRECT case). If that's for a page cache page, then we > can handle it lazily (for anonymous pages I don't think we can rely on > lazy flushing since the kernel does not guarantee the clearing of the > PG_arch_1 bit). As Russel pointed out in a comment to a later version of the patch, PG_arch_1 makes only sense for page cache pages. The condition above is ok from my point of view (it is based on what flush_dcache_page() uses): - Page cache page without user space mapping: mapping != NULL and mapping_mapped() == 0 -> no flush here; lazy flush based on PG_arch_1 later if needed (we rely on the proper initialization of the page to "dirty" here.) - Page cache page with user space mapping: mapping != NULL and mapping_mapped() != 0 -> kernel mapping flushed here (user mapping can be assumed to be clean) - Anonymous page: mapping == NULL -> kernel mapping flushed here (user mapping can be assumed to be clean) > > Additionally, although we can assume that the page is kmapped, > > page_address(page) can still be NULL for a highmem page, right? > > It looks like kmap() always sets page_address(page) but I'm not sure > about kmap_atomic(), it doesn't seem to. Hmm, in __flush_dcache_page() we have the following code to flush the kernel mapping: void __flush_dcache_page(struct address_space *mapping, struct page *page) { /* * Writeback any data associated with the kernel mapping of this * page. This ensures that data in the physical page is mutually * coherent with the kernels mapping. */ if (!PageHighMem(page)) { __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); } else { void *addr = kmap_high_get(page); if (addr) { __cpuc_flush_dcache_area(addr, PAGE_SIZE); kunmap_high(page); } else if (cache_is_vipt()) { /* unmapped pages might still be cached */ addr = kmap_atomic(page); __cpuc_flush_dcache_area(addr, PAGE_SIZE); kunmap_atomic(addr); } } ... We probably should reuse it in flush_kernel_dcache_page() to flush the kernel mapping. (The last else clause looks strange though) - Simon
On Wed, May 01, 2013 at 08:04:41PM +0100, Simon Baatz wrote: > On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote: > > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote: > > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote: > > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote: > > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote: > > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: > > > ... > > > > > > > > > > It is not the driver itself which is using the API, it is the > > > > > generic scatterlist memory iterator. And I don't think that this is > > > > > wrong, as I have tried to explain in [1]. > > > > > > > > Trying to remember what we've discussed over the past months on this > > > > topic. It looks like sg_miter_stop() does the right thing in calling > > > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove > > > > superfluous flush_kernel_dcache_page()) removed this function entirely. > > > > The code previously had this comment - /* highmem pages are always > > > > flushed upon kunmap already */ which I think it wasn't fully correct > > > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so > > > > I suspect we only get the flushing if SG_MITER_ATOMIC. > > > > > > > > So it looks to me like flush_kernel_dcache_page() should be implemented > > > > even for highmem pages (with VIVT or aliasing VIPT, at least for non > > > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is > > > > disabled, I suspect we still need this function since the calling code > > > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a > > > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)? > > > > > > My first version ([1]) had: > > > > > > if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page)) > > > __flush_kernel_dcache_page(page); > > > > > > If I understand this correctly, you are proposing to remove the > > > highmem exclusion. > > > > The highmem exclusion may have been there originally because of a > > comment suggesting that kunmap() does the flushing. This is the case > > only for kunmap_atomic() AFAICT (and maybe we could remove that as well > > and rely on flush_kernel_dcache_page() being called). > > > > > And then in __flush_kernel_dcache_page(): > > > > > > mapping = page_mapping(page); > > > > > > if (!mapping || mapping_mapped(mapping)) > > > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > > > > > > I still prefer to have this condition here to avoid the flush when > > > there is no user mapping at all. This is handled by lazy flushing > > > and is probably the most common case anyway (given how many people > > > seem to be affected by this problem). > > > > Looking at the old thread, you said there is a case when this condition > > is not true (O_DIRECT case). If that's for a page cache page, then we > > can handle it lazily (for anonymous pages I don't think we can rely on > > lazy flushing since the kernel does not guarantee the clearing of the > > PG_arch_1 bit). > > As Russel pointed out in a comment to a later version of the patch, > PG_arch_1 makes only sense for page cache pages. The condition above > is ok from my point of view (it is based on what flush_dcache_page() > uses): > > - Page cache page without user space mapping: > mapping != NULL and mapping_mapped() == 0 > > -> no flush here; lazy flush based on PG_arch_1 later if needed (we > rely on the proper initialization of the page to "dirty" here.) Indeed. > - Page cache page with user space mapping: > mapping != NULL and mapping_mapped() != 0 > > -> kernel mapping flushed here (user mapping can be assumed to be clean) We had similar thoughts for AArch64 here and decided it's not needed, it can just clear the PG_arch_1 bit and do it lazily (patches not pushed yet, need more testing). This assumes that even if mapping_mapped(), the page is not actually mapped in user space and we eventually get a set_pte_at() call. That's what powerpc is doing. > - Anonymous page: > mapping == NULL > > -> kernel mapping flushed here (user mapping can be assumed to be clean) I don't think it should care about anonymous pages at all. I put a WARN_ON(!mapping) in flush_dcache_page() and it hasn't triggered yet, though not intensive testing. > > > Additionally, although we can assume that the page is kmapped, > > > page_address(page) can still be NULL for a highmem page, right? > > > > It looks like kmap() always sets page_address(page) but I'm not sure > > about kmap_atomic(), it doesn't seem to. > > Hmm, in __flush_dcache_page() we have the following code to flush the > kernel mapping: > > void __flush_dcache_page(struct address_space *mapping, struct page *page) > { > /* > * Writeback any data associated with the kernel mapping of this > * page. This ensures that data in the physical page is mutually > * coherent with the kernels mapping. > */ > if (!PageHighMem(page)) { > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > } else { > void *addr = kmap_high_get(page); > if (addr) { > __cpuc_flush_dcache_area(addr, PAGE_SIZE); > kunmap_high(page); > } else if (cache_is_vipt()) { > /* unmapped pages might still be cached */ > addr = kmap_atomic(page); > __cpuc_flush_dcache_area(addr, PAGE_SIZE); > kunmap_atomic(addr); > } > } > ... > > > We probably should reuse it in flush_kernel_dcache_page() to flush > the kernel mapping. (The last else clause looks strange though) I think it makes sense to reuse this logic in flush_kernel_dcache_page(). If the page is already mapped with kmap, then kmap_high_get() should return the actual address. If it was mapped with kmap_atomic, kmap_high_get() would return NULL, hence the 'else' clause and the additional kmap_atomic(). The cache_is_vipt() check is useful because kunmap_atomic() would flush VIVT caches anyway. As for __flush_dcache_page() called from other places like flush_dcache_page(), because of this 'else if' clause it looks like it misses flushing unmapped highmem pages on VIVT cache.
Hi Catalin, On Thu, May 02, 2013 at 10:54:31AM +0100, Catalin Marinas wrote: > On Wed, May 01, 2013 at 08:04:41PM +0100, Simon Baatz wrote: > > On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote: > > > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote: > > > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote: > > > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote: > > > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote: > > > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: > > > > ... > > > > > > > > > > > > It is not the driver itself which is using the API, it is the > > > > > > generic scatterlist memory iterator. And I don't think that this is > > > > > > wrong, as I have tried to explain in [1]. > > > > > > > > > > Trying to remember what we've discussed over the past months on this > > > > > topic. It looks like sg_miter_stop() does the right thing in calling > > > > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove > > > > > superfluous flush_kernel_dcache_page()) removed this function entirely. > > > > > The code previously had this comment - /* highmem pages are always > > > > > flushed upon kunmap already */ which I think it wasn't fully correct > > > > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so > > > > > I suspect we only get the flushing if SG_MITER_ATOMIC. > > > > > > > > > > So it looks to me like flush_kernel_dcache_page() should be implemented > > > > > even for highmem pages (with VIVT or aliasing VIPT, at least for non > > > > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is > > > > > disabled, I suspect we still need this function since the calling code > > > > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a > > > > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)? > > > > > > > > My first version ([1]) had: > > > > > > > > if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page)) > > > > __flush_kernel_dcache_page(page); > > > > > > > > If I understand this correctly, you are proposing to remove the > > > > highmem exclusion. > > > > > > The highmem exclusion may have been there originally because of a > > > comment suggesting that kunmap() does the flushing. This is the case > > > only for kunmap_atomic() AFAICT (and maybe we could remove that as well > > > and rely on flush_kernel_dcache_page() being called). > > > > > > > And then in __flush_kernel_dcache_page(): > > > > > > > > mapping = page_mapping(page); > > > > > > > > if (!mapping || mapping_mapped(mapping)) > > > > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > > > > > > > > I still prefer to have this condition here to avoid the flush when > > > > there is no user mapping at all. This is handled by lazy flushing > > > > and is probably the most common case anyway (given how many people > > > > seem to be affected by this problem). > > > > > > Looking at the old thread, you said there is a case when this condition > > > is not true (O_DIRECT case). If that's for a page cache page, then we > > > can handle it lazily (for anonymous pages I don't think we can rely on > > > lazy flushing since the kernel does not guarantee the clearing of the > > > PG_arch_1 bit). > > > > As Russel pointed out in a comment to a later version of the patch, > > PG_arch_1 makes only sense for page cache pages. The condition above > > is ok from my point of view (it is based on what flush_dcache_page() > > uses): > > > > - Page cache page without user space mapping: > > mapping != NULL and mapping_mapped() == 0 > > > > -> no flush here; lazy flush based on PG_arch_1 later if needed (we > > rely on the proper initialization of the page to "dirty" here.) > > Indeed. > > > - Page cache page with user space mapping: > > mapping != NULL and mapping_mapped() != 0 > > > > -> kernel mapping flushed here (user mapping can be assumed to be clean) > > We had similar thoughts for AArch64 here and decided it's not needed, it > can just clear the PG_arch_1 bit and do it lazily (patches not pushed > yet, need more testing). This assumes that even if mapping_mapped(), the > page is not actually mapped in user space and we eventually get a > set_pte_at() call. That's what powerpc is doing. For arm64 only I-/D-cache coherency is relevant, right? In this case, that may be right (but you may want to have a look at what xol_get_insn_slot() in kernel/events/uprobes.c does. I don't know what kind of pages it handles, but this looks suspicious. And I think uprobes are also available for powerpc). However, if you can have aliases in D-cache this is needed. See below to trigger this on pages that are already mapped to user space (direct I/O into a memory mapped file). > > - Anonymous page: > > mapping == NULL > > > > -> kernel mapping flushed here (user mapping can be assumed to be clean) > > I don't think it should care about anonymous pages at all. I put a > WARN_ON(!mapping) in flush_dcache_page() and it hasn't triggered yet, > though not intensive testing. I assume that you inhibited the call to flush_dcache_page() in __get_user_pages() for anon pages. Otherwise, you will be flooded with warnings. DIY steps to produce both cases: (We will need software encryption, thus I need to remove the hardware encryption module mv_cesa on my hardware first) # rmmod mv_cesa # wget -O mapping_tests.c http://ubuntuone.com/0jF9fNsguN8BvI1Z8fsBVI # gcc -o mapping_tests mapping_tests.c # dd if=/dev/urandom of=map.out bs=4k count=10 # dd if=/dev/urandom of=cdevice.img bs=4k count=10 # losetup /dev/loop0 cdevice.img # cryptsetup -c aes create c_sda2 /dev/loop0 <enter any passphrase> # ./mapping_tests cryptsetup will already trigger the warning and mapping_tests should flood your console. Since flush_kernel_dcache_page() is supposed to handle user space pages, it must flush the kernel mapping in these cases. Running the above on a device driver using an unpatched flush_kernel_dcache_page() yields: # ./mapping_tests Anonymous page: differs! User space mapped page: differs! It is even needed in flush_dcache_page() as long as everybody continues to use flush_dcache_page() instead of flush_kernel_dcache_page() when appropriate... (This is probably the main reason why the problem I reported is so uncommon: Everybody seems to use flush_dcache_page() and since it flushes the kernel mapping in these cases, everything is fine.) > > > > Additionally, although we can assume that the page is kmapped, > > > > page_address(page) can still be NULL for a highmem page, right? > > > > > > It looks like kmap() always sets page_address(page) but I'm not sure > > > about kmap_atomic(), it doesn't seem to. > > > > Hmm, in __flush_dcache_page() we have the following code to flush the > > kernel mapping: > > > > void __flush_dcache_page(struct address_space *mapping, struct page *page) > > { > > /* > > * Writeback any data associated with the kernel mapping of this > > * page. This ensures that data in the physical page is mutually > > * coherent with the kernels mapping. > > */ > > if (!PageHighMem(page)) { > > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > > } else { > > void *addr = kmap_high_get(page); > > if (addr) { > > __cpuc_flush_dcache_area(addr, PAGE_SIZE); > > kunmap_high(page); > > } else if (cache_is_vipt()) { > > /* unmapped pages might still be cached */ > > addr = kmap_atomic(page); > > __cpuc_flush_dcache_area(addr, PAGE_SIZE); > > kunmap_atomic(addr); > > } > > } > > ... > > > > > > We probably should reuse it in flush_kernel_dcache_page() to flush > > the kernel mapping. (The last else clause looks strange though) > > I think it makes sense to reuse this logic in > flush_kernel_dcache_page(). If the page is already mapped with kmap, > then kmap_high_get() should return the actual address. If it was mapped > with kmap_atomic, kmap_high_get() would return NULL, hence the 'else' > clause and the additional kmap_atomic(). The cache_is_vipt() check is > useful because kunmap_atomic() would flush VIVT caches anyway. Yes, but why don't we flush for VIPT _aliasing_ already in kunmap_atomic? Why do we need to do anything for non-aliasing VIPT here? > As for __flush_dcache_page() called from other places like > flush_dcache_page(), because of this 'else if' clause it looks like it > misses flushing unmapped highmem pages on VIVT cache. How many users do we have with VIVT D-caches using highmem? (This is not a rhetorical questions, I have no idea. However, I would assume that this use case is almost non-existent.) - Simon
On Thu, May 02, 2013 at 08:38:36PM +0100, Simon Baatz wrote: > On Thu, May 02, 2013 at 10:54:31AM +0100, Catalin Marinas wrote: > > On Wed, May 01, 2013 at 08:04:41PM +0100, Simon Baatz wrote: > > > On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote: > > > > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote: > > > > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote: > > > > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote: > > > > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote: > > > > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: > > > > > ... > > > > > > > > > > > > > > It is not the driver itself which is using the API, it is the > > > > > > > generic scatterlist memory iterator. And I don't think that this is > > > > > > > wrong, as I have tried to explain in [1]. > > > > > > > > > > > > Trying to remember what we've discussed over the past months on this > > > > > > topic. It looks like sg_miter_stop() does the right thing in calling > > > > > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove > > > > > > superfluous flush_kernel_dcache_page()) removed this function entirely. > > > > > > The code previously had this comment - /* highmem pages are always > > > > > > flushed upon kunmap already */ which I think it wasn't fully correct > > > > > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so > > > > > > I suspect we only get the flushing if SG_MITER_ATOMIC. > > > > > > > > > > > > So it looks to me like flush_kernel_dcache_page() should be implemented > > > > > > even for highmem pages (with VIVT or aliasing VIPT, at least for non > > > > > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is > > > > > > disabled, I suspect we still need this function since the calling code > > > > > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a > > > > > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)? > > > > > > > > > > My first version ([1]) had: > > > > > > > > > > if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page)) > > > > > __flush_kernel_dcache_page(page); > > > > > > > > > > If I understand this correctly, you are proposing to remove the > > > > > highmem exclusion. > > > > > > > > The highmem exclusion may have been there originally because of a > > > > comment suggesting that kunmap() does the flushing. This is the case > > > > only for kunmap_atomic() AFAICT (and maybe we could remove that as well > > > > and rely on flush_kernel_dcache_page() being called). > > > > > > > > > And then in __flush_kernel_dcache_page(): > > > > > > > > > > mapping = page_mapping(page); > > > > > > > > > > if (!mapping || mapping_mapped(mapping)) > > > > > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > > > > > > > > > > I still prefer to have this condition here to avoid the flush when > > > > > there is no user mapping at all. This is handled by lazy flushing > > > > > and is probably the most common case anyway (given how many people > > > > > seem to be affected by this problem). > > > > > > > > Looking at the old thread, you said there is a case when this condition > > > > is not true (O_DIRECT case). If that's for a page cache page, then we > > > > can handle it lazily (for anonymous pages I don't think we can rely on > > > > lazy flushing since the kernel does not guarantee the clearing of the > > > > PG_arch_1 bit). > > > > > > As Russel pointed out in a comment to a later version of the patch, > > > PG_arch_1 makes only sense for page cache pages. The condition above > > > is ok from my point of view (it is based on what flush_dcache_page() > > > uses): > > > > > > - Page cache page without user space mapping: > > > mapping != NULL and mapping_mapped() == 0 > > > > > > -> no flush here; lazy flush based on PG_arch_1 later if needed (we > > > rely on the proper initialization of the page to "dirty" here.) > > > > Indeed. > > > > > - Page cache page with user space mapping: > > > mapping != NULL and mapping_mapped() != 0 > > > > > > -> kernel mapping flushed here (user mapping can be assumed to be clean) > > > > We had similar thoughts for AArch64 here and decided it's not needed, it > > can just clear the PG_arch_1 bit and do it lazily (patches not pushed > > yet, need more testing). This assumes that even if mapping_mapped(), the > > page is not actually mapped in user space and we eventually get a > > set_pte_at() call. That's what powerpc is doing. > > For arm64 only I-/D-cache coherency is relevant, right? Yes. Same for 32-bit ARMv7 or non-aliasing D-cache on ARMv6. > In this case, > that may be right (but you may want to have a look at what > xol_get_insn_slot() in kernel/events/uprobes.c does. I don't know > what kind of pages it handles, but this looks suspicious. And I think > uprobes are also available for powerpc). It is suspicious indeed and even the author is unsure about using the right API. Should it rather use copy_to_user_page? I'm not familiar with uprobes. > However, if you can have aliases in D-cache this is needed. See > below to trigger this on pages that are already mapped > to user space (direct I/O into a memory mapped file). You are probably right, for direct I/O and aliases in the D-cache it needs to flush. The documentation for flush_dcache_page() states that it can also be called when the kernel is about to read from a page cache page (potentially mapped into user space). > > > - Anonymous page: > > > mapping == NULL > > > > > > -> kernel mapping flushed here (user mapping can be assumed to be clean) > > > > I don't think it should care about anonymous pages at all. I put a > > WARN_ON(!mapping) in flush_dcache_page() and it hasn't triggered yet, > > though not intensive testing. > > I assume that you inhibited the call to flush_dcache_page() in > __get_user_pages() for anon pages. Otherwise, you will be flooded > with warnings. I haven't done any stress testing so I don't think I hit this code path, so no warning. But yes, it should have triggered. Anyway, in this case flush_dcache_page() should have just ignored (clearing PG_arch_1 is harmless anyway if we also ignore this bit in __sync_icache_dcache for non-aliasing caches). > DIY steps to produce both cases: > (We will need software encryption, thus I need to remove the hardware > encryption module mv_cesa on my hardware first) > > # rmmod mv_cesa > # wget -O mapping_tests.c http://ubuntuone.com/0jF9fNsguN8BvI1Z8fsBVI > # gcc -o mapping_tests mapping_tests.c > # dd if=/dev/urandom of=map.out bs=4k count=10 > # dd if=/dev/urandom of=cdevice.img bs=4k count=10 > # losetup /dev/loop0 cdevice.img > # cryptsetup -c aes create c_sda2 /dev/loop0 > <enter any passphrase> > # ./mapping_tests > > cryptsetup will already trigger the warning and mapping_tests should > flood your console. > > Since flush_kernel_dcache_page() is supposed to handle user space > pages, it must flush the kernel mapping in these cases. Running the > above on a device driver using an unpatched > flush_kernel_dcache_page() yields: > > # ./mapping_tests > Anonymous page: differs! > User space mapped page: differs! I haven't run the tests but I can see why it fails without flush_kernel_dcache_page(). So I think this function needs to be implemented for aliasing VIPT or VIVT caches. > It is even needed in flush_dcache_page() as long as everybody > continues to use flush_dcache_page() instead of > flush_kernel_dcache_page() when appropriate... > (This is probably the main reason why the problem I reported is so > uncommon: Everybody seems to use flush_dcache_page() and since it > flushes the kernel mapping in these cases, everything is fine.) That's why for non-aliasing VIPT we could make it just a clear_bit() and let the callers fix their API usage. > > > > > Additionally, although we can assume that the page is kmapped, > > > > > page_address(page) can still be NULL for a highmem page, right? > > > > > > > > It looks like kmap() always sets page_address(page) but I'm not sure > > > > about kmap_atomic(), it doesn't seem to. > > > > > > Hmm, in __flush_dcache_page() we have the following code to flush the > > > kernel mapping: > > > > > > void __flush_dcache_page(struct address_space *mapping, struct page *page) > > > { > > > /* > > > * Writeback any data associated with the kernel mapping of this > > > * page. This ensures that data in the physical page is mutually > > > * coherent with the kernels mapping. > > > */ > > > if (!PageHighMem(page)) { > > > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > > > } else { > > > void *addr = kmap_high_get(page); > > > if (addr) { > > > __cpuc_flush_dcache_area(addr, PAGE_SIZE); > > > kunmap_high(page); > > > } else if (cache_is_vipt()) { > > > /* unmapped pages might still be cached */ > > > addr = kmap_atomic(page); > > > __cpuc_flush_dcache_area(addr, PAGE_SIZE); > > > kunmap_atomic(addr); > > > } > > > } > > > ... > > > > > > > > > We probably should reuse it in flush_kernel_dcache_page() to flush > > > the kernel mapping. (The last else clause looks strange though) > > > > I think it makes sense to reuse this logic in > > flush_kernel_dcache_page(). If the page is already mapped with kmap, > > then kmap_high_get() should return the actual address. If it was mapped > > with kmap_atomic, kmap_high_get() would return NULL, hence the 'else' > > clause and the additional kmap_atomic(). The cache_is_vipt() check is > > useful because kunmap_atomic() would flush VIVT caches anyway. > > Yes, but why don't we flush for VIPT _aliasing_ already in > kunmap_atomic? Some comment from Nico on the list, it seems that highmem does not work with aliasing VIPT caches: http://article.gmane.org/gmane.linux.ports.arm.kernel/85114 > Why do we need to do anything for non-aliasing VIPT here? Do you mean __flush_dcache_page()? I don't think we need. > > As for __flush_dcache_page() called from other places like > > flush_dcache_page(), because of this 'else if' clause it looks like it > > misses flushing unmapped highmem pages on VIVT cache. > > How many users do we have with VIVT D-caches using highmem? (This is > not a rhetorical questions, I have no idea. However, I would assume > that this use case is almost non-existent.) I don't really know. Maybe Nico has more info. So my proposal: 1. Non-aliasing VIPT - defer any D-cache flushing until __sync_icache_dcache() (and fix callers like uprobes) 2. Aliasing VIPT or VIVT - flush the aliases as before in flush_dcache_page() 3. Aliasing VIPT or VIVT - implement flush_kernel_dcache_page() for all pages (don't check for PageHighmem) 4. VIVT - remove cache flushing from kunmap_atomic and rely on flush_kernel_dcache_page() I'm not entirely sure about 4 but at the moment kunmap() doesn't flush caches, only kunmap_atomic() so it looks like an inconsistency.
On Fri, May 03, 2013 at 11:02:42AM +0100, Catalin Marinas wrote: > On Thu, May 02, 2013 at 08:38:36PM +0100, Simon Baatz wrote: > > On Thu, May 02, 2013 at 10:54:31AM +0100, Catalin Marinas wrote: > > > On Wed, May 01, 2013 at 08:04:41PM +0100, Simon Baatz wrote: > > > > On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote: > > > > > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote: > > > > > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote: > > > > > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote: > > > > > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote: > > > > > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote: ... > > I haven't run the tests but I can see why it fails without > flush_kernel_dcache_page(). So I think this function needs to be > implemented for aliasing VIPT or VIVT caches. > > > It is even needed in flush_dcache_page() as long as everybody > > continues to use flush_dcache_page() instead of > > flush_kernel_dcache_page() when appropriate... > > (This is probably the main reason why the problem I reported is so > > uncommon: Everybody seems to use flush_dcache_page() and since it > > flushes the kernel mapping in these cases, everything is fine.) > > That's why for non-aliasing VIPT we could make it just a clear_bit() and > let the callers fix their API usage. Do you mean the unnecessary flush for anon pages or also the D/I flush for user space mapped page cache pages? For anon pages, that was the content of the patch you acked ([1]) once. However, Russel did not like the idea to use the PG_dcache_clean bit also for anon pages. I have a newer version that does the following: Do nothing for mapping == NULL on non-aliasing VIPT in flush_dcache_page(). In __sync_icache_dcache() flush if mapping == NULL (always on aliasing VIPT, only if pte_exec() on non-aliasing VIPT). > > > > > > Additionally, although we can assume that the page is kmapped, > > > > > > page_address(page) can still be NULL for a highmem page, right? > > > > > > > > > > It looks like kmap() always sets page_address(page) but I'm not sure > > > > > about kmap_atomic(), it doesn't seem to. > > > > > > > > Hmm, in __flush_dcache_page() we have the following code to flush the > > > > kernel mapping: > > > > > > > > void __flush_dcache_page(struct address_space *mapping, struct page *page) > > > > { > > > > /* > > > > * Writeback any data associated with the kernel mapping of this > > > > * page. This ensures that data in the physical page is mutually > > > > * coherent with the kernels mapping. > > > > */ > > > > if (!PageHighMem(page)) { > > > > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > > > > } else { > > > > void *addr = kmap_high_get(page); > > > > if (addr) { > > > > __cpuc_flush_dcache_area(addr, PAGE_SIZE); > > > > kunmap_high(page); > > > > } else if (cache_is_vipt()) { > > > > /* unmapped pages might still be cached */ > > > > addr = kmap_atomic(page); > > > > __cpuc_flush_dcache_area(addr, PAGE_SIZE); > > > > kunmap_atomic(addr); > > > > } > > > > } > > > > ... > > > > > > > > > > > > We probably should reuse it in flush_kernel_dcache_page() to flush > > > > the kernel mapping. (The last else clause looks strange though) > > > > > > I think it makes sense to reuse this logic in > > > flush_kernel_dcache_page(). If the page is already mapped with kmap, > > > then kmap_high_get() should return the actual address. If it was mapped > > > with kmap_atomic, kmap_high_get() would return NULL, hence the 'else' > > > clause and the additional kmap_atomic(). The cache_is_vipt() check is > > > useful because kunmap_atomic() would flush VIVT caches anyway. > > > > Yes, but why don't we flush for VIPT _aliasing_ already in > > kunmap_atomic? > > Some comment from Nico on the list, it seems that highmem does not work > with aliasing VIPT caches: > > http://article.gmane.org/gmane.linux.ports.arm.kernel/85114 > > > Why do we need to do anything for non-aliasing VIPT here? > > Do you mean __flush_dcache_page()? I don't think we need. > > > > As for __flush_dcache_page() called from other places like > > > flush_dcache_page(), because of this 'else if' clause it looks like it > > > misses flushing unmapped highmem pages on VIVT cache. > > > > How many users do we have with VIVT D-caches using highmem? (This is > > not a rhetorical questions, I have no idea. However, I would assume > > that this use case is almost non-existent.) > > I don't really know. Maybe Nico has more info. > > So my proposal: > > 1. Non-aliasing VIPT - defer any D-cache flushing until > __sync_icache_dcache() (and fix callers like uprobes) See above, not sure whether we can also get rid of the I-cache flush. > 2. Aliasing VIPT or VIVT - flush the aliases as before in > flush_dcache_page() Yes. > 3. Aliasing VIPT or VIVT - implement flush_kernel_dcache_page() for all > pages (don't check for PageHighmem) Ok, I will update the first version of my patch. > 4. VIVT - remove cache flushing from kunmap_atomic and rely on > flush_kernel_dcache_page() > > I'm not entirely sure about 4 but at the moment kunmap() doesn't flush > caches, only kunmap_atomic() so it looks like an inconsistency. Not sure either. - Simon [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-October/124291.html
On Sun, May 05, 2013 at 11:26:47PM +0100, Simon Baatz wrote: > On Fri, May 03, 2013 at 11:02:42AM +0100, Catalin Marinas wrote: > > On Thu, May 02, 2013 at 08:38:36PM +0100, Simon Baatz wrote: > ... > > I haven't run the tests but I can see why it fails without > > flush_kernel_dcache_page(). So I think this function needs to be > > implemented for aliasing VIPT or VIVT caches. > > > > > It is even needed in flush_dcache_page() as long as everybody > > > continues to use flush_dcache_page() instead of > > > flush_kernel_dcache_page() when appropriate... > > > (This is probably the main reason why the problem I reported is so > > > uncommon: Everybody seems to use flush_dcache_page() and since it > > > flushes the kernel mapping in these cases, everything is fine.) > > > > That's why for non-aliasing VIPT we could make it just a clear_bit() and > > let the callers fix their API usage. > > Do you mean the unnecessary flush for anon pages or also the D/I > flush for user space mapped page cache pages? flush_dcache_page() should just ignore anonymous pages. The D/I coherency would be handled in __sync_icache_dcache() later. Do we have cases where a page is already mapped in user space (pte valid) and the kernel writes any code to it? I don't think we have. PowerPC have a simplified flush_dcache_page() and they haven't seen any issues. > For anon pages, that was the content of the patch you acked ([1]) > once. However, Russel did not like the idea to use the > PG_dcache_clean bit also for anon pages. I think Russell was right, the kernel does not have any guarantees about the PG_arch_1 bit on anonymous pages. Do we could remove the !mapping code path in flush_dcache_page(). > I have a newer version that does the following: Do nothing for > mapping == NULL on non-aliasing VIPT in flush_dcache_page(). Can this mapping == NULL be generalised for aliasing caches? > In __sync_icache_dcache() flush if mapping == NULL (always on aliasing > VIPT, only if pte_exec() on non-aliasing VIPT). If the page is anonymous, do we expect user code to execute from a clear page? When the page is first allocated, clear_user_highpage should take care of the aliases already.
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index e4448e1..eca955f 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -307,6 +307,10 @@ static inline void flush_anon_page(struct vm_area_struct *vma, #define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE static inline void flush_kernel_dcache_page(struct page *page) { + extern void __flush_kernel_dcache_page(struct page *); + /* highmem pages are always flushed upon kunmap already */ + if (!PageHighMem(page)) + __flush_kernel_dcache_page(page); } #define flush_dcache_mmap_lock(mapping) \ diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c index 5c474a1..59ad4fc 100644 --- a/arch/arm/mm/flush.c +++ b/arch/arm/mm/flush.c @@ -192,6 +192,48 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page) page->index << PAGE_CACHE_SHIFT); } +/* + * Ensure cache coherency for the kernel mapping of this page. + * + * If the page only exists in the page cache and there are no user + * space mappings, we can be lazy and remember that we may have dirty + * kernel cache lines for later. Otherwise, we need to flush the + * dirty kernel cache lines directly. + * + * Note that we disable the lazy flush for SMP configurations where + * the cache maintenance operations are not automatically broadcasted. + * + * We can assume that the page is no high mem page, see + * flush_kernel_dcache_page. + */ +void __flush_kernel_dcache_page(struct page *page) +{ + struct address_space *mapping; + + /* + * The zero page is never written to, so never has any dirty + * cache lines, and therefore never needs to be flushed. + */ + if (page == ZERO_PAGE(0)) + return; + + mapping = page_mapping(page); + + if (!cache_ops_need_broadcast()) { + if ((mapping && !mapping_mapped(mapping)) || + (!mapping && cache_is_vipt_nonaliasing())) { + clear_bit(PG_dcache_clean, &page->flags); + return; + } + } + + __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); + if (mapping && !cache_is_vivt()) + __flush_icache_all(); + set_bit(PG_dcache_clean, &page->flags); +} +EXPORT_SYMBOL(__flush_kernel_dcache_page); + static void __flush_dcache_aliases(struct address_space *mapping, struct page *page) { struct mm_struct *mm = current->active_mm;
Commit f8b63c1 made flush_kernel_dcache_page() a no-op assuming that the pages it needs to handle are kernel mapped only. However, for example when doing direct I/O, pages with user space mappings may occur. Thus, do lazy flushing like in flush_dcache_page() if there are no user space mappings. Otherwise, flush the kernel cache lines directly. Signed-off-by: Simon Baatz <gmbnomis@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> --- Changes: in V3: - Followed Catalin's suggestion to reverse the order of the patches in V2: - flush_kernel_dcache_page() follows flush_dcache_page() now, except that it does not flush the user mappings arch/arm/include/asm/cacheflush.h | 4 ++++ arch/arm/mm/flush.c | 42 +++++++++++++++++++++++++++++++++++++ 2 files changed, 46 insertions(+)