diff mbox series

[RFC,3/3] mm, fault_around: do not take a reference to a locked page

Message ID 20181120134323.13007-4-mhocko@kernel.org (mailing list archive)
State New, archived
Headers show
Series few memory offlining enhancements | expand

Commit Message

Michal Hocko Nov. 20, 2018, 1:43 p.m. UTC
From: Michal Hocko <mhocko@suse.com>

filemap_map_pages takes a speculative reference to each page in the
range before it tries to lock that page. While this is correct it
also can influence page migration which will bail out when seeing
an elevated reference count. The faultaround code would bail on
seeing a locked page so we can pro-actively check the PageLocked
bit before page_cache_get_speculative and prevent from pointless
reference count churn.

Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/filemap.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Kirill A. Shutemov Nov. 20, 2018, 2:07 p.m. UTC | #1
On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> filemap_map_pages takes a speculative reference to each page in the
> range before it tries to lock that page. While this is correct it
> also can influence page migration which will bail out when seeing
> an elevated reference count. The faultaround code would bail on
> seeing a locked page so we can pro-actively check the PageLocked
> bit before page_cache_get_speculative and prevent from pointless
> reference count churn.

Looks fine to me.

But please drop a line of comment in the code. As is it might be confusing
for a reader.
Michal Hocko Nov. 20, 2018, 2:12 p.m. UTC | #2
On Tue 20-11-18 17:07:15, Kirill A. Shutemov wrote:
> On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > filemap_map_pages takes a speculative reference to each page in the
> > range before it tries to lock that page. While this is correct it
> > also can influence page migration which will bail out when seeing
> > an elevated reference count. The faultaround code would bail on
> > seeing a locked page so we can pro-actively check the PageLocked
> > bit before page_cache_get_speculative and prevent from pointless
> > reference count churn.
> 
> Looks fine to me.

Thanks for the review.

> But please drop a line of comment in the code. As is it might be confusing
> for a reader.

This?

diff --git a/mm/filemap.c b/mm/filemap.c
index c76d6a251770..7c4e439a2e85 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2554,6 +2554,10 @@ void filemap_map_pages(struct vm_fault *vmf,
 
 		head = compound_head(page);
 
+		/*
+		 * Check the locked pages before taking a reference to not
+		 * go in the way of migration.
+		 */
 		if (PageLocked(head))
 			goto next;
 		if (!page_cache_get_speculative(head))
Kirill A. Shutemov Nov. 20, 2018, 2:17 p.m. UTC | #3
On Tue, Nov 20, 2018 at 03:12:07PM +0100, Michal Hocko wrote:
> On Tue 20-11-18 17:07:15, Kirill A. Shutemov wrote:
> > On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > filemap_map_pages takes a speculative reference to each page in the
> > > range before it tries to lock that page. While this is correct it
> > > also can influence page migration which will bail out when seeing
> > > an elevated reference count. The faultaround code would bail on
> > > seeing a locked page so we can pro-actively check the PageLocked
> > > bit before page_cache_get_speculative and prevent from pointless
> > > reference count churn.
> > 
> > Looks fine to me.
> 
> Thanks for the review.
> 
> > But please drop a line of comment in the code. As is it might be confusing
> > for a reader.
> 
> This?

Yep.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c76d6a251770..7c4e439a2e85 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2554,6 +2554,10 @@ void filemap_map_pages(struct vm_fault *vmf,
>  
>  		head = compound_head(page);
>  
> +		/*
> +		 * Check the locked pages before taking a reference to not
> +		 * go in the way of migration.
> +		 */
>  		if (PageLocked(head))
>  			goto next;
>  		if (!page_cache_get_speculative(head))
> -- 
> Michal Hocko
> SUSE Labs
Michal Hocko Nov. 20, 2018, 2:25 p.m. UTC | #4
On Tue 20-11-18 17:17:00, Kirill A. Shutemov wrote:
> On Tue, Nov 20, 2018 at 03:12:07PM +0100, Michal Hocko wrote:
> > On Tue 20-11-18 17:07:15, Kirill A. Shutemov wrote:
> > > On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > filemap_map_pages takes a speculative reference to each page in the
> > > > range before it tries to lock that page. While this is correct it
> > > > also can influence page migration which will bail out when seeing
> > > > an elevated reference count. The faultaround code would bail on
> > > > seeing a locked page so we can pro-actively check the PageLocked
> > > > bit before page_cache_get_speculative and prevent from pointless
> > > > reference count churn.
> > > 
> > > Looks fine to me.
> > 
> > Thanks for the review.
> > 
> > > But please drop a line of comment in the code. As is it might be confusing
> > > for a reader.
> > 
> > This?
> 
> Yep.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Cool, thanks! I will wait some more time for review feedback for other
patches and then repost with this folded in.
David Hildenbrand Nov. 20, 2018, 2:33 p.m. UTC | #5
On 20.11.18 14:43, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> filemap_map_pages takes a speculative reference to each page in the
> range before it tries to lock that page. While this is correct it
> also can influence page migration which will bail out when seeing
> an elevated reference count. The faultaround code would bail on
> seeing a locked page so we can pro-actively check the PageLocked
> bit before page_cache_get_speculative and prevent from pointless
> reference count churn.
> 
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/filemap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 81adec8ee02c..c76d6a251770 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2553,6 +2553,9 @@ void filemap_map_pages(struct vm_fault *vmf,
>  			goto next;
>  
>  		head = compound_head(page);
> +
> +		if (PageLocked(head))
> +			goto next;
>  		if (!page_cache_get_speculative(head))
>  			goto next;
>  
> 

Right, no sense in referencing a page if we know we will drop the
reference right away.

Reviewed-by: David Hildenbrand <david@redhat.com>
Hugh Dickins Nov. 21, 2018, 1:47 a.m. UTC | #6
On Tue, 20 Nov 2018, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> filemap_map_pages takes a speculative reference to each page in the
> range before it tries to lock that page. While this is correct it
> also can influence page migration which will bail out when seeing
> an elevated reference count. The faultaround code would bail on
> seeing a locked page so we can pro-actively check the PageLocked
> bit before page_cache_get_speculative and prevent from pointless
> reference count churn.
> 
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Hugh Dickins <hughd@google.com>

though I think this patch is more useful to the avoid atomic ops,
and unnecessary dirtying of the cacheline, than to avoid the very
transient elevation of refcount, which will not affect page migration
very much.

> ---
>  mm/filemap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 81adec8ee02c..c76d6a251770 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2553,6 +2553,9 @@ void filemap_map_pages(struct vm_fault *vmf,
>  			goto next;
>  
>  		head = compound_head(page);
> +
> +		if (PageLocked(head))
> +			goto next;
>  		if (!page_cache_get_speculative(head))
>  			goto next;
>  
> -- 
> 2.19.1
William Kucharski Nov. 21, 2018, 4:51 a.m. UTC | #7
> On Nov 20, 2018, at 7:12 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> +		/*
> +		 * Check the locked pages before taking a reference to not
> +		 * go in the way of migration.
> +		 */

Could you make this a tiny bit more explanative, something like:

+		/*
+		 * Check for a locked page first, as a speculative
+		 * reference may adversely influence page migration.
+		 */

Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Michal Hocko Nov. 21, 2018, 7:07 a.m. UTC | #8
On Tue 20-11-18 21:51:39, William Kucharski wrote:
> 
> 
> > On Nov 20, 2018, at 7:12 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > +		/*
> > +		 * Check the locked pages before taking a reference to not
> > +		 * go in the way of migration.
> > +		 */
> 
> Could you make this a tiny bit more explanative, something like:
> 
> +		/*
> +		 * Check for a locked page first, as a speculative
> +		 * reference may adversely influence page migration.
> +		 */

sure

> 
> Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Thanks!
Michal Hocko Nov. 21, 2018, 7:11 a.m. UTC | #9
On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> On Tue, 20 Nov 2018, Michal Hocko wrote:
> 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > filemap_map_pages takes a speculative reference to each page in the
> > range before it tries to lock that page. While this is correct it
> > also can influence page migration which will bail out when seeing
> > an elevated reference count. The faultaround code would bail on
> > seeing a locked page so we can pro-actively check the PageLocked
> > bit before page_cache_get_speculative and prevent from pointless
> > reference count churn.
> > 
> > Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Acked-by: Hugh Dickins <hughd@google.com>

Thanks!

> though I think this patch is more useful to the avoid atomic ops,
> and unnecessary dirtying of the cacheline, than to avoid the very
> transient elevation of refcount, which will not affect page migration
> very much.

Are you sure it would really be transient? In other words is it possible
that the fault around can block migration repeatedly under refault heavy
workload? I just couldn't convince myself, to be honest.
Hugh Dickins Nov. 22, 2018, 2:27 a.m. UTC | #10
On Wed, 21 Nov 2018, Michal Hocko wrote:
> On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> > On Tue, 20 Nov 2018, Michal Hocko wrote:
> > 
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > filemap_map_pages takes a speculative reference to each page in the
> > > range before it tries to lock that page. While this is correct it
> > > also can influence page migration which will bail out when seeing
> > > an elevated reference count. The faultaround code would bail on
> > > seeing a locked page so we can pro-actively check the PageLocked
> > > bit before page_cache_get_speculative and prevent from pointless
> > > reference count churn.
> > > 
> > > Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> > > Suggested-by: Jan Kara <jack@suse.cz>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > 
> > Acked-by: Hugh Dickins <hughd@google.com>
> 
> Thanks!
> 
> > though I think this patch is more useful to the avoid atomic ops,
> > and unnecessary dirtying of the cacheline, than to avoid the very
> > transient elevation of refcount, which will not affect page migration
> > very much.
> 
> Are you sure it would really be transient? In other words is it possible
> that the fault around can block migration repeatedly under refault heavy
> workload? I just couldn't convince myself, to be honest.

I don't deny that it is possible: I expect that, using fork() (which does
not copy the ptes in a shared file vma), you can construct a test case
where each child faults one or another page near a page of no interest,
and that page of no interest is a target of migration perpetually
frustrated by filemap_map_pages()'s briefly raised refcount.

But I suggest that's a third-order effect: well worth fixing because
it's easily and uncontroversially dealt with, as you have; but not of
great importance.

The first-order effect is migration conspiring to defeat itself: that's
what my put_and_wait_on_page_locked() patch, in other thread, is about.

The second order effect is when a page that is really wanted is waited
on - the target of a fault, for which page refcount is raised maybe
long before it finally gets into the page table (whereupon it becomes
visible to try_to_unmap(), and its mapcount matches refcount so that
migration can fully account for the page).  One class of that can be
well dealt with by using put_and_wait_on_page_locked_killable() in
lock_page_or_retry(), but I was keeping that as a future instalment.

But I shouldn't denigrate the transient case by referring so lightly
to migrate_pages()' 10 attempts: each of those failed attempts can
be very expensive, unmapping and TLB flushing (including IPIs) and
remapping. It may well be that 2 or 3 would be a more cost-effective
number of attempts, at least when the page is mapped.

Hugh
Michal Hocko Nov. 22, 2018, 9:05 a.m. UTC | #11
On Wed 21-11-18 18:27:11, Hugh Dickins wrote:
> On Wed, 21 Nov 2018, Michal Hocko wrote:
> > On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> > > On Tue, 20 Nov 2018, Michal Hocko wrote:
> > > 
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > filemap_map_pages takes a speculative reference to each page in the
> > > > range before it tries to lock that page. While this is correct it
> > > > also can influence page migration which will bail out when seeing
> > > > an elevated reference count. The faultaround code would bail on
> > > > seeing a locked page so we can pro-actively check the PageLocked
> > > > bit before page_cache_get_speculative and prevent from pointless
> > > > reference count churn.
> > > > 
> > > > Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> > > > Suggested-by: Jan Kara <jack@suse.cz>
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > 
> > > Acked-by: Hugh Dickins <hughd@google.com>
> > 
> > Thanks!
> > 
> > > though I think this patch is more useful to the avoid atomic ops,
> > > and unnecessary dirtying of the cacheline, than to avoid the very
> > > transient elevation of refcount, which will not affect page migration
> > > very much.
> > 
> > Are you sure it would really be transient? In other words is it possible
> > that the fault around can block migration repeatedly under refault heavy
> > workload? I just couldn't convince myself, to be honest.
> 
> I don't deny that it is possible: I expect that, using fork() (which does
> not copy the ptes in a shared file vma), you can construct a test case
> where each child faults one or another page near a page of no interest,
> and that page of no interest is a target of migration perpetually
> frustrated by filemap_map_pages()'s briefly raised refcount.

The other issue I am debugging and which very likely has the same
underlying issue in the end has shown
[  883.930477] rac1 kernel: page:ffffea2084bf5cc0 count:1889 mapcount:1887 mapping:ffff8833c82c9ad8 index:0x6b
[  883.930485] rac1 kernel: ext4_da_aops [ext4]
[  883.930497] rac1 kernel: name:"libc-2.22.so"
[  883.931241] rac1 kernel: do_migrate_range done ret=23

pattern. After we have disabled the faultaround the failure has moved to
a different page but libc hasn't shown up again. This might be a matter
of (bad)luck and timing. But we thought that it is not too unlikely for
faultaround on such a shared page to strike in.

> But I suggest that's a third-order effect: well worth fixing because
> it's easily and uncontroversially dealt with, as you have; but not of
> great importance.
> 
> The first-order effect is migration conspiring to defeat itself: that's
> what my put_and_wait_on_page_locked() patch, in other thread, is about.

yes. That is obviously a much more effective fix.

> The second order effect is when a page that is really wanted is waited
> on - the target of a fault, for which page refcount is raised maybe
> long before it finally gets into the page table (whereupon it becomes
> visible to try_to_unmap(), and its mapcount matches refcount so that
> migration can fully account for the page).  One class of that can be
> well dealt with by using put_and_wait_on_page_locked_killable() in
> lock_page_or_retry(), but I was keeping that as a future instalment.
> 
> But I shouldn't denigrate the transient case by referring so lightly
> to migrate_pages()' 10 attempts: each of those failed attempts can
> be very expensive, unmapping and TLB flushing (including IPIs) and
> remapping. It may well be that 2 or 3 would be a more cost-effective
> number of attempts, at least when the page is mapped.

If you want some update to the comment in this function or to the
changelog, I am open of course. Right now I have
+                * Check for a locked page first, as a speculative
+                * reference may adversely influence page migration.
as suggested by William.
Hugh Dickins Nov. 23, 2018, 8:22 p.m. UTC | #12
On Thu, 22 Nov 2018, Michal Hocko wrote:
> 
> If you want some update to the comment in this function or to the
> changelog, I am open of course. Right now I have
> +                * Check for a locked page first, as a speculative
> +                * reference may adversely influence page migration.
> as suggested by William.

I ought to care, since I challenged the significance of this aspect
in the first place, but find I don't care enough - I much prefer the
patch to the comments on and in it, but have not devised any wording
that I'd prefer to see instead - sorry.

Hugh
diff mbox series

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8ee02c..c76d6a251770 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2553,6 +2553,9 @@  void filemap_map_pages(struct vm_fault *vmf,
 			goto next;
 
 		head = compound_head(page);
+
+		if (PageLocked(head))
+			goto next;
 		if (!page_cache_get_speculative(head))
 			goto next;