[2/2] mm,thp,compaction,cma: allow THP migration for CMA allocations

Message ID 3289dc5e6c4c3174999598d8293adf8ed3e93b57.1582321645.git.riel@surriel.com
State New
Series
  • fix THP migration for CMA allocations

Commit Message

Rik van Riel Feb. 21, 2020, 9:53 p.m. UTC
The code to implement THP migrations already exists, and the code
for CMA to clear out a region of memory already exists.

Only a few small tweaks are needed to allow CMA to move THP memory
when attempting an allocation from alloc_contig_range.

With these changes, migrating THPs from a CMA area works when
allocating a 1GB hugepage from CMA memory.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 mm/compaction.c | 16 +++++++++-------
 mm/page_alloc.c |  6 ++++--
 2 files changed, 13 insertions(+), 9 deletions(-)

Comments

Zi Yan Feb. 21, 2020, 10:31 p.m. UTC | #1
On 21 Feb 2020, at 16:53, Rik van Riel wrote:

> The code to implement THP migrations already exists, and the code
> for CMA to clear out a region of memory already exists.
>
> Only a few small tweaks are needed to allow CMA to move THP memory
> when attempting an allocation from alloc_contig_range.
>
> With these changes, migrating THPs from a CMA area works when
> allocating a 1GB hugepage from CMA memory.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  mm/compaction.c | 16 +++++++++-------
>  mm/page_alloc.c |  6 ++++--
>  2 files changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 672d3c78c6ab..f3e05c91df62 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -894,12 +894,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>
>  		/*
>  		 * Regardless of being on LRU, compound pages such as THP and
> -		 * hugetlbfs are not to be compacted. We can potentially save
> -		 * a lot of iterations if we skip them at once. The check is
> -		 * racy, but we can consider only valid values and the only
> -		 * danger is skipping too much.
> +		 * hugetlbfs are not to be compacted most of the time. We can
> +		 * potentially save a lot of iterations if we skip them at
> +		 * once. The check is racy, but we can consider only valid
> +		 * values and the only danger is skipping too much.
>  		 */

Maybe add “we do want to move them when allocating contiguous memory using CMA” to help
people understand the context of using cc->alloc_contig?

> -		if (PageCompound(page)) {
> +		if (PageCompound(page) && !cc->alloc_contig) {
>  			const unsigned int order = compound_order(page);
>
>  			if (likely(order < MAX_ORDER))
> @@ -969,7 +969,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			 * and it's on LRU. It can only be a THP so the order
>  			 * is safe to read and it's 0 for tail pages.
>  			 */
> -			if (unlikely(PageCompound(page))) {
> +			if (unlikely(PageCompound(page) && !cc->alloc_contig)) {
>  				low_pfn += compound_nr(page) - 1;
>  				goto isolate_fail;
>  			}
> @@ -981,7 +981,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		if (__isolate_lru_page(page, isolate_mode) != 0)
>  			goto isolate_fail;
>
> -		VM_BUG_ON_PAGE(PageCompound(page), page);
> +		/* The whole page is taken off the LRU; skip the tail pages. */
> +		if (PageCompound(page))
> +			low_pfn += compound_nr(page) - 1;
>
>  		/* Successfully isolated */
>  		del_page_from_lru_list(page, lruvec, page_lru(page));
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a36736812596..38c8ddfcecc8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8253,14 +8253,16 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
> +		 * THPs are on the LRU, but need to be counted as #small pages.
>  		 * We need not scan over tail pages because we don't
>  		 * handle each tail page individually in migration.
>  		 */
> -		if (PageHuge(page)) {
> +		if (PageTransHuge(page)) {
>  			struct page *head = compound_head(page);
>  			unsigned int skip_pages;
>
> -			if (!hugepage_migration_supported(page_hstate(head)))
> +			if (PageHuge(page) &&
> +			    !hugepage_migration_supported(page_hstate(head)))
>  				return page;
>
>  			skip_pages = compound_nr(head) - (page - head);
> -- 
> 2.24.1

Everything else looks good to me.

Reviewed-by: Zi Yan <ziy@nvidia.com>


--
Best Regards,
Yan Zi

Rik van Riel Feb. 21, 2020, 10:35 p.m. UTC | #2
On Fri, 2020-02-21 at 17:31 -0500, Zi Yan wrote:
> On 21 Feb 2020, at 16:53, Rik van Riel wrote:
> 
> > +++ b/mm/compaction.c
> > @@ -894,12 +894,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > 
> >  		/*
> >  		 * Regardless of being on LRU, compound pages such as THP and
> > -		 * hugetlbfs are not to be compacted. We can potentially save
> > -		 * a lot of iterations if we skip them at once. The check is
> > -		 * racy, but we can consider only valid values and the only
> > -		 * danger is skipping too much.
> > +		 * hugetlbfs are not to be compacted most of the time. We can
> > +		 * potentially save a lot of iterations if we skip them at
> > +		 * once. The check is racy, but we can consider only valid
> > +		 * values and the only danger is skipping too much.
> >  		 */
> 
> Maybe add “we do want to move them when allocating contiguous memory using CMA” to help
> people understand the context of using cc->alloc_contig?

I can certainly do that.

I'll wait for feedback from other people to see if
more changes are wanted, and plan to post v2 by
Tuesday or so :)

Vlastimil Babka Feb. 24, 2020, 3:29 p.m. UTC | #3
On 2/21/20 10:53 PM, Rik van Riel wrote:
> @@ -981,7 +981,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		if (__isolate_lru_page(page, isolate_mode) != 0)
>  			goto isolate_fail;
>  
> -		VM_BUG_ON_PAGE(PageCompound(page), page);
> +		/* The whole page is taken off the LRU; skip the tail pages. */
> +		if (PageCompound(page))
> +			low_pfn += compound_nr(page) - 1;
>  
>  		/* Successfully isolated */
>  		del_page_from_lru_list(page, lruvec, page_lru(page));

This continues by:
inc_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page));


I think it now needs to use mod_node_page_state() with
hpage_nr_pages(page) otherwise the counter will underflow after the
migration?
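
To make the underflow concern concrete, here is a minimal userspace sketch, not kernel code. It assumes a THP made of 512 base pages (2MB with 4kB pages) and models the NR_ISOLATED_* counter as a plain long. struct toy_node_stat and the three helpers are hypothetical names invented for this illustration. It also assumes, as the concern above implies, that the side undoing the isolation subtracts the number of base pages.

```c
/* Assumption: one THP covers 512 base pages (2MB THP, 4kB base pages). */
#define THP_NR_PAGES 512L

/* Toy model of the NR_ISOLATED_* node counter; not a kernel structure. */
struct toy_node_stat {
	long nr_isolated;
};

/* Isolation side as in the v1 patch: adds one per isolated page. */
static void isolate_inc_one(struct toy_node_stat *st)
{
	st->nr_isolated += 1;
}

/* Isolation side as suggested in review: adds the base-page count. */
static void isolate_mod_nr(struct toy_node_stat *st, long nr_pages)
{
	st->nr_isolated += nr_pages;
}

/* Putback/migration side: subtracts the base-page count. */
static void putback_mod_nr(struct toy_node_stat *st, long nr_pages)
{
	st->nr_isolated -= nr_pages;
}
```

With a single THP, pairing isolate_inc_one() with putback_mod_nr() leaves the toy counter at 1 - 512 = -511, while pairing isolate_mod_nr() with putback_mod_nr() returns it to 0.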

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a36736812596..38c8ddfcecc8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8253,14 +8253,16 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>  
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
> +		 * THPs are on the LRU, but need to be counted as #small pages.
>  		 * We need not scan over tail pages because we don't
>  		 * handle each tail page individually in migration.
>  		 */
> -		if (PageHuge(page)) {
> +		if (PageTransHuge(page)) {

Hmm, PageTransHuge() has VM_BUG_ON() for tail pages, while this code is
written so that it can encounter a tail page and skip the rest of the
compound page properly. So I would be worried about this.

Also PageTransHuge() is basically just a PageHead() so for each
non-hugetlbfs compound page this will assume it's a THP, while correctly
it should reach the __PageMovable() || PageLRU(page) tests below.

So probably this should do something like:

if (PageHuge(page) || PageTransCompound(page)) {
...
   if (PageHuge(page) && !hugepage_migration_supported(page_hstate(head))) return page;
   if (!PageLRU(head) && !__PageMovable(head)) return page;
...

>  			struct page *head = compound_head(page);
>  			unsigned int skip_pages;
>  
> -			if (!hugepage_migration_supported(page_hstate(head)))
> +			if (PageHuge(page) &&
> +			    !hugepage_migration_supported(page_hstate(head)))
>  				return page;
>  
>  			skip_pages = compound_nr(head) - (page - head);
>

Rik van Riel Feb. 25, 2020, 6:44 p.m. UTC | #4
On Mon, 2020-02-24 at 16:29 +0100, Vlastimil Babka wrote:
> On 2/21/20 10:53 PM, Rik van Riel wrote:
> > @@ -981,7 +981,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >  		if (__isolate_lru_page(page, isolate_mode) != 0)
> >  			goto isolate_fail;
> >  
> > -		VM_BUG_ON_PAGE(PageCompound(page), page);
> > +		/* The whole page is taken off the LRU; skip the tail pages. */
> > +		if (PageCompound(page))
> > +			low_pfn += compound_nr(page) - 1;
> >  
> >  		/* Successfully isolated */
> >  		del_page_from_lru_list(page, lruvec, page_lru(page));
> 
> This continues by:
> inc_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page));
> 
> 
> I think it now needs to use mod_node_page_state() with
> hpage_nr_pages(page) otherwise the counter will underflow after the
> migration?

You are absolutely right. I have not observed the
underflow, but the functions doing the decrementing
use hpage_nr_pages, and I need to do that as well
on the incrementing side.

Change made.

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a36736812596..38c8ddfcecc8 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -8253,14 +8253,16 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> >  
> >  		/*
> >  		 * Hugepages are not in LRU lists, but they're movable.
> > +		 * THPs are on the LRU, but need to be counted as #small pages.
> >  		 * We need not scan over tail pages because we don't
> >  		 * handle each tail page individually in migration.
> >  		 */
> > -		if (PageHuge(page)) {
> > +		if (PageTransHuge(page)) {
> 
> Hmm, PageTransHuge() has VM_BUG_ON() for tail pages, while this code is
> written so that it can encounter a tail page and skip the rest of the
> compound page properly. So I would be worried about this.

Good point, a CMA allocation could start partway into a
compound page. 
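
The tail-page case can be illustrated with the skip arithmetic from the patch, skip_pages = compound_nr(head) - (page - head). The following is a small userspace sketch, not kernel code: the helper names are hypothetical, and plain page frame numbers stand in for struct page pointers.

```c
/*
 * Pages left in the compound page, counting the current one.
 * Mirrors: skip_pages = compound_nr(head) - (page - head).
 */
static unsigned long toy_skip_pages(unsigned long head_pfn,
				    unsigned long page_pfn,
				    unsigned long nr_pages)
{
	return nr_pages - (page_pfn - head_pfn);
}

/* First pfn past the compound page when the scan sits at page_pfn. */
static unsigned long toy_next_pfn(unsigned long head_pfn,
				  unsigned long page_pfn,
				  unsigned long nr_pages)
{
	return page_pfn + toy_skip_pages(head_pfn, page_pfn, nr_pages);
}
```

For a 512-page compound page whose head is at pfn 1024, a scan that enters at pfn 1124 (a tail page) has 412 pages left to skip and resumes at pfn 1536, the same place as a scan that started at the head.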

> Also PageTransHuge() is basically just a PageHead() so for each
> non-hugetlbfs compound page this will assume it's a THP, while correctly
> it should reach the __PageMovable() || PageLRU(page) tests below.
> 
> So probably this should do something like.
> 
> if (PageHuge(page) || PageTransCompound(page)) {
> ...
>    if (PageHuge(page) && !hpage_migration_supported)) return page.

So far so good.

>    if (!PageLRU(head) && !__PageMovable(head)) return page

I don't get this one, though. What about a THP that has
not made it onto the LRU list yet for some reason?

I don't think anonymous pages are marked __PageMovable,
are they? It looks like they only have the PAGE_MAPPING_ANON
flag set, not the PAGE_MAPPING_MOVABLE one.

What am I missing?

> ...
> 
> >  			struct page *head = compound_head(page);
> >  			unsigned int skip_pages;
> >  
> > -			if (!hugepage_migration_supported(page_hstate(head)))
> > +			if (PageHuge(page) &&
> > +			    !hugepage_migration_supported(page_hstate(head)))
> >  				return page;
> >  
> >  			skip_pages = compound_nr(head) - (page - head);
> > 
> 
>

Vlastimil Babka Feb. 26, 2020, 9:48 a.m. UTC | #5
On 2/25/20 7:44 PM, Rik van Riel wrote:
>> Also PageTransHuge() is basically just a PageHead() so for each
>> non-hugetlbfs compound page this will assume it's a THP, while correctly
>> it should reach the __PageMovable() || PageLRU(page) tests below.
>> 
>> So probably this should do something like.
>> 
>> if (PageHuge(page) || PageTransCompound(page)) {
>> ...
>>    if (PageHuge(page) && !hpage_migration_supported)) return page.
> 
> So far so good.
> 
>>    if (!PageLRU(head) && !__PageMovable(head)) return page
> 
> I don't get this one, though. What about a THP that has
> not made it onto the LRU list yet for some reason?

Uh, is it any different from base pages which have to pass the same check? I
guess the caller could do e.g. lru_add_drain_all() first.

> I don't think anonymous pages are marked __PageMovable,
> are they? It looks like they only have the PAGE_MAPPING_ANON
> flag set, not the PAGE_MAPPING_MOVABLE one.
> 
> What am I missing?

My point is that we should not accept compound pages that are neither a
migratable hugetlbfs page nor a THP, as movable. And your PageTransHuge() test
and my PageTransCompound() is really just a test for all compound pages, not
"hugetlbfs or THP only". I should have perhaps suggested PageCompound() instead
of the PageTransCompound() wrapper, to make it more obvious.

So we should test non-hugetlbfs pages first whether they are the kind of
compound pages that are migratable. THP's should pass this test by PageLRU(),
other compound movable pages by __PageMovable(head).
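
One way to read the ordering described above is sketched below as plain userspace C, not kernel code. struct toy_page and its boolean fields are hypothetical stand-ins for the kernel tests named in the comments, and applying the LRU/movable test only to non-hugetlbfs compound pages is this sketch's interpretation of the explanation.

```c
#include <stdbool.h>

/* Toy page descriptor; each field stands in for a kernel page test. */
struct toy_page {
	bool compound;            /* PageCompound(page) */
	bool hugetlb;             /* PageHuge(page) */
	bool hugetlb_migratable;  /* hugepage_migration_supported(...) */
	bool head_on_lru;         /* PageLRU(head) */
	bool head_movable;        /* __PageMovable(head) */
};

/* Returns true if a compound page should be reported as unmovable. */
static bool toy_compound_unmovable(const struct toy_page *p)
{
	if (!p->compound)
		return false;	/* base pages are handled elsewhere */

	/* hugetlbfs pages: movable only if the hstate supports migration. */
	if (p->hugetlb)
		return !p->hugetlb_migratable;

	/* Other compound pages: a THP passes via the LRU test, other
	 * movable compound pages via the movable test. */
	return !p->head_on_lru && !p->head_movable;
}
```

In this model a THP on the LRU is treated as movable, a compound page that is neither on the LRU nor marked movable is reported as unmovable, and a THP that has not reached the LRU yet is also reported as unmovable, which is the case questioned earlier in the thread.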

> 
>> ...
>> 
>> >  			struct page *head = compound_head(page);
>> >  			unsigned int skip_pages;
>> >  
>> > -			if (!hugepage_migration_supported(page_hstate(head)))
>> > +			if (PageHuge(page) &&
>> > +			    !hugepage_migration_supported(page_hstate(head)))
>> >  				return page;
>> >  
>> >  			skip_pages = compound_nr(head) - (page - head);
>> > 
>> 
>> 
>

Rik van Riel Feb. 26, 2020, 5:53 p.m. UTC | #6
On Wed, 2020-02-26 at 10:48 +0100, Vlastimil Babka wrote:
> On 2/25/20 7:44 PM, Rik van Riel wrote:
> > > Also PageTransHuge() is basically just a PageHead() so for each
> > > non-hugetlbfs compound page this will assume it's a THP, while correctly
> > > it should reach the __PageMovable() || PageLRU(page) tests below.
> > > 
> > > So probably this should do something like.
> > > 
> > > if (PageHuge(page) || PageTransCompound(page)) {
> > > ...
> > >    if (PageHuge(page) && !hpage_migration_supported)) return page.
> > 
> > So far so good.
> > 
> > >    if (!PageLRU(head) && !__PageMovable(head)) return page
> > 
> > I don't get this one, though. What about a THP that has
> > not made it onto the LRU list yet for some reason?
> 
> Uh, is it any different from base pages which have to pass the same
> check? I guess the caller could do e.g. lru_add_drain_all() first.

You are right, it is not different.

As for lru_add_drain_all(), I wonder at what point that
should happen?

It appears that the order in which things are done does
not really provide a good moment:
1) decide to attempt allocating a range of memory
2) scan each page block for unmovable pages
3) if no unmovable pages are found, mark the page block
   MIGRATE_ISOLATE

I wonder if we should do things the opposite way, first
marking the page block MIGRATE_ISOLATE (to prevent new
allocations), then scanning it, and calling lru_add_drain_all
if we encounter a page that looks like it could benefit from
that.

If we still see unmovable pages after that, it is cheap
enough to set the page block back to its previous state.
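
The reordering proposed above can be sketched as a toy state machine in userspace C. Nothing here is kernel code: struct toy_block, its fields, and the helpers are hypothetical, with toy_drain() standing in for lru_add_drain_all() and the isolated flag standing in for MIGRATE_ISOLATE.

```c
#include <stdbool.h>

/* Toy page block; all fields are invented for this illustration. */
struct toy_block {
	bool isolated;		/* stands in for MIGRATE_ISOLATE */
	int pages_in_pagevec;	/* movable pages not yet on the LRU */
	int truly_unmovable;	/* pages that no drain can fix */
	int drains;		/* number of drain calls made */
};

/* A page looks unmovable if it is off the LRU or really unmovable. */
static bool toy_has_unmovable(const struct toy_block *b)
{
	return b->pages_in_pagevec > 0 || b->truly_unmovable > 0;
}

/* Stand-in for lru_add_drain_all(): pending pages reach the LRU. */
static void toy_drain(struct toy_block *b)
{
	b->pages_in_pagevec = 0;
	b->drains++;
}

/* Returns true if the block stays isolated and is ready to migrate. */
static bool toy_isolate_then_scan(struct toy_block *b)
{
	b->isolated = true;		/* 1) stop new allocations first */

	if (b->pages_in_pagevec > 0)
		toy_drain(b);		/* 2) drain only if it could help */

	if (toy_has_unmovable(b)) {
		b->isolated = false;	/* 3) roll back to previous state */
		return false;
	}
	return true;
}
```

In this model a block whose only obstacle is pages still waiting in a pagevec succeeds after one drain, while a block with a genuinely unmovable page is rolled back without draining.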

> > I don't think anonymous pages are marked __PageMovable,
> > are they? It looks like they only have the PAGE_MAPPING_ANON
> > flag set, not the PAGE_MAPPING_MOVABLE one.
> > 
> > What am I missing?
> 
> My point is that we should not accept compound pages that are neither a
> migratable hugetlbfs page nor a THP, as movable.

I have merged your suggestions into my code base. Thank
you for pointing out that 4kB pages have the exact same
restrictions as THPs, and why.

I'll run some tests and will post v2 of the series soon.

Vlastimil Babka Feb. 28, 2020, 3:17 p.m. UTC | #7
On 2/26/20 6:53 PM, Rik van Riel wrote:
> On Wed, 2020-02-26 at 10:48 +0100, Vlastimil Babka wrote:
>> On 2/25/20 7:44 PM, Rik van Riel wrote:
>>
>> Uh, is it any different from base pages which have to pass the same
>> check? I guess the caller could do e.g. lru_add_drain_all() first.
> 
> You are right, it is not different.
> 
> As for lru_add_drain_all(), I wonder at what point that
> should happen?

Right now it seems to be done in alloc_contig_range(), but rather late.

> It appears that the order in which things are done does
> not really provide a good moment:
> 1) decide to attempt allocating a range of memory
> 2) scan each page block for unmovable pages
> 3) if no unmovable pages are found, mark the page block
>    MIGRATE_ISOLATE
> 
> I wonder if we should do things the opposite way, first
> marking the page block MIGRATE_ISOLATE (to prevent new
> allocations), then scanning it, and calling lru_add_drain_all
> if we encounter a page that looks like it could benefit from
> that.
> 
> If we still see unmovable pages after that, it is cheap
> enough to set the page block back to its previous state.

Yeah, it seems like the whole has_unmovable_pages() check isn't very useful
here. It might prevent some unnecessary work, like isolating something,
then finding a non-movable page and rolling back the isolation. But maybe
it's not worth the savings, and also has_unmovable_pages() being false
doesn't guarantee success in the actual isolate+migrate attempt.  And if
it can cause a false negative due to lru pages not drained, then it's
actually worse than if it wasn't called at all.

Rik van Riel March 1, 2020, 2:24 a.m. UTC | #8
On Fri, 2020-02-28 at 16:17 +0100, Vlastimil Babka wrote:
> On 2/26/20 6:53 PM, Rik van Riel wrote:
> > 
> > It appears that the order in which things are done does
> > not really provide a good moment:
> > 1) decide to attempt allocating a range of memory
> > 2) scan each page block for unmovable pages
> > 3) if no unmovable pages are found, mark the page block
> >    MIGRATE_ISOLATE
> > 
> > I wonder if we should do things the opposite way, first
> > marking the page block MIGRATE_ISOLATE (to prevent new
> > allocations), then scanning it, and calling lru_add_drain_all
> > if we encounter a page that looks like it could benefit from
> > that.
> > 
> > If we still see unmovable pages after that, it is cheap
> > enough to set the page block back to its previous state.
> 
> Yeah, it seems like the whole has_unmovable_pages() check isn't very useful
> here. It might prevent some unnecessary work, like isolating something,
> then finding a non-movable page and rolling back the isolation. But maybe
> it's not worth the savings, and also has_unmovable_pages() being false
> doesn't guarantee success in the actual isolate+migrate attempt.  And if
> it can cause a false negative due to lru pages not drained, then it's
> actually worse than if it wasn't called at all.

We'll experiment with that, and see how often it is an
issue in practice.

If this aspect of the code needs improving, I suspect
Roman and I will find it soon enough.

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index 672d3c78c6ab..f3e05c91df62 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -894,12 +894,12 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/*
 		 * Regardless of being on LRU, compound pages such as THP and
-		 * hugetlbfs are not to be compacted. We can potentially save
-		 * a lot of iterations if we skip them at once. The check is
-		 * racy, but we can consider only valid values and the only
-		 * danger is skipping too much.
+		 * hugetlbfs are not to be compacted most of the time. We can
+		 * potentially save a lot of iterations if we skip them at
+		 * once. The check is racy, but we can consider only valid
+		 * values and the only danger is skipping too much.
 		 */
-		if (PageCompound(page)) {
+		if (PageCompound(page) && !cc->alloc_contig) {
 			const unsigned int order = compound_order(page);
 
 			if (likely(order < MAX_ORDER))
@@ -969,7 +969,7 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			 * and it's on LRU. It can only be a THP so the order
 			 * is safe to read and it's 0 for tail pages.
 			 */
-			if (unlikely(PageCompound(page))) {
+			if (unlikely(PageCompound(page) && !cc->alloc_contig)) {
 				low_pfn += compound_nr(page) - 1;
 				goto isolate_fail;
 			}
@@ -981,7 +981,9 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (__isolate_lru_page(page, isolate_mode) != 0)
 			goto isolate_fail;
 
-		VM_BUG_ON_PAGE(PageCompound(page), page);
+		/* The whole page is taken off the LRU; skip the tail pages. */
+		if (PageCompound(page))
+			low_pfn += compound_nr(page) - 1;
 
 		/* Successfully isolated */
 		del_page_from_lru_list(page, lruvec, page_lru(page));
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a36736812596..38c8ddfcecc8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8253,14 +8253,16 @@  struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
+		 * THPs are on the LRU, but need to be counted as #small pages.
 		 * We need not scan over tail pages because we don't
 		 * handle each tail page individually in migration.
 		 */
-		if (PageHuge(page)) {
+		if (PageTransHuge(page)) {
 			struct page *head = compound_head(page);
 			unsigned int skip_pages;
 
-			if (!hugepage_migration_supported(page_hstate(head)))
+			if (PageHuge(page) &&
+			    !hugepage_migration_supported(page_hstate(head)))
 				return page;
 
 			skip_pages = compound_nr(head) - (page - head);