[RFC,1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages

Message ID 20210208103812.32056-2-osalvador@suse.de (mailing list archive)
State New, archived
Series Make alloc_contig_range handle Hugetlb pages

Commit Message

Oscar Salvador Feb. 8, 2021, 10:38 a.m. UTC
alloc_contig_range is not prepared to handle hugetlb pages and will
fail if it ever sees one, but since they can be migrated like any other
page (LRU and Movable), it makes sense to also handle them.

For now, do it only when coming from alloc_contig_range.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/compaction.c | 17 +++++++++++++++++
 mm/vmscan.c     |  5 +++--
 2 files changed, 20 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Feb. 10, 2021, 8:56 a.m. UTC | #1
On 08.02.21 11:38, Oscar Salvador wrote:
> alloc_contig_range is not prepared to handle hugetlb pages and will
> fail if it ever sees one, but since they can be migrated like any other
> page (LRU and Movable), it makes sense to also handle them.
> 
> For now, do it only when coming from alloc_contig_range.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>   mm/compaction.c | 17 +++++++++++++++++
>   mm/vmscan.c     |  5 +++--
>   2 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e5acb9714436..89cd2e60da29 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -940,6 +940,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   			goto isolate_fail;
>   		}
>   
> +		/*
> +		 * Handle hugetlb pages only when coming from alloc_contig
> +		 */
> +		if (PageHuge(page) && cc->alloc_contig) {
> +			if (page_count(page)) {

I wonder if we should care about races here. What if someone 
concurrently allocates/frees?

Note that PageHuge() succeeds on tail pages, while isolate_huge_page() does
not; I assume we'll have to handle that as well.

I wonder if it would make sense to move some of the magic to hugetlb
code and handle it there with fewer chances for races (isolate if used,
alloc-and-dissolve if not).

> +				/*
> +				 * Hugetlb page in-use. Isolate and migrate.
> +				 */
> +				if (isolate_huge_page(page, &cc->migratepages)) {
> +					low_pfn += compound_nr(page) - 1;
> +					goto isolate_success_no_list;
> +				}
> +			}
Oscar Salvador Feb. 10, 2021, 2:09 p.m. UTC | #2
On Wed, Feb 10, 2021 at 09:56:37AM +0100, David Hildenbrand wrote:
> On 08.02.21 11:38, Oscar Salvador wrote:
> > [...]
> > +		/*
> > +		 * Handle hugetlb pages only when coming from alloc_contig
> > +		 */
> > +		if (PageHuge(page) && cc->alloc_contig) {
> > +			if (page_count(page)) {
> 
> I wonder if we should care about races here. What if someone concurrently
> allocates/frees?
> 
> Note that PageHuge() succeeds on tail pages, while isolate_huge_page() does
> not; I assume we'll have to handle that as well.
> 
> I wonder if it would make sense to move some of the magic to hugetlb code
> and handle it there with fewer chances for races (isolate if used,
> alloc-and-dissolve if not).

Yes, it makes sense to keep the magic in hugetlb code.
Note, though, that removing all races might be tricky.

isolate_huge_page() checks for PageHuge under hugetlb_lock,
so there is a race between a call to PageHuge(x) and a subsequent
call to isolate_huge_page().
But we should be fine as isolate_huge_page will fail in case the page is
no longer HugeTLB.
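
The pattern described here - an optimistic lockless check whose result is
re-validated under the lock - can be sketched in plain userspace C. All
names and types below are mocks invented for illustration, not the kernel
implementation:

```c
#include <stdbool.h>

/*
 * Userspace mock, not kernel code: models a page whose hugetlb status
 * can change between a lockless hint (PageHuge()-like) and the locked
 * isolation attempt (isolate_huge_page()-like).
 */
struct mock_page {
	bool is_huge;
	bool isolated;
};

/* Lockless hint: may already be stale by the time the caller acts on it. */
bool mock_page_huge(const struct mock_page *p)
{
	return p->is_huge;
}

/*
 * Isolation recheck: in the kernel this runs under hugetlb_lock, so a
 * page that stopped being hugetlb after the hint simply fails here and
 * the scan moves on.
 */
bool mock_isolate_huge_page(struct mock_page *p)
{
	if (!p->is_huge)
		return false;	/* raced: no longer hugetlb */
	p->isolated = true;
	return true;
}
```

The point is that the hint may lie, but the locked recheck keeps the race
harmless: isolation just fails and the caller skips the page.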

Also, since isolate_migratepages_block() gets called with ranges
pageblock aligned, we should never be handling tail pages in the core
of the function, e.g. the same way we handle THP:

    /* The whole page is taken off the LRU; skip the tail pages. */
    if (PageCompound(page))
           low_pfn += compound_nr(page) - 1;
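
The `- 1` in the snippet above is a detail worth spelling out: the scan
loop increments low_pfn itself, so the skip only advances to the last tail
pfn. A minimal sketch of that arithmetic (helper name invented for
illustration):

```c
/*
 * Sketch of the tail-page skip used for compound pages: advance to the
 * last tail pfn and let the scan loop's own low_pfn++ step onto the
 * first pfn after the compound page.
 */
unsigned long skip_compound_tail_pages(unsigned long low_pfn,
				       unsigned long nr_pages)
{
	return low_pfn + nr_pages - 1;
}
```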

But all in all, the code has to be more bullet-proof. This RFC was more
of a PoC to see whether the overall approach was crazy.
And as I said, moving the handling of hugetlb pages to hugetlb.c might
help towards better error/race handling.
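
A hugetlb-side entry point along the lines David suggests might, with the
status check and the action in a single locked section, choose between the
two cases. Below is a userspace mock of that shape; every name is
hypothetical and this is not the eventual kernel interface:

```c
#include <stdbool.h>

/*
 * Userspace mock of a hugetlb-side helper that decides, under one lock,
 * whether to isolate an in-use hugetlb page or dissolve a free one.
 */
struct mock_hpage {
	bool is_huge;
	int refcount;	/* 0: free in the pool; >0: in use */
};

enum mock_result {
	MOCK_NOT_HUGE,	/* raced: no longer hugetlb, caller just rescans */
	MOCK_ISOLATED,	/* in use: would be queued for migration */
	MOCK_DISSOLVED,	/* free: would be alloc-and-dissolved */
};

enum mock_result mock_isolate_or_dissolve(struct mock_hpage *p)
{
	/* In the kernel, everything below would run under hugetlb_lock. */
	if (!p->is_huge)
		return MOCK_NOT_HUGE;
	if (p->refcount > 0)
		return MOCK_ISOLATED;
	p->is_huge = false;	/* dissolved back to order-0 pages */
	return MOCK_DISSOLVED;
}
```

Keeping both branches behind the same lock is what shrinks the race window:
the caller no longer acts on a stale PageHuge()/page_count() snapshot.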

Thanks for having a look ;-)
David Hildenbrand Feb. 10, 2021, 2:11 p.m. UTC | #3
On 10.02.21 15:09, Oscar Salvador wrote:
> On Wed, Feb 10, 2021 at 09:56:37AM +0100, David Hildenbrand wrote:
>> On 08.02.21 11:38, Oscar Salvador wrote:
>>> [...]
>>
>> I wonder if we should care about races here. What if someone concurrently
>> allocates/frees?
>>
>> Note that PageHuge() succeeds on tail pages, while isolate_huge_page() does
>> not; I assume we'll have to handle that as well.
>>
>> I wonder if it would make sense to move some of the magic to hugetlb code
>> and handle it there with fewer chances for races (isolate if used,
>> alloc-and-dissolve if not).
> 
> Yes, it makes sense to keep the magic in hugetlb code.
> Note, though, that removing all races might be tricky.
> 
> isolate_huge_page() checks for PageHuge under hugetlb_lock,
> so there is a race between a call to PageHuge(x) and a subsequent
> call to isolate_huge_page().
> But we should be fine as isolate_huge_page will fail in case the page is
> no longer HugeTLB.
> 
> Also, since isolate_migratepages_block() gets called with ranges
> pageblock aligned, we should never be handling tail pages in the core
> of the function, e.g. the same way we handle THP:

Gigantic pages? (spoiler: see my comments to next patch :) )
Oscar Salvador Feb. 10, 2021, 2:14 p.m. UTC | #4
On Wed, Feb 10, 2021 at 03:11:05PM +0100, David Hildenbrand wrote:
> On 10.02.21 15:09, Oscar Salvador wrote:
> > [...]
> > Also, since isolate_migratepages_block() gets called with ranges
> > pageblock aligned, we should never be handling tail pages in the core
> > of the function, e.g. the same way we handle THP:
> 
> Gigantic pages? (spoiler: see my comments to next patch :) )

Oh, yeah, that sucks.
We had the same problem in scan_movable_pages/has_unmovable_pages
with such pages.

Uhm, I will try to be more careful :-)
David Hildenbrand Feb. 10, 2021, 2:22 p.m. UTC | #5
On 10.02.21 15:14, Oscar Salvador wrote:
> On Wed, Feb 10, 2021 at 03:11:05PM +0100, David Hildenbrand wrote:
>> [...]
>>
>> Gigantic pages? (spoiler: see my comments to next patch :) )
> 
> Oh, yeah, that sucks.
> We had the same problem in scan_movable_pages/has_unmovable_pages
> with such pages.
> 
> Uhm, I will try to be more careful :-)
> 

Gigantic pages are a minefield. Not your fault :)
Mike Kravetz Feb. 11, 2021, 12:56 a.m. UTC | #6
On 2/8/21 2:38 AM, Oscar Salvador wrote:
> [...]
> +		/*
> +		 * Handle hugetlb pages only when coming from alloc_contig
> +		 */
> +		if (PageHuge(page) && cc->alloc_contig) {
> +			if (page_count(page)) {

Thanks for doing this!

I agree with everything in the discussion you and David had.  This code
is racy, but since we are scanning locklessly there is no way to eliminate
the races entirely.  Best to just minimize the windows and document them.
David Hildenbrand Feb. 11, 2021, 10:40 a.m. UTC | #7
On 11.02.21 01:56, Mike Kravetz wrote:
> On 2/8/21 2:38 AM, Oscar Salvador wrote:
>> [...]
> 
> Thanks for doing this!
> 
> I agree with everything in the discussion you and David had.  This code
> is racy, but since we are scanning lockless there is no way to eliminate
> them all.  Best to just minimize the windows and document.
>

Agreed - and make sure that we don't have strange side effects (e.g., in
the next patch, allocate a new page, try to dissolve. Dissolving fails, so what
happens to the just-allocated page?)

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index e5acb9714436..89cd2e60da29 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -940,6 +940,22 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 		}
 
+		/*
+		 * Handle hugetlb pages only when coming from alloc_contig
+		 */
+		if (PageHuge(page) && cc->alloc_contig) {
+			if (page_count(page)) {
+				/*
+				 * Hugetlb page in-use. Isolate and migrate.
+				 */
+				if (isolate_huge_page(page, &cc->migratepages)) {
+					low_pfn += compound_nr(page) - 1;
+					goto isolate_success_no_list;
+				}
+			}
+			goto isolate_fail;
+		}
+
 		/*
 		 * Check may be lockless but that's ok as we recheck later.
 		 * It's possible to migrate LRU and non-lru movable pages.
@@ -1041,6 +1057,7 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
+isolate_success_no_list:
 		cc->nr_migratepages += compound_nr(page);
 		nr_isolated += compound_nr(page);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b1b574ad199d..0803adca4469 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1506,8 +1506,9 @@  unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 	LIST_HEAD(clean_pages);
 
 	list_for_each_entry_safe(page, next, page_list, lru) {
-		if (page_is_file_lru(page) && !PageDirty(page) &&
-		    !__PageMovable(page) && !PageUnevictable(page)) {
+		if (!PageHuge(page) && page_is_file_lru(page) &&
+		    !PageDirty(page) && !__PageMovable(page) &&
+		    !PageUnevictable(page)) {
 			ClearPageActive(page);
 			list_move(&page->lru, &clean_pages);
 		}