diff mbox series

[v3] mm, compaction: Skip all non-migratable pages during scan

Message ID 20230517161555.84776-1-khalid.aziz@oracle.com (mailing list archive)
State New
Headers show
Series [v3] mm, compaction: Skip all non-migratable pages during scan | expand

Commit Message

Khalid Aziz May 17, 2023, 4:15 p.m. UTC
Pages pinned in memory through extra refcounts can not be migrated.
Currently as isolate_migratepages_block() scans pages for
compaction, it skips any pinned anonymous pages. All non-migratable
pages should be skipped and not just the anonymous pinned pages.
This patch adds a check for extra refcounts on a page to determine
if the page can be migrated.  This was seen as a real issue on a
customer workload where a large number of pages were pinned by vfio
on the host and any attempts to allocate hugepages resulted in
significant amount of cpu time spent in either direct compaction or
in kcompactd scanning vfio pinned pages over and over again that can
not be migrated. 

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Suggested-by: Steve Sistare <steven.sistare@oracle.com>
---
v3:
	- Account for extra ref added by get_page_unless_zero() earlier
	  in isolate_migratepages_block() (Suggested by Huang, Ying)
	- Clean up computation of extra refs to be consistent 
	  (Suggested by Huang, Ying)

v2:
	- Update comments in the code (Suggested by Andrew)
	- Use PagePrivate() instead of page_has_private() (Suggested
	  by Matthew)
	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
	- Use page_ref_count() (Suggested by Matthew)
	- Rename is_pinned_page() to reflect its function more
	  accurately (Suggested by Matthew)

 mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

Comments

David Hildenbrand May 17, 2023, 6:32 p.m. UTC | #1
On 17.05.23 18:15, Khalid Aziz wrote:
> Pages pinned in memory through extra refcounts can not be migrated.
> Currently as isolate_migratepages_block() scans pages for
> compaction, it skips any pinned anonymous pages. All non-migratable
> pages should be skipped and not just the anonymous pinned pages.
> This patch adds a check for extra refcounts on a page to determine
> if the page can be migrated.  This was seen as a real issue on a
> customer workload where a large number of pages were pinned by vfio
> on the host and any attempts to allocate hugepages resulted in
> significant amount of cpu time spent in either direct compaction or
> in kcompactd scanning vfio pinned pages over and over again that can
> not be migrated.

How will this change affect alloc_contig_range(), such as used for CMA 
allocations or virtio-mem? alloc_contig_range() ends up calling 
isolate_migratepages_range() -> isolate_migratepages_block().

We don't want to fail early in case there is a short-term pin that might 
go away any moment after we isolated ... that will make the situation 
worse for these use cases, especially if MIGRATE_CMA or ZONE_MOVABLE is 
involved.
Khalid Aziz May 17, 2023, 10:33 p.m. UTC | #2
On 5/17/23 12:32, David Hildenbrand wrote:
> On 17.05.23 18:15, Khalid Aziz wrote:
>> Pages pinned in memory through extra refcounts can not be migrated.
>> Currently as isolate_migratepages_block() scans pages for
>> compaction, it skips any pinned anonymous pages. All non-migratable
>> pages should be skipped and not just the anonymous pinned pages.
>> This patch adds a check for extra refcounts on a page to determine
>> if the page can be migrated.  This was seen as a real issue on a
>> customer workload where a large number of pages were pinned by vfio
>> on the host and any attempts to allocate hugepages resulted in
>> significant amount of cpu time spent in either direct compaction or
>> in kcompactd scanning vfio pinned pages over and over again that can
>> not be migrated.
> 
> How will this change affect alloc_contig_range(), such as used for CMA allocations or virtio-mem? alloc_contig_range() 
> ends up calling isolate_migratepages_range() -> isolate_migratepages_block().
> 
> We don't want to fail early in case there is a short-term pin that might go away any moment after we isolated ... that 
> will make the situation worse for these use cases, especially if MIGRATE_CMA or ZONE_MOVABLE is involved.
> 

You are right that transitory conditions can be problematic. Wouldn't that apply to anonymous pages as well and we do 
skip pinned anonymous pages today? A retry would be the right way to handle transitory conditions I think. At the same 
time, by not scanning long term pinned non-anonymous pages repeatedly, alloc_contig_range() would be helped as well, right?

Nevertheless, we certainly do not want a change that makes overall system behavior worse. Do you see system behavior 
getting worse, or would the retry in cma_alloc() be sufficient to deal with transitory pins?

Thanks,
Khalid
Huang, Ying May 18, 2023, 1:09 a.m. UTC | #3
David Hildenbrand <david@redhat.com> writes:

> On 17.05.23 18:15, Khalid Aziz wrote:
>> Pages pinned in memory through extra refcounts can not be migrated.
>> Currently as isolate_migratepages_block() scans pages for
>> compaction, it skips any pinned anonymous pages. All non-migratable
>> pages should be skipped and not just the anonymous pinned pages.
>> This patch adds a check for extra refcounts on a page to determine
>> if the page can be migrated.  This was seen as a real issue on a
>> customer workload where a large number of pages were pinned by vfio
>> on the host and any attempts to allocate hugepages resulted in
>> significant amount of cpu time spent in either direct compaction or
>> in kcompactd scanning vfio pinned pages over and over again that can
>> not be migrated.
>
> How will this change affect alloc_contig_range(), such as used for CMA
> allocations or virtio-mem? alloc_contig_range() ends up calling 
> isolate_migratepages_range() -> isolate_migratepages_block().

IIUC, cc->alloc_contig can be used to distinguish contiguous allocation
and compaction.  And, from the original commit which introduced
anonymous pages skipping (commit 119d6d59dcc0 ("mm, compaction: avoid
isolating pinned pages ")) and this patch, large number of migration
failure during compaction causes real issue too.  So, I suggest to use
cc->alloc_contig here.

> We don't want to fail early in case there is a short-term pin that
> might go away any moment after we isolated ... that will make the
> situation worse for these use cases, especially if MIGRATE_CMA or
> ZONE_MOVABLE is involved.

Best Regards,
Huang, Ying
Huang, Ying May 18, 2023, 1:21 a.m. UTC | #4
Khalid Aziz <khalid.aziz@oracle.com> writes:

> Pages pinned in memory through extra refcounts can not be migrated.
> Currently as isolate_migratepages_block() scans pages for
> compaction, it skips any pinned anonymous pages. All non-migratable
> pages should be skipped and not just the anonymous pinned pages.
> This patch adds a check for extra refcounts on a page to determine
> if the page can be migrated.  This was seen as a real issue on a
> customer workload where a large number of pages were pinned by vfio
> on the host and any attempts to allocate hugepages resulted in
> significant amount of cpu time spent in either direct compaction or
> in kcompactd scanning vfio pinned pages over and over again that can
> not be migrated. 
>
> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> v3:
> 	- Account for extra ref added by get_page_unless_zero() earlier
> 	  in isolate_migratepages_block() (Suggested by Huang, Ying)
> 	- Clean up computation of extra refs to be consistent 
> 	  (Suggested by Huang, Ying)
>
> v2:
> 	- Update comments in the code (Suggested by Andrew)
> 	- Use PagePrivate() instead of page_has_private() (Suggested
> 	  by Matthew)
> 	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
> 	- Use page_ref_count() (Suggested by Matthew)
> 	- Rename is_pinned_page() to reflect its function more
> 	  accurately (Suggested by Matthew)
>
>  mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
>  1 file changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5a9501e0ae01..f04c00981172 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -764,6 +764,34 @@ static bool too_many_isolated(pg_data_t *pgdat)
>  	return too_many;
>  }
>  
> +/*
> + * Check if this base page should be skipped from isolation because
> + * it has extra refcounts that will prevent it from being migrated.

This appears duplicated with the comments in caller.  It's OK to keep
one only?

> + * This function is called for regular pages only, and not
> + * for THP or hugetlbfs pages. This code is inspired by similar code
> + * in migrate_vma_check_page(), can_split_folio() and
> + * folio_migrate_mapping()

It's not good to duplicate code.  Why not just use
folio_expected_refs()?

Best Regards,
Huang, Ying

> + */
> +static inline bool page_has_extra_refs(struct page *page)
> +{
> +	/* caller holds a ref already from get_page_unless_zero() */
> +	unsigned long extra_refs = 1;
> +
> +	/* anonymous page can have extra ref from swap cache */
> +	if (PageAnon(page))
> +		extra_refs += PageSwapCache(page) ? 1 : 0;
> +	else
> +		extra_refs += 1 + PagePrivate(page);
> +
> +	/*
> +	 * This is an admittedly racy check but good enough to determine
> +	 * if a page is pinned and can not be migrated
> +	 */
> +	if ((page_ref_count(page) - extra_refs) > page_mapcount(page))
> +		return true;
> +	return false;
> +}
> +
>  /**
>   * isolate_migratepages_block() - isolate all migrate-able pages within
>   *				  a single pageblock
> @@ -992,12 +1020,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			goto isolate_fail;
>  
>  		/*
> -		 * Migration will fail if an anonymous page is pinned in memory,
> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
> -		 * admittedly racy check.
> +		 * Migration will fail if a page has extra refcounts
> +		 * preventing it from migrating, so avoid taking
> +		 * lru_lock and isolating it unnecessarily
>  		 */
>  		mapping = page_mapping(page);
> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> +		if (page_has_extra_refs(page))
>  			goto isolate_fail_put;
>  
>  		/*
Khalid Aziz May 18, 2023, 3:07 p.m. UTC | #5
On 5/17/23 19:21, Huang, Ying wrote:
> Khalid Aziz <khalid.aziz@oracle.com> writes:
> 
>> Pages pinned in memory through extra refcounts can not be migrated.
>> Currently as isolate_migratepages_block() scans pages for
>> compaction, it skips any pinned anonymous pages. All non-migratable
>> pages should be skipped and not just the anonymous pinned pages.
>> This patch adds a check for extra refcounts on a page to determine
>> if the page can be migrated.  This was seen as a real issue on a
>> customer workload where a large number of pages were pinned by vfio
>> on the host and any attempts to allocate hugepages resulted in
>> significant amount of cpu time spent in either direct compaction or
>> in kcompactd scanning vfio pinned pages over and over again that can
>> not be migrated.
>>
>> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
>> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> v3:
>> 	- Account for extra ref added by get_page_unless_zero() earlier
>> 	  in isolate_migratepages_block() (Suggested by Huang, Ying)
>> 	- Clean up computation of extra refs to be consistent
>> 	  (Suggested by Huang, Ying)
>>
>> v2:
>> 	- Update comments in the code (Suggested by Andrew)
>> 	- Use PagePrivate() instead of page_has_private() (Suggested
>> 	  by Matthew)
>> 	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
>> 	- Use page_ref_count() (Suggested by Matthew)
>> 	- Rename is_pinned_page() to reflect its function more
>> 	  accurately (Suggested by Matthew)
>>
>>   mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
>>   1 file changed, 32 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 5a9501e0ae01..f04c00981172 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -764,6 +764,34 @@ static bool too_many_isolated(pg_data_t *pgdat)
>>   	return too_many;
>>   }
>>   
>> +/*
>> + * Check if this base page should be skipped from isolation because
>> + * it has extra refcounts that will prevent it from being migrated.
> 
> This appears duplicated with the comments in caller.  It's OK to keep
> one only?

I don't see any harm in explaining what the function is doing and then why this function is being called. This allows 
one to understand code in either place without having to refer to the function and function call both to understand what 
the code is doing.

> 
>> + * This function is called for regular pages only, and not
>> + * for THP or hugetlbfs pages. This code is inspired by similar code
>> + * in migrate_vma_check_page(), can_split_folio() and
>> + * folio_migrate_mapping()
> 
> It's not good to duplicate code.  Why not just use
> folio_expected_refs()?

compaction.c has not been converted over to folios, so I want to not do piecemeal conversion. When compaction code is 
converted over, it would make sense to see if this could also use the folio function.  folio_expected_refs() currently 
is a static function in mm/migrate.c.

Thanks,
Khalid

> 
> Best Regards,
> Huang, Ying
> 
>> + */
>> +static inline bool page_has_extra_refs(struct page *page)
>> +{
>> +	/* caller holds a ref already from get_page_unless_zero() */
>> +	unsigned long extra_refs = 1;
>> +
>> +	/* anonymous page can have extra ref from swap cache */
>> +	if (PageAnon(page))
>> +		extra_refs += PageSwapCache(page) ? 1 : 0;
>> +	else
>> +		extra_refs += 1 + PagePrivate(page);
>> +
>> +	/*
>> +	 * This is an admittedly racy check but good enough to determine
>> +	 * if a page is pinned and can not be migrated
>> +	 */
>> +	if ((page_ref_count(page) - extra_refs) > page_mapcount(page))
>> +		return true;
>> +	return false;
>> +}
>> +
>>   /**
>>    * isolate_migratepages_block() - isolate all migrate-able pages within
>>    *				  a single pageblock
>> @@ -992,12 +1020,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>   			goto isolate_fail;
>>   
>>   		/*
>> -		 * Migration will fail if an anonymous page is pinned in memory,
>> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
>> -		 * admittedly racy check.
>> +		 * Migration will fail if a page has extra refcounts
>> +		 * preventing it from migrating, so avoid taking
>> +		 * lru_lock and isolating it unnecessarily
>>   		 */
>>   		mapping = page_mapping(page);
>> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
>> +		if (page_has_extra_refs(page))
>>   			goto isolate_fail_put;
>>   
>>   		/*
Huang, Ying May 19, 2023, 12:19 a.m. UTC | #6
Khalid Aziz <khalid.aziz@oracle.com> writes:

> On 5/17/23 19:21, Huang, Ying wrote:
>> Khalid Aziz <khalid.aziz@oracle.com> writes:
>> 
>>> Pages pinned in memory through extra refcounts can not be migrated.
>>> Currently as isolate_migratepages_block() scans pages for
>>> compaction, it skips any pinned anonymous pages. All non-migratable
>>> pages should be skipped and not just the anonymous pinned pages.
>>> This patch adds a check for extra refcounts on a page to determine
>>> if the page can be migrated.  This was seen as a real issue on a
>>> customer workload where a large number of pages were pinned by vfio
>>> on the host and any attempts to allocate hugepages resulted in
>>> significant amount of cpu time spent in either direct compaction or
>>> in kcompactd scanning vfio pinned pages over and over again that can
>>> not be migrated.
>>>
>>> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
>>> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>> v3:
>>> 	- Account for extra ref added by get_page_unless_zero() earlier
>>> 	  in isolate_migratepages_block() (Suggested by Huang, Ying)
>>> 	- Clean up computation of extra refs to be consistent
>>> 	  (Suggested by Huang, Ying)
>>>
>>> v2:
>>> 	- Update comments in the code (Suggested by Andrew)
>>> 	- Use PagePrivate() instead of page_has_private() (Suggested
>>> 	  by Matthew)
>>> 	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
>>> 	- Use page_ref_count() (Suggested by Matthew)
>>> 	- Rename is_pinned_page() to reflect its function more
>>> 	  accurately (Suggested by Matthew)
>>>
>>>   mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
>>>   1 file changed, 32 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 5a9501e0ae01..f04c00981172 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -764,6 +764,34 @@ static bool too_many_isolated(pg_data_t *pgdat)
>>>   	return too_many;
>>>   }
>>>   +/*
>>> + * Check if this base page should be skipped from isolation because
>>> + * it has extra refcounts that will prevent it from being migrated.
>> This appears duplicated with the comments in caller.  It's OK to
>> keep
>> one only?
>
> I don't see any harm in explaining what the function is doing and then
> why this function is being called. This allows one to understand code
> in either place without having to refer to the function and function
> call both to understand what the code is doing.

Duplicated comments will waste reader's time.

>> 
>>> + * This function is called for regular pages only, and not
>>> + * for THP or hugetlbfs pages. This code is inspired by similar code
>>> + * in migrate_vma_check_page(), can_split_folio() and
>>> + * folio_migrate_mapping()
>> It's not good to duplicate code.  Why not just use
>> folio_expected_refs()?
>
> compaction.c has not been converted over to folios, so I want to not
> do piecemeal conversion. When compaction code is converted over, it
> would make sense to see if this could also use the folio function.
> folio_expected_refs() currently is a static function in mm/migrate.c.

page_folio() is handy.

Best Regards,
Huang, Ying

>> 
>>> + */
>>> +static inline bool page_has_extra_refs(struct page *page)
>>> +{
>>> +	/* caller holds a ref already from get_page_unless_zero() */
>>> +	unsigned long extra_refs = 1;
>>> +
>>> +	/* anonymous page can have extra ref from swap cache */
>>> +	if (PageAnon(page))
>>> +		extra_refs += PageSwapCache(page) ? 1 : 0;
>>> +	else
>>> +		extra_refs += 1 + PagePrivate(page);
>>> +
>>> +	/*
>>> +	 * This is an admittedly racy check but good enough to determine
>>> +	 * if a page is pinned and can not be migrated
>>> +	 */
>>> +	if ((page_ref_count(page) - extra_refs) > page_mapcount(page))
>>> +		return true;
>>> +	return false;
>>> +}
>>> +
>>>   /**
>>>    * isolate_migratepages_block() - isolate all migrate-able pages within
>>>    *				  a single pageblock
>>> @@ -992,12 +1020,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>   			goto isolate_fail;
>>>     		/*
>>> -		 * Migration will fail if an anonymous page is pinned in memory,
>>> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
>>> -		 * admittedly racy check.
>>> +		 * Migration will fail if a page has extra refcounts
>>> +		 * preventing it from migrating, so avoid taking
>>> +		 * lru_lock and isolating it unnecessarily
>>>   		 */
>>>   		mapping = page_mapping(page);
>>> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
>>> +		if (page_has_extra_refs(page))
>>>   			goto isolate_fail_put;
>>>     		/*
David Hildenbrand May 19, 2023, 9:51 a.m. UTC | #7
On 18.05.23 03:09, Huang, Ying wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 17.05.23 18:15, Khalid Aziz wrote:
>>> Pages pinned in memory through extra refcounts can not be migrated.
>>> Currently as isolate_migratepages_block() scans pages for
>>> compaction, it skips any pinned anonymous pages. All non-migratable
>>> pages should be skipped and not just the anonymous pinned pages.
>>> This patch adds a check for extra refcounts on a page to determine
>>> if the page can be migrated.  This was seen as a real issue on a
>>> customer workload where a large number of pages were pinned by vfio
>>> on the host and any attempts to allocate hugepages resulted in
>>> significant amount of cpu time spent in either direct compaction or
>>> in kcompactd scanning vfio pinned pages over and over again that can
>>> not be migrated.
>>
>> How will this change affect alloc_contig_range(), such as used for CMA
>> allocations or virtio-mem? alloc_contig_range() ends up calling
>> isolate_migratepages_range() -> isolate_migratepages_block().
> 
> IIUC, cc->alloc_contig can be used to distinguish contiguous allocation
> and compaction.  And, from the original commit which introduced
> anonymous pages skipping (commit 119d6d59dcc0 ("mm, compaction: avoid
> isolating pinned pages ")) and this patch, large number of migration
> failure during compaction causes real issue too.  So, I suggest to use
> cc->alloc_contig here.

Agreed. I further wonder if we want to special-case the !alloc_contig 
case also for MIGRATE_CMA and ZONE_MOVABLE, where we cannot have 
longterm page pinnings (e.g., vfio pinned pages).
Huang, Ying May 22, 2023, 5:55 a.m. UTC | #8
David Hildenbrand <david@redhat.com> writes:

> On 18.05.23 03:09, Huang, Ying wrote:
>> David Hildenbrand <david@redhat.com> writes:
>> 
>>> On 17.05.23 18:15, Khalid Aziz wrote:
>>>> Pages pinned in memory through extra refcounts can not be migrated.
>>>> Currently as isolate_migratepages_block() scans pages for
>>>> compaction, it skips any pinned anonymous pages. All non-migratable
>>>> pages should be skipped and not just the anonymous pinned pages.
>>>> This patch adds a check for extra refcounts on a page to determine
>>>> if the page can be migrated.  This was seen as a real issue on a
>>>> customer workload where a large number of pages were pinned by vfio
>>>> on the host and any attempts to allocate hugepages resulted in
>>>> significant amount of cpu time spent in either direct compaction or
>>>> in kcompactd scanning vfio pinned pages over and over again that can
>>>> not be migrated.
>>>
>>> How will this change affect alloc_contig_range(), such as used for CMA
>>> allocations or virtio-mem? alloc_contig_range() ends up calling
>>> isolate_migratepages_range() -> isolate_migratepages_block().
>> IIUC, cc->alloc_contig can be used to distinguish contiguous
>> allocation
>> and compaction.  And, from the original commit which introduced
>> anonymous pages skipping (commit 119d6d59dcc0 ("mm, compaction: avoid
>> isolating pinned pages ")) and this patch, large number of migration
>> failure during compaction causes real issue too.  So, I suggest to use
>> cc->alloc_contig here.
>
> Agreed. I further wonder if we want to special-case the !alloc_contig
> case also for MIGRATE_CMA and ZONE_MOVABLE, where we cannot have 
> longterm page pinnings (e.g., vfio pinned pages).

This makes sense.  The skipping is more accurate in this way.

Best Regards,
Huang, Ying
Khalid Aziz May 22, 2023, 3:12 p.m. UTC | #9
On 5/21/23 23:55, Huang, Ying wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 18.05.23 03:09, Huang, Ying wrote:
>>> David Hildenbrand <david@redhat.com> writes:
>>>
>>>> On 17.05.23 18:15, Khalid Aziz wrote:
>>>>> Pages pinned in memory through extra refcounts can not be migrated.
>>>>> Currently as isolate_migratepages_block() scans pages for
>>>>> compaction, it skips any pinned anonymous pages. All non-migratable
>>>>> pages should be skipped and not just the anonymous pinned pages.
>>>>> This patch adds a check for extra refcounts on a page to determine
>>>>> if the page can be migrated.  This was seen as a real issue on a
>>>>> customer workload where a large number of pages were pinned by vfio
>>>>> on the host and any attempts to allocate hugepages resulted in
>>>>> significant amount of cpu time spent in either direct compaction or
>>>>> in kcompactd scanning vfio pinned pages over and over again that can
>>>>> not be migrated.
>>>>
>>>> How will this change affect alloc_contig_range(), such as used for CMA
>>>> allocations or virtio-mem? alloc_contig_range() ends up calling
>>>> isolate_migratepages_range() -> isolate_migratepages_block().
>>> IIUC, cc->alloc_contig can be used to distinguish contiguous
>>> allocation
>>> and compaction.  And, from the original commit which introduced
>>> anonymous pages skipping (commit 119d6d59dcc0 ("mm, compaction: avoid
>>> isolating pinned pages ")) and this patch, large number of migration
>>> failure during compaction causes real issue too.  So, I suggest to use
>>> cc->alloc_contig here.
>>
>> Agreed. I further wonder if we want to special-case the !alloc_contig
>> case also for MIGRATE_CMA and ZONE_MOVABLE, where we cannot have
>> longterm page pinnings (e.g., vfio pinned pages).
> 
> This makes sense.  The skipping is more accurate in this way.
> 


Something like this?

diff --git a/mm/compaction.c b/mm/compaction.c
index f04c00981172..014e21d3d7e9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1025,7 +1025,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
                  * lru_lock and isolating it unnecessarily
                  */
                 mapping = page_mapping(page);
-               if (page_has_extra_refs(page))
+               if (!cc->alloc_contig && page_has_extra_refs(page))
                         goto isolate_fail_put;

                 /*


Thanks,
Khalid
Huang, Ying May 23, 2023, 1:23 a.m. UTC | #10
Khalid Aziz <khalid.aziz@oracle.com> writes:

> On 5/21/23 23:55, Huang, Ying wrote:
>> David Hildenbrand <david@redhat.com> writes:
>> 
>>> On 18.05.23 03:09, Huang, Ying wrote:
>>>> David Hildenbrand <david@redhat.com> writes:
>>>>
>>>>> On 17.05.23 18:15, Khalid Aziz wrote:
>>>>>> Pages pinned in memory through extra refcounts can not be migrated.
>>>>>> Currently as isolate_migratepages_block() scans pages for
>>>>>> compaction, it skips any pinned anonymous pages. All non-migratable
>>>>>> pages should be skipped and not just the anonymous pinned pages.
>>>>>> This patch adds a check for extra refcounts on a page to determine
>>>>>> if the page can be migrated.  This was seen as a real issue on a
>>>>>> customer workload where a large number of pages were pinned by vfio
>>>>>> on the host and any attempts to allocate hugepages resulted in
>>>>>> significant amount of cpu time spent in either direct compaction or
>>>>>> in kcompactd scanning vfio pinned pages over and over again that can
>>>>>> not be migrated.
>>>>>
>>>>> How will this change affect alloc_contig_range(), such as used for CMA
>>>>> allocations or virtio-mem? alloc_contig_range() ends up calling
>>>>> isolate_migratepages_range() -> isolate_migratepages_block().
>>>> IIUC, cc->alloc_contig can be used to distinguish contiguous
>>>> allocation
>>>> and compaction.  And, from the original commit which introduced
>>>> anonymous pages skipping (commit 119d6d59dcc0 ("mm, compaction: avoid
>>>> isolating pinned pages ")) and this patch, large number of migration
>>>> failure during compaction causes real issue too.  So, I suggest to use
>>>> cc->alloc_contig here.
>>>
>>> Agreed. I further wonder if we want to special-case the !alloc_contig
>>> case also for MIGRATE_CMA and ZONE_MOVABLE, where we cannot have
>>> longterm page pinnings (e.g., vfio pinned pages).
>> This makes sense.  The skipping is more accurate in this way.
>> 
>
>
> Something like this?
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f04c00981172..014e21d3d7e9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1025,7 +1025,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>                  * lru_lock and isolating it unnecessarily
>                  */
>                 mapping = page_mapping(page);
> -               if (page_has_extra_refs(page))
> +               if (!cc->alloc_contig && page_has_extra_refs(page))
>                         goto isolate_fail_put;
>
>                 /*

As suggested by David above, you can check the current zone type (for
ZONE_MOVABLE) and page block migrate type (MIGRATE_CMA) too.  Because
pages there will not be pinned in long term, and should be tried to be
migrated.

Best Regards,
Huang, Ying
Baolin Wang May 23, 2023, 3:42 a.m. UTC | #11
On 5/18/2023 12:15 AM, Khalid Aziz wrote:
> Pages pinned in memory through extra refcounts can not be migrated.
> Currently as isolate_migratepages_block() scans pages for
> compaction, it skips any pinned anonymous pages. All non-migratable
> pages should be skipped and not just the anonymous pinned pages.
> This patch adds a check for extra refcounts on a page to determine
> if the page can be migrated.  This was seen as a real issue on a
> customer workload where a large number of pages were pinned by vfio
> on the host and any attempts to allocate hugepages resulted in
> significant amount of cpu time spent in either direct compaction or
> in kcompactd scanning vfio pinned pages over and over again that can
> not be migrated.
> 
> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> v3:
> 	- Account for extra ref added by get_page_unless_zero() earlier
> 	  in isolate_migratepages_block() (Suggested by Huang, Ying)
> 	- Clean up computation of extra refs to be consistent
> 	  (Suggested by Huang, Ying)
> 
> v2:
> 	- Update comments in the code (Suggested by Andrew)
> 	- Use PagePrivate() instead of page_has_private() (Suggested
> 	  by Matthew)
> 	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
> 	- Use page_ref_count() (Suggested by Matthew)
> 	- Rename is_pinned_page() to reflect its function more
> 	  accurately (Suggested by Matthew)
> 
>   mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
>   1 file changed, 32 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5a9501e0ae01..f04c00981172 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -764,6 +764,34 @@ static bool too_many_isolated(pg_data_t *pgdat)
>   	return too_many;
>   }
>   
> +/*
> + * Check if this base page should be skipped from isolation because
> + * it has extra refcounts that will prevent it from being migrated.
> + * This function is called for regular pages only, and not
> + * for THP or hugetlbfs pages. This code is inspired by similar code
> + * in migrate_vma_check_page(), can_split_folio() and
> + * folio_migrate_mapping()
> + */
> +static inline bool page_has_extra_refs(struct page *page)
> +{
> +	/* caller holds a ref already from get_page_unless_zero() */
> +	unsigned long extra_refs = 1;
> +
> +	/* anonymous page can have extra ref from swap cache */
> +	if (PageAnon(page))
> +		extra_refs += PageSwapCache(page) ? 1 : 0;
> +	else
> +		extra_refs += 1 + PagePrivate(page);
> +
> +	/*
> +	 * This is an admittedly racy check but good enough to determine
> +	 * if a page is pinned and can not be migrated
> +	 */
> +	if ((page_ref_count(page) - extra_refs) > page_mapcount(page))

Could you explain why changing total_mapcount() to page_mapcount()? The 
original commit 829ae0f81ce0 ("mm: migrate: fix THP's mapcount on 
isolation") did fix a THP's mapcount issue with converting to 
total_mapcount().

> +		return true;
> +	return false;
> +}
> +
>   /**
>    * isolate_migratepages_block() - isolate all migrate-able pages within
>    *				  a single pageblock
> @@ -992,12 +1020,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   			goto isolate_fail;
>   
>   		/*
> -		 * Migration will fail if an anonymous page is pinned in memory,
> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
> -		 * admittedly racy check.
> +		 * Migration will fail if a page has extra refcounts
> +		 * preventing it from migrating, so avoid taking
> +		 * lru_lock and isolating it unnecessarily
>   		 */
>   		mapping = page_mapping(page);
> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> +		if (page_has_extra_refs(page))
>   			goto isolate_fail_put;
>   
>   		/*
Khalid Aziz May 23, 2023, 8:54 p.m. UTC | #12
On 5/22/23 21:42, Baolin Wang wrote:
> 
> 
> On 5/18/2023 12:15 AM, Khalid Aziz wrote:
>> Pages pinned in memory through extra refcounts can not be migrated.
>> Currently as isolate_migratepages_block() scans pages for
>> compaction, it skips any pinned anonymous pages. All non-migratable
>> pages should be skipped and not just the anonymous pinned pages.
>> This patch adds a check for extra refcounts on a page to determine
>> if the page can be migrated.  This was seen as a real issue on a
>> customer workload where a large number of pages were pinned by vfio
>> on the host and any attempts to allocate hugepages resulted in
>> significant amount of cpu time spent in either direct compaction or
>> in kcompactd scanning vfio pinned pages over and over again that can
>> not be migrated.
>>
>> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
>> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> v3:
>>     - Account for extra ref added by get_page_unless_zero() earlier
>>       in isolate_migratepages_block() (Suggested by Huang, Ying)
>>     - Clean up computation of extra refs to be consistent
>>       (Suggested by Huang, Ying)
>>
>> v2:
>>     - Update comments in the code (Suggested by Andrew)
>>     - Use PagePrivate() instead of page_has_private() (Suggested
>>       by Matthew)
>>     - Pass mapping to page_has_extrarefs() (Suggested by Matthew)
>>     - Use page_ref_count() (Suggested by Matthew)
>>     - Rename is_pinned_page() to reflect its function more
>>       accurately (Suggested by Matthew)
>>
>>   mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
>>   1 file changed, 32 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 5a9501e0ae01..f04c00981172 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -764,6 +764,34 @@ static bool too_many_isolated(pg_data_t *pgdat)
>>       return too_many;
>>   }
>> +/*
>> + * Check if this base page should be skipped from isolation because
>> + * it has extra refcounts that will prevent it from being migrated.
>> + * This function is called for regular pages only, and not
>> + * for THP or hugetlbfs pages. This code is inspired by similar code
>> + * in migrate_vma_check_page(), can_split_folio() and
>> + * folio_migrate_mapping()
>> + */
>> +static inline bool page_has_extra_refs(struct page *page)
>> +{
>> +    /* caller holds a ref already from get_page_unless_zero() */
>> +    unsigned long extra_refs = 1;
>> +
>> +    /* anonymous page can have extra ref from swap cache */
>> +    if (PageAnon(page))
>> +        extra_refs += PageSwapCache(page) ? 1 : 0;
>> +    else
>> +        extra_refs += 1 + PagePrivate(page);
>> +
>> +    /*
>> +     * This is an admittedly racy check but good enough to determine
>> +     * if a page is pinned and can not be migrated
>> +     */
>> +    if ((page_ref_count(page) - extra_refs) > page_mapcount(page))
> 
> Could you explain why changing total_mapcount() to page_mapcount()? The original commit 829ae0f81ce0 ("mm: migrate: fix 
> THP's mapcount on isolation") did fix a THP's mapcount issue with converting to total_mapcount().

I had based this code on migrate_vma_check_page() which uses page_mapcount() but you are right. In the code for 
page_has_extra_refs(), this should stay total_mapcount(). I will fix it.

Thanks,
Khalid

> 
>> +        return true;
>> +    return false;
>> +}
>> +
>>   /**
>>    * isolate_migratepages_block() - isolate all migrate-able pages within
>>    *                  a single pageblock
>> @@ -992,12 +1020,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>               goto isolate_fail;
>>           /*
>> -         * Migration will fail if an anonymous page is pinned in memory,
>> -         * so avoid taking lru_lock and isolating it unnecessarily in an
>> -         * admittedly racy check.
>> +         * Migration will fail if a page has extra refcounts
>> +         * preventing it from migrating, so avoid taking
>> +         * lru_lock and isolating it unnecessarily
>>            */
>>           mapping = page_mapping(page);
>> -        if (!mapping && (page_count(page) - 1) > total_mapcount(page))
>> +        if (page_has_extra_refs(page))
>>               goto isolate_fail_put;
>>           /*
diff mbox series

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index 5a9501e0ae01..f04c00981172 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -764,6 +764,34 @@  static bool too_many_isolated(pg_data_t *pgdat)
 	return too_many;
 }
 
+/*
+ * Check if this base page should be skipped from isolation because
+ * it has extra refcounts that will prevent it from being migrated.
+ * This function is called for regular pages only, and not
+ * for THP or hugetlbfs pages. This code is inspired by similar code
+ * in migrate_vma_check_page(), can_split_folio() and
+ * folio_migrate_mapping()
+ */
+static inline bool page_has_extra_refs(struct page *page)
+{
+	/* caller holds a ref already from get_page_unless_zero() */
+	unsigned long extra_refs = 1;
+
+	/* anonymous page can have extra ref from swap cache */
+	if (PageAnon(page))
+		extra_refs += PageSwapCache(page) ? 1 : 0;
+	else
+		extra_refs += 1 + PagePrivate(page);
+
+	/*
+	 * This is an admittedly racy check but good enough to determine
+	 * if a page is pinned and can not be migrated
+	 */
+	if ((page_ref_count(page) - extra_refs) > page_mapcount(page))
+		return true;
+	return false;
+}
+
 /**
  * isolate_migratepages_block() - isolate all migrate-able pages within
  *				  a single pageblock
@@ -992,12 +1020,12 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 
 		/*
-		 * Migration will fail if an anonymous page is pinned in memory,
-		 * so avoid taking lru_lock and isolating it unnecessarily in an
-		 * admittedly racy check.
+		 * Migration will fail if a page has extra refcounts
+		 * preventing it from migrating, so avoid taking
+		 * lru_lock and isolating it unnecessarily
 		 */
 		mapping = page_mapping(page);
-		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
+		if (page_has_extra_refs(page))
 			goto isolate_fail_put;
 
 		/*