diff mbox series

mm: migrate: Correct the hugetlb migration stats

Message ID b35e54802a9a82d03d24845b463e9d9a68f7fd6b.1635491660.git.baolin.wang@linux.alibaba.com (mailing list archive)
State New
Headers show
Series mm: migrate: Correct the hugetlb migration stats | expand

Commit Message

Baolin Wang Oct. 29, 2021, 7:42 a.m. UTC
Now hugetlb migration is also available for some scenarios, such as
soft offling or memory compaction. So we should correct the migration
stats for hugetlb with using compound_nr() instead of thp_nr_pages()
to get the number of pages.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/migrate.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Comments

Zi Yan Oct. 29, 2021, 3:43 p.m. UTC | #1
On 29 Oct 2021, at 3:42, Baolin Wang wrote:

> Now hugetlb migration is also available for some scenarios, such as
> soft offling or memory compaction. So we should correct the migration

hugetlb migration is available at the time if (PageHuge(page)) branch
is added. I am not sure what is new here.

> stats for hugetlb with using compound_nr() instead of thp_nr_pages()
> to get the number of pages.

nr_failed records the number of pages, not subpages. It is returned to
user space when move_pages() syscall is used. After your change,
if users try to migrate a list of pages including THPs and/or hugetlb
pages and some of THPs and/or hugetlb fail to migrate, move_pages()
will return a number larger than the number of pages the users tried
to migrate. I am not sure this is the change we want. Or at least,
the comment of migrate_pages() and the manpage of move_pages() need
to be changed and linux-api mailing list should be cc’d.

>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/migrate.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index a11e948..2b45a29 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1475,7 +1475,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  			 * during migration.
>  			 */
>  			is_thp = PageTransHuge(page) && !PageHuge(page);
> -			nr_subpages = thp_nr_pages(page);
> +			nr_subpages = compound_nr(page);
>  			cond_resched();
>
>  			if (PageHuge(page))
> @@ -1540,7 +1540,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  					nr_failed += nr_subpages;
>  					goto out;
>  				}
> -				nr_failed++;
> +				nr_failed += nr_subpages;
>  				goto out;
>  			case -EAGAIN:
>  				if (is_thp) {
> @@ -1550,14 +1550,14 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				retry++;
>  				break;
>  			case MIGRATEPAGE_SUCCESS:
> +				nr_succeeded += nr_subpages;
>  				if (is_thp) {
>  					nr_thp_succeeded++;
> -					nr_succeeded += nr_subpages;
>  					break;
>  				}
> -				nr_succeeded++;
>  				break;
>  			default:
> +				nr_failed += nr_subpages;
>  				/*
>  				 * Permanent failure (-EBUSY, etc.):
>  				 * unlike -EAGAIN case, the failed page is
> @@ -1566,10 +1566,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				 */
>  				if (is_thp) {
>  					nr_thp_failed++;
> -					nr_failed += nr_subpages;
>  					break;
>  				}
> -				nr_failed++;
>  				break;
>  			}
>  		}
> -- 
> 1.8.3.1

--
Best Regards,
Yan, Zi
Baolin Wang Nov. 1, 2021, 6:54 a.m. UTC | #2
On 2021/10/29 23:43, Zi Yan wrote:
> On 29 Oct 2021, at 3:42, Baolin Wang wrote:
> 
>> Now hugetlb migration is also available for some scenarios, such as
>> soft offling or memory compaction. So we should correct the migration
> 
> hugetlb migration is available at the time if (PageHuge(page)) branch
> is added. I am not sure what is new here.

No new things actually, sorry for confusing and will update the commit 
message in next version.

> 
>> stats for hugetlb with using compound_nr() instead of thp_nr_pages()
>> to get the number of pages.
> 
> nr_failed records the number of pages, not subpages. It is returned to

I also think nr_failed should record the number of pages, not the number 
of hugetlb, if I understand you correctly.

> user space when move_pages() syscall is used. After your change,
> if users try to migrate a list of pages including THPs and/or hugetlb
> pages and some of THPs and/or hugetlb fail to migrate, move_pages()
> will return a number larger than the number of pages the users tried

OK, thanks for pointing out the issue.

But before my patch, we've already returned the number of pages 
successed or failed for THP migration, instead of the number of THP. 
That means if we just move only 1 page by move_pages() and if this page 
is 2M THP, so move_pages() will return 512 if failed to migrate, which 
is larger than the page count specified from user.

if (err > 0)
	err += nr_pages - i - 1;

On the other hand, the stats of PGMIGRATE_SUCCESS/PGMIGRATE_FAIL should 
stand for the number of pages, instead of the number of hugetlb. Also 
for hugetlb migration when memory compaction, we've already counted the 
number of pages for a hugetlb into cc->nr_migratepages, if the hugetlb 
migration failed, the trace stat of compaction will be confusing if we 
return the number of hugetlb.

trace_mm_compaction_migratepages(cc->nr_migratepages, err, 
                                   &cc->migratepages);

So I think the stats of hugetlb migration should be consistent with THP.

> to migrate. I am not sure this is the change we want. Or at least,
> the comment of migrate_pages() and the manpage of move_pages() need
> to be changed and linux-api mailing list should be cc’d.

I don't think we should update the comments of migrate_pages(), "Returns 
the number of pages that were not migrated" makes sense to me if I 
understand correctly.

For the manpage of move_pages(), as you said, the the returned 
non-migrate page numbers can be larger than the numbers specified from 
user if failed to migrate a THP or a hugetlb. I am not sure if we should 
change the manpage, since the THP already did, but I can send a patch to 
update the manpage if you think this is still necessary. Thanks.

>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/migrate.c | 10 ++++------
>>   1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index a11e948..2b45a29 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1475,7 +1475,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>   			 * during migration.
>>   			 */
>>   			is_thp = PageTransHuge(page) && !PageHuge(page);
>> -			nr_subpages = thp_nr_pages(page);
>> +			nr_subpages = compound_nr(page);
>>   			cond_resched();
>>
>>   			if (PageHuge(page))
>> @@ -1540,7 +1540,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>   					nr_failed += nr_subpages;
>>   					goto out;
>>   				}
>> -				nr_failed++;
>> +				nr_failed += nr_subpages;
>>   				goto out;
>>   			case -EAGAIN:
>>   				if (is_thp) {
>> @@ -1550,14 +1550,14 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>   				retry++;
>>   				break;
>>   			case MIGRATEPAGE_SUCCESS:
>> +				nr_succeeded += nr_subpages;
>>   				if (is_thp) {
>>   					nr_thp_succeeded++;
>> -					nr_succeeded += nr_subpages;
>>   					break;
>>   				}
>> -				nr_succeeded++;
>>   				break;
>>   			default:
>> +				nr_failed += nr_subpages;
>>   				/*
>>   				 * Permanent failure (-EBUSY, etc.):
>>   				 * unlike -EAGAIN case, the failed page is
>> @@ -1566,10 +1566,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>   				 */
>>   				if (is_thp) {
>>   					nr_thp_failed++;
>> -					nr_failed += nr_subpages;
>>   					break;
>>   				}
>> -				nr_failed++;
>>   				break;
>>   			}
>>   		}
>> -- 
>> 1.8.3.1
> 
> --
> Best Regards,
> Yan, Zi
>
Zi Yan Nov. 1, 2021, 3:12 p.m. UTC | #3
On 1 Nov 2021, at 2:54, Baolin Wang wrote:

> On 2021/10/29 23:43, Zi Yan wrote:
>> On 29 Oct 2021, at 3:42, Baolin Wang wrote:
>>
>>> Now hugetlb migration is also available for some scenarios, such as
>>> soft offling or memory compaction. So we should correct the migration
>>
>> hugetlb migration is available at the time if (PageHuge(page)) branch
>> is added. I am not sure what is new here.
>
> No new things actually, sorry for confusing and will update the commit message in next version.
>
>>
>>> stats for hugetlb with using compound_nr() instead of thp_nr_pages()
>>> to get the number of pages.
>>
>> nr_failed records the number of pages, not subpages. It is returned to
>
> I also think nr_failed should record the number of pages, not the number of hugetlb, if I understand you correctly.
>
>> user space when move_pages() syscall is used. After your change,
>> if users try to migrate a list of pages including THPs and/or hugetlb
>> pages and some of THPs and/or hugetlb fail to migrate, move_pages()
>> will return a number larger than the number of pages the users tried
>
> OK, thanks for pointing out the issue.
>
> But before my patch, we've already returned the number of pages successed or failed for THP migration, instead of the number of THP. That means if we just move only 1 page by

Ah, you are right.

> move_pages() and if this page is 2M THP, so move_pages() will return 512 if failed to migrate, which is larger than the page count specified from user.
>
> if (err > 0)
> 	err += nr_pages - i - 1;

I am not sure this is right for user-space.

>
> On the other hand, the stats of PGMIGRATE_SUCCESS/PGMIGRATE_FAIL should stand for the number of pages, instead of the number of hugetlb. Also for hugetlb migration when memory compaction, we've already counted the number of pages for a hugetlb into cc->nr_migratepages, if the hugetlb migration failed, the trace stat of compaction will be confusing if we return the number of hugetlb.
>
> trace_mm_compaction_migratepages(cc->nr_migratepages, err,                                   &cc->migratepages);
>
> So I think the stats of hugetlb migration should be consistent with THP.

It makes sense to me.

>
>> to migrate. I am not sure this is the change we want. Or at least,
>> the comment of migrate_pages() and the manpage of move_pages() need
>> to be changed and linux-api mailing list should be cc’d.
>
> I don't think we should update the comments of migrate_pages(), "Returns the number of pages that were not migrated" makes sense to me if I understand correctly.
>
> For the manpage of move_pages(), as you said, the the returned non-migrate page numbers can be larger than the numbers specified from user if failed to migrate a THP or a hugetlb. I am not sure if we should change the manpage, since the THP already did, but I can send a patch to update the manpage if you think this is still necessary. Thanks.

I am not sure changing manpage would help the users of move_pages() after
think about it again, since users might not know all the THP and/or hugetlb
information	when they call move_pages() and they just pass a list of N pages.

I just wonder if we could fix the rc value of migrate_pages to return
the number of {base page, THP, hugetlb} instead, so that move_pages()
can get its return value right.

Thanks.

>
>>>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>>   mm/migrate.c | 10 ++++------
>>>   1 file changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index a11e948..2b45a29 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1475,7 +1475,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>   			 * during migration.
>>>   			 */
>>>   			is_thp = PageTransHuge(page) && !PageHuge(page);
>>> -			nr_subpages = thp_nr_pages(page);
>>> +			nr_subpages = compound_nr(page);
>>>   			cond_resched();
>>>
>>>   			if (PageHuge(page))
>>> @@ -1540,7 +1540,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>   					nr_failed += nr_subpages;
>>>   					goto out;
>>>   				}
>>> -				nr_failed++;
>>> +				nr_failed += nr_subpages;
>>>   				goto out;
>>>   			case -EAGAIN:
>>>   				if (is_thp) {
>>> @@ -1550,14 +1550,14 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>   				retry++;
>>>   				break;
>>>   			case MIGRATEPAGE_SUCCESS:
>>> +				nr_succeeded += nr_subpages;
>>>   				if (is_thp) {
>>>   					nr_thp_succeeded++;
>>> -					nr_succeeded += nr_subpages;
>>>   					break;
>>>   				}
>>> -				nr_succeeded++;
>>>   				break;
>>>   			default:
>>> +				nr_failed += nr_subpages;
>>>   				/*
>>>   				 * Permanent failure (-EBUSY, etc.):
>>>   				 * unlike -EAGAIN case, the failed page is
>>> @@ -1566,10 +1566,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>   				 */
>>>   				if (is_thp) {
>>>   					nr_thp_failed++;
>>> -					nr_failed += nr_subpages;
>>>   					break;
>>>   				}
>>> -				nr_failed++;
>>>   				break;
>>>   			}
>>>   		}
>>> -- 
>>> 1.8.3.1
>>
>> --
>> Best Regards,
>> Yan, Zi
>>


--
Best Regards,
Yan, Zi
Baolin Wang Nov. 2, 2021, 6:08 a.m. UTC | #4
On 2021/11/1 23:12, Zi Yan wrote:
> On 1 Nov 2021, at 2:54, Baolin Wang wrote:
> 
>> On 2021/10/29 23:43, Zi Yan wrote:
>>> On 29 Oct 2021, at 3:42, Baolin Wang wrote:
>>>
>>>> Now hugetlb migration is also available for some scenarios, such as
>>>> soft offling or memory compaction. So we should correct the migration
>>>
>>> hugetlb migration is available at the time if (PageHuge(page)) branch
>>> is added. I am not sure what is new here.
>>
>> No new things actually, sorry for confusing and will update the commit message in next version.
>>
>>>
>>>> stats for hugetlb with using compound_nr() instead of thp_nr_pages()
>>>> to get the number of pages.
>>>
>>> nr_failed records the number of pages, not subpages. It is returned to
>>
>> I also think nr_failed should record the number of pages, not the number of hugetlb, if I understand you correctly.
>>
>>> user space when move_pages() syscall is used. After your change,
>>> if users try to migrate a list of pages including THPs and/or hugetlb
>>> pages and some of THPs and/or hugetlb fail to migrate, move_pages()
>>> will return a number larger than the number of pages the users tried
>>
>> OK, thanks for pointing out the issue.
>>
>> But before my patch, we've already returned the number of pages successed or failed for THP migration, instead of the number of THP. That means if we just move only 1 page by
> 
> Ah, you are right.
> 
>> move_pages() and if this page is 2M THP, so move_pages() will return 512 if failed to migrate, which is larger than the page count specified from user.
>>
>> if (err > 0)
>> 	err += nr_pages - i - 1;
> 
> I am not sure this is right for user-space.
> 
>>
>> On the other hand, the stats of PGMIGRATE_SUCCESS/PGMIGRATE_FAIL should stand for the number of pages, instead of the number of hugetlb. Also for hugetlb migration when memory compaction, we've already counted the number of pages for a hugetlb into cc->nr_migratepages, if the hugetlb migration failed, the trace stat of compaction will be confusing if we return the number of hugetlb.
>>
>> trace_mm_compaction_migratepages(cc->nr_migratepages, err,                                   &cc->migratepages);
>>
>> So I think the stats of hugetlb migration should be consistent with THP.
> 
> It makes sense to me.
> 
>>
>>> to migrate. I am not sure this is the change we want. Or at least,
>>> the comment of migrate_pages() and the manpage of move_pages() need
>>> to be changed and linux-api mailing list should be cc’d.
>>
>> I don't think we should update the comments of migrate_pages(), "Returns the number of pages that were not migrated" makes sense to me if I understand correctly.
>>
>> For the manpage of move_pages(), as you said, the the returned non-migrate page numbers can be larger than the numbers specified from user if failed to migrate a THP or a hugetlb. I am not sure if we should change the manpage, since the THP already did, but I can send a patch to update the manpage if you think this is still necessary. Thanks.
> 
> I am not sure changing manpage would help the users of move_pages() after
> think about it again, since users might not know all the THP and/or hugetlb
> information	when they call move_pages() and they just pass a list of N pages. >
> I just wonder if we could fix the rc value of migrate_pages to return
> the number of {base page, THP, hugetlb} instead, so that move_pages()
> can get its return value right.

IMO it will break the usage in other places if we change the rc value of 
migrate_pages(), for example, the page migration when doing memory 
compaction as I said before, which will expect the number of normal 
pages. Meanwhile the THP page can be split into normal pages during 
migration, so it will not be consistent if we return the number of THP.

Changing the return value of migrate_pages() will make things more 
complicated, and I am not sure whether it is worth doing. Any 
suggestion? Thanks.
Zi Yan Nov. 2, 2021, 7:30 p.m. UTC | #5
On 2 Nov 2021, at 2:08, Baolin Wang wrote:

> On 2021/11/1 23:12, Zi Yan wrote:
>> On 1 Nov 2021, at 2:54, Baolin Wang wrote:
>>
>>> On 2021/10/29 23:43, Zi Yan wrote:
>>>> On 29 Oct 2021, at 3:42, Baolin Wang wrote:
>>>>
>>>>> Now hugetlb migration is also available for some scenarios, such as
>>>>> soft offling or memory compaction. So we should correct the migration
>>>>
>>>> hugetlb migration is available at the time if (PageHuge(page)) branch
>>>> is added. I am not sure what is new here.
>>>
>>> No new things actually, sorry for confusing and will update the commit message in next version.
>>>
>>>>
>>>>> stats for hugetlb with using compound_nr() instead of thp_nr_pages()
>>>>> to get the number of pages.
>>>>
>>>> nr_failed records the number of pages, not subpages. It is returned to
>>>
>>> I also think nr_failed should record the number of pages, not the number of hugetlb, if I understand you correctly.
>>>
>>>> user space when move_pages() syscall is used. After your change,
>>>> if users try to migrate a list of pages including THPs and/or hugetlb
>>>> pages and some of THPs and/or hugetlb fail to migrate, move_pages()
>>>> will return a number larger than the number of pages the users tried
>>>
>>> OK, thanks for pointing out the issue.
>>>
>>> But before my patch, we've already returned the number of pages successed or failed for THP migration, instead of the number of THP. That means if we just move only 1 page by
>>
>> Ah, you are right.
>>
>>> move_pages() and if this page is 2M THP, so move_pages() will return 512 if failed to migrate, which is larger than the page count specified from user.
>>>
>>> if (err > 0)
>>> 	err += nr_pages - i - 1;
>>
>> I am not sure this is right for user-space.
>>
>>>
>>> On the other hand, the stats of PGMIGRATE_SUCCESS/PGMIGRATE_FAIL should stand for the number of pages, instead of the number of hugetlb. Also for hugetlb migration when memory compaction, we've already counted the number of pages for a hugetlb into cc->nr_migratepages, if the hugetlb migration failed, the trace stat of compaction will be confusing if we return the number of hugetlb.
>>>
>>> trace_mm_compaction_migratepages(cc->nr_migratepages, err,                                   &cc->migratepages);
>>>
>>> So I think the stats of hugetlb migration should be consistent with THP.
>>
>> It makes sense to me.
>>
>>>
>>>> to migrate. I am not sure this is the change we want. Or at least,
>>>> the comment of migrate_pages() and the manpage of move_pages() need
>>>> to be changed and linux-api mailing list should be cc’d.
>>>
>>> I don't think we should update the comments of migrate_pages(), "Returns the number of pages that were not migrated" makes sense to me if I understand correctly.
>>>
>>> For the manpage of move_pages(), as you said, the the returned non-migrate page numbers can be larger than the numbers specified from user if failed to migrate a THP or a hugetlb. I am not sure if we should change the manpage, since the THP already did, but I can send a patch to update the manpage if you think this is still necessary. Thanks.
>>
>> I am not sure changing manpage would help the users of move_pages() after
>> think about it again, since users might not know all the THP and/or hugetlb
>> information	when they call move_pages() and they just pass a list of N pages. >
>> I just wonder if we could fix the rc value of migrate_pages to return
>> the number of {base page, THP, hugetlb} instead, so that move_pages()
>> can get its return value right.
>
> IMO it will break the usage in other places if we change the rc value of migrate_pages(), for example, the page migration when doing memory compaction as I said before, which will expect the number of normal pages. Meanwhile the THP page can be split into normal pages during migration, so it will not be consistent if we return the number of THP.

You mean the above trace_mm_compaction_migratepages()? I checked all migrate_pages()
callers and none of them cares about the actual number of non-migrated pages, except
do_move_pages_to_node() and trace_mm_compaction_migratepages(). The former expects
the number of before-split-and-not-subpage pages, whereas the latter expects, like
you said, the number of base pages.

>
> Changing the return value of migrate_pages() will make things more complicated, and I am not sure whether it is worth doing. Any suggestion? Thanks.

How about 1) fixing migrate_pages() to return the number of before-split-and-not-subpage
pages, and 2) replace err with nr_succeeded (you can get it via *ret_succeeded in
migrate_pages()) in trace_mm_compaction_migratepages()? As a result, user-space move_pages()
will be fixed and trace_mm_compaction_migratepages() gives a
different but correct number as you want (you can still get nr_nonmigrated =
nr_migratepages - nr_succeeded).



--
Best Regards,
Yan, Zi
Baolin Wang Nov. 3, 2021, 8:07 a.m. UTC | #6
On 2021/11/3 3:30, Zi Yan wrote:
> On 2 Nov 2021, at 2:08, Baolin Wang wrote:
> 
>> On 2021/11/1 23:12, Zi Yan wrote:
>>> On 1 Nov 2021, at 2:54, Baolin Wang wrote:
>>>
>>>> On 2021/10/29 23:43, Zi Yan wrote:
>>>>> On 29 Oct 2021, at 3:42, Baolin Wang wrote:
>>>>>
>>>>>> Now hugetlb migration is also available for some scenarios, such as
>>>>>> soft offling or memory compaction. So we should correct the migration
>>>>>
>>>>> hugetlb migration is available at the time if (PageHuge(page)) branch
>>>>> is added. I am not sure what is new here.
>>>>
>>>> No new things actually, sorry for confusing and will update the commit message in next version.
>>>>
>>>>>
>>>>>> stats for hugetlb with using compound_nr() instead of thp_nr_pages()
>>>>>> to get the number of pages.
>>>>>
>>>>> nr_failed records the number of pages, not subpages. It is returned to
>>>>
>>>> I also think nr_failed should record the number of pages, not the number of hugetlb, if I understand you correctly.
>>>>
>>>>> user space when move_pages() syscall is used. After your change,
>>>>> if users try to migrate a list of pages including THPs and/or hugetlb
>>>>> pages and some of THPs and/or hugetlb fail to migrate, move_pages()
>>>>> will return a number larger than the number of pages the users tried
>>>>
>>>> OK, thanks for pointing out the issue.
>>>>
>>>> But before my patch, we've already returned the number of pages successed or failed for THP migration, instead of the number of THP. That means if we just move only 1 page by
>>>
>>> Ah, you are right.
>>>
>>>> move_pages() and if this page is 2M THP, so move_pages() will return 512 if failed to migrate, which is larger than the page count specified from user.
>>>>
>>>> if (err > 0)
>>>> 	err += nr_pages - i - 1;
>>>
>>> I am not sure this is right for user-space.
>>>
>>>>
>>>> On the other hand, the stats of PGMIGRATE_SUCCESS/PGMIGRATE_FAIL should stand for the number of pages, instead of the number of hugetlb. Also for hugetlb migration when memory compaction, we've already counted the number of pages for a hugetlb into cc->nr_migratepages, if the hugetlb migration failed, the trace stat of compaction will be confusing if we return the number of hugetlb.
>>>>
>>>> trace_mm_compaction_migratepages(cc->nr_migratepages, err,                                   &cc->migratepages);
>>>>
>>>> So I think the stats of hugetlb migration should be consistent with THP.
>>>
>>> It makes sense to me.
>>>
>>>>
>>>>> to migrate. I am not sure this is the change we want. Or at least,
>>>>> the comment of migrate_pages() and the manpage of move_pages() need
>>>>> to be changed and linux-api mailing list should be cc’d.
>>>>
>>>> I don't think we should update the comments of migrate_pages(), "Returns the number of pages that were not migrated" makes sense to me if I understand correctly.
>>>>
>>>> For the manpage of move_pages(), as you said, the the returned non-migrate page numbers can be larger than the numbers specified from user if failed to migrate a THP or a hugetlb. I am not sure if we should change the manpage, since the THP already did, but I can send a patch to update the manpage if you think this is still necessary. Thanks.
>>>
>>> I am not sure changing manpage would help the users of move_pages() after
>>> think about it again, since users might not know all the THP and/or hugetlb
>>> information	when they call move_pages() and they just pass a list of N pages. >
>>> I just wonder if we could fix the rc value of migrate_pages to return
>>> the number of {base page, THP, hugetlb} instead, so that move_pages()
>>> can get its return value right.
>>
>> IMO it will break the usage in other places if we change the rc value of migrate_pages(), for example, the page migration when doing memory compaction as I said before, which will expect the number of normal pages. Meanwhile the THP page can be split into normal pages during migration, so it will not be consistent if we return the number of THP.
> 
> You mean the above trace_mm_compaction_migratepages()? I checked all migrate_pages()

Right.

> callers and none of them cares about the actual number of non-migrated pages, except
> do_move_pages_to_node() and trace_mm_compaction_migratepages(). The former expects
> the number of before-split-and-not-subpage pages, whereas the latter expects, like
> you said, the number of base pages.

Yes.

>>
>> Changing the return value of migrate_pages() will make things more complicated, and I am not sure whether it is worth doing. Any suggestion? Thanks.
> 
> How about 1) fixing migrate_pages() to return the number of before-split-and-not-subpage
> pages, and 2) replace err with nr_succeeded (you can get it via *ret_succeeded in
> migrate_pages()) in trace_mm_compaction_migratepages()? As a result, user-space move_pages()
> will be fixed and trace_mm_compaction_migratepages() gives a
> different but correct number as you want (you can still get nr_nonmigrated =
> nr_migratepages - nr_succeeded).

OK, sounds reasonable to me, I can try it. Thanks for your input.
Baolin Wang Nov. 3, 2021, 11:09 a.m. UTC | #7
> On 2021/11/3 3:30, Zi Yan wrote:
>> On 2 Nov 2021, at 2:08, Baolin Wang wrote:
>>
>>> On 2021/11/1 23:12, Zi Yan wrote:
>>>> On 1 Nov 2021, at 2:54, Baolin Wang wrote:
>>>>
>>>>> On 2021/10/29 23:43, Zi Yan wrote:
>>>>>> On 29 Oct 2021, at 3:42, Baolin Wang wrote:
>>>>>>
>>>>>>> Now hugetlb migration is also available for some scenarios, such as
>>>>>>> soft offling or memory compaction. So we should correct the 
>>>>>>> migration
>>>>>>
>>>>>> hugetlb migration is available at the time if (PageHuge(page)) branch
>>>>>> is added. I am not sure what is new here.
>>>>>
>>>>> No new things actually, sorry for confusing and will update the 
>>>>> commit message in next version.
>>>>>
>>>>>>
>>>>>>> stats for hugetlb with using compound_nr() instead of thp_nr_pages()
>>>>>>> to get the number of pages.
>>>>>>
>>>>>> nr_failed records the number of pages, not subpages. It is 
>>>>>> returned to
>>>>>
>>>>> I also think nr_failed should record the number of pages, not the 
>>>>> number of hugetlb, if I understand you correctly.
>>>>>
>>>>>> user space when move_pages() syscall is used. After your change,
>>>>>> if users try to migrate a list of pages including THPs and/or hugetlb
>>>>>> pages and some of THPs and/or hugetlb fail to migrate, move_pages()
>>>>>> will return a number larger than the number of pages the users tried
>>>>>
>>>>> OK, thanks for pointing out the issue.
>>>>>
>>>>> But before my patch, we've already returned the number of pages 
>>>>> successed or failed for THP migration, instead of the number of 
>>>>> THP. That means if we just move only 1 page by
>>>>
>>>> Ah, you are right.
>>>>
>>>>> move_pages() and if this page is 2M THP, so move_pages() will 
>>>>> return 512 if failed to migrate, which is larger than the page 
>>>>> count specified from user.
>>>>>
>>>>> if (err > 0)
>>>>>     err += nr_pages - i - 1;
>>>>
>>>> I am not sure this is right for user-space.
>>>>
>>>>>
>>>>> On the other hand, the stats of PGMIGRATE_SUCCESS/PGMIGRATE_FAIL 
>>>>> should stand for the number of pages, instead of the number of 
>>>>> hugetlb. Also for hugetlb migration when memory compaction, we've 
>>>>> already counted the number of pages for a hugetlb into 
>>>>> cc->nr_migratepages, if the hugetlb migration failed, the trace 
>>>>> stat of compaction will be confusing if we return the number of 
>>>>> hugetlb.
>>>>>
>>>>> trace_mm_compaction_migratepages(cc->nr_migratepages, 
>>>>> err,                                   &cc->migratepages);
>>>>>
>>>>> So I think the stats of hugetlb migration should be consistent with 
>>>>> THP.
>>>>
>>>> It makes sense to me.
>>>>
>>>>>
>>>>>> to migrate. I am not sure this is the change we want. Or at least,
>>>>>> the comment of migrate_pages() and the manpage of move_pages() need
>>>>>> to be changed and linux-api mailing list should be cc’d.
>>>>>
>>>>> I don't think we should update the comments of migrate_pages(), 
>>>>> "Returns the number of pages that were not migrated" makes sense to 
>>>>> me if I understand correctly.
>>>>>
>>>>> For the manpage of move_pages(), as you said, the the returned 
>>>>> non-migrate page numbers can be larger than the numbers specified 
>>>>> from user if failed to migrate a THP or a hugetlb. I am not sure if 
>>>>> we should change the manpage, since the THP already did, but I can 
>>>>> send a patch to update the manpage if you think this is still 
>>>>> necessary. Thanks.
>>>>
>>>> I am not sure changing manpage would help the users of move_pages() 
>>>> after
>>>> think about it again, since users might not know all the THP and/or 
>>>> hugetlb
>>>> information    when they call move_pages() and they just pass a list 
>>>> of N pages. >
>>>> I just wonder if we could fix the rc value of migrate_pages to return
>>>> the number of {base page, THP, hugetlb} instead, so that move_pages()
>>>> can get its return value right.
>>>
>>> IMO it will break the usage in other places if we change the rc value 
>>> of migrate_pages(), for example, the page migration when doing memory 
>>> compaction as I said before, which will expect the number of normal 
>>> pages. Meanwhile the THP page can be split into normal pages during 
>>> migration, so it will not be consistent if we return the number of THP.
>>
>> You mean the above trace_mm_compaction_migratepages()? I checked all 
>> migrate_pages()
> 
> Right.
> 
>> callers and none of them cares about the actual number of non-migrated 
>> pages, except
>> do_move_pages_to_node() and trace_mm_compaction_migratepages(). The 
>> former expects
>> the number of before-split-and-not-subpage pages, whereas the latter 
>> expects, like
>> you said, the number of base pages.
> 
> Yes.
> 
>>>
>>> Changing the return value of migrate_pages() will make things more 
>>> complicated, and I am not sure whether it is worth doing. Any 
>>> suggestion? Thanks.
>>
>> How about 1) fixing migrate_pages() to return the number of 
>> before-split-and-not-subpage
>> pages, and 2) replace err with nr_succeeded (you can get it via 
>> *ret_succeeded in
>> migrate_pages()) in trace_mm_compaction_migratepages()? As a result, 
>> user-space move_pages()
>> will be fixed and trace_mm_compaction_migratepages() gives a
>> different but correct number as you want (you can still get 
>> nr_nonmigrated =
>> nr_migratepages - nr_succeeded).
> 
> OK, sounds reasonable to me, I can try it. Thanks for your input.

After more thinking, I am still strugglling to handle the THP split 
case. Suppose the move_pages() tries to move 1 page, which is a 2M THP. 
If the 2M THP is split into 512 normal pages during migration, and 256 
normal pages are migrated successfully, others are failed. So we should 
return 1 or 256 non-migrated pages from migrate_pages()?

Anyway I posted new RFC patchset [1] according to your suggestion, we 
can talk about there.

[1] 
https://lore.kernel.org/linux-mm/cover.1635936218.git.baolin.wang@linux.alibaba.com/
diff mbox series

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index a11e948..2b45a29 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1475,7 +1475,7 @@  int migrate_pages(struct list_head *from, new_page_t get_new_page,
 			 * during migration.
 			 */
 			is_thp = PageTransHuge(page) && !PageHuge(page);
-			nr_subpages = thp_nr_pages(page);
+			nr_subpages = compound_nr(page);
 			cond_resched();
 
 			if (PageHuge(page))
@@ -1540,7 +1540,7 @@  int migrate_pages(struct list_head *from, new_page_t get_new_page,
 					nr_failed += nr_subpages;
 					goto out;
 				}
-				nr_failed++;
+				nr_failed += nr_subpages;
 				goto out;
 			case -EAGAIN:
 				if (is_thp) {
@@ -1550,14 +1550,14 @@  int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				retry++;
 				break;
 			case MIGRATEPAGE_SUCCESS:
+				nr_succeeded += nr_subpages;
 				if (is_thp) {
 					nr_thp_succeeded++;
-					nr_succeeded += nr_subpages;
 					break;
 				}
-				nr_succeeded++;
 				break;
 			default:
+				nr_failed += nr_subpages;
 				/*
 				 * Permanent failure (-EBUSY, etc.):
 				 * unlike -EAGAIN case, the failed page is
@@ -1566,10 +1566,8 @@  int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				 */
 				if (is_thp) {
 					nr_thp_failed++;
-					nr_failed += nr_subpages;
 					break;
 				}
-				nr_failed++;
 				break;
 			}
 		}