diff mbox series

[4/7] migrate_pages(): fix failure counting for THP subpages retrying

Message ID 20220624025309.1033400-5-ying.huang@intel.com (mailing list archive)
State New
Headers show
Series [1/7] migrate: fix syscall move_pages() return value for failure | expand

Commit Message

Huang, Ying June 24, 2022, 2:53 a.m. UTC
If THP is failed to be migrated for -ENOSYS and -ENOMEM, the THP will
be split into thp_split_pages, and after other pages are migrated,
pages in thp_split_pages will be migrated with no_subpage_counting ==
true, because its failure have been counted already.  If some pages in
thp_split_pages are retried during migration, we should not count
their failure if no_subpage_counting == true too.  This is done this
patch to fix the failure counting for THP subpages retrying.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 5984fabb6e82 ("mm: move_pages: report the number of non-attempted pages")
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
---
 mm/migrate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Baolin Wang June 24, 2022, 9:45 a.m. UTC | #1
On 6/24/2022 10:53 AM, Huang Ying wrote:
> If THP is failed to be migrated for -ENOSYS and -ENOMEM, the THP will
> be split into thp_split_pages, and after other pages are migrated,
> pages in thp_split_pages will be migrated with no_subpage_counting ==
> true, because its failure have been counted already.  If some pages in
> thp_split_pages are retried during migration, we should not count
> their failure if no_subpage_counting == true too.  This is done this
> patch to fix the failure counting for THP subpages retrying.

Good catch. Totally agree with you. It seems we can move the condition 
into -EAGAIN case like other cases did?

diff --git a/mm/migrate.c b/mm/migrate.c
index 1ece23d80bc4..491c2d07402b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1463,7 +1463,7 @@ int migrate_pages(struct list_head *from, 
new_page_t get_new_page,
                         case -EAGAIN:
                                 if (is_thp)
                                         thp_retry++;
-                               else
+                               else if (!no_subpage_counting)
                                         retry++;
                                 break;

Anyway this patch looks good to me.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> 
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Fixes: 5984fabb6e82 ("mm: move_pages: report the number of non-attempted pages")
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> --- >   mm/migrate.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 542533e4e3cf..61dab3025a1d 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1477,7 +1477,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>   			}
>   		}
>   	}
> -	nr_failed += retry;
> +	if (!no_subpage_counting)
> +		nr_failed += retry;
>   	nr_thp_failed += thp_retry;
>   	/*
>   	 * Try to migrate subpages of fail-to-migrate THPs, no nr_failed
Huang, Ying June 27, 2022, 1:46 a.m. UTC | #2
Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> On 6/24/2022 10:53 AM, Huang Ying wrote:
>> If THP is failed to be migrated for -ENOSYS and -ENOMEM, the THP will
>> be split into thp_split_pages, and after other pages are migrated,
>> pages in thp_split_pages will be migrated with no_subpage_counting ==
>> true, because its failure have been counted already.  If some pages in
>> thp_split_pages are retried during migration, we should not count
>> their failure if no_subpage_counting == true too.  This is done this
>> patch to fix the failure counting for THP subpages retrying.
>
> Good catch. Totally agree with you. It seems we can move the condition
> into -EAGAIN case like other cases did?
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 1ece23d80bc4..491c2d07402b 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1463,7 +1463,7 @@ int migrate_pages(struct list_head *from,
> new_page_t get_new_page,
>                         case -EAGAIN:
>                                 if (is_thp)
>                                         thp_retry++;
> -                               else
> +                               else if (!no_subpage_counting)
>                                         retry++;
>                                 break;

This has another effect except fixing the failure counting.  That is,
the split subpages of THP will not be retried for 10 times for -EAGAIN.
TBH, I think that we should do that.  But because this has some behavior
change, it's better to be done in a separate patch?  Do you have
interest to do that on top of this patchset?

> Anyway this patch looks good to me.
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Thanks!

Best Regards,
Huang, Ying

>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Fixes: 5984fabb6e82 ("mm: move_pages: report the number of non-attempted pages")
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> --- >   mm/migrate.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 542533e4e3cf..61dab3025a1d 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1477,7 +1477,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>   			}
>>   		}
>>   	}
>> -	nr_failed += retry;
>> +	if (!no_subpage_counting)
>> +		nr_failed += retry;
>>   	nr_thp_failed += thp_retry;
>>   	/*
>>   	 * Try to migrate subpages of fail-to-migrate THPs, no nr_failed
Baolin Wang June 27, 2022, 3:59 a.m. UTC | #3
On 6/27/2022 9:46 AM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> 
>> On 6/24/2022 10:53 AM, Huang Ying wrote:
>>> If THP is failed to be migrated for -ENOSYS and -ENOMEM, the THP will
>>> be split into thp_split_pages, and after other pages are migrated,
>>> pages in thp_split_pages will be migrated with no_subpage_counting ==
>>> true, because its failure have been counted already.  If some pages in
>>> thp_split_pages are retried during migration, we should not count
>>> their failure if no_subpage_counting == true too.  This is done this
>>> patch to fix the failure counting for THP subpages retrying.
>>
>> Good catch. Totally agree with you. It seems we can move the condition
>> into -EAGAIN case like other cases did?
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 1ece23d80bc4..491c2d07402b 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1463,7 +1463,7 @@ int migrate_pages(struct list_head *from,
>> new_page_t get_new_page,
>>                          case -EAGAIN:
>>                                  if (is_thp)
>>                                          thp_retry++;
>> -                               else
>> +                               else if (!no_subpage_counting)
>>                                          retry++;
>>                                  break;
> 
> This has another effect except fixing the failure counting.  That is,
> the split subpages of THP will not be retried for 10 times for -EAGAIN.

Ah, yes.

> TBH, I think that we should do that.  But because this has some behavior

OK. So you afraid that 10 times retry for each subpage of THP will waste 
lots of time?

> change, it's better to be done in a separate patch?  Do you have
> interest to do that on top of this patchset?

Sure. I can send a patch which can be folded into your series. Is this 
OK for you?

By the way, if I do like I said, the patch 4 can be avoided.

> 
>> Anyway this patch looks good to me.
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> 
> Thanks!
> 
> Best Regards,
> Huang, Ying
> 
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> Fixes: 5984fabb6e82 ("mm: move_pages: report the number of non-attempted pages")
>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Yang Shi <shy828301@gmail.com>
>>> --- >   mm/migrate.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 542533e4e3cf..61dab3025a1d 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1477,7 +1477,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>    			}
>>>    		}
>>>    	}
>>> -	nr_failed += retry;
>>> +	if (!no_subpage_counting)
>>> +		nr_failed += retry;
>>>    	nr_thp_failed += thp_retry;
>>>    	/*
>>>    	 * Try to migrate subpages of fail-to-migrate THPs, no nr_failed
Huang, Ying June 27, 2022, 4:23 a.m. UTC | #4
Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> On 6/27/2022 9:46 AM, Huang, Ying wrote:
>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>> 
>>> On 6/24/2022 10:53 AM, Huang Ying wrote:
>>>> If THP is failed to be migrated for -ENOSYS and -ENOMEM, the THP will
>>>> be split into thp_split_pages, and after other pages are migrated,
>>>> pages in thp_split_pages will be migrated with no_subpage_counting ==
>>>> true, because its failure have been counted already.  If some pages in
>>>> thp_split_pages are retried during migration, we should not count
>>>> their failure if no_subpage_counting == true too.  This is done this
>>>> patch to fix the failure counting for THP subpages retrying.
>>>
>>> Good catch. Totally agree with you. It seems we can move the condition
>>> into -EAGAIN case like other cases did?
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 1ece23d80bc4..491c2d07402b 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1463,7 +1463,7 @@ int migrate_pages(struct list_head *from,
>>> new_page_t get_new_page,
>>>                          case -EAGAIN:
>>>                                  if (is_thp)
>>>                                          thp_retry++;
>>> -                               else
>>> +                               else if (!no_subpage_counting)
>>>                                          retry++;
>>>                                  break;
>> This has another effect except fixing the failure counting.  That
>> is,
>> the split subpages of THP will not be retried for 10 times for -EAGAIN.
>
> Ah, yes.
>
>> TBH, I think that we should do that.  But because this has some behavior
>
> OK. So you afraid that 10 times retry for each subpage of THP will
> waste lots of time?

I just think that it's unnecessary.  We have already regarded the
migration as failed.  And for the worst case, we will try 512 * 10 =
5120 times in total.

>> change, it's better to be done in a separate patch?  Do you have
>> interest to do that on top of this patchset?
>
> Sure. I can send a patch which can be folded into your series. Is this
> OK for you?
>
> By the way, if I do like I said, the patch 4 can be avoided.

I tend to keep both.  [4/7] is just a fix.  You patch will introduce the
behavior change.  And your patch needn't to be folded into this series.
You can send it and push it separately.

Best Regards,
Huang, Ying

>> 
>>> Anyway this patch looks good to me.
>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Thanks!
>> Best Regards,
>> Huang, Ying
>> 
>>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>>> Fixes: 5984fabb6e82 ("mm: move_pages: report the number of non-attempted pages")
>>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>> Cc: Yang Shi <shy828301@gmail.com>
>>>> --- >   mm/migrate.c | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 542533e4e3cf..61dab3025a1d 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1477,7 +1477,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>    			}
>>>>    		}
>>>>    	}
>>>> -	nr_failed += retry;
>>>> +	if (!no_subpage_counting)
>>>> +		nr_failed += retry;
>>>>    	nr_thp_failed += thp_retry;
>>>>    	/*
>>>>    	 * Try to migrate subpages of fail-to-migrate THPs, no nr_failed
Baolin Wang June 27, 2022, 5:56 a.m. UTC | #5
On 6/27/2022 12:23 PM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> 
>> On 6/27/2022 9:46 AM, Huang, Ying wrote:
>>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>>>
>>>> On 6/24/2022 10:53 AM, Huang Ying wrote:
>>>>> If THP is failed to be migrated for -ENOSYS and -ENOMEM, the THP will
>>>>> be split into thp_split_pages, and after other pages are migrated,
>>>>> pages in thp_split_pages will be migrated with no_subpage_counting ==
>>>>> true, because its failure have been counted already.  If some pages in
>>>>> thp_split_pages are retried during migration, we should not count
>>>>> their failure if no_subpage_counting == true too.  This is done this
>>>>> patch to fix the failure counting for THP subpages retrying.
>>>>
>>>> Good catch. Totally agree with you. It seems we can move the condition
>>>> into -EAGAIN case like other cases did?
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 1ece23d80bc4..491c2d07402b 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1463,7 +1463,7 @@ int migrate_pages(struct list_head *from,
>>>> new_page_t get_new_page,
>>>>                           case -EAGAIN:
>>>>                                   if (is_thp)
>>>>                                           thp_retry++;
>>>> -                               else
>>>> +                               else if (!no_subpage_counting)
>>>>                                           retry++;
>>>>                                   break;
>>> This has another effect except fixing the failure counting.  That
>>> is,
>>> the split subpages of THP will not be retried for 10 times for -EAGAIN.
>>
>> Ah, yes.
>>
>>> TBH, I think that we should do that.  But because this has some behavior
>>
>> OK. So you afraid that 10 times retry for each subpage of THP will
>> waste lots of time?
> 
> I just think that it's unnecessary.  We have already regarded the
> migration as failed.  And for the worst case, we will try 512 * 10 =
> 5120 times in total.
> 
>>> change, it's better to be done in a separate patch?  Do you have
>>> interest to do that on top of this patchset?
>>
>> Sure. I can send a patch which can be folded into your series. Is this
>> OK for you?
>>
>> By the way, if I do like I said, the patch 4 can be avoided.
> 
> I tend to keep both.  [4/7] is just a fix.  You patch will introduce the
> behavior change.  And your patch needn't to be folded into this series.
> You can send it and push it separately.

OK. Thanks.
diff mbox series

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 542533e4e3cf..61dab3025a1d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1477,7 +1477,8 @@  int migrate_pages(struct list_head *from, new_page_t get_new_page,
 			}
 		}
 	}
-	nr_failed += retry;
+	if (!no_subpage_counting)
+		nr_failed += retry;
 	nr_thp_failed += thp_retry;
 	/*
 	 * Try to migrate subpages of fail-to-migrate THPs, no nr_failed