diff mbox series

[v3] mm/migrate: split source folio if it is on deferred split list

Message ID 20240319154753.253262-1-zi.yan@sent.com (mailing list archive)
State New
Headers show
Series [v3] mm/migrate: split source folio if it is on deferred split list | expand

Commit Message

Zi Yan March 19, 2024, 3:47 p.m. UTC
From: Zi Yan <ziy@nvidia.com>

If the source folio is on deferred split list, it is likely some subpages
are not used. Split it before migration to avoid migrating unused subpages.

Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path")
did not check if a THP is on deferred split list before migration, thus,
the destination THP is never put on deferred split list even if the source
THP might be. The opportunity of reclaiming free pages in a partially
mapped THP during deferred list scanning is lost, but no other harmful
consequence is present[1].

From v2:
1. Split the source folio instead of migrating it (per Matthew Wilcox)[2].

From v1:
1. Used dst to get correct deferred split list after migration
   (per Ryan Roberts).

[1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/
[2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/

Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/huge_memory.c | 22 ------------------
 mm/internal.h    | 23 +++++++++++++++++++
 mm/migrate.c     | 60 +++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 72 insertions(+), 33 deletions(-)


base-commit: 384df0d61b071bb7a38ae63bae9b59d088335911

Comments

SeongJae Park March 20, 2024, 1:08 a.m. UTC | #1
Hello,

On Tue, 19 Mar 2024 11:47:53 -0400 Zi Yan <zi.yan@sent.com> wrote:

> From: Zi Yan <ziy@nvidia.com>
> 
> If the source folio is on deferred split list, it is likely some subpages
> are not used. Split it before migration to avoid migrating unused subpages.
> 
> Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> did not check if a THP is on deferred split list before migration, thus,
> the destination THP is never put on deferred split list even if the source
> THP might be. The opportunity of reclaiming free pages in a partially
> mapped THP during deferred list scanning is lost, but no other harmful
> consequence is present[1].
> 
> From v2:
> 1. Split the source folio instead of migrating it (per Matthew Wilcox)[2].
> 
> From v1:
> 1. Used dst to get correct deferred split list after migration
>    (per Ryan Roberts).
> 
> [1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/
> [2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/
> 
> Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/huge_memory.c | 22 ------------------
>  mm/internal.h    | 23 +++++++++++++++++++
>  mm/migrate.c     | 60 +++++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 72 insertions(+), 33 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9859aa4f7553..c6d4d0cdf4b3 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>  	return pmd;
>  }
>  
> -#ifdef CONFIG_MEMCG
> -static inline
> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
> -{
> -	struct mem_cgroup *memcg = folio_memcg(folio);
> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> -
> -	if (memcg)
> -		return &memcg->deferred_split_queue;
> -	else
> -		return &pgdat->deferred_split_queue;
> -}
> -#else
> -static inline
> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
> -{
> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> -
> -	return &pgdat->deferred_split_queue;
> -}
> -#endif
> -
>  void folio_prep_large_rmappable(struct folio *folio)
>  {
>  	if (!folio || !folio_test_large(folio))
> diff --git a/mm/internal.h b/mm/internal.h
> index d1c69119b24f..8fa36e84463a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>  				   unsigned long addr, pmd_t *pmd,
>  				   unsigned int flags);
>  
> +#ifdef CONFIG_MEMCG
> +static inline
> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
> +{
> +	struct mem_cgroup *memcg = folio_memcg(folio);
> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> +
> +	if (memcg)
> +		return &memcg->deferred_split_queue;
> +	else
> +		return &pgdat->deferred_split_queue;
> +}
> +#else
> +static inline
> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
> +{
> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> +
> +	return &pgdat->deferred_split_queue;
> +}
> +#endif

I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with
below error:

    .../lib/../mm/internal.h: In function 'get_deferred_split_queue':
    .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue'
     1127 |         return &pgdat->deferred_split_queue;
          |                      ^~

Since the code was in hugepage.c, maybe the above chunk need to be wrapped by
#ifdef CONFIG_TRANSPARENT_HUGEPAGE?  I confirmed below change is fixing the
build on my setup.


Thanks,
SJ

[...]

-------------------------------------- >8 -------------------------------------
diff --git a/mm/internal.h b/mm/internal.h
index dce2b9f5e6cd..fe9f69ceb140 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1106,6 +1106,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
                                   unsigned long addr, pmd_t *pmd,
                                   unsigned int flags);

+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #ifdef CONFIG_MEMCG
 static inline
 struct deferred_split *get_deferred_split_queue(struct folio *folio)
@@ -1126,7 +1127,8 @@ struct deferred_split *get_deferred_split_queue(struct folio *folio)

        return &pgdat->deferred_split_queue;
 }
-#endif
+#endif /* CONFIG_MEMCG */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */


 /*
Zi Yan March 20, 2024, 1:09 a.m. UTC | #2
On 19 Mar 2024, at 21:08, SeongJae Park wrote:

> Hello,
>
> On Tue, 19 Mar 2024 11:47:53 -0400 Zi Yan <zi.yan@sent.com> wrote:
>
>> From: Zi Yan <ziy@nvidia.com>
>>
>> If the source folio is on deferred split list, it is likely some subpages
>> are not used. Split it before migration to avoid migrating unused subpages.
>>
>> Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>> did not check if a THP is on deferred split list before migration, thus,
>> the destination THP is never put on deferred split list even if the source
>> THP might be. The opportunity of reclaiming free pages in a partially
>> mapped THP during deferred list scanning is lost, but no other harmful
>> consequence is present[1].
>>
>> From v2:
>> 1. Split the source folio instead of migrating it (per Matthew Wilcox)[2].
>>
>> From v1:
>> 1. Used dst to get correct deferred split list after migration
>>    (per Ryan Roberts).
>>
>> [1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/
>> [2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/
>>
>> Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/huge_memory.c | 22 ------------------
>>  mm/internal.h    | 23 +++++++++++++++++++
>>  mm/migrate.c     | 60 +++++++++++++++++++++++++++++++++++++++---------
>>  3 files changed, 72 insertions(+), 33 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 9859aa4f7553..c6d4d0cdf4b3 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>>  	return pmd;
>>  }
>>
>> -#ifdef CONFIG_MEMCG
>> -static inline
>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
>> -{
>> -	struct mem_cgroup *memcg = folio_memcg(folio);
>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>> -
>> -	if (memcg)
>> -		return &memcg->deferred_split_queue;
>> -	else
>> -		return &pgdat->deferred_split_queue;
>> -}
>> -#else
>> -static inline
>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
>> -{
>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>> -
>> -	return &pgdat->deferred_split_queue;
>> -}
>> -#endif
>> -
>>  void folio_prep_large_rmappable(struct folio *folio)
>>  {
>>  	if (!folio || !folio_test_large(folio))
>> diff --git a/mm/internal.h b/mm/internal.h
>> index d1c69119b24f..8fa36e84463a 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>>  				   unsigned long addr, pmd_t *pmd,
>>  				   unsigned int flags);
>>
>> +#ifdef CONFIG_MEMCG
>> +static inline
>> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
>> +{
>> +	struct mem_cgroup *memcg = folio_memcg(folio);
>> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>> +
>> +	if (memcg)
>> +		return &memcg->deferred_split_queue;
>> +	else
>> +		return &pgdat->deferred_split_queue;
>> +}
>> +#else
>> +static inline
>> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
>> +{
>> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>> +
>> +	return &pgdat->deferred_split_queue;
>> +}
>> +#endif
>
> I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with
> below error:
>
>     .../lib/../mm/internal.h: In function 'get_deferred_split_queue':
>     .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue'
>      1127 |         return &pgdat->deferred_split_queue;
>           |                      ^~
>
> Since the code was in hugepage.c, maybe the above chunk need to be wrapped by
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE?  I confirmed below change is fixing the
> build on my setup.

Thanks. Will fix it in the next version.

>
> [...]
>
> -------------------------------------- >8 -------------------------------------
> diff --git a/mm/internal.h b/mm/internal.h
> index dce2b9f5e6cd..fe9f69ceb140 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1106,6 +1106,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>                                    unsigned long addr, pmd_t *pmd,
>                                    unsigned int flags);
>
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #ifdef CONFIG_MEMCG
>  static inline
>  struct deferred_split *get_deferred_split_queue(struct folio *folio)
> @@ -1126,7 +1127,8 @@ struct deferred_split *get_deferred_split_queue(struct folio *folio)
>
>         return &pgdat->deferred_split_queue;
>  }
> -#endif
> +#endif /* CONFIG_MEMCG */
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
>
>  /*


--
Best Regards,
Yan, Zi
Zi Yan March 20, 2024, 1:16 a.m. UTC | #3
On 19 Mar 2024, at 21:09, Zi Yan wrote:

> On 19 Mar 2024, at 21:08, SeongJae Park wrote:
>
>> Hello,
>>
>> On Tue, 19 Mar 2024 11:47:53 -0400 Zi Yan <zi.yan@sent.com> wrote:
>>
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> If the source folio is on deferred split list, it is likely some subpages
>>> are not used. Split it before migration to avoid migrating unused subpages.
>>>
>>> Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>>> did not check if a THP is on deferred split list before migration, thus,
>>> the destination THP is never put on deferred split list even if the source
>>> THP might be. The opportunity of reclaiming free pages in a partially
>>> mapped THP during deferred list scanning is lost, but no other harmful
>>> consequence is present[1].
>>>
>>> From v2:
>>> 1. Split the source folio instead of migrating it (per Matthew Wilcox)[2].
>>>
>>> From v1:
>>> 1. Used dst to get correct deferred split list after migration
>>>    (per Ryan Roberts).
>>>
>>> [1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/
>>> [2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/
>>>
>>> Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>>  mm/huge_memory.c | 22 ------------------
>>>  mm/internal.h    | 23 +++++++++++++++++++
>>>  mm/migrate.c     | 60 +++++++++++++++++++++++++++++++++++++++---------
>>>  3 files changed, 72 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 9859aa4f7553..c6d4d0cdf4b3 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>>>  	return pmd;
>>>  }
>>>
>>> -#ifdef CONFIG_MEMCG
>>> -static inline
>>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
>>> -{
>>> -	struct mem_cgroup *memcg = folio_memcg(folio);
>>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>>> -
>>> -	if (memcg)
>>> -		return &memcg->deferred_split_queue;
>>> -	else
>>> -		return &pgdat->deferred_split_queue;
>>> -}
>>> -#else
>>> -static inline
>>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
>>> -{
>>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>>> -
>>> -	return &pgdat->deferred_split_queue;
>>> -}
>>> -#endif
>>> -
>>>  void folio_prep_large_rmappable(struct folio *folio)
>>>  {
>>>  	if (!folio || !folio_test_large(folio))
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index d1c69119b24f..8fa36e84463a 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>>>  				   unsigned long addr, pmd_t *pmd,
>>>  				   unsigned int flags);
>>>
>>> +#ifdef CONFIG_MEMCG
>>> +static inline
>>> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
>>> +{
>>> +	struct mem_cgroup *memcg = folio_memcg(folio);
>>> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>>> +
>>> +	if (memcg)
>>> +		return &memcg->deferred_split_queue;
>>> +	else
>>> +		return &pgdat->deferred_split_queue;
>>> +}
>>> +#else
>>> +static inline
>>> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
>>> +{
>>> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>>> +
>>> +	return &pgdat->deferred_split_queue;
>>> +}
>>> +#endif
>>
>> I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with
>> below error:
>>
>>     .../lib/../mm/internal.h: In function 'get_deferred_split_queue':
>>     .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue'
>>      1127 |         return &pgdat->deferred_split_queue;
>>           |                      ^~
>>
>> Since the code was in hugepage.c, maybe the above chunk need to be wrapped by
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE?  I confirmed below change is fixing the
>> build on my setup.
>
> Thanks. Will fix it in the next version.

Actually, since get_deferred_split_queue() is used in mm/migrate.c, that
part needs to be guarded by CONFIG_TRANSPARENT_HUGEPAGE as well.

--
Best Regards,
Yan, Zi
Zi Yan March 20, 2024, 1:30 a.m. UTC | #4
Hi Matthew,

>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 9859aa4f7553..c6d4d0cdf4b3 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>>>>  	return pmd;
>>>>  }
>>>>
>>>> -#ifdef CONFIG_MEMCG
>>>> -static inline
>>>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
>>>> -{
>>>> -	struct mem_cgroup *memcg = folio_memcg(folio);
>>>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>>>> -
>>>> -	if (memcg)
>>>> -		return &memcg->deferred_split_queue;
>>>> -	else
>>>> -		return &pgdat->deferred_split_queue;
>>>> -}
>>>> -#else
>>>> -static inline
>>>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
>>>> -{
>>>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>>>> -
>>>> -	return &pgdat->deferred_split_queue;
>>>> -}
>>>> -#endif
>>>> -
>>>>  void folio_prep_large_rmappable(struct folio *folio)
>>>>  {
>>>>  	if (!folio || !folio_test_large(folio))
>>>> diff --git a/mm/internal.h b/mm/internal.h
>>>> index d1c69119b24f..8fa36e84463a 100644
>>>> --- a/mm/internal.h
>>>> +++ b/mm/internal.h
>>>> @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>>>>  				   unsigned long addr, pmd_t *pmd,
>>>>  				   unsigned int flags);
>>>>
>>>> +#ifdef CONFIG_MEMCG
>>>> +static inline
>>>> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
>>>> +{
>>>> +	struct mem_cgroup *memcg = folio_memcg(folio);
>>>> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>>>> +
>>>> +	if (memcg)
>>>> +		return &memcg->deferred_split_queue;
>>>> +	else
>>>> +		return &pgdat->deferred_split_queue;
>>>> +}
>>>> +#else
>>>> +static inline
>>>> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
>>>> +{
>>>> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>>>> +
>>>> +	return &pgdat->deferred_split_queue;
>>>> +}
>>>> +#endif
>>>
>>> I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with
>>> below error:
>>>
>>>     .../lib/../mm/internal.h: In function 'get_deferred_split_queue':
>>>     .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue'
>>>      1127 |         return &pgdat->deferred_split_queue;
>>>           |                      ^~
>>>
>>> Since the code was in hugepage.c, maybe the above chunk need to be wrapped by
>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE?  I confirmed below change is fixing the
>>> build on my setup.
>>
>> Thanks. Will fix it in the next version.
>
> Actually, since get_deferred_split_queue() is used in mm/migrate.c, that
> part needs to be guarded by CONFIG_TRANSPARENT_HUGEPAGE as well.

_deferred_list is not hidden behind CONFIG_TRANSPARENT_HUGEPAGE but
its users are all behind CONFIG_TRANSPARENT_HUGEPAGE. Should it be moved
as well? Or the other way around, moving _deferred_list users out?

Large folios should work without THP, since they would be like THP without
PMD mappings, namely mTHP. But mTHP is considered a subset of THP. It seems
confusing. Maybe hiding _deferred_list behind CONFIG_TRANSPARENT_HUGEPAGE
is better, or just do not change.

--
Best Regards,
Yan, Zi
SeongJae Park March 20, 2024, 1:38 a.m. UTC | #5
On Tue, 19 Mar 2024 21:16:37 -0400 Zi Yan <ziy@nvidia.com> wrote:

> [-- Attachment #1: Type: text/plain, Size: 4354 bytes --]
> 
> On 19 Mar 2024, at 21:09, Zi Yan wrote:
> 
> > On 19 Mar 2024, at 21:08, SeongJae Park wrote:
> >
> >> Hello,
> >>
> >> On Tue, 19 Mar 2024 11:47:53 -0400 Zi Yan <zi.yan@sent.com> wrote:
> >>
> >>> From: Zi Yan <ziy@nvidia.com>
> >>>
> >>> If the source folio is on deferred split list, it is likely some subpages
> >>> are not used. Split it before migration to avoid migrating unused subpages.
> >>>
> >>> Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> >>> did not check if a THP is on deferred split list before migration, thus,
> >>> the destination THP is never put on deferred split list even if the source
> >>> THP might be. The opportunity of reclaiming free pages in a partially
> >>> mapped THP during deferred list scanning is lost, but no other harmful
> >>> consequence is present[1].
> >>>
> >>> From v2:
> >>> 1. Split the source folio instead of migrating it (per Matthew Wilcox)[2].
> >>>
> >>> From v1:
> >>> 1. Used dst to get correct deferred split list after migration
> >>>    (per Ryan Roberts).
> >>>
> >>> [1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/
> >>> [2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/
> >>>
> >>> Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
> >>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>> ---
> >>>  mm/huge_memory.c | 22 ------------------
> >>>  mm/internal.h    | 23 +++++++++++++++++++
> >>>  mm/migrate.c     | 60 +++++++++++++++++++++++++++++++++++++++---------
> >>>  3 files changed, 72 insertions(+), 33 deletions(-)
> >>>
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 9859aa4f7553..c6d4d0cdf4b3 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
> >>>  	return pmd;
> >>>  }
> >>>
> >>> -#ifdef CONFIG_MEMCG
> >>> -static inline
> >>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
> >>> -{
> >>> -	struct mem_cgroup *memcg = folio_memcg(folio);
> >>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> >>> -
> >>> -	if (memcg)
> >>> -		return &memcg->deferred_split_queue;
> >>> -	else
> >>> -		return &pgdat->deferred_split_queue;
> >>> -}
> >>> -#else
> >>> -static inline
> >>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
> >>> -{
> >>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> >>> -
> >>> -	return &pgdat->deferred_split_queue;
> >>> -}
> >>> -#endif
> >>> -
> >>>  void folio_prep_large_rmappable(struct folio *folio)
> >>>  {
> >>>  	if (!folio || !folio_test_large(folio))
> >>> diff --git a/mm/internal.h b/mm/internal.h
> >>> index d1c69119b24f..8fa36e84463a 100644
> >>> --- a/mm/internal.h
> >>> +++ b/mm/internal.h
> >>> @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> >>>  				   unsigned long addr, pmd_t *pmd,
> >>>  				   unsigned int flags);
> >>>
> >>> +#ifdef CONFIG_MEMCG
> >>> +static inline
> >>> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
> >>> +{
> >>> +	struct mem_cgroup *memcg = folio_memcg(folio);
> >>> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> >>> +
> >>> +	if (memcg)
> >>> +		return &memcg->deferred_split_queue;
> >>> +	else
> >>> +		return &pgdat->deferred_split_queue;
> >>> +}
> >>> +#else
> >>> +static inline
> >>> +struct deferred_split *get_deferred_split_queue(struct folio *folio)
> >>> +{
> >>> +	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> >>> +
> >>> +	return &pgdat->deferred_split_queue;
> >>> +}
> >>> +#endif
> >>
> >> I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with
> >> below error:
> >>
> >>     .../lib/../mm/internal.h: In function 'get_deferred_split_queue':
> >>     .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue'
> >>      1127 |         return &pgdat->deferred_split_queue;
> >>           |                      ^~
> >>
> >> Since the code was in hugepage.c, maybe the above chunk need to be wrapped by
> >> #ifdef CONFIG_TRANSPARENT_HUGEPAGE?  I confirmed below change is fixing the
> >> build on my setup.
> >
> > Thanks. Will fix it in the next version.
> 
> Actually, since get_deferred_split_queue() is used in mm/migrate.c, that
> part needs to be guarded by CONFIG_TRANSPARENT_HUGEPAGE as well.

You're right.  I also just confirmed that build breaks with below error when
CONFIG_TRANSPARENT_HUGEPAGE is not set but CONFIG_MIGRATION is set.

    ERROR:root:.../mm/migrate.c: In function ‘migrate_pages_batch’:
    .../mm/migrate.c:1682:49: error: implicit declaration of function ‘get_deferred_split_queue’ [-Werror=implicit-function-declaration]
     1682 |                                                 get_deferred_split_queue(folio);
          |                                                 ^~~~~~~~~~~~~~~~~~~~~~~~


Thanks,
SJ

> 
> --
> Best Regards,
> Yan, Zi
diff mbox series

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..c6d4d0cdf4b3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -766,28 +766,6 @@  pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 	return pmd;
 }
 
-#ifdef CONFIG_MEMCG
-static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
-{
-	struct mem_cgroup *memcg = folio_memcg(folio);
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
-
-	if (memcg)
-		return &memcg->deferred_split_queue;
-	else
-		return &pgdat->deferred_split_queue;
-}
-#else
-static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
-{
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
-
-	return &pgdat->deferred_split_queue;
-}
-#endif
-
 void folio_prep_large_rmappable(struct folio *folio)
 {
 	if (!folio || !folio_test_large(folio))
diff --git a/mm/internal.h b/mm/internal.h
index d1c69119b24f..8fa36e84463a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1107,6 +1107,29 @@  struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 				   unsigned long addr, pmd_t *pmd,
 				   unsigned int flags);
 
+#ifdef CONFIG_MEMCG
+static inline
+struct deferred_split *get_deferred_split_queue(struct folio *folio)
+{
+	struct mem_cgroup *memcg = folio_memcg(folio);
+	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+
+	if (memcg)
+		return &memcg->deferred_split_queue;
+	else
+		return &pgdat->deferred_split_queue;
+}
+#else
+static inline
+struct deferred_split *get_deferred_split_queue(struct folio *folio)
+{
+	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+
+	return &pgdat->deferred_split_queue;
+}
+#endif
+
+
 /*
  * mm/mmap.c
  */
diff --git a/mm/migrate.c b/mm/migrate.c
index 73a052a382f1..3fd58b267e37 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1654,25 +1654,63 @@  static int migrate_pages_batch(struct list_head *from,
 
 			/*
 			 * Large folio migration might be unsupported or
-			 * the allocation might be failed so we should retry
-			 * on the same folio with the large folio split
+			 * the folio is on deferred split list so we should
+			 * retry on the same folio with the large folio split
 			 * to normal folios.
 			 *
 			 * Split folios are put in split_folios, and
 			 * we will migrate them after the rest of the
 			 * list is processed.
 			 */
-			if (!thp_migration_supported() && is_thp) {
-				nr_failed++;
-				stats->nr_thp_failed++;
-				if (!try_split_folio(folio, split_folios)) {
-					stats->nr_thp_split++;
-					stats->nr_split++;
+			if (is_thp) {
+				bool is_on_deferred_list = false;
+
+				/*
+				 * Check without taking split_queue_lock to
+				 * reduce locking overheads. The worst case is
+				 * that if the folio is put on the deferred
+				 * split list after the check, it will be
+				 * migrated and not put back on the list.
+				 * The migrated folio will not be split
+				 * via shrinker during memory pressure.
+				 */
+				if (!data_race(list_empty(&folio->_deferred_list))) {
+					struct deferred_split *ds_queue;
+					unsigned long flags;
+
+					ds_queue =
+						get_deferred_split_queue(folio);
+					spin_lock_irqsave(&ds_queue->split_queue_lock,
+							  flags);
+					/*
+					 * Only check if the folio is on
+					 * deferred split list without removing
+					 * it. Since the folio can be on
+					 * deferred_split_scan() local list and
+					 * removing it can cause the local list
+					 * corruption. Folio split process
+					 * below can handle it with the help of
+					 * folio_ref_freeze().
+					 */
+					is_on_deferred_list =
+						!list_empty(&folio->_deferred_list);
+					spin_unlock_irqrestore(&ds_queue->split_queue_lock,
+							       flags);
+				}
+				if (!thp_migration_supported() ||
+						is_on_deferred_list) {
+					nr_failed++;
+					stats->nr_thp_failed++;
+					if (!try_split_folio(folio,
+							     split_folios)) {
+						stats->nr_thp_split++;
+						stats->nr_split++;
+						continue;
+					}
+					stats->nr_failed_pages += nr_pages;
+					list_move_tail(&folio->lru, ret_folios);
 					continue;
 				}
-				stats->nr_failed_pages += nr_pages;
-				list_move_tail(&folio->lru, ret_folios);
-				continue;
 			}
 
 			rc = migrate_folio_unmap(get_new_folio, put_new_folio,