Message ID | 20240319154753.253262-1-zi.yan@sent.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] mm/migrate: split source folio if it is on deferred split list | expand |
Hello, On Tue, 19 Mar 2024 11:47:53 -0400 Zi Yan <zi.yan@sent.com> wrote: > From: Zi Yan <ziy@nvidia.com> > > If the source folio is on deferred split list, it is likely some subpages > are not used. Split it before migration to avoid migrating unused subpages. > > Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path") > did not check if a THP is on deferred split list before migration, thus, > the destination THP is never put on deferred split list even if the source > THP might be. The opportunity of reclaiming free pages in a partially > mapped THP during deferred list scanning is lost, but no other harmful > consequence is present[1]. > > From v2: > 1. Split the source folio instead of migrating it (per Matthew Wilcox)[2]. > > From v1: > 1. Used dst to get correct deferred split list after migration > (per Ryan Roberts). > > [1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/ > [2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/ > > Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") > Signed-off-by: Zi Yan <ziy@nvidia.com> > --- > mm/huge_memory.c | 22 ------------------ > mm/internal.h | 23 +++++++++++++++++++ > mm/migrate.c | 60 +++++++++++++++++++++++++++++++++++++++--------- > 3 files changed, 72 insertions(+), 33 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 9859aa4f7553..c6d4d0cdf4b3 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) > return pmd; > } > > -#ifdef CONFIG_MEMCG > -static inline > -struct deferred_split *get_deferred_split_queue(struct folio *folio) > -{ > - struct mem_cgroup *memcg = folio_memcg(folio); > - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); > - > - if (memcg) > - return &memcg->deferred_split_queue; > - else > - return &pgdat->deferred_split_queue; > -} > -#else > -static inline > -struct deferred_split *get_deferred_split_queue(struct folio *folio) > -{ > - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); > - > - return &pgdat->deferred_split_queue; > -} > -#endif > - > void folio_prep_large_rmappable(struct folio *folio) > { > if (!folio || !folio_test_large(folio)) > diff --git a/mm/internal.h b/mm/internal.h > index d1c69119b24f..8fa36e84463a 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > unsigned long addr, pmd_t *pmd, > unsigned int flags); > > +#ifdef CONFIG_MEMCG > +static inline > +struct deferred_split *get_deferred_split_queue(struct folio *folio) > +{ > + struct mem_cgroup *memcg = folio_memcg(folio); > + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); > + > + if (memcg) > + return &memcg->deferred_split_queue; > + else > + return &pgdat->deferred_split_queue; > +} > +#else > +static inline > +struct deferred_split *get_deferred_split_queue(struct folio *folio) > +{ > + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); > + > + return &pgdat->deferred_split_queue; > +} > +#endif I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with below error: .../lib/../mm/internal.h: In function 'get_deferred_split_queue': .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue' 1127 | return &pgdat->deferred_split_queue; | ^~ Since the code was in hugepage.c, maybe the above chunk need to be wrapped by #ifdef CONFIG_TRANSPARENT_HUGEPAGE? I confirmed below change is fixing the build on my setup. Thanks, SJ [...] -------------------------------------- >8 ------------------------------------- diff --git a/mm/internal.h b/mm/internal.h index dce2b9f5e6cd..fe9f69ceb140 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1106,6 +1106,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, unsigned int flags); +#ifdef CONFIG_TRANSPARENT_HUGEPAGE #ifdef CONFIG_MEMCG static inline struct deferred_split *get_deferred_split_queue(struct folio *folio) @@ -1126,7 +1127,8 @@ struct deferred_split *get_deferred_split_queue(struct folio *folio) return &pgdat->deferred_split_queue; } -#endif +#endif /* CONFIG_MEMCG */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ /*
On 19 Mar 2024, at 21:08, SeongJae Park wrote: > Hello, > > On Tue, 19 Mar 2024 11:47:53 -0400 Zi Yan <zi.yan@sent.com> wrote: > >> From: Zi Yan <ziy@nvidia.com> >> >> If the source folio is on deferred split list, it is likely some subpages >> are not used. Split it before migration to avoid migrating unused subpages. >> >> Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path") >> did not check if a THP is on deferred split list before migration, thus, >> the destination THP is never put on deferred split list even if the source >> THP might be. The opportunity of reclaiming free pages in a partially >> mapped THP during deferred list scanning is lost, but no other harmful >> consequence is present[1]. >> >> From v2: >> 1. Split the source folio instead of migrating it (per Matthew Wilcox)[2]. >> >> From v1: >> 1. Used dst to get correct deferred split list after migration >> (per Ryan Roberts). >> >> [1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/ >> [2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/ >> >> Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") >> Signed-off-by: Zi Yan <ziy@nvidia.com> >> --- >> mm/huge_memory.c | 22 ------------------ >> mm/internal.h | 23 +++++++++++++++++++ >> mm/migrate.c | 60 +++++++++++++++++++++++++++++++++++++++--------- >> 3 files changed, 72 insertions(+), 33 deletions(-) >> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >> index 9859aa4f7553..c6d4d0cdf4b3 100644 >> --- a/mm/huge_memory.c >> +++ b/mm/huge_memory.c >> @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) >> return pmd; >> } >> >> -#ifdef CONFIG_MEMCG >> -static inline >> -struct deferred_split *get_deferred_split_queue(struct folio *folio) >> -{ >> - struct mem_cgroup *memcg = folio_memcg(folio); >> - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >> - >> - if (memcg) >> - return &memcg->deferred_split_queue; >> - else >> - return &pgdat->deferred_split_queue; >> -} >> -#else >> -static inline >> -struct deferred_split *get_deferred_split_queue(struct folio *folio) >> -{ >> - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >> - >> - return &pgdat->deferred_split_queue; >> -} >> -#endif >> - >> void folio_prep_large_rmappable(struct folio *folio) >> { >> if (!folio || !folio_test_large(folio)) >> diff --git a/mm/internal.h b/mm/internal.h >> index d1c69119b24f..8fa36e84463a 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, >> unsigned long addr, pmd_t *pmd, >> unsigned int flags); >> >> +#ifdef CONFIG_MEMCG >> +static inline >> +struct deferred_split *get_deferred_split_queue(struct folio *folio) >> +{ >> + struct mem_cgroup *memcg = folio_memcg(folio); >> + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >> + >> + if (memcg) >> + return &memcg->deferred_split_queue; >> + else >> + return &pgdat->deferred_split_queue; >> +} >> +#else >> +static inline >> +struct deferred_split *get_deferred_split_queue(struct folio *folio) >> +{ >> + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >> + >> + return &pgdat->deferred_split_queue; >> +} >> +#endif > > I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with > below error: > > .../lib/../mm/internal.h: In function 'get_deferred_split_queue': > .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue' > 1127 | return &pgdat->deferred_split_queue; > | ^~ > > Since the code was in hugepage.c, maybe the above chunk need to be wrapped by > #ifdef CONFIG_TRANSPARENT_HUGEPAGE? I confirmed below change is fixing the > build on my setup. Thanks. Will fix it in the next version. > > [...] > > -------------------------------------- >8 ------------------------------------- > diff --git a/mm/internal.h b/mm/internal.h > index dce2b9f5e6cd..fe9f69ceb140 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -1106,6 +1106,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > unsigned long addr, pmd_t *pmd, > unsigned int flags); > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > #ifdef CONFIG_MEMCG > static inline > struct deferred_split *get_deferred_split_queue(struct folio *folio) > @@ -1126,7 +1127,8 @@ struct deferred_split *get_deferred_split_queue(struct folio *folio) > > return &pgdat->deferred_split_queue; > } > -#endif > +#endif /* CONFIG_MEMCG */ > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ > > > /* -- Best Regards, Yan, Zi
On 19 Mar 2024, at 21:09, Zi Yan wrote: > On 19 Mar 2024, at 21:08, SeongJae Park wrote: > >> Hello, >> >> On Tue, 19 Mar 2024 11:47:53 -0400 Zi Yan <zi.yan@sent.com> wrote: >> >>> From: Zi Yan <ziy@nvidia.com> >>> >>> If the source folio is on deferred split list, it is likely some subpages >>> are not used. Split it before migration to avoid migrating unused subpages. >>> >>> Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path") >>> did not check if a THP is on deferred split list before migration, thus, >>> the destination THP is never put on deferred split list even if the source >>> THP might be. The opportunity of reclaiming free pages in a partially >>> mapped THP during deferred list scanning is lost, but no other harmful >>> consequence is present[1]. >>> >>> From v2: >>> 1. Split the source folio instead of migrating it (per Matthew Wilcox)[2]. >>> >>> From v1: >>> 1. Used dst to get correct deferred split list after migration >>> (per Ryan Roberts). >>> >>> [1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/ >>> [2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/ >>> >>> Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") >>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>> --- >>> mm/huge_memory.c | 22 ------------------ >>> mm/internal.h | 23 +++++++++++++++++++ >>> mm/migrate.c | 60 +++++++++++++++++++++++++++++++++++++++--------- >>> 3 files changed, 72 insertions(+), 33 deletions(-) >>> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>> index 9859aa4f7553..c6d4d0cdf4b3 100644 >>> --- a/mm/huge_memory.c >>> +++ b/mm/huge_memory.c >>> @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) >>> return pmd; >>> } >>> >>> -#ifdef CONFIG_MEMCG >>> -static inline >>> -struct deferred_split *get_deferred_split_queue(struct folio *folio) >>> -{ >>> - struct mem_cgroup *memcg = folio_memcg(folio); >>> - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >>> - >>> - if (memcg) >>> - return &memcg->deferred_split_queue; >>> - else >>> - return &pgdat->deferred_split_queue; >>> -} >>> -#else >>> -static inline >>> -struct deferred_split *get_deferred_split_queue(struct folio *folio) >>> -{ >>> - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >>> - >>> - return &pgdat->deferred_split_queue; >>> -} >>> -#endif >>> - >>> void folio_prep_large_rmappable(struct folio *folio) >>> { >>> if (!folio || !folio_test_large(folio)) >>> diff --git a/mm/internal.h b/mm/internal.h >>> index d1c69119b24f..8fa36e84463a 100644 >>> --- a/mm/internal.h >>> +++ b/mm/internal.h >>> @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, >>> unsigned long addr, pmd_t *pmd, >>> unsigned int flags); >>> >>> +#ifdef CONFIG_MEMCG >>> +static inline >>> +struct deferred_split *get_deferred_split_queue(struct folio *folio) >>> +{ >>> + struct mem_cgroup *memcg = folio_memcg(folio); >>> + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >>> + >>> + if (memcg) >>> + return &memcg->deferred_split_queue; >>> + else >>> + return &pgdat->deferred_split_queue; >>> +} >>> +#else >>> +static inline >>> +struct deferred_split *get_deferred_split_queue(struct folio *folio) >>> +{ >>> + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >>> + >>> + return &pgdat->deferred_split_queue; >>> +} >>> +#endif >> >> I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with >> below error: >> >> .../lib/../mm/internal.h: In function 'get_deferred_split_queue': >> .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue' >> 1127 | return &pgdat->deferred_split_queue; >> | ^~ >> >> Since the code was in hugepage.c, maybe the above chunk need to be wrapped by >> #ifdef CONFIG_TRANSPARENT_HUGEPAGE? I confirmed below change is fixing the >> build on my setup. > > Thanks. Will fix it in the next version. Actually, since get_deferred_split_queue() is used in mm/migrate.c, that part needs to be guarded by CONFIG_TRANSPARENT_HUGEPAGE as well. -- Best Regards, Yan, Zi
Hi Matthew, >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>>> index 9859aa4f7553..c6d4d0cdf4b3 100644 >>>> --- a/mm/huge_memory.c >>>> +++ b/mm/huge_memory.c >>>> @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) >>>> return pmd; >>>> } >>>> >>>> -#ifdef CONFIG_MEMCG >>>> -static inline >>>> -struct deferred_split *get_deferred_split_queue(struct folio *folio) >>>> -{ >>>> - struct mem_cgroup *memcg = folio_memcg(folio); >>>> - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >>>> - >>>> - if (memcg) >>>> - return &memcg->deferred_split_queue; >>>> - else >>>> - return &pgdat->deferred_split_queue; >>>> -} >>>> -#else >>>> -static inline >>>> -struct deferred_split *get_deferred_split_queue(struct folio *folio) >>>> -{ >>>> - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >>>> - >>>> - return &pgdat->deferred_split_queue; >>>> -} >>>> -#endif >>>> - >>>> void folio_prep_large_rmappable(struct folio *folio) >>>> { >>>> if (!folio || !folio_test_large(folio)) >>>> diff --git a/mm/internal.h b/mm/internal.h >>>> index d1c69119b24f..8fa36e84463a 100644 >>>> --- a/mm/internal.h >>>> +++ b/mm/internal.h >>>> @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, >>>> unsigned long addr, pmd_t *pmd, >>>> unsigned int flags); >>>> >>>> +#ifdef CONFIG_MEMCG >>>> +static inline >>>> +struct deferred_split *get_deferred_split_queue(struct folio *folio) >>>> +{ >>>> + struct mem_cgroup *memcg = folio_memcg(folio); >>>> + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >>>> + >>>> + if (memcg) >>>> + return &memcg->deferred_split_queue; >>>> + else >>>> + return &pgdat->deferred_split_queue; >>>> +} >>>> +#else >>>> +static inline >>>> +struct deferred_split *get_deferred_split_queue(struct folio *folio) >>>> +{ >>>> + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); >>>> + >>>> + return &pgdat->deferred_split_queue; >>>> +} >>>> +#endif >>> >>> I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with >>> below error: >>> >>> .../lib/../mm/internal.h: In function 'get_deferred_split_queue': >>> .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue' >>> 1127 | return &pgdat->deferred_split_queue; >>> | ^~ >>> >>> Since the code was in hugepage.c, maybe the above chunk need to be wrapped by >>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE? I confirmed below change is fixing the >>> build on my setup. >> >> Thanks. Will fix it in the next version. > > Actually, since get_deferred_split_queue() is used in mm/migrate.c, that > part needs to be guarded by CONFIG_TRANSPARENT_HUGEPAGE as well. _deferred_list is not hidden behind CONFIG_TRANSPARENT_HUGEPAGE but its users are all behind CONFIG_TRANSPARENT_HUGEPAGE. Should it be moved as well? Or the other way around, moving _deferred_list users out? Large folios should work without THP, since they would be like THP without PMD mappings, namely mTHP. But mTHP is considered a subset of THP. It seems confusing. Maybe hiding _deferred_list behind CONFIG_TRANSPARENT_HUGEPAGE is better, or just do not change. -- Best Regards, Yan, Zi
On Tue, 19 Mar 2024 21:16:37 -0400 Zi Yan <ziy@nvidia.com> wrote: > [-- Attachment #1: Type: text/plain, Size: 4354 bytes --] > > On 19 Mar 2024, at 21:09, Zi Yan wrote: > > > On 19 Mar 2024, at 21:08, SeongJae Park wrote: > > > >> Hello, > >> > >> On Tue, 19 Mar 2024 11:47:53 -0400 Zi Yan <zi.yan@sent.com> wrote: > >> > >>> From: Zi Yan <ziy@nvidia.com> > >>> > >>> If the source folio is on deferred split list, it is likely some subpages > >>> are not used. Split it before migration to avoid migrating unused subpages. > >>> > >>> Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path") > >>> did not check if a THP is on deferred split list before migration, thus, > >>> the destination THP is never put on deferred split list even if the source > >>> THP might be. The opportunity of reclaiming free pages in a partially > >>> mapped THP during deferred list scanning is lost, but no other harmful > >>> consequence is present[1]. > >>> > >>> From v2: > >>> 1. Split the source folio instead of migrating it (per Matthew Wilcox)[2]. > >>> > >>> From v1: > >>> 1. Used dst to get correct deferred split list after migration > >>> (per Ryan Roberts). > >>> > >>> [1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/ > >>> [2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/ > >>> > >>> Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") > >>> Signed-off-by: Zi Yan <ziy@nvidia.com> > >>> --- > >>> mm/huge_memory.c | 22 ------------------ > >>> mm/internal.h | 23 +++++++++++++++++++ > >>> mm/migrate.c | 60 +++++++++++++++++++++++++++++++++++++++--------- > >>> 3 files changed, 72 insertions(+), 33 deletions(-) > >>> > >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c > >>> index 9859aa4f7553..c6d4d0cdf4b3 100644 > >>> --- a/mm/huge_memory.c > >>> +++ b/mm/huge_memory.c > >>> @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) > >>> return pmd; > >>> } > >>> > >>> -#ifdef CONFIG_MEMCG > >>> -static inline > >>> -struct deferred_split *get_deferred_split_queue(struct folio *folio) > >>> -{ > >>> - struct mem_cgroup *memcg = folio_memcg(folio); > >>> - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); > >>> - > >>> - if (memcg) > >>> - return &memcg->deferred_split_queue; > >>> - else > >>> - return &pgdat->deferred_split_queue; > >>> -} > >>> -#else > >>> -static inline > >>> -struct deferred_split *get_deferred_split_queue(struct folio *folio) > >>> -{ > >>> - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); > >>> - > >>> - return &pgdat->deferred_split_queue; > >>> -} > >>> -#endif > >>> - > >>> void folio_prep_large_rmappable(struct folio *folio) > >>> { > >>> if (!folio || !folio_test_large(folio)) > >>> diff --git a/mm/internal.h b/mm/internal.h > >>> index d1c69119b24f..8fa36e84463a 100644 > >>> --- a/mm/internal.h > >>> +++ b/mm/internal.h > >>> @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > >>> unsigned long addr, pmd_t *pmd, > >>> unsigned int flags); > >>> > >>> +#ifdef CONFIG_MEMCG > >>> +static inline > >>> +struct deferred_split *get_deferred_split_queue(struct folio *folio) > >>> +{ > >>> + struct mem_cgroup *memcg = folio_memcg(folio); > >>> + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); > >>> + > >>> + if (memcg) > >>> + return &memcg->deferred_split_queue; > >>> + else > >>> + return &pgdat->deferred_split_queue; > >>> +} > >>> +#else > >>> +static inline > >>> +struct deferred_split *get_deferred_split_queue(struct folio *folio) > >>> +{ > >>> + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); > >>> + > >>> + return &pgdat->deferred_split_queue; > >>> +} > >>> +#endif > >> > >> I found this breaks the build when CONFIG_TRANSPARENT_HUGEPAGE is not set, with > >> below error: > >> > >> .../lib/../mm/internal.h: In function 'get_deferred_split_queue': > >> .../lib/../mm/internal.h:1127:22: error: 'struct pglist_data' has no member named 'deferred_split_queue' > >> 1127 | return &pgdat->deferred_split_queue; > >> | ^~ > >> > >> Since the code was in hugepage.c, maybe the above chunk need to be wrapped by > >> #ifdef CONFIG_TRANSPARENT_HUGEPAGE? I confirmed below change is fixing the > >> build on my setup. > > > > Thanks. Will fix it in the next version. > > Actually, since get_deferred_split_queue() is used in mm/migrate.c, that > part needs to be guarded by CONFIG_TRANSPARENT_HUGEPAGE as well. You're right. I also just confirmed that build breaks with below error when CONFIG_TRANSPARENT_HUGEPAGE is not set but CONFIG_MIGRATION is set. ERROR:root:.../mm/migrate.c: In function ‘migrate_pages_batch’: .../mm/migrate.c:1682:49: error: implicit declaration of function ‘get_deferred_split_queue’ [-Werror=implicit-function-declaration] 1682 | get_deferred_split_queue(folio); | ^~~~~~~~~~~~~~~~~~~~~~~~ Thanks, SJ > > -- > Best Regards, > Yan, Zi
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9859aa4f7553..c6d4d0cdf4b3 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -766,28 +766,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) return pmd; } -#ifdef CONFIG_MEMCG -static inline -struct deferred_split *get_deferred_split_queue(struct folio *folio) -{ - struct mem_cgroup *memcg = folio_memcg(folio); - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); - - if (memcg) - return &memcg->deferred_split_queue; - else - return &pgdat->deferred_split_queue; -} -#else -static inline -struct deferred_split *get_deferred_split_queue(struct folio *folio) -{ - struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); - - return &pgdat->deferred_split_queue; -} -#endif - void folio_prep_large_rmappable(struct folio *folio) { if (!folio || !folio_test_large(folio)) diff --git a/mm/internal.h b/mm/internal.h index d1c69119b24f..8fa36e84463a 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1107,6 +1107,29 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, unsigned int flags); +#ifdef CONFIG_MEMCG +static inline +struct deferred_split *get_deferred_split_queue(struct folio *folio) +{ + struct mem_cgroup *memcg = folio_memcg(folio); + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); + + if (memcg) + return &memcg->deferred_split_queue; + else + return &pgdat->deferred_split_queue; +} +#else +static inline +struct deferred_split *get_deferred_split_queue(struct folio *folio) +{ + struct pglist_data *pgdat = NODE_DATA(folio_nid(folio)); + + return &pgdat->deferred_split_queue; +} +#endif + + /* * mm/mmap.c */ diff --git a/mm/migrate.c b/mm/migrate.c index 73a052a382f1..3fd58b267e37 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1654,25 +1654,63 @@ static int migrate_pages_batch(struct list_head *from, /* * Large folio migration might be unsupported or - * the allocation might be failed so we should retry - * on the same folio with the large folio split + * the folio is on deferred split list so we should + * retry on the same folio with the large folio split * to normal folios. * * Split folios are put in split_folios, and * we will migrate them after the rest of the * list is processed. */ - if (!thp_migration_supported() && is_thp) { - nr_failed++; - stats->nr_thp_failed++; - if (!try_split_folio(folio, split_folios)) { - stats->nr_thp_split++; - stats->nr_split++; + if (is_thp) { + bool is_on_deferred_list = false; + + /* + * Check without taking split_queue_lock to + * reduce locking overheads. The worst case is + * that if the folio is put on the deferred + * split list after the check, it will be + * migrated and not put back on the list. + * The migrated folio will not be split + * via shrinker during memory pressure. + */ + if (!data_race(list_empty(&folio->_deferred_list))) { + struct deferred_split *ds_queue; + unsigned long flags; + + ds_queue = + get_deferred_split_queue(folio); + spin_lock_irqsave(&ds_queue->split_queue_lock, + flags); + /* + * Only check if the folio is on + * deferred split list without removing + * it. Since the folio can be on + * deferred_split_scan() local list and + * removing it can cause the local list + * corruption. Folio split process + * below can handle it with the help of + * folio_ref_freeze(). + */ + is_on_deferred_list = + !list_empty(&folio->_deferred_list); + spin_unlock_irqrestore(&ds_queue->split_queue_lock, + flags); + } + if (!thp_migration_supported() || + is_on_deferred_list) { + nr_failed++; + stats->nr_thp_failed++; + if (!try_split_folio(folio, + split_folios)) { + stats->nr_thp_split++; + stats->nr_split++; + continue; + } + stats->nr_failed_pages += nr_pages; + list_move_tail(&folio->lru, ret_folios); continue; } - stats->nr_failed_pages += nr_pages; - list_move_tail(&folio->lru, ret_folios); - continue; } rc = migrate_folio_unmap(get_new_folio, put_new_folio,