Message ID: 20240816040625.650053-1-wangkefeng.wang@huawei.com (mailing list archive)
State: New
Series: mm: remove migration for HugePage in isolate_single_pageblock()
On Fri, 16 Aug 2024 12:06:25 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote: > The gigantic page size may larger than memory block size, so memory > offline always fails in this case after commit b2c9e2fbba32 ("mm: make > alloc_contig_range work at pageblock granularity"), > > offline_pages > start_isolate_page_range > start_isolate_page_range(isolate_before=true) > isolate [isolate_start, isolate_start + pageblock_nr_pages) > start_isolate_page_range(isolate_before=false) > isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock > __alloc_contig_migrate_range > isolate_migratepages_range > isolate_migratepages_block > isolate_or_dissolve_huge_page > if (hstate_is_gigantic(h)) > return -ENOMEM; > > In fact, we don't need to migrate page in page range isolation, for > memory offline path, there is do_migrate_range() to move the pages. > For contig allocation, there is another __alloc_contig_migrate_range() > after isolation to migrate the pages. So fix issue by skipping the > __alloc_contig_migrate_range() in isolate_single_pageblock(). > > Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") Should we backport this into -stable kernels?
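For context on the size mismatch Kefeng describes: a gigantic hugepage on x86-64 is 1 GiB, while a hot(un)pluggable memory block is commonly 128 MiB, so one gigantic folio straddles several memory blocks and offlining any one of them lands in the middle of the folio. A minimal userspace sketch of that arithmetic, using typical default sizes rather than values queried from a running system:

	#include <stdio.h>

	int main(void)
	{
		/* Typical x86-64 defaults; both values depend on architecture
		 * and configuration, so treat them as illustrative only. */
		const unsigned long long gigantic_page = 1ULL << 30;   /* 1 GiB   */
		const unsigned long long memory_block  = 128ULL << 20; /* 128 MiB */

		/* One gigantic folio spans several memory blocks, so offlining
		 * any single block cuts through the middle of the folio. */
		printf("memory blocks per gigantic page: %llu\n",
		       gigantic_page / memory_block);
		return 0;
	}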
On 2024/8/16 12:58, Andrew Morton wrote: > On Fri, 16 Aug 2024 12:06:25 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote: > >> The gigantic page size may larger than memory block size, so memory >> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >> alloc_contig_range work at pageblock granularity"), >> >> offline_pages >> start_isolate_page_range >> start_isolate_page_range(isolate_before=true) >> isolate [isolate_start, isolate_start + pageblock_nr_pages) >> start_isolate_page_range(isolate_before=false) >> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >> __alloc_contig_migrate_range >> isolate_migratepages_range >> isolate_migratepages_block >> isolate_or_dissolve_huge_page >> if (hstate_is_gigantic(h)) >> return -ENOMEM; >> >> In fact, we don't need to migrate page in page range isolation, for >> memory offline path, there is do_migrate_range() to move the pages. >> For contig allocation, there is another __alloc_contig_migrate_range() >> after isolation to migrate the pages. So fix issue by skipping the >> __alloc_contig_migrate_range() in isolate_single_pageblock(). >> >> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") > > Should we backport this into -stable kernels? Better to backport to stable, since memory offline always fails in this case. >
On 16.08.24 06:06, Kefeng Wang wrote: > The gigantic page size may larger than memory block size, so memory > offline always fails in this case after commit b2c9e2fbba32 ("mm: make > alloc_contig_range work at pageblock granularity"), > > offline_pages > start_isolate_page_range > start_isolate_page_range(isolate_before=true) > isolate [isolate_start, isolate_start + pageblock_nr_pages) > start_isolate_page_range(isolate_before=false) > isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock > __alloc_contig_migrate_range > isolate_migratepages_range > isolate_migratepages_block > isolate_or_dissolve_huge_page > if (hstate_is_gigantic(h)) > return -ENOMEM; > > In fact, we don't need to migrate page in page range isolation, for > memory offline path, there is do_migrate_range() to move the pages. > For contig allocation, there is another __alloc_contig_migrate_range() > after isolation to migrate the pages. So fix issue by skipping the > __alloc_contig_migrate_range() in isolate_single_pageblock(). > > Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") > Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> > --- > mm/page_isolation.c | 28 +++------------------------- > 1 file changed, 3 insertions(+), 25 deletions(-) > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index 39fb8c07aeb7..7e04047977cf 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, > unsigned long head_pfn = page_to_pfn(head); > unsigned long nr_pages = compound_nr(head); > > - if (head_pfn + nr_pages <= boundary_pfn) { > - pfn = head_pfn + nr_pages; > - continue; > - } > - > -#if defined CONFIG_COMPACTION || defined CONFIG_CMA > - if (PageHuge(page)) { > - int page_mt = get_pageblock_migratetype(page); > - struct compact_control cc = { > - .nr_migratepages = 0, > - .order = -1, > - .zone = page_zone(pfn_to_page(head_pfn)), > - .mode = MIGRATE_SYNC, > - .ignore_skip_hint = true, > - .no_set_skip_hint = true, > - .gfp_mask = gfp_flags, > - .alloc_contig = true, > - }; > - INIT_LIST_HEAD(&cc.migratepages); > - > - ret = __alloc_contig_migrate_range(&cc, head_pfn, > - head_pfn + nr_pages, page_mt); > - if (ret) > - goto failed; But won't this break alloc_contig_range() then? I would have expected that you have to special-case here on the migration reason (MEMORY_OFFLINE). I remember some dirty details when we're trying to allocate with a single pageblock for alloc_contig_range(). Note that memory offlining always covers pageblocks larger than MAX_ORDER chunks (which implies full pageblocks) but alloc_contig_range() + CMA might only cover (parts of) single pageblocks. Hoping Zi Yan can review :)
On 2024/8/16 18:11, David Hildenbrand wrote: > On 16.08.24 06:06, Kefeng Wang wrote: >> The gigantic page size may larger than memory block size, so memory >> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >> alloc_contig_range work at pageblock granularity"), >> >> offline_pages >> start_isolate_page_range >> start_isolate_page_range(isolate_before=true) >> isolate [isolate_start, isolate_start + pageblock_nr_pages) >> start_isolate_page_range(isolate_before=false) >> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >> __alloc_contig_migrate_range >> isolate_migratepages_range >> isolate_migratepages_block >> isolate_or_dissolve_huge_page >> if (hstate_is_gigantic(h)) >> return -ENOMEM; >> >> In fact, we don't need to migrate page in page range isolation, for >> memory offline path, there is do_migrate_range() to move the pages. >> For contig allocation, there is another __alloc_contig_migrate_range() >> after isolation to migrate the pages. So fix issue by skipping the >> __alloc_contig_migrate_range() in isolate_single_pageblock(). >> >> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock >> granularity") >> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> >> --- >> mm/page_isolation.c | 28 +++------------------------- >> 1 file changed, 3 insertions(+), 25 deletions(-) >> >> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >> index 39fb8c07aeb7..7e04047977cf 100644 >> --- a/mm/page_isolation.c >> +++ b/mm/page_isolation.c >> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long >> boundary_pfn, int flags, >> unsigned long head_pfn = page_to_pfn(head); >> unsigned long nr_pages = compound_nr(head); >> - if (head_pfn + nr_pages <= boundary_pfn) { >> - pfn = head_pfn + nr_pages; >> - continue; >> - } >> - >> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA >> - if (PageHuge(page)) { >> - int page_mt = get_pageblock_migratetype(page); >> - struct compact_control cc = { >> - .nr_migratepages = 0, >> - .order = -1, >> - .zone = page_zone(pfn_to_page(head_pfn)), >> - .mode = MIGRATE_SYNC, >> - .ignore_skip_hint = true, >> - .no_set_skip_hint = true, >> - .gfp_mask = gfp_flags, >> - .alloc_contig = true, >> - }; >> - INIT_LIST_HEAD(&cc.migratepages); >> - >> - ret = __alloc_contig_migrate_range(&cc, head_pfn, >> - head_pfn + nr_pages, page_mt); >> - if (ret) >> - goto failed; > > But won't this break alloc_contig_range() then? I would have expected > that you have to special-case here on the migration reason > (MEMORY_OFFLINE). > Yes, this is what I did in the RFC: only skip migration for the offline path. But Zi Yan suggested removing the migration entirely[1] [1] https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/ > I remember some dirty details when we're trying to allcoate with a > single pageblock for alloc_contig_range(). > > Note that memory offlining always covers pageblocks large than MAX_ORDER > chunks (which implies full pageblocks) but alloc_contig_range() + CMA > might only cover (parts of) single pageblocks. > > Hoping Zi Yan can review :) >
On 16 Aug 2024, at 7:30, Kefeng Wang wrote: > On 2024/8/16 18:11, David Hildenbrand wrote: >> On 16.08.24 06:06, Kefeng Wang wrote: >>> The gigantic page size may larger than memory block size, so memory >>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >>> alloc_contig_range work at pageblock granularity"), >>> >>> offline_pages >>> start_isolate_page_range >>> start_isolate_page_range(isolate_before=true) >>> isolate [isolate_start, isolate_start + pageblock_nr_pages) >>> start_isolate_page_range(isolate_before=false) >>> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >>> __alloc_contig_migrate_range >>> isolate_migratepages_range >>> isolate_migratepages_block >>> isolate_or_dissolve_huge_page >>> if (hstate_is_gigantic(h)) >>> return -ENOMEM; >>> >>> In fact, we don't need to migrate page in page range isolation, for >>> memory offline path, there is do_migrate_range() to move the pages. >>> For contig allocation, there is another __alloc_contig_migrate_range() >>> after isolation to migrate the pages. So fix issue by skipping the >>> __alloc_contig_migrate_range() in isolate_single_pageblock(). >>> >>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") >>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> >>> --- >>> mm/page_isolation.c | 28 +++------------------------- >>> 1 file changed, 3 insertions(+), 25 deletions(-) >>> >>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >>> index 39fb8c07aeb7..7e04047977cf 100644 >>> --- a/mm/page_isolation.c >>> +++ b/mm/page_isolation.c >>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, >>> unsigned long head_pfn = page_to_pfn(head); >>> unsigned long nr_pages = compound_nr(head); >>> - if (head_pfn + nr_pages <= boundary_pfn) { >>> - pfn = head_pfn + nr_pages; >>> - continue; >>> - } >>> - >>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA >>> - if (PageHuge(page)) { >>> - int page_mt = get_pageblock_migratetype(page); >>> - struct compact_control cc = { >>> - .nr_migratepages = 0, >>> - .order = -1, >>> - .zone = page_zone(pfn_to_page(head_pfn)), >>> - .mode = MIGRATE_SYNC, >>> - .ignore_skip_hint = true, >>> - .no_set_skip_hint = true, >>> - .gfp_mask = gfp_flags, >>> - .alloc_contig = true, >>> - }; >>> - INIT_LIST_HEAD(&cc.migratepages); >>> - >>> - ret = __alloc_contig_migrate_range(&cc, head_pfn, >>> - head_pfn + nr_pages, page_mt); >>> - if (ret) >>> - goto failed; >> >> But won't this break alloc_contig_range() then? I would have expected that you have to special-case here on the migration reason (MEMORY_OFFLINE). >> > > Yes, this is what I did in rfc, only skip migration for offline path. > but Zi Yan suggested to remove migration totally[1] > > [1] https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/ > >> I remember some dirty details when we're trying to allcoate with a single pageblock for alloc_contig_range(). Most likely I was overthinking about the situation back then. I thought PageHuge, PageLRU, and __PageMovable all can be bigger than a pageblock, but in reality only PageHuge can and the gigantic PageHuge is freed as order-0. This means MIGRATE_ISOLATE pageblocks will get to the right free list after __alloc_contig_migrate_range(), the one after start_isolate_page_range(). David, I know we do not have cross-pageblock PageLRU yet (wait until someone adds PMD-level mTHP). 
But I am not sure about __PageMovable, even if you and Johannes told me that __PageMovable has no compound page. I wonder what are the use cases for __PageMovable. Is it possible for a driver to mark its cross-pageblock page __PageMovable and provide ->isolate_page and ->migratepage in its struct address_space_operations? Or it is unsupported, so I should not need to worry about it. >> >> Note that memory offlining always covers pageblocks large than MAX_ORDER chunks (which implies full pageblocks) but alloc_contig_range() + CMA might only cover (parts of) single pageblocks. >> >> Hoping Zi Yan can review :) At the moment, I think this is the right clean up. Acked-by: Zi Yan <ziy@nvidia.com> Best Regards, Yan, Zi
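Regarding Zi's question about drivers marking pages __PageMovable: in kernels of this vintage the callbacks live in struct movable_operations (they moved out of struct address_space_operations in v5.20). A rough sketch of what such a driver registration looks like — the demo_* names and empty callback bodies are hypothetical, only the shape follows the kernel API:

	#include <linux/migrate.h>
	#include <linux/page-flags.h>

	/* Hypothetical callbacks; a real driver does actual isolation and
	 * content copying here. */
	static bool demo_isolate_page(struct page *page, isolate_mode_t mode)
	{
		/* Claim the page so it cannot be freed while being migrated. */
		return true;
	}

	static int demo_migrate_page(struct page *dst, struct page *src,
				     enum migrate_mode mode)
	{
		/* Copy contents and fix up driver bookkeeping for dst. */
		return MIGRATEPAGE_SUCCESS;
	}

	static void demo_putback_page(struct page *page)
	{
		/* Undo demo_isolate_page() when migration is aborted. */
	}

	static const struct movable_operations demo_mops = {
		.isolate_page	= demo_isolate_page,
		.migrate_page	= demo_migrate_page,
		.putback_page	= demo_putback_page,
	};

	static void demo_mark_movable(struct page *page)
	{
		/* Order-0 is the only well-exercised case, per the discussion. */
		__SetPageMovable(page, &demo_mops);
	}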
On 16.08.24 13:30, Kefeng Wang wrote: > > > On 2024/8/16 18:11, David Hildenbrand wrote: >> On 16.08.24 06:06, Kefeng Wang wrote: >>> The gigantic page size may larger than memory block size, so memory >>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >>> alloc_contig_range work at pageblock granularity"), >>> >>> offline_pages >>> start_isolate_page_range >>> start_isolate_page_range(isolate_before=true) >>> isolate [isolate_start, isolate_start + pageblock_nr_pages) >>> start_isolate_page_range(isolate_before=false) >>> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >>> __alloc_contig_migrate_range >>> isolate_migratepages_range >>> isolate_migratepages_block >>> isolate_or_dissolve_huge_page >>> if (hstate_is_gigantic(h)) >>> return -ENOMEM; >>> >>> In fact, we don't need to migrate page in page range isolation, for >>> memory offline path, there is do_migrate_range() to move the pages. >>> For contig allocation, there is another __alloc_contig_migrate_range() >>> after isolation to migrate the pages. So fix issue by skipping the >>> __alloc_contig_migrate_range() in isolate_single_pageblock(). >>> >>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock >>> granularity") >>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> >>> --- >>> mm/page_isolation.c | 28 +++------------------------- >>> 1 file changed, 3 insertions(+), 25 deletions(-) >>> >>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >>> index 39fb8c07aeb7..7e04047977cf 100644 >>> --- a/mm/page_isolation.c >>> +++ b/mm/page_isolation.c >>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long >>> boundary_pfn, int flags, >>> unsigned long head_pfn = page_to_pfn(head); >>> unsigned long nr_pages = compound_nr(head); >>> - if (head_pfn + nr_pages <= boundary_pfn) { >>> - pfn = head_pfn + nr_pages; >>> - continue; >>> - } >>> - >>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA >>> - if (PageHuge(page)) { >>> - int page_mt = get_pageblock_migratetype(page); >>> - struct compact_control cc = { >>> - .nr_migratepages = 0, >>> - .order = -1, >>> - .zone = page_zone(pfn_to_page(head_pfn)), >>> - .mode = MIGRATE_SYNC, >>> - .ignore_skip_hint = true, >>> - .no_set_skip_hint = true, >>> - .gfp_mask = gfp_flags, >>> - .alloc_contig = true, >>> - }; >>> - INIT_LIST_HEAD(&cc.migratepages); >>> - >>> - ret = __alloc_contig_migrate_range(&cc, head_pfn, >>> - head_pfn + nr_pages, page_mt); >>> - if (ret) >>> - goto failed; >> >> But won't this break alloc_contig_range() then? I would have expected >> that you have to special-case here on the migration reason >> (MEMORY_OFFLINE). >> > > Yes, this is what I did in rfc, only skip migration for offline path. > but Zi Yan suggested to remove migration totally[1] Please distill some of that in the patch description. Right now you only talk about memory offlining and don't cover why alloc_contig_range() is fine as well with this change. Let me explore the details in the meantime ... :)
On 16.08.24 17:06, Zi Yan wrote: > On 16 Aug 2024, at 7:30, Kefeng Wang wrote: > >> On 2024/8/16 18:11, David Hildenbrand wrote: >>> On 16.08.24 06:06, Kefeng Wang wrote: >>>> The gigantic page size may larger than memory block size, so memory >>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >>>> alloc_contig_range work at pageblock granularity"), >>>> >>>> offline_pages >>>> start_isolate_page_range >>>> start_isolate_page_range(isolate_before=true) >>>> isolate [isolate_start, isolate_start + pageblock_nr_pages) >>>> start_isolate_page_range(isolate_before=false) >>>> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >>>> __alloc_contig_migrate_range >>>> isolate_migratepages_range >>>> isolate_migratepages_block >>>> isolate_or_dissolve_huge_page >>>> if (hstate_is_gigantic(h)) >>>> return -ENOMEM; >>>> >>>> In fact, we don't need to migrate page in page range isolation, for >>>> memory offline path, there is do_migrate_range() to move the pages. >>>> For contig allocation, there is another __alloc_contig_migrate_range() >>>> after isolation to migrate the pages. So fix issue by skipping the >>>> __alloc_contig_migrate_range() in isolate_single_pageblock(). >>>> >>>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") >>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> >>>> --- >>>> mm/page_isolation.c | 28 +++------------------------- >>>> 1 file changed, 3 insertions(+), 25 deletions(-) >>>> >>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >>>> index 39fb8c07aeb7..7e04047977cf 100644 >>>> --- a/mm/page_isolation.c >>>> +++ b/mm/page_isolation.c >>>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, >>>> unsigned long head_pfn = page_to_pfn(head); >>>> unsigned long nr_pages = compound_nr(head); >>>> - if (head_pfn + nr_pages <= boundary_pfn) { >>>> - pfn = head_pfn + nr_pages; >>>> - continue; >>>> - } >>>> - >>>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA >>>> - if (PageHuge(page)) { >>>> - int page_mt = get_pageblock_migratetype(page); >>>> - struct compact_control cc = { >>>> - .nr_migratepages = 0, >>>> - .order = -1, >>>> - .zone = page_zone(pfn_to_page(head_pfn)), >>>> - .mode = MIGRATE_SYNC, >>>> - .ignore_skip_hint = true, >>>> - .no_set_skip_hint = true, >>>> - .gfp_mask = gfp_flags, >>>> - .alloc_contig = true, >>>> - }; >>>> - INIT_LIST_HEAD(&cc.migratepages); >>>> - >>>> - ret = __alloc_contig_migrate_range(&cc, head_pfn, >>>> - head_pfn + nr_pages, page_mt); >>>> - if (ret) >>>> - goto failed; >>> >>> But won't this break alloc_contig_range() then? I would have expected that you have to special-case here on the migration reason (MEMORY_OFFLINE). >>> >> >> Yes, this is what I did in rfc, only skip migration for offline path. >> but Zi Yan suggested to remove migration totally[1] >> >> [1] https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/ >> >>> I remember some dirty details when we're trying to allcoate with a single pageblock for alloc_contig_range(). > > Most likely I was overthinking about the situation back then. I thought I'm more than happy if we can remove that code here :) > PageHuge, PageLRU, and __PageMovable all can be bigger than a pageblock, > but in reality only PageHuge can and the gigantic PageHuge is freed as > order-0. Does that still hold with Yu's patches to allocate/free gigantic pages from CMA using compound pages that are on the list (and likely already in mm-unstable)? 
I did not look at the freeing path of that patchset. As the buddy doesn't understand anything larger than MAX_ORDER, I would assume that we are fine. I assume the real issue is when we have a movable allocation (folio) that spans multiple pageblocks. For example, when MAX_ORDER is larger than a single pageblock, like it is on x86. Besides gigantic pages, I wonder if that can happen. Likely currently really only with hugetlb. > This means MIGRATE_ISOLATE pageblocks will get to the right > free list after __alloc_contig_migrate_range(), the one after > start_isolate_page_range(). > > David, I know we do not have cross-pageblock PageLRU yet (wait until > someone adds PMD-level mTHP). But I am not sure about __PageMovable, > even if you and Johannes told me that __PageMovable has no compound page. I think it's all order-0. Likely we should sanity check that somewhere (when setting a folio-page movable?). For example, the vmware balloon handles 2M pages differently than 4k pages. Only the latter is movable. > I wonder what are the use cases for __PageMovable. Is it possible for > a driver to mark its cross-pageblock page __PageMovable and provide > ->isolate_page and ->migratepage in its struct address_space_operations? > Or it is unsupported, so I should not need to worry about it. I never tried. We should document and enforce/sanity check that it only works with order-0 for now. > >>> >>> Note that memory offlining always covers pageblocks large than MAX_ORDER chunks (which implies full pageblocks) but alloc_contig_range() + CMA might only cover (parts of) single pageblocks. >>> >>> Hoping Zi Yan can review :) > > At the moment, I think this is the right clean up. I think we want to have some way to catch when it changes. For example, can we warn if we find a LRU folio here that is larger than a single pageblock? Also, I think we have to document why it works with hugetlb gigantic folios / large CMA allocations somewhere (the order-0 stuff you note above). Maybe as part of this changelog.
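One way to implement the warning David asks for — hypothetical, not part of the posted patch — is a check next to the existing sanity warnings in isolate_single_pageblock():

	/* Hypothetical addition to isolate_single_pageblock(): complain
	 * loudly the first time a movable (LRU) folio larger than one
	 * pageblock shows up, since the code would then rely on that
	 * never happening. */
	VM_WARN_ON_ONCE_PAGE(PageLRU(page) &&
			     compound_nr(page) > pageblock_nr_pages, page);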
On Fri, Aug 16, 2024 at 2:12 PM David Hildenbrand <david@redhat.com> wrote: > > On 16.08.24 17:06, Zi Yan wrote: > > On 16 Aug 2024, at 7:30, Kefeng Wang wrote: > > > >> On 2024/8/16 18:11, David Hildenbrand wrote: > >>> On 16.08.24 06:06, Kefeng Wang wrote: > >>>> The gigantic page size may larger than memory block size, so memory > >>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make > >>>> alloc_contig_range work at pageblock granularity"), > >>>> > >>>> offline_pages > >>>> start_isolate_page_range > >>>> start_isolate_page_range(isolate_before=true) > >>>> isolate [isolate_start, isolate_start + pageblock_nr_pages) > >>>> start_isolate_page_range(isolate_before=false) > >>>> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock > >>>> __alloc_contig_migrate_range > >>>> isolate_migratepages_range > >>>> isolate_migratepages_block > >>>> isolate_or_dissolve_huge_page > >>>> if (hstate_is_gigantic(h)) > >>>> return -ENOMEM; > >>>> > >>>> In fact, we don't need to migrate page in page range isolation, for > >>>> memory offline path, there is do_migrate_range() to move the pages. > >>>> For contig allocation, there is another __alloc_contig_migrate_range() > >>>> after isolation to migrate the pages. So fix issue by skipping the > >>>> __alloc_contig_migrate_range() in isolate_single_pageblock(). > >>>> > >>>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") > >>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> > >>>> --- > >>>> mm/page_isolation.c | 28 +++------------------------- > >>>> 1 file changed, 3 insertions(+), 25 deletions(-) > >>>> > >>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c > >>>> index 39fb8c07aeb7..7e04047977cf 100644 > >>>> --- a/mm/page_isolation.c > >>>> +++ b/mm/page_isolation.c > >>>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, > >>>> unsigned long head_pfn = page_to_pfn(head); > >>>> unsigned long nr_pages = compound_nr(head); > >>>> - if (head_pfn + nr_pages <= boundary_pfn) { > >>>> - pfn = head_pfn + nr_pages; > >>>> - continue; > >>>> - } > >>>> - > >>>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA > >>>> - if (PageHuge(page)) { > >>>> - int page_mt = get_pageblock_migratetype(page); > >>>> - struct compact_control cc = { > >>>> - .nr_migratepages = 0, > >>>> - .order = -1, > >>>> - .zone = page_zone(pfn_to_page(head_pfn)), > >>>> - .mode = MIGRATE_SYNC, > >>>> - .ignore_skip_hint = true, > >>>> - .no_set_skip_hint = true, > >>>> - .gfp_mask = gfp_flags, > >>>> - .alloc_contig = true, > >>>> - }; > >>>> - INIT_LIST_HEAD(&cc.migratepages); > >>>> - > >>>> - ret = __alloc_contig_migrate_range(&cc, head_pfn, > >>>> - head_pfn + nr_pages, page_mt); > >>>> - if (ret) > >>>> - goto failed; > >>> > >>> But won't this break alloc_contig_range() then? I would have expected that you have to special-case here on the migration reason (MEMORY_OFFLINE). > >>> > >> > >> Yes, this is what I did in rfc, only skip migration for offline path. > >> but Zi Yan suggested to remove migration totally[1] > >> > >> [1] https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/ > >> > >>> I remember some dirty details when we're trying to allcoate with a single pageblock for alloc_contig_range(). > > > > Most likely I was overthinking about the situation back then. 
> > I thought > > I'm more than happy if we can remove that code here :) > > > PageHuge, PageLRU, and __PageMovable all can be bigger than a pageblock, > > but in reality only PageHuge can and the gigantic PageHuge is freed as > > order-0. > > Does that still hold with Yu's patches to allocate/free gigantic pages > from CMA using compound pages that are on the list (and likely already > in mm-unstable)? Gigantic folios are now freed at pageblock granularity rather than order-0, as Zi himself stated during the review :) https://lore.kernel.org/linux-mm/29B680F7-E14D-4CD7-802B-5BBE1E1A3F92@nvidia.com/ > I did not look at the freeing path of that patchset. As > the buddy doesn't understand anything larger than MAX_ORDER, I would > assume that we are fine. Correct. > I assume the real issue is when we have a movable allocation (folio) > that spans multiple pageblocks. For example, when MAX_ORDER is large > than a single pageblock, like it is on x86. > > Besides gigantic pages, I wonder if that can happen. Likely currently > really only with hugetlb. > > > This means MIGRATE_ISOLATE pageblocks will get to the right > > free list after __alloc_contig_migrate_range(), the one after > > start_isolate_page_range(). > > > > David, I know we do not have cross-pageblock PageLRU yet (wait until > > someone adds PMD-level mTHP). But I am not sure about __PageMovable, > > even if you and Johannes told me that __PageMovable has no compound page. > > I think it's all order-0. Likely we should sanity check that somewhere > (when setting a folio-page movable?). > > For example, the vmware balloon handles 2M pages differently than 4k > pages. Only the latter is movable. > > > I wonder what are the use cases for __PageMovable. Is it possible for > > a driver to mark its cross-pageblock page __PageMovable and provide > > ->isolate_page and ->migratepage in its struct address_space_operations? > > Or it is unsupported, so I should not need to worry about it. > > I never tried. We should document and enforce/sanity check that it only > works with order-0 for now. > > > > >>> > >>> Note that memory offlining always covers pageblocks large than MAX_ORDER chunks (which implies full pageblocks) but alloc_contig_range() + CMA might only cover (parts of) single pageblocks. > >>> > >>> Hoping Zi Yan can review :) > > > > At the moment, I think this is the right clean up. > > I think we want to have some way to catch when it changes. For example, > can we warn if we find a LRU folio here that is large than a single > pageblock? > > Also, I think we have to document why it works with hugetlb gigantic > folios / large CMA allocations somewhere (the order-0 stuff you note > above). Maybe as part of this changelog. > > -- > Cheers, > > David / dhildenb > >
On 16 Aug 2024, at 16:12, David Hildenbrand wrote: > On 16.08.24 17:06, Zi Yan wrote: >> On 16 Aug 2024, at 7:30, Kefeng Wang wrote: >> >>> On 2024/8/16 18:11, David Hildenbrand wrote: >>>> On 16.08.24 06:06, Kefeng Wang wrote: >>>>> The gigantic page size may larger than memory block size, so memory >>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >>>>> alloc_contig_range work at pageblock granularity"), >>>>> >>>>> offline_pages >>>>> start_isolate_page_range >>>>> start_isolate_page_range(isolate_before=true) >>>>> isolate [isolate_start, isolate_start + pageblock_nr_pages) >>>>> start_isolate_page_range(isolate_before=false) >>>>> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >>>>> __alloc_contig_migrate_range >>>>> isolate_migratepages_range >>>>> isolate_migratepages_block >>>>> isolate_or_dissolve_huge_page >>>>> if (hstate_is_gigantic(h)) >>>>> return -ENOMEM; >>>>> >>>>> In fact, we don't need to migrate page in page range isolation, for >>>>> memory offline path, there is do_migrate_range() to move the pages. >>>>> For contig allocation, there is another __alloc_contig_migrate_range() >>>>> after isolation to migrate the pages. So fix issue by skipping the >>>>> __alloc_contig_migrate_range() in isolate_single_pageblock(). >>>>> >>>>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") >>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> >>>>> --- >>>>> mm/page_isolation.c | 28 +++------------------------- >>>>> 1 file changed, 3 insertions(+), 25 deletions(-) >>>>> >>>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >>>>> index 39fb8c07aeb7..7e04047977cf 100644 >>>>> --- a/mm/page_isolation.c >>>>> +++ b/mm/page_isolation.c >>>>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, >>>>> unsigned long head_pfn = page_to_pfn(head); >>>>> unsigned long nr_pages = compound_nr(head); >>>>> - if (head_pfn + nr_pages <= boundary_pfn) { >>>>> - pfn = head_pfn + nr_pages; >>>>> - continue; >>>>> - } >>>>> - >>>>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA >>>>> - if (PageHuge(page)) { >>>>> - int page_mt = get_pageblock_migratetype(page); >>>>> - struct compact_control cc = { >>>>> - .nr_migratepages = 0, >>>>> - .order = -1, >>>>> - .zone = page_zone(pfn_to_page(head_pfn)), >>>>> - .mode = MIGRATE_SYNC, >>>>> - .ignore_skip_hint = true, >>>>> - .no_set_skip_hint = true, >>>>> - .gfp_mask = gfp_flags, >>>>> - .alloc_contig = true, >>>>> - }; >>>>> - INIT_LIST_HEAD(&cc.migratepages); >>>>> - >>>>> - ret = __alloc_contig_migrate_range(&cc, head_pfn, >>>>> - head_pfn + nr_pages, page_mt); >>>>> - if (ret) >>>>> - goto failed; >>>> >>>> But won't this break alloc_contig_range() then? I would have expected that you have to special-case here on the migration reason (MEMORY_OFFLINE). >>>> >>> >>> Yes, this is what I did in rfc, only skip migration for offline path. >>> but Zi Yan suggested to remove migration totally[1] >>> >>> [1] https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/ >>> >>>> I remember some dirty details when we're trying to allcoate with a single pageblock for alloc_contig_range(). >> >> Most likely I was overthinking about the situation back then. I thought > > I'm more than happy if we can remove that code here :) > >> PageHuge, PageLRU, and __PageMovable all can be bigger than a pageblock, >> but in reality only PageHuge can and the gigantic PageHuge is freed as >> order-0. 
> > Does that still hold with Yu's patches to allocate/free gigantic pages from CMA using compound pages that are on the list (and likely already in mm-unstable)? > I did not look at the freeing path of that patchset. As > the buddy doesn't understand anything larger than MAX_ORDER, I would > assume that we are fine. > > I assume the real issue is when we have a movable allocation (folio) that spans multiple pageblocks. For example, when MAX_ORDER is large than a single pageblock, like it is on x86. > > Besides gigantic pages, I wonder if that can happen. Likely currently really only with hugetlb. It is still OK after I checked Yu’s patch. The patch uses split_large_buddy() to free pages at pageblock granularity. That prevents pageblocks with different migratetypes from being merged. > > > This means MIGRATE_ISOLATE pageblocks will get to the right >> free list after __alloc_contig_migrate_range(), the one after >> start_isolate_page_range(). >> >> David, I know we do not have cross-pageblock PageLRU yet (wait until >> someone adds PMD-level mTHP). But I am not sure about __PageMovable, >> even if you and Johannes told me that __PageMovable has no compound page. > > I think it's all order-0. Likely we should sanity check that somewhere (when setting a folio-page movable?). > > For example, the vmware balloon handles 2M pages differently than 4k pages. Only the latter is movable. Got it. > >> I wonder what are the use cases for __PageMovable. Is it possible for >> a driver to mark its cross-pageblock page __PageMovable and provide >> ->isolate_page and ->migratepage in its struct address_space_operations? >> Or it is unsupported, so I should not need to worry about it. > > I never tried. We should document and enforce/sanity check that it only works with order-0 for now. I tried when I was developing the commit b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") and it worked (see https://github.com/x-y-z/kernel-modules/blob/pageblock_test/pref-test.c#L52). That led to the complicated code in isolate_single_pageblock(). > >> >>>> >>>> Note that memory offlining always covers pageblocks large than MAX_ORDER chunks (which implies full pageblocks) but alloc_contig_range() + CMA might only cover (parts of) single pageblocks. >>>> >>>> Hoping Zi Yan can review :) >> >> At the moment, I think this is the right clean up. > > I think we want to have some way to catch when it changes. For example, can we warn if we find a LRU folio here that is large than a single pageblock? Definitely. We already have VM_WARN_ON_ONCE_PAGE(PageLRU(page), page); VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page); from the last time Johannes did the clean up. I agree that we will need some WARN_ON_ONCE in __SetPageMovable to check if any compound page is passed in. For > pageblock_order PageLRU, maybe a check in __folio_rmap_sanity_checks()? > > Also, I think we have to document why it works with hugetlb gigantic folios / large CMA allocations somewhere (the order-0 stuff you note above). Maybe as part of this changelog. I agree. Best Regards, Yan, Zi
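The two checks Zi floats could look roughly like this; both are hypothetical sketches, neither is in the kernel as of this thread:

	/* In __SetPageMovable(): movable pages are expected to be order-0,
	 * so reject compound pages at registration time (hypothetical). */
	VM_WARN_ON_ONCE_PAGE(PageCompound(page), page);

	/* In __folio_rmap_sanity_checks(): catch a cross-pageblock LRU
	 * folio if one ever appears (hypothetical). */
	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio) &&
			      folio_nr_pages(folio) > pageblock_nr_pages, folio);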
On 2024/8/17 3:45, David Hildenbrand wrote: > On 16.08.24 13:30, Kefeng Wang wrote: >> >> >> On 2024/8/16 18:11, David Hildenbrand wrote: >>> On 16.08.24 06:06, Kefeng Wang wrote: >>>> The gigantic page size may larger than memory block size, so memory >>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >>>> alloc_contig_range work at pageblock granularity"), >>>> >>>> offline_pages >>>> start_isolate_page_range >>>> start_isolate_page_range(isolate_before=true) >>>> isolate [isolate_start, isolate_start + pageblock_nr_pages) >>>> start_isolate_page_range(isolate_before=false) >>>> isolate [isolate_end - pageblock_nr_pages, isolate_end) >>>> pageblock >>>> __alloc_contig_migrate_range >>>> isolate_migratepages_range >>>> isolate_migratepages_block >>>> isolate_or_dissolve_huge_page >>>> if (hstate_is_gigantic(h)) >>>> return -ENOMEM; >>>> >>>> In fact, we don't need to migrate page in page range isolation, for >>>> memory offline path, there is do_migrate_range() to move the pages. >>>> For contig allocation, there is another __alloc_contig_migrate_range() >>>> after isolation to migrate the pages. So fix issue by skipping the >>>> __alloc_contig_migrate_range() in isolate_single_pageblock(). ... > > Please distill some of that in the patch description. Right now you only > talk about memory offlining and don't cover why alloc_contig_range() is > fine as well with this change. Borrowing some words from Zi: PageHuge(gigantic) can be bigger than a pageblock, but the gigantic PageHuge is freed as order-0. This means MIGRATE_ISOLATE pageblocks will get to the right free list after __alloc_contig_migrate_range(), the one after start_isolate_page_range() for alloc_contig_range(); this is the same as in memory offline, which has its own path to isolate/migrate used pages and dissolve the free hugepages, so the migration code in isolate_single_pageblock() is not needed; let's clean it up, which also fixes the above issue. Please correct me or help to write a better description, thanks. > > Let me explore the details in the meantime ... :) >
On 17 Aug 2024, at 2:13, Kefeng Wang wrote: > On 2024/8/17 3:45, David Hildenbrand wrote: >> On 16.08.24 13:30, Kefeng Wang wrote: >>> >>> >>> On 2024/8/16 18:11, David Hildenbrand wrote: >>>> On 16.08.24 06:06, Kefeng Wang wrote: >>>>> The gigantic page size may larger than memory block size, so memory >>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >>>>> alloc_contig_range work at pageblock granularity"), >>>>> >>>>> offline_pages >>>>> start_isolate_page_range >>>>> start_isolate_page_range(isolate_before=true) >>>>> isolate [isolate_start, isolate_start + pageblock_nr_pages) >>>>> start_isolate_page_range(isolate_before=false) >>>>> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >>>>> __alloc_contig_migrate_range >>>>> isolate_migratepages_range >>>>> isolate_migratepages_block >>>>> isolate_or_dissolve_huge_page >>>>> if (hstate_is_gigantic(h)) >>>>> return -ENOMEM; >>>>> >>>>> In fact, we don't need to migrate page in page range isolation, for >>>>> memory offline path, there is do_migrate_range() to move the pages. >>>>> For contig allocation, there is another __alloc_contig_migrate_range() >>>>> after isolation to migrate the pages. So fix issue by skipping the >>>>> __alloc_contig_migrate_range() in isolate_single_pageblock(). > > ... > >> >> Please distill some of that in the patch description. Right now you only talk about memory offlining and don't cover why alloc_contig_range() is fine as well with this change. > > Borrowing some word from Zi, > > > PageHuge(gigantic) can bigger than a pageblock, the gigantic PageHuge is freed as order-0. This means MIGRATE_ISOLATE pageblocks will get to the right free list after __alloc_contig_migrate_range(), the one after > start_isolate_page_range() for alloc_contig_range(), this is same as in > memory offline, it has own path to isolate/migrate used page and dissolve the free hugepages, so the migration code in isolate_single_pageblock() is not needed, let's cleanup it and which also fix the above the issue. > > Please correct me or help to write better description, thanks. How about? Gigantic PageHuge is bigger than a pageblock, but since it is freed as order-0 pages, its pageblocks after being freed will get to the right free list. There is no need to have special handling code for them in start_isolate_page_range(). For both alloc_contig_range() and memory offline cases, the migration code after start_isolate_page_range() will be able to migrate gigantic PageHuge when possible. Let's clean up start_isolate_page_range() and fix the aforementioned memory offline failure issue all together. -- Best Regards, Yan, Zi
On 2024/8/18 7:58, Zi Yan wrote: > On 17 Aug 2024, at 2:13, Kefeng Wang wrote: > >> On 2024/8/17 3:45, David Hildenbrand wrote: >>> On 16.08.24 13:30, Kefeng Wang wrote: >>>> >>>> >>>> On 2024/8/16 18:11, David Hildenbrand wrote: >>>>> On 16.08.24 06:06, Kefeng Wang wrote: >>>>>> The gigantic page size may larger than memory block size, so memory >>>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >>>>>> alloc_contig_range work at pageblock granularity"), >>>>>> >>>>>> offline_pages >>>>>> start_isolate_page_range >>>>>> start_isolate_page_range(isolate_before=true) >>>>>> isolate [isolate_start, isolate_start + pageblock_nr_pages) >>>>>> start_isolate_page_range(isolate_before=false) >>>>>> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >>>>>> __alloc_contig_migrate_range >>>>>> isolate_migratepages_range >>>>>> isolate_migratepages_block >>>>>> isolate_or_dissolve_huge_page >>>>>> if (hstate_is_gigantic(h)) >>>>>> return -ENOMEM; >>>>>> >>>>>> In fact, we don't need to migrate page in page range isolation, for >>>>>> memory offline path, there is do_migrate_range() to move the pages. >>>>>> For contig allocation, there is another __alloc_contig_migrate_range() >>>>>> after isolation to migrate the pages. So fix issue by skipping the >>>>>> __alloc_contig_migrate_range() in isolate_single_pageblock(). >> >> ... >> >>> >>> Please distill some of that in the patch description. Right now you only talk about memory offlining and don't cover why alloc_contig_range() is fine as well with this change. >> >> Borrowing some word from Zi, >> >> >> PageHuge(gigantic) can bigger than a pageblock, the gigantic PageHuge is freed as order-0. This means MIGRATE_ISOLATE pageblocks will get to the right free list after __alloc_contig_migrate_range(), the one after >> start_isolate_page_range() for alloc_contig_range(), this is same as in >> memory offline, it has own path to isolate/migrate used page and dissolve the free hugepages, so the migration code in isolate_single_pageblock() is not needed, let's cleanup it and which also fix the above the issue. >> >> Please correct me or help to write better description, thanks. > > How about? > > Gigantic PageHuge is bigger than a pageblock, but since it is freed as order-0 pages, > its pageblocks after being freed will get to the right free list. There is no need > to have special handling code for them in start_isolate_page_range(). For both > alloc_contig_range() and memory offline cases, the migration code after > start_isolate_page_range() will be able to migrate gigantic PageHuge when possible. > Let's clean up start_isolate_page_range() and fix the aforementioned memory offline > failure issue all together. Thanks Zi, it is better, will update. > > > -- > Best Regards, > Yan, Zi
On Sat, 17 Aug 2024 19:58:07 -0400 Zi Yan <ziy@nvidia.com> wrote: > > Please correct me or help to write better description, thanks. > > How about? > > Gigantic PageHuge is bigger than a pageblock, but since it is freed as order-0 pages, > its pageblocks after being freed will get to the right free list. There is no need > to have special handling code for them in start_isolate_page_range(). For both > alloc_contig_range() and memory offline cases, the migration code after > start_isolate_page_range() will be able to migrate gigantic PageHuge when possible. > Let's clean up start_isolate_page_range() and fix the aforementioned memory offline > failure issue all together. Thanks, I updated the changelog with the above.
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 39fb8c07aeb7..7e04047977cf 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			unsigned long head_pfn = page_to_pfn(head);
 			unsigned long nr_pages = compound_nr(head);
 
-			if (head_pfn + nr_pages <= boundary_pfn) {
-				pfn = head_pfn + nr_pages;
-				continue;
-			}
-
-#if defined CONFIG_COMPACTION || defined CONFIG_CMA
-			if (PageHuge(page)) {
-				int page_mt = get_pageblock_migratetype(page);
-				struct compact_control cc = {
-					.nr_migratepages = 0,
-					.order = -1,
-					.zone = page_zone(pfn_to_page(head_pfn)),
-					.mode = MIGRATE_SYNC,
-					.ignore_skip_hint = true,
-					.no_set_skip_hint = true,
-					.gfp_mask = gfp_flags,
-					.alloc_contig = true,
-				};
-				INIT_LIST_HEAD(&cc.migratepages);
-
-				ret = __alloc_contig_migrate_range(&cc, head_pfn,
-							head_pfn + nr_pages, page_mt);
-				if (ret)
-					goto failed;
+			if (head_pfn + nr_pages <= boundary_pfn ||
+			    PageHuge(page)) {
 				pfn = head_pfn + nr_pages;
 				continue;
 			}
@@ -440,7 +418,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			 */
 			VM_WARN_ON_ONCE_PAGE(PageLRU(page), page);
 			VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page);
-#endif
+			goto failed;
 		}
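For readability, here are the two regions touched by the patch, stitched together as they read after applying the hunks (code between the hunks elided; the comments are paraphrased, since the original comment text is not visible in the hunk context):

	unsigned long head_pfn = page_to_pfn(head);
	unsigned long nr_pages = compound_nr(head);

	/* Folios that end before the boundary need no action, and hugetlb
	 * folios are now left to the callers' own migration paths:
	 * do_migrate_range() for memory offline, and the later
	 * __alloc_contig_migrate_range() for contig allocation. */
	if (head_pfn + nr_pages <= boundary_pfn ||
	    PageHuge(page)) {
		pfn = head_pfn + nr_pages;
		continue;
	}

	/* ... lines between the two hunks unchanged ... */

	VM_WARN_ON_ONCE_PAGE(PageLRU(page), page);
	VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page);
	goto failed;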
The gigantic page size may be larger than the memory block size, so memory offline always fails in this case after commit b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity"),

offline_pages
  start_isolate_page_range
    start_isolate_page_range(isolate_before=true)
      isolate [isolate_start, isolate_start + pageblock_nr_pages)
    start_isolate_page_range(isolate_before=false)
      isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
      __alloc_contig_migrate_range
        isolate_migratepages_range
          isolate_migratepages_block
            isolate_or_dissolve_huge_page
              if (hstate_is_gigantic(h))
                return -ENOMEM;

In fact, we don't need to migrate pages during page range isolation: for the memory offline path, there is do_migrate_range() to move the pages, and for contig allocation, there is another __alloc_contig_migrate_range() after isolation to migrate the pages. So fix the issue by skipping the __alloc_contig_migrate_range() in isolate_single_pageblock().

Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/page_isolation.c | 28 +++-------------------------
1 file changed, 3 insertions(+), 25 deletions(-)