mm: remove migration for HugePage in isolate_single_pageblock()

Message ID 20240816040625.650053-1-wangkefeng.wang@huawei.com (mailing list archive)
State New
Series mm: remove migration for HugePage in isolate_single_pageblock()

Commit Message

Kefeng Wang Aug. 16, 2024, 4:06 a.m. UTC
The gigantic page size may be larger than the memory block size, so memory
offline always fails in this case after commit b2c9e2fbba32 ("mm: make
alloc_contig_range work at pageblock granularity"):

offline_pages
  start_isolate_page_range
    start_isolate_page_range(isolate_before=true)
      isolate [isolate_start, isolate_start + pageblock_nr_pages)
    start_isolate_page_range(isolate_before=false)
      isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
        __alloc_contig_migrate_range
          isolate_migratepages_range
            isolate_migratepages_block
              isolate_or_dissolve_huge_page
                if (hstate_is_gigantic(h))
                    return -ENOMEM;

In fact, we don't need to migrate pages during page range isolation: for
the memory offline path, do_migrate_range() moves the pages, and for
contig allocation, another __alloc_contig_migrate_range() after isolation
migrates the pages. So fix the issue by skipping the
__alloc_contig_migrate_range() call in isolate_single_pageblock().

Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/page_isolation.c | 28 +++-------------------------
 1 file changed, 3 insertions(+), 25 deletions(-)

Comments

Andrew Morton Aug. 16, 2024, 4:58 a.m. UTC | #1
On Fri, 16 Aug 2024 12:06:25 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:

> The gigantic page size may larger than memory block size, so memory
> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
> alloc_contig_range work at pageblock granularity"),
> 
> offline_pages
>   start_isolate_page_range
>     start_isolate_page_range(isolate_before=true)
>       isolate [isolate_start, isolate_start + pageblock_nr_pages)
>     start_isolate_page_range(isolate_before=false)
>       isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>        	__alloc_contig_migrate_range
>           isolate_migratepages_range
>             isolate_migratepages_block
>               isolate_or_dissolve_huge_page
>                 if (hstate_is_gigantic(h))
>                     return -ENOMEM;
> 
> In fact, we don't need to migrate page in page range isolation, for
> memory offline path, there is do_migrate_range() to move the pages.
> For contig allocation, there is another __alloc_contig_migrate_range()
> after isolation to migrate the pages. So fix issue by skipping the
> __alloc_contig_migrate_range() in isolate_single_pageblock().
> 
> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")

Should we backport this into -stable kernels?
Kefeng Wang Aug. 16, 2024, 6:10 a.m. UTC | #2
On 2024/8/16 12:58, Andrew Morton wrote:
> On Fri, 16 Aug 2024 12:06:25 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> 
>> The gigantic page size may larger than memory block size, so memory
>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>> alloc_contig_range work at pageblock granularity"),
>>
>> offline_pages
>>    start_isolate_page_range
>>      start_isolate_page_range(isolate_before=true)
>>        isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>      start_isolate_page_range(isolate_before=false)
>>        isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>         	__alloc_contig_migrate_range
>>            isolate_migratepages_range
>>              isolate_migratepages_block
>>                isolate_or_dissolve_huge_page
>>                  if (hstate_is_gigantic(h))
>>                      return -ENOMEM;
>>
>> In fact, we don't need to migrate page in page range isolation, for
>> memory offline path, there is do_migrate_range() to move the pages.
>> For contig allocation, there is another __alloc_contig_migrate_range()
>> after isolation to migrate the pages. So fix issue by skipping the
>> __alloc_contig_migrate_range() in isolate_single_pageblock().
>>
>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
> 
> Should we backport this into -stable kernels?

Better to backport to stable, since memory offline always fails in this case.

>
David Hildenbrand Aug. 16, 2024, 10:11 a.m. UTC | #3
On 16.08.24 06:06, Kefeng Wang wrote:
> The gigantic page size may larger than memory block size, so memory
> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
> alloc_contig_range work at pageblock granularity"),
> 
> offline_pages
>    start_isolate_page_range
>      start_isolate_page_range(isolate_before=true)
>        isolate [isolate_start, isolate_start + pageblock_nr_pages)
>      start_isolate_page_range(isolate_before=false)
>        isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>         	__alloc_contig_migrate_range
>            isolate_migratepages_range
>              isolate_migratepages_block
>                isolate_or_dissolve_huge_page
>                  if (hstate_is_gigantic(h))
>                      return -ENOMEM;
> 
> In fact, we don't need to migrate page in page range isolation, for
> memory offline path, there is do_migrate_range() to move the pages.
> For contig allocation, there is another __alloc_contig_migrate_range()
> after isolation to migrate the pages. So fix issue by skipping the
> __alloc_contig_migrate_range() in isolate_single_pageblock().
> 
> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>   mm/page_isolation.c | 28 +++-------------------------
>   1 file changed, 3 insertions(+), 25 deletions(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 39fb8c07aeb7..7e04047977cf 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>   			unsigned long head_pfn = page_to_pfn(head);
>   			unsigned long nr_pages = compound_nr(head);
>   
> -			if (head_pfn + nr_pages <= boundary_pfn) {
> -				pfn = head_pfn + nr_pages;
> -				continue;
> -			}
> -
> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> -			if (PageHuge(page)) {
> -				int page_mt = get_pageblock_migratetype(page);
> -				struct compact_control cc = {
> -					.nr_migratepages = 0,
> -					.order = -1,
> -					.zone = page_zone(pfn_to_page(head_pfn)),
> -					.mode = MIGRATE_SYNC,
> -					.ignore_skip_hint = true,
> -					.no_set_skip_hint = true,
> -					.gfp_mask = gfp_flags,
> -					.alloc_contig = true,
> -				};
> -				INIT_LIST_HEAD(&cc.migratepages);
> -
> -				ret = __alloc_contig_migrate_range(&cc, head_pfn,
> -							head_pfn + nr_pages, page_mt);
> -				if (ret)
> -					goto failed;

But won't this break alloc_contig_range() then? I would have expected 
that you would have to special-case here on the migration reason (MEMORY_OFFLINE).

I remember some dirty details from when we were trying to allocate within 
a single pageblock for alloc_contig_range().

Note that memory offlining always covers ranges larger than MAX_ORDER 
chunks (which implies full pageblocks), but alloc_contig_range() + CMA 
might only cover (parts of) single pageblocks.

Hoping Zi Yan can review :)
Kefeng Wang Aug. 16, 2024, 11:30 a.m. UTC | #4
On 2024/8/16 18:11, David Hildenbrand wrote:
> On 16.08.24 06:06, Kefeng Wang wrote:
>> The gigantic page size may larger than memory block size, so memory
>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>> alloc_contig_range work at pageblock granularity"),
>>
>> offline_pages
>>    start_isolate_page_range
>>      start_isolate_page_range(isolate_before=true)
>>        isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>      start_isolate_page_range(isolate_before=false)
>>        isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>             __alloc_contig_migrate_range
>>            isolate_migratepages_range
>>              isolate_migratepages_block
>>                isolate_or_dissolve_huge_page
>>                  if (hstate_is_gigantic(h))
>>                      return -ENOMEM;
>>
>> In fact, we don't need to migrate page in page range isolation, for
>> memory offline path, there is do_migrate_range() to move the pages.
>> For contig allocation, there is another __alloc_contig_migrate_range()
>> after isolation to migrate the pages. So fix issue by skipping the
>> __alloc_contig_migrate_range() in isolate_single_pageblock().
>>
>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock 
>> granularity")
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   mm/page_isolation.c | 28 +++-------------------------
>>   1 file changed, 3 insertions(+), 25 deletions(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 39fb8c07aeb7..7e04047977cf 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long 
>> boundary_pfn, int flags,
>>               unsigned long head_pfn = page_to_pfn(head);
>>               unsigned long nr_pages = compound_nr(head);
>> -            if (head_pfn + nr_pages <= boundary_pfn) {
>> -                pfn = head_pfn + nr_pages;
>> -                continue;
>> -            }
>> -
>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>> -            if (PageHuge(page)) {
>> -                int page_mt = get_pageblock_migratetype(page);
>> -                struct compact_control cc = {
>> -                    .nr_migratepages = 0,
>> -                    .order = -1,
>> -                    .zone = page_zone(pfn_to_page(head_pfn)),
>> -                    .mode = MIGRATE_SYNC,
>> -                    .ignore_skip_hint = true,
>> -                    .no_set_skip_hint = true,
>> -                    .gfp_mask = gfp_flags,
>> -                    .alloc_contig = true,
>> -                };
>> -                INIT_LIST_HEAD(&cc.migratepages);
>> -
>> -                ret = __alloc_contig_migrate_range(&cc, head_pfn,
>> -                            head_pfn + nr_pages, page_mt);
>> -                if (ret)
>> -                    goto failed;
> 
> But won't this break alloc_contig_range() then? I would have expected 
> that you have to special-case here on the migration reason 
> (MEMORY_OFFLINE).
> 

Yes, this is what I did in the RFC: only skip migration for the offline path.
But Zi Yan suggested removing the migration entirely [1].

[1] 
https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/

> I remember some dirty details when we're trying to allcoate with a 
> single pageblock for alloc_contig_range().
> 
> Note that memory offlining always covers pageblocks large than MAX_ORDER 
> chunks (which implies full pageblocks) but alloc_contig_range() + CMA 
> might only cover (parts of) single pageblocks.
> 
> Hoping Zi Yan can review :)
>
Zi Yan Aug. 16, 2024, 3:06 p.m. UTC | #5
On 16 Aug 2024, at 7:30, Kefeng Wang wrote:

> On 2024/8/16 18:11, David Hildenbrand wrote:
>> On 16.08.24 06:06, Kefeng Wang wrote:
>>> The gigantic page size may larger than memory block size, so memory
>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>>> alloc_contig_range work at pageblock granularity"),
>>>
>>> offline_pages
>>>    start_isolate_page_range
>>>      start_isolate_page_range(isolate_before=true)
>>>        isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>>      start_isolate_page_range(isolate_before=false)
>>>        isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>>             __alloc_contig_migrate_range
>>>            isolate_migratepages_range
>>>              isolate_migratepages_block
>>>                isolate_or_dissolve_huge_page
>>>                  if (hstate_is_gigantic(h))
>>>                      return -ENOMEM;
>>>
>>> In fact, we don't need to migrate page in page range isolation, for
>>> memory offline path, there is do_migrate_range() to move the pages.
>>> For contig allocation, there is another __alloc_contig_migrate_range()
>>> after isolation to migrate the pages. So fix issue by skipping the
>>> __alloc_contig_migrate_range() in isolate_single_pageblock().
>>>
>>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>> ---
>>>   mm/page_isolation.c | 28 +++-------------------------
>>>   1 file changed, 3 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>> index 39fb8c07aeb7..7e04047977cf 100644
>>> --- a/mm/page_isolation.c
>>> +++ b/mm/page_isolation.c
>>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>>>               unsigned long head_pfn = page_to_pfn(head);
>>>               unsigned long nr_pages = compound_nr(head);
>>> -            if (head_pfn + nr_pages <= boundary_pfn) {
>>> -                pfn = head_pfn + nr_pages;
>>> -                continue;
>>> -            }
>>> -
>>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>> -            if (PageHuge(page)) {
>>> -                int page_mt = get_pageblock_migratetype(page);
>>> -                struct compact_control cc = {
>>> -                    .nr_migratepages = 0,
>>> -                    .order = -1,
>>> -                    .zone = page_zone(pfn_to_page(head_pfn)),
>>> -                    .mode = MIGRATE_SYNC,
>>> -                    .ignore_skip_hint = true,
>>> -                    .no_set_skip_hint = true,
>>> -                    .gfp_mask = gfp_flags,
>>> -                    .alloc_contig = true,
>>> -                };
>>> -                INIT_LIST_HEAD(&cc.migratepages);
>>> -
>>> -                ret = __alloc_contig_migrate_range(&cc, head_pfn,
>>> -                            head_pfn + nr_pages, page_mt);
>>> -                if (ret)
>>> -                    goto failed;
>>
>> But won't this break alloc_contig_range() then? I would have expected that you have to special-case here on the migration reason (MEMORY_OFFLINE).
>>
>
> Yes, this is what I did in rfc, only skip migration for offline path.
> but Zi Yan suggested to remove migration totally[1]
>
> [1] https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/
>
>> I remember some dirty details when we're trying to allcoate with a single pageblock for alloc_contig_range().

Most likely I was overthinking the situation back then. I thought
PageHuge, PageLRU, and __PageMovable pages could all be bigger than a
pageblock, but in reality only PageHuge can be, and a gigantic PageHuge
is freed as order-0 pages. This means MIGRATE_ISOLATE pageblocks will
get to the right free list after __alloc_contig_migrate_range(), the one
after start_isolate_page_range().
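
(For context — the freeing path Zi refers to: at this point gigantic
hugetlb folios were torn down and handed back to the buddy one order-0
page at a time. A simplified paraphrase of mm/page_alloc.c's
free_contig_range(), not the verbatim tree code:)

void free_contig_range(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long count = 0;

	for (; nr_pages--; pfn++) {
		struct page *page = pfn_to_page(pfn);

		/* Each page returns to the buddy as a single order-0 page. */
		count += page_count(page) != 1;
		__free_page(page);
	}
	WARN(count != 0, "%lu pages are still in use!\n", count);
}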

David, I know we do not have cross-pageblock PageLRU yet (wait until
someone adds PMD-level mTHP). But I am not sure about __PageMovable,
even though you and Johannes told me that __PageMovable pages are never
compound. I wonder what the use cases for __PageMovable are. Is it
possible for a driver to mark its cross-pageblock page __PageMovable and
provide ->isolate_page and ->migratepage in its struct
address_space_operations? Or is it unsupported, so that I need not
worry about it?

>>
>> Note that memory offlining always covers pageblocks large than MAX_ORDER chunks (which implies full pageblocks) but alloc_contig_range() + CMA might only cover (parts of) single pageblocks.
>>
>> Hoping Zi Yan can review :)

At the moment, I think this is the right cleanup.

Acked-by: Zi Yan <ziy@nvidia.com>


Best Regards,
Yan, Zi
David Hildenbrand Aug. 16, 2024, 7:45 p.m. UTC | #6
On 16.08.24 13:30, Kefeng Wang wrote:
> 
> 
> On 2024/8/16 18:11, David Hildenbrand wrote:
>> On 16.08.24 06:06, Kefeng Wang wrote:
>>> The gigantic page size may larger than memory block size, so memory
>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>>> alloc_contig_range work at pageblock granularity"),
>>>
>>> offline_pages
>>>     start_isolate_page_range
>>>       start_isolate_page_range(isolate_before=true)
>>>         isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>>       start_isolate_page_range(isolate_before=false)
>>>         isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>>              __alloc_contig_migrate_range
>>>             isolate_migratepages_range
>>>               isolate_migratepages_block
>>>                 isolate_or_dissolve_huge_page
>>>                   if (hstate_is_gigantic(h))
>>>                       return -ENOMEM;
>>>
>>> In fact, we don't need to migrate page in page range isolation, for
>>> memory offline path, there is do_migrate_range() to move the pages.
>>> For contig allocation, there is another __alloc_contig_migrate_range()
>>> after isolation to migrate the pages. So fix issue by skipping the
>>> __alloc_contig_migrate_range() in isolate_single_pageblock().
>>>
>>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock
>>> granularity")
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>> ---
>>>    mm/page_isolation.c | 28 +++-------------------------
>>>    1 file changed, 3 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>> index 39fb8c07aeb7..7e04047977cf 100644
>>> --- a/mm/page_isolation.c
>>> +++ b/mm/page_isolation.c
>>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long
>>> boundary_pfn, int flags,
>>>                unsigned long head_pfn = page_to_pfn(head);
>>>                unsigned long nr_pages = compound_nr(head);
>>> -            if (head_pfn + nr_pages <= boundary_pfn) {
>>> -                pfn = head_pfn + nr_pages;
>>> -                continue;
>>> -            }
>>> -
>>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>> -            if (PageHuge(page)) {
>>> -                int page_mt = get_pageblock_migratetype(page);
>>> -                struct compact_control cc = {
>>> -                    .nr_migratepages = 0,
>>> -                    .order = -1,
>>> -                    .zone = page_zone(pfn_to_page(head_pfn)),
>>> -                    .mode = MIGRATE_SYNC,
>>> -                    .ignore_skip_hint = true,
>>> -                    .no_set_skip_hint = true,
>>> -                    .gfp_mask = gfp_flags,
>>> -                    .alloc_contig = true,
>>> -                };
>>> -                INIT_LIST_HEAD(&cc.migratepages);
>>> -
>>> -                ret = __alloc_contig_migrate_range(&cc, head_pfn,
>>> -                            head_pfn + nr_pages, page_mt);
>>> -                if (ret)
>>> -                    goto failed;
>>
>> But won't this break alloc_contig_range() then? I would have expected
>> that you have to special-case here on the migration reason
>> (MEMORY_OFFLINE).
>>
> 
> Yes, this is what I did in rfc, only skip migration for offline path.
> but Zi Yan suggested to remove migration totally[1]

Please distill some of that into the patch description. Right now you only 
talk about memory offlining and don't cover why alloc_contig_range() is 
fine with this change as well.

Let me explore the details in the meantime ... :)
David Hildenbrand Aug. 16, 2024, 8:12 p.m. UTC | #7
On 16.08.24 17:06, Zi Yan wrote:
> On 16 Aug 2024, at 7:30, Kefeng Wang wrote:
> 
>> On 2024/8/16 18:11, David Hildenbrand wrote:
>>> On 16.08.24 06:06, Kefeng Wang wrote:
>>>> The gigantic page size may larger than memory block size, so memory
>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>>>> alloc_contig_range work at pageblock granularity"),
>>>>
>>>> offline_pages
>>>>     start_isolate_page_range
>>>>       start_isolate_page_range(isolate_before=true)
>>>>         isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>>>       start_isolate_page_range(isolate_before=false)
>>>>         isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>>>              __alloc_contig_migrate_range
>>>>             isolate_migratepages_range
>>>>               isolate_migratepages_block
>>>>                 isolate_or_dissolve_huge_page
>>>>                   if (hstate_is_gigantic(h))
>>>>                       return -ENOMEM;
>>>>
>>>> In fact, we don't need to migrate page in page range isolation, for
>>>> memory offline path, there is do_migrate_range() to move the pages.
>>>> For contig allocation, there is another __alloc_contig_migrate_range()
>>>> after isolation to migrate the pages. So fix issue by skipping the
>>>> __alloc_contig_migrate_range() in isolate_single_pageblock().
>>>>
>>>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>> ---
>>>>    mm/page_isolation.c | 28 +++-------------------------
>>>>    1 file changed, 3 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>>> index 39fb8c07aeb7..7e04047977cf 100644
>>>> --- a/mm/page_isolation.c
>>>> +++ b/mm/page_isolation.c
>>>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>>>>                unsigned long head_pfn = page_to_pfn(head);
>>>>                unsigned long nr_pages = compound_nr(head);
>>>> -            if (head_pfn + nr_pages <= boundary_pfn) {
>>>> -                pfn = head_pfn + nr_pages;
>>>> -                continue;
>>>> -            }
>>>> -
>>>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>>> -            if (PageHuge(page)) {
>>>> -                int page_mt = get_pageblock_migratetype(page);
>>>> -                struct compact_control cc = {
>>>> -                    .nr_migratepages = 0,
>>>> -                    .order = -1,
>>>> -                    .zone = page_zone(pfn_to_page(head_pfn)),
>>>> -                    .mode = MIGRATE_SYNC,
>>>> -                    .ignore_skip_hint = true,
>>>> -                    .no_set_skip_hint = true,
>>>> -                    .gfp_mask = gfp_flags,
>>>> -                    .alloc_contig = true,
>>>> -                };
>>>> -                INIT_LIST_HEAD(&cc.migratepages);
>>>> -
>>>> -                ret = __alloc_contig_migrate_range(&cc, head_pfn,
>>>> -                            head_pfn + nr_pages, page_mt);
>>>> -                if (ret)
>>>> -                    goto failed;
>>>
>>> But won't this break alloc_contig_range() then? I would have expected that you have to special-case here on the migration reason (MEMORY_OFFLINE).
>>>
>>
>> Yes, this is what I did in rfc, only skip migration for offline path.
>> but Zi Yan suggested to remove migration totally[1]
>>
>> [1] https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/
>>
>>> I remember some dirty details when we're trying to allcoate with a single pageblock for alloc_contig_range().
> 
> Most likely I was overthinking about the situation back then. I thought

I'm more than happy if we can remove that code here :)

> PageHuge, PageLRU, and __PageMovable all can be bigger than a pageblock,
> but in reality only PageHuge can and the gigantic PageHuge is freed as
> order-0. 

Does that still hold with Yu's patches to allocate/free gigantic pages 
from CMA using compound pages that are on the list (and likely already 
in mm-unstable)? I did not look at the freeing path of that patchset. As 
the buddy doesn't understand anything larger than MAX_ORDER, I would 
assume that we are fine.

I assume the real issue is when we have a movable allocation (folio) 
that spans multiple pageblocks, for example when MAX_ORDER is larger 
than a single pageblock, as it is on x86.

Besides gigantic pages, I wonder if that can happen. Currently, likely 
only with hugetlb.


> This means MIGRATE_ISOLATE pageblocks will get to the right
> free list after __alloc_contig_migrate_range(), the one after
> start_isolate_page_range().
> 
> David, I know we do not have cross-pageblock PageLRU yet (wait until
> someone adds PMD-level mTHP). But I am not sure about __PageMovable,
> even if you and Johannes told me that __PageMovable has no compound page.

I think it's all order-0. Likely we should sanity check that somewhere 
(when setting a folio-page movable?).

For example, the vmware balloon handles 2M pages differently from 4k 
pages. Only the latter are movable.

> I wonder what are the use cases for __PageMovable. Is it possible for
> a driver to mark its cross-pageblock page __PageMovable and provide
> ->isolate_page and ->migratepage in its struct address_space_operations?
> Or it is unsupported, so I should not need to worry about it.

I never tried. We should document and enforce/sanity check that it only 
works with order-0 for now.

> 
>>>
>>> Note that memory offlining always covers pageblocks large than MAX_ORDER chunks (which implies full pageblocks) but alloc_contig_range() + CMA might only cover (parts of) single pageblocks.
>>>
>>> Hoping Zi Yan can review :)
> 
> At the moment, I think this is the right clean up.

I think we want to have some way to catch it when that changes. For example, 
can we warn if we find an LRU folio here that is larger than a single 
pageblock?

Also, I think we have to document why it works with hugetlb gigantic 
folios / large CMA allocations somewhere (the order-0 stuff you note 
above). Maybe as part of this changelog.
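
(A hypothetical sketch of such a warning — illustrative only, not
proposed tree code; compound_nr() and pageblock_nr_pages are existing
kernel symbols:)

	/* Hypothetical: catch a cross-pageblock LRU folio during isolation. */
	VM_WARN_ON_ONCE_PAGE(PageLRU(page) &&
			     compound_nr(page) > pageblock_nr_pages, page);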
Yu Zhao Aug. 16, 2024, 9:16 p.m. UTC | #8
On Fri, Aug 16, 2024 at 2:12 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 16.08.24 17:06, Zi Yan wrote:
> > On 16 Aug 2024, at 7:30, Kefeng Wang wrote:
> >
> >> On 2024/8/16 18:11, David Hildenbrand wrote:
> >>> On 16.08.24 06:06, Kefeng Wang wrote:
> >>>> The gigantic page size may larger than memory block size, so memory
> >>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
> >>>> alloc_contig_range work at pageblock granularity"),
> >>>>
> >>>> offline_pages
> >>>>     start_isolate_page_range
> >>>>       start_isolate_page_range(isolate_before=true)
> >>>>         isolate [isolate_start, isolate_start + pageblock_nr_pages)
> >>>>       start_isolate_page_range(isolate_before=false)
> >>>>         isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
> >>>>              __alloc_contig_migrate_range
> >>>>             isolate_migratepages_range
> >>>>               isolate_migratepages_block
> >>>>                 isolate_or_dissolve_huge_page
> >>>>                   if (hstate_is_gigantic(h))
> >>>>                       return -ENOMEM;
> >>>>
> >>>> In fact, we don't need to migrate page in page range isolation, for
> >>>> memory offline path, there is do_migrate_range() to move the pages.
> >>>> For contig allocation, there is another __alloc_contig_migrate_range()
> >>>> after isolation to migrate the pages. So fix issue by skipping the
> >>>> __alloc_contig_migrate_range() in isolate_single_pageblock().
> >>>>
> >>>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
> >>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> >>>> ---
> >>>>    mm/page_isolation.c | 28 +++-------------------------
> >>>>    1 file changed, 3 insertions(+), 25 deletions(-)
> >>>>
> >>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> >>>> index 39fb8c07aeb7..7e04047977cf 100644
> >>>> --- a/mm/page_isolation.c
> >>>> +++ b/mm/page_isolation.c
> >>>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
> >>>>                unsigned long head_pfn = page_to_pfn(head);
> >>>>                unsigned long nr_pages = compound_nr(head);
> >>>> -            if (head_pfn + nr_pages <= boundary_pfn) {
> >>>> -                pfn = head_pfn + nr_pages;
> >>>> -                continue;
> >>>> -            }
> >>>> -
> >>>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> >>>> -            if (PageHuge(page)) {
> >>>> -                int page_mt = get_pageblock_migratetype(page);
> >>>> -                struct compact_control cc = {
> >>>> -                    .nr_migratepages = 0,
> >>>> -                    .order = -1,
> >>>> -                    .zone = page_zone(pfn_to_page(head_pfn)),
> >>>> -                    .mode = MIGRATE_SYNC,
> >>>> -                    .ignore_skip_hint = true,
> >>>> -                    .no_set_skip_hint = true,
> >>>> -                    .gfp_mask = gfp_flags,
> >>>> -                    .alloc_contig = true,
> >>>> -                };
> >>>> -                INIT_LIST_HEAD(&cc.migratepages);
> >>>> -
> >>>> -                ret = __alloc_contig_migrate_range(&cc, head_pfn,
> >>>> -                            head_pfn + nr_pages, page_mt);
> >>>> -                if (ret)
> >>>> -                    goto failed;
> >>>
> >>> But won't this break alloc_contig_range() then? I would have expected that you have to special-case here on the migration reason (MEMORY_OFFLINE).
> >>>
> >>
> >> Yes, this is what I did in rfc, only skip migration for offline path.
> >> but Zi Yan suggested to remove migration totally[1]
> >>
> >> [1] https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/
> >>
> >>> I remember some dirty details when we're trying to allcoate with a single pageblock for alloc_contig_range().
> >
> > Most likely I was overthinking about the situation back then. I thought
>
> I'm more than happy if we can remove that code here :)
>
> > PageHuge, PageLRU, and __PageMovable all can be bigger than a pageblock,
> > but in reality only PageHuge can and the gigantic PageHuge is freed as
> > order-0.
>
> Does that still hold with Yu's patches to allocate/free gigantic pages
> from CMA using compound pages that are on the list (and likely already
> in mm-unstable)?

Gigantic folios are now freed at pageblock granularity rather than
order-0, as Zi himself stated during the review :)

https://lore.kernel.org/linux-mm/29B680F7-E14D-4CD7-802B-5BBE1E1A3F92@nvidia.com/

> I did not look at the freeing path of that patchset. As
> the buddy doesn't understand anything larger than MAX_ORDER, I would
> assume that we are fine.

Correct.


> I assume the real issue is when we have a movable allocation (folio)
> that spans multiple pageblocks. For example, when MAX_ORDER is large
> than a single pageblock, like it is on x86.
>
> Besides gigantic pages, I wonder if that can happen. Likely currently
> really only with hugetlb.
>
>
> > This means MIGRATE_ISOLATE pageblocks will get to the right
> > free list after __alloc_contig_migrate_range(), the one after
> > start_isolate_page_range().
> >
> > David, I know we do not have cross-pageblock PageLRU yet (wait until
> > someone adds PMD-level mTHP). But I am not sure about __PageMovable,
> > even if you and Johannes told me that __PageMovable has no compound page.
>
> I think it's all order-0. Likely we should sanity check that somewhere
> (when setting a folio-page movable?).
>
> For example, the vmware balloon handles 2M pages differently than 4k
> pages. Only the latter is movable.
>
> > I wonder what are the use cases for __PageMovable. Is it possible for
> > a driver to mark its cross-pageblock page __PageMovable and provide
> > ->isolate_page and ->migratepage in its struct address_space_operations?
> > Or it is unsupported, so I should not need to worry about it.
>
> I never tried. We should document and enforce/sanity check that it only
> works with order-0 for now.
>
> >
> >>>
> >>> Note that memory offlining always covers pageblocks large than MAX_ORDER chunks (which implies full pageblocks) but alloc_contig_range() + CMA might only cover (parts of) single pageblocks.
> >>>
> >>> Hoping Zi Yan can review :)
> >
> > At the moment, I think this is the right clean up.
>
> I think we want to have some way to catch when it changes. For example,
> can we warn if we find a LRU folio here that is large than a single
> pageblock?
>
> Also, I think we have to document why it works with hugetlb gigantic
> folios / large CMA allocations somewhere (the order-0 stuff you note
> above). Maybe as part of this changelog.
>
> --
> Cheers,
>
> David / dhildenb
>
>
Zi Yan Aug. 16, 2024, 10:09 p.m. UTC | #9
On 16 Aug 2024, at 16:12, David Hildenbrand wrote:

> On 16.08.24 17:06, Zi Yan wrote:
>> On 16 Aug 2024, at 7:30, Kefeng Wang wrote:
>>
>>> On 2024/8/16 18:11, David Hildenbrand wrote:
>>>> On 16.08.24 06:06, Kefeng Wang wrote:
>>>>> The gigantic page size may larger than memory block size, so memory
>>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>>>>> alloc_contig_range work at pageblock granularity"),
>>>>>
>>>>> offline_pages
>>>>>     start_isolate_page_range
>>>>>       start_isolate_page_range(isolate_before=true)
>>>>>         isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>>>>       start_isolate_page_range(isolate_before=false)
>>>>>         isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>>>>              __alloc_contig_migrate_range
>>>>>             isolate_migratepages_range
>>>>>               isolate_migratepages_block
>>>>>                 isolate_or_dissolve_huge_page
>>>>>                   if (hstate_is_gigantic(h))
>>>>>                       return -ENOMEM;
>>>>>
>>>>> In fact, we don't need to migrate page in page range isolation, for
>>>>> memory offline path, there is do_migrate_range() to move the pages.
>>>>> For contig allocation, there is another __alloc_contig_migrate_range()
>>>>> after isolation to migrate the pages. So fix issue by skipping the
>>>>> __alloc_contig_migrate_range() in isolate_single_pageblock().
>>>>>
>>>>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
>>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>> ---
>>>>>    mm/page_isolation.c | 28 +++-------------------------
>>>>>    1 file changed, 3 insertions(+), 25 deletions(-)
>>>>>
>>>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>>>> index 39fb8c07aeb7..7e04047977cf 100644
>>>>> --- a/mm/page_isolation.c
>>>>> +++ b/mm/page_isolation.c
>>>>> @@ -403,30 +403,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>>>>>                unsigned long head_pfn = page_to_pfn(head);
>>>>>                unsigned long nr_pages = compound_nr(head);
>>>>> -            if (head_pfn + nr_pages <= boundary_pfn) {
>>>>> -                pfn = head_pfn + nr_pages;
>>>>> -                continue;
>>>>> -            }
>>>>> -
>>>>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>>>> -            if (PageHuge(page)) {
>>>>> -                int page_mt = get_pageblock_migratetype(page);
>>>>> -                struct compact_control cc = {
>>>>> -                    .nr_migratepages = 0,
>>>>> -                    .order = -1,
>>>>> -                    .zone = page_zone(pfn_to_page(head_pfn)),
>>>>> -                    .mode = MIGRATE_SYNC,
>>>>> -                    .ignore_skip_hint = true,
>>>>> -                    .no_set_skip_hint = true,
>>>>> -                    .gfp_mask = gfp_flags,
>>>>> -                    .alloc_contig = true,
>>>>> -                };
>>>>> -                INIT_LIST_HEAD(&cc.migratepages);
>>>>> -
>>>>> -                ret = __alloc_contig_migrate_range(&cc, head_pfn,
>>>>> -                            head_pfn + nr_pages, page_mt);
>>>>> -                if (ret)
>>>>> -                    goto failed;
>>>>
>>>> But won't this break alloc_contig_range() then? I would have expected that you have to special-case here on the migration reason (MEMORY_OFFLINE).
>>>>
>>>
>>> Yes, this is what I did in rfc, only skip migration for offline path.
>>> but Zi Yan suggested to remove migration totally[1]
>>>
>>> [1] https://lore.kernel.org/linux-mm/50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com/
>>>
>>>> I remember some dirty details when we're trying to allcoate with a single pageblock for alloc_contig_range().
>>
>> Most likely I was overthinking about the situation back then. I thought
>
> I'm more than happy if we can remove that code here :)
>
>> PageHuge, PageLRU, and __PageMovable all can be bigger than a pageblock,
>> but in reality only PageHuge can and the gigantic PageHuge is freed as
>> order-0.
>
> Does that still hold with Yu's patches to allocate/free gigantic pages from CMA using compound pages that are on the list (and likely already in mm-unstable)? I did not look at the freeing path of that patchset. As the buddy doesn't understand anything larger than MAX_ORDER, I would assume that we are fine.
>
> I assume the real issue is when we have a movable allocation (folio) that spans multiple pageblocks. For example, when MAX_ORDER is large than a single pageblock, like it is on x86.
>
> Besides gigantic pages, I wonder if that can happen. Likely currently really only with hugetlb.

It is still OK; I checked Yu’s patch. The patch uses split_large_buddy()
to free pages at pageblock granularity, which prevents pageblocks with
different migratetypes from being merged.
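
(Roughly, the logic in question — a paraphrased sketch of
split_large_buddy() from Yu's series; details may differ from the
actual patch:)

static void split_large_buddy(struct zone *zone, struct page *page,
			      unsigned long pfn, int order, fpi_t fpi)
{
	unsigned long end = pfn + (1 << order);

	/* Free at most one pageblock at a time ... */
	if (order > pageblock_order)
		order = pageblock_order;

	while (pfn != end) {
		/* ... so each chunk lands on its own migratetype's free list. */
		int mt = get_pfnblock_migratetype(page, pfn);

		__free_one_page(page, pfn, zone, order, mt, fpi);
		pfn += 1 << order;
		page = pfn_to_page(pfn);
	}
}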

>
>
>> This means MIGRATE_ISOLATE pageblocks will get to the right
>> free list after __alloc_contig_migrate_range(), the one after
>> start_isolate_page_range().
>>
>> David, I know we do not have cross-pageblock PageLRU yet (wait until
>> someone adds PMD-level mTHP). But I am not sure about __PageMovable,
>> even if you and Johannes told me that __PageMovable has no compound page.
>
> I think it's all order-0. Likely we should sanity check that somewhere (when setting a folio-page movable?).
>
> For example, the vmware balloon handles 2M pages differently than 4k pages. Only the latter is movable.

Got it.

>
>> I wonder what are the use cases for __PageMovable. Is it possible for
>> a driver to mark its cross-pageblock page __PageMovable and provide
>> ->isolate_page and ->migratepage in its struct address_space_operations?
>> Or it is unsupported, so I should not need to worry about it.
>
> I never tried. We should document and enforce/sanity check that it only works with order-0 for now.

I tried when I was developing the commit b2c9e2fbba32 ("mm: make alloc_contig_range
work at pageblock granularity") and it worked (see https://github.com/x-y-z/kernel-modules/blob/pageblock_test/pref-test.c#L52). That led to the complicated
code in isolate_single_pageblock().

>
>>
>>>>
>>>> Note that memory offlining always covers pageblocks large than MAX_ORDER chunks (which implies full pageblocks) but alloc_contig_range() + CMA might only cover (parts of) single pageblocks.
>>>>
>>>> Hoping Zi Yan can review :)
>>
>> At the moment, I think this is the right clean up.
>
> I think we want to have some way to catch when it changes. For example, can we warn if we find a LRU folio here that is large than a single pageblock?

Definitely. We already have

VM_WARN_ON_ONCE_PAGE(PageLRU(page), page);
VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page);

from when Johannes last did the cleanup.

I agree that we will need some WARN_ON_ONCE in __SetPageMovable to check
if any compound page is passed in.
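
(A hypothetical sketch of that check, assuming the movable_operations-based
__SetPageMovable() — not actual tree code:)

void __SetPageMovable(struct page *page, const struct movable_operations *mops)
{
	/* Hypothetical: __PageMovable only supports order-0 pages today. */
	VM_WARN_ON_ONCE_PAGE(PageCompound(page), page);

	VM_BUG_ON_PAGE(!PageLocked(page), page);
	page->mapping = (void *)((unsigned long)mops | PAGE_MAPPING_MOVABLE);
}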

For > pageblock_order PageLRU, maybe a check in __folio_rmap_sanity_checks()?

>
> Also, I think we have to document why it works with hugetlb gigantic folios / large CMA allocations somewhere (the order-0 stuff you note above). Maybe as part of this changelog.

I agree.


Best Regards,
Yan, Zi
Kefeng Wang Aug. 17, 2024, 6:13 a.m. UTC | #10
On 2024/8/17 3:45, David Hildenbrand wrote:
> On 16.08.24 13:30, Kefeng Wang wrote:
>>
>>
>> On 2024/8/16 18:11, David Hildenbrand wrote:
>>> On 16.08.24 06:06, Kefeng Wang wrote:
>>>> The gigantic page size may larger than memory block size, so memory
>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>>>> alloc_contig_range work at pageblock granularity"),
>>>>
>>>> offline_pages
>>>>     start_isolate_page_range
>>>>       start_isolate_page_range(isolate_before=true)
>>>>         isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>>>       start_isolate_page_range(isolate_before=false)
>>>>         isolate [isolate_end - pageblock_nr_pages, isolate_end) 
>>>> pageblock
>>>>              __alloc_contig_migrate_range
>>>>             isolate_migratepages_range
>>>>               isolate_migratepages_block
>>>>                 isolate_or_dissolve_huge_page
>>>>                   if (hstate_is_gigantic(h))
>>>>                       return -ENOMEM;
>>>>
>>>> In fact, we don't need to migrate page in page range isolation, for
>>>> memory offline path, there is do_migrate_range() to move the pages.
>>>> For contig allocation, there is another __alloc_contig_migrate_range()
>>>> after isolation to migrate the pages. So fix issue by skipping the
>>>> __alloc_contig_migrate_range() in isolate_single_pageblock().

...

> 
> Please distill some of that in the patch description. Right now you only 
> talk about memory offlining and don't cover why alloc_contig_range() is 
> fine as well with this change.

Borrowing some words from Zi:


A gigantic PageHuge can be bigger than a pageblock, but a gigantic 
PageHuge is freed as order-0 pages. This means MIGRATE_ISOLATE 
pageblocks will get to the right free list after 
__alloc_contig_migrate_range(), the one after start_isolate_page_range(), 
for alloc_contig_range(). The same is true for memory offline, which has 
its own path to isolate/migrate used pages and dissolve free hugepages. 
So the migration code in isolate_single_pageblock() is not needed; let's 
clean it up, which also fixes the above issue.

Please correct me or help me write a better description, thanks.

> 
> Let me explore the details in the meantime ... :)
>
Zi Yan Aug. 17, 2024, 11:58 p.m. UTC | #11
On 17 Aug 2024, at 2:13, Kefeng Wang wrote:

> On 2024/8/17 3:45, David Hildenbrand wrote:
>> On 16.08.24 13:30, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/8/16 18:11, David Hildenbrand wrote:
>>>> On 16.08.24 06:06, Kefeng Wang wrote:
>>>>> The gigantic page size may larger than memory block size, so memory
>>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>>>>> alloc_contig_range work at pageblock granularity"),
>>>>>
>>>>> offline_pages
>>>>>     start_isolate_page_range
>>>>>       start_isolate_page_range(isolate_before=true)
>>>>>         isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>>>>       start_isolate_page_range(isolate_before=false)
>>>>>         isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>>>>              __alloc_contig_migrate_range
>>>>>             isolate_migratepages_range
>>>>>               isolate_migratepages_block
>>>>>                 isolate_or_dissolve_huge_page
>>>>>                   if (hstate_is_gigantic(h))
>>>>>                       return -ENOMEM;
>>>>>
>>>>> In fact, we don't need to migrate page in page range isolation, for
>>>>> memory offline path, there is do_migrate_range() to move the pages.
>>>>> For contig allocation, there is another __alloc_contig_migrate_range()
>>>>> after isolation to migrate the pages. So fix issue by skipping the
>>>>> __alloc_contig_migrate_range() in isolate_single_pageblock().
>
> ...
>
>>
>> Please distill some of that in the patch description. Right now you only talk about memory offlining and don't cover why alloc_contig_range() is fine as well with this change.
>
> Borrowing some word from Zi,
>
>
> PageHuge(gigantic) can bigger than a pageblock, the gigantic PageHuge is freed as order-0. This means MIGRATE_ISOLATE pageblocks will get to the right free list after __alloc_contig_migrate_range(), the one after
> start_isolate_page_range() for alloc_contig_range(), this is same as in
> memory offline, it has own path to isolate/migrate used page and dissolve the free hugepages, so the migration code in isolate_single_pageblock() is not needed, let's cleanup it and which also fix the above the issue.
>
> Please correct me or help to write better description, thanks.

How about?

Gigantic PageHuge is bigger than a pageblock, but since it is freed as order-0 pages,
its pageblocks after being freed will get to the right free list. There is no need
to have special handling code for them in start_isolate_page_range(). For both
alloc_contig_range() and memory offline cases, the migration code after
start_isolate_page_range() will be able to migrate gigantic PageHuge when possible.
Let's clean up start_isolate_page_range() and fix the aforementioned memory offline
failure issue all together.


--
Best Regards,
Yan, Zi
Kefeng Wang Aug. 19, 2024, 2:42 a.m. UTC | #12
On 2024/8/18 7:58, Zi Yan wrote:
> On 17 Aug 2024, at 2:13, Kefeng Wang wrote:
> 
>> On 2024/8/17 3:45, David Hildenbrand wrote:
>>> On 16.08.24 13:30, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2024/8/16 18:11, David Hildenbrand wrote:
>>>>> On 16.08.24 06:06, Kefeng Wang wrote:
>>>>>> The gigantic page size may larger than memory block size, so memory
>>>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>>>>>> alloc_contig_range work at pageblock granularity"),
>>>>>>
>>>>>> offline_pages
>>>>>>      start_isolate_page_range
>>>>>>        start_isolate_page_range(isolate_before=true)
>>>>>>          isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>>>>>        start_isolate_page_range(isolate_before=false)
>>>>>>          isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>>>>>               __alloc_contig_migrate_range
>>>>>>              isolate_migratepages_range
>>>>>>                isolate_migratepages_block
>>>>>>                  isolate_or_dissolve_huge_page
>>>>>>                    if (hstate_is_gigantic(h))
>>>>>>                        return -ENOMEM;
>>>>>>
>>>>>> In fact, we don't need to migrate page in page range isolation, for
>>>>>> memory offline path, there is do_migrate_range() to move the pages.
>>>>>> For contig allocation, there is another __alloc_contig_migrate_range()
>>>>>> after isolation to migrate the pages. So fix issue by skipping the
>>>>>> __alloc_contig_migrate_range() in isolate_single_pageblock().
>>
>> ...
>>
>>>
>>> Please distill some of that in the patch description. Right now you only talk about memory offlining and don't cover why alloc_contig_range() is fine as well with this change.
>>
>> Borrowing some word from Zi,
>>
>>
>> PageHuge(gigantic) can bigger than a pageblock, the gigantic PageHuge is freed as order-0. This means MIGRATE_ISOLATE pageblocks will get to the right free list after __alloc_contig_migrate_range(), the one after
>> start_isolate_page_range() for alloc_contig_range(), this is same as in
>> memory offline, it has own path to isolate/migrate used page and dissolve the free hugepages, so the migration code in isolate_single_pageblock() is not needed, let's cleanup it and which also fix the above the issue.
>>
>> Please correct me or help to write better description, thanks.
> 
> How about?
> 
> Gigantic PageHuge is bigger than a pageblock, but since it is freed as order-0 pages,
> its pageblocks after being freed will get to the right free list. There is no need
> to have special handling code for them in start_isolate_page_range(). For both
> alloc_contig_range() and memory offline cases, the migration code after
> start_isolate_page_range() will be able to migrate gigantic PageHuge when possible.
> Let's clean up start_isolate_page_range() and fix the aforementioned memory offline
> failure issue all together.

Thanks Zi, it is better, will update.
> 
> 
> --
> Best Regards,
> Yan, Zi
Andrew Morton Aug. 21, 2024, 1:41 a.m. UTC | #13
On Sat, 17 Aug 2024 19:58:07 -0400 Zi Yan <ziy@nvidia.com> wrote:

> > Please correct me or help to write better description, thanks.
> 
> How about?
> 
> Gigantic PageHuge is bigger than a pageblock, but since it is freed as order-0 pages,
> its pageblocks after being freed will get to the right free list. There is no need
> to have special handling code for them in start_isolate_page_range(). For both
> alloc_contig_range() and memory offline cases, the migration code after
> start_isolate_page_range() will be able to migrate gigantic PageHuge when possible.
> Let's clean up start_isolate_page_range() and fix the aforementioned memory offline
> failure issue all together.

Thanks, I updated the changelog with the above.

Patch

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 39fb8c07aeb7..7e04047977cf 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -403,30 +403,8 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			unsigned long head_pfn = page_to_pfn(head);
 			unsigned long nr_pages = compound_nr(head);
 
-			if (head_pfn + nr_pages <= boundary_pfn) {
-				pfn = head_pfn + nr_pages;
-				continue;
-			}
-
-#if defined CONFIG_COMPACTION || defined CONFIG_CMA
-			if (PageHuge(page)) {
-				int page_mt = get_pageblock_migratetype(page);
-				struct compact_control cc = {
-					.nr_migratepages = 0,
-					.order = -1,
-					.zone = page_zone(pfn_to_page(head_pfn)),
-					.mode = MIGRATE_SYNC,
-					.ignore_skip_hint = true,
-					.no_set_skip_hint = true,
-					.gfp_mask = gfp_flags,
-					.alloc_contig = true,
-				};
-				INIT_LIST_HEAD(&cc.migratepages);
-
-				ret = __alloc_contig_migrate_range(&cc, head_pfn,
-							head_pfn + nr_pages, page_mt);
-				if (ret)
-					goto failed;
+			if (head_pfn + nr_pages <= boundary_pfn ||
+			    PageHuge(page)) {
 				pfn = head_pfn + nr_pages;
 				continue;
 			}
@@ -440,7 +418,7 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			 */
 			VM_WARN_ON_ONCE_PAGE(PageLRU(page), page);
 			VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page);
-#endif
+
 			goto failed;
 		}