diff mbox series

[V2] mm/hugetlb: wait for hugepage folios to be freed

Message ID 1739604026-2258-1-git-send-email-yangge1116@126.com (mailing list archive)
State New
Headers show
Series [V2] mm/hugetlb: wait for hugepage folios to be freed

Commit Message

Ge Yang Feb. 15, 2025, 7:20 a.m. UTC
From: Ge Yang <yangge1116@126.com>

Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
the allocation of contiguous memory through cma_alloc() may fail
intermittently.

In the CMA allocation process, if the CMA area is found to be occupied by
in-use hugepage folios, those folios need to be migrated to another
location. When no hugepage folios are available in the free HugeTLB pool
during the migration of in-use HugeTLB pages, new folios are allocated
from the buddy system. A temporary state is set on the newly allocated
folios. Upon completion of the migration, the temporary state is
transferred from the new folios to the old folios. Normally, when the old
folios carrying the temporary state are freed, they are released directly
back to the buddy system. However, because the freeing of HugeTLB pages is
deferred to a workqueue, the old folios may not have reached the buddy
system by the time test_pages_isolated() runs, so the PageBuddy() check
fails, ultimately leading to the failure of cma_alloc().

Here is a simplified call trace illustrating the process:
cma_alloc()
    ->__alloc_contig_migrate_range() // Migrate in-use hugepage
        ->unmap_and_move_huge_page()
            ->folio_putback_hugetlb() // Free old folios
    ->test_pages_isolated()
        ->__test_page_isolated_in_pageblock()
             ->PageBuddy(page) // Check if the page is in buddy

To resolve this issue, we introduce a function named
wait_for_hugepage_folios_freed(), which waits until migrated hugepage
folios have actually been released back to the buddy system. By invoking
wait_for_hugepage_folios_freed() before calling PageBuddy(), we ensure
that PageBuddy() will succeed.

Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
Signed-off-by: Ge Yang <yangge1116@126.com>
---

V2:
- flush all folios at once suggested by David

 include/linux/hugetlb.h |  5 +++++
 mm/hugetlb.c            |  8 ++++++++
 mm/page_isolation.c     | 10 ++++++++++
 3 files changed, 23 insertions(+)

Comments

Andrew Morton Feb. 18, 2025, 4:34 a.m. UTC | #1
On Sat, 15 Feb 2025 15:20:26 +0800 yangge1116@126.com wrote:

> From: Ge Yang <yangge1116@126.com>
> 
> Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
> of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
> the allocation of contiguous memory through cma_alloc() may fail
> probabilistically.
> 
> In the CMA allocation process, if it is found that the CMA area is occupied
> by in-use hugepage folios, these in-use hugepage folios need to be migrated
> to another location. When there are no available hugepage folios in the
> free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
> are allocated from the buddy system. A temporary state is set on the newly
> allocated folio. Upon completion of the hugepage folio migration, the
> temporary state is transferred from the new folios to the old folios.
> Normally, when the old folios with the temporary state are freed, it is
> directly released back to the buddy system. However, due to the deferred
> freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
> to the failure of cma_alloc().
> 
> Here is a simplified call trace illustrating the process:
> cma_alloc()
>     ->__alloc_contig_migrate_range() // Migrate in-use hugepage
>         ->unmap_and_move_huge_page()
>             ->folio_putback_hugetlb() // Free old folios
>     ->test_pages_isolated()
>         ->__test_page_isolated_in_pageblock()
>              ->PageBuddy(page) // Check if the page is in buddy
> 
> To resolve this issue, we have implemented a function named
> wait_for_hugepage_folios_freed(). This function ensures that the hugepage
> folios are properly released back to the buddy system after their migration
> is completed. By invoking wait_for_hugepage_folios_freed() before calling
> PageBuddy(), we ensure that PageBuddy() will succeed.
> 
> Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")

Do you feel that this issue is serious enough to justify a -stable
backport of the fix?
Muchun Song Feb. 18, 2025, 6:52 a.m. UTC | #2
> On Feb 15, 2025, at 15:20, yangge1116@126.com wrote:
> 
> From: Ge Yang <yangge1116@126.com>
> 
> Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
> of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
> the allocation of contiguous memory through cma_alloc() may fail
> probabilistically.
> 
> In the CMA allocation process, if it is found that the CMA area is occupied
> by in-use hugepage folios, these in-use hugepage folios need to be migrated
> to another location. When there are no available hugepage folios in the
> free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
> are allocated from the buddy system. A temporary state is set on the newly
> allocated folio. Upon completion of the hugepage folio migration, the
> temporary state is transferred from the new folios to the old folios.
> Normally, when the old folios with the temporary state are freed, it is
> directly released back to the buddy system. However, due to the deferred
> freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
> to the failure of cma_alloc().
> 
> Here is a simplified call trace illustrating the process:
> cma_alloc()
>    ->__alloc_contig_migrate_range() // Migrate in-use hugepage
>        ->unmap_and_move_huge_page()
>            ->folio_putback_hugetlb() // Free old folios
>    ->test_pages_isolated()
>        ->__test_page_isolated_in_pageblock()
>             ->PageBuddy(page) // Check if the page is in buddy
> 
> To resolve this issue, we have implemented a function named
> wait_for_hugepage_folios_freed(). This function ensures that the hugepage
> folios are properly released back to the buddy system after their migration
> is completed. By invoking wait_for_hugepage_folios_freed() before calling
> PageBuddy(), we ensure that PageBuddy() will succeed.
> 
> Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> ---
> 
> V2:
> - flush all folios at once suggested by David
> 
> include/linux/hugetlb.h |  5 +++++
> mm/hugetlb.c            |  8 ++++++++
> mm/page_isolation.c     | 10 ++++++++++
> 3 files changed, 23 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6c6546b..04708b0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
> 
> int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
> int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
> +void wait_for_hugepage_folios_freed(void);
> struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>                unsigned long addr, bool cow_from_owner);
> struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> @@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
>    return 0;
> }
> 
> +static inline void wait_for_hugepage_folios_freed(void)
> +{
> +}
> +
> static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>                       unsigned long addr,
>                       bool cow_from_owner)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 30bc34d..36dd3e4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>    return ret;
> }
> 
> +void wait_for_hugepage_folios_freed(void)

We usually use the "hugetlb" term now instead of "huge_page" to differentiate from THP, so I suggest naming it wait_for_hugetlb_folios_freed().

> +{
> +    struct hstate *h;
> +
> +    for_each_hstate(h)
> +        flush_free_hpage_work(h);

Because all hstates use the shared work item to defer the freeing of hugetlb pages, we only need to flush once. Directly using flush_work(&free_hpage_work) is enough.
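The simplification suggested here would collapse the helper to a single flush. A sketch of how it might look (kernel-style fragment, not compiled here; it assumes the shared free_hpage_work work item in mm/hugetlb.c is in scope, and uses the hugetlb naming also suggested in this thread):

```c
/* Sketch of the suggested simplification: every hstate defers freeing
 * through the same work item, so one flush_work() covers them all. */
void wait_for_freed_hugetlb_folios(void)
{
	flush_work(&free_hpage_work);
}
```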

> +}
> +
> typedef enum {
>    /*
>     * For either 0/1: we checked the per-vma resv map, and one resv
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 8ed53ee0..f56cf02 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>    int ret;
> 
>    /*
> +     * Due to the deferred freeing of HugeTLB folios, the hugepage folios may
> +     * not immediately release to the buddy system. This can cause PageBuddy()
> +     * to fail in __test_page_isolated_in_pageblock(). To ensure that the
> +     * hugepage folios are properly released back to the buddy system, we

hugetlb folios, pls.

Thanks,
Muchun

> +     * invoke the wait_for_hugepage_folios_freed() function to wait for the
> +     * release to complete.
> +     */
> +    wait_for_hugepage_folios_freed();
> +
> +    /*
>     * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
>     * pages are not aligned to pageblock_nr_pages.
>     * Then we just check migratetype first.
> -- 
> 2.7.4
>
Muchun Song Feb. 18, 2025, 7:05 a.m. UTC | #3
> On Feb 15, 2025, at 15:20, yangge1116@126.com wrote:
> 
> From: Ge Yang <yangge1116@126.com>
> 
> Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
> of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
> the allocation of contiguous memory through cma_alloc() may fail
> probabilistically.
> 
> In the CMA allocation process, if it is found that the CMA area is occupied
> by in-use hugepage folios, these in-use hugepage folios need to be migrated
> to another location. When there are no available hugepage folios in the
> free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
> are allocated from the buddy system. A temporary state is set on the newly
> allocated folio. Upon completion of the hugepage folio migration, the
> temporary state is transferred from the new folios to the old folios.
> Normally, when the old folios with the temporary state are freed, it is
> directly released back to the buddy system. However, due to the deferred
> freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
> to the failure of cma_alloc().
> 
> Here is a simplified call trace illustrating the process:
> cma_alloc()
>    ->__alloc_contig_migrate_range() // Migrate in-use hugepage
>        ->unmap_and_move_huge_page()
>            ->folio_putback_hugetlb() // Free old folios
>    ->test_pages_isolated()
>        ->__test_page_isolated_in_pageblock()
>             ->PageBuddy(page) // Check if the page is in buddy
> 
> To resolve this issue, we have implemented a function named
> wait_for_hugepage_folios_freed(). This function ensures that the hugepage
> folios are properly released back to the buddy system after their migration
> is completed. By invoking wait_for_hugepage_folios_freed() before calling
> PageBuddy(), we ensure that PageBuddy() will succeed.
> 
> Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")

The actual blamed commit should be commit c77c0a8ac4c52 ("mm/hugetlb: defer
freeing of huge pages if in non-task context"), which was the first to
introduce the delayed work used to free hugetlb pages. That work was removed
by commit db71ef79b59bb2 and then immediately brought back by commit
b65d4adbc0f0.

> Signed-off-by: Ge Yang <yangge1116@126.com>
> ---
> 
> V2:
> - flush all folios at once suggested by David
> 
> include/linux/hugetlb.h |  5 +++++
> mm/hugetlb.c            |  8 ++++++++
> mm/page_isolation.c     | 10 ++++++++++
> 3 files changed, 23 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6c6546b..04708b0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
> 
> int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
> int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
> +void wait_for_hugepage_folios_freed(void);
> struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> 				unsigned long addr, bool cow_from_owner);
> struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> @@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
> 	return 0;
> }
> 
> +static inline void wait_for_hugepage_folios_freed(void)
> +{
> +}
> +
> static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>   			unsigned long addr,
>   			bool cow_from_owner)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 30bc34d..36dd3e4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
> return ret;
> }
> 
> +void wait_for_hugepage_folios_freed(void)

We usually use the "hugetlb" term now instead of "huge_page" to differentiate from THP.
So I suggest naming it wait_for_hugetlb_folios_freed().

> +{
> + 	struct hstate *h;
> +
> + 	for_each_hstate(h)
> + 		flush_free_hpage_work(h);

Because all hstates use the shared work item to defer the freeing of hugetlb pages, we only
need to flush once. Directly using flush_work(&free_hpage_work) is enough.

> +}
> +
> typedef enum {
> 	/*
> * For either 0/1: we checked the per-vma resv map, and one resv
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 8ed53ee0..f56cf02 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
> int ret;
> 
> 	/*
> +	 * Due to the deferred freeing of HugeTLB folios, the hugepage folios may
> +	 * not immediately release to the buddy system. This can cause PageBuddy()
> +	 * to fail in __test_page_isolated_in_pageblock(). To ensure that the
> +	 * hugepage folios are properly released back to the buddy system, we

hugetlb folios.

Thanks,
Muchun

> +	 * invoke the wait_for_hugepage_folios_freed() function to wait for the
> +	 * release to complete.
> +	 */
> + 	wait_for_hugepage_folios_freed();
> +
> + 	/*
> 	 * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
> 	 * pages are not aligned to pageblock_nr_pages.
> 	 * Then we just check migratetype first.
> -- 
> 2.7.4
>
Ge Yang Feb. 18, 2025, 7:24 a.m. UTC | #4
On 2025/2/18 12:34, Andrew Morton wrote:
> On Sat, 15 Feb 2025 15:20:26 +0800 yangge1116@126.com wrote:
> 
>> From: Ge Yang <yangge1116@126.com>
>>
>> Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
>> of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
>> the allocation of contiguous memory through cma_alloc() may fail
>> probabilistically.
>>
>> In the CMA allocation process, if it is found that the CMA area is occupied
>> by in-use hugepage folios, these in-use hugepage folios need to be migrated
>> to another location. When there are no available hugepage folios in the
>> free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
>> are allocated from the buddy system. A temporary state is set on the newly
>> allocated folio. Upon completion of the hugepage folio migration, the
>> temporary state is transferred from the new folios to the old folios.
>> Normally, when the old folios with the temporary state are freed, it is
>> directly released back to the buddy system. However, due to the deferred
>> freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
>> to the failure of cma_alloc().
>>
>> Here is a simplified call trace illustrating the process:
>> cma_alloc()
>>      ->__alloc_contig_migrate_range() // Migrate in-use hugepage
>>          ->unmap_and_move_huge_page()
>>              ->folio_putback_hugetlb() // Free old folios
>>      ->test_pages_isolated()
>>          ->__test_page_isolated_in_pageblock()
>>               ->PageBuddy(page) // Check if the page is in buddy
>>
>> To resolve this issue, we have implemented a function named
>> wait_for_hugepage_folios_freed(). This function ensures that the hugepage
>> folios are properly released back to the buddy system after their migration
>> is completed. By invoking wait_for_hugepage_folios_freed() before calling
>> PageBuddy(), we ensure that PageBuddy() will succeed.
>>
>> Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
> 
> Do you feel that this issue is serious enough to justify a -stable
> backport of the fix?
Yes, I will add 'CC: stable@vger.kernel.org' in the next patch, thanks.
Ge Yang Feb. 18, 2025, 7:25 a.m. UTC | #5
On 2025/2/18 15:05, Muchun Song wrote:
> 
> 
>> On Feb 15, 2025, at 15:20, yangge1116@126.com wrote:
>>
>> From: Ge Yang <yangge1116@126.com>
>>
>> Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
>> of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
>> the allocation of contiguous memory through cma_alloc() may fail
>> probabilistically.
>>
>> In the CMA allocation process, if it is found that the CMA area is occupied
>> by in-use hugepage folios, these in-use hugepage folios need to be migrated
>> to another location. When there are no available hugepage folios in the
>> free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
>> are allocated from the buddy system. A temporary state is set on the newly
>> allocated folio. Upon completion of the hugepage folio migration, the
>> temporary state is transferred from the new folios to the old folios.
>> Normally, when the old folios with the temporary state are freed, it is
>> directly released back to the buddy system. However, due to the deferred
>> freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
>> to the failure of cma_alloc().
>>
>> Here is a simplified call trace illustrating the process:
>> cma_alloc()
>>     ->__alloc_contig_migrate_range() // Migrate in-use hugepage
>>         ->unmap_and_move_huge_page()
>>             ->folio_putback_hugetlb() // Free old folios
>>     ->test_pages_isolated()
>>         ->__test_page_isolated_in_pageblock()
>>              ->PageBuddy(page) // Check if the page is in buddy
>>
>> To resolve this issue, we have implemented a function named
>> wait_for_hugepage_folios_freed(). This function ensures that the hugepage
>> folios are properly released back to the buddy system after their migration
>> is completed. By invoking wait_for_hugepage_folios_freed() before calling
>> PageBuddy(), we ensure that PageBuddy() will succeed.
>>
>> Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
> 
> The actual blamed commit should be the
> 
> commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
> 
> which is the first to introducing the delayed work to free the hugetlb pages.
> It was removed by commit db71ef79b59bb2 and then was brought back by commit
> b65d4adbc0f0 immediately.
> 
Ok, thanks.
>> Signed-off-by: Ge Yang <yangge1116@126.com>
>> ---
>>
>> V2:
>> - flush all folios at once suggested by David
>>
>> include/linux/hugetlb.h |  5 +++++
>> mm/hugetlb.c            |  8 ++++++++
>> mm/page_isolation.c     | 10 ++++++++++
>> 3 files changed, 23 insertions(+)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 6c6546b..04708b0 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
>>
>> int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
>> int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
>> +void wait_for_hugepage_folios_freed(void);
>> struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>> 				unsigned long addr, bool cow_from_owner);
>> struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
>> @@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
>> 	return 0;
>> }
>>
>> +static inline void wait_for_hugepage_folios_freed(void)
>> +{
>> +}
>> +
>> static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>>    			unsigned long addr,
>>    			bool cow_from_owner)
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 30bc34d..36dd3e4 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>> return ret;
>> }
>>
>> +void wait_for_hugepage_folios_freed(void)
> 
> We usually use the "hugetlb" term now instead of "huge_page" to differentiate with THP.
> So I suggest naming it as wait_for_hugetlb_folios_freed().
> 
>> +{
>> + 	struct hstate *h;
>> +
>> + 	for_each_hstate(h)
>> + 		flush_free_hpage_work(h);
> 
> Because all hstates use the shared work item to defer the freeing of hugetlb pages, we only
> need to flush once. Directly using flush_work(&free_hpage_work) is enough.
> 
Ok, thanks.
>> +}
>> +
>> typedef enum {
>> 	/*
>> * For either 0/1: we checked the per-vma resv map, and one resv
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 8ed53ee0..f56cf02 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>> int ret;
>>
>> 	/*
>> +	 * Due to the deferred freeing of HugeTLB folios, the hugepage folios may
>> +	 * not immediately release to the buddy system. This can cause PageBuddy()
>> +	 * to fail in __test_page_isolated_in_pageblock(). To ensure that the
>> +	 * hugepage folios are properly released back to the buddy system, we
> 
> hugetlb folios.
> 
Ok, thanks.
> Muchun,
> Thanks.
> 
>> +	 * invoke the wait_for_hugepage_folios_freed() function to wait for the
>> +	 * release to complete.
>> +	 */
>> + 	wait_for_hugepage_folios_freed();
>> +
>> + 	/*
>> 	 * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
>> 	 * pages are not aligned to pageblock_nr_pages.
>> 	 * Then we just check migratetype first.
>> -- 
>> 2.7.4
>>
David Hildenbrand Feb. 18, 2025, 8:52 a.m. UTC | #6
On 18.02.25 07:52, Muchun Song wrote:
> 
> 
>> On Feb 15, 2025, at 15:20, yangge1116@126.com wrote:
>>
>> From: Ge Yang <yangge1116@126.com>
>>
>> Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
>> of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
>> the allocation of contiguous memory through cma_alloc() may fail
>> probabilistically.
>>
>> In the CMA allocation process, if it is found that the CMA area is 
>> occupied
>> by in-use hugepage folios, these in-use hugepage folios need to be 
>> migrated
>> to another location. When there are no available hugepage folios in the
>> free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
>> are allocated from the buddy system. A temporary state is set on the newly
>> allocated folio. Upon completion of the hugepage folio migration, the
>> temporary state is transferred from the new folios to the old folios.
>> Normally, when the old folios with the temporary state are freed, it is
>> directly released back to the buddy system. However, due to the deferred
>> freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
>> to the failure of cma_alloc().
>>
>> Here is a simplified call trace illustrating the process:
>> cma_alloc()
>>    ->__alloc_contig_migrate_range() // Migrate in-use hugepage
>>        ->unmap_and_move_huge_page()
>>            ->folio_putback_hugetlb() // Free old folios
>>    ->test_pages_isolated()
>>        ->__test_page_isolated_in_pageblock()
>>             ->PageBuddy(page) // Check if the page is in buddy
>>
>> To resolve this issue, we have implemented a function named
>> wait_for_hugepage_folios_freed(). This function ensures that the hugepage
>> folios are properly released back to the buddy system after their 
>> migration
>> is completed. By invoking wait_for_hugepage_folios_freed() before calling
>> PageBuddy(), we ensure that PageBuddy() will succeed.
>>
>> Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
>> Signed-off-by: Ge Yang <yangge1116@126.com>
>> ---
>>
>> V2:
>> - flush all folios at once suggested by David
>>
>> include/linux/hugetlb.h |  5 +++++
>> mm/hugetlb.c            |  8 ++++++++
>> mm/page_isolation.c     | 10 ++++++++++
>> 3 files changed, 23 insertions(+)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 6c6546b..04708b0 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, 
>> struct huge_bootmem_page *m);
>>
>> int isolate_or_dissolve_huge_page(struct page *page, struct list_head 
>> *list);
>> int replace_free_hugepage_folios(unsigned long start_pfn, unsigned 
>> long end_pfn);
>> +void wait_for_hugepage_folios_freed(void);
>> struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>>                unsigned long addr, bool cow_from_owner);
>> struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int 
>> preferred_nid,
>> @@ -1092,6 +1093,10 @@ static inline int 
>> replace_free_hugepage_folios(unsigned long start_pfn,
>>    return 0;
>> }
>>
>> +static inline void wait_for_hugepage_folios_freed(void)
>> +{
>> +}
>> +
>> static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct 
>> *vma,
>>                       unsigned long addr,
>>                       bool cow_from_owner)
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 30bc34d..36dd3e4 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long 
>> start_pfn, unsigned long end_pfn)
>>    return ret;
>> }
>>
>> +void wait_for_hugepage_folios_freed(void)
> 
> We usually use the "hugetlb" term now instead of "huge_page" to 
> differentiate with THP. So I suggest naming it as 
> wait_for_hugetlb_folios_freed().

Maybe "wait_for_freed_hugetlb_folios" or "hugetlb_wait_for_freed_folios".

In general, LGTM
Ge Yang Feb. 18, 2025, 9:06 a.m. UTC | #7
On 2025/2/18 16:52, David Hildenbrand wrote:
> On 18.02.25 07:52, Muchun Song wrote:
>>
>>
>>> On Feb 15, 2025, at 15:20, yangge1116@126.com wrote:
>>>
>>> From: Ge Yang <yangge1116@126.com>
>>>
>>> Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer 
>>> freeing
>>> of HugeTLB pages"), which supports deferring the freeing of HugeTLB 
>>> pages,
>>> the allocation of contiguous memory through cma_alloc() may fail
>>> probabilistically.
>>>
>>> In the CMA allocation process, if it is found that the CMA area is 
>>> occupied
>>> by in-use hugepage folios, these in-use hugepage folios need to be 
>>> migrated
>>> to another location. When there are no available hugepage folios in the
>>> free HugeTLB pool during the migration of in-use HugeTLB pages, new 
>>> folios
>>> are allocated from the buddy system. A temporary state is set on the 
>>> newly
>>> allocated folio. Upon completion of the hugepage folio migration, the
>>> temporary state is transferred from the new folios to the old folios.
>>> Normally, when the old folios with the temporary state are freed, it is
>>> directly released back to the buddy system. However, due to the deferred
>>> freeing of HugeTLB pages, the PageBuddy() check fails, ultimately 
>>> leading
>>> to the failure of cma_alloc().
>>>
>>> Here is a simplified call trace illustrating the process:
>>> cma_alloc()
>>>    ->__alloc_contig_migrate_range() // Migrate in-use hugepage
>>>        ->unmap_and_move_huge_page()
>>>            ->folio_putback_hugetlb() // Free old folios
>>>    ->test_pages_isolated()
>>>        ->__test_page_isolated_in_pageblock()
>>>             ->PageBuddy(page) // Check if the page is in buddy
>>>
>>> To resolve this issue, we have implemented a function named
>>> wait_for_hugepage_folios_freed(). This function ensures that the 
>>> hugepage
>>> folios are properly released back to the buddy system after their 
>>> migration
>>> is completed. By invoking wait_for_hugepage_folios_freed() before 
>>> calling
>>> PageBuddy(), we ensure that PageBuddy() will succeed.
>>>
>>> Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
>>> Signed-off-by: Ge Yang <yangge1116@126.com>
>>> ---
>>>
>>> V2:
>>> - flush all folios at once suggested by David
>>>
>>> include/linux/hugetlb.h |  5 +++++
>>> mm/hugetlb.c            |  8 ++++++++
>>> mm/page_isolation.c     | 10 ++++++++++
>>> 3 files changed, 23 insertions(+)
>>>
>>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>>> index 6c6546b..04708b0 100644
>>> --- a/include/linux/hugetlb.h
>>> +++ b/include/linux/hugetlb.h
>>> @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, 
>>> struct huge_bootmem_page *m);
>>>
>>> int isolate_or_dissolve_huge_page(struct page *page, struct list_head 
>>> *list);
>>> int replace_free_hugepage_folios(unsigned long start_pfn, unsigned 
>>> long end_pfn);
>>> +void wait_for_hugepage_folios_freed(void);
>>> struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>>>                unsigned long addr, bool cow_from_owner);
>>> struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int 
>>> preferred_nid,
>>> @@ -1092,6 +1093,10 @@ static inline int 
>>> replace_free_hugepage_folios(unsigned long start_pfn,
>>>    return 0;
>>> }
>>>
>>> +static inline void wait_for_hugepage_folios_freed(void)
>>> +{
>>> +}
>>> +
>>> static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct 
>>> *vma,
>>>                       unsigned long addr,
>>>                       bool cow_from_owner)
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 30bc34d..36dd3e4 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long 
>>> start_pfn, unsigned long end_pfn)
>>>    return ret;
>>> }
>>>
>>> +void wait_for_hugepage_folios_freed(void)
>>
>> We usually use the "hugetlb" term now instead of "huge_page" to 
>> differentiate with THP. So I suggest naming it as 
>> wait_for_hugetlb_folios_freed().
> 
> Maybe "wait_for_freed_hugetlb_folios" or "hugetlb_wait_for_freed_folios".
> 
> In general, LGTM
> 
Ok, thanks.

Patch

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6c6546b..04708b0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -697,6 +697,7 @@  bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_hugepage_folios_freed(void);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1092,6 +1093,10 @@  static inline int replace_free_hugepage_folios(unsigned long start_pfn,
 	return 0;
 }
 
+static inline void wait_for_hugepage_folios_freed(void)
+{
+}
+
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					   unsigned long addr,
 					   bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 30bc34d..36dd3e4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2955,6 +2955,14 @@  int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
 	return ret;
 }
 
+void wait_for_hugepage_folios_freed(void)
+{
+	struct hstate *h;
+
+	for_each_hstate(h)
+		flush_free_hpage_work(h);
+}
+
 typedef enum {
 	/*
 	 * For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 8ed53ee0..f56cf02 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -615,6 +615,16 @@  int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	int ret;
 
 	/*
+	 * Due to the deferred freeing of HugeTLB folios, hugepage folios may not
+	 * be released to the buddy system immediately. This can cause PageBuddy()
+	 * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+	 * hugepage folios have been released back to the buddy system, we
+	 * invoke the wait_for_hugepage_folios_freed() function to wait for the
+	 * release to complete.
+	 */
+	wait_for_hugepage_folios_freed();
+
+	/*
 	 * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
 	 * pages are not aligned to pageblock_nr_pages.
 	 * Then we just check migratetype first.