mm, compaction: don't use ALLOC_CMA in long term GUP flow

Message ID	1734075432-14131-1-git-send-email-yangge1116@126.com (mailing list archive)
State	New
Headers	Return-Path: <owner-linux-mm@kvack.org>
	From: yangge1116@126.com
	To: akpm@linux-foundation.org
	Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, baolin.wang@linux.alibaba.com, vbabka@suse.cz, liuzixing@hygon.cn, yangge <yangge1116@126.com>
	Subject: [PATCH] mm, compaction: don't use ALLOC_CMA in long term GUP flow
	Date: Fri, 13 Dec 2024 15:37:12 +0800
	Message-Id: <1734075432-14131-1-git-send-email-yangge1116@126.com>
	Sender: owner-linux-mm@kvack.org
	Precedence: bulk
Series	mm, compaction: don't use ALLOC_CMA in long term GUP flow

Commit Message

Ge Yang Dec. 13, 2024, 7:37 a.m. UTC

From: yangge <yangge1116@126.com>

Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
in __compaction_suitable()") allowed compaction to proceed when the
free pages required for compaction reside in CMA pageblocks, it is
possible for __compaction_suitable() to always return true, which is
not acceptable in some cases.

There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
of memory. I have configured 16GB of CMA memory on each NUMA node,
and starting a 32GB virtual machine with device passthrough is
extremely slow, taking almost an hour.

During the start-up of the virtual machine, it calls
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
Long term GUP cannot allocate memory from the CMA area, so at most
16GB of non-CMA memory on a NUMA node can be used as virtual
machine memory. Since there is 16GB of free CMA memory on the NUMA
node, the order-0 watermark is always met for compaction, so
__compaction_suitable() always returns true, even if the node is
unable to allocate non-CMA memory for the virtual machine.

For costly allocations, because __compaction_suitable() always
returns true, __alloc_pages_slowpath() can't exit at the appropriate
place, resulting in excessively long virtual machine startup times.
Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // should exit __alloc_pages_slowpath() from here

To sum up, during long term GUP flow, we should remove ALLOC_CMA
both in __compaction_suitable() and __isolate_free_page().

Fixes: 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()")
Cc: <stable@vger.kernel.org>
Signed-off-by: yangge <yangge1116@126.com>
---
 mm/compaction.c | 8 +++++---
 mm/page_alloc.c | 4 +++-
 2 files changed, 8 insertions(+), 4 deletions(-)
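
The watermark reasoning in the commit message can be illustrated with a
small userspace model. This is only a sketch under stated assumptions: the
names model_zone, model_watermark_ok(), model_compaction_suitable() and
MODEL_ALLOC_CMA are hypothetical stand-ins, not kernel symbols, and the
check ignores lowmem reserves and higher orders. It assumes that free CMA
pages are not counted as usable when the CMA flag is absent, and that
free_cma_pages never exceeds free_pages.

```c
#include <stdbool.h>

/* Hypothetical flag value; the kernel's real ALLOC_CMA differs. */
#define MODEL_ALLOC_CMA 0x1u

/* Hypothetical, heavily reduced view of a zone's free-page counters. */
struct model_zone {
	unsigned long free_pages;     /* all free pages, CMA included */
	unsigned long free_cma_pages; /* free pages inside CMA pageblocks */
};

/*
 * Simplified order-0 watermark check: if the caller may not use CMA,
 * free CMA pages are subtracted before comparing against the mark.
 */
static bool model_watermark_ok(const struct model_zone *z, unsigned long mark,
			       unsigned int alloc_flags)
{
	unsigned long usable = z->free_pages;

	if (!(alloc_flags & MODEL_ALLOC_CMA))
		usable -= z->free_cma_pages;

	return usable > mark;
}

/* Mirrors the idea of the patch: no CMA flag during a long-term pin. */
static bool model_compaction_suitable(const struct model_zone *z,
				      unsigned long mark, bool longterm_pin)
{
	return model_watermark_ok(z, mark, longterm_pin ? 0 : MODEL_ALLOC_CMA);
}
```

With 16GB of free CMA memory (4194304 pages of 4KB) and only a handful of
free non-CMA pages, the model reports the zone as suitable for compaction
when CMA pages are counted and as unsuitable during a long-term pin, which
is the distinction the patch introduces.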

Comments

Barry Song Dec. 13, 2024, 7:56 a.m. UTC | #1

On Fri, Dec 13, 2024 at 3:37 PM <yangge1116@126.com> wrote:
>
> From: yangge <yangge1116@126.com>
>
> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
> in __compaction_suitable()") allowed compaction to proceed when the
> free pages required for compaction reside in CMA pageblocks, it is
> possible for __compaction_suitable() to always return true, which is
> not acceptable in some cases.
>
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory. I have configured 16GB of CMA memory on each NUMA node,
> and starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.

I don't fully understand why each node has a 16GB CMA. As I recall, I designed
the per-NUMA CMA to support devices that are not behind an IOMMU, such as
the IOMMU itself or certain devices that have no IOMMU and
need contiguous memory for DMA. These devices don't seem to require that
much memory.

>
> During the start-up of the virtual machine, it calls
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
> Long term GUP cannot allocate memory from the CMA area, so at most
> 16GB of non-CMA memory on a NUMA node can be used as virtual
> machine memory. Since there is 16GB of free CMA memory on the NUMA
> node, the order-0 watermark is always met for compaction, so
> __compaction_suitable() always returns true, even if the node is
> unable to allocate non-CMA memory for the virtual machine.
>
> For costly allocations, because __compaction_suitable() always
> returns true, __alloc_pages_slowpath() can't exit at the appropriate
> place, resulting in excessively long virtual machine startup times.
> Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage; // should exit __alloc_pages_slowpath() from here
>
> To sum up, during long term GUP flow, we should remove ALLOC_CMA
> both in __compaction_suitable() and __isolate_free_page().

What’s the outcome after your fix? Will it quickly fall back to remote
NUMA nodes for the pin?

>
> Fixes: 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: yangge <yangge1116@126.com>
> ---
>  mm/compaction.c | 8 +++++---
>  mm/page_alloc.c | 4 +++-
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 07bd227..044c2247 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2384,6 +2384,7 @@ static bool __compaction_suitable(struct zone *zone, int order,
>                                   unsigned long wmark_target)
>  {
>         unsigned long watermark;
> +       bool pin;
>         /*
>          * Watermarks for order-0 must be met for compaction to be able to
>          * isolate free pages for migration targets. This means that the
> @@ -2395,14 +2396,15 @@ static bool __compaction_suitable(struct zone *zone, int order,
>          * even if compaction succeeds.
>          * For costly orders, we require low watermark instead of min for
>          * compaction to proceed to increase its chances.
> -        * ALLOC_CMA is used, as pages in CMA pageblocks are considered
> -        * suitable migration targets
> +        * In addition to long term GUP flow, ALLOC_CMA is used, as pages in
> +        * CMA pageblocks are considered suitable migration targets
>          */
>         watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
>                                 low_wmark_pages(zone) : min_wmark_pages(zone);
>         watermark += compact_gap(order);
> +       pin = !!(current->flags & PF_MEMALLOC_PIN);
>         return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
> -                                  ALLOC_CMA, wmark_target);
> +                                  pin ? 0 : ALLOC_CMA, wmark_target);
>  }
>
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dde19db..9a5dfda 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2813,6 +2813,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  {
>         struct zone *zone = page_zone(page);
>         int mt = get_pageblock_migratetype(page);
> +       bool pin;
>
>         if (!is_migrate_isolate(mt)) {
>                 unsigned long watermark;
> @@ -2823,7 +2824,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
>                  * exists.
>                  */
>                 watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
> -               if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
> +               pin = !!(current->flags & PF_MEMALLOC_PIN);
> +               if (!zone_watermark_ok(zone, 0, watermark, 0, pin ? 0 : ALLOC_CMA))
>                         return 0;
>         }
>
> --
> 2.7.4
>
Thanks
Barry

Baolin Wang Dec. 13, 2024, 8:23 a.m. UTC | #2

On 2024/12/13 15:37, yangge1116@126.com wrote:
> From: yangge <yangge1116@126.com>
> 
> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
> in __compaction_suitable()") allowed compaction to proceed when the
> free pages required for compaction reside in CMA pageblocks, it is
> possible for __compaction_suitable() to always return true, which is
> not acceptable in some cases.
> 
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory. I have configured 16GB of CMA memory on each NUMA node,
> and starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.
> 
> During the start-up of the virtual machine, it calls
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
> Long term GUP cannot allocate memory from the CMA area, so at most
> 16GB of non-CMA memory on a NUMA node can be used as virtual
> machine memory. Since there is 16GB of free CMA memory on the NUMA
> node, the order-0 watermark is always met for compaction, so
> __compaction_suitable() always returns true, even if the node is
> unable to allocate non-CMA memory for the virtual machine.
> 
> For costly allocations, because __compaction_suitable() always
> returns true, __alloc_pages_slowpath() can't exit at the appropriate
> place, resulting in excessively long virtual machine startup times.
> Call trace:
> __alloc_pages_slowpath
>      if (compact_result == COMPACT_SKIPPED ||
>          compact_result == COMPACT_DEFERRED)
>          goto nopage; // should exit __alloc_pages_slowpath() from here
> 
> To sum up, during long term GUP flow, we should remove ALLOC_CMA
> both in __compaction_suitable() and __isolate_free_page().
> 
> Fixes: 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: yangge <yangge1116@126.com>
> ---
>   mm/compaction.c | 8 +++++---
>   mm/page_alloc.c | 4 +++-
>   2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 07bd227..044c2247 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2384,6 +2384,7 @@ static bool __compaction_suitable(struct zone *zone, int order,
>   				  unsigned long wmark_target)
>   {
>   	unsigned long watermark;
> +	bool pin;
>   	/*
>   	 * Watermarks for order-0 must be met for compaction to be able to
>   	 * isolate free pages for migration targets. This means that the
> @@ -2395,14 +2396,15 @@ static bool __compaction_suitable(struct zone *zone, int order,
>   	 * even if compaction succeeds.
>   	 * For costly orders, we require low watermark instead of min for
>   	 * compaction to proceed to increase its chances.
> -	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
> -	 * suitable migration targets
> +	 * In addition to long term GUP flow, ALLOC_CMA is used, as pages in
> +	 * CMA pageblocks are considered suitable migration targets
>   	 */
>   	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
>   				low_wmark_pages(zone) : min_wmark_pages(zone);
>   	watermark += compact_gap(order);
> +	pin = !!(current->flags & PF_MEMALLOC_PIN);
>   	return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
> -				   ALLOC_CMA, wmark_target);
> +				   pin ? 0 : ALLOC_CMA, wmark_target);
>   }

This seems a little hacky to me. Using the 'cc->alloc_flags' passed from the
caller to determine whether 'ALLOC_CMA' is needed looks more reasonable to me.

>   
>   /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dde19db..9a5dfda 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2813,6 +2813,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>   {
>   	struct zone *zone = page_zone(page);
>   	int mt = get_pageblock_migratetype(page);
> +	bool pin;
>   
>   	if (!is_migrate_isolate(mt)) {
>   		unsigned long watermark;
> @@ -2823,7 +2824,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
>   		 * exists.
>   		 */
>   		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
> -		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
> +		pin = !!(current->flags & PF_MEMALLOC_PIN);
> +		if (!zone_watermark_ok(zone, 0, watermark, 0, pin ? 0 : ALLOC_CMA))
>   			return 0;
>   	}
>
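
The suggestion in this reply, deciding from the caller's alloc_flags rather
than from the current task's flags, can be sketched in userspace as follows.
Everything here is a hypothetical model: model_compact_control stands in
for the compaction control structure referred to as 'cc', and
model_alloc_flags() encodes the assumption that an allocation made during a
long-term pin is never given the CMA flag. It is not the change that was
eventually posted.

```c
#include <stdbool.h>

/* Hypothetical flag value; the kernel's real ALLOC_CMA differs. */
#define MODEL_ALLOC_CMA 0x1u

struct model_zone {
	unsigned long free_pages;     /* all free pages, CMA included */
	unsigned long free_cma_pages; /* free pages inside CMA pageblocks */
};

/* Hypothetical slice of the compaction control block ("cc"). */
struct model_compact_control {
	unsigned int alloc_flags; /* flags of the allocation that triggered compaction */
};

/*
 * Assumption: the allocator decides once, at allocation entry, whether the
 * request may use CMA. Movable requests may, unless they are long-term pins.
 */
static unsigned int model_alloc_flags(bool movable, bool longterm_pin)
{
	return (movable && !longterm_pin) ? MODEL_ALLOC_CMA : 0;
}

/* The suitability check only looks at what the caller passed in. */
static bool model_suitable(const struct model_zone *z, unsigned long mark,
			   const struct model_compact_control *cc)
{
	unsigned long usable = z->free_pages;

	if (!(cc->alloc_flags & MODEL_ALLOC_CMA))
		usable -= z->free_cma_pages;

	return usable > mark;
}
```

The design point is that the check has no hidden dependency on task state:
the same zone gives different answers only because the caller passed
different flags.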

Ge Yang Dec. 13, 2024, 8:41 a.m. UTC | #3

On 2024/12/13 15:56, Barry Song wrote:
> On Fri, Dec 13, 2024 at 3:37 PM <yangge1116@126.com> wrote:
>>
>> From: yangge <yangge1116@126.com>
>>
>> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
>> in __compaction_suitable()") allowed compaction to proceed when the
>> free pages required for compaction reside in CMA pageblocks, it is
>> possible for __compaction_suitable() to always return true, which is
>> not acceptable in some cases.
>>
>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
>> of memory. I have configured 16GB of CMA memory on each NUMA node,
>> and starting a 32GB virtual machine with device passthrough is
>> extremely slow, taking almost an hour.
> 
> I don't fully understand why each node has a 16GB CMA. As I recall, I designed
> the per-NUMA CMA to support devices that are not behind an IOMMU, such as
> the IOMMU itself or certain devices that have no IOMMU and
> need contiguous memory for DMA. These devices don't seem to require that
> much memory.
Our hardware supports setting specific protection on contiguous memory
blocks, but the protection granularity is relatively large, exceeding
4MB, which makes it unsuitable for allocation from the buddy allocator.
Therefore, during system startup, a certain percentage of memory on each
node is reserved as CMA memory, allowing large contiguous memory blocks
to be allocated through cma_alloc.
> 
>>
>> During the start-up of the virtual machine, it calls
>> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
>> Long term GUP cannot allocate memory from the CMA area, so at most
>> 16GB of non-CMA memory on a NUMA node can be used as virtual
>> machine memory. Since there is 16GB of free CMA memory on the NUMA
>> node, the order-0 watermark is always met for compaction, so
>> __compaction_suitable() always returns true, even if the node is
>> unable to allocate non-CMA memory for the virtual machine.
>>
>> For costly allocations, because __compaction_suitable() always
>> returns true, __alloc_pages_slowpath() can't exit at the appropriate
>> place, resulting in excessively long virtual machine startup times.
>> Call trace:
>> __alloc_pages_slowpath
>>      if (compact_result == COMPACT_SKIPPED ||
>>          compact_result == COMPACT_DEFERRED)
>>          goto nopage; // should exit __alloc_pages_slowpath() from here
>>
>> To sum up, during long term GUP flow, we should remove ALLOC_CMA
>> both in __compaction_suitable() and __isolate_free_page().
> 
> What’s the outcome after your fix? Will it quickly fall back to remote
> NUMA nodes for the pin?
After the fix, starting a 32GB virtual machine with device passthrough
takes only a few seconds.
Yes, it will quickly fall back to remote NUMA nodes.
> 
>>
>> Fixes: 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: yangge <yangge1116@126.com>
>> ---
>>   mm/compaction.c | 8 +++++---
>>   mm/page_alloc.c | 4 +++-
>>   2 files changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 07bd227..044c2247 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2384,6 +2384,7 @@ static bool __compaction_suitable(struct zone *zone, int order,
>>                                    unsigned long wmark_target)
>>   {
>>          unsigned long watermark;
>> +       bool pin;
>>          /*
>>           * Watermarks for order-0 must be met for compaction to be able to
>>           * isolate free pages for migration targets. This means that the
>> @@ -2395,14 +2396,15 @@ static bool __compaction_suitable(struct zone *zone, int order,
>>           * even if compaction succeeds.
>>           * For costly orders, we require low watermark instead of min for
>>           * compaction to proceed to increase its chances.
>> -        * ALLOC_CMA is used, as pages in CMA pageblocks are considered
>> -        * suitable migration targets
>> +        * In addition to long term GUP flow, ALLOC_CMA is used, as pages in
>> +        * CMA pageblocks are considered suitable migration targets
>>           */
>>          watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
>>                                  low_wmark_pages(zone) : min_wmark_pages(zone);
>>          watermark += compact_gap(order);
>> +       pin = !!(current->flags & PF_MEMALLOC_PIN);
>>          return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
>> -                                  ALLOC_CMA, wmark_target);
>> +                                  pin ? 0 : ALLOC_CMA, wmark_target);
>>   }
>>
>>   /*
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index dde19db..9a5dfda 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2813,6 +2813,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>>   {
>>          struct zone *zone = page_zone(page);
>>          int mt = get_pageblock_migratetype(page);
>> +       bool pin;
>>
>>          if (!is_migrate_isolate(mt)) {
>>                  unsigned long watermark;
>> @@ -2823,7 +2824,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
>>                   * exists.
>>                   */
>>                  watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
>> -               if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
>> +               pin = !!(current->flags & PF_MEMALLOC_PIN);
>> +               if (!zone_watermark_ok(zone, 0, watermark, 0, pin ? 0 : ALLOC_CMA))
>>                          return 0;
>>          }
>>
>> --
>> 2.7.4
>>
> Thanks
> Barry
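
The quick fallback to remote NUMA nodes described in this reply can be
pictured with a toy model of the early exit on COMPACT_SKIPPED. This is a
sketch, not the kernel's control flow: model_node, model_compaction_check()
and model_pick_node() are hypothetical names, and the model assumes that a
node short of non-CMA memory gains nothing from compaction, so any
compaction started there counts as futile work.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_compact_result {
	MODEL_COMPACT_SKIPPED,  /* watermark check failed, compaction not attempted */
	MODEL_COMPACT_CONTINUE, /* watermark check passed, compaction would run */
};

/* Hypothetical per-node free-page counters. */
struct model_node {
	unsigned long free_noncma; /* free pages outside CMA pageblocks */
	unsigned long free_cma;    /* free pages inside CMA pageblocks */
};

static enum model_compact_result
model_compaction_check(const struct model_node *n, unsigned long mark,
		       bool count_cma)
{
	unsigned long usable = n->free_noncma + (count_cma ? n->free_cma : 0);

	return usable > mark ? MODEL_COMPACT_CONTINUE : MODEL_COMPACT_SKIPPED;
}

/*
 * Walk the nodes in order for a long-term pin that needs `need` non-CMA
 * pages. Returns the index of the first node that can satisfy the request,
 * or -1 if none can. *futile counts the nodes on which compaction was
 * started even though the node could not satisfy the request.
 */
static int model_pick_node(const struct model_node *nodes, size_t count,
			   unsigned long need, unsigned long mark,
			   bool count_cma, unsigned int *futile)
{
	*futile = 0;
	for (size_t i = 0; i < count; i++) {
		if (nodes[i].free_noncma >= need)
			return (int)i;
		if (model_compaction_check(&nodes[i], mark, count_cma) ==
		    MODEL_COMPACT_CONTINUE)
			(*futile)++; /* slow path: compaction runs without helping */
		/* MODEL_COMPACT_SKIPPED: leave this node immediately */
	}
	return -1;
}
```

When free CMA pages are counted, the exhausted local node still passes the
check and compaction is started there before the next node is tried. When
they are not counted, the check is skipped at once and the walk moves
straight to the remote node.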

Ge Yang Dec. 13, 2024, 8:43 a.m. UTC | #4

On 2024/12/13 16:23, Baolin Wang wrote:
> 
> 
> On 2024/12/13 15:37, yangge1116@126.com wrote:
>> From: yangge <yangge1116@126.com>
>>
>> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
>> in __compaction_suitable()") allowed compaction to proceed when the
>> free pages required for compaction reside in CMA pageblocks, it is
>> possible for __compaction_suitable() to always return true, which is
>> not acceptable in some cases.
>>
>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
>> of memory. I have configured 16GB of CMA memory on each NUMA node,
>> and starting a 32GB virtual machine with device passthrough is
>> extremely slow, taking almost an hour.
>>
>> During the start-up of the virtual machine, it calls
>> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
>> Long term GUP cannot allocate memory from the CMA area, so at most
>> 16GB of non-CMA memory on a NUMA node can be used as virtual
>> machine memory. Since there is 16GB of free CMA memory on the NUMA
>> node, the order-0 watermark is always met for compaction, so
>> __compaction_suitable() always returns true, even if the node is
>> unable to allocate non-CMA memory for the virtual machine.
>>
>> For costly allocations, because __compaction_suitable() always
>> returns true, __alloc_pages_slowpath() can't exit at the appropriate
>> place, resulting in excessively long virtual machine startup times.
>> Call trace:
>> __alloc_pages_slowpath
>>      if (compact_result == COMPACT_SKIPPED ||
>>          compact_result == COMPACT_DEFERRED)
>>          goto nopage; // should exit __alloc_pages_slowpath() from here
>>
>> To sum up, during long term GUP flow, we should remove ALLOC_CMA
>> both in __compaction_suitable() and __isolate_free_page().
>>
>> Fixes: 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: yangge <yangge1116@126.com>
>> ---
>>   mm/compaction.c | 8 +++++---
>>   mm/page_alloc.c | 4 +++-
>>   2 files changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 07bd227..044c2247 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2384,6 +2384,7 @@ static bool __compaction_suitable(struct zone *zone, int order,
>>                     unsigned long wmark_target)
>>   {
>>       unsigned long watermark;
>> +    bool pin;
>>       /*
>>        * Watermarks for order-0 must be met for compaction to be able to
>>        * isolate free pages for migration targets. This means that the
>> @@ -2395,14 +2396,15 @@ static bool __compaction_suitable(struct zone *zone, int order,
>>        * even if compaction succeeds.
>>        * For costly orders, we require low watermark instead of min for
>>        * compaction to proceed to increase its chances.
>> -     * ALLOC_CMA is used, as pages in CMA pageblocks are considered
>> -     * suitable migration targets
>> +     * In addition to long term GUP flow, ALLOC_CMA is used, as pages in
>> +     * CMA pageblocks are considered suitable migration targets
>>        */
>>       watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
>>                   low_wmark_pages(zone) : min_wmark_pages(zone);
>>       watermark += compact_gap(order);
>> +    pin = !!(current->flags & PF_MEMALLOC_PIN);
>>       return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
>> -                   ALLOC_CMA, wmark_target);
>> +                   pin ? 0 : ALLOC_CMA, wmark_target);
>>   }
> 
> This seems a little hacky to me. Using the 'cc->alloc_flags' passed from the
> caller to determine whether 'ALLOC_CMA' is needed looks more reasonable to me.

Ok, thanks.

> 
>>   /*
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index dde19db..9a5dfda 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2813,6 +2813,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>>   {
>>       struct zone *zone = page_zone(page);
>>       int mt = get_pageblock_migratetype(page);
>> +    bool pin;
>>       if (!is_migrate_isolate(mt)) {
>>           unsigned long watermark;
>> @@ -2823,7 +2824,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
>>            * exists.
>>            */
>>           watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
>> -        if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
>> +        pin = !!(current->flags & PF_MEMALLOC_PIN);
>> +        if (!zone_watermark_ok(zone, 0, watermark, 0, pin ? 0 : ALLOC_CMA))
>>               return 0;
>>       }

diff --git a/mm/compaction.c b/mm/compaction.c
index 07bd227..044c2247 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2384,6 +2384,7 @@  static bool __compaction_suitable(struct zone *zone, int order,
 				  unsigned long wmark_target)
 {
 	unsigned long watermark;
+	bool pin;
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
@@ -2395,14 +2396,15 @@  static bool __compaction_suitable(struct zone *zone, int order,
 	 * even if compaction succeeds.
 	 * For costly orders, we require low watermark instead of min for
 	 * compaction to proceed to increase its chances.
-	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
-	 * suitable migration targets
+	 * Except in the long term GUP flow, ALLOC_CMA is used, as pages in
+	 * CMA pageblocks are considered suitable migration targets
 	 */
 	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
 				low_wmark_pages(zone) : min_wmark_pages(zone);
 	watermark += compact_gap(order);
+	pin = !!(current->flags & PF_MEMALLOC_PIN);
 	return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
-				   ALLOC_CMA, wmark_target);
+				   pin ? 0 : ALLOC_CMA, wmark_target);
 }
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dde19db..9a5dfda 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2813,6 +2813,7 @@  int __isolate_free_page(struct page *page, unsigned int order)
 {
 	struct zone *zone = page_zone(page);
 	int mt = get_pageblock_migratetype(page);
+	bool pin;
 
 	if (!is_migrate_isolate(mt)) {
 		unsigned long watermark;
@@ -2823,7 +2824,8 @@  int __isolate_free_page(struct page *page, unsigned int order)
 		 * exists.
 		 */
 		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
-		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
+		pin = !!(current->flags & PF_MEMALLOC_PIN);
+		if (!zone_watermark_ok(zone, 0, watermark, 0, pin ? 0 : ALLOC_CMA))
 			return 0;
 	}
