
mm,cma: remove pfn_range_valid_contig

Message ID 20200306170647.455a2db3@imladris.surriel.com (mailing list archive)
State New, archived
Series mm,cma: remove pfn_range_valid_contig

Commit Message

Rik van Riel March 6, 2020, 10:06 p.m. UTC
The function pfn_range_valid_contig checks whether all memory in the
target area is free. This causes unnecessary CMA failures, since
alloc_contig_range will migrate movable memory out of a target range,
and has its own sanity check early on in has_unmovable_pages, which
is called from start_isolate_page_range & set_migratetype_isolate.

Relying on that has_unmovable_pages call simplifies the CMA code and
results in an increased success rate of CMA allocations.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 mm/page_alloc.c | 47 +++--------------------------------------------
 1 file changed, 3 insertions(+), 44 deletions(-)
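
For context, a simplified sketch of that call path (not
verbatim kernel code; arguments and error handling trimmed),
showing why the isolation step already covers the pre-scan:

/*
 * Simplified sketch of the path the commit message relies on:
 * alloc_contig_range() isolates the target range before doing any
 * work, and the isolation step already bails out on ranges that
 * contain unmovable pages.
 */
int alloc_contig_range(unsigned long start, unsigned long end,
		       unsigned migratetype, gfp_t gfp_mask)
{
	/*
	 * start_isolate_page_range()
	 *   -> set_migratetype_isolate()
	 *        -> has_unmovable_pages()   <- early sanity check
	 */
	if (start_isolate_page_range(start, end, migratetype, 0))
		return -EBUSY;

	/* ... migrate movable pages out of [start, end) ... */

	undo_isolate_page_range(start, end, migratetype);
	return 0;
}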

Comments

Rik van Riel March 6, 2020, 10:22 p.m. UTC | #1
On Fri, 2020-03-06 at 17:06 -0500, Rik van Riel wrote:
> The function pfn_range_valid_contig checks whether all memory in the
> target area is free. This causes unnecessary CMA failures, since
> alloc_contig_range will migrate movable memory out of a target range,
> and has its own sanity check early on in has_unmovable_pages, which
> is called from start_isolate_page_range & set_migratetype_isolate.
> 
> Relying on that has_unmovable_pages call simplifies the CMA code and
> results in an increased success rate of CMA allocations.

Thinking about this some more, could we get away with
entirely removing alloc_contig_pages, and simply having
the hugetlb code use cma_alloc?

That might be simpler still.

It also seems like this could make the success rate of
1GB hugepage allocation much more predictable, because
the kernel will place only movable memory allocations
inside the CMA area, and we would never try to allocate
a 1GB huge page from a 1GB memory area with unmovable
pages.

It would be possible for the code in hugetlb_init() to
invoke the CMA setup code as needed, setting aside an
appropriate amount of memory that is used only for movable
allocations (and, later, for CMA 1GB hugepages).

Then again, if the allocation reliability ends up
being e.g. 90% instead of 100%, we may need some way
to make the memory pool for CMA hugepage allocation a
little larger than the desired number of hugepages,
rather than sizing it automatically with nothing to
spare.

An explicit hugepage_cma=nG option would cover that.
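
A rough sketch of what that could look like, purely to
illustrate the idea: hugepage_cma and both helpers below
are made-up names, while cma_declare_contiguous(),
cma_alloc() and cma_release() are the existing CMA API.

/*
 * Hypothetical sketch only: the hugepage_cma variable and the two
 * helpers are invented for illustration; only the cma_*() calls
 * are existing kernel API.
 */
static struct cma *hugepage_cma;

static void __init hugepage_cma_reserve(unsigned long nr_gigantic_pages)
{
	/* Size the area to the requested number of 1GB pages; a
	 * hugepage_cma=nG option could oversize it instead. */
	phys_addr_t size = (phys_addr_t)nr_gigantic_pages << PUD_SHIFT;

	cma_declare_contiguous(0, size, 0, PUD_SIZE, 0, false,
			       "hugepage", &hugepage_cma);
}

static struct page *alloc_gigantic_page_cma(void)
{
	/* 1GB worth of base pages, aligned to the gigantic page order. */
	return cma_alloc(hugepage_cma, 1UL << (PUD_SHIFT - PAGE_SHIFT),
			 PUD_SHIFT - PAGE_SHIFT, true);
}

static void free_gigantic_page_cma(struct page *page)
{
	cma_release(hugepage_cma, page, 1UL << (PUD_SHIFT - PAGE_SHIFT));
}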

Thoughts?
Vlastimil Babka March 11, 2020, 8:40 a.m. UTC | #2
On 3/6/20 11:06 PM, Rik van Riel wrote:
> The function pfn_range_valid_contig checks whether all memory in the
> target area is free. This causes unnecessary CMA failures, since
> alloc_contig_range will migrate movable memory out of a target range,
> and has its own sanity check early on in has_unmovable_pages, which
> is called from start_isolate_page_range & set_migratetype_isolate.
> 
> Relying on that has_unmovable_pages call simplifies the CMA code and
> results in an increased success rate of CMA allocations.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>

Yeah, the page_count and PageHuge checks are harmful. Not sure about
PageReserved. And is anything later in alloc_contig_range() making sure
that we always stay within the same zone?
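
If not, one cheap guard the alloc_contig_pages() scan loop
could keep (hypothetical and untested; it only checks the
endpoints, so it is weaker than the removed per-page scan)
might look like:

/* Hypothetical helper, not part of the patch: skip candidate ranges
 * whose first or last pfn is offline or sits in a different zone. */
static bool pfn_range_endpoints_in_zone(struct zone *zone,
					unsigned long start_pfn,
					unsigned long nr_pages)
{
	struct page *first = pfn_to_online_page(start_pfn);
	struct page *last = pfn_to_online_page(start_pfn + nr_pages - 1);

	return first && last &&
	       page_zone(first) == zone && page_zone(last) == zone;
}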

> ---
>  mm/page_alloc.c | 47 +++--------------------------------------------
>  1 file changed, 3 insertions(+), 44 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0fb3c1719625..75e84907d8c6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8539,32 +8539,6 @@ static int __alloc_contig_pages(unsigned long start_pfn,
>  				  gfp_mask);
>  }
>  
> -static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> -				   unsigned long nr_pages)
> -{
> -	unsigned long i, end_pfn = start_pfn + nr_pages;
> -	struct page *page;
> -
> -	for (i = start_pfn; i < end_pfn; i++) {
> -		page = pfn_to_online_page(i);
> -		if (!page)
> -			return false;
> -
> -		if (page_zone(page) != z)
> -			return false;
> -
> -		if (PageReserved(page))
> -			return false;
> -
> -		if (page_count(page) > 0)
> -			return false;
> -
> -		if (PageHuge(page))
> -			return false;
> -	}
> -	return true;
> -}
> -
>  static bool zone_spans_last_pfn(const struct zone *zone,
>  				unsigned long start_pfn, unsigned long nr_pages)
>  {
> @@ -8605,28 +8579,13 @@ struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
>  	zonelist = node_zonelist(nid, gfp_mask);
>  	for_each_zone_zonelist_nodemask(zone, z, zonelist,
>  					gfp_zone(gfp_mask), nodemask) {
> -		spin_lock_irqsave(&zone->lock, flags);
> -
>  		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
>  		while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
> -			if (pfn_range_valid_contig(zone, pfn, nr_pages)) {
> -				/*
> -				 * We release the zone lock here because
> -				 * alloc_contig_range() will also lock the zone
> -				 * at some point. If there's an allocation
> -				 * spinning on this lock, it may win the race
> -				 * and cause alloc_contig_range() to fail...
> -				 */
> -				spin_unlock_irqrestore(&zone->lock, flags);
> -				ret = __alloc_contig_pages(pfn, nr_pages,
> -							gfp_mask);
> -				if (!ret)
> -					return pfn_to_page(pfn);
> -				spin_lock_irqsave(&zone->lock, flags);
> -			}
> +			ret = __alloc_contig_pages(pfn, nr_pages, gfp_mask);
> +			if (!ret)
> +				return pfn_to_page(pfn);
>  			pfn += nr_pages;
>  		}
> -		spin_unlock_irqrestore(&zone->lock, flags);
>  	}
>  	return NULL;
>  }
>

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0fb3c1719625..75e84907d8c6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8539,32 +8539,6 @@  static int __alloc_contig_pages(unsigned long start_pfn,
 				  gfp_mask);
 }
 
-static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
-				   unsigned long nr_pages)
-{
-	unsigned long i, end_pfn = start_pfn + nr_pages;
-	struct page *page;
-
-	for (i = start_pfn; i < end_pfn; i++) {
-		page = pfn_to_online_page(i);
-		if (!page)
-			return false;
-
-		if (page_zone(page) != z)
-			return false;
-
-		if (PageReserved(page))
-			return false;
-
-		if (page_count(page) > 0)
-			return false;
-
-		if (PageHuge(page))
-			return false;
-	}
-	return true;
-}
-
 static bool zone_spans_last_pfn(const struct zone *zone,
 				unsigned long start_pfn, unsigned long nr_pages)
 {
@@ -8605,28 +8579,13 @@  struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
 	zonelist = node_zonelist(nid, gfp_mask);
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					gfp_zone(gfp_mask), nodemask) {
-		spin_lock_irqsave(&zone->lock, flags);
-
 		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
 		while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
-			if (pfn_range_valid_contig(zone, pfn, nr_pages)) {
-				/*
-				 * We release the zone lock here because
-				 * alloc_contig_range() will also lock the zone
-				 * at some point. If there's an allocation
-				 * spinning on this lock, it may win the race
-				 * and cause alloc_contig_range() to fail...
-				 */
-				spin_unlock_irqrestore(&zone->lock, flags);
-				ret = __alloc_contig_pages(pfn, nr_pages,
-							gfp_mask);
-				if (!ret)
-					return pfn_to_page(pfn);
-				spin_lock_irqsave(&zone->lock, flags);
-			}
+			ret = __alloc_contig_pages(pfn, nr_pages, gfp_mask);
+			if (!ret)
+				return pfn_to_page(pfn);
 			pfn += nr_pages;
 		}
-		spin_unlock_irqrestore(&zone->lock, flags);
 	}
 	return NULL;
 }