[09/10] mm: page_isolation: prepare for hygienic freelists

Message ID

20240320180429.678181-10-hannes@cmpxchg.org (mailing list archive)

State

New

Headers

Return-Path: <owner-linux-mm@kvack.org>
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Zi Yan <ziy@nvidia.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 09/10] mm: page_isolation: prepare for hygienic freelists
Date: Wed, 20 Mar 2024 14:02:14 -0400
Message-ID: <20240320180429.678181-10-hannes@cmpxchg.org>
In-Reply-To: <20240320180429.678181-1-hannes@cmpxchg.org>
References: <20240320180429.678181-1-hannes@cmpxchg.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

mm: page_alloc: freelist migratetype hygiene
[V4,00/10] mm: page_alloc: freelist migratetype hygiene
[01/10] mm: page_alloc: remove pcppage migratetype caching
[02/10] mm: page_alloc: optimize free_unref_folios()
[03/10] mm: page_alloc: fix up block types when merging compatible blocks
[04/10] mm: page_alloc: move free pages when converting block during isolation
[05/10] mm: page_alloc: fix move_freepages_block() range error
[06/10] mm: page_alloc: fix freelist movement during block conversion
[07/10] mm: page_alloc: close migratetype race between freeing and stealing
[08/10] mm: page_alloc: set migratetype inside move_freepages()
[09/10] mm: page_isolation: prepare for hygienic freelists
[10/10] mm: page_alloc: consolidate free page accounting

Commit Message

Johannes Weiner March 20, 2024, 6:02 p.m. UTC

Page isolation currently sets MIGRATE_ISOLATE on a block, then drops
zone->lock and scans the block for straddling buddies to split
up. Because this happens non-atomically wrt the page allocator, it's
possible for allocations to get a buddy whose first block is a regular
pcp migratetype but whose tail is isolated. This means that in certain
cases memory can still be allocated after isolation. It will also
trigger the freelist type hygiene warnings in subsequent patches.

start_isolate_page_range()
  isolate_single_pageblock()
    set_migratetype_isolate(tail)
      lock zone->lock
      move_freepages_block(tail) // nop
      set_pageblock_migratetype(tail)
      unlock zone->lock
                                                     __rmqueue_smallest()
                                                       del_page_from_freelist(head)
                                                       expand(head, head_mt)
                                                         WARN(head_mt != tail_mt)
    start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES)
    for (pfn = start_pfn, pfn < end_pfn)
      if (PageBuddy())
        split_free_page(head)

Introduce a variant of move_freepages_block() provided by the
allocator specifically for page isolation; it moves free pages,
converts the block, and handles the splitting of straddling buddies
while holding zone->lock.
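To make the race concrete, here is a minimal user-space sketch of the invariant this series calls freelist hygiene: a free buddy's freelist type should match every pageblock it covers. It is a toy model, not kernel code; the names `block_mt`, `struct buddy` and `isolate_block_and_split()`, and the numeric migratetype values, are hypothetical. It shows that converting one block inside a multi-block buddy breaks the invariant for the unsplit buddy, while splitting in the same step (which the patch does under a single zone->lock hold) yields pieces that each match their own block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values for illustration; the kernel's enum differs by config. */
#define MIGRATE_MOVABLE 1
#define MIGRATE_ISOLATE 4
#define NR_BLOCKS 4

/* Migratetype of each pageblock in a toy zone. */
static int block_mt[NR_BLOCKS];

/* A free buddy covering pageblocks [first, first + nr), on freelist type mt. */
struct buddy {
	int first;
	int nr;
	int mt;
};

/* Hygiene: the freelist type matches every pageblock the buddy covers. */
static bool buddy_is_hygienic(const struct buddy *b)
{
	for (int i = b->first; i < b->first + b->nr; i++)
		if (block_mt[i] != b->mt)
			return false;
	return true;
}

/*
 * Convert one block of a multi-block buddy to MIGRATE_ISOLATE and, in the
 * same step, split the buddy into per-block pieces that take on their own
 * block's type. In the kernel this whole step would run under zone->lock.
 * Returns the number of pieces written to out[].
 */
static int isolate_block_and_split(const struct buddy *b, int block,
				   struct buddy out[])
{
	assert(block >= b->first && block < b->first + b->nr);
	block_mt[block] = MIGRATE_ISOLATE;
	for (int i = 0; i < b->nr; i++) {
		int blk = b->first + i;

		out[i] = (struct buddy){ .first = blk, .nr = 1, .mt = block_mt[blk] };
	}
	return b->nr;
}
```

After isolating the tail block, the stale multi-block buddy fails `buddy_is_hygienic()`: that is the state an allocator could observe in the window shown in the diagram, when the split only happened after zone->lock was dropped.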

The allocator knows that pageblocks and buddies are always naturally
aligned, which means that buddies can only straddle blocks if they're
actually >pageblock_order. This means the search-and-split part can be
simplified compared to what page isolation used to do.
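The alignment argument can be checked by brute force. The sketch below is an illustration with assumed constants (PAGEBLOCK_ORDER 9 and MAX_PAGE_ORDER 10; the real values depend on architecture and config), not kernel code. It enumerates every naturally aligned buddy of every order and confirms that a buddy crosses a pageblock boundary exactly when its order exceeds the pageblock order.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed orders for illustration; the kernel's are config-dependent. */
#define PAGEBLOCK_ORDER 9
#define MAX_PAGE_ORDER 10

/* Does the pfn range [pfn, pfn + 2^order) cross a pageblock boundary? */
static bool straddles_pageblock(unsigned long pfn, unsigned int order)
{
	unsigned long first_block = pfn >> PAGEBLOCK_ORDER;
	unsigned long last_block = (pfn + (1UL << order) - 1) >> PAGEBLOCK_ORDER;

	return first_block != last_block;
}

/*
 * Check every naturally aligned buddy of every order within the first
 * nr_pfns pfns: it straddles if and only if order > PAGEBLOCK_ORDER.
 */
static bool only_large_buddies_straddle(unsigned long nr_pfns)
{
	for (unsigned int order = 0; order <= MAX_PAGE_ORDER; order++) {
		/* Natural alignment: an order-N buddy starts at a multiple of 2^N. */
		for (unsigned long pfn = 0; pfn < nr_pfns; pfn += 1UL << order) {
			if (straddles_pageblock(pfn, order) != (order > PAGEBLOCK_ORDER))
				return false;
		}
	}
	return true;
}
```

The same helper reports a crossing for a misaligned range such as `straddles_pageblock(511, 1)`, which is why natural alignment is what allows the simplified search.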

Also tighten up the page isolation code around the expectations of
which pages can be large, and how they are freed.

Based on extensive discussions with and invaluable input from Zi Yan.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/page-isolation.h |   4 +-
 mm/internal.h                  |   4 -
 mm/page_alloc.c                | 200 +++++++++++++++++++--------------
 mm/page_isolation.c            | 106 ++++++-----------
 4 files changed, 151 insertions(+), 163 deletions(-)

Comments

kernel test robot March 21, 2024, 1:13 p.m. UTC | #1

Hi Johannes,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Johannes-Weiner/mm-page_alloc-remove-pcppage-migratetype-caching/20240321-020814
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240320180429.678181-10-hannes%40cmpxchg.org
patch subject: [PATCH 09/10] mm: page_isolation: prepare for hygienic freelists
config: i386-randconfig-003-20240321 (https://download.01.org/0day-ci/archive/20240321/202403212118.ye7lcKjD-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240321/202403212118.ye7lcKjD-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403212118.ye7lcKjD-lkp@intel.com/

All warnings (new ones prefixed by >>):

   mm/page_alloc.c: In function 'move_freepages_block_isolate':
>> mm/page_alloc.c:688:17: warning: array subscript 11 is above array bounds of 'struct free_area[11]' [-Warray-bounds]
     688 |  zone->free_area[order].nr_free--;
         |  ~~~~~~~~~~~~~~~^~~~~~~
>> mm/page_alloc.c:688:17: warning: array subscript 11 is above array bounds of 'struct free_area[11]' [-Warray-bounds]


vim +688 mm/page_alloc.c

6ab0136310961eb Alexander Duyck 2020-04-06  677  
6ab0136310961eb Alexander Duyck 2020-04-06  678  static inline void del_page_from_free_list(struct page *page, struct zone *zone,
6ab0136310961eb Alexander Duyck 2020-04-06  679  					   unsigned int order)
6ab0136310961eb Alexander Duyck 2020-04-06  680  {
36e66c554b5c6a9 Alexander Duyck 2020-04-06  681  	/* clear reported state and update reported page count */
36e66c554b5c6a9 Alexander Duyck 2020-04-06  682  	if (page_reported(page))
36e66c554b5c6a9 Alexander Duyck 2020-04-06  683  		__ClearPageReported(page);
36e66c554b5c6a9 Alexander Duyck 2020-04-06  684  
bf75f200569dd05 Mel Gorman      2022-06-24  685  	list_del(&page->buddy_list);
6ab0136310961eb Alexander Duyck 2020-04-06  686  	__ClearPageBuddy(page);
6ab0136310961eb Alexander Duyck 2020-04-06  687  	set_page_private(page, 0);
6ab0136310961eb Alexander Duyck 2020-04-06 @688  	zone->free_area[order].nr_free--;
6ab0136310961eb Alexander Duyck 2020-04-06  689  }
6ab0136310961eb Alexander Duyck 2020-04-06  690

Johannes Weiner March 21, 2024, 2:24 p.m. UTC | #2

On Thu, Mar 21, 2024 at 09:13:57PM +0800, kernel test robot wrote:
>    mm/page_alloc.c: In function 'move_freepages_block_isolate':
> >> mm/page_alloc.c:688:17: warning: array subscript 11 is above array bounds of 'struct free_area[11]' [-Warray-bounds]
>      688 |  zone->free_area[order].nr_free--;
>          |  ~~~~~~~~~~~~~~~^~~~~~~
> >> mm/page_alloc.c:688:17: warning: array subscript 11 is above array bounds of 'struct free_area[11]' [-Warray-bounds]

I think this is a bug in the old gcc.

We have this in move_freepages_block_isolate():

	/* We're the starting block of a larger buddy */
	if (PageBuddy(page) && buddy_order(page) > pageblock_order) {
		int mt = get_pfnblock_migratetype(page, pfn);
		int order = buddy_order(page);

		if (!is_migrate_isolate(mt))
			__mod_zone_freepage_state(zone, -(1UL << order), mt);
		del_page_from_free_list(page, zone, order);

And this config doesn't have hugetlb enabled, so:

/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
#define pageblock_order         MAX_PAGE_ORDER

If buddies were indeed >MAX_PAGE_ORDER, this would be an out-of-bounds
access when the delete updates the freelist count. Of course, buddies by
definition cannot be larger than MAX_PAGE_ORDER. But the older gcc
doesn't seem to realize that this branch is dead in this configuration.

Maybe we can help it out and make the impossible scenario a bit more
explicit? Does this fixlet silence the warning?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index efb2581ac142..4cdc356e73f6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1698,6 +1698,10 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page,
 				       NULL, NULL))
 		return false;
 
+	/* No splits needed if buddies can't span multiple blocks */
+	if (pageblock_order == MAX_PAGE_ORDER)
+		goto move;
+
 	/* We're a tail block in a larger buddy */
 	pfn = find_large_buddy(start_pfn);
 	if (pfn != start_pfn) {
@@ -1725,7 +1729,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page,
 		split_large_buddy(zone, page, pfn, order);
 		return true;
 	}
-
+move:
 	mt = get_pfnblock_migratetype(page, start_pfn);
 	nr_moved = move_freepages(zone, start_pfn, end_pfn, migratetype);
 	if (!is_migrate_isolate(mt))

Zi Yan, does this look sane to you as well?
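The fixlet's approach of spelling out the impossible case can be seen in a reduced toy. This is a sketch, not the patch: the constants mirror the !CONFIG_HUGETLB_PAGE configuration quoted above (pageblock_order defined as MAX_PAGE_ORDER, and an 11-entry free_area as in the warning), while `isolate_block()` is a hypothetical stand-in. It does not claim to reproduce the gcc-9 warning; it only shows how the early `goto move` makes the out-of-range branch unreachable by construction when the two orders are equal.

```c
#include <assert.h>

/* Mirrors the !CONFIG_HUGETLB_PAGE case: pageblocks are as large as the largest buddy. */
#define MAX_PAGE_ORDER 10
#define pageblock_order MAX_PAGE_ORDER

struct free_area {
	unsigned long nr_free;
};

/* One entry per order 0..MAX_PAGE_ORDER, i.e. free_area[11] as in the warning. */
static struct free_area free_area[MAX_PAGE_ORDER + 1];

/*
 * Hypothetical stand-in for the relevant part of move_freepages_block_isolate().
 * Returns 1 if the "larger buddy" path ran, 0 for the plain move.
 */
static int isolate_block(unsigned int order)
{
	assert(order <= MAX_PAGE_ORDER);	/* buddies never exceed this */

	/* No splits needed if buddies can't span multiple blocks */
	if (pageblock_order == MAX_PAGE_ORDER)
		goto move;

	if (order > pageblock_order) {
		/* Only reachable when pageblock_order < MAX_PAGE_ORDER */
		free_area[order].nr_free--;
		return 1;
	}
move:
	return 0;
}
```

With the guard in place, every valid order takes the plain move path and free_area is never indexed past its last entry.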

Zi Yan March 21, 2024, 3:03 p.m. UTC | #3

On 21 Mar 2024, at 10:24, Johannes Weiner wrote:

> Zi Yan, does this look sane to you as well?
Yes. This and patch 9 look good to me. Thanks.

For both, Reviewed-by: Zi Yan <ziy@nvidia.com>

I also tested them locally and confirm it is a gcc bug and this fix
works for gcc-9.3:
1. gcc-13.2 does not give any warning for the original patch 9.
2. gcc-9.3 gives the warning for the original patch, but the warning goes
away with this patch applied.

--
Best Regards,
Yan, Zi

Vlastimil Babka March 27, 2024, 8:06 a.m. UTC | #4

On 3/20/24 7:02 PM, Johannes Weiner wrote:
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Nice cleanup of hairy code as well!

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Patch

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 8550b3c91480..c16db0067090 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -34,7 +34,9 @@  static inline bool is_migrate_isolate(int migratetype)
 #define REPORT_FAILURE	0x2
 
 void set_pageblock_migratetype(struct page *page, int migratetype);
-int move_freepages_block(struct zone *zone, struct page *page, int migratetype);
+
+bool move_freepages_block_isolate(struct zone *zone, struct page *page,
+				  int migratetype);
 
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     int migratetype, int flags, gfp_t gfp_flags);
diff --git a/mm/internal.h b/mm/internal.h
index f8b31234c130..d6e6c7d9f04e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -559,10 +559,6 @@  extern void *memmap_alloc(phys_addr_t size, phys_addr_t align,
 void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
 		unsigned long, enum meminit_context, struct vmem_altmap *, int);
 
-
-int split_free_page(struct page *free_page,
-			unsigned int order, unsigned long split_pfn_offset);
-
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d687f27d891f..efb2581ac142 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -832,64 +832,6 @@  static inline void __free_one_page(struct page *page,
 		page_reporting_notify_free(order);
 }
 
-/**
- * split_free_page() -- split a free page at split_pfn_offset
- * @free_page:		the original free page
- * @order:		the order of the page
- * @split_pfn_offset:	split offset within the page
- *
- * Return -ENOENT if the free page is changed, otherwise 0
- *
- * It is used when the free page crosses two pageblocks with different migratetypes
- * at split_pfn_offset within the page. The split free page will be put into
- * separate migratetype lists afterwards. Otherwise, the function achieves
- * nothing.
- */
-int split_free_page(struct page *free_page,
-			unsigned int order, unsigned long split_pfn_offset)
-{
-	struct zone *zone = page_zone(free_page);
-	unsigned long free_page_pfn = page_to_pfn(free_page);
-	unsigned long pfn;
-	unsigned long flags;
-	int free_page_order;
-	int mt;
-	int ret = 0;
-
-	if (split_pfn_offset == 0)
-		return ret;
-
-	spin_lock_irqsave(&zone->lock, flags);
-
-	if (!PageBuddy(free_page) || buddy_order(free_page) != order) {
-		ret = -ENOENT;
-		goto out;
-	}
-
-	mt = get_pfnblock_migratetype(free_page, free_page_pfn);
-	if (likely(!is_migrate_isolate(mt)))
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
-
-	del_page_from_free_list(free_page, zone, order);
-	for (pfn = free_page_pfn;
-	     pfn < free_page_pfn + (1UL << order);) {
-		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
-
-		free_page_order = min_t(unsigned int,
-					pfn ? __ffs(pfn) : order,
-					__fls(split_pfn_offset));
-		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
-				mt, FPI_NONE);
-		pfn += 1UL << free_page_order;
-		split_pfn_offset -= (1UL << free_page_order);
-		/* we have done the first part, now switch to second part */
-		if (split_pfn_offset == 0)
-			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
-	}
-out:
-	spin_unlock_irqrestore(&zone->lock, flags);
-	return ret;
-}
 /*
  * A bad page could be due to a number of fields. Instead of multiple branches,
  * try and check multiple fields with one check. The caller must do a detailed
@@ -1669,8 +1611,8 @@  static bool prep_move_freepages_block(struct zone *zone, struct page *page,
 	return true;
 }
 
-int move_freepages_block(struct zone *zone, struct page *page,
-			 int migratetype)
+static int move_freepages_block(struct zone *zone, struct page *page,
+				int migratetype)
 {
 	unsigned long start_pfn, end_pfn;
 
@@ -1681,6 +1623,119 @@  int move_freepages_block(struct zone *zone, struct page *page,
 	return move_freepages(zone, start_pfn, end_pfn, migratetype);
 }
 
+#ifdef CONFIG_MEMORY_ISOLATION
+/* Look for a buddy that straddles start_pfn */
+static unsigned long find_large_buddy(unsigned long start_pfn)
+{
+	int order = 0;
+	struct page *page;
+	unsigned long pfn = start_pfn;
+
+	while (!PageBuddy(page = pfn_to_page(pfn))) {
+		/* Nothing found */
+		if (++order > MAX_PAGE_ORDER)
+			return start_pfn;
+		pfn &= ~0UL << order;
+	}
+
+	/*
+	 * Found a preceding buddy, but does it straddle?
+	 */
+	if (pfn + (1 << buddy_order(page)) > start_pfn)
+		return pfn;
+
+	/* Nothing found */
+	return start_pfn;
+}
+
+/* Split a multi-block free page into its individual pageblocks */
+static void split_large_buddy(struct zone *zone, struct page *page,
+			      unsigned long pfn, int order)
+{
+	unsigned long end_pfn = pfn + (1 << order);
+
+	VM_WARN_ON_ONCE(order <= pageblock_order);
+	VM_WARN_ON_ONCE(pfn & (pageblock_nr_pages - 1));
+
+	/* Caller removed page from freelist, buddy info cleared! */
+	VM_WARN_ON_ONCE(PageBuddy(page));
+
+	while (pfn != end_pfn) {
+		int mt = get_pfnblock_migratetype(page, pfn);
+
+		__free_one_page(page, pfn, zone, pageblock_order, mt, FPI_NONE);
+		pfn += pageblock_nr_pages;
+		page = pfn_to_page(pfn);
+	}
+}
+
+/**
+ * move_freepages_block_isolate - move free pages in block for page isolation
+ * @zone: the zone
+ * @page: the pageblock page
+ * @migratetype: migratetype to set on the pageblock
+ *
+ * This is similar to move_freepages_block(), but handles the special
+ * case encountered in page isolation, where the block of interest
+ * might be part of a larger buddy spanning multiple pageblocks.
+ *
+ * Unlike the regular page allocator path, which moves pages while
+ * stealing buddies off the freelist, page isolation is interested in
+ * arbitrary pfn ranges that may have overlapping buddies on both ends.
+ *
+ * This function handles that. Straddling buddies are split into
+ * individual pageblocks. Only the block of interest is moved.
+ *
+ * Returns %true if pages could be moved, %false otherwise.
+ */
+bool move_freepages_block_isolate(struct zone *zone, struct page *page,
+				  int migratetype)
+{
+	unsigned long start_pfn, end_pfn, pfn;
+	int nr_moved, mt;
+
+	if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn,
+				       NULL, NULL))
+		return false;
+
+	/* We're a tail block in a larger buddy */
+	pfn = find_large_buddy(start_pfn);
+	if (pfn != start_pfn) {
+		struct page *buddy = pfn_to_page(pfn);
+		int order = buddy_order(buddy);
+		int mt = get_pfnblock_migratetype(buddy, pfn);
+
+		if (!is_migrate_isolate(mt))
+			__mod_zone_freepage_state(zone, -(1UL << order), mt);
+		del_page_from_free_list(buddy, zone, order);
+		set_pageblock_migratetype(page, migratetype);
+		split_large_buddy(zone, buddy, pfn, order);
+		return true;
+	}
+
+	/* We're the starting block of a larger buddy */
+	if (PageBuddy(page) && buddy_order(page) > pageblock_order) {
+		int mt = get_pfnblock_migratetype(page, pfn);
+		int order = buddy_order(page);
+
+		if (!is_migrate_isolate(mt))
+			__mod_zone_freepage_state(zone, -(1UL << order), mt);
+		del_page_from_free_list(page, zone, order);
+		set_pageblock_migratetype(page, migratetype);
+		split_large_buddy(zone, page, pfn, order);
+		return true;
+	}
+
+	mt = get_pfnblock_migratetype(page, start_pfn);
+	nr_moved = move_freepages(zone, start_pfn, end_pfn, migratetype);
+	if (!is_migrate_isolate(mt))
+		__mod_zone_freepage_state(zone, -nr_moved, mt);
+	else if (!is_migrate_isolate(migratetype))
+		__mod_zone_freepage_state(zone, nr_moved, migratetype);
+	return true;
+}
+#endif /* CONFIG_MEMORY_ISOLATION */
+
 static void change_pageblock_range(struct page *pageblock_page,
 					int start_order, int migratetype)
 {
@@ -6390,7 +6445,6 @@  int alloc_contig_range(unsigned long start, unsigned long end,
 		       unsigned migratetype, gfp_t gfp_mask)
 {
 	unsigned long outer_start, outer_end;
-	int order;
 	int ret = 0;
 
 	struct compact_control cc = {
@@ -6463,29 +6517,7 @@  int alloc_contig_range(unsigned long start, unsigned long end,
 	 * We don't have to hold zone->lock here because the pages are
 	 * isolated thus they won't get removed from buddy.
 	 */
-
-	order = 0;
-	outer_start = start;
-	while (!PageBuddy(pfn_to_page(outer_start))) {
-		if (++order > MAX_PAGE_ORDER) {
-			outer_start = start;
-			break;
-		}
-		outer_start &= ~0UL << order;
-	}
-
-	if (outer_start != start) {
-		order = buddy_order(pfn_to_page(outer_start));
-
-		/*
-		 * outer_start page could be small order buddy page and
-		 * it doesn't include start page. Adjust outer_start
-		 * in this case to report failed page properly
-		 * on tracepoint in test_pages_isolated()
-		 */
-		if (outer_start + (1UL << order) <= start)
-			outer_start = start;
-	}
+	outer_start = find_large_buddy(start);
 
 	/* Make sure the range is really isolated. */
 	if (test_pages_isolated(outer_start, end, 0)) {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f84f0981b2df..042937d5abe4 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -178,16 +178,10 @@  static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	unmovable = has_unmovable_pages(check_unmovable_start, check_unmovable_end,
 			migratetype, isol_flags);
 	if (!unmovable) {
-		int nr_pages;
-		int mt = get_pageblock_migratetype(page);
-
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
-		/* Block spans zone boundaries? */
-		if (nr_pages == -1) {
+		if (!move_freepages_block_isolate(zone, page, MIGRATE_ISOLATE)) {
 			spin_unlock_irqrestore(&zone->lock, flags);
 			return -EBUSY;
 		}
-		__mod_zone_freepage_state(zone, -nr_pages, mt);
 		zone->nr_isolate_pageblock++;
 		spin_unlock_irqrestore(&zone->lock, flags);
 		return 0;
@@ -254,13 +248,11 @@  static void unset_migratetype_isolate(struct page *page, int migratetype)
 	 * allocation.
 	 */
 	if (!isolated_page) {
-		int nr_pages = move_freepages_block(zone, page, migratetype);
 		/*
 		 * Isolating this block already succeeded, so this
 		 * should not fail on zone boundaries.
 		 */
-		WARN_ON_ONCE(nr_pages == -1);
-		__mod_zone_freepage_state(zone, nr_pages, migratetype);
+		WARN_ON_ONCE(!move_freepages_block_isolate(zone, page, migratetype));
 	} else {
 		set_pageblock_migratetype(page, migratetype);
 		__putback_isolated_page(page, order, migratetype);
@@ -374,26 +366,29 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 
 		VM_BUG_ON(!page);
 		pfn = page_to_pfn(page);
-		/*
-		 * start_pfn is MAX_ORDER_NR_PAGES aligned, if there is any
-		 * free pages in [start_pfn, boundary_pfn), its head page will
-		 * always be in the range.
-		 */
+
 		if (PageBuddy(page)) {
 			int order = buddy_order(page);
 
-			if (pfn + (1UL << order) > boundary_pfn) {
-				/* free page changed before split, check it again */
-				if (split_free_page(page, order, boundary_pfn - pfn))
-					continue;
-			}
+			/* move_freepages_block_isolate() handled this */
+			VM_WARN_ON_ONCE(pfn + (1 << order) > boundary_pfn);
 
 			pfn += 1UL << order;
 			continue;
 		}
+
 		/*
-		 * migrate compound pages then let the free page handling code
-		 * above do the rest. If migration is not possible, just fail.
+		 * If a compound page is straddling our block, attempt
+		 * to migrate it out of the way.
+		 *
+		 * We don't have to worry about this creating a large
+		 * free page that straddles into our block: gigantic
+		 * pages are freed as order-0 chunks, and LRU pages
+		 * (currently) do not exceed pageblock_order.
+		 *
+		 * The block of interest has already been marked
+		 * MIGRATE_ISOLATE above, so when migration is done it
+		 * will free its pages onto the correct freelists.
 		 */
 		if (PageCompound(page)) {
 			struct page *head = compound_head(page);
@@ -404,16 +399,10 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 				pfn = head_pfn + nr_pages;
 				continue;
 			}
+
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
-			/*
-			 * hugetlb, lru compound (THP), and movable compound pages
-			 * can be migrated. Otherwise, fail the isolation.
-			 */
-			if (PageHuge(page) || PageLRU(page) || __PageMovable(page)) {
-				int order;
-				unsigned long outer_pfn;
+			if (PageHuge(page)) {
 				int page_mt = get_pageblock_migratetype(page);
-				bool isolate_page = !is_migrate_isolate_page(page);
 				struct compact_control cc = {
 					.nr_migratepages = 0,
 					.order = -1,
@@ -426,56 +415,25 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 				};
 				INIT_LIST_HEAD(&cc.migratepages);
 
-				/*
-				 * XXX: mark the page as MIGRATE_ISOLATE so that
-				 * no one else can grab the freed page after migration.
-				 * Ideally, the page should be freed as two separate
-				 * pages to be added into separate migratetype free
-				 * lists.
-				 */
-				if (isolate_page) {
-					ret = set_migratetype_isolate(page, page_mt,
-						flags, head_pfn, head_pfn + nr_pages);
-					if (ret)
-						goto failed;
-				}
-
 				ret = __alloc_contig_migrate_range(&cc, head_pfn,
 							head_pfn + nr_pages, page_mt);
-
-				/*
-				 * restore the page's migratetype so that it can
-				 * be split into separate migratetype free lists
-				 * later.
-				 */
-				if (isolate_page)
-					unset_migratetype_isolate(page, page_mt);
-
 				if (ret)
 					goto failed;
-				/*
-				 * reset pfn to the head of the free page, so
-				 * that the free page handling code above can split
-				 * the free page to the right migratetype list.
-				 *
-				 * head_pfn is not used here as a hugetlb page order
-				 * can be bigger than MAX_PAGE_ORDER, but after it is
-				 * freed, the free page order is not. Use pfn within
-				 * the range to find the head of the free page.
-				 */
-				order = 0;
-				outer_pfn = pfn;
-				while (!PageBuddy(pfn_to_page(outer_pfn))) {
-					/* stop if we cannot find the free page */
-					if (++order > MAX_PAGE_ORDER)
-						goto failed;
-					outer_pfn &= ~0UL << order;
-				}
-				pfn = outer_pfn;
+				pfn = head_pfn + nr_pages;
 				continue;
-			} else
+			}
+
+			/*
+			 * These pages are movable too, but they're
+			 * not expected to exceed pageblock_order.
+			 *
+			 * Let us know when they do, so we can add
+			 * proper free and split handling for them.
+			 */
+			VM_WARN_ON_ONCE_PAGE(PageLRU(page), page);
+			VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page);
 #endif
-				goto failed;
+			goto failed;
 		}
 
 		pfn++;
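
For readers following the new helpers, the sketch below replays `find_large_buddy()` and `split_large_buddy()` from the patch above on a toy page map, and checks that the new search returns the same `outer_start` as the open-coded loop it replaces in `alloc_contig_range()`. It is a simplified user-space model under stated assumptions: the array `head_order[]` stands in for PageBuddy()/buddy_order(), the sizes (4-page pageblocks, MAX_PAGE_ORDER of 4) are hypothetical, and the split only records one pageblock-sized buddy per block, omitting the merging, migratetype and accounting work that __free_one_page() does in the kernel.

```c
#include <assert.h>

/* Toy sizes, hypothetical: 4-page pageblocks, buddies up to order 4. */
#define MAX_PAGE_ORDER 4
#define PAGEBLOCK_ORDER 2
#define PAGEBLOCK_NR_PAGES (1UL << PAGEBLOCK_ORDER)
#define NR_PFNS 64

/* head_order[pfn] >= 0: pfn heads a free buddy of that order (PageBuddy). */
static int head_order[NR_PFNS];

/* Start with no free pages; must run before the helpers below. */
static void clear_map(void)
{
	for (int i = 0; i < NR_PFNS; i++)
		head_order[i] = -1;
}

/* Same search as find_large_buddy() in the patch: does a buddy straddle start_pfn? */
static unsigned long find_large_buddy(unsigned long start_pfn)
{
	int order = 0;
	unsigned long pfn = start_pfn;

	while (head_order[pfn] < 0) {
		if (++order > MAX_PAGE_ORDER)
			return start_pfn;	/* nothing found */
		pfn &= ~0UL << order;
	}
	/* Found a preceding buddy, but does it reach start_pfn? */
	if (pfn + (1UL << head_order[pfn]) > start_pfn)
		return pfn;
	return start_pfn;
}

/* The open-coded search alloc_contig_range() used before this patch. */
static unsigned long old_outer_start(unsigned long start)
{
	int order = 0;
	unsigned long outer_start = start;

	while (head_order[outer_start] < 0) {
		if (++order > MAX_PAGE_ORDER) {
			outer_start = start;
			break;
		}
		outer_start &= ~0UL << order;
	}
	if (outer_start != start) {
		order = head_order[outer_start];
		/* A small buddy below start does not include start. */
		if (outer_start + (1UL << order) <= start)
			outer_start = start;
	}
	return outer_start;
}

/*
 * Like split_large_buddy(): hand out an order > PAGEBLOCK_ORDER buddy as
 * one pageblock-sized buddy per block. Merging and accounting are omitted.
 */
static void split_large_buddy(unsigned long pfn, int order)
{
	unsigned long end_pfn = pfn + (1UL << order);

	assert(order > PAGEBLOCK_ORDER);
	assert((pfn & (PAGEBLOCK_NR_PAGES - 1)) == 0);

	for (; pfn != end_pfn; pfn += PAGEBLOCK_NR_PAGES)
		head_order[pfn] = PAGEBLOCK_ORDER;
}
```

Calling `clear_map()` first matters: the zero-initialized array would otherwise mark every pfn as an order-0 buddy.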
