diff mbox series

[RFC,05/26] mm: page_alloc: per-migratetype pcplist for THPs

Message ID 20230418191313.268131-6-hannes@cmpxchg.org (mailing list archive)
State New
Headers show
Series mm: reliable huge page allocator | expand

Commit Message

Johannes Weiner April 18, 2023, 7:12 p.m. UTC
Right now, there is only one pcplist for THP allocations. However,
while most THPs are movable, the huge zero page is not. This means a
movable THP allocation can grab an unmovable block from the pcplist,
and a subsequent THP split, partial free, and reallocation of the
remainder will mix movable and unmovable pages in the block.

While this isn't a huge source of block pollution in practice, it
happens often enough to trigger debug warnings fairly quickly under
load. In the interest of tightening up pageblock hygiene, make the THP
pcplists fully migratetype-aware, just like the lower order ones.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/mmzone.h | 8 +++-----
 mm/page_alloc.c        | 4 ++--
 2 files changed, 5 insertions(+), 7 deletions(-)

Comments

Mel Gorman April 21, 2023, 12:47 p.m. UTC | #1
On Tue, Apr 18, 2023 at 03:12:52PM -0400, Johannes Weiner wrote:
> Right now, there is only one pcplist for THP allocations. However,
> while most THPs are movable, the huge zero page is not. This means a
> movable THP allocation can grab an unmovable block from the pcplist,
> and a subsequent THP split, partial free, and reallocation of the
> remainder will mix movable and unmovable pages in the block.
> 
> While this isn't a huge source of block pollution in practice, it
> happens often enough to trigger debug warnings fairly quickly under
> load. In the interest of tightening up pageblock hygiene, make the THP
> pcplists fully migratetype-aware, just like the lower order ones.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Split out :P

Take special care of this one because, while I didn't check this, I
suspect it'll push the PCP structure size into the next cache line and
increase overhead.

The changelog makes it unclear why exactly this happens or why the
patch fixes it. The huge zero page strips GFP_MOVABLE (so unmovable)
but at allocation time, it doesn't really matter what the movable type
is because it's a full pageblock. It doesn't appear to be a hazard until
the split happens. Assuming that's the case, it should be ok to always
set the pageblock movable for THP allocations regardless of GFP flags at
allocation time or else set the pageblock MOVABLE at THP split (always
MOVABLE at allocation time makes more sense).
Johannes Weiner April 21, 2023, 3:06 p.m. UTC | #2
On Fri, Apr 21, 2023 at 01:47:44PM +0100, Mel Gorman wrote:
> On Tue, Apr 18, 2023 at 03:12:52PM -0400, Johannes Weiner wrote:
> > Right now, there is only one pcplist for THP allocations. However,
> > while most THPs are movable, the huge zero page is not. This means a
> > movable THP allocation can grab an unmovable block from the pcplist,
> > and a subsequent THP split, partial free, and reallocation of the
> > remainder will mix movable and unmovable pages in the block.
> > 
> > While this isn't a huge source of block pollution in practice, it
> > happens often enough to trigger debug warnings fairly quickly under
> > load. In the interest of tightening up pageblock hygiene, make the THP
> > pcplists fully migratetype-aware, just like the lower order ones.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Split out :P
> 
> Take special care of this one because, while I didn't check this, I
> suspect it'll push the PCP structure size into the next cache line and
> increase overhead.
>
> The changelog makes it unclear why exactly this happens or why the
> patch fixes it.

Before this, I'd see warnings from the last patch in the series about
received migratetype not matching requested mt.

The way it happens is that the zero page gets freed and the unmovable
block put on the pcplist. A regular THP allocation is subsequently
served from an unmovable block.

Mental note, I think this can happen the other way around too: a
regular THP on the pcp being served to a MIGRATE_UNMOVABLE zero
THP. It's not supposed to, but it looks like there is a bug in the
code that's meant to prevent that from happening in rmqueue():

	if (likely(pcp_allowed_order(order))) {
		/*
		 * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
		 * we need to skip it when CMA area isn't allowed.
		 */
		if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
				migratetype != MIGRATE_MOVABLE) {
			page = rmqueue_pcplist(preferred_zone, zone, order,
					migratetype, alloc_flags);
			if (likely(page))
				goto out;
		}
	}

Surely that last condition should be migratetype == MIGRATE_MOVABLE?

> The huge zero page strips GFP_MOVABLE (so unmovable)
> but at allocation time, it doesn't really matter what the movable type
> is because it's a full pageblock. It doesn't appear to be a hazard until
> the split happens. Assuming that's the case, it should be ok to always
> set the pageblock movable for THP allocations regardless of GFP flags at
> allocation time or else set the pageblock MOVABLE at THP split (always
> MOVABLE at allocation time makes more sense).

The regular allocator compaction skips over compound pages anyway, so
the migratetype should indeed not matter there.

The bigger issue is CMA. alloc_contig_range() will try to move THPs to
free a larger range. We have to be careful not to place an unmovable
zero THP into a CMA region. That means we can not play games with MT -
we really do have to physically keep unmovable and movable THPs apart.

Another option would be not to use pcp for the zero THP. It's cached
anyway in the caller. But it would add branches to the THP alloc and
free fast paths (pcp_allowed_order() also checking migratetype).
Mel Gorman April 28, 2023, 10:29 a.m. UTC | #3
On Fri, Apr 21, 2023 at 11:06:48AM -0400, Johannes Weiner wrote:
> On Fri, Apr 21, 2023 at 01:47:44PM +0100, Mel Gorman wrote:
> > On Tue, Apr 18, 2023 at 03:12:52PM -0400, Johannes Weiner wrote:
> > > Right now, there is only one pcplist for THP allocations. However,
> > > while most THPs are movable, the huge zero page is not. This means a
> > > movable THP allocation can grab an unmovable block from the pcplist,
> > > and a subsequent THP split, partial free, and reallocation of the
> > > remainder will mix movable and unmovable pages in the block.
> > > 
> > > While this isn't a huge source of block pollution in practice, it
> > > happens often enough to trigger debug warnings fairly quickly under
> > > load. In the interest of tightening up pageblock hygiene, make the THP
> > > pcplists fully migratetype-aware, just like the lower order ones.
> > > 
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > 
> > Split out :P
> > 
> > Take special care of this one because, while I didn't check this, I
> > suspect it'll push the PCP structure size into the next cache line and
> > increase overhead.
> >
> > The changelog makes it unclear why exactly this happens or why the
> > patch fixes it.
> 
> Before this, I'd see warnings from the last patch in the series about
> received migratetype not matching requested mt.
> 
> The way it happens is that the zero page gets freed and the unmovable
> block put on the pcplist. A regular THP allocation is subsequently
> served from an unmovable block.
> 
> Mental note, I think this can happen the other way around too: a
> regular THP on the pcp being served to a MIGRATE_UNMOVABLE zero
> THP. It's not supposed to, but it looks like there is a bug in the
> code that's meant to prevent that from happening in rmqueue():
> 
> 	if (likely(pcp_allowed_order(order))) {
> 		/*
> 		 * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> 		 * we need to skip it when CMA area isn't allowed.
> 		 */
> 		if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> 				migratetype != MIGRATE_MOVABLE) {
> 			page = rmqueue_pcplist(preferred_zone, zone, order,
> 					migratetype, alloc_flags);
> 			if (likely(page))
> 				goto out;
> 		}
> 	}
> 
> Surely that last condition should be migratetype == MIGRATE_MOVABLE?
> 

It should be. It would have been missed for ages because it would need a
test case based on a machine configuration that requires CMA for functional
correctness and is using THP which is an unlikely combination.

> > The huge zero page strips GFP_MOVABLE (so unmovable)
> > but at allocation time, it doesn't really matter what the movable type
> > is because it's a full pageblock. It doesn't appear to be a hazard until
> > the split happens. Assuming that's the case, it should be ok to always
> > set the pageblock movable for THP allocations regardless of GFP flags at
> > allocation time or else set the pageblock MOVABLE at THP split (always
> > MOVABLE at allocation time makes more sense).
> 
> The regular allocator compaction skips over compound pages anyway, so
> the migratetype should indeed not matter there.
> 
> The bigger issue is CMA. alloc_contig_range() will try to move THPs to
> free a larger range. We have to be careful not to place an unmovable
> zero THP into a CMA region. That means we can not play games with MT -
> we really do have to physically keep unmovable and movable THPs apart.
> 

Fair point.

> Another option would be not to use pcp for the zero THP. It's cached
> anyway in the caller. But it would add branches to the THP alloc and
> free fast paths (pcp_allowed_order() also checking migratetype).

And this is probably the most straight-forward option. The intent behind
caching some THPs on PCP was faulting large mappings of normal THPs and
reducing the contention on the zone lock a little. The zero THP is somewhat
special because it should not be allocated at high frequency.
diff mbox series

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cd28a100d9e4..53e55882a4e7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -552,13 +552,11 @@  enum zone_watermarks {
 };
 
 /*
- * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional list
- * for THP which will usually be GFP_MOVABLE. Even if it is another type,
- * it should not contribute to serious fragmentation causing THP allocation
- * failures.
+ * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional set
+ * for THP (usually GFP_MOVABLE, but with exception of the huge zero page.)
  */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define NR_PCP_THP 1
+#define NR_PCP_THP MIGRATE_PCPTYPES
 #else
 #define NR_PCP_THP 0
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5e04a69f6a26..d3d01019ce77 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -710,7 +710,7 @@  static inline unsigned int order_to_pindex(int migratetype, int order)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (order > PAGE_ALLOC_COSTLY_ORDER) {
 		VM_BUG_ON(order != pageblock_order);
-		return NR_LOWORDER_PCP_LISTS;
+		return NR_LOWORDER_PCP_LISTS + migratetype;
 	}
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
@@ -724,7 +724,7 @@  static inline int pindex_to_order(unsigned int pindex)
 	int order = pindex / MIGRATE_PCPTYPES;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (pindex == NR_LOWORDER_PCP_LISTS)
+	if (pindex >= NR_LOWORDER_PCP_LISTS)
 		order = pageblock_order;
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);