[10/10] mm: page_alloc: consolidate free page accounting

Message ID	20240320180429.678181-11-hannes@cmpxchg.org (mailing list archive)
State	New
Headers	Return-Path: <owner-linux-mm@kvack.org>
	From: Johannes Weiner <hannes@cmpxchg.org>
	To: Andrew Morton <akpm@linux-foundation.org>
	Cc: Vlastimil Babka <vbabka@suse.cz>, Mel Gorman <mgorman@techsingularity.net>, Zi Yan <ziy@nvidia.com>, "Huang, Ying" <ying.huang@intel.com>, David Hildenbrand <david@redhat.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
	Subject: [PATCH 10/10] mm: page_alloc: consolidate free page accounting
	Date: Wed, 20 Mar 2024 14:02:15 -0400
	Message-ID: <20240320180429.678181-11-hannes@cmpxchg.org>
	In-Reply-To: <20240320180429.678181-1-hannes@cmpxchg.org>
	References: <20240320180429.678181-1-hannes@cmpxchg.org>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Sender: owner-linux-mm@kvack.org
	Precedence: bulk
Series	mm: page_alloc: freelist migratetype hygiene
	[V4,00/10] mm: page_alloc: freelist migratetype hygiene
	[01/10] mm: page_alloc: remove pcppage migratetype caching
	[02/10] mm: page_alloc: optimize free_unref_folios()
	[03/10] mm: page_alloc: fix up block types when merging compatible blocks
	[04/10] mm: page_alloc: move free pages when converting block during isolation
	[05/10] mm: page_alloc: fix move_freepages_block() range error
	[06/10] mm: page_alloc: fix freelist movement during block conversion
	[07/10] mm: page_alloc: close migratetype race between freeing and stealing
	[08/10] mm: page_alloc: set migratetype inside move_freepages()
	[09/10] mm: page_isolation: prepare for hygienic freelists
	[10/10] mm: page_alloc: consolidate free page accounting

Johannes Weiner March 20, 2024, 6:02 p.m. UTC

Free page accounting currently happens a bit too high up the call
stack, where it has to deal with guard pages, compaction capturing,
block stealing and even page isolation. This is subtle and fragile,
and makes it difficult to hack on the code.

Now that type violations on the freelists have been fixed, push the
accounting down to where pages enter and leave the freelist.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/mm.h     |  18 ++--
 include/linux/vmstat.h |   8 --
 mm/debug_page_alloc.c  |  12 +--
 mm/internal.h          |   5 --
 mm/page_alloc.c        | 194 +++++++++++++++++++++++------------------
 mm/page_isolation.c    |   3 +-
 6 files changed, 120 insertions(+), 120 deletions(-)
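
As a rough illustration of what "accounting where pages enter and leave
the freelist" means, here is a minimal userspace sketch. It is not the
kernel code: the model_* names, the enum and the struct are hypothetical
stand-ins, and locking and the actual list handling are left out.

```c
#include <assert.h>

/* Hypothetical stand-ins for a few migratetypes; not the kernel's enum. */
enum model_mt { MODEL_MOVABLE, MODEL_CMA, MODEL_ISOLATE, MODEL_NR_TYPES };

struct model_zone {
	long nr_free_pages;           /* models NR_FREE_PAGES */
	long nr_free_cma_pages;       /* models NR_FREE_CMA_PAGES */
	long nr_blocks[MODEL_NR_TYPES]; /* blocks sitting on each per-type list */
};

/* Isolated blocks are not counted as free; CMA blocks hit both counters. */
static void model_account_freepages(struct model_zone *zone, long nr_pages,
				    enum model_mt mt)
{
	if (mt == MODEL_ISOLATE)
		return;
	zone->nr_free_pages += nr_pages;
	if (mt == MODEL_CMA)
		zone->nr_free_cma_pages += nr_pages;
}

/* A block entering a freelist is accounted right where it is linked. */
static void model_add_to_free_list(struct model_zone *zone,
				   unsigned int order, enum model_mt mt)
{
	zone->nr_blocks[mt]++;
	model_account_freepages(zone, 1L << order, mt);
}

/* A block leaving a freelist is unaccounted right where it is unlinked. */
static void model_del_from_free_list(struct model_zone *zone,
				     unsigned int order, enum model_mt mt)
{
	assert(zone->nr_blocks[mt] > 0);
	zone->nr_blocks[mt]--;
	model_account_freepages(zone, -(1L << order), mt);
}

/* Moving between types: uncharge the old type, charge the new one. */
static void model_move_to_free_list(struct model_zone *zone,
				    unsigned int order,
				    enum model_mt old_mt, enum model_mt new_mt)
{
	assert(zone->nr_blocks[old_mt] > 0);
	zone->nr_blocks[old_mt]--;
	zone->nr_blocks[new_mt]++;
	model_account_freepages(zone, -(1L << order), old_mt);
	model_account_freepages(zone, 1L << order, new_mt);
}
```

In this model, callers such as isolation or block stealing never touch
the counters directly; moving a block to the isolate type and back
changes the free count only through the two account calls.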

Vlastimil Babka March 27, 2024, 8:54 a.m. UTC | #1

On 3/20/24 7:02 PM, Johannes Weiner wrote:
> Free page accounting currently happens a bit too high up the call
> stack, where it has to deal with guard pages, compaction capturing,
> block stealing and even page isolation. This is subtle and fragile,
> and makes it difficult to hack on the code.
> 
> Now that type violations on the freelists have been fixed, push the
> accounting down to where pages enter and leave the freelist.

Awesome!

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Just some nits:

> @@ -1314,10 +1349,10 @@ static inline void expand(struct zone *zone, struct page *page,
>  		 * Corresponding page table entries will not be touched,
>  		 * pages will stay not present in virtual address space
>  		 */
> -		if (set_page_guard(zone, &page[size], high, migratetype))
> +		if (set_page_guard(zone, &page[size], high))
>  			continue;
>  
> -		add_to_free_list(&page[size], zone, high, migratetype);
> +		add_to_free_list(&page[size], zone, high, migratetype, false);

This is account_freepages() in the hot loop, what if we instead used
__add_to_free_list(), sum up nr_pages and called account_freepages() once
outside of the loop?

>  		set_buddy_order(&page[size], high);
>  	}
>  }
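
The batching suggested above could look roughly like the following
userspace sketch. It only models the counter updates of the split loop;
the model_* names are hypothetical, guard pages and the list insertion
itself are left out, and this is not the fixlet that was eventually
sent.

```c
#include <assert.h>

struct model_zone {
	long nr_free_pages;            /* models NR_FREE_PAGES */
	unsigned long counter_updates; /* how often the counter was touched */
};

static void model_account_freepages(struct model_zone *zone, long nr_pages)
{
	zone->nr_free_pages += nr_pages;
	zone->counter_updates++;
}

/* Variant in the patch: one counter update per split-off buddy. */
static void model_expand_per_block(struct model_zone *zone,
				   unsigned int low, unsigned int high)
{
	assert(low <= high);
	while (high > low) {
		high--;
		/* list insertion of the order-'high' buddy would go here */
		model_account_freepages(zone, 1L << high);
	}
}

/* Suggested variant: sum up the pages, update the counter once. */
static void model_expand_batched(struct model_zone *zone,
				 unsigned int low, unsigned int high)
{
	long nr_added = 0;

	assert(low <= high);
	while (high > low) {
		high--;
		/* unaccounted list insertion would go here */
		nr_added += 1L << high;
	}
	if (nr_added)
		model_account_freepages(zone, nr_added);
}
```

Both variants end up with the same page count; splitting an order-9
block down to order 0 adds 511 pages either way, but the batched
variant touches the counter once instead of nine times.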

<snip>

> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 042937d5abe4..914a71c580d8 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -252,7 +252,8 @@ static void unset_migratetype_isolate(struct page *page, int migratetype)
>  		 * Isolating this block already succeeded, so this
>  		 * should not fail on zone boundaries.
>  		 */
> -		WARN_ON_ONCE(!move_freepages_block_isolate(zone, page, migratetype));
> +		WARN_ON_ONCE(!move_freepages_block_isolate(zone, page,
> +							   migratetype));
>  	} else {
>  		set_pageblock_migratetype(page, migratetype);
>  		__putback_isolated_page(page, order, migratetype);

Looks like a drive-by edit of an extra file just to adjust indentation.

Johannes Weiner March 27, 2024, 2:32 p.m. UTC | #2

On Wed, Mar 27, 2024 at 09:54:01AM +0100, Vlastimil Babka wrote:
> On 3/20/24 7:02 PM, Johannes Weiner wrote:
> > Free page accounting currently happens a bit too high up the call
> > stack, where it has to deal with guard pages, compaction capturing,
> > block stealing and even page isolation. This is subtle and fragile,
> > and makes it difficult to hack on the code.
> > 
> > Now that type violations on the freelists have been fixed, push the
> > accounting down to where pages enter and leave the freelist.
> 
> Awesome!
> 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Just some nits:
> 
> > @@ -1314,10 +1349,10 @@ static inline void expand(struct zone *zone, struct page *page,
> >  		 * Corresponding page table entries will not be touched,
> >  		 * pages will stay not present in virtual address space
> >  		 */
> > -		if (set_page_guard(zone, &page[size], high, migratetype))
> > +		if (set_page_guard(zone, &page[size], high))
> >  			continue;
> >  
> > -		add_to_free_list(&page[size], zone, high, migratetype);
> > +		add_to_free_list(&page[size], zone, high, migratetype, false);
> 
> This is account_freepages() in the hot loop, what if we instead used
> __add_to_free_list(), sum up nr_pages and called account_freepages() once
> outside of the loop?

Good idea. I'll send a fixlet for that.

> >  		set_buddy_order(&page[size], high);
> >  	}
> >  }
> 
> <snip>
> 
> > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > index 042937d5abe4..914a71c580d8 100644
> > --- a/mm/page_isolation.c
> > +++ b/mm/page_isolation.c
> > @@ -252,7 +252,8 @@ static void unset_migratetype_isolate(struct page *page, int migratetype)
> >  		 * Isolating this block already succeeded, so this
> >  		 * should not fail on zone boundaries.
> >  		 */
> > -		WARN_ON_ONCE(!move_freepages_block_isolate(zone, page, migratetype));
> > +		WARN_ON_ONCE(!move_freepages_block_isolate(zone, page,
> > +							   migratetype));
> >  	} else {
> >  		set_pageblock_migratetype(page, migratetype);
> >  		__putback_isolated_page(page, order, migratetype);
> 
> Looks like a drive-by edit of an extra file just to adjust indentation.

Argh, yeah, I think an earlier version mucked with the signature and I
didn't undo that cleanly. I'll send a fixlet for that too.

Thanks for the review!

Baolin Wang April 7, 2024, 10:19 a.m. UTC | #3

On 2024/3/21 02:02, Johannes Weiner wrote:
> Free page accounting currently happens a bit too high up the call
> stack, where it has to deal with guard pages, compaction capturing,
> block stealing and even page isolation. This is subtle and fragile,
> and makes it difficult to hack on the code.
> 
> Now that type violations on the freelists have been fixed, push the
> accounting down to where pages enter and leave the freelist.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>   include/linux/mm.h     |  18 ++--
>   include/linux/vmstat.h |   8 --
>   mm/debug_page_alloc.c  |  12 +--
>   mm/internal.h          |   5 --
>   mm/page_alloc.c        | 194 +++++++++++++++++++++++------------------
>   mm/page_isolation.c    |   3 +-
>   6 files changed, 120 insertions(+), 120 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8147b1302413..bd2e94391c7e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3781,24 +3781,22 @@ static inline bool page_is_guard(struct page *page)
>   	return PageGuard(page);
>   }
>   
> -bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order,
> -		      int migratetype);
> +bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order);
>   static inline bool set_page_guard(struct zone *zone, struct page *page,
> -				  unsigned int order, int migratetype)
> +				  unsigned int order)
>   {
>   	if (!debug_guardpage_enabled())
>   		return false;
> -	return __set_page_guard(zone, page, order, migratetype);
> +	return __set_page_guard(zone, page, order);
>   }
>   
> -void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order,
> -			int migratetype);
> +void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order);
>   static inline void clear_page_guard(struct zone *zone, struct page *page,
> -				    unsigned int order, int migratetype)
> +				    unsigned int order)
>   {
>   	if (!debug_guardpage_enabled())
>   		return;
> -	__clear_page_guard(zone, page, order, migratetype);
> +	__clear_page_guard(zone, page, order);
>   }
>   
>   #else	/* CONFIG_DEBUG_PAGEALLOC */
> @@ -3808,9 +3806,9 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
>   static inline bool debug_guardpage_enabled(void) { return false; }
>   static inline bool page_is_guard(struct page *page) { return false; }
>   static inline bool set_page_guard(struct zone *zone, struct page *page,
> -			unsigned int order, int migratetype) { return false; }
> +			unsigned int order) { return false; }
>   static inline void clear_page_guard(struct zone *zone, struct page *page,
> -				unsigned int order, int migratetype) {}
> +				unsigned int order) {}
>   #endif	/* CONFIG_DEBUG_PAGEALLOC */
>   
>   #ifdef __HAVE_ARCH_GATE_AREA
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 343906a98d6e..735eae6e272c 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -487,14 +487,6 @@ static inline void node_stat_sub_folio(struct folio *folio,
>   	mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
>   }
>   
> -static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
> -					     int migratetype)
> -{
> -	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
> -	if (is_migrate_cma(migratetype))
> -		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
> -}
> -
>   extern const char * const vmstat_text[];
>   
>   static inline const char *zone_stat_name(enum zone_stat_item item)
> diff --git a/mm/debug_page_alloc.c b/mm/debug_page_alloc.c
> index 6755f0c9d4a3..d46acf989dde 100644
> --- a/mm/debug_page_alloc.c
> +++ b/mm/debug_page_alloc.c
> @@ -32,8 +32,7 @@ static int __init debug_guardpage_minorder_setup(char *buf)
>   }
>   early_param("debug_guardpage_minorder", debug_guardpage_minorder_setup);
>   
> -bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order,
> -		      int migratetype)
> +bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order)
>   {
>   	if (order >= debug_guardpage_minorder())
>   		return false;
> @@ -41,19 +40,12 @@ bool __set_page_guard(struct zone *zone, struct page *page, unsigned int order,
>   	__SetPageGuard(page);
>   	INIT_LIST_HEAD(&page->buddy_list);
>   	set_page_private(page, order);
> -	/* Guard pages are not available for any usage */
> -	if (!is_migrate_isolate(migratetype))
> -		__mod_zone_freepage_state(zone, -(1 << order), migratetype);
>   
>   	return true;
>   }
>   
> -void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order,
> -		      int migratetype)
> +void __clear_page_guard(struct zone *zone, struct page *page, unsigned int order)
>   {
>   	__ClearPageGuard(page);
> -
>   	set_page_private(page, 0);
> -	if (!is_migrate_isolate(migratetype))
> -		__mod_zone_freepage_state(zone, (1 << order), migratetype);
>   }
> diff --git a/mm/internal.h b/mm/internal.h
> index d6e6c7d9f04e..0a4007b03d0d 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1036,11 +1036,6 @@ static inline bool is_migrate_highatomic(enum migratetype migratetype)
>   	return migratetype == MIGRATE_HIGHATOMIC;
>   }
>   
> -static inline bool is_migrate_highatomic_page(struct page *page)
> -{
> -	return get_pageblock_migratetype(page) == MIGRATE_HIGHATOMIC;
> -}
> -
>   void setup_zone_pageset(struct zone *zone);
>   
>   struct migration_target_control {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index efb2581ac142..c46491f83ac2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -642,42 +642,72 @@ compaction_capture(struct capture_control *capc, struct page *page,
>   }
>   #endif /* CONFIG_COMPACTION */
>   
> -/* Used for pages not on another list */
> -static inline void add_to_free_list(struct page *page, struct zone *zone,
> -				    unsigned int order, int migratetype)
> +static inline void account_freepages(struct page *page, struct zone *zone,
> +				     int nr_pages, int migratetype)
>   {
> -	struct free_area *area = &zone->free_area[order];
> +	if (is_migrate_isolate(migratetype))
> +		return;
>   
> -	list_add(&page->buddy_list, &area->free_list[migratetype]);
> -	area->nr_free++;
> +	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
> +
> +	if (is_migrate_cma(migratetype))
> +		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
>   }
>   
>   /* Used for pages not on another list */
> -static inline void add_to_free_list_tail(struct page *page, struct zone *zone,
> -					 unsigned int order, int migratetype)
> +static inline void __add_to_free_list(struct page *page, struct zone *zone,
> +				      unsigned int order, int migratetype,
> +				      bool tail)
>   {
>   	struct free_area *area = &zone->free_area[order];
>   
> -	list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> +	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
> +		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
> +		     get_pageblock_migratetype(page), migratetype, 1 << order);
> +
> +	if (tail)
> +		list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> +	else
> +		list_add(&page->buddy_list, &area->free_list[migratetype]);
>   	area->nr_free++;
>   }
>   
> +static inline void add_to_free_list(struct page *page, struct zone *zone,
> +				    unsigned int order, int migratetype,
> +				    bool tail)
> +{
> +	__add_to_free_list(page, zone, order, migratetype, tail);
> +	account_freepages(page, zone, 1 << order, migratetype);
> +}
> +
>   /*
>    * Used for pages which are on another list. Move the pages to the tail
>    * of the list - so the moved pages won't immediately be considered for
>    * allocation again (e.g., optimization for memory onlining).
>    */
>   static inline void move_to_free_list(struct page *page, struct zone *zone,
> -				     unsigned int order, int migratetype)
> +				     unsigned int order, int old_mt, int new_mt)
>   {
>   	struct free_area *area = &zone->free_area[order];
>   
> -	list_move_tail(&page->buddy_list, &area->free_list[migratetype]);
> +	/* Free page moving can fail, so it happens before the type update */
> +	VM_WARN_ONCE(get_pageblock_migratetype(page) != old_mt,
> +		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
> +		     get_pageblock_migratetype(page), old_mt, 1 << order);
> +
> +	list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
> +
> +	account_freepages(page, zone, -(1 << order), old_mt);
> +	account_freepages(page, zone, 1 << order, new_mt);
>   }
>   
> -static inline void del_page_from_free_list(struct page *page, struct zone *zone,
> -					   unsigned int order)
> +static inline void __del_page_from_free_list(struct page *page, struct zone *zone,
> +					     unsigned int order, int migratetype)
>   {
> +	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
> +		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
> +		     get_pageblock_migratetype(page), migratetype, 1 << order);
> +
>   	/* clear reported state and update reported page count */
>   	if (page_reported(page))
>   		__ClearPageReported(page);
> @@ -688,6 +718,13 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
>   	zone->free_area[order].nr_free--;
>   }
>   
> +static inline void del_page_from_free_list(struct page *page, struct zone *zone,
> +					   unsigned int order, int migratetype)
> +{
> +	__del_page_from_free_list(page, zone, order, migratetype);
> +	account_freepages(page, zone, -(1 << order), migratetype);
> +}
> +
>   static inline struct page *get_page_from_free_area(struct free_area *area,
>   					    int migratetype)
>   {
> @@ -759,18 +796,16 @@ static inline void __free_one_page(struct page *page,
>   	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>   
>   	VM_BUG_ON(migratetype == -1);
> -	if (likely(!is_migrate_isolate(migratetype)))
> -		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> -
>   	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
>   	VM_BUG_ON_PAGE(bad_range(zone, page), page);
>   
> +	account_freepages(page, zone, 1 << order, migratetype);
> +
>   	while (order < MAX_PAGE_ORDER) {
> -		if (compaction_capture(capc, page, order, migratetype)) {
> -			__mod_zone_freepage_state(zone, -(1 << order),
> -								migratetype);
> +		int buddy_mt = migratetype;
> +
> +		if (compaction_capture(capc, page, order, migratetype))
>   			return;
> -		}

IIUC, if the freed page is captured by compaction, the free page 
statistics should be decreased accordingly. Otherwise there is a 
slight regression in my thpcompact benchmark.

thpcompact Percentage Faults Huge
                           k6.9-rc2-base        base + patch10 + 2 fixes	
Percentage huge-1        78.18 (   0.00%)       71.92 (  -8.01%)
Percentage huge-3        86.70 (   0.00%)       86.07 (  -0.73%)
Percentage huge-5        90.26 (   0.00%)       78.02 ( -13.57%)
Percentage huge-7        92.34 (   0.00%)       78.67 ( -14.81%)
Percentage huge-12       91.18 (   0.00%)       81.04 ( -11.12%)
Percentage huge-18       89.00 (   0.00%)       79.57 ( -10.60%)
Percentage huge-24       90.52 (   0.00%)       80.07 ( -11.54%)
Percentage huge-30       94.44 (   0.00%)       96.28 (   1.95%)
Percentage huge-32       93.09 (   0.00%)       99.39 (   6.77%)

I added the fix below on top of your fix 2, and the thpcompact 
percentages look good again. What do you think of the fix?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8330c5c2de6b..2facf844ef84 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -805,8 +805,10 @@ static inline void __free_one_page(struct page *page,
         while (order < MAX_PAGE_ORDER) {
                 int buddy_mt = migratetype;

-               if (compaction_capture(capc, page, order, migratetype))
+               if (compaction_capture(capc, page, order, migratetype)) {
+                       account_freepages(zone, -(1 << order), migratetype);
                         return;
+               }

                 buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
                 if (!buddy)

With my fix, the THP percentage looks better:
                       k6.9-rc2-base          base + patch10 + 2 fixes + my fix
Percentage huge-1        78.18 (   0.00%)       82.83 (   5.94%)
Percentage huge-3        86.70 (   0.00%)       93.47 (   7.81%)
Percentage huge-5        90.26 (   0.00%)       94.73 (   4.95%)
Percentage huge-7        92.34 (   0.00%)       95.22 (   3.12%)
Percentage huge-12       91.18 (   0.00%)       92.40 (   1.34%)
Percentage huge-18       89.00 (   0.00%)       85.39 (  -4.06%)
Percentage huge-24       90.52 (   0.00%)       94.70 (   4.61%)
Percentage huge-30       94.44 (   0.00%)       97.00 (   2.71%)
Percentage huge-32       93.09 (   0.00%)       92.87 (  -0.24%)
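
To make the imbalance concrete, here is a small userspace model of the
free path. It is only a sketch: the model_* names are hypothetical, the
'captured' flag stands in for compaction_capture() returning true, and
buddy merging is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

struct model_zone {
	long nr_free_pages;      /* models the NR_FREE_PAGES counter */
	long pages_on_freelists; /* pages that really are on a freelist */
};

static void model_account_freepages(struct model_zone *zone, long nr_pages)
{
	zone->nr_free_pages += nr_pages;
}

/*
 * Models freeing one block. The block is accounted up front; if it is
 * captured, it never reaches a freelist, so the accounting has to be
 * undone. 'undo_on_capture' selects whether that is done.
 */
static void model_free_one_page(struct model_zone *zone, unsigned int order,
				bool captured, bool undo_on_capture)
{
	assert(zone);
	model_account_freepages(zone, 1L << order);

	if (captured) {
		if (undo_on_capture)
			model_account_freepages(zone, -(1L << order));
		return; /* handed to the capturer, not linked anywhere */
	}

	zone->pages_on_freelists += 1L << order;
}

/* How far the counter has drifted above the real number of free pages. */
static long model_counter_drift(const struct model_zone *zone)
{
	return zone->nr_free_pages - zone->pages_on_freelists;
}
```

Without the undo, every captured order-9 block leaves the modelled
counter 512 pages too high; with it, the drift stays at zero.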

Vlastimil Babka April 8, 2024, 7:38 a.m. UTC | #4

On 4/7/24 12:19 PM, Baolin Wang wrote:
> On 2024/3/21 02:02, Johannes Weiner wrote:
>>   
>> +	account_freepages(page, zone, 1 << order, migratetype);
>> +
>>   	while (order < MAX_PAGE_ORDER) {
>> -		if (compaction_capture(capc, page, order, migratetype)) {
>> -			__mod_zone_freepage_state(zone, -(1 << order),
>> -								migratetype);
>> +		int buddy_mt = migratetype;
>> +
>> +		if (compaction_capture(capc, page, order, migratetype))
>>   			return;
>> -		}
> 
> IIUC, if the released page is captured by compaction, then the 
> statistics for free pages should be correspondingly decreased, 
> otherwise, there will be a slight regression for my thpcompact benchmark.
> 
> thpcompact Percentage Faults Huge
>                            k6.9-rc2-base        base + patch10 + 2 fixes	
> Percentage huge-1        78.18 (   0.00%)       71.92 (  -8.01%)
> Percentage huge-3        86.70 (   0.00%)       86.07 (  -0.73%)
> Percentage huge-5        90.26 (   0.00%)       78.02 ( -13.57%)
> Percentage huge-7        92.34 (   0.00%)       78.67 ( -14.81%)
> Percentage huge-12       91.18 (   0.00%)       81.04 ( -11.12%)
> Percentage huge-18       89.00 (   0.00%)       79.57 ( -10.60%)
> Percentage huge-24       90.52 (   0.00%)       80.07 ( -11.54%)
> Percentage huge-30       94.44 (   0.00%)       96.28 (   1.95%)
> Percentage huge-32       93.09 (   0.00%)       99.39 (   6.77%)
> 
> I add below fix based on your fix 2, then the thpcompact Percentage 
> looks good. How do you think for the fix?

Yeah, another good catch, thanks. "Slight regression" is an understatement:
this affects not just a statistic but the very important NR_FREE_PAGES
counter, which IIUC would eventually become larger than reality, make the
watermark checks pass when they should fail, and result in depleted
reserves etc. Actually, I wonder why we're not seeing -next failures
already (or maybe I just haven't noticed).
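
A toy model of that failure mode, as a userspace sketch only: the
model_* names are hypothetical, and the watermark check is reduced to a
single comparison, whereas the real check also considers the allocation
order and the various reserves.

```c
#include <assert.h>
#include <stdbool.h>

struct model_zone {
	long counted_free; /* what the NR_FREE_PAGES-like counter claims */
	long real_free;    /* pages that are actually free */
};

/* Deliberately simplified stand-in for a watermark check. */
static bool model_watermark_ok(long free_pages, long min_mark)
{
	return free_pages > min_mark;
}

/* One captured free that was accounted but never unaccounted. */
static void model_leak_capture(struct model_zone *zone, unsigned int order)
{
	zone->counted_free += 1L << order;
}

/* True if the check passes on the counter but would fail on reality. */
static bool model_false_positive(const struct model_zone *zone, long min_mark)
{
	assert(zone->counted_free >= zone->real_free);
	return model_watermark_ok(zone->counted_free, min_mark) &&
	       !model_watermark_ok(zone->real_free, min_mark);
}
```

With 900 pages really free and a mark of 1024, the check correctly
fails; after a single leaked order-9 capture the counter reads 1412 and
the same check passes although nothing was actually freed.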

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8330c5c2de6b..2facf844ef84 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -805,8 +805,10 @@ static inline void __free_one_page(struct page *page,
>          while (order < MAX_PAGE_ORDER) {
>                  int buddy_mt = migratetype;
> 
> -               if (compaction_capture(capc, page, order, migratetype))
> +               if (compaction_capture(capc, page, order, migratetype)) {
> +                       account_freepages(zone, -(1 << order), migratetype);
>                          return;
> +               }
> 
>                  buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
>                  if (!buddy)
> 
> With my fix, the THP percentage looks better:
>                        k6.9-rc2-base          base + patch10 + 2 fixes + my fix
> Percentage huge-1        78.18 (   0.00%)       82.83 (   5.94%)
> Percentage huge-3        86.70 (   0.00%)       93.47 (   7.81%)
> Percentage huge-5        90.26 (   0.00%)       94.73 (   4.95%)
> Percentage huge-7        92.34 (   0.00%)       95.22 (   3.12%)
> Percentage huge-12       91.18 (   0.00%)       92.40 (   1.34%)
> Percentage huge-18       89.00 (   0.00%)       85.39 (  -4.06%)
> Percentage huge-24       90.52 (   0.00%)       94.70 (   4.61%)
> Percentage huge-30       94.44 (   0.00%)       97.00 (   2.71%)
> Percentage huge-32       93.09 (   0.00%)       92.87 (  -0.24%)

Baolin Wang April 8, 2024, 9:13 a.m. UTC | #5

On 2024/4/8 15:38, Vlastimil Babka wrote:
> On 4/7/24 12:19 PM, Baolin Wang wrote:
>> On 2024/3/21 02:02, Johannes Weiner wrote:
>>>    
>>> +	account_freepages(page, zone, 1 << order, migratetype);
>>> +
>>>    	while (order < MAX_PAGE_ORDER) {
>>> -		if (compaction_capture(capc, page, order, migratetype)) {
>>> -			__mod_zone_freepage_state(zone, -(1 << order),
>>> -								migratetype);
>>> +		int buddy_mt = migratetype;
>>> +
>>> +		if (compaction_capture(capc, page, order, migratetype))
>>>    			return;
>>> -		}
>>
>> IIUC, if the released page is captured by compaction, then the
>> statistics for free pages should be correspondingly decreased,
>> otherwise, there will be a slight regression for my thpcompact benchmark.
>>
>> thpcompact Percentage Faults Huge
>>                             k6.9-rc2-base        base + patch10 + 2 fixes	
>> Percentage huge-1        78.18 (   0.00%)       71.92 (  -8.01%)
>> Percentage huge-3        86.70 (   0.00%)       86.07 (  -0.73%)
>> Percentage huge-5        90.26 (   0.00%)       78.02 ( -13.57%)
>> Percentage huge-7        92.34 (   0.00%)       78.67 ( -14.81%)
>> Percentage huge-12       91.18 (   0.00%)       81.04 ( -11.12%)
>> Percentage huge-18       89.00 (   0.00%)       79.57 ( -10.60%)
>> Percentage huge-24       90.52 (   0.00%)       80.07 ( -11.54%)
>> Percentage huge-30       94.44 (   0.00%)       96.28 (   1.95%)
>> Percentage huge-32       93.09 (   0.00%)       99.39 (   6.77%)
>>
>> I add below fix based on your fix 2, then the thpcompact Percentage
>> looks good. How do you think for the fix?
> 
> Yeah another well spotted, thanks. "slight regression" is an understatement,

Thanks for helping to confirm.

> this affects not just a "statistics" but very important counter
> NR_FREE_PAGES which IIUC would eventually become larger than reality, make
> the watermark checks false positive and result in depleted reserves etc etc.

Right, agree.

> Actually wondering why we're not seeing -next failures already (or maybe I
> just haven't noticed).
> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 8330c5c2de6b..2facf844ef84 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -805,8 +805,10 @@ static inline void __free_one_page(struct page *page,
>>           while (order < MAX_PAGE_ORDER) {
>>                   int buddy_mt = migratetype;
>>
>> -               if (compaction_capture(capc, page, order, migratetype))
>> +               if (compaction_capture(capc, page, order, migratetype)) {
>> +                       account_freepages(zone, -(1 << order), migratetype);
>>                           return;
>> +               }
>>
>>                   buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
>>                   if (!buddy)
>>
>> With my fix, the THP percentage looks better:
>>                         k6.9-rc2-base          base + patch10 + 2 fixes + my fix
>> Percentage huge-1        78.18 (   0.00%)       82.83 (   5.94%)
>> Percentage huge-3        86.70 (   0.00%)       93.47 (   7.81%)
>> Percentage huge-5        90.26 (   0.00%)       94.73 (   4.95%)
>> Percentage huge-7        92.34 (   0.00%)       95.22 (   3.12%)
>> Percentage huge-12       91.18 (   0.00%)       92.40 (   1.34%)
>> Percentage huge-18       89.00 (   0.00%)       85.39 (  -4.06%)
>> Percentage huge-24       90.52 (   0.00%)       94.70 (   4.61%)
>> Percentage huge-30       94.44 (   0.00%)       97.00 (   2.71%)
>> Percentage huge-32       93.09 (   0.00%)       92.87 (  -0.24%)

Johannes Weiner April 8, 2024, 2:23 p.m. UTC | #6

On Mon, Apr 08, 2024 at 09:38:20AM +0200, Vlastimil Babka wrote:
> On 4/7/24 12:19 PM, Baolin Wang wrote:
> > On 2024/3/21 02:02, Johannes Weiner wrote:
> >>   
> >> +	account_freepages(page, zone, 1 << order, migratetype);
> >> +
> >>   	while (order < MAX_PAGE_ORDER) {
> >> -		if (compaction_capture(capc, page, order, migratetype)) {
> >> -			__mod_zone_freepage_state(zone, -(1 << order),
> >> -								migratetype);
> >> +		int buddy_mt = migratetype;
> >> +
> >> +		if (compaction_capture(capc, page, order, migratetype))
> >>   			return;
> >> -		}
> > 
> > IIUC, if the released page is captured by compaction, then the 
> > statistics for free pages should be correspondingly decreased, 
> > otherwise, there will be a slight regression for my thpcompact benchmark.
> > 
> > thpcompact Percentage Faults Huge
> >                            k6.9-rc2-base        base + patch10 + 2 fixes	
> > Percentage huge-1        78.18 (   0.00%)       71.92 (  -8.01%)
> > Percentage huge-3        86.70 (   0.00%)       86.07 (  -0.73%)
> > Percentage huge-5        90.26 (   0.00%)       78.02 ( -13.57%)
> > Percentage huge-7        92.34 (   0.00%)       78.67 ( -14.81%)
> > Percentage huge-12       91.18 (   0.00%)       81.04 ( -11.12%)
> > Percentage huge-18       89.00 (   0.00%)       79.57 ( -10.60%)
> > Percentage huge-24       90.52 (   0.00%)       80.07 ( -11.54%)
> > Percentage huge-30       94.44 (   0.00%)       96.28 (   1.95%)
> > Percentage huge-32       93.09 (   0.00%)       99.39 (   6.77%)
> > 
> > I add below fix based on your fix 2, then the thpcompact Percentage 
> > looks good. How do you think for the fix?
> 
> Yeah another well spotted, thanks. "slight regression" is an understatement,
> this affects not just a "statistics" but very important counter
> NR_FREE_PAGES which IIUC would eventually become larger than reality, make
> the watermark checks false positive and result in depleted reserves etc etc.
> Actually wondering why we're not seeing -next failures already (or maybe I
> just haven't noticed).

Good catch indeed.

Trying to understand why I didn't notice this during testing, and I
think it's because I had order-10 pageblocks in my config. There is
this in compaction_capture():

	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
		return false;

Most compaction is for order-9 THPs on movable blocks, so I didn't get
enough capturing in practice for that leak to be noticeable.

In earlier versions of the patches I had more aggressive capturing,
but also did the accounting in add_page_to/del_page_from_freelist(),
so capturing only steals what's already accounted to be off list.

With the __add/__del and the consolidated account_freepages()
optimization, compaction_capture() needs explicit accounting again.
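
For reference, the effect of that early exit on order-9 frees can be
modelled in userspace as below. This is a sketch only: the model_* names
are hypothetical, 'movable' stands in for the MIGRATE_MOVABLE
comparison, and the other conditions in compaction_capture() are
ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the quoted early exit would refuse to capture this block. */
static bool model_capture_rejected(unsigned int order,
				   unsigned int pageblock_order,
				   bool movable)
{
	return order < pageblock_order && movable;
}

/* The two configurations discussed above, as self-checks. */
static void model_check_examples(void)
{
	/* order-10 pageblocks: an order-9 movable free is not captured */
	assert(model_capture_rejected(9, 10, true));
	/* order-9 pageblocks: the same free gets past this check */
	assert(!model_capture_rejected(9, 9, true));
}
```

So with order-10 pageblocks, frees of order-9 blocks from movable
pageblocks never reach the capture path in this model, which matches
the leak going unnoticed in that configuration.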

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 8330c5c2de6b..2facf844ef84 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -805,8 +805,10 @@ static inline void __free_one_page(struct page *page,
> >          while (order < MAX_PAGE_ORDER) {
> >                  int buddy_mt = migratetype;
> > 
> > -               if (compaction_capture(capc, page, order, migratetype))
> > +               if (compaction_capture(capc, page, order, migratetype)) {
> > +                       account_freepages(zone, -(1 << order), migratetype);
> >                          return;
> > +               }
> > 
> >                  buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
> >                  if (!buddy)
> > 
> > With my fix, the THP percentage looks better:
> >                        k6.9-rc2-base          base + patch10 + 2 fixes + my fix
> > Percentage huge-1        78.18 (   0.00%)       82.83 (   5.94%)
> > Percentage huge-3        86.70 (   0.00%)       93.47 (   7.81%)
> > Percentage huge-5        90.26 (   0.00%)       94.73 (   4.95%)
> > Percentage huge-7        92.34 (   0.00%)       95.22 (   3.12%)
> > Percentage huge-12       91.18 (   0.00%)       92.40 (   1.34%)
> > Percentage huge-18       89.00 (   0.00%)       85.39 (  -4.06%)
> > Percentage huge-24       90.52 (   0.00%)       94.70 (   4.61%)
> > Percentage huge-30       94.44 (   0.00%)       97.00 (   2.71%)
> > Percentage huge-32       93.09 (   0.00%)       92.87 (  -0.24%)

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

With fixed indentation:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 70f82635f650..96815e3c22f2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -805,8 +805,10 @@ static inline void __free_one_page(struct page *page,
 	while (order < MAX_PAGE_ORDER) {
 		int buddy_mt = migratetype;
 
-		if (compaction_capture(capc, page, order, migratetype))
+		if (compaction_capture(capc, page, order, migratetype)) {
+			account_freepages(zone, -(1 << order), migratetype);
 			return;
+		}
 
 		buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
 		if (!buddy)

Vlastimil Babka April 9, 2024, 6:23 a.m. UTC | #7

On 4/8/24 4:23 PM, Johannes Weiner wrote:
> On Mon, Apr 08, 2024 at 09:38:20AM +0200, Vlastimil Babka wrote:
>> On 4/7/24 12:19 PM, Baolin Wang wrote:
>> > On 2024/3/21 02:02, Johannes Weiner wrote:
>> >>   
>> >> +	account_freepages(page, zone, 1 << order, migratetype);
>> >> +
>> >>   	while (order < MAX_PAGE_ORDER) {
>> >> -		if (compaction_capture(capc, page, order, migratetype)) {
>> >> -			__mod_zone_freepage_state(zone, -(1 << order),
>> >> -								migratetype);
>> >> +		int buddy_mt = migratetype;
>> >> +
>> >> +		if (compaction_capture(capc, page, order, migratetype))
>> >>   			return;
>> >> -		}
>> > 
>> > IIUC, if the released page is captured by compaction, then the 
>> > statistics for free pages should be correspondingly decreased, 
>> > otherwise, there will be a slight regression for my thpcompact benchmark.
>> > 
>> > thpcompact Percentage Faults Huge
>> >                            k6.9-rc2-base        base + patch10 + 2 fixes	
>> > Percentage huge-1        78.18 (   0.00%)       71.92 (  -8.01%)
>> > Percentage huge-3        86.70 (   0.00%)       86.07 (  -0.73%)
>> > Percentage huge-5        90.26 (   0.00%)       78.02 ( -13.57%)
>> > Percentage huge-7        92.34 (   0.00%)       78.67 ( -14.81%)
>> > Percentage huge-12       91.18 (   0.00%)       81.04 ( -11.12%)
>> > Percentage huge-18       89.00 (   0.00%)       79.57 ( -10.60%)
>> > Percentage huge-24       90.52 (   0.00%)       80.07 ( -11.54%)
>> > Percentage huge-30       94.44 (   0.00%)       96.28 (   1.95%)
>> > Percentage huge-32       93.09 (   0.00%)       99.39 (   6.77%)
>> > 
>> > I add below fix based on your fix 2, then the thpcompact Percentage 
>> > looks good. How do you think for the fix?
>> 
>> Yeah another well spotted, thanks. "slight regression" is an understatement,
>> this affects not just a "statistics" but very important counter
>> NR_FREE_PAGES which IIUC would eventually become larger than reality, make
>> the watermark checks false positive and result in depleted reserves etc etc.
>> Actually wondering why we're not seeing -next failures already (or maybe I
>> just haven't noticed).
> 
> Good catch indeed.
> 
> Trying to understand why I didn't notice this during testing, and I
> think it's because I had order-10 pageblocks in my config. There is
> this in compaction_capture():
> 
> 	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
> 		return false;
> 
> Most compaction is for order-9 THPs on movable blocks, so I didn't get
> much capturing in practice in order for that leak to be noticeable.
> 
> In earlier versions of the patches I had more aggressive capturing,
> but also did the accounting in add_page_to/del_page_from_freelist(),
> so capturing only steals what's already accounted to be off list.
> 
> With the __add/__del and the consolidated account_freepages()
> optimization, compaction_capture() needs explicit accounting again.
> 
>> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> > index 8330c5c2de6b..2facf844ef84 100644
>> > --- a/mm/page_alloc.c
>> > +++ b/mm/page_alloc.c
>> > @@ -805,8 +805,10 @@ static inline void __free_one_page(struct page *page,
>> >          while (order < MAX_PAGE_ORDER) {
>> >                  int buddy_mt = migratetype;
>> > 
>> > -               if (compaction_capture(capc, page, order, migratetype))
>> > +               if (compaction_capture(capc, page, order, migratetype)) {
>> > +                       account_freepages(zone, -(1 << order), migratetype);
>> >                          return;
>> > +               }
>> > 
>> >                  buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
>> >                  if (!buddy)
>> > 
>> > With my fix, the THP percentage looks better:
>> >                        k6.9-rc2-base          base + patch10 + 2 fixes + my fix
>> > Percentage huge-1        78.18 (   0.00%)       82.83 (   5.94%)
>> > Percentage huge-3        86.70 (   0.00%)       93.47 (   7.81%)
>> > Percentage huge-5        90.26 (   0.00%)       94.73 (   4.95%)
>> > Percentage huge-7        92.34 (   0.00%)       95.22 (   3.12%)
>> > Percentage huge-12       91.18 (   0.00%)       92.40 (   1.34%)
>> > Percentage huge-18       89.00 (   0.00%)       85.39 (  -4.06%)
>> > Percentage huge-24       90.52 (   0.00%)       94.70 (   4.61%)
>> > Percentage huge-30       94.44 (   0.00%)       97.00 (   2.71%)
>> > Percentage huge-32       93.09 (   0.00%)       92.87 (  -0.24%)
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> With fixed indentation:

Maybe Baolin could resend the finalized 2 fixups in a more ready-to-pick form?

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 70f82635f650..96815e3c22f2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -805,8 +805,10 @@ static inline void __free_one_page(struct page *page,
>  	while (order < MAX_PAGE_ORDER) {
>  		int buddy_mt = migratetype;
>  
> -		if (compaction_capture(capc, page, order, migratetype))
> +		if (compaction_capture(capc, page, order, migratetype)) {
> +			account_freepages(zone, -(1 << order), migratetype);
>  			return;
> +		}
>  
>  		buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
>  		if (!buddy)

Baolin Wang April 9, 2024, 7:56 a.m. UTC | #8

On 2024/4/9 14:23, Vlastimil Babka wrote:
> On 4/8/24 4:23 PM, Johannes Weiner wrote:
>> On Mon, Apr 08, 2024 at 09:38:20AM +0200, Vlastimil Babka wrote:
>>> On 4/7/24 12:19 PM, Baolin Wang wrote:
>>>> On 2024/3/21 02:02, Johannes Weiner wrote:
>>>>>    
>>>>> +	account_freepages(page, zone, 1 << order, migratetype);
>>>>> +
>>>>>    	while (order < MAX_PAGE_ORDER) {
>>>>> -		if (compaction_capture(capc, page, order, migratetype)) {
>>>>> -			__mod_zone_freepage_state(zone, -(1 << order),
>>>>> -								migratetype);
>>>>> +		int buddy_mt = migratetype;
>>>>> +
>>>>> +		if (compaction_capture(capc, page, order, migratetype))
>>>>>    			return;
>>>>> -		}
>>>>
>>>> IIUC, if the released page is captured by compaction, then the
>>>> statistics for free pages should be correspondingly decreased,
>>>> otherwise, there will be a slight regression for my thpcompact benchmark.
>>>>
>>>> thpcompact Percentage Faults Huge
>>>>                             k6.9-rc2-base        base + patch10 + 2 fixes	
>>>> Percentage huge-1        78.18 (   0.00%)       71.92 (  -8.01%)
>>>> Percentage huge-3        86.70 (   0.00%)       86.07 (  -0.73%)
>>>> Percentage huge-5        90.26 (   0.00%)       78.02 ( -13.57%)
>>>> Percentage huge-7        92.34 (   0.00%)       78.67 ( -14.81%)
>>>> Percentage huge-12       91.18 (   0.00%)       81.04 ( -11.12%)
>>>> Percentage huge-18       89.00 (   0.00%)       79.57 ( -10.60%)
>>>> Percentage huge-24       90.52 (   0.00%)       80.07 ( -11.54%)
>>>> Percentage huge-30       94.44 (   0.00%)       96.28 (   1.95%)
>>>> Percentage huge-32       93.09 (   0.00%)       99.39 (   6.77%)
>>>>
>>>> I add below fix based on your fix 2, then the thpcompact Percentage
>>>> looks good. How do you think for the fix?
>>>
>>> Yeah another well spotted, thanks. "slight regression" is an understatement,
>>> this affects not just a "statistics" but very important counter
>>> NR_FREE_PAGES which IIUC would eventually become larger than reality, make
>>> the watermark checks false positive and result in depleted reserves etc etc.
>>> Actually wondering why we're not seeing -next failures already (or maybe I
>>> just haven't noticed).
>>
>> Good catch indeed.
>>
>> Trying to understand why I didn't notice this during testing, and I
>> think it's because I had order-10 pageblocks in my config. There is
>> this in compaction_capture():
>>
>> 	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
>> 		return false;
>>
>> Most compaction is for order-9 THPs on movable blocks, so I didn't get
>> much capturing in practice in order for that leak to be noticeable.
>>
>> In earlier versions of the patches I had more aggressive capturing,
>> but also did the accounting in add_page_to/del_page_from_freelist(),
>> so capturing only steals what's already accounted to be off list.
>>
>> With the __add/__del and the consolidated account_freepages()
>> optimization, compaction_capture() needs explicit accounting again.
>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 8330c5c2de6b..2facf844ef84 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -805,8 +805,10 @@ static inline void __free_one_page(struct page *page,
>>>>           while (order < MAX_PAGE_ORDER) {
>>>>                   int buddy_mt = migratetype;
>>>>
>>>> -               if (compaction_capture(capc, page, order, migratetype))
>>>> +               if (compaction_capture(capc, page, order, migratetype)) {
>>>> +                       account_freepages(zone, -(1 << order), migratetype);
>>>>                           return;
>>>> +               }
>>>>
>>>>                   buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
>>>>                   if (!buddy)
>>>>
>>>> With my fix, the THP percentage looks better:
>>>>                         k6.9-rc2-base          base + patch10 + 2 fixes + my fix
>>>> Percentage huge-1        78.18 (   0.00%)       82.83 (   5.94%)
>>>> Percentage huge-3        86.70 (   0.00%)       93.47 (   7.81%)
>>>> Percentage huge-5        90.26 (   0.00%)       94.73 (   4.95%)
>>>> Percentage huge-7        92.34 (   0.00%)       95.22 (   3.12%)
>>>> Percentage huge-12       91.18 (   0.00%)       92.40 (   1.34%)
>>>> Percentage huge-18       89.00 (   0.00%)       85.39 (  -4.06%)
>>>> Percentage huge-24       90.52 (   0.00%)       94.70 (   4.61%)
>>>> Percentage huge-30       94.44 (   0.00%)       97.00 (   2.71%)
>>>> Percentage huge-32       93.09 (   0.00%)       92.87 (  -0.24%)
>>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks.

>> With fixed indentation:
> 
> Maybe Baolin could resend the finalized 2 fixups in a more ready-to-pick form?

Sure, I've sent it out.

And I see Andrew has already folded the first fix 
("mm-page_alloc-fix-freelist-movement-during-block-conversion-fix") into 
the patch set. If a formal fix patch is needed, Andrew, please let me know.

Vlastimil Babka April 9, 2024, 8:41 a.m. UTC | #9

On 4/9/24 9:56 AM, Baolin Wang wrote:
> 
> 
>>>
>>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>> 
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Thanks.
> 
>>> With fixed indentation:
>> 
>> Maybe Baolin could resend the finalized 2 fixups in a more ready-to-pick form?
> 
> Sure, I've sent it out.

Thanks.

> And I see Andrew has already folded the first fix 
> ("mm-page_alloc-fix-freelist-movement-during-block-conversion-fix") into 
> the patch set. If a formal fix patch is needed, Andrew, please let me know.

Oh, I didn't notice. In that case nothing more should be needed.

Baolin Wang April 9, 2024, 9:31 a.m. UTC | #10

On 2024/4/8 22:23, Johannes Weiner wrote:
> On Mon, Apr 08, 2024 at 09:38:20AM +0200, Vlastimil Babka wrote:
>> On 4/7/24 12:19 PM, Baolin Wang wrote:
>>> On 2024/3/21 02:02, Johannes Weiner wrote:
>>>>    
>>>> +	account_freepages(page, zone, 1 << order, migratetype);
>>>> +
>>>>    	while (order < MAX_PAGE_ORDER) {
>>>> -		if (compaction_capture(capc, page, order, migratetype)) {
>>>> -			__mod_zone_freepage_state(zone, -(1 << order),
>>>> -								migratetype);
>>>> +		int buddy_mt = migratetype;
>>>> +
>>>> +		if (compaction_capture(capc, page, order, migratetype))
>>>>    			return;
>>>> -		}
>>>
>>> IIUC, if the released page is captured by compaction, then the
>>> statistics for free pages should be correspondingly decreased,
>>> otherwise, there will be a slight regression for my thpcompact benchmark.
>>>
>>> thpcompact Percentage Faults Huge
>>>                             k6.9-rc2-base        base + patch10 + 2 fixes	
>>> Percentage huge-1        78.18 (   0.00%)       71.92 (  -8.01%)
>>> Percentage huge-3        86.70 (   0.00%)       86.07 (  -0.73%)
>>> Percentage huge-5        90.26 (   0.00%)       78.02 ( -13.57%)
>>> Percentage huge-7        92.34 (   0.00%)       78.67 ( -14.81%)
>>> Percentage huge-12       91.18 (   0.00%)       81.04 ( -11.12%)
>>> Percentage huge-18       89.00 (   0.00%)       79.57 ( -10.60%)
>>> Percentage huge-24       90.52 (   0.00%)       80.07 ( -11.54%)
>>> Percentage huge-30       94.44 (   0.00%)       96.28 (   1.95%)
>>> Percentage huge-32       93.09 (   0.00%)       99.39 (   6.77%)
>>>
>>> I add below fix based on your fix 2, then the thpcompact Percentage
>>> looks good. How do you think for the fix?
>>
>> Yeah another well spotted, thanks. "slight regression" is an understatement,
>> this affects not just a "statistics" but very important counter
>> NR_FREE_PAGES which IIUC would eventually become larger than reality, make
>> the watermark checks false positive and result in depleted reserves etc etc.
>> Actually wondering why we're not seeing -next failures already (or maybe I
>> just haven't noticed).
> 
> Good catch indeed.
> 
> Trying to understand why I didn't notice this during testing, and I
> think it's because I had order-10 pageblocks in my config. There is
> this in compaction_capture():
> 
> 	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
> 		return false;
> 
> Most compaction is for order-9 THPs on movable blocks, so I didn't get
> much capturing in practice in order for that leak to be noticeable.

This makes me wonder: why not compare against 'cc->migratetype'
instead, so that low-order (e.g. mTHP) compaction can directly capture
released pages of the matching type? That could avoid some compaction
scans without mixing migratetypes.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2facf844ef84..7a64020f8222 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -622,7 +622,7 @@ compaction_capture(struct capture_control *capc, struct page *page,
          * and vice-versa but no more than normal fallback logic which can
          * have trouble finding a high-order free page.
          */
-       if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
+       if (order < pageblock_order && capc->cc->migratetype != migratetype)
                 return false;

         capc->page = page;

Zi Yan April 9, 2024, 2:46 p.m. UTC | #11

On 9 Apr 2024, at 5:31, Baolin Wang wrote:

> On 2024/4/8 22:23, Johannes Weiner wrote:
>> On Mon, Apr 08, 2024 at 09:38:20AM +0200, Vlastimil Babka wrote:
>>> On 4/7/24 12:19 PM, Baolin Wang wrote:
>>>> On 2024/3/21 02:02, Johannes Weiner wrote:
>>>>>   +	account_freepages(page, zone, 1 << order, migratetype);
>>>>> +
>>>>>    	while (order < MAX_PAGE_ORDER) {
>>>>> -		if (compaction_capture(capc, page, order, migratetype)) {
>>>>> -			__mod_zone_freepage_state(zone, -(1 << order),
>>>>> -								migratetype);
>>>>> +		int buddy_mt = migratetype;
>>>>> +
>>>>> +		if (compaction_capture(capc, page, order, migratetype))
>>>>>    			return;
>>>>> -		}
>>>>
>>>> IIUC, if the released page is captured by compaction, then the
>>>> statistics for free pages should be correspondingly decreased,
>>>> otherwise, there will be a slight regression for my thpcompact benchmark.
>>>>
>>>> thpcompact Percentage Faults Huge
>>>>                             k6.9-rc2-base        base + patch10 + 2 fixes	
>>>> Percentage huge-1        78.18 (   0.00%)       71.92 (  -8.01%)
>>>> Percentage huge-3        86.70 (   0.00%)       86.07 (  -0.73%)
>>>> Percentage huge-5        90.26 (   0.00%)       78.02 ( -13.57%)
>>>> Percentage huge-7        92.34 (   0.00%)       78.67 ( -14.81%)
>>>> Percentage huge-12       91.18 (   0.00%)       81.04 ( -11.12%)
>>>> Percentage huge-18       89.00 (   0.00%)       79.57 ( -10.60%)
>>>> Percentage huge-24       90.52 (   0.00%)       80.07 ( -11.54%)
>>>> Percentage huge-30       94.44 (   0.00%)       96.28 (   1.95%)
>>>> Percentage huge-32       93.09 (   0.00%)       99.39 (   6.77%)
>>>>
>>>> I add below fix based on your fix 2, then the thpcompact Percentage
>>>> looks good. How do you think for the fix?
>>>
>>> Yeah another well spotted, thanks. "slight regression" is an understatement,
>>> this affects not just a "statistics" but very important counter
>>> NR_FREE_PAGES which IIUC would eventually become larger than reality, make
>>> the watermark checks false positive and result in depleted reserves etc etc.
>>> Actually wondering why we're not seeing -next failures already (or maybe I
>>> just haven't noticed).
>>
>> Good catch indeed.
>>
>> Trying to understand why I didn't notice this during testing, and I
>> think it's because I had order-10 pageblocks in my config. There is
>> this in compaction_capture():
>>
>> 	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
>> 		return false;
>>
>> Most compaction is for order-9 THPs on movable blocks, so I didn't get
>> much capturing in practice in order for that leak to be noticeable.
>
> This makes me wonder why not use 'cc->migratetype' for migratetype comparison, so that low-order (like mTHP) compaction can directly get the released pages, which could avoid some compaction scans without mixing the migratetype?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2facf844ef84..7a64020f8222 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -622,7 +622,7 @@ compaction_capture(struct capture_control *capc, struct page *page,
>          * and vice-versa but no more than normal fallback logic which can
>          * have trouble finding a high-order free page.
>          */
> -       if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
> +       if (order < pageblock_order && capc->cc->migratetype != migratetype)
>                 return false;
>
>         capc->page = page;

It is worth trying: when the original patch was written, mTHP did not exist,
so not capturing any MIGRATE_MOVABLE pages made sense. But with your change,
capture loses the opportunity to let an unmovable request use a reclaimable
pageblock and vice versa, as the comment says. Please update the comment
as well, and we should monitor for potential unmovable and reclaimable
regressions.


--
Best Regards,
Yan, Zi

Baolin Wang April 10, 2024, 8:49 a.m. UTC | #12

On 2024/4/9 22:46, Zi Yan wrote:
> On 9 Apr 2024, at 5:31, Baolin Wang wrote:
> 
>> On 2024/4/8 22:23, Johannes Weiner wrote:
>>> On Mon, Apr 08, 2024 at 09:38:20AM +0200, Vlastimil Babka wrote:
>>>> On 4/7/24 12:19 PM, Baolin Wang wrote:
>>>>> On 2024/3/21 02:02, Johannes Weiner wrote:
>>>>>>    +	account_freepages(page, zone, 1 << order, migratetype);
>>>>>> +
>>>>>>     	while (order < MAX_PAGE_ORDER) {
>>>>>> -		if (compaction_capture(capc, page, order, migratetype)) {
>>>>>> -			__mod_zone_freepage_state(zone, -(1 << order),
>>>>>> -								migratetype);
>>>>>> +		int buddy_mt = migratetype;
>>>>>> +
>>>>>> +		if (compaction_capture(capc, page, order, migratetype))
>>>>>>     			return;
>>>>>> -		}
>>>>>
>>>>> IIUC, if the released page is captured by compaction, then the
>>>>> statistics for free pages should be correspondingly decreased,
>>>>> otherwise, there will be a slight regression for my thpcompact benchmark.
>>>>>
>>>>> thpcompact Percentage Faults Huge
>>>>>                              k6.9-rc2-base        base + patch10 + 2 fixes	
>>>>> Percentage huge-1        78.18 (   0.00%)       71.92 (  -8.01%)
>>>>> Percentage huge-3        86.70 (   0.00%)       86.07 (  -0.73%)
>>>>> Percentage huge-5        90.26 (   0.00%)       78.02 ( -13.57%)
>>>>> Percentage huge-7        92.34 (   0.00%)       78.67 ( -14.81%)
>>>>> Percentage huge-12       91.18 (   0.00%)       81.04 ( -11.12%)
>>>>> Percentage huge-18       89.00 (   0.00%)       79.57 ( -10.60%)
>>>>> Percentage huge-24       90.52 (   0.00%)       80.07 ( -11.54%)
>>>>> Percentage huge-30       94.44 (   0.00%)       96.28 (   1.95%)
>>>>> Percentage huge-32       93.09 (   0.00%)       99.39 (   6.77%)
>>>>>
>>>>> I add below fix based on your fix 2, then the thpcompact Percentage
>>>>> looks good. How do you think for the fix?
>>>>
>>>> Yeah another well spotted, thanks. "slight regression" is an understatement,
>>>> this affects not just a "statistics" but very important counter
>>>> NR_FREE_PAGES which IIUC would eventually become larger than reality, make
>>>> the watermark checks false positive and result in depleted reserves etc etc.
>>>> Actually wondering why we're not seeing -next failures already (or maybe I
>>>> just haven't noticed).
>>>
>>> Good catch indeed.
>>>
>>> Trying to understand why I didn't notice this during testing, and I
>>> think it's because I had order-10 pageblocks in my config. There is
>>> this in compaction_capture():
>>>
>>> 	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
>>> 		return false;
>>>
>>> Most compaction is for order-9 THPs on movable blocks, so I didn't get
>>> much capturing in practice in order for that leak to be noticeable.
>>
>> This makes me wonder why not use 'cc->migratetype' for migratetype comparison, so that low-order (like mTHP) compaction can directly get the released pages, which could avoid some compaction scans without mixing the migratetype?
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 2facf844ef84..7a64020f8222 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -622,7 +622,7 @@ compaction_capture(struct capture_control *capc, struct page *page,
>>           * and vice-versa but no more than normal fallback logic which can
>>           * have trouble finding a high-order free page.
>>           */
>> -       if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
>> +       if (order < pageblock_order && capc->cc->migratetype != migratetype)
>>                  return false;
>>
>>          capc->page = page;
> 
> It is worth trying, since at the original patch time mTHP was not present and
> not capturing any MIGRATE_MOVABLE makes sense. But with your change, the capture
> will lose the opportunity of letting an unmovable request use a reclaimable
> pageblock and vice-versa, like the comment says. Please change the comment
> as well and we should monitor potential unmovable and reclaimable regression.

Yes, but I think this case is easy to solve. Anyway, let me try to take
some measurements for mTHP.
