mm/page_alloc: Clarify some migratetype fallback code

Message ID: 20250214-clarify-steal-v1-1-79dc5adf1b79@google.com

Commit Message

Brendan Jackman Feb. 14, 2025, 6:14 p.m. UTC
This code is rather confusing because:

 1. "Steal" is sometimes used to refer to the general concept of
    allocating from a block of a fallback migratetype
    (steal_suitable_fallback()) but sometimes it refers specifically to
    converting a whole block's migratetype (can_steal_fallback()).

 2. can_steal_fallback() sounds as though it's answering the question "am
    I functionally permitted to allocate from that other type" but in
    fact it is encoding a heuristic preference.

 3. The same piece of data has different names in different places:
    can_steal vs whole_block. This reinforces point 2 because it looks
    like the different names reflect a shift in intent from "am I
    allowed to steal" to "do I want to steal", but no such shift exists.

Fix 1. by avoiding the term "steal" in ambiguous contexts. This fixes
3. automatically since the natural name for can_steal is whole_block.

Fix 2. by using "should" instead of "can", and also rename its
parameters and add some commentary to make it more explicit what they
mean.
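
For reference, the resulting find_suitable_fallback() prototype (as in the
mm/internal.h hunk below):

	int find_suitable_fallback(struct free_area *area, unsigned int order,
				int migratetype, bool need_whole_block, bool *whole_block);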

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/compaction.c |  4 ++--
 mm/internal.h   |  2 +-
 mm/page_alloc.c | 42 ++++++++++++++++++++++--------------------
 3 files changed, 25 insertions(+), 23 deletions(-)


---
base-commit: 128c8f96eb8638c060cd3532dc394d046ce64fe1
change-id: 20250214-clarify-steal-f244880441c1

Best regards,

Comments

Johannes Weiner Feb. 14, 2025, 9:26 p.m. UTC | #1
On Fri, Feb 14, 2025 at 06:14:01PM +0000, Brendan Jackman wrote:
> This code is rather confusing because:
> 
>  1. "Steal" is sometimes used to refer to the general concept of
>     allocating from a block of a fallback migratetype
>     (steal_suitable_fallback()) but sometimes it refers specifically to
>     converting a whole block's migratetype (can_steal_fallback()).

Yes, that's ambiguous.

>  2. can_steal_fallback() sounds as though it's answering the question "am
>     I functionally permitted to allocate from that other type" but in
>     fact it is encoding a heuristic preference.

Here I don't see that nuance tbh.

>  3. The same piece of data has different names in different places:
>     can_steal vs whole_block. This reinforces point 2 because it looks
>     like the different names reflect a shift in intent from "am I
>     allowed to steal" to "do I want to steal", but no such shift exists.
> 
> Fix 1. by avoiding the term "steal" in ambiguous contexts. This fixes
> 3. automatically since the natural name for can_steal is whole_block.

I'm not a fan of whole_block because it loses the action verb. It
makes sense in the context of steal_suitable_fallback(), but becomes
ominous in find_suitable_fallback().

Maybe @block_claimable would be better?

> Fix 2. by using "should" instead of "can", and also rename its
> parameters and add some commentary to make it more explicit what they
> mean.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
>  mm/compaction.c |  4 ++--
>  mm/internal.h   |  2 +-
>  mm/page_alloc.c | 42 ++++++++++++++++++++++--------------------
>  3 files changed, 25 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 12ed8425fa175c5dec50bac3dddb13499abaaa11..8dccb2e388f128dd134ec6f24c924c118c9c93bb 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2332,7 +2332,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>  	ret = COMPACT_NO_SUITABLE_PAGE;
>  	for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
>  		struct free_area *area = &cc->zone->free_area[order];
> -		bool can_steal;
> +		bool whole_block;
>
>  		/* Job done if page is free of the right migratetype */
>  		if (!free_area_empty(area, migratetype))
> @@ -2349,7 +2349,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>  		 * other migratetype buddy lists.
>  		 */
>  		if (find_suitable_fallback(area, order, migratetype,
> -						true, &can_steal) != -1)
> +						true, &whole_block) != -1)

This one e.g. would look clearer with &block_claimable.

Not that it's actually used...

> @@ -1948,7 +1948,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
>  	if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
>  		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
>  
> -	/* We are not allowed to try stealing from the whole block */
> +	/* No point in stealing from the whole block */

The original comment actually makes more sense to me. Why is there no
point? It could well find enough free+alike pages to steal the
block... It's just not allowed to.

I will say, the current code is pretty hard to reason about:

On one hand we check the block size statically in can_steal_fallback;
on the other hand, we do that majority scan for compatible pages in
steal_suitable_fallback(). The effective outcomes are hard to discern,
and I'm honestly not convinced they're all intentional.

For example, if we're allowed to steal the block because of this in
can_steal_fallback():

	order >= pageblock_order/2

surely, we'll always satisfy this in steal_suitable_fallback()

	free_pages + alike_pages >= (1 << (pageblock_order-1))

on free_pages alone.

And if the order is less than half a block, we're only allowed an
attempt at stealing it if this is true in can_steal_fallback():

	start_type == MIGRATE_RECLAIMABLE ||
	start_type == MIGRATE_UNMOVABLE

So why is the majority scan in steal_suitable_fallback() checking:

	if (start_type == MIGRATE_MOVABLE)
		alike_pages = movable_pages

Here is how I read the effective rules:

- We always steal the block if at least half of it is free.

- If less than half is free, but more than half is compatible (free +
  alike), we currently do movable -> non-movable conversions.

  We don't do non-movable -> movable (won't get to the majority scan).
  This seems reasonable to me, as there seems to be little value in
  making a new pre-polluted movable block.

- However, we do non-movable -> movable conversion if more than half
  is free. This is seemingly in conflict with the previous point.

Then there is compaction, which currently uses only the
find_suitable_fallback() subset of the rules. Namely, for kernel
allocations, compaction stops as soon as there is an adequately sized
fallback. Even if the allocator won't convert the block due to the
majority scan. For movable requests, we'll stop if there is half a
block to fall back to. I suppose that's reasonable - the old
utilization vs. fragmentation debate aside...

Did I miss one?

We should be able to encode all this more concisely.

> @@ -1995,12 +1995,14 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
>  
>  /*
>   * Check whether there is a suitable fallback freepage with requested order.
> - * If only_stealable is true, this function returns fallback_mt only if
> - * we can steal other freepages all together. This would help to reduce
> + * Sets *whole_block to instruct the caller whether it should convert a whole
> + * pageblock to the returned migratetype.
> + * If need_whole_block is true, this function returns fallback_mt only if
> + * we would do this whole-block stealing. This would help to reduce
>   * fragmentation due to mixed migratetype pages in one pageblock.
>   */
>  int find_suitable_fallback(struct free_area *area, unsigned int order,
> -			int migratetype, bool only_stealable, bool *can_steal)
> +			int migratetype, bool need_whole_block, bool *whole_block)
>  {
>  	int i;
>  	int fallback_mt;
> @@ -2008,19 +2010,19 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
>  	if (area->nr_free == 0)
>  		return -1;
>  
> -	*can_steal = false;
> +	*whole_block = false;
>  	for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
>  		fallback_mt = fallbacks[migratetype][i];
>  		if (free_area_empty(area, fallback_mt))
>  			continue;
>  
> -		if (can_steal_fallback(order, migratetype))
> -			*can_steal = true;
> +		if (should_steal_whole_block(order, migratetype))
> +			*whole_block = true;
>  
> -		if (!only_stealable)
> +		if (!need_whole_block)
>  			return fallback_mt;
>  
> -		if (*can_steal)
> +		if (*whole_block)
>  			return fallback_mt;
>  	}

This loop is quite awkward, but I think it actually gets more awkward
with the new names.

Consider this instead:

		*block_claimable = can_claim_block(order, migratetype);
		if (*block_claimable || !need_whole_block)
			return fallback_mt;

Or better yet, inline that function completely. There are no other
callers, and consolidating the rules into fewer places would IMO go a
long way of making it easier to follow:

		if (order >= pageblock_order/2 ||
		    start_mt == MIGRATE_RECLAIMABLE ||
		    start_mt == MIGRATE_UNMOVABLE)
			*block_claimable = true;

		if (*block_claimable || !need_whole_block)
			return fallback_mt;
Brendan Jackman Feb. 17, 2025, 4:26 p.m. UTC | #2
On Fri, 14 Feb 2025 at 22:26, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >  2. can_steal_fallback() sounds as though it's answering the question "am
> >     I functionally permitted to allocate from that other type" but in
> >     fact it is encoding a heuristic preference.
>
> Here I don't see that nuance tbh.
>
> >  3. The same piece of data has different names in different places:
> >     can_steal vs whole_block. This reinforces point 2 because it looks
> >     like the different names reflect a shift in intent from "am I
> >     allowed to steal" to "do I want to steal", but no such shift exists.
> >
> > Fix 1. by avoiding the term "steal" in ambiguous contexts. This fixes
> > 3. automatically since the natural name for can_steal is whole_block.
>
> I'm not a fan of whole_block because it loses the action verb. It
> makes sense in the context of steal_suitable_fallback(), but becomes
> ominous in find_suitable_fallback().
>
> Maybe @block_claimable would be better?

Yeah that sounds good to me.

> > @@ -1948,7 +1948,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
> >       if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
> >               set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
> >
> > -     /* We are not allowed to try stealing from the whole block */
> > +     /* No point in stealing from the whole block */
>
> The original comment actually makes more sense to me. Why is there no
> point? It could well find enough free+alike pages to steal the
> block... It's just not allowed to.

OK so this is basically point 2 from the commit message, maybe I don't
really understand why this condition is here, and maybe I'm about to
look stupid.

Why don't we allow this code to steal the whole block? There wouldn't
be any functional bug if it did, right? I thought that !whole_block
just meant "flipping that block would not have any real benefit from a
fragmentation point of view". So we'd just be doing work and modifying
data structures for no good reason. Is there some stronger reason I'm
missing why we really _mustn't_ steal it?

> I will say, the current code is pretty hard to reason about:
>
> On one hand we check the block size statically in can_steal_fallback;
> on the other hand, we do that majority scan for compatible pages in
> steal_suitable_fallback(). The effective outcomes are hard to discern,
> and I'm honestly not convinced they're all intentional.
>
> For example, if we're allowed to steal the block because of this in
> can_steal_fallback():
>
>         order >= pageblock_order/2
>
> surely, we'll always satisfy this in steal_suitable_fallback()
>
>         free_pages + alike_pages >= (1 << (pageblock_order-1))
>
> on free_pages alone.

No, the former is half the _order_ the latter is half the _number of
pages_. So e.g. we could have order=6 (assuming pageblock_order=10)
then free_pages might be only 1<<6 which is less than 1<<9.
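
Concretely, a worked example (illustrative numbers only, taking
pageblock_order = 10):

	order      = 6        /* passes order >= pageblock_order / 2, i.e. 6 >= 5  */
	free_pages = 1 << 6   /* may be as low as the stolen buddy itself, i.e. 64 */

	64 < (1 << (pageblock_order - 1)) = 512

so passing the order check in can_steal_fallback() does not by itself
guarantee that the majority scan in steal_suitable_fallback() succeeds.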

Anyway your underlying point that this is confusing is obviously correct!

> And if the order is less than half a block, we're only allowed an
> attempt at stealing it if this is true in can_steal_fallback():
>
>         start_type == MIGRATE_RECLAIMABLE ||
>         start_type == MIGRATE_UNMOVABLE
>
> So why is the majority scan in steal_suitable_fallback() checking:
>
>         if (start_type == MIGRATE_MOVABLE)
>                 alike_pages = movable_pages
>
> Here is how I read the effective rules:
>
> - We always steal the block if at least half of it is free.
>
> - If less than half is free, but more than half is compatible (free +
>   alike), we currently do movable -> non-movable conversions.
>
>   We don't do non-movable -> movable (won't get to the majority scan).
>   This seems reasonable to me, as there seems to be little value in
>   making a new pre-polluted movable block.
>
> - However, we do non-movable -> movable conversion if more than half
>   is free. This is seemingly in conflict with the previous point.

Hmm, I'm not sure I'm seeing the "conflict"; I think you just have to
word it differently. It's like:

- For movable allocations, there's a threshold of the square root of
the pageblock size (??) before we consider stealing.

- Otherwise, we steal the block if more than half is compatible.

- If this threshold prevents us from stealing the whole block, we find
the page via the smallest-order freelist possible instead of largest.

Does that seem right to you?

It feels like that last point has something to do with: if we know in
advance that we aren't gonna steal the whole block, we wanna avoid
breaking down a high-order page. But, if the allocation is movable, we
wouldn't create persistent fragmentation by doing that. So I'm
realising now that I don't entirely understand this.

> Then there is compaction, which currently uses only the
> find_suitable_fallback() subset of the rules. Namely, for kernel
> allocations, compaction stops as soon as there is an adequately sized
> fallback. Even if the allocator won't convert the block due to the
> majority scan. For movable requests, we'll stop if there is half a
> block to fall back to.

Not half a block, "square root" of a block. But yeah.

> I suppose that's reasonable - the old
> utilization vs. fragmentation debate aside...
>
> Did I miss one?
>
> We should be able to encode all this more concisely.
>
> > @@ -1995,12 +1995,14 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
> >
> >  /*
> >   * Check whether there is a suitable fallback freepage with requested order.
> > - * If only_stealable is true, this function returns fallback_mt only if
> > - * we can steal other freepages all together. This would help to reduce
> > + * Sets *whole_block to instruct the caller whether it should convert a whole
> > + * pageblock to the returned migratetype.
> > + * If need_whole_block is true, this function returns fallback_mt only if
> > + * we would do this whole-block stealing. This would help to reduce
> >   * fragmentation due to mixed migratetype pages in one pageblock.
> >   */
> >  int find_suitable_fallback(struct free_area *area, unsigned int order,
> > -                     int migratetype, bool only_stealable, bool *can_steal)
> > +                     int migratetype, bool need_whole_block, bool *whole_block)
> >  {
> >       int i;
> >       int fallback_mt;
> > @@ -2008,19 +2010,19 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
> >       if (area->nr_free == 0)
> >               return -1;
> >
> > -     *can_steal = false;
> > +     *whole_block = false;
> >       for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
> >               fallback_mt = fallbacks[migratetype][i];
> >               if (free_area_empty(area, fallback_mt))
> >                       continue;
> >
> > -             if (can_steal_fallback(order, migratetype))
> > -                     *can_steal = true;
> > +             if (should_steal_whole_block(order, migratetype))
> > +                     *whole_block = true;
> >
> > -             if (!only_stealable)
> > +             if (!need_whole_block)
> >                       return fallback_mt;
> >
> > -             if (*can_steal)
> > +             if (*whole_block)
> >                       return fallback_mt;
> >       }
>
> This loop is quite awkward, but I think it actually gets more awkward
> with the new names.
>
> Consider this instead:
>
>                 *block_claimable = can_claim_block(order, migratetype);
>                 if (*block_claimable || !need_whole_block)
>                         return fallback_mt;

Yeah and even just combining the conditionals makes this much easier
to follow IMO.

> Or better yet, inline that function completely. There are no other
> callers, and consolidating the rules into fewer places would IMO go a
> long way of making it easier to follow:
>
>                 if (order >= pageblock_order/2 ||
>                     start_mt == MIGRATE_RECLAIMABLE ||
>                     start_mt == MIGRATE_UNMOVABLE)
>                         *block_claimable = true;
>
>                 if (*block_claimable || !need_whole_block)
>                         return fallback_mt;

Sounds good. There might also be some clarity to be gained by fiddling
around with the comments and polarity of conditions too.

E.g. It's confusing that the comment above that first conditional
talks about returning false for movable pages, but the conditional is
about returning true for unmovable.
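
(For reference, the pair in question, as it appears in the patch below:
the comment ends with "the next movable allocation may not need to steal",
while the check itself reads

	if (order >= pageblock_order / 2 ||
	    start_mt == MIGRATE_RECLAIMABLE ||
	    start_mt == MIGRATE_UNMOVABLE)
		return true;

i.e. it is phrased in terms of when stealing *is* wanted.)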

Anyway I will have a bit more of a think about this.
Vlastimil Babka Feb. 18, 2025, 10:19 a.m. UTC | #3
On 2/17/25 17:26, Brendan Jackman wrote:
> On Fri, 14 Feb 2025 at 22:26, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> >  2. can_steal_fallback() sounds as though it's answering the question "am
>> >     I functionally permitted to allocate from that other type" but in
>> >     fact it is encoding a heuristic preference.
>>
>> Here I don't see that nuance tbh.

I think I do.

>>
>> >  3. The same piece of data has different names in different places:
>> >     can_steal vs whole_block. This reinforces point 2 because it looks

I think some of the weirdness comes from the time before the migratetype
hygiene series. IIRC it was possible to steal just the page we want to
allocate; that page plus extra pages but not the whole block; or the whole
block. Now it's either just the page or the whole block.

>> >     like the different names reflect a shift in intent from "am I
>> >     allowed to steal" to "do I want to steal", but no such shift exists.
>> >
>> > Fix 1. by avoiding the term "steal" in ambiguous contexts. This fixes
>> > 3. automatically since the natural name for can_steal is whole_block.
>>
>> I'm not a fan of whole_block because it loses the action verb. It
>> makes sense in the context of steal_suitable_fallback(), but becomes
>> ominous in find_suitable_fallback().
>>
>> Maybe @block_claimable would be better?

How about @claim_block? That's even more of an "action verb", as the verb is
not passive.

> 
> Yeah that sounds good to me.
> 
>> > @@ -1948,7 +1948,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
>> >       if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
>> >               set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
>> >
>> > -     /* We are not allowed to try stealing from the whole block */
>> > +     /* No point in stealing from the whole block */
>>
>> The original comment actually makes more sense to me. Why is there no
>> point? It could well find enough free+alike pages to steal the
>> block... It's just not allowed to.
> 
> OK so this is basically point 2 from the commit message, maybe I don't
> really understand why this condition is here, and maybe I'm about to
> look stupid.
> 
> Why don't we allow this code to steal the whole block? There wouldn't
> be any functional bug if it did, right? I thought that !whole_block
> just meant "flipping that block would not have any real benefit from a
> fragmentation point of view". So we'd just be doing work and modifying
> data structures for no good reason. Is there some stronger reason I'm
> missing why we really _mustn't_ steal it?

Agreed with your view.

>> I will say, the current code is pretty hard to reason about:
>>
>> On one hand we check the block size statically in can_steal_fallback;
>> on the other hand, we do that majority scan for compatible pages in
>> steal_suitable_fallback(). The effective outcomes are hard to discern,
>> and I'm honestly not convinced they're all intentional.
>>
>> For example, if we're allowed to steal the block because of this in
>> can_steal_fallback():
>>
>>         order >= pageblock_order/2
>>
>> surely, we'll always satisfy this in steal_suitable_fallback()
>>
>>         free_pages + alike_pages >= (1 << (pageblock_order-1))
>>
>> on free_pages alone.
> 
> No, the former is half the _order_ the latter is half the _number of
> pages_. So e.g. we could have order=6 (assuming pageblock_order=10)
> then free_pages might be only 1<<6 which is less than 1<<9.
> 
> Anyway your underlying point that this is confusing is obviously correct!
> 
>> And if the order is less than half a block, we're only allowed an
>> attempt at stealing it if this is true in can_steal_fallback():
>>
>>         start_type == MIGRATE_RECLAIMABLE ||
>>         start_type == MIGRATE_UNMOVABLE
>>
>> So why is the majority scan in steal_suitable_fallback() checking:
>>
>>         if (start_type == MIGRATE_MOVABLE)
>>                 alike_pages = movable_pages
>>
>> Here is how I read the effective rules:
>>
>> - We always steal the block if at least half of it is free.
>>
>> - If less than half is free, but more than half is compatible (free +
>>   alike), we currently do movable -> non-movable conversions.
>>
>>   We don't do non-movable -> movable (won't get to the majority scan).
>>   This seems reasonable to me, as there seems to be little value in
>>   making a new pre-polluted movable block.
>>
>> - However, we do non-movable -> movable conversion if more than half
>>   is free. This is seemingly in conflict with the previous point.
> 
> Hmm, I'm not sure I'm seeing the "conflict"; I think you just have to
> word it differently. It's like:
> 
> - For movable allocations, there's a threshold of the square root of
> the pageblock size (??) before we consider stealing.
> 
> - Otherwise, we steal the block if more than half is compatible.
> 
> - If this threshold prevents us from stealing the whole block, we find
> the page via the smallest-order freelist possible instead of largest.
> 
> Does that seem right to you?
> 
> It feels like that last point has something to do with: if we know in
> advance that we aren't gonna steal the whole block, we wanna avoid
> breaking down a high-order page. But, if the allocation is movable, we
> wouldn't create persistent fragmentation by doing that. So I'm
> realising now that I don't entirely understand this.

Aha! Looks like this is also a leftover from before the migratetype hygiene
series that went unnoticed. The logic was, if we're making an unmovable
allocation stealing from a movable block, even if we don't claim the whole
block, at least steal the highest order available. Then more unmovable
allocations can be satisfied from what remains of the high-order split,
before we have to steal from another movable pageblock.

If we're making a movable allocation stealing from an unmovable pageblock,
we don't need to make this buffer, as polluting unmovable pageblocks with
movable allocations is not that bad - they can be compacted later. So we go
for the smallest order we need.

However IIUC this is all moot now. If we don't claim the whole pageblock,
and split a high-order page, the remains of the split will go to the
freelists of the migratetype of the unclaimed pageblock (i.e. movable);
previously (before migratetype hygiene) they would go to the freelists of
the migratetype we want to allocate (unmovable).

So now it makes no sense to want the highest order if we're not claiming the
whole pageblock, we're just causing more fragmentation for no good reason.
We should always do the find_smallest. It would be good to fix this.
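
A minimal version of that fix might look like this (untested sketch against
the current __rmqueue_fallback(), just to show the direction):

	-	if (!can_steal && start_migratetype == MIGRATE_MOVABLE
	-				&& current_order > order)
	+	if (!can_steal)
	 		goto find_smallest;

i.e. whenever we aren't claiming the whole block, take the smallest
suitable buddy rather than the largest.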

>> Then there is compaction, which currently uses only the
>> find_suitable_fallback() subset of the rules. Namely, for kernel
>> allocations, compaction stops as soon as there is an adequately sized
>> fallback. Even if the allocator won't convert the block due to the
>> majority scan. For movable requests, we'll stop if there is half a
>> block to fall back to.
> 
> Not half a block, "square root" of a block. But yeah.
> 
>> I suppose that's reasonable - the old
>> utilization vs. fragmentation debate aside...
>>
>> Did I miss one?
>>
>> We should be able to encode all this more concisely.
>>
>> > @@ -1995,12 +1995,14 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
>> >
>> >  /*
>> >   * Check whether there is a suitable fallback freepage with requested order.
>> > - * If only_stealable is true, this function returns fallback_mt only if
>> > - * we can steal other freepages all together. This would help to reduce
>> > + * Sets *whole_block to instruct the caller whether it should convert a whole
>> > + * pageblock to the returned migratetype.
>> > + * If need_whole_block is true, this function returns fallback_mt only if
>> > + * we would do this whole-block stealing. This would help to reduce
>> >   * fragmentation due to mixed migratetype pages in one pageblock.
>> >   */
>> >  int find_suitable_fallback(struct free_area *area, unsigned int order,
>> > -                     int migratetype, bool only_stealable, bool *can_steal)
>> > +                     int migratetype, bool need_whole_block, bool *whole_block)
>> >  {
>> >       int i;
>> >       int fallback_mt;
>> > @@ -2008,19 +2010,19 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
>> >       if (area->nr_free == 0)
>> >               return -1;
>> >
>> > -     *can_steal = false;
>> > +     *whole_block = false;
>> >       for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
>> >               fallback_mt = fallbacks[migratetype][i];
>> >               if (free_area_empty(area, fallback_mt))
>> >                       continue;
>> >
>> > -             if (can_steal_fallback(order, migratetype))
>> > -                     *can_steal = true;
>> > +             if (should_steal_whole_block(order, migratetype))
>> > +                     *whole_block = true;
>> >
>> > -             if (!only_stealable)
>> > +             if (!need_whole_block)
>> >                       return fallback_mt;
>> >
>> > -             if (*can_steal)
>> > +             if (*whole_block)
>> >                       return fallback_mt;
>> >       }
>>
>> This loop is quite awkward, but I think it actually gets more awkward
>> with the new names.
>>
>> Consider this instead:
>>
>>                 *block_claimable = can_claim_block(order, migratetype);
>>                 if (*block_claimable || !need_whole_block)
>>                         return fallback_mt;
> 
> Yeah and even just combining the conditionals makes this much easier
> to follow IMO.
> 
>> Or better yet, inline that function completely. There are no other
>> callers, and consolidating the rules into fewer places would IMO go a
>> long way of making it easier to follow:
>>
>>                 if (order >= pageblock_order/2 ||
>>                     start_mt == MIGRATE_RECLAIMABLE ||
>>                     start_mt == MIGRATE_UNMOVABLE)
>>                         *block_claimable = true;
>>
>>                 if (*block_claimable || !need_whole_block)
>>                         return fallback_mt;
> 
> Sounds good. There might also be some clarity to be gained by fiddling
> around with the comments and polarity of conditions too.

Agreed.

Would it make sense for find_suitable_fallback() to have only a single
"bool *whole_block" parameter? The value the caller initializes it with
would mean the current need_whole_block; the value it has upon return would
instruct the caller what to do. It would mean __compact_finished() would no
longer pass an unused parameter.
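
Something like this, as a hypothetical sketch (reusing the @claim_block
name from above):

	/*
	 * On entry, *claim_block tells find_suitable_fallback() whether the
	 * caller requires a whole-block claim; on return, it tells the
	 * caller whether it should actually claim the block.
	 */
	int find_suitable_fallback(struct free_area *area, unsigned int order,
				int migratetype, bool *claim_block);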

> E.g. It's confusing that the comment above that first conditional
> talks about returning false for movable pages, but the conditional is
> about returning true for unmovable.
> 
> Anyway I will have a bit more of a think about this.
Johannes Weiner Feb. 18, 2025, 8:38 p.m. UTC | #4
On Tue, Feb 18, 2025 at 11:19:58AM +0100, Vlastimil Babka wrote:
> On 2/17/25 17:26, Brendan Jackman wrote:
> > On Fri, 14 Feb 2025 at 22:26, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >> >  2. can_steal_fallback() sounds as though it's answering the question "am
> >> >     I functionally permitted to allocate from that other type" but in
> >> >     fact it is encoding a heuristic preference.
> >>
> >> Here I don't see that nuance tbh.
> 
> I think I do.
> 
> >>
> >> >  3. The same piece of data has different names in different places:
> >> >     can_steal vs whole_block. This reinforces point 2 because it looks
> 
> I think some of the weirdness comes from the time before the migratetype
> hygiene series. IIRC it was possible to steal just the page we want to
> allocate; that page plus extra pages but not the whole block; or the whole
> block. Now it's either just the page or the whole block.
> 
> >> >     like the different names reflect a shift in intent from "am I
> >> >     allowed to steal" to "do I want to steal", but no such shift exists.
> >> >
> >> > Fix 1. by avoiding the term "steal" in ambiguous contexts. This fixes
> >> > 3. automatically since the natural name for can_steal is whole_block.
> >>
> >> I'm not a fan of whole_block because it loses the action verb. It
> >> makes sense in the context of steal_suitable_fallback(), but becomes
> >> ominous in find_suitable_fallback().
> >>
> >> Maybe @block_claimable would be better?
> 
> How about @claim_block? That's even more of an "action verb", as the verb
> is not passive.

Sounds good to me.

> > Yeah that sounds good to me.
> > 
> >> > @@ -1948,7 +1948,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
> >> >       if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
> >> >               set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
> >> >
> >> > -     /* We are not allowed to try stealing from the whole block */
> >> > +     /* No point in stealing from the whole block */
> >>
> >> The original comment actually makes more sense to me. Why is there no
> >> point? It could well find enough free+alike pages to steal the
> >> block... It's just not allowed to.
> > 
> > OK so this is basically point 2 from the commit message, maybe I don't
> > really understand why this condition is here, and maybe I'm about to
> > look stupid.
> > 
> > Why don't we allow this code to steal the whole block? There wouldn't
> > be any functional bug if it did, right? I thought that !whole_block
> > just meant "flipping that block would not have any real benefit from a
> > fragmentation point of view". So we'd just be doing work and modifying
> > data structures for no good reason. Is there some stronger reason I'm
> > missing why we really _mustn't_ steal it?
> 
> Agreed with your view.

Thanks, I hadn't looked at it this way. There is also this comment

 * If we are stealing a relatively large buddy page, it is likely there will
 * be more free pages in the pageblock, so try to steal them all.

explaining that it's about whether there is a point in trying.

> >> I will say, the current code is pretty hard to reason about:
> >>
> >> On one hand we check the block size statically in can_steal_fallback;
> >> on the other hand, we do that majority scan for compatible pages in
> >> steal_suitable_fallback(). The effective outcomes are hard to discern,
> >> and I'm honestly not convinced they're all intentional.
> >>
> >> For example, if we're allowed to steal the block because of this in
> >> can_steal_fallback():
> >>
> >>         order >= pageblock_order/2
> >>
> >> surely, we'll always satisfy this in steal_suitable_fallback()
> >>
> >>         free_pages + alike_pages >= (1 << (pageblock_order-1))
> >>
> >> on free_pages alone.
> > 
> > No, the former is half the _order_ the latter is half the _number of
> > pages_. So e.g. we could have order=6 (assuming pageblock_order=10)
> > then free_pages might be only 1<<6 which is less than 1<<9.

Doh, absolute brainfart. I should have known better:

"On Fri, 14 Feb 2025 at 22:26, Johannes Weiner <hannes@cmpxchg.org> wrote:"
    ^^^                 ^^^^^

> Aha! Looks like this is also a leftover from before migratetype hygiene
> series that went unnoticed. The logic was, if we're making an unmovable
> allocation stealing from a movable block, even if we don't claim the whole
> block, at least steal the highest order available. Then more unmovable
> allocations can be satisfied from what remains of the high-order split,
> before we have to steal from another movable pageblock.

Ah, right. So it was

	1. buddy >= pageblock_order:	steal free pages & claim type
	2. buddy >= pageblock_order/2:	steal free pages
	3. otherwise:			steal only the requested page

The hygiene patches eliminated case 2 by disallowing type mismatches
between freelists and the pages on them.

That's why the pageblock_order/2 check looks a lot more spurious now
than it used to.

> If we're making a movable allocation stealing from an unmovable pageblock,
> we don't need to make this buffer, as polluting unmovable pageblocks with
> movable allocations is not that bad - they can be compacted later. So we go
> for the smallest order we need.
> 
> However IIUC this is all moot now. If we don't claim the whole pageblock,
> and split a high-order page, the remains of the split will go to the
> freelists of the migratetype of the unclaimed pageblock (i.e. movable);
> previously (before migratetype hygiene) they would go to the freelists of
> the migratetype we want to allocate (unmovable).
> 
> So now it makes no sense to want the highest order if we're not claiming the
> whole pageblock, we're just causing more fragmentation for no good reason.
> We should always do the find_smallest. It would be good to fix this.

That's a good point.

It still makes sense to have the two passes, though, right? One pass
where we try to steal a whole block starting from the biggest buddies;
and then one pass where we try to steal an individual page starting
from the smallest buddies.

Something like this, completely untested draft:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d6644aeaabec..f16e3f2bf3dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1911,13 +1911,12 @@ static inline bool boost_watermark(struct zone *zone)
  * can claim the whole pageblock for the requested migratetype. If not, we check
  * the pageblock for constituent pages; if at least half of the pages are free
  * or compatible, we can still claim the whole block, so pages freed in the
- * future will be put on the correct free list. Otherwise, we isolate exactly
- * the order we need from the fallback block and leave its migratetype alone.
+ * future will be put on the correct free list.
  */
 static struct page *
-steal_suitable_fallback(struct zone *zone, struct page *page,
-			int current_order, int order, int start_type,
-			unsigned int alloc_flags, bool whole_block)
+try_to_steal_block(struct zone *zone, struct page *page,
+		   int current_order, int order, int start_type,
+		   unsigned int alloc_flags)
 {
 	int free_pages, movable_pages, alike_pages;
 	unsigned long start_pfn;
@@ -1930,7 +1929,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
 	 * highatomic accounting.
 	 */
 	if (is_migrate_highatomic(block_type))
-		goto single_page;
+		return NULL;
 
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
@@ -1951,14 +1950,10 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
 		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 
-	/* We are not allowed to try stealing from the whole block */
-	if (!whole_block)
-		goto single_page;
-
 	/* moving whole block can fail due to zone boundary conditions */
 	if (!prep_move_freepages_block(zone, page, &start_pfn, &free_pages,
 				       &movable_pages))
-		goto single_page;
+		return NULL;
 
 	/*
 	 * Determine how many pages are compatible with our allocation.
@@ -1991,9 +1986,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
 		return __rmqueue_smallest(zone, order, start_type);
 	}
 
-single_page:
-	page_del_and_expand(zone, page, order, current_order, block_type);
-	return page;
+	return NULL;
 }
 
 /*
@@ -2216,23 +2209,16 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 		if (fallback_mt == -1)
 			continue;
 
-		/*
-		 * We cannot steal all free pages from the pageblock and the
-		 * requested migratetype is movable. In that case it's better to
-		 * steal and split the smallest available page instead of the
-		 * largest available page, because even if the next movable
-		 * allocation falls back into a different pageblock than this
-		 * one, it won't cause permanent fragmentation.
-		 */
-		if (!can_steal && start_migratetype == MIGRATE_MOVABLE
-					&& current_order > order)
+		if (!can_steal)
 			goto find_smallest;
 
-		goto do_steal;
+		page = get_page_from_free_area(area, fallback_mt);
+		page = try_to_steal_block(zone, page, current_order, order,
+					  start_migratetype, alloc_flags);
+		if (page)
+			goto out;
 	}
 
-	return NULL;
-
 find_smallest:
 	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
 		area = &(zone->free_area[current_order]);
@@ -2248,13 +2234,9 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	 */
 	VM_BUG_ON(current_order > MAX_PAGE_ORDER);
 
-do_steal:
 	page = get_page_from_free_area(area, fallback_mt);
-
-	/* take off list, maybe claim block, expand remainder */
-	page = steal_suitable_fallback(zone, page, current_order, order,
-				       start_migratetype, alloc_flags, can_steal);
-
+	page_del_and_expand(zone, page, order, current_order, fallback_mt);
+out:
 	trace_mm_page_alloc_extfrag(page, order, current_order,
 		start_migratetype, fallback_mt);
Vlastimil Babka Feb. 19, 2025, 11:01 a.m. UTC | #5
On 2/18/25 21:38, Johannes Weiner wrote:
> It still makes sense to have the two passes, though, right? One pass
> where we try to steal a whole block starting from the biggest buddies;
> and then one pass where we try to steal an individual page starting
> from the smallest buddies.

Yes.

> Something like this, completely untested draft:

Looks good to me!

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index 12ed8425fa175c5dec50bac3dddb13499abaaa11..8dccb2e388f128dd134ec6f24c924c118c9c93bb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2332,7 +2332,7 @@  static enum compact_result __compact_finished(struct compact_control *cc)
 	ret = COMPACT_NO_SUITABLE_PAGE;
 	for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
 		struct free_area *area = &cc->zone->free_area[order];
-		bool can_steal;
+		bool whole_block;
 
 		/* Job done if page is free of the right migratetype */
 		if (!free_area_empty(area, migratetype))
@@ -2349,7 +2349,7 @@  static enum compact_result __compact_finished(struct compact_control *cc)
 		 * other migratetype buddy lists.
 		 */
 		if (find_suitable_fallback(area, order, migratetype,
-						true, &can_steal) != -1)
+						true, &whole_block) != -1)
 			/*
 			 * Movable pages are OK in any pageblock. If we are
 			 * stealing for a non-movable allocation, make sure
diff --git a/mm/internal.h b/mm/internal.h
index 109ef30fee11f8b399f6bac42eab078cd51e01a5..c22d2826fd8d8681c89bb783ed269cc9346b5d92 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -847,7 +847,7 @@  void init_cma_reserved_pageblock(struct page *page);
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 
 int find_suitable_fallback(struct free_area *area, unsigned int order,
-			int migratetype, bool only_stealable, bool *can_steal);
+			int migratetype, bool need_whole_block, bool *whole_block);
 
 static inline bool free_area_empty(struct free_area *area, int migratetype)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 579789600a3c7bfb7b0d847d51af702a9d4b139a..75900f9b538eb0a241401af888643df850840436 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1832,12 +1832,12 @@  static void change_pageblock_range(struct page *pageblock_page,
  *
  * If we are stealing a relatively large buddy page, it is likely there will
  * be more free pages in the pageblock, so try to steal them all. For
- * reclaimable and unmovable allocations, we steal regardless of page size,
- * as fragmentation caused by those allocations polluting movable pageblocks
- * is worse than movable allocations stealing from unmovable and reclaimable
- * pageblocks.
+ * reclaimable and unmovable allocations, we steal the whole block regardless of
+ * page size, as fragmentation caused by those allocations polluting movable
+ * pageblocks is worse than movable allocations stealing from unmovable and
+ * reclaimable pageblocks.
  */
-static bool can_steal_fallback(unsigned int order, int start_mt)
+static bool should_steal_whole_block(unsigned int order, int start_mt)
 {
 	/*
 	 * Leaving this order check is intended, although there is
@@ -1855,7 +1855,7 @@  static bool can_steal_fallback(unsigned int order, int start_mt)
 	 * reclaimable pages that are closest to the request size.  After a
 	 * while, memory compaction may occur to form large contiguous pages,
 	 * and the next movable allocation may not need to steal.  Unmovable and
-	 * reclaimable allocations need to actually steal pages.
+	 * reclaimable allocations need to actually steal the whole block.
 	 */
 	if (order >= pageblock_order / 2 ||
 		start_mt == MIGRATE_RECLAIMABLE ||
@@ -1948,7 +1948,7 @@  steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
 		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 
-	/* We are not allowed to try stealing from the whole block */
+	/* No point in stealing from the whole block */
 	if (!whole_block)
 		goto single_page;
 
@@ -1995,12 +1995,14 @@  steal_suitable_fallback(struct zone *zone, struct page *page,
 
 /*
  * Check whether there is a suitable fallback freepage with requested order.
- * If only_stealable is true, this function returns fallback_mt only if
- * we can steal other freepages all together. This would help to reduce
+ * Sets *whole_block to instruct the caller whether it should convert a whole
+ * pageblock to the returned migratetype.
+ * If need_whole_block is true, this function returns fallback_mt only if
+ * we would do this whole-block stealing. This would help to reduce
  * fragmentation due to mixed migratetype pages in one pageblock.
  */
 int find_suitable_fallback(struct free_area *area, unsigned int order,
-			int migratetype, bool only_stealable, bool *can_steal)
+			int migratetype, bool need_whole_block, bool *whole_block)
 {
 	int i;
 	int fallback_mt;
@@ -2008,19 +2010,19 @@  int find_suitable_fallback(struct free_area *area, unsigned int order,
 	if (area->nr_free == 0)
 		return -1;
 
-	*can_steal = false;
+	*whole_block = false;
 	for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
 		fallback_mt = fallbacks[migratetype][i];
 		if (free_area_empty(area, fallback_mt))
 			continue;
 
-		if (can_steal_fallback(order, migratetype))
-			*can_steal = true;
+		if (should_steal_whole_block(order, migratetype))
+			*whole_block = true;
 
-		if (!only_stealable)
+		if (!need_whole_block)
 			return fallback_mt;
 
-		if (*can_steal)
+		if (*whole_block)
 			return fallback_mt;
 	}
 
@@ -2190,7 +2192,7 @@  __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	int min_order = order;
 	struct page *page;
 	int fallback_mt;
-	bool can_steal;
+	bool whole_block;
 
 	/*
 	 * Do not steal pages from freelists belonging to other pageblocks
@@ -2209,7 +2211,7 @@  __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 				--current_order) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
-				start_migratetype, false, &can_steal);
+				start_migratetype, false, &whole_block);
 		if (fallback_mt == -1)
 			continue;
 
@@ -2221,7 +2223,7 @@  __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 		 * allocation falls back into a different pageblock than this
 		 * one, it won't cause permanent fragmentation.
 		 */
-		if (!can_steal && start_migratetype == MIGRATE_MOVABLE
+		if (!whole_block && start_migratetype == MIGRATE_MOVABLE
 					&& current_order > order)
 			goto find_smallest;
 
@@ -2234,7 +2236,7 @@  __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
-				start_migratetype, false, &can_steal);
+				start_migratetype, false, &whole_block);
 		if (fallback_mt != -1)
 			break;
 	}
@@ -2250,7 +2252,7 @@  __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 
 	/* take off list, maybe claim block, expand remainder */
 	page = steal_suitable_fallback(zone, page, current_order, order,
-				       start_migratetype, alloc_flags, can_steal);
+				       start_migratetype, alloc_flags, whole_block);
 
 	trace_mm_page_alloc_extfrag(page, order, current_order,
 		start_migratetype, fallback_mt);