
[6/7] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves

Message ID 20230109151631.24923-7-mgorman@techsingularity.net (mailing list archive)
State New
Series: Discard __GFP_ATOMIC

Commit Message

Mel Gorman Jan. 9, 2023, 3:16 p.m. UTC
Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER, which is a bit
vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
other non-blocking allocation requests equal access to reserves. Rename
ALLOC_HARDER to ALLOC_NON_BLOCK to make it clearer what the flag
means.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h   |  7 +++++--
 mm/page_alloc.c | 23 +++++++++++++----------
 2 files changed, 18 insertions(+), 12 deletions(-)

Comments

Vlastimil Babka Jan. 11, 2023, 2:12 p.m. UTC | #1
On 1/9/23 16:16, Mel Gorman wrote:
> Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> other non-blocking allocation requests equal access to reserve.  Rename
> ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> means.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Michal Hocko Jan. 11, 2023, 3:58 p.m. UTC | #2
On Mon 09-01-23 15:16:30, Mel Gorman wrote:
> Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> other non-blocking allocation requests equal access to reserve.  Rename
> ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> means.

GFP_NOWAIT can also be used for opportunistic allocations which can and
should fail quickly if memory is tight and a more elaborate path
should be taken (e.g. try a higher-order allocation first but fall back to
a smaller request if memory is fragmented). Do we really want to give
those access to memory reserves as well?
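The opportunistic pattern described here can be sketched in plain user-space C. Everything below is illustrative: `try_alloc`, `alloc_buffer` and the `fragmented` parameter are made-up stand-ins for the real allocator, which would fail high-order non-blocking requests fast instead of reclaiming:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in allocator: high-order requests fail fast when memory is
 * "fragmented", mimicking a GFP_NOWAIT-style attempt (no reclaim). */
void *try_alloc(unsigned int order, int fragmented)
{
	if (order > 0 && fragmented)
		return NULL;
	return malloc(4096UL << order);
}

/* The opportunistic pattern: try a large contiguous buffer first,
 * fall back to the minimum order that still works. */
void *alloc_buffer(unsigned int preferred_order, int fragmented,
		   unsigned int *got_order)
{
	void *p = try_alloc(preferred_order, fragmented);

	if (p) {
		*got_order = preferred_order;
		return p;
	}
	*got_order = 0;
	return try_alloc(0, fragmented);
}
```

A caller like this never needs reserves: failure of the fast attempt is handled gracefully by the smaller request.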
Mel Gorman Jan. 11, 2023, 5:05 p.m. UTC | #3
On Wed, Jan 11, 2023 at 04:58:02PM +0100, Michal Hocko wrote:
> On Mon 09-01-23 15:16:30, Mel Gorman wrote:
> > Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> > vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> > other non-blocking allocation requests equal access to reserve.  Rename
> > ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> > means.
> 
> GFP_NOWAIT can be also used for opportunistic allocations which can and
> should fail quickly if the memory is tight and more elaborate path
> should be taken (e.g. try higher order allocation first but fall back to
> smaller request if the memory is fragmented). Do we really want to give
> those access to memory reserves as well?

Good question. Without __GFP_ATOMIC, GFP_NOWAIT only differs from GFP_ATOMIC
by __GFP_HIGH, but that is not enough to distinguish a caller that
cannot sleep from one that is speculatively attempting an allocation but
has other options. That changelog is misleading: it's not equal access,
as GFP_NOWAIT ends up with 25% of the reserves, which is less than what
GFP_ATOMIC gets.
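The single-bit difference can be sketched with illustrative flag values (the real bit values live in include/linux/gfp_types.h and differ; the composition shown, GFP_ATOMIC as __GFP_HIGH | __GFP_KSWAPD_RECLAIM versus GFP_NOWAIT as __GFP_KSWAPD_RECLAIM, is the post-series state):

```c
#include <assert.h>

/* Illustrative bit values only, not the real kernel definitions. */
#define __GFP_HIGH		0x20u
#define __GFP_DIRECT_RECLAIM	0x400u
#define __GFP_KSWAPD_RECLAIM	0x800u

/* After __GFP_ATOMIC is removed, the two masks differ in exactly one
 * bit: __GFP_HIGH.  Neither allows direct reclaim (sleeping). */
#define GFP_ATOMIC	(__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
```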

Because it becomes impossible to distinguish between non-blocking and
atomic without __GFP_ATOMIC, there is some justification for allowing
access to reserves for GFP_NOWAIT. bio for example attempts an allocation
(clears __GFP_DIRECT_RECLAIM) before falling back to mempool but delays
in IO can also lead to further allocation pressure. mmu gather failing
GFP_NOWAIT slows the rate at which memory can be freed. NFS failing GFP_NOWAIT will
have to retry IOs multiple times. The examples were picked at random but
the point is that there are cases where failing GFP_NOWAIT can degrade
the system, particularly delay the cleaning of pages before reclaim.
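The bio-style fallback mentioned above has a recognisable shape, sketched here in user-space C (all names and the pressure parameter are illustrative stand-ins, not the real bio or mempool API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define __GFP_DIRECT_RECLAIM	0x400u	/* illustrative bit value */

/* Stand-in page allocator: without direct reclaim it may fail fast
 * when the system is under memory pressure. */
void *page_alloc(unsigned int gfp, int under_pressure)
{
	if (under_pressure && !(gfp & __GFP_DIRECT_RECLAIM))
		return NULL;
	return malloc(4096);
}

char mempool_slot[4096];	/* one-element stand-in mempool */

/* bio-style pattern: clear __GFP_DIRECT_RECLAIM for a fast first
 * attempt, then fall back to the guaranteed (but limited) mempool. */
void *bio_like_alloc(unsigned int gfp, int under_pressure, int *from_pool)
{
	void *p = page_alloc(gfp & ~__GFP_DIRECT_RECLAIM, under_pressure);

	if (p) {
		*from_pool = 0;
		return p;
	}
	*from_pool = 1;
	return mempool_slot;
}
```

The allocation never fails outright, but every fast-path failure pushes work onto the mempool, which is exactly the IO-delay feedback loop described above.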

A lot of the truly speculative users appear to use GFP_NOWAIT | __GFP_NOWARN
so one compromise would be to avoid using reserves if __GFP_NOWARN is
also specified.

Something like this as a separate patch?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7244ab522028..0a7a0ac1b46d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4860,9 +4860,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		/*
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
-		 * if it can't schedule.
+		 * if it can't schedule. Similarly, a caller specifying
+		 * __GFP_NOWARN is likely a speculative allocation with a
+		 * graceful recovery path.
 		 */
-		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
+		if (!(gfp_mask & (__GFP_NOMEMALLOC|__GFP_NOWARN))) {
 			alloc_flags |= ALLOC_NON_BLOCK;
 
 			if (order > 0)
Michal Hocko Jan. 12, 2023, 8:11 a.m. UTC | #4
On Wed 11-01-23 17:05:52, Mel Gorman wrote:
> On Wed, Jan 11, 2023 at 04:58:02PM +0100, Michal Hocko wrote:
> > On Mon 09-01-23 15:16:30, Mel Gorman wrote:
> > > Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> > > vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> > > other non-blocking allocation requests equal access to reserve.  Rename
> > > ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> > > means.
> > 
> > GFP_NOWAIT can be also used for opportunistic allocations which can and
> > should fail quickly if the memory is tight and more elaborate path
> > should be taken (e.g. try higher order allocation first but fall back to
> > smaller request if the memory is fragmented). Do we really want to give
> > those access to memory reserves as well?
> 
> Good question. Without __GFP_ATOMIC, GFP_NOWAIT only differs from GFP_ATOMIC
> by __GFP_HIGH but that is not enough to distinguish between a caller that
> cannot sleep versus one that is speculatively attempting an allocation but
> has other options. That changelog is misleading, it's not equal access
> as GFP_NOWAIT ends up with 25% of the reserves which is less than what
> GFP_ATOMIC gets.
> 
> Because it becomes impossible to distinguish between non-blocking and
> atomic without __GFP_ATOMIC, there is some justification for allowing
> access to reserves for GFP_NOWAIT. bio for example attempts an allocation
> (clears __GFP_DIRECT_RECLAIM) before falling back to mempool but delays
> in IO can also lead to further allocation pressure. mmu gather failing
> GFP_WAIT slows the rate memory can be freed. NFS failing GFP_NOWAIT will
> have to retry IOs multiple times. The examples were picked at random but
> the point is that there are cases where failing GFP_NOWAIT can degrade
> the system, particularly delay the cleaning of pages before reclaim.

Fair points.

> A lot of the truly speculative users appear to use GFP_NOWAIT | __GFP_NOWARN
> so one compromise would be to avoid using reserves if __GFP_NOWARN is
> also specified.
> 
> Something like this as a separate patch?

I cannot say I would be happy about adding more side effects to
__GFP_NOWARN. You are right that it should be used for those optimistic
allocation requests but historically many of these subtle side effects
have kicked back at some point. Wouldn't it make sense to explicitly
mark those places which really benefit from reserves instead? This is
more work but it should pay off long term. Your examples above would use
GFP_ATOMIC instead of GFP_NOWAIT.

The semantics would be easier to explain as well: GFP_ATOMIC - non-sleeping
allocations which are important, so they have access to memory
reserves; GFP_NOWAIT - non-sleeping allocations.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7244ab522028..0a7a0ac1b46d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4860,9 +4860,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
>  		/*
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
> -		 * if it can't schedule.
> +		 * if it can't schedule. Similarly, a caller specifying
> +		 * __GFP_NOWARN is likely a speculative allocation with a
> +		 * graceful recovery path.
>  		 */
> -		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> +		if (!(gfp_mask & (__GFP_NOMEMALLOC|__GFP_NOWARN))) {
>  			alloc_flags |= ALLOC_NON_BLOCK;
>  
>  			if (order > 0)
Michal Hocko Jan. 12, 2023, 8:29 a.m. UTC | #5
On Thu 12-01-23 09:11:07, Michal Hocko wrote:
> On Wed 11-01-23 17:05:52, Mel Gorman wrote:
> > On Wed, Jan 11, 2023 at 04:58:02PM +0100, Michal Hocko wrote:
> > > On Mon 09-01-23 15:16:30, Mel Gorman wrote:
> > > > Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> > > > vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> > > > other non-blocking allocation requests equal access to reserve.  Rename
> > > > ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> > > > means.
> > > 
> > > GFP_NOWAIT can be also used for opportunistic allocations which can and
> > > should fail quickly if the memory is tight and more elaborate path
> > > should be taken (e.g. try higher order allocation first but fall back to
> > > smaller request if the memory is fragmented). Do we really want to give
> > > those access to memory reserves as well?
> > 
> > Good question. Without __GFP_ATOMIC, GFP_NOWAIT only differs from GFP_ATOMIC
> > by __GFP_HIGH but that is not enough to distinguish between a caller that
> > cannot sleep versus one that is speculatively attempting an allocation but
> > has other options. That changelog is misleading, it's not equal access
> > as GFP_NOWAIT ends up with 25% of the reserves which is less than what
> > GFP_ATOMIC gets.
> > 
> > Because it becomes impossible to distinguish between non-blocking and
> > atomic without __GFP_ATOMIC, there is some justification for allowing
> > access to reserves for GFP_NOWAIT. bio for example attempts an allocation
> > (clears __GFP_DIRECT_RECLAIM) before falling back to mempool but delays
> > in IO can also lead to further allocation pressure. mmu gather failing
> > GFP_WAIT slows the rate memory can be freed. NFS failing GFP_NOWAIT will
> > have to retry IOs multiple times. The examples were picked at random but
> > the point is that there are cases where failing GFP_NOWAIT can degrade
> > the system, particularly delay the cleaning of pages before reclaim.
> 
> Fair points.
> 
> > A lot of the truly speculative users appear to use GFP_NOWAIT | __GFP_NOWARN
> > so one compromise would be to avoid using reserves if __GFP_NOWARN is
> > also specified.
> > 
> > Something like this as a separate patch?
> 
> I cannot say I would be happy about adding more side effects to
> __GFP_NOWARN. You are right that it should be used for those optimistic
> allocation requests but historically all many of these subtle side effects
> have kicked back at some point.

Should have looked at git grep first: GFP_ATOMIC | __GFP_NOWARN is quite
popular, with more than 50 instances.
Mel Gorman Jan. 12, 2023, 9:24 a.m. UTC | #6
On Thu, Jan 12, 2023 at 09:11:06AM +0100, Michal Hocko wrote:
> On Wed 11-01-23 17:05:52, Mel Gorman wrote:
> > On Wed, Jan 11, 2023 at 04:58:02PM +0100, Michal Hocko wrote:
> > > On Mon 09-01-23 15:16:30, Mel Gorman wrote:
> > > > Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> > > > vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> > > > other non-blocking allocation requests equal access to reserve.  Rename
> > > > ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> > > > means.
> > > 
> > > GFP_NOWAIT can be also used for opportunistic allocations which can and
> > > should fail quickly if the memory is tight and more elaborate path
> > > should be taken (e.g. try higher order allocation first but fall back to
> > > smaller request if the memory is fragmented). Do we really want to give
> > > those access to memory reserves as well?
> > 
> > Good question. Without __GFP_ATOMIC, GFP_NOWAIT only differs from GFP_ATOMIC
> > by __GFP_HIGH but that is not enough to distinguish between a caller that
> > cannot sleep versus one that is speculatively attempting an allocation but
> > has other options. That changelog is misleading, it's not equal access
> > as GFP_NOWAIT ends up with 25% of the reserves which is less than what
> > GFP_ATOMIC gets.
> > 
> > Because it becomes impossible to distinguish between non-blocking and
> > atomic without __GFP_ATOMIC, there is some justification for allowing
> > access to reserves for GFP_NOWAIT. bio for example attempts an allocation
> > (clears __GFP_DIRECT_RECLAIM) before falling back to mempool but delays
> > in IO can also lead to further allocation pressure. mmu gather failing
> > GFP_WAIT slows the rate memory can be freed. NFS failing GFP_NOWAIT will
> > have to retry IOs multiple times. The examples were picked at random but
> > the point is that there are cases where failing GFP_NOWAIT can degrade
> > the system, particularly delay the cleaning of pages before reclaim.
> 
> Fair points.
> 
> > A lot of the truly speculative users appear to use GFP_NOWAIT | __GFP_NOWARN
> > so one compromise would be to avoid using reserves if __GFP_NOWARN is
> > also specified.
> > 
> > Something like this as a separate patch?
> 
> I cannot say I would be happy about adding more side effects to
> __GFP_NOWARN. You are right that it should be used for those optimistic
> allocation requests but historically all many of these subtle side effects
> have kicked back at some point.

True.

> Wouldn't it make sense to explicitly
> mark those places which really benefit from reserves instead?

That would be __GFP_HIGH, and it would require context from every caller on
whether they need reserves or not, plus a determination of what the
consequences are if there is a stall. Is there immediate local fallout, or wider
fallout such as a variable delay before pages can be cleaned?

> This is
> more work but it should pay off long term. Your examples above would use
> GFP_ATOMIC instead of GFP_NOWAIT.
> 

Yes, although it would confuse the meaning of GFP_ATOMIC as a result.
It's described as "%GFP_ATOMIC users can not sleep and need the allocation to
succeed" and something like the bio callsite does not *need* the allocation
to succeed. It can fall back to the mempool and performance simply degrades
temporarily. No doubt there are a few abuses of GFP_ATOMIC just to get
non-blocking behaviour already.

> The semantic would be easier to explain as well. GFP_ATOMIC - non
> sleeping allocations which are important so they have access to memory
> reserves. GFP_NOWAIT - non sleeping allocations.
> 

People's definition of "important" will vary wildly. The following would
avoid reserve access for GFP_NOWAIT for now. It would need to be folded
into this patch, with a new changelog.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7244ab522028..aa20165224cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3989,18 +3989,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		 * __GFP_HIGH allows access to 50% of the min reserve as well
 		 * as OOM.
 		 */
-		if (alloc_flags & ALLOC_MIN_RESERVE)
+		if (alloc_flags & ALLOC_MIN_RESERVE) {
 			min -= min / 2;
 
-		/*
-		 * Non-blocking allocations can access some of the reserve
-		 * with more access if also __GFP_HIGH. The reasoning is that
-		 * a non-blocking caller may incur a more severe penalty
-		 * if it cannot get memory quickly, particularly if it's
-		 * also __GFP_HIGH.
-		 */
-		if (alloc_flags & ALLOC_NON_BLOCK)
-			min -= min / 4;
+			/*
+			 * Non-blocking allocations (e.g. GFP_ATOMIC) can
+			 * access more reserves than just __GFP_HIGH. Other
+			 * non-blocking allocations requests such as GFP_NOWAIT
+			 * or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
+			 * access to the min reserve.
+			 */
+			if (alloc_flags & ALLOC_NON_BLOCK)
+				min -= min / 4;
+		}
 
 		/*
 		 * OOM victims can try even harder than the normal reserve
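Taken together, the folded hunk makes the watermark arithmetic work out as follows. A user-space sketch (values in pages; `effective_min` is a toy model of the `__zone_watermark_ok()` logic above, not kernel code):

```c
#include <assert.h>

#define ALLOC_NON_BLOCK		0x10	/* caller cannot block */
#define ALLOC_MIN_RESERVE	0x20	/* __GFP_HIGH set */

/* Mirrors the folded hunk: __GFP_HIGH halves the min watermark, and
 * only then does ALLOC_NON_BLOCK shave off another quarter.  Plain
 * GFP_NOWAIT (ALLOC_NON_BLOCK alone) no longer dips below min. */
long effective_min(long min, int alloc_flags)
{
	if (alloc_flags & ALLOC_MIN_RESERVE) {
		min -= min / 2;
		if (alloc_flags & ALLOC_NON_BLOCK)
			min -= min / 4;
	}
	return min;
}
```

With a min watermark of 1000 pages, __GFP_HIGH alone allows allocating down to 500 pages, __GFP_HIGH plus ALLOC_NON_BLOCK down to 375 (access to 62.5% of the reserve), while ALLOC_NON_BLOCK alone gets no reserve access.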
Michal Hocko Jan. 12, 2023, 9:45 a.m. UTC | #7
On Thu 12-01-23 09:24:52, Mel Gorman wrote:
> On Thu, Jan 12, 2023 at 09:11:06AM +0100, Michal Hocko wrote:
> > On Wed 11-01-23 17:05:52, Mel Gorman wrote:
> > > On Wed, Jan 11, 2023 at 04:58:02PM +0100, Michal Hocko wrote:
> > > > On Mon 09-01-23 15:16:30, Mel Gorman wrote:
> > > > > Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> > > > > vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> > > > > other non-blocking allocation requests equal access to reserve.  Rename
> > > > > ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> > > > > means.
> > > > 
> > > > GFP_NOWAIT can be also used for opportunistic allocations which can and
> > > > should fail quickly if the memory is tight and more elaborate path
> > > > should be taken (e.g. try higher order allocation first but fall back to
> > > > smaller request if the memory is fragmented). Do we really want to give
> > > > those access to memory reserves as well?
> > > 
> > > Good question. Without __GFP_ATOMIC, GFP_NOWAIT only differs from GFP_ATOMIC
> > > by __GFP_HIGH but that is not enough to distinguish between a caller that
> > > cannot sleep versus one that is speculatively attempting an allocation but
> > > has other options. That changelog is misleading, it's not equal access
> > > as GFP_NOWAIT ends up with 25% of the reserves which is less than what
> > > GFP_ATOMIC gets.
> > > 
> > > Because it becomes impossible to distinguish between non-blocking and
> > > atomic without __GFP_ATOMIC, there is some justification for allowing
> > > access to reserves for GFP_NOWAIT. bio for example attempts an allocation
> > > (clears __GFP_DIRECT_RECLAIM) before falling back to mempool but delays
> > > in IO can also lead to further allocation pressure. mmu gather failing
> > > GFP_WAIT slows the rate memory can be freed. NFS failing GFP_NOWAIT will
> > > have to retry IOs multiple times. The examples were picked at random but
> > > the point is that there are cases where failing GFP_NOWAIT can degrade
> > > the system, particularly delay the cleaning of pages before reclaim.
> > 
> > Fair points.
> > 
> > > A lot of the truly speculative users appear to use GFP_NOWAIT | __GFP_NOWARN
> > > so one compromise would be to avoid using reserves if __GFP_NOWARN is
> > > also specified.
> > > 
> > > Something like this as a separate patch?
> > 
> > I cannot say I would be happy about adding more side effects to
> > __GFP_NOWARN. You are right that it should be used for those optimistic
> > allocation requests but historically all many of these subtle side effects
> > have kicked back at some point.
> 
> True.
> 
> > Wouldn't it make sense to explicitly
> > mark those places which really benefit from reserves instead?
> 
> That would be __GFP_HIGH and would require context from every caller on
> whether they need reserves or not and to determine what the consequences
> are if there is a stall. Is there immediate local fallout or wider fallout
> such as a variable delay before pages can be cleaned?

Yes, and I will not hide that I do not mind putting the burden on the caller
to justify the added requirement to eat from the otherwise shared pool
that the memory reserves are.

> > This is
> > more work but it should pay off long term. Your examples above would use
> > GFP_ATOMIC instead of GFP_NOWAIT.
> > 
> 
> Yes, although it would confuse the meaning of GFP_ATOMIC as a result.
> It's described as "%GFP_ATOMIC users can not sleep and need the allocation to
> succeed" and something like the bio callsite does not *need* the allocation
> to succeed. It can fallback to the mempool and performance simply degrades
> temporarily. No doubt there are a few abuses of GFP_ATOMIC just to get
> non-blocking behaviour already.

I am afraid GFP_ATOMIC will eventually require a closer look. Many
users are simply confused by the name and use it from the spin lock
context. Others use it from IRQ context because that is the right thing
to do (TM).

> > The semantic would be easier to explain as well. GFP_ATOMIC - non
> > sleeping allocations which are important so they have access to memory
> > reserves. GFP_NOWAIT - non sleeping allocations.
> > 
> 
> People's definition of "important" will vary wildly. The following would
> avoid reserve access for GFP_NOWAIT for now. It would need to be folded
> into this patch and a new changelog

OK, so that effectively means that the __GFP_HIGH modifier will give more
reserves to non-sleepable allocations than to sleepable ones. That is a better
semantic than other special-casing, because when the two allocations are
competing, the non-sleepable one should win as it simply
cannot reclaim. That hierarchy makes sense to me. Thanks for bearing
with me here. Changing gfp flag semantics is a PITA. I wish we could
design the whole thing from scratch (and screw it up in yet another way).

I will ack the patch once you post the full version of it.
NeilBrown Jan. 14, 2023, 10:10 p.m. UTC | #8
On Thu, 12 Jan 2023, Mel Gorman wrote:
> On Wed, Jan 11, 2023 at 04:58:02PM +0100, Michal Hocko wrote:
> > On Mon 09-01-23 15:16:30, Mel Gorman wrote:
> > > Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> > > vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> > > other non-blocking allocation requests equal access to reserve.  Rename
> > > ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> > > means.
> > 
> > GFP_NOWAIT can be also used for opportunistic allocations which can and
> > should fail quickly if the memory is tight and more elaborate path
> > should be taken (e.g. try higher order allocation first but fall back to
> > smaller request if the memory is fragmented). Do we really want to give
> > those access to memory reserves as well?
> 
> Good question. Without __GFP_ATOMIC, GFP_NOWAIT only differs from GFP_ATOMIC
> by __GFP_HIGH but that is not enough to distinguish between a caller that
> cannot sleep versus one that is speculatively attempting an allocation but
> has other options.

Isn't that a distinction without a difference?
A caller that cannot sleep MUST have other options, because failure is
always possible.
The "other option" might be failure (error to user space, dropped packets
etc), but sometimes failure IS an option.

So the difference between ATOMIC and NOWAIT boils down to the perceived
cost of the "other options".  If that cost is high, then include
__GFP_HIGH to get GFP_ATOMIC.  If that cost is low, then don't include
__GFP_HIGH and get GFP_NOWAIT.

I don't think there is any useful third option that is worth supporting.

NeilBrown



>                     That changelog is misleading, it's not equal access
> as GFP_NOWAIT ends up with 25% of the reserves which is less than what
> GFP_ATOMIC gets.
> 
> Because it becomes impossible to distinguish between non-blocking and
> atomic without __GFP_ATOMIC, there is some justification for allowing
> access to reserves for GFP_NOWAIT. bio for example attempts an allocation
> (clears __GFP_DIRECT_RECLAIM) before falling back to mempool but delays
> in IO can also lead to further allocation pressure. mmu gather failing
> GFP_WAIT slows the rate memory can be freed. NFS failing GFP_NOWAIT will
> have to retry IOs multiple times. The examples were picked at random but
> the point is that there are cases where failing GFP_NOWAIT can degrade
> the system, particularly delay the cleaning of pages before reclaim.
> 
> A lot of the truly speculative users appear to use GFP_NOWAIT | __GFP_NOWARN
> so one compromise would be to avoid using reserves if __GFP_NOWARN is
> also specified.
> 
> Something like this as a separate patch?
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7244ab522028..0a7a0ac1b46d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4860,9 +4860,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
>  		/*
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
> -		 * if it can't schedule.
> +		 * if it can't schedule. Similarly, a caller specifying
> +		 * __GFP_NOWARN is likely a speculative allocation with a
> +		 * graceful recovery path.
>  		 */
> -		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> +		if (!(gfp_mask & (__GFP_NOMEMALLOC|__GFP_NOWARN))) {
>  			alloc_flags |= ALLOC_NON_BLOCK;
>  
>  			if (order > 0)
>
Mel Gorman Jan. 16, 2023, 10:29 a.m. UTC | #9
On Sun, Jan 15, 2023 at 09:10:13AM +1100, NeilBrown wrote:
> On Thu, 12 Jan 2023, Mel Gorman wrote:
> > On Wed, Jan 11, 2023 at 04:58:02PM +0100, Michal Hocko wrote:
> > > On Mon 09-01-23 15:16:30, Mel Gorman wrote:
> > > > Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
> > > > vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and
> > > > other non-blocking allocation requests equal access to reserve.  Rename
> > > > ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag
> > > > means.
> > > 
> > > GFP_NOWAIT can be also used for opportunistic allocations which can and
> > > should fail quickly if the memory is tight and more elaborate path
> > > should be taken (e.g. try higher order allocation first but fall back to
> > > smaller request if the memory is fragmented). Do we really want to give
> > > those access to memory reserves as well?
> > 
> > Good question. Without __GFP_ATOMIC, GFP_NOWAIT only differs from GFP_ATOMIC
> > by __GFP_HIGH but that is not enough to distinguish between a caller that
> > cannot sleep versus one that is speculatively attempting an allocation but
> > has other options.
> 
> Isn't that a distinction without a difference?

Ideally yes but it's not always clear what the consequences of failure are.

> A caller than cannot sleep MUST have other options, because failure is
> always possible.
> The "other option" might be failure (error to user space, dropped packets
> etc), but sometimes failure IS an option.
> 

True, but how gracefully it's handled varies; there is some cut&paste
involved, and there are other cases where the GFP_ATOMIC usage predated the
existence or awareness of GFP_NOWAIT.

> So the difference between ATOMIC and NOWAIT boils down to the perceived
> cost of the "other options".  If that cost is high, then include
> __GFP_HIGH to get GFP_ATOMIC.  If that cost is low, then don't include
> __GFP_HIGH and get GFP_NOWAIT.
> 

Again, ideally yes, but not necessarily true. It depends on how careful
the caller was. The core appears to get it right in the cases I checked;
I'm less sure about drivers.

> I don't think there is any useful third option that is worth supporting.
> 

That's what we'll find out over time once the series hits a released
kernel.

Patch

diff --git a/mm/internal.h b/mm/internal.h
index 8706d46863df..23a37588073a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -735,7 +735,10 @@  unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_OOM		ALLOC_NO_WATERMARKS
 #endif
 
-#define ALLOC_HARDER		 0x10 /* try to alloc harder */
+#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
+				       * to 25% of the min watermark or
+				       * 62.5% if __GFP_HIGH is set.
+				       */
 #define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
 				       * of the min watermark.
 				       */
@@ -750,7 +753,7 @@  unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d2df78f5baa2..2217bab2dbb2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3999,7 +3999,7 @@  bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		 * if it cannot get memory quickly, particularly if it's
 		 * also __GFP_HIGH.
 		 */
-		if (alloc_flags & ALLOC_HARDER)
+		if (alloc_flags & ALLOC_NON_BLOCK)
 			min -= min / 4;
 
 		/*
@@ -4851,28 +4851,30 @@  gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 	 * The caller may dip into page reserves a bit more if the caller
 	 * cannot run direct reclaim, or if the caller has realtime scheduling
 	 * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
-	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
+	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
 	 */
 	alloc_flags |= (__force int)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
 
-	if (gfp_mask & __GFP_ATOMIC) {
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		/*
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
 		 */
 		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
-			alloc_flags |= ALLOC_HARDER;
+			alloc_flags |= ALLOC_NON_BLOCK;
 
 			if (order > 0)
 				alloc_flags |= ALLOC_HIGHATOMIC;
 		}
 
 		/*
-		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
-		 * comment for __cpuset_node_allowed().
+		 * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
+		 * GFP_ATOMIC) rather than fail, see the comment for
+		 * __cpuset_node_allowed().
 		 */
-		alloc_flags &= ~ALLOC_CPUSET;
+		if (alloc_flags & ALLOC_MIN_RESERVE)
+			alloc_flags &= ~ALLOC_CPUSET;
 	} else if (unlikely(rt_task(current)) && in_task())
 		alloc_flags |= ALLOC_MIN_RESERVE;
 
@@ -5304,11 +5306,12 @@  __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 		/*
 		 * Help non-failing allocations by giving them access to memory
-		 * reserves but do not use ALLOC_NO_WATERMARKS because this
+		 * reserves normally used for high priority non-blocking
+		 * allocations but do not use ALLOC_NO_WATERMARKS because this
 		 * could deplete whole memory reserves which would just make
-		 * the situation worse
+		 * the situation worse.
 		 */
-		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_NON_BLOCK, ac);
 		if (page)
 			goto got_pg;