[2/3] mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN

Message ID 20240206215016.961253-3-kent.overstreet@linux.dev (mailing list archive)
State New
Series few mm helpers for bcachefs

Commit Message

Kent Overstreet Feb. 6, 2024, 9:50 p.m. UTC
Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:

PF_MEMALLOC_NORECLAIM	-> GFP_NOWAIT
PF_MEMALLOC_NOWARN	-> __GFP_NOWARN

Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 include/linux/sched.h    |  4 ++--
 include/linux/sched/mm.h | 17 +++++++++++++----
 2 files changed, 15 insertions(+), 6 deletions(-)
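
For illustration, a minimal usage sketch - assuming the
memalloc_flags_save()/memalloc_flags_restore() helpers introduced in
patch 1/3 of this series, with do_deep_alloc() a hypothetical callee
that allocates with GFP_KERNEL internally:

	unsigned pf = memalloc_flags_save(PF_MEMALLOC_NORECLAIM |
					  PF_MEMALLOC_NOWARN);
	/*
	 * Every allocation in this scope now behaves as if GFP_NOWAIT |
	 * __GFP_NOWARN had been passed, even deep in the callchain.
	 */
	p = do_deep_alloc();
	memalloc_flags_restore(pf);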

Comments

Vlastimil Babka Feb. 7, 2024, 7:24 a.m. UTC | #1
On 2/6/24 22:50, Kent Overstreet wrote:
> Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:
> 
> PF_MEMALLOC_NORECLAIM	-> GFP_NOWAIT

In an ideal world, this would be nice, but we are in a world with implicit
"too small to fail" guarantees that has so far been impossible to get away
from [1] for small order GFP_KERNEL allocations, and this scoping would be
only safe if no allocations underneath relied on this behavior. But how to
ensure that's the case?

[1] https://lwn.net/Articles/723317/
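
Concretely, the hazard looks like this - a hypothetical sketch, where
the callee was written against that "too small to fail" guarantee:

	struct foo *f = kmalloc(sizeof(*f), GFP_KERNEL);
	f->bar = 1;	/* no NULL check - small GFP_KERNEL "can't fail" */

Under a PF_MEMALLOC_NORECLAIM scope that allocation silently degrades
to GFP_NOWAIT and can return NULL, turning the missing check into an
oops.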

> PF_MEMALLOC_NOWARN	-> __GFP_NOWARN
> 
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> ---
>  include/linux/sched.h    |  4 ++--
>  include/linux/sched/mm.h | 17 +++++++++++++----
>  2 files changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 292c31697248..ca08d92b20ac 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1755,8 +1755,8 @@ extern struct pid *cad_pid;
>  						 * I am cleaning dirty pages from some other bdi. */
>  #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
>  #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
> -#define PF__HOLE__00800000	0x00800000
> -#define PF__HOLE__01000000	0x01000000
> +#define PF_MEMALLOC_NORECLAIM	0x00800000	/* All allocation requests will clear __GFP_DIRECT_RECLAIM */
> +#define PF_MEMALLOC_NOWARN	0x01000000	/* All allocation requests will inherit __GFP_NOWARN */
>  #define PF__HOLE__02000000	0x02000000
>  #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
>  #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index f00d7ecc2adf..c29059a76052 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -236,16 +236,25 @@ static inline gfp_t current_gfp_context(gfp_t flags)
>  {
>  	unsigned int pflags = READ_ONCE(current->flags);
>  
> -	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
> +	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
> +			       PF_MEMALLOC_NOFS |
> +			       PF_MEMALLOC_NORECLAIM |
> +			       PF_MEMALLOC_NOWARN |
> +			       PF_MEMALLOC_PIN))) {
>  		/*
> -		 * NOIO implies both NOIO and NOFS and it is a weaker context
> -		 * so always make sure it makes precedence
> +		 * Stronger flags before weaker flags:
> +		 * NORECLAIM implies NOIO, which in turn implies NOFS
>  		 */
> -		if (pflags & PF_MEMALLOC_NOIO)
> +		if (pflags & PF_MEMALLOC_NORECLAIM)
> +			flags &= ~__GFP_DIRECT_RECLAIM;
> +		else if (pflags & PF_MEMALLOC_NOIO)
>  			flags &= ~(__GFP_IO | __GFP_FS);
>  		else if (pflags & PF_MEMALLOC_NOFS)
>  			flags &= ~__GFP_FS;
>  
> +		if (pflags & PF_MEMALLOC_NOWARN)
> +			flags |= __GFP_NOWARN;
> +
>  		if (pflags & PF_MEMALLOC_PIN)
>  			flags &= ~__GFP_MOVABLE;
>  	}
Michal Hocko Feb. 7, 2024, 7:44 a.m. UTC | #2
On Wed 07-02-24 08:24:33, Vlastimil Babka wrote:
> On 2/6/24 22:50, Kent Overstreet wrote:
> > Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:
> > 
> > PF_MEMALLOC_NORECLAIM	-> GFP_NOWAIT
> 
> In an ideal world, this would be nice, but we are in a world with implicit
> "too small to fail" guarantees that has so far been impossible to get away
> from [1] for small order GFP_KERNEL allocations, and this scoping would be
> only safe if no allocations underneath relied on this behavior. But how to
> ensure that's the case?

Right http://lkml.kernel.org/r/Zbu_yyChbCO6b2Lj@tiehlicka

> [1] https://lwn.net/Articles/723317/
Kent Overstreet Feb. 7, 2024, 9:05 p.m. UTC | #3
On Wed, Feb 07, 2024 at 08:24:33AM +0100, Vlastimil Babka wrote:
> On 2/6/24 22:50, Kent Overstreet wrote:
> > Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:
> > 
> > PF_MEMALLOC_NORECLAIM	-> GFP_NOWAIT
> 
> In an ideal world, this would be nice, but we are in a world with implicit
> "too small to fail" guarantees that has so far been impossible to get away
> from [1] for small order GFP_KERNEL allocations, and this scoping would be
> only safe if no allocations underneath relied on this behavior. But how to
> ensure that's the case?

Fault injection. You can't know if code works if it never gets tested,
and if small allocations don't fail in practice, then you need fault
injection.

But there's a code pattern that absolutely requires GFP_NOWAIT. Say
you've got locks held and you want to allocate memory:

p = kmalloc(size, GFP_NOWAIT);
if (p)
	goto success;
unlock();
p = kmalloc(size, GFP_KERNEL);

/* unwind and retry, or try to relock, depending on what you're doing */

That is: try the allocation nonblocking; if that fails, unlock or
unwind, then retry it GFP_KERNEL.

bcachefs uses this heavily because we've got bch2_trans_unlock() and
bch2_trans_relock(); relock succeeds iff nothing else took write locks
on the nodes we had locked before - so we can safely use GFP_KERNEL
without causing deadlocks, only the occasional transaction restart.

But the first GFP_NOWAIT attempt, before falling back to GFP_KERNEL,
is absolutely required: calling unlock() has to be a slowpath
operation, otherwise we livelock when multiple threads are contending
for the same locks.

More broadly, there are a bunch of other GFP_NOWAIT uses in the
kernel, and we're _not_ going to kill those off - but we are trying to
kill off gfp_t arguments for this kind of purpose, so we need this.
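
With these flags the same pattern works even when the allocation is
buried in a callchain that doesn't take a gfp_t - a sketch, again
assuming the memalloc_flags_save()/restore() helpers from patch 1/3,
with alloc_deep() a hypothetical callee that uses GFP_KERNEL
internally:

	unsigned pf = memalloc_flags_save(PF_MEMALLOC_NORECLAIM |
					  PF_MEMALLOC_NOWARN);
	p = alloc_deep();		/* effectively GFP_NOWAIT */
	memalloc_flags_restore(pf);
	if (!p) {
		unlock();
		p = alloc_deep();	/* really GFP_KERNEL this time */
		/* unwind and retry, or try to relock */
	}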

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..ca08d92b20ac 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1755,8 +1755,8 @@  extern struct pid *cad_pid;
 						 * I am cleaning dirty pages from some other bdi. */
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF__HOLE__00800000	0x00800000
-#define PF__HOLE__01000000	0x01000000
+#define PF_MEMALLOC_NORECLAIM	0x00800000	/* All allocation requests will clear __GFP_DIRECT_RECLAIM */
+#define PF_MEMALLOC_NOWARN	0x01000000	/* All allocation requests will inherit __GFP_NOWARN */
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
 #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index f00d7ecc2adf..c29059a76052 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -236,16 +236,25 @@  static inline gfp_t current_gfp_context(gfp_t flags)
 {
 	unsigned int pflags = READ_ONCE(current->flags);
 
-	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
+	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
+			       PF_MEMALLOC_NOFS |
+			       PF_MEMALLOC_NORECLAIM |
+			       PF_MEMALLOC_NOWARN |
+			       PF_MEMALLOC_PIN))) {
 		/*
-		 * NOIO implies both NOIO and NOFS and it is a weaker context
-		 * so always make sure it makes precedence
+		 * Stronger flags before weaker flags:
+		 * NORECLAIM implies NOIO, which in turn implies NOFS
 		 */
-		if (pflags & PF_MEMALLOC_NOIO)
+		if (pflags & PF_MEMALLOC_NORECLAIM)
+			flags &= ~__GFP_DIRECT_RECLAIM;
+		else if (pflags & PF_MEMALLOC_NOIO)
 			flags &= ~(__GFP_IO | __GFP_FS);
 		else if (pflags & PF_MEMALLOC_NOFS)
 			flags &= ~__GFP_FS;
 
+		if (pflags & PF_MEMALLOC_NOWARN)
+			flags |= __GFP_NOWARN;
+
 		if (pflags & PF_MEMALLOC_PIN)
 			flags &= ~__GFP_MOVABLE;
 	}
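
The net effect on a typical allocation, worked through the new logic
(GFP_KERNEL expanded per its current definition; a sketch, not part of
the patch):

	/*
	 * Task has PF_MEMALLOC_NORECLAIM | PF_MEMALLOC_NOWARN set:
	 *
	 *   GFP_KERNEL = __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM |
	 *                __GFP_IO | __GFP_FS
	 *
	 * NORECLAIM clears __GFP_DIRECT_RECLAIM, NOWARN ORs in
	 * __GFP_NOWARN:
	 *
	 *   current_gfp_context(GFP_KERNEL)
	 *              = __GFP_KSWAPD_RECLAIM | __GFP_IO | __GFP_FS |
	 *                __GFP_NOWARN
	 *
	 * i.e. the allocation may still wake kswapd, but never blocks
	 * in direct reclaim, and failures are not warned about.
	 */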