Message ID | 20240206215016.961253-3-kent.overstreet@linux.dev (mailing list archive) |
---|---|
State | New |
Series | few mm helpers for bcachefs |
On 2/6/24 22:50, Kent Overstreet wrote:
> Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:
>
> PF_MEMALLOC_NORECLAIM -> GFP_NOWAIT

In an ideal world, this would be nice, but we are in a world with implicit
"too small to fail" guarantees that has so far been impossible to get away
from [1] for small order GFP_KERNEL allocations, and this scoping would be
only safe if no allocations underneath relied on this behavior. But how to
ensure that's the case?

[1] https://lwn.net/Articles/723317/

> PF_MEMALLOC_NOWARN -> __GFP_NOWARN
>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> ---
>  include/linux/sched.h    |  4 ++--
>  include/linux/sched/mm.h | 17 +++++++++++++----
>  2 files changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 292c31697248..ca08d92b20ac 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1755,8 +1755,8 @@ extern struct pid *cad_pid;
>  						 * I am cleaning dirty pages from some other bdi. */
>  #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
>  #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
> -#define PF__HOLE__00800000	0x00800000
> -#define PF__HOLE__01000000	0x01000000
> +#define PF_MEMALLOC_NORECLAIM	0x00800000	/* All allocation requests will inherit __GFP_NOWARN */
> +#define PF_MEMALLOC_NOWARN	0x01000000	/* All allocation requests will inherit __GFP_NOWARN */
>  #define PF__HOLE__02000000	0x02000000
>  #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
>  #define PF_MCE_EARLY		0x08000000	/* Early kill for mce process policy */
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index f00d7ecc2adf..c29059a76052 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -236,16 +236,25 @@ static inline gfp_t current_gfp_context(gfp_t flags)
>  {
>  	unsigned int pflags = READ_ONCE(current->flags);
>
> -	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
> +	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
> +			       PF_MEMALLOC_NOFS |
> +			       PF_MEMALLOC_NORECLAIM |
> +			       PF_MEMALLOC_NOWARN |
> +			       PF_MEMALLOC_PIN))) {
>  		/*
> -		 * NOIO implies both NOIO and NOFS and it is a weaker context
> -		 * so always make sure it makes precedence
> +		 * Stronger flags before weaker flags:
> +		 * NORECLAIM implies NOIO, which in turn implies NOFS
>  		 */
> -		if (pflags & PF_MEMALLOC_NOIO)
> +		if (pflags & PF_MEMALLOC_NORECLAIM)
> +			flags &= ~__GFP_DIRECT_RECLAIM;
> +		else if (pflags & PF_MEMALLOC_NOIO)
>  			flags &= ~(__GFP_IO | __GFP_FS);
>  		else if (pflags & PF_MEMALLOC_NOFS)
>  			flags &= ~__GFP_FS;
>
> +		if (pflags & PF_MEMALLOC_NOWARN)
> +			flags |= __GFP_NOWARN;
> +
>  		if (pflags & PF_MEMALLOC_PIN)
>  			flags &= ~__GFP_MOVABLE;
>  	}
On Wed 07-02-24 08:24:33, Vlastimil Babka wrote:
> On 2/6/24 22:50, Kent Overstreet wrote:
> > Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:
> >
> > PF_MEMALLOC_NORECLAIM -> GFP_NOWAIT
>
> In an ideal world, this would be nice, but we are in a world with implicit
> "too small to fail" guarantees that has so far been impossible to get away
> from [1] for small order GFP_KERNEL allocations, and this scoping would be
> only safe if no allocations underneath relied on this behavior. But how to
> ensure that's the case?

Right http://lkml.kernel.org/r/Zbu_yyChbCO6b2Lj@tiehlicka

> [1] https://lwn.net/Articles/723317/
On Wed, Feb 07, 2024 at 08:24:33AM +0100, Vlastimil Babka wrote:
> On 2/6/24 22:50, Kent Overstreet wrote:
> > Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:
> >
> > PF_MEMALLOC_NORECLAIM -> GFP_NOWAIT
>
> In an ideal world, this would be nice, but we are in a world with implicit
> "too small to fail" guarantees that has so far been impossible to get away
> from [1] for small order GFP_KERNEL allocations, and this scoping would be
> only safe if no allocations underneath relied on this behavior. But how to
> ensure that's the case?

Fault injection. You can't know if code works if it never gets tested,
and if small allocations don't fail in practice, then you need fault
injection.

But there's a code pattern that absolutely requires GFP_NOWAIT. Say
you've got locks held and you want to allocate memory:

	p = kmalloc(size, GFP_NOWAIT);
	if (p)
		goto success;

	unlock();
	p = kmalloc(size, GFP_KERNEL);
	/* unwind and retry, or tryrelock, depending on what you're doing */

That is: try the allocation nonblocking, then unlock or unwind, then
retry it with GFP_KERNEL.

bcachefs uses this heavily because we've got bch2_trans_unlock() and
bch2_trans_relock(); relock succeeds iff nothing else took write locks
on the nodes we had locked before, so we can safely use GFP_KERNEL
without causing deadlocks, only the occasional transaction restart.

But the first GFP_NOWAIT allocation, before using GFP_KERNEL, is
absolutely required: calling unlock() has to be a slowpath operation,
otherwise it will livelock when multiple threads are contending for the
same locks.

More broadly, there's a bunch of other GFP_NOWAIT uses in the kernel,
and we're _not_ going to kill them off, and we are trying to kill off
gfp_t for this kind of purpose - so we need this.
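The nonblocking-first pattern described in the reply above, together with the fault-injection argument, can be sketched in userspace C. This is only a toy model under stated assumptions: `toy_alloc()`, `toy_alloc_locked()`, `struct toy_lock`, and the `toy_inject_nowait_failure` knob are hypothetical names invented for this illustration, `malloc()` stands in for `kmalloc()`, and a boolean stands in for a real lock. It is not bcachefs or kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_NOWAIT / GFP_KERNEL (not kernel values). */
enum toy_alloc_mode { TOY_NOWAIT, TOY_MAY_BLOCK };

/* Hypothetical lock: a flag is enough to show where the lock is dropped. */
struct toy_lock { bool held; };

/* Fault-injection knob: when set, every TOY_NOWAIT attempt fails. */
static bool toy_inject_nowait_failure;

/* Number of times the slow path (unlock, blocking alloc, relock) ran. */
static unsigned int toy_slowpath_count;

static void *toy_alloc(size_t size, enum toy_alloc_mode mode)
{
	if (mode == TOY_NOWAIT && toy_inject_nowait_failure)
		return NULL;	/* simulated nonblocking failure */
	return malloc(size);
}

/*
 * Allocate with @lock held on entry and on return. *dropped tells the
 * caller whether the lock was released in between, in which case any
 * state read under the lock has to be revalidated (or the operation
 * restarted).
 */
static void *toy_alloc_locked(struct toy_lock *lock, size_t size, bool *dropped)
{
	void *p;

	assert(lock->held);
	*dropped = false;

	/* Fast path: nonblocking attempt, lock stays held. */
	p = toy_alloc(size, TOY_NOWAIT);
	if (p)
		return p;

	/* Slow path: drop the lock before an allocation that may block. */
	toy_slowpath_count++;
	lock->held = false;
	*dropped = true;
	p = toy_alloc(size, TOY_MAY_BLOCK);
	lock->held = true;
	return p;
}
```

With the knob clear, `toy_alloc_locked()` never drops the lock; setting `toy_inject_nowait_failure` forces every call through the slow path, which is one way to exercise a fallback branch that would otherwise almost never run.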
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..ca08d92b20ac 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1755,8 +1755,8 @@ extern struct pid *cad_pid;
 						 * I am cleaning dirty pages from some other bdi. */
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF__HOLE__00800000	0x00800000
-#define PF__HOLE__01000000	0x01000000
+#define PF_MEMALLOC_NORECLAIM	0x00800000	/* All allocation requests will inherit __GFP_NOWARN */
+#define PF_MEMALLOC_NOWARN	0x01000000	/* All allocation requests will inherit __GFP_NOWARN */
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
 #define PF_MCE_EARLY		0x08000000	/* Early kill for mce process policy */
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index f00d7ecc2adf..c29059a76052 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -236,16 +236,25 @@ static inline gfp_t current_gfp_context(gfp_t flags)
 {
 	unsigned int pflags = READ_ONCE(current->flags);
 
-	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
+	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
+			       PF_MEMALLOC_NOFS |
+			       PF_MEMALLOC_NORECLAIM |
+			       PF_MEMALLOC_NOWARN |
+			       PF_MEMALLOC_PIN))) {
 		/*
-		 * NOIO implies both NOIO and NOFS and it is a weaker context
-		 * so always make sure it makes precedence
+		 * Stronger flags before weaker flags:
+		 * NORECLAIM implies NOIO, which in turn implies NOFS
 		 */
-		if (pflags & PF_MEMALLOC_NOIO)
+		if (pflags & PF_MEMALLOC_NORECLAIM)
+			flags &= ~__GFP_DIRECT_RECLAIM;
+		else if (pflags & PF_MEMALLOC_NOIO)
 			flags &= ~(__GFP_IO | __GFP_FS);
 		else if (pflags & PF_MEMALLOC_NOFS)
 			flags &= ~__GFP_FS;
 
+		if (pflags & PF_MEMALLOC_NOWARN)
+			flags |= __GFP_NOWARN;
+
 		if (pflags & PF_MEMALLOC_PIN)
 			flags &= ~__GFP_MOVABLE;
 	}
Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:

PF_MEMALLOC_NORECLAIM -> GFP_NOWAIT
PF_MEMALLOC_NOWARN -> __GFP_NOWARN

Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 include/linux/sched.h    |  4 ++--
 include/linux/sched/mm.h | 17 +++++++++++++----
 2 files changed, 15 insertions(+), 6 deletions(-)
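
For readers who want to experiment with the flag mapping outside the kernel, the precedence logic that the patch adds to current_gfp_context() can be modeled roughly as below. This is a sketch only: every `TOY_*` constant and the `toy_gfp_context()` helper are made up for this illustration, the bit values are arbitrary rather than the kernel's, and the PF_MEMALLOC_PIN / __GFP_MOVABLE handling is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative allocation-flag bits (assumed values, not the kernel's). */
#define TOY_GFP_IO		0x01u
#define TOY_GFP_FS		0x02u
#define TOY_GFP_DIRECT_RECLAIM	0x04u
#define TOY_GFP_NOWARN		0x08u
#define TOY_GFP_ALL		0x0fu

/* Illustrative per-task scope bits (assumed values, not the kernel's). */
#define TOY_PF_NOIO		0x01u
#define TOY_PF_NOFS		0x02u
#define TOY_PF_NORECLAIM	0x04u
#define TOY_PF_NOWARN		0x08u

/* Apply the task's scope bits to a request's allocation flags. */
static uint32_t toy_gfp_context(uint32_t pflags, uint32_t flags)
{
	/* Only bits known to this model may be passed in. */
	assert((flags & ~TOY_GFP_ALL) == 0);

	/* Same order as the patch: NORECLAIM, then NOIO, then NOFS. */
	if (pflags & TOY_PF_NORECLAIM)
		flags &= ~TOY_GFP_DIRECT_RECLAIM;
	else if (pflags & TOY_PF_NOIO)
		flags &= ~(TOY_GFP_IO | TOY_GFP_FS);
	else if (pflags & TOY_PF_NOFS)
		flags &= ~TOY_GFP_FS;

	/* NOWARN is independent of the reclaim-related scopes. */
	if (pflags & TOY_PF_NOWARN)
		flags |= TOY_GFP_NOWARN;

	return flags;
}
```

In this model, a task with both `TOY_PF_NORECLAIM` and `TOY_PF_NOIO` set only has the direct-reclaim bit cleared; the IO and FS bits stay set because the `else if` branches are skipped once the NORECLAIM branch is taken.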