mm: document risk of PF_MEMALLOC_NORECLAIM

Message ID	Zr8LMv89fkfpmBlO@tiehlicka (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> Date: Fri, 16 Aug 2024 10:17:54 +0200 From: Michal Hocko <mhocko@suse.com> To: Yafang Shao <laoar.shao@gmail.com>, Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org>, akpm@linux-foundation.org, viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, david@fromorbit.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Kent Overstreet <kent.overstreet@linux.dev> Subject: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM Message-ID: <Zr8LMv89fkfpmBlO@tiehlicka> References: <20240812090525.80299-2-laoar.shao@gmail.com> <Zrn0FlBY-kYMftK4@infradead.org> <CALOAHbBd2oCVKsMwcH_YGUWT5LGLWmNSUAZzRPp8j7bBaqc1PQ@mail.gmail.com> <Zrxfy-F1ZkvQdhNR@tiehlicka> <CALOAHbCLPLpi39-HVVJvUj=qVcNFcQz=3cd95wFpKZzUntCtdw@mail.gmail.com> <ZrymePQHzTHaUIch@tiehlicka> <CALOAHbDw5_hFGsQGYpmaW2KPXi8TxnxPQg4z7G3GCyuJWWywpQ@mail.gmail.com> <Zr2eiFOT--CV5YsR@tiehlicka> <CALOAHbCnWDPnErCDOWaPo6vc__G56wzmX-j=bGrwAx6J26DgJg@mail.gmail.com> <Zr2liCOFDqPiNk6_@tiehlicka> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <Zr2liCOFDqPiNk6_@tiehlicka> Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	mm: document risk of PF_MEMALLOC_NORECLAIM \| expand mm: document risk of PF_MEMALLOC_NORECLAIM

Michal Hocko Aug. 16, 2024, 8:17 a.m. UTC

Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
be removed from the tree altogether please? For the full context the 
email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
--- 
From f17d36975ec343d9388aa6dbf9ca8d1b58ed09ce Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 16 Aug 2024 10:10:00 +0200
Subject: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM

PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
that such a allocation contex is inherently unsafe if the context
doesn't fully control all allocations called from this context. Any
potential __GFP_NOFAIL request from withing PF_MEMALLOC_NORECLAIM
context would BUG_ON if the allocation would fail.

[1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/sched.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Christoph Hellwig Aug. 16, 2024, 8:22 a.m. UTC | #1

On Fri, Aug 16, 2024 at 10:17:54AM +0200, Michal Hocko wrote:
> Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> be removed from the tree altogether please? For the full context the 
> email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u

I don't think that's enough.  We just need to kill it given that it was
added without ACKs and despite explicit earlier objections to the API.

That might leave bcachefs with broken inode allocation handling, but so
what - it's marked experimental and there should be other ways to sort
this out.

Michal Hocko Aug. 16, 2024, 8:54 a.m. UTC | #2

On Fri 16-08-24 01:22:37, Christoph Hellwig wrote:
> On Fri, Aug 16, 2024 at 10:17:54AM +0200, Michal Hocko wrote:
> > Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> > be removed from the tree altogether please? For the full context the 
> > email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
> 
> I don't think that's enough.  We just need to kill it given that it was
> added without ACKs and despite explicit earlier objections to the API.

Yes, I think we should kill it before it spreads even more but I would
not like to make the existing user just broken. I have zero visibility
and understanding of the bcachefs code but from a quick look at __bch2_new_inode
it shouldn't be really terribly hard to push GFP_NOWAIT flag there
directly. It would require inode_init_always_gfp variant as well (to not
touch all existing callers that do not have any locking requirements but
I do not see any other nested allocations.

So a very quick and untested change would look as follows:
---
diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index 15fc41e63b6c..7a55167b9133 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -231,9 +231,9 @@ static struct inode *bch2_alloc_inode(struct super_block *sb)
 	BUG();
 }
 
-static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
+static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c, gfp_t gfp)
 {
-	struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, GFP_NOFS);
+	struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, gfp);
 	if (!inode)
 		return NULL;
 
@@ -245,7 +245,7 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
 	mutex_init(&inode->ei_quota_lock);
 	memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush));
 
-	if (unlikely(inode_init_always(c->vfs_sb, &inode->v))) {
+	if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
 		kmem_cache_free(bch2_inode_cache, inode);
 		return NULL;
 	}
@@ -258,12 +258,10 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
  */
 static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
 {
-	struct bch_inode_info *inode =
-		memalloc_flags_do(PF_MEMALLOC_NORECLAIM|PF_MEMALLOC_NOWARN,
-				  __bch2_new_inode(trans->c));
+	struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);
 
 	if (unlikely(!inode)) {
-		int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c)) ? 0 : -ENOMEM);
+		int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c, GFP_NOFS)) ? 0 : -ENOMEM);
 		if (ret && inode) {
 			__destroy_inode(&inode->v);
 			kmem_cache_free(bch2_inode_cache, inode);
@@ -328,7 +326,7 @@ __bch2_create(struct mnt_idmap *idmap,
 	if (ret)
 		return ERR_PTR(ret);
 #endif
-	inode = __bch2_new_inode(c);
+	inode = __bch2_new_inode(c, GFP_NOFS);
 	if (unlikely(!inode)) {
 		inode = ERR_PTR(-ENOMEM);
 		goto err;
diff --git a/fs/inode.c b/fs/inode.c
index 86670941884b..95fd67a6cac3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -153,7 +153,7 @@ static int no_open(struct inode *inode, struct file *file)
  * These are initializations that need to be done on every inode
  * allocation as the fields are not initialised by slab allocation.
  */
-int inode_init_always(struct super_block *sb, struct inode *inode)
+int inode_init_always(struct super_block *sb, struct inode *inode, gfp_t gfp)
 {
 	static const struct inode_operations empty_iops;
 	static const struct file_operations no_open_fops = {.open = no_open};
@@ -230,14 +230,14 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 #endif
 	inode->i_flctx = NULL;
 
-	if (unlikely(security_inode_alloc(inode)))
+	if (unlikely(security_inode_alloc(inode, gfp)))
 		return -ENOMEM;
 
 	this_cpu_inc(nr_inodes);
 
 	return 0;
 }
-EXPORT_SYMBOL(inode_init_always);
+EXPORT_SYMBOL(inode_init_always_gfp);
 
 void free_inode_nonrcu(struct inode *inode)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..5c613a89718b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3027,7 +3027,12 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence);
 
 extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
 
-extern int inode_init_always(struct super_block *, struct inode *);
+extern int inode_init_always(struct super_block *, struct inode *, gfp_t);
+static inline int inode_init_always(struct super_block *, struct inode *)
+{
+	return inode_init_always_gfp(GFP_NOFS);
+}
+
 extern void inode_init_once(struct inode *);
 extern void address_space_init_once(struct address_space *mapping);
 extern struct inode * igrab(struct inode *);
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index a2ade0ffe9e7..b08472d64765 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -150,6 +150,6 @@ extern struct lsm_info __start_early_lsm_info[], __end_early_lsm_info[];
 		__used __section(".early_lsm_info.init")		\
 		__aligned(sizeof(unsigned long))
 
-extern int lsm_inode_alloc(struct inode *inode);
+extern int lsm_inode_alloc(struct inode *inode, gfp_t gfp);
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/linux/security.h b/include/linux/security.h
index 1390f1efb4f0..7c6b9b038a0d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -336,7 +336,7 @@ int security_dentry_create_files_as(struct dentry *dentry, int mode,
 					struct cred *new);
 int security_path_notify(const struct path *path, u64 mask,
 					unsigned int obj_type);
-int security_inode_alloc(struct inode *inode);
+int security_inode_alloc(struct inode *inode, gfp_t gfp);
 void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 				 const struct qstr *qstr,
@@ -769,7 +769,7 @@ static inline int security_path_notify(const struct path *path, u64 mask,
 	return 0;
 }
 
-static inline int security_inode_alloc(struct inode *inode)
+static inline int security_inode_alloc(struct inode *inode, gfp_t gfp)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index 8cee5b6c6e6d..8634d3bee56f 100644
--- a/security/security.c
+++ b/security/security.c
@@ -660,14 +660,14 @@ static int lsm_file_alloc(struct file *file)
  *
  * Returns 0, or -ENOMEM if memory can't be allocated.
  */
-int lsm_inode_alloc(struct inode *inode)
+int lsm_inode_alloc(struct inode *inode, gfp)
 {
 	if (!lsm_inode_cache) {
 		inode->i_security = NULL;
 		return 0;
 	}
 
-	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, GFP_NOFS);
+	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, gfp);
 	if (inode->i_security == NULL)
 		return -ENOMEM;
 	return 0;
@@ -1582,9 +1582,9 @@ int security_path_notify(const struct path *path, u64 mask,
  *
  * Return: Return 0 if operation was successful.
  */
-int security_inode_alloc(struct inode *inode)
+int security_inode_alloc(struct inode *inode, gfp_t gfp)
 {
-	int rc = lsm_inode_alloc(inode);
+	int rc = lsm_inode_alloc(inode, gfp);
 
 	if (unlikely(rc))
 		return rc;

Christoph Hellwig Aug. 16, 2024, 2:26 p.m. UTC | #3

On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> Yes, I think we should kill it before it spreads even more but I would
> not like to make the existing user just broken. I have zero visibility
> and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> directly. It would require inode_init_always_gfp variant as well (to not
> touch all existing callers that do not have any locking requirements but
> I do not see any other nested allocations.

I'll probably have to go down into security_inode_alloc as well.
That being said there is no explanation for the behavior here in the
commit logs or the code itself, so who knows.

Michal Hocko Aug. 16, 2024, 3:57 p.m. UTC | #4

On Fri 16-08-24 07:26:06, Christoph Hellwig wrote:
> On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> > Yes, I think we should kill it before it spreads even more but I would
> > not like to make the existing user just broken. I have zero visibility
> > and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> > it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> > directly. It would require inode_init_always_gfp variant as well (to not
> > touch all existing callers that do not have any locking requirements but
> > I do not see any other nested allocations.
> 
> I'll probably have to go down into security_inode_alloc as well.

yes, I have done that as well. I was not sure about
inode_alloc_security. It has alloc in the name but none of the caller
actually allocate from there. lsm_inode_alloc was trivial to update.

> That being said there is no explanation for the behavior here in the
> commit logs or the code itself, so who knows.

Yafang Shao Aug. 17, 2024, 2:29 a.m. UTC | #5

On Fri, Aug 16, 2024 at 4:17 PM Michal Hocko <mhocko@suse.com> wrote:
>
> Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> be removed from the tree altogether please? For the full context the
> email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
> ---
> From f17d36975ec343d9388aa6dbf9ca8d1b58ed09ce Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 16 Aug 2024 10:10:00 +0200
> Subject: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
>
> PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> that such a allocation contex is inherently unsafe if the context
> doesn't fully control all allocations called from this context. Any
> potential __GFP_NOFAIL request from withing PF_MEMALLOC_NORECLAIM
> context would BUG_ON if the allocation would fail.
>
> [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Documenting the risk is a good first step. For this change:

Acked-by: Yafang Shao <laoar.shao@gmail.com>

Even without the PF_MEMALLOC_NORECLAIM flag, the underlying risk
remains, as users can still potentially set both ~__GPF_DIRECT_RECLAIM
and __GFP_NOFAIL. PF_MEMALLOC_NORECLAIM does not create this risk; it
only exacerbates it. The core problem lies in the complexity of the
various GFP flags and the lack of robust management for them. While we
have extensive documentation on these flags, it can still be
confusing, particularly for new developers who haven't yet encountered
real-world issues.

For instance:

  * %GFP_NOWAIT is for kernel allocations that should not stall for direct
  * reclaim,
  #define GFP_NOWAIT      (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)

Initially, it wasn't clear to me why setting __GFP_KSWAPD_RECLAIM and
__GFP_NOWARN would prevent direct reclaim. It only became apparent
after I studied the entire code path of page allocation. I believe
other newcomers to kernel development may face similar confusion as I
did early in my experience.

The real issue we need to address is improving the management of these
GFP flags, though I don't have a concrete solution at this time.

Since I don't have a better alternative yet, I'm not strongly opposed
to reverting the PF_MEMALLOC_NORECLAIM flag if it is deemed the root
cause of the problem.

Michal Hocko Aug. 19, 2024, 7:57 a.m. UTC | #6

On Sat 17-08-24 10:29:31, Yafang Shao wrote:
> On Fri, Aug 16, 2024 at 4:17 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> > be removed from the tree altogether please? For the full context the
> > email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
> > ---
> > From f17d36975ec343d9388aa6dbf9ca8d1b58ed09ce Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Fri, 16 Aug 2024 10:10:00 +0200
> > Subject: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
> >
> > PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> > that such a allocation contex is inherently unsafe if the context
> > doesn't fully control all allocations called from this context. Any
> > potential __GFP_NOFAIL request from withing PF_MEMALLOC_NORECLAIM
> > context would BUG_ON if the allocation would fail.
> >
> > [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Documenting the risk is a good first step. For this change:
> 
> Acked-by: Yafang Shao <laoar.shao@gmail.com>
> 
> Even without the PF_MEMALLOC_NORECLAIM flag, the underlying risk
> remains, as users can still potentially set both ~__GPF_DIRECT_RECLAIM
> and __GFP_NOFAIL.

Users can configure all sorts of nonsensical gfp flags combination. That
is a sad reality of the interface. But we do assume that kernel code is
somehow sane.

Besides that Barry is working on making this less likely by droppong
__GFP_NOFAIL and replace it by GFP_NOFAIL which always includes
__GFP_DIRECT_RECLAIM. Sure nothing will prevent callers from clearing
that flag explicitly but we have no real defense afains broken code.

> PF_MEMALLOC_NORECLAIM does not create this risk; it
> only exacerbates it. The core problem lies in the complexity of the
> various GFP flags and the lack of robust management for them. While we
> have extensive documentation on these flags, it can still be
> confusing, particularly for new developers who haven't yet encountered
> real-world issues.
> 
> For instance:
> 
>   * %GFP_NOWAIT is for kernel allocations that should not stall for direct
>   * reclaim,
>   #define GFP_NOWAIT      (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
> 
> Initially, it wasn't clear to me why setting __GFP_KSWAPD_RECLAIM and
> __GFP_NOWARN would prevent direct reclaim. It only became apparent
> after I studied the entire code path of page allocation. I believe
> other newcomers to kernel development may face similar confusion as I
> did early in my experience.
> 
> The real issue we need to address is improving the management of these
> GFP flags, though I don't have a concrete solution at this time.

Welcome to the club. Changing this interface is a _huge_ undertaking.
Just have a look how many users of the gfp flags we have in the kernel.
I can tell you from a first hand experience that even minor tweaks are
really hard to make.

Michal Hocko Aug. 21, 2024, 7:30 a.m. UTC | #7

On Fri 16-08-24 10:54:39, Michal Hocko wrote:
> On Fri 16-08-24 01:22:37, Christoph Hellwig wrote:
> > On Fri, Aug 16, 2024 at 10:17:54AM +0200, Michal Hocko wrote:
> > > Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> > > be removed from the tree altogether please? For the full context the 
> > > email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
> > 
> > I don't think that's enough.  We just need to kill it given that it was
> > added without ACKs and despite explicit earlier objections to the API.
> 
> Yes, I think we should kill it before it spreads even more but I would
> not like to make the existing user just broken. I have zero visibility
> and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> directly. It would require inode_init_always_gfp variant as well (to not
> touch all existing callers that do not have any locking requirements but
> I do not see any other nested allocations.
> 
> So a very quick and untested change would look as follows:

Anybody managed to give it a more detailed look? I have a fixup for 
[...]
> diff --git a/security/security.c b/security/security.c
> index 8cee5b6c6e6d..8634d3bee56f 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -660,14 +660,14 @@ static int lsm_file_alloc(struct file *file)
>   *
>   * Returns 0, or -ENOMEM if memory can't be allocated.
>   */
> -int lsm_inode_alloc(struct inode *inode)
> +int lsm_inode_alloc(struct inode *inode, gfp)

this

>  {
>  	if (!lsm_inode_cache) {
>  		inode->i_security = NULL;
>  		return 0;
>  	}
>  
> -	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, GFP_NOFS);
> +	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, gfp);
>  	if (inode->i_security == NULL)
>  		return -ENOMEM;
>  	return 0;

Christoph Hellwig Aug. 21, 2024, 11:44 a.m. UTC | #8

On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> Yes, I think we should kill it before it spreads even more but I would
> not like to make the existing user just broken. I have zero visibility
> and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> directly. >

I don't understand that sentence.  You're adding the gfp_t argument to
it, which to mean counts as pushing it there directly.


> It would require inode_init_always_gfp variant as well (to not
> touch all existing callers that do not have any locking requirements but
> I do not see any other nested allocations.

inode_init_always only has 4 callers, so I'd just add the gfp_t
argument.  Otherwise this looks good modulo the fix your posted:

Acked-by: Christoph Hellwig <hch@lst.de>

Michal Hocko Aug. 21, 2024, 12:37 p.m. UTC | #9

On Wed 21-08-24 04:44:03, Christoph Hellwig wrote:
> On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> > Yes, I think we should kill it before it spreads even more but I would
> > not like to make the existing user just broken. I have zero visibility
> > and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> > it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> > directly. >
> 
> I don't understand that sentence.  You're adding the gfp_t argument to
> it, which to mean counts as pushing it there directly.

Sorry, what I meant to say is that pushing GFP_NOWAIT directly seem fine
unless I have missed some hidden corners down the call path which would
require a scope flag to override a hardcoded gfp flag.
 
> > It would require inode_init_always_gfp variant as well (to not
> > touch all existing callers that do not have any locking requirements but
> > I do not see any other nested allocations.
> 
> inode_init_always only has 4 callers, so I'd just add the gfp_t
> argument.  Otherwise this looks good modulo the fix your posted:
> 
> Acked-by: Christoph Hellwig <hch@lst.de>

Thanks. I will wait for more review and post this as a real patch. I
would really appreciate any help with actual testing.

Christian Brauner Aug. 22, 2024, 9:09 a.m. UTC | #10

On Wed, Aug 21, 2024 at 02:37:33PM GMT, Michal Hocko wrote:
> On Wed 21-08-24 04:44:03, Christoph Hellwig wrote:
> > On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> > > Yes, I think we should kill it before it spreads even more but I would
> > > not like to make the existing user just broken. I have zero visibility
> > > and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> > > it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> > > directly. >
> > 
> > I don't understand that sentence.  You're adding the gfp_t argument to
> > it, which to mean counts as pushing it there directly.
> 
> Sorry, what I meant to say is that pushing GFP_NOWAIT directly seem fine
> unless I have missed some hidden corners down the call path which would
> require a scope flag to override a hardcoded gfp flag.
>  
> > > It would require inode_init_always_gfp variant as well (to not
> > > touch all existing callers that do not have any locking requirements but
> > > I do not see any other nested allocations.
> > 
> > inode_init_always only has 4 callers, so I'd just add the gfp_t
> > argument.  Otherwise this looks good modulo the fix your posted:
> > 
> > Acked-by: Christoph Hellwig <hch@lst.de>
> 
> Thanks. I will wait for more review and post this as a real patch. I
> would really appreciate any help with actual testing.

Ugh, I really don't like that we have to add a flag argument especially
for an api that's broken but fine.

mm: document risk of PF_MEMALLOC_NORECLAIM

Commit Message

Comments

Patch