Message ID | 20240712032554.444823-1-youling.tang@linux.dev (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: list_lru: Fix NULL pointer dereference in list_lru_add() | expand |
On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote: > From: Youling Tang <tangyouling@kylinos.cn> > > Note that list_lru_from_memcg_idx() may return NULL, so it is necessary > to error handle the return value to avoid triggering NULL pointer > dereference BUG. > > The issue was triggered for discussion [1], > Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6 I see no explanation for why this is the correct fix, and I doubt it is. What's the real reason for the NULL lru_list_one, and why doesn't this come up on other filesystems?
Hi, Kent On 12/07/2024 12:07, Kent Overstreet wrote: > On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote: >> From: Youling Tang <tangyouling@kylinos.cn> >> >> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary >> to error handle the return value to avoid triggering NULL pointer >> dereference BUG. >> >> The issue was triggered for discussion [1], >> Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6 > I see no explanation for why this is the correct fix, and I doubt it is. > What's the real reason for the NULL lru_list_one, and why doesn't this > come up on other filesystems? We can break it down into two questions (independent of each other): 1) Error handling is necessary when l (lru_list_one) is NULL here. 2) Why does marking SLAB_ACCOUNT in bcachefs cause lru_list_one to be NULL? This patch only fixes the first issue, and the real cause of the second issue still needs to be analyzed. Thanks, Youling.
On Fri, Jul 12, 2024 at 12:28:57PM GMT, Youling Tang wrote: > Hi, Kent > > On 12/07/2024 12:07, Kent Overstreet wrote: > > On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote: > > > From: Youling Tang <tangyouling@kylinos.cn> > > > > > > Note that list_lru_from_memcg_idx() may return NULL, so it is necessary > > > to error handle the return value to avoid triggering NULL pointer > > > dereference BUG. > > > > > > The issue was triggered for discussion [1], > > > Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6 > > I see no explanation for why this is the correct fix, and I doubt it is. > > What's the real reason for the NULL lru_list_one, and why doesn't this > > come up on other filesystems? > We can break it down into two questions (independent of each other): > 1) Error handling is necessary when l (lru_list_one) is NULL here. No, you're just hiding the actual bug - since I wasn't clear, I'm naking this patch.
On 2024/7/12 12:07, Kent Overstreet wrote: > On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote: >> From: Youling Tang <tangyouling@kylinos.cn> >> >> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary >> to error handle the return value to avoid triggering NULL pointer >> dereference BUG. >> >> The issue was triggered for discussion [1], >> Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6 > > I see no explanation for why this is the correct fix, and I doubt it is. > What's the real reason for the NULL lru_list_one, and why doesn't this > come up on other filesystems? Agree, IIRC, the list_lru_one will be pre-allocated in the allocation path of inode/dentry etc.
Hi, Kent On 12/07/2024 23:49, Kent Overstreet wrote: > On Fri, Jul 12, 2024 at 12:28:57PM GMT, Youling Tang wrote: >> Hi, Kent >> >> On 12/07/2024 12:07, Kent Overstreet wrote: >>> On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote: >>>> From: Youling Tang <tangyouling@kylinos.cn> >>>> >>>> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary >>>> to error handle the return value to avoid triggering NULL pointer >>>> dereference BUG. >>>> >>>> The issue was triggered for discussion [1], >>>> Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6 >>> I see no explanation for why this is the correct fix, and I doubt it is. >>> What's the real reason for the NULL lru_list_one, and why doesn't this >>> come up on other filesystems? >> We can break it down into two questions (independent of each other): >> 1) Error handling is necessary when l (lru_list_one) is NULL here. > No, you're just hiding the actual bug - since I wasn't clear, I'm naking > this patch. We should use kmem_cache_alloc_lru() instead of kmem_cache_alloc(), similar to the [1] modification. Apply the following patch to fix the problem: diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c index f9c9a95d7d4c..79a580dfb5e1 100644 --- a/fs/bcachefs/fs.c +++ b/fs/bcachefs/fs.c @@ -227,7 +227,8 @@ static struct inode *bch2_alloc_inode(struct super_block *sb) static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c) { - struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, GFP_NOFS); + struct bch_inode_info *inode = alloc_inode_sb(c->vfs_sb, bch2_inode_cache, GFP_NOFS); if (!inode) return NULL; Link [1]: https://lwn.net/ml/linux-kernel/20220228122126.37293-5-songmuchun@bytedance.com/ Thanks, Youling.
On Tue, Jul 16, 2024 at 10:28:33AM GMT, Youling Tang wrote: > Hi, Kent > > On 12/07/2024 23:49, Kent Overstreet wrote: > > On Fri, Jul 12, 2024 at 12:28:57PM GMT, Youling Tang wrote: > > > Hi, Kent > > > > > > On 12/07/2024 12:07, Kent Overstreet wrote: > > > > On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote: > > > > > From: Youling Tang <tangyouling@kylinos.cn> > > > > > > > > > > Note that list_lru_from_memcg_idx() may return NULL, so it is necessary > > > > > to error handle the return value to avoid triggering NULL pointer > > > > > dereference BUG. > > > > > > > > > > The issue was triggered for discussion [1], > > > > > Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6 > > > > I see no explanation for why this is the correct fix, and I doubt it is. > > > > What's the real reason for the NULL lru_list_one, and why doesn't this > > > > come up on other filesystems? > > > We can break it down into two questions (independent of each other): > > > 1) Error handling is necessary when l (lru_list_one) is NULL here. > > No, you're just hiding the actual bug - since I wasn't clear, I'm naking > > this patch. > We should use kmem_cache_alloc_lru() instead of kmem_cache_alloc(), > similar to the [1] modification. > > Apply the following patch to fix the problem: Thanks, send it as a proper patch and I'll apply it > > diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c > index f9c9a95d7d4c..79a580dfb5e1 100644 > --- a/fs/bcachefs/fs.c > +++ b/fs/bcachefs/fs.c > @@ -227,7 +227,8 @@ static struct inode *bch2_alloc_inode(struct super_block > *sb) > > static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c) > { > - struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, > GFP_NOFS); > + struct bch_inode_info *inode = alloc_inode_sb(c->vfs_sb, > bch2_inode_cache, GFP_NOFS); > if (!inode) > return NULL; > > Link [1]: https://lwn.net/ml/linux-kernel/20220228122126.37293-5-songmuchun@bytedance.com/ > > Thanks, > Youling.
On 15/07/2024 11:27, Qi Zheng wrote: > > > On 2024/7/12 12:07, Kent Overstreet wrote: >> On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote: >>> From: Youling Tang <tangyouling@kylinos.cn> >>> >>> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary >>> to error handle the return value to avoid triggering NULL pointer >>> dereference BUG. >>> >>> The issue was triggered for discussion [1], >>> Link [1]: >>> https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6 >> >> I see no explanation for why this is the correct fix, and I doubt it is. >> What's the real reason for the NULL lru_list_one, and why doesn't this >> come up on other filesystems? > > Agree, IIRC, the list_lru_one will be pre-allocated in the allocation > path of inode/dentry etc. Yes, this issue does not fix why lru_list_one is NULL. lru_list_one is NULL because the inode is allocated using kmem_cache_alloc() instead of kmem_cache_alloc_lru(), and the lru argument passed to slab_alloc_node() is NULL. See [1] for the actual fix. However, the return value of list_lru_from_memcg_idx() may be NULL. When list_lru_one is NULL, the kernel will panic directly. Do we need to add error handling when list_lru_one is NULL in list_lru_{add, del}? To avoid hiding the actual BUG(previous changes), we might make the following changes, diff --git a/mm/list_lru.c b/mm/list_lru.c index 3fd64736bc45..fa86d3eff90b 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -94,6 +94,9 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid, spin_lock(&nlru->lock); if (list_empty(item)) { l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg)); + if (WARN_ON_ONCE(!l)) + goto out; + list_add_tail(item, &l->list); /* Set shrinker bit if the first element was added */ if (!l->nr_items++) @@ -102,6 +105,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid, spin_unlock(&nlru->lock); return true; } +out: spin_unlock(&nlru->lock); return false; } @@ -126,12 +130,16 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid, spin_lock(&nlru->lock); if (!list_empty(item)) { l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg)); + if (WARN_ON_ONCE(!l)) + goto out; + list_del_init(item); l->nr_items--; nlru->nr_items--; spin_unlock(&nlru->lock); return true; } +out: spin_unlock(&nlru->lock); return false; } Link: [1]: https://lore.kernel.org/all/20240716025816.52156-1-youling.tang@linux.dev/ Thanks, Youling.
On 2024/7/17 10:25, Youling Tang wrote: > On 15/07/2024 11:27, Qi Zheng wrote: >> >> >> On 2024/7/12 12:07, Kent Overstreet wrote: >>> On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote: >>>> From: Youling Tang <tangyouling@kylinos.cn> >>>> >>>> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary >>>> to error handle the return value to avoid triggering NULL pointer >>>> dereference BUG. >>>> >>>> The issue was triggered for discussion [1], >>>> Link [1]: >>>> https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6 >>> >>> I see no explanation for why this is the correct fix, and I doubt it is. >>> What's the real reason for the NULL lru_list_one, and why doesn't this >>> come up on other filesystems? >> >> Agree, IIRC, the list_lru_one will be pre-allocated in the allocation >> path of inode/dentry etc. > Yes, this issue does not fix why lru_list_one is NULL. > > lru_list_one is NULL because the inode is allocated using > kmem_cache_alloc() > instead of kmem_cache_alloc_lru(), and the lru argument passed to > slab_alloc_node() is NULL. See [1] for the actual fix. > > However, the return value of list_lru_from_memcg_idx() may be NULL. When > list_lru_one is NULL, the kernel will panic directly. Do we need to add > error handling when list_lru_one is NULL in list_lru_{add, del}? Nope, we should pre-allocated the list_lru_one before calling list_lru_add(). > > To avoid hiding the actual BUG(previous changes), we might make the > following > changes, Even if you do this, you should still modify all callers to handle this failure. > > diff --git a/mm/list_lru.c b/mm/list_lru.c > index 3fd64736bc45..fa86d3eff90b 100644 > --- a/mm/list_lru.c > +++ b/mm/list_lru.c > @@ -94,6 +94,9 @@ bool list_lru_add(struct list_lru *lru, struct > list_head *item, int nid, > spin_lock(&nlru->lock); > if (list_empty(item)) { > l = list_lru_from_memcg_idx(lru, nid, > memcg_kmem_id(memcg)); > + if (WARN_ON_ONCE(!l)) > + goto out; > + > list_add_tail(item, &l->list); > /* Set shrinker bit if the first element was added */ > if (!l->nr_items++) > @@ -102,6 +105,7 @@ bool list_lru_add(struct list_lru *lru, struct > list_head *item, int nid, > spin_unlock(&nlru->lock); > return true; > } > +out: > spin_unlock(&nlru->lock); > return false; > } > @@ -126,12 +130,16 @@ bool list_lru_del(struct list_lru *lru, struct > list_head *item, int nid, > spin_lock(&nlru->lock); > if (!list_empty(item)) { > l = list_lru_from_memcg_idx(lru, nid, > memcg_kmem_id(memcg)); > + if (WARN_ON_ONCE(!l)) > + goto out; > + > list_del_init(item); > l->nr_items--; > nlru->nr_items--; > spin_unlock(&nlru->lock); > return true; > } > +out: > spin_unlock(&nlru->lock); > return false; > } > > Link: > [1]: > https://lore.kernel.org/all/20240716025816.52156-1-youling.tang@linux.dev/ > > Thanks, > Youling.
diff --git a/mm/list_lru.c b/mm/list_lru.c index 3fd64736bc45..ee7424c3879d 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -94,6 +94,9 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid, spin_lock(&nlru->lock); if (list_empty(item)) { l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg)); + if (!l) + goto out; + list_add_tail(item, &l->list); /* Set shrinker bit if the first element was added */ if (!l->nr_items++) @@ -102,6 +105,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid, spin_unlock(&nlru->lock); return true; } +out: spin_unlock(&nlru->lock); return false; }