diff mbox series

mm: list_lru: Fix NULL pointer dereference in list_lru_add()

Message ID 20240712032554.444823-1-youling.tang@linux.dev (mailing list archive)
State New
Headers show
Series mm: list_lru: Fix NULL pointer dereference in list_lru_add() | expand

Commit Message

Youling Tang July 12, 2024, 3:25 a.m. UTC
From: Youling Tang <tangyouling@kylinos.cn>

Note that list_lru_from_memcg_idx() may return NULL, so it is necessary
to error handle the return value to avoid triggering NULL pointer
dereference BUG.

The issue was triggered for discussion [1],
Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
---
 mm/list_lru.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Kent Overstreet July 12, 2024, 4:07 a.m. UTC | #1
On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote:
> From: Youling Tang <tangyouling@kylinos.cn>
> 
> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary
> to error handle the return value to avoid triggering NULL pointer
> dereference BUG.
> 
> The issue was triggered for discussion [1],
> Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6

I see no explanation for why this is the correct fix, and I doubt it is.
What's the real reason for the NULL lru_list_one, and why doesn't this
come up on other filesystems?
Youling Tang July 12, 2024, 4:28 a.m. UTC | #2
Hi, Kent

On 12/07/2024 12:07, Kent Overstreet wrote:
> On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote:
>> From: Youling Tang <tangyouling@kylinos.cn>
>>
>> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary
>> to error handle the return value to avoid triggering NULL pointer
>> dereference BUG.
>>
>> The issue was triggered for discussion [1],
>> Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6
> I see no explanation for why this is the correct fix, and I doubt it is.
> What's the real reason for the NULL lru_list_one, and why doesn't this
> come up on other filesystems?
We can break it down into two questions (independent of each other):
1) Error handling is necessary when l (lru_list_one) is NULL here.
2) Why does marking SLAB_ACCOUNT in bcachefs cause lru_list_one
    to be NULL?

This patch only fixes the first issue, and the real cause of the
second issue still needs to be analyzed.

Thanks,
Youling.
Kent Overstreet July 12, 2024, 3:49 p.m. UTC | #3
On Fri, Jul 12, 2024 at 12:28:57PM GMT, Youling Tang wrote:
> Hi, Kent
> 
> On 12/07/2024 12:07, Kent Overstreet wrote:
> > On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote:
> > > From: Youling Tang <tangyouling@kylinos.cn>
> > > 
> > > Note that list_lru_from_memcg_idx() may return NULL, so it is necessary
> > > to error handle the return value to avoid triggering NULL pointer
> > > dereference BUG.
> > > 
> > > The issue was triggered for discussion [1],
> > > Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6
> > I see no explanation for why this is the correct fix, and I doubt it is.
> > What's the real reason for the NULL lru_list_one, and why doesn't this
> > come up on other filesystems?
> We can break it down into two questions (independent of each other):
> 1) Error handling is necessary when l (lru_list_one) is NULL here.

No, you're just hiding the actual bug - since I wasn't clear, I'm naking
this patch.
Qi Zheng July 15, 2024, 3:27 a.m. UTC | #4
On 2024/7/12 12:07, Kent Overstreet wrote:
> On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote:
>> From: Youling Tang <tangyouling@kylinos.cn>
>>
>> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary
>> to error handle the return value to avoid triggering NULL pointer
>> dereference BUG.
>>
>> The issue was triggered for discussion [1],
>> Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6
> 
> I see no explanation for why this is the correct fix, and I doubt it is.
> What's the real reason for the NULL lru_list_one, and why doesn't this
> come up on other filesystems?

Agree, IIRC, the list_lru_one will be pre-allocated in the allocation
path of inode/dentry etc.
Youling Tang July 16, 2024, 2:28 a.m. UTC | #5
Hi, Kent

On 12/07/2024 23:49, Kent Overstreet wrote:
> On Fri, Jul 12, 2024 at 12:28:57PM GMT, Youling Tang wrote:
>> Hi, Kent
>>
>> On 12/07/2024 12:07, Kent Overstreet wrote:
>>> On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote:
>>>> From: Youling Tang <tangyouling@kylinos.cn>
>>>>
>>>> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary
>>>> to error handle the return value to avoid triggering NULL pointer
>>>> dereference BUG.
>>>>
>>>> The issue was triggered for discussion [1],
>>>> Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6
>>> I see no explanation for why this is the correct fix, and I doubt it is.
>>> What's the real reason for the NULL lru_list_one, and why doesn't this
>>> come up on other filesystems?
>> We can break it down into two questions (independent of each other):
>> 1) Error handling is necessary when l (lru_list_one) is NULL here.
> No, you're just hiding the actual bug - since I wasn't clear, I'm naking
> this patch.
We should use kmem_cache_alloc_lru() instead of kmem_cache_alloc(),
similar to the [1] modification.

Apply the following patch to fix the problem:

diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index f9c9a95d7d4c..79a580dfb5e1 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -227,7 +227,8 @@ static struct inode *bch2_alloc_inode(struct 
super_block *sb)

  static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
  {
-       struct bch_inode_info *inode = 
kmem_cache_alloc(bch2_inode_cache, GFP_NOFS);
+       struct bch_inode_info *inode = alloc_inode_sb(c->vfs_sb, 
bch2_inode_cache, GFP_NOFS);
         if (!inode)
                 return NULL;

Link [1]: 
https://lwn.net/ml/linux-kernel/20220228122126.37293-5-songmuchun@bytedance.com/

Thanks,
Youling.
Kent Overstreet July 16, 2024, 2:30 a.m. UTC | #6
On Tue, Jul 16, 2024 at 10:28:33AM GMT, Youling Tang wrote:
> Hi, Kent
> 
> On 12/07/2024 23:49, Kent Overstreet wrote:
> > On Fri, Jul 12, 2024 at 12:28:57PM GMT, Youling Tang wrote:
> > > Hi, Kent
> > > 
> > > On 12/07/2024 12:07, Kent Overstreet wrote:
> > > > On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote:
> > > > > From: Youling Tang <tangyouling@kylinos.cn>
> > > > > 
> > > > > Note that list_lru_from_memcg_idx() may return NULL, so it is necessary
> > > > > to error handle the return value to avoid triggering NULL pointer
> > > > > dereference BUG.
> > > > > 
> > > > > The issue was triggered for discussion [1],
> > > > > Link [1]: https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6
> > > > I see no explanation for why this is the correct fix, and I doubt it is.
> > > > What's the real reason for the NULL lru_list_one, and why doesn't this
> > > > come up on other filesystems?
> > > We can break it down into two questions (independent of each other):
> > > 1) Error handling is necessary when l (lru_list_one) is NULL here.
> > No, you're just hiding the actual bug - since I wasn't clear, I'm naking
> > this patch.
> We should use kmem_cache_alloc_lru() instead of kmem_cache_alloc(),
> similar to the [1] modification.
> 
> Apply the following patch to fix the problem:

Thanks, send it as a proper patch and I'll apply it

> 
> diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
> index f9c9a95d7d4c..79a580dfb5e1 100644
> --- a/fs/bcachefs/fs.c
> +++ b/fs/bcachefs/fs.c
> @@ -227,7 +227,8 @@ static struct inode *bch2_alloc_inode(struct super_block
> *sb)
> 
>  static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
>  {
> -       struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache,
> GFP_NOFS);
> +       struct bch_inode_info *inode = alloc_inode_sb(c->vfs_sb,
> bch2_inode_cache, GFP_NOFS);
>         if (!inode)
>                 return NULL;
> 
> Link [1]: https://lwn.net/ml/linux-kernel/20220228122126.37293-5-songmuchun@bytedance.com/
> 
> Thanks,
> Youling.
Youling Tang July 17, 2024, 2:25 a.m. UTC | #7
On 15/07/2024 11:27, Qi Zheng wrote:
>
>
> On 2024/7/12 12:07, Kent Overstreet wrote:
>> On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote:
>>> From: Youling Tang <tangyouling@kylinos.cn>
>>>
>>> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary
>>> to error handle the return value to avoid triggering NULL pointer
>>> dereference BUG.
>>>
>>> The issue was triggered for discussion [1],
>>> Link [1]: 
>>> https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6
>>
>> I see no explanation for why this is the correct fix, and I doubt it is.
>> What's the real reason for the NULL lru_list_one, and why doesn't this
>> come up on other filesystems?
>
> Agree, IIRC, the list_lru_one will be pre-allocated in the allocation
> path of inode/dentry etc.
Yes, this issue does not fix why lru_list_one is NULL.

lru_list_one is NULL because the inode is allocated using kmem_cache_alloc()
instead of kmem_cache_alloc_lru(), and the lru argument passed to
slab_alloc_node() is NULL. See [1] for the actual fix.

However, the return value of list_lru_from_memcg_idx() may be NULL. When
list_lru_one is NULL, the kernel will panic directly. Do we need to add
error handling when list_lru_one is NULL in list_lru_{add, del}?

To avoid hiding the actual BUG(previous changes), we might make the 
following
changes,

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 3fd64736bc45..fa86d3eff90b 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -94,6 +94,9 @@ bool list_lru_add(struct list_lru *lru, struct 
list_head *item, int nid,
         spin_lock(&nlru->lock);
         if (list_empty(item)) {
                 l = list_lru_from_memcg_idx(lru, nid, 
memcg_kmem_id(memcg));
+               if (WARN_ON_ONCE(!l))
+                       goto out;
+
                 list_add_tail(item, &l->list);
                 /* Set shrinker bit if the first element was added */
                 if (!l->nr_items++)
@@ -102,6 +105,7 @@ bool list_lru_add(struct list_lru *lru, struct 
list_head *item, int nid,
                 spin_unlock(&nlru->lock);
                 return true;
         }
+out:
         spin_unlock(&nlru->lock);
         return false;
  }
@@ -126,12 +130,16 @@ bool list_lru_del(struct list_lru *lru, struct 
list_head *item, int nid,
         spin_lock(&nlru->lock);
         if (!list_empty(item)) {
                 l = list_lru_from_memcg_idx(lru, nid, 
memcg_kmem_id(memcg));
+               if (WARN_ON_ONCE(!l))
+                       goto out;
+
                 list_del_init(item);
                 l->nr_items--;
                 nlru->nr_items--;
                 spin_unlock(&nlru->lock);
                 return true;
         }
+out:
         spin_unlock(&nlru->lock);
         return false;
  }

Link:
[1]: 
https://lore.kernel.org/all/20240716025816.52156-1-youling.tang@linux.dev/

Thanks,
Youling.
Qi Zheng July 17, 2024, 2:37 a.m. UTC | #8
On 2024/7/17 10:25, Youling Tang wrote:
> On 15/07/2024 11:27, Qi Zheng wrote:
>>
>>
>> On 2024/7/12 12:07, Kent Overstreet wrote:
>>> On Fri, Jul 12, 2024 at 11:25:54AM GMT, Youling Tang wrote:
>>>> From: Youling Tang <tangyouling@kylinos.cn>
>>>>
>>>> Note that list_lru_from_memcg_idx() may return NULL, so it is necessary
>>>> to error handle the return value to avoid triggering NULL pointer
>>>> dereference BUG.
>>>>
>>>> The issue was triggered for discussion [1],
>>>> Link [1]: 
>>>> https://lore.kernel.org/linux-bcachefs/84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev/T/#m901bb26cdb1d9d4bacebf0d034f0a5a712cc93a6
>>>
>>> I see no explanation for why this is the correct fix, and I doubt it is.
>>> What's the real reason for the NULL lru_list_one, and why doesn't this
>>> come up on other filesystems?
>>
>> Agree, IIRC, the list_lru_one will be pre-allocated in the allocation
>> path of inode/dentry etc.
> Yes, this issue does not fix why lru_list_one is NULL.
> 
> lru_list_one is NULL because the inode is allocated using 
> kmem_cache_alloc()
> instead of kmem_cache_alloc_lru(), and the lru argument passed to
> slab_alloc_node() is NULL. See [1] for the actual fix.
> 
> However, the return value of list_lru_from_memcg_idx() may be NULL. When
> list_lru_one is NULL, the kernel will panic directly. Do we need to add
> error handling when list_lru_one is NULL in list_lru_{add, del}?

Nope, we should pre-allocated the list_lru_one before calling
list_lru_add().

> 
> To avoid hiding the actual BUG(previous changes), we might make the 
> following
> changes,

Even if you do this, you should still modify all callers
to handle this failure.

> 
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 3fd64736bc45..fa86d3eff90b 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -94,6 +94,9 @@ bool list_lru_add(struct list_lru *lru, struct 
> list_head *item, int nid,
>          spin_lock(&nlru->lock);
>          if (list_empty(item)) {
>                  l = list_lru_from_memcg_idx(lru, nid, 
> memcg_kmem_id(memcg));
> +               if (WARN_ON_ONCE(!l))
> +                       goto out;
> +
>                  list_add_tail(item, &l->list);
>                  /* Set shrinker bit if the first element was added */
>                  if (!l->nr_items++)
> @@ -102,6 +105,7 @@ bool list_lru_add(struct list_lru *lru, struct 
> list_head *item, int nid,
>                  spin_unlock(&nlru->lock);
>                  return true;
>          }
> +out:
>          spin_unlock(&nlru->lock);
>          return false;
>   }
> @@ -126,12 +130,16 @@ bool list_lru_del(struct list_lru *lru, struct 
> list_head *item, int nid,
>          spin_lock(&nlru->lock);
>          if (!list_empty(item)) {
>                  l = list_lru_from_memcg_idx(lru, nid, 
> memcg_kmem_id(memcg));
> +               if (WARN_ON_ONCE(!l))
> +                       goto out;
> +
>                  list_del_init(item);
>                  l->nr_items--;
>                  nlru->nr_items--;
>                  spin_unlock(&nlru->lock);
>                  return true;
>          }
> +out:
>          spin_unlock(&nlru->lock);
>          return false;
>   }
> 
> Link:
> [1]: 
> https://lore.kernel.org/all/20240716025816.52156-1-youling.tang@linux.dev/
> 
> Thanks,
> Youling.
diff mbox series

Patch

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 3fd64736bc45..ee7424c3879d 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -94,6 +94,9 @@  bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
 	spin_lock(&nlru->lock);
 	if (list_empty(item)) {
 		l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
+		if (!l)
+			goto out;
+
 		list_add_tail(item, &l->list);
 		/* Set shrinker bit if the first element was added */
 		if (!l->nr_items++)
@@ -102,6 +105,7 @@  bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
 		spin_unlock(&nlru->lock);
 		return true;
 	}
+out:
 	spin_unlock(&nlru->lock);
 	return false;
 }