[v3] mm: kmem: add lockdep assertion to obj_cgroup_memcg

Message ID	20240725094330.72537-1-songmuchun@bytedance.com (mailing list archive)
State	New
Headers	Return-Path: <owner-linux-mm@kvack.org>
	From: Muchun Song <songmuchun@bytedance.com>
	To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, vbabka@kernel.org
	Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song <songmuchun@bytedance.com>
	Subject: [PATCH v3] mm: kmem: add lockdep assertion to obj_cgroup_memcg
	Date: Thu, 25 Jul 2024 17:43:30 +0800
	Message-Id: <20240725094330.72537-1-songmuchun@bytedance.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Sender: owner-linux-mm@kvack.org
	Precedence: bulk

Muchun Song July 25, 2024, 9:43 a.m. UTC

obj_cgroup_memcg() only protects the returned memory cgroup from
being freed when the caller holds the RCU read lock, objcg_lock, or
cgroup_mutex. These conditions are easy to overlook when users call
higher-level APIs that call obj_cgroup_memcg() internally, such as
mem_cgroup_from_slab_obj() (see the link below). So it is better to
add a lockdep assertion to obj_cgroup_memcg() to find such issues
as early as possible.
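
As a rough illustration of the rule above, here is a minimal userspace
sketch in C. It is not kernel code: the model_* names, the depth counter
and the MODEL_ASSERT_ONCE() macro are hypothetical stand-ins for
rcu_read_lock(), cgroup_mutex and lockdep_assert_once(), chosen only to
show the shape of the check (warn once if neither condition holds, then
still return the pointer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct memcg_model { int id; };
struct objcg_model { struct memcg_model *memcg; };

static int  model_rcu_depth;          /* nesting depth of the modeled RCU read section */
static bool model_cgroup_mutex_held;  /* whether the modeled cgroup_mutex is held */
static int  model_warnings;           /* number of warnings emitted so far */

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/* Warn the first time cond is false at this call site, then stay silent. */
#define MODEL_ASSERT_ONCE(cond)                                  \
	do {                                                     \
		static bool warned;                              \
		if (!(cond) && !warned) {                        \
			warned = true;                           \
			model_warnings++;                        \
			fprintf(stderr, "WARNING: %s\n", #cond); \
		}                                                \
	} while (0)

/* Model of the asserted accessor: check the locking rule, then read. */
static struct memcg_model *model_obj_cgroup_memcg(struct objcg_model *objcg)
{
	MODEL_ASSERT_ONCE(model_rcu_depth > 0 || model_cgroup_mutex_held);
	return objcg->memcg;
}
```

In this model, a caller that wraps the lookup in model_rcu_read_lock()
and model_rcu_read_unlock() passes silently, while a caller that takes
neither lock produces exactly one warning, however often it repeats the
call.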

Because no user of obj_cgroup_memcg() holds objcg_lock to keep the
returned memory cgroup safe, do not add an objcg_lock assertion (we
would have to export objcg_lock to do so). Additionally, objcg_lock
is an internal implementation detail of memcg and should not be
accessible outside memcg code.

Some users, such as __mem_cgroup_uncharge(), do not care about the
lifetime of the returned memory cgroup; they only want to know
whether the folio is charged to a memory cgroup, so they do not need
to hold the locks above. For these cases, introduce a new helper,
folio_memcg_charged(). Compared to folio_memcg(), it also avoids a
memory access to objcg->memcg for kmem folios, although the gain is
really small.
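
To make the difference between "is it charged?" and "which memcg?"
concrete, the following is a small userspace sketch in C. The structures,
the flag values and the model_* helpers are assumptions made for
illustration only and do not reproduce the kernel's real layout; the
point is that the charged check only tests the stored pointer for
non-NULL, while resolving the memcg of a kmem folio has to dereference
the objcg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag layout for this sketch only. */
#define MODEL_DATA_KMEM  ((uintptr_t)0x2)  /* stored pointer is an objcg */
#define MODEL_FLAGS_MASK ((uintptr_t)0x3)  /* all low flag bits */

struct memcg_model { int id; };
struct objcg_model { struct memcg_model *memcg; };
struct folio_model { uintptr_t memcg_data; };

/* Charge the folio directly to a memcg (non-kmem case). */
static void model_set_memcg(struct folio_model *f, struct memcg_model *m)
{
	assert(((uintptr_t)m & MODEL_FLAGS_MASK) == 0);
	f->memcg_data = (uintptr_t)m;
}

/* Charge the folio through an objcg (kmem case). */
static void model_set_objcg(struct folio_model *f, struct objcg_model *o)
{
	assert(((uintptr_t)o & MODEL_FLAGS_MASK) == 0);
	f->memcg_data = (uintptr_t)o | MODEL_DATA_KMEM;
}

/* True if an owner pointer is stored; the pointer is never dereferenced. */
static bool model_folio_charged(const struct folio_model *f)
{
	return (f->memcg_data & ~MODEL_FLAGS_MASK) != 0;
}

/* Resolve the memcg; the kmem case needs the extra read of objcg->memcg. */
static struct memcg_model *model_folio_memcg(const struct folio_model *f)
{
	uintptr_t p = f->memcg_data & ~MODEL_FLAGS_MASK;

	if (f->memcg_data & MODEL_DATA_KMEM)
		return p ? ((struct objcg_model *)p)->memcg : NULL;
	return (struct memcg_model *)p;
}
```

Because model_folio_charged() never follows the pointer, it has no
lifetime requirement on the memcg, which mirrors why callers that only
need a yes/no answer can skip the locks.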

Link: https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
v3:
 - Use lockdep_assert_once() (Vlastimil).

v2:
 - Remove mention of objcg_lock in obj_cgroup_memcg() (Shakeel Butt).

 include/linux/memcontrol.h | 20 +++++++++++++++++---
 mm/memcontrol.c            |  6 +++---
 2 files changed, 20 insertions(+), 6 deletions(-)

Shakeel Butt July 25, 2024, 3:58 p.m. UTC | #1

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

Roman Gushchin July 25, 2024, 5:39 p.m. UTC | #2

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!

Vlastimil Babka (SUSE) July 26, 2024, 8:09 a.m. UTC | #3

On 7/25/24 11:43 AM, Muchun Song wrote:

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
> v3:
>  - Use lockdep_assert_once(Vlastimil).
> 
> v2:
>  - Remove mention of objcg_lock in obj_cgroup_memcg()(Shakeel Butt).
> 
>  include/linux/memcontrol.h | 20 +++++++++++++++++---
>  mm/memcontrol.c            |  6 +++---
>  2 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index fc94879db4dff..95f823deafeca 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -360,11 +360,11 @@ static inline bool folio_memcg_kmem(struct folio *folio);
>   * After the initialization objcg->memcg is always pointing at
>   * a valid memcg, but can be atomically swapped to the parent memcg.
>   *
> - * The caller must ensure that the returned memcg won't be released:
> - * e.g. acquire the rcu_read_lock or css_set_lock.
> + * The caller must ensure that the returned memcg won't be released.
>   */
>  static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
>  {
> +	lockdep_assert_once(rcu_read_lock_held() || lockdep_is_held(&cgroup_mutex));
>  	return READ_ONCE(objcg->memcg);
>  }
>  
> @@ -438,6 +438,19 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
>  	return __folio_memcg(folio);
>  }
>  
> +/*
> + * folio_memcg_charged - If a folio is charged to a memory cgroup.
> + * @folio: Pointer to the folio.
> + *
> + * Returns true if folio is charged to a memory cgroup, otherwise returns false.
> + */
> +static inline bool folio_memcg_charged(struct folio *folio)
> +{
> +	if (folio_memcg_kmem(folio))
> +		return __folio_objcg(folio) != NULL;
> +	return __folio_memcg(folio) != NULL;
> +}
> +
>  /**
>   * folio_memcg_rcu - Locklessly get the memory cgroup associated with a folio.
>   * @folio: Pointer to the folio.
> @@ -454,7 +467,6 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
>  	unsigned long memcg_data = READ_ONCE(folio->memcg_data);
>  
>  	VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
> -	WARN_ON_ONCE(!rcu_read_lock_held());
>  
>  	if (memcg_data & MEMCG_DATA_KMEM) {
>  		struct obj_cgroup *objcg;
> @@ -463,6 +475,8 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
>  		return obj_cgroup_memcg(objcg);
>  	}
>  
> +	WARN_ON_ONCE(!rcu_read_lock_held());
> +
>  	return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
>  }
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 622d4544edd24..3da0284573857 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2366,7 +2366,7 @@ void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
>  
>  static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
>  {
> -	VM_BUG_ON_FOLIO(folio_memcg(folio), folio);
> +	VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio);
>  	/*
>  	 * Any of the following ensures page's memcg stability:
>  	 *
> @@ -4617,7 +4617,7 @@ void __mem_cgroup_uncharge(struct folio *folio)
>  	struct uncharge_gather ug;
>  
>  	/* Don't touch folio->lru of any random page, pre-check: */
> -	if (!folio_memcg(folio))
> +	if (!folio_memcg_charged(folio))
>  		return;
>  
>  	uncharge_gather_clear(&ug);
> @@ -4662,7 +4662,7 @@ void mem_cgroup_replace_folio(struct folio *old, struct folio *new)
>  		return;
>  
>  	/* Page cache replacement: new folio already charged? */
> -	if (folio_memcg(new))
> +	if (folio_memcg_charged(new))
>  		return;
>  
>  	memcg = folio_memcg(old);

Marek Szyprowski July 30, 2024, 6:52 p.m. UTC | #4

This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
that it triggers the following warning on a Debian bookworm/sid system
image running under QEMU RISCV64:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at include/linux/memcontrol.h:373 mem_cgroup_from_slab_obj+0x13e/0x1ea
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: systemd Not tainted 6.10.0+ #15154
Hardware name: riscv-virtio,qemu (DT)
epc : mem_cgroup_from_slab_obj+0x13e/0x1ea
  ra : mem_cgroup_from_slab_obj+0x13c/0x1ea
...
[<ffffffff80257256>] mem_cgroup_from_slab_obj+0x13e/0x1ea
[<ffffffff801f0b3e>] list_lru_del_obj+0xa6/0xc2
[<ffffffff8027c6c6>] d_lru_del+0x8c/0xa4
[<ffffffff8027da60>] __dentry_kill+0x15e/0x17a
[<ffffffff8027ec3c>] dput.part.0+0x242/0x3e6
[<ffffffff8027edee>] dput+0xe/0x18
[<ffffffff8027324c>] lookup_fast+0x80/0xce
[<ffffffff80273e28>] walk_component+0x20/0x13c
[<ffffffff802747e2>] path_lookupat+0x64/0x16c
[<ffffffff80274bf4>] filename_lookup+0x76/0x122
[<ffffffff80274d80>] user_path_at+0x30/0x4a
[<ffffffff802d12bc>] __riscv_sys_name_to_handle_at+0x52/0x1d8
[<ffffffff80b60324>] do_trap_ecall_u+0x14e/0x1da
[<ffffffff80b6c546>] handle_exception+0xca/0xd6
irq event stamp: 198187
hardirqs last  enabled at (198187): [<ffffffff8028ca9e>] lookup_mnt+0x186/0x308
hardirqs last disabled at (198186): [<ffffffff8028ca74>] lookup_mnt+0x15c/0x308
softirqs last  enabled at (198172): [<ffffffff800e34f6>] cgroup_apply_control_enable+0x1f6/0x2fc
softirqs last disabled at (198170): [<ffffffff800e34d8>] cgroup_apply_control_enable+0x1d8/0x2fc
---[ end trace 0000000000000000 ]---

A similar warning appears on an ARM64 Debian bookworm system. Reverting
the patch on top of linux-next hides the issue, but I assume this is not
the best way to fix it.

I'm testing a kernel built from riscv/defconfig with the PROVE_LOCKING,
DEBUG_ATOMIC_SLEEP, DEBUG_DRIVER and DEBUG_DEVRES options enabled.

Best regards

Andrew Morton July 30, 2024, 8:18 p.m. UTC | #5

On Tue, 30 Jul 2024 20:52:04 +0200 Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
> kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
> that it triggers the following warning on a Debian bookworm/sid system
> image running under QEMU RISCV64:

Thanks.  I'll drop the patch while this gets sorted out, to be nice to
linux-next users.

Muchun Song July 31, 2024, 6:53 a.m. UTC | #6

> On Jul 31, 2024, at 04:18, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Tue, 30 Jul 2024 20:52:04 +0200 Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> 
>> This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
>> kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
>> that it triggers the following warning on a Debian bookworm/sid system
>> image running under QEMU RISCV64:
> 
> Thanks.  I'll drop the patch while this gets sorted out, to be nice to
> linux-next users.

Hi Andrew,

Please pick up this patch first [1]; it fixes the issue above.

[1] https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/

Thanks.

Muchun Song July 31, 2024, 7:02 a.m. UTC | #7

> On Jul 31, 2024, at 02:52, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> 
> This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
> kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
> that it triggers the following warning on a Debian bookworm/sid system
> image running under QEMU RISCV64:

Thanks for your report.

I'd call this excellent news, since it shows that this patch works as
intended. The warning you hit points at a real bug, which I fixed in
[1]; it is not caused by this patch.

[1] https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/

> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland

Marek Szyprowski July 31, 2024, 8:09 a.m. UTC | #8

On 31.07.2024 09:02, Muchun Song wrote:
>> On Jul 31, 2024, at 02:52, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>
>> This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
>> kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
>> that it triggers the following warning on a Debian bookworm/sid system
>> image running under QEMU RISCV64:
> Thanks for your report.
>
> I'd call this excellent news, since it shows that this patch works as
> intended. The warning you hit points at a real bug, which I fixed in
> [1]; it is not caused by this patch.
>
> [1] https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/

Confirmed. Applying [1] on top of next-20240730 fixes this issue without 
reverting $subject.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
