Message ID | 20240725094330.72537-1-songmuchun@bytedance.com (mailing list archive) |
---|---|
State | New |
Series | [v3] mm: kmem: add lockdep assertion to obj_cgroup_memcg |
On Thu, Jul 25, 2024 at 05:43:30PM GMT, Muchun Song wrote:
> [...]
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

On Thu, Jul 25, 2024 at 05:43:30PM +0800, Muchun Song wrote:
> [...]
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!

On 7/25/24 11:43 AM, Muchun Song wrote:
> [...]
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

On 25.07.2024 11:43, Muchun Song wrote:
> [...]
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
that it triggers the following warning on a Debian bookworm/sid system
image running under QEMU RISCV64:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at include/linux/memcontrol.h:373 mem_cgroup_from_slab_obj+0x13e/0x1ea
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: systemd Not tainted 6.10.0+ #15154
Hardware name: riscv-virtio,qemu (DT)
epc : mem_cgroup_from_slab_obj+0x13e/0x1ea
 ra : mem_cgroup_from_slab_obj+0x13c/0x1ea
...
[<ffffffff80257256>] mem_cgroup_from_slab_obj+0x13e/0x1ea
[<ffffffff801f0b3e>] list_lru_del_obj+0xa6/0xc2
[<ffffffff8027c6c6>] d_lru_del+0x8c/0xa4
[<ffffffff8027da60>] __dentry_kill+0x15e/0x17a
[<ffffffff8027ec3c>] dput.part.0+0x242/0x3e6
[<ffffffff8027edee>] dput+0xe/0x18
[<ffffffff8027324c>] lookup_fast+0x80/0xce
[<ffffffff80273e28>] walk_component+0x20/0x13c
[<ffffffff802747e2>] path_lookupat+0x64/0x16c
[<ffffffff80274bf4>] filename_lookup+0x76/0x122
[<ffffffff80274d80>] user_path_at+0x30/0x4a
[<ffffffff802d12bc>] __riscv_sys_name_to_handle_at+0x52/0x1d8
[<ffffffff80b60324>] do_trap_ecall_u+0x14e/0x1da
[<ffffffff80b6c546>] handle_exception+0xca/0xd6
irq event stamp: 198187
hardirqs last enabled at (198187): [<ffffffff8028ca9e>] lookup_mnt+0x186/0x308
hardirqs last disabled at (198186): [<ffffffff8028ca74>] lookup_mnt+0x15c/0x308
softirqs last enabled at (198172): [<ffffffff800e34f6>] cgroup_apply_control_enable+0x1f6/0x2fc
softirqs last disabled at (198170): [<ffffffff800e34d8>] cgroup_apply_control_enable+0x1d8/0x2fc
---[ end trace 0000000000000000 ]---

A similar warning appears on an ARM64 Debian bookworm system. Reverting
the commit on top of linux-next hides the issue, but I assume this is
not the best way to fix it.

I'm testing a kernel built from riscv/defconfig with the PROVE_LOCKING,
DEBUG_ATOMIC_SLEEP, DEBUG_DRIVER and DEBUG_DEVRES options enabled.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

On Tue, 30 Jul 2024 20:52:04 +0200 Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
> kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
> that it triggers the following warning on a Debian bookworm/sid system
> image running under QEMU RISCV64:

Thanks. I'll drop the patch while this gets sorted out, to be nice to
linux-next users.

> On Jul 31, 2024, at 04:18, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 30 Jul 2024 20:52:04 +0200 Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>
>> This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
>> kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
>> that it triggers the following warning on a Debian bookworm/sid system
>> image running under QEMU RISCV64:
>
> Thanks. I'll drop the patch while this gets sorted out, to be nice to
> linux-next users.

Hi Andrew,

Please pick up this patch first [1]; it is a fix for the above issue.

[1] https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/

Thanks.

> On Jul 31, 2024, at 02:52, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>
> This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
> kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
> that it triggers the following warning on a Debian bookworm/sid system
> image running under QEMU RISCV64:

Thanks for your report.

I'd call this excellent, since it indeed indicates that this patch works
well. What you reported is actually a bug that I fixed in [1], and it is
not related to this patch.

[1] https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/

On 31.07.2024 09:02, Muchun Song wrote:
> Thanks for your report.
>
> I'd call this excellent, since it indeed indicates that this patch works
> well. What you reported is actually a bug that I fixed in [1], and it is
> not related to this patch.
>
> [1] https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/

Confirmed. Applying [1] on top of next-20240730 fixes this issue without
reverting $subject.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index fc94879db4dff..95f823deafeca 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -360,11 +360,11 @@ static inline bool folio_memcg_kmem(struct folio *folio);
  * After the initialization objcg->memcg is always pointing at
  * a valid memcg, but can be atomically swapped to the parent memcg.
  *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
+ * The caller must ensure that the returned memcg won't be released.
  */
 static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
 {
+	lockdep_assert_once(rcu_read_lock_held() || lockdep_is_held(&cgroup_mutex));
 	return READ_ONCE(objcg->memcg);
 }
 
@@ -438,6 +438,19 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 	return __folio_memcg(folio);
 }
 
+/*
+ * folio_memcg_charged - If a folio is charged to a memory cgroup.
+ * @folio: Pointer to the folio.
+ *
+ * Returns true if folio is charged to a memory cgroup, otherwise returns false.
+ */
+static inline bool folio_memcg_charged(struct folio *folio)
+{
+	if (folio_memcg_kmem(folio))
+		return __folio_objcg(folio) != NULL;
+	return __folio_memcg(folio) != NULL;
+}
+
 /**
  * folio_memcg_rcu - Locklessly get the memory cgroup associated with a folio.
  * @folio: Pointer to the folio.
@@ -454,7 +467,6 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
 	unsigned long memcg_data = READ_ONCE(folio->memcg_data);
 
 	VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
-	WARN_ON_ONCE(!rcu_read_lock_held());
 
 	if (memcg_data & MEMCG_DATA_KMEM) {
 		struct obj_cgroup *objcg;
@@ -463,6 +475,8 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
 		return obj_cgroup_memcg(objcg);
 	}
 
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
 	return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 622d4544edd24..3da0284573857 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2366,7 +2366,7 @@ void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 {
-	VM_BUG_ON_FOLIO(folio_memcg(folio), folio);
+	VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio);
 	/*
 	 * Any of the following ensures page's memcg stability:
 	 *
@@ -4617,7 +4617,7 @@ void __mem_cgroup_uncharge(struct folio *folio)
 	struct uncharge_gather ug;
 
 	/* Don't touch folio->lru of any random page, pre-check: */
-	if (!folio_memcg(folio))
+	if (!folio_memcg_charged(folio))
 		return;
 
 	uncharge_gather_clear(&ug);
@@ -4662,7 +4662,7 @@ void mem_cgroup_replace_folio(struct folio *old, struct folio *new)
 		return;
 
 	/* Page cache replacement: new folio already charged? */
-	if (folio_memcg(new))
+	if (folio_memcg_charged(new))
 		return;
 
 	memcg = folio_memcg(old);

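To illustrate why folio_memcg_charged() in the diff above does not need any lock, here is a minimal userspace sketch. It is not kernel code: the struct names, the `model_` helpers, and the flag values are assumptions made for illustration only. The point it tries to show is that a "charged?" query can be answered from the tagged data word alone, while the full lookup for a kmem folio has to load through the objcg and therefore depends on the pointee staying alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures, aligned so that the
 * low three bits of their addresses are always zero. */
struct memcg_model { _Alignas(8) int id; };
struct objcg_model { _Alignas(8) struct memcg_model *memcg; };

/* Assumed flag layout: the low bits of the data word are flags. */
#define MODEL_DATA_KMEM  ((uintptr_t)0x2)
#define MODEL_FLAGS_MASK ((uintptr_t)0x7)

struct folio_model { uintptr_t memcg_data; };

/* Pack a pointer and its flags into one word. */
static uintptr_t model_pack(const void *ptr, uintptr_t flags)
{
	assert(((uintptr_t)ptr & MODEL_FLAGS_MASK) == 0);
	return (uintptr_t)ptr | flags;
}

/* Charged check: inspects only the data word, never the pointee. */
static bool folio_model_charged(const struct folio_model *folio)
{
	return (folio->memcg_data & ~MODEL_FLAGS_MASK) != 0;
}

/* Full lookup: a kmem folio needs one extra load through the objcg. */
static struct memcg_model *folio_model_memcg(const struct folio_model *folio)
{
	uintptr_t ptr = folio->memcg_data & ~MODEL_FLAGS_MASK;

	if (folio->memcg_data & MODEL_DATA_KMEM) {
		struct objcg_model *objcg = (struct objcg_model *)ptr;

		return objcg ? objcg->memcg : NULL;
	}
	return (struct memcg_model *)ptr;
}
```

In this model, folio_model_charged() never dereferences anything, so the lifetime of the memcg or objcg is irrelevant to it; only folio_model_memcg() hands out a pointer whose validity the caller must guarantee.
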
The obj_cgroup_memcg() helper only guarantees that the returned memory
cgroup will not be freed while the caller holds the rcu read lock,
objcg_lock or cgroup_mutex. It is very easy to overlook those conditions
when users call higher-level APIs that call obj_cgroup_memcg()
internally, like mem_cgroup_from_slab_obj() (see the link below). So it
is better to add a lockdep assertion to obj_cgroup_memcg() to find those
issues ASAP.

Because no user of obj_cgroup_memcg() holds objcg_lock to keep the
returned memory cgroup safe, do not add an objcg_lock assertion (we
would have to export objcg_lock if we really wanted to). Additionally,
objcg_lock is an internal implementation detail of memcg and should not
be accessible outside memcg code.

Some users, like __mem_cgroup_uncharge(), do not care about the lifetime
of the returned memory cgroup; they only want to know whether the folio
is charged to a memory cgroup, so they do not need to hold the locks
above. For those cases, introduce a new helper, folio_memcg_charged().
Compared to folio_memcg(), it also eliminates a memory access of
objcg->memcg for kmem, which is a really small gain.

Link: https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
v3:
 - Use lockdep_assert_once() (Vlastimil).

v2:
 - Remove mention of objcg_lock in obj_cgroup_memcg() (Shakeel Butt).

 include/linux/memcontrol.h | 20 +++++++++++++++++---
 mm/memcontrol.c            |  6 +++---
 2 files changed, 20 insertions(+), 6 deletions(-)
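
The idea described above, an accessor that checks its locking precondition and reports a violation only once, can be sketched in plain userspace C. This is a toy model under stated assumptions, not the lockdep implementation: the `toy_` names are hypothetical, a simple depth counter stands in for the rcu read-side section, and a counter stands in for the kernel warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mem_cgroup and struct obj_cgroup. */
struct toy_memcg { int id; };
struct toy_objcg { struct toy_memcg *memcg; };

/* Toy replacement for read-side lock tracking (single-threaded model). */
static int toy_read_depth;
/* Number of reports emitted; capped at one by toy_assert_once(). */
static int toy_warnings;
static bool toy_warned;

static void toy_read_lock(void)
{
	toy_read_depth++;
}

static void toy_read_unlock(void)
{
	assert(toy_read_depth > 0);
	toy_read_depth--;
}

/* Report a failed precondition, but only the first time it fails. */
static void toy_assert_once(bool cond)
{
	if (!cond && !toy_warned) {
		toy_warned = true;
		toy_warnings++;
	}
}

/* Accessor that documents its precondition by checking it. */
static struct toy_memcg *toy_objcg_memcg(struct toy_objcg *objcg)
{
	toy_assert_once(toy_read_depth > 0);
	return objcg->memcg;
}

/* Well-behaved caller: the returned pointer is used only while locked. */
static int toy_lookup_id(struct toy_objcg *objcg)
{
	int id;

	toy_read_lock();
	id = toy_objcg_memcg(objcg)->id;
	toy_read_unlock();
	return id;
}
```

A caller such as toy_lookup_id() never trips the check, whereas calling toy_objcg_memcg() directly without the read-side section increments toy_warnings once. That mirrors how the report in this thread surfaced an unprotected caller rather than a problem in the assertion itself.
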