[v2,0/7] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy()

Message ID 20240807-b4-slab-kfree_rcu-destroy-v2-0-ea79102f428c@suse.cz (mailing list archive)

Vlastimil Babka Aug. 7, 2024, 10:31 a.m. UTC
Also in git:
https://git.kernel.org/vbabka/l/slab-kfree_rcu-destroy-v2r2

Since SLOB was removed, we have allowed kfree_rcu() for objects
allocated from any kmem_cache in addition to kmalloc().

Recently we have attempted to replace existing call_rcu() usage with
kfree_rcu() where the callback is a plain kmem_cache_free(), in a series
by Julia Lawall [1].

Jakub Kicinski pointed out [2] that this was already tried in batman-adv
but had to be reverted because kmem_cache_destroy() failed: objects
remained in the cache despite rcu_barrier() being used.

Jason Donenfeld identified the culprit [3] as a35d16905efc ("rcu: Add
basic support for kfree_rcu() batching"), which made rcu_barrier()
insufficient.

This was never a problem for kfree_rcu() usage on kmalloc() objects as
the kmalloc caches are never destroyed, but arbitrary caches can be,
e.g. due to module unload.

Out of the possible solutions collected by Paul McKenney [4], the most
appealing to me is "kmem_cache_destroy() lingers for kfree_rcu()", as
it places no additional burden on kfree_rcu() users.

We already have a precedent: some parts of the kmem_cache cleanup are
done asynchronously for SLAB_TYPESAFE_BY_RCU caches. The v1 of this RFC
took the same approach and waited asynchronously for pending
kfree_rcu(). Mateusz Guzik questioned this approach on IRC, and it turns
out that the rcu_barrier() used to be synchronous before commit
657dc2f97220 ("slab: remove synchronous rcu_barrier() call in memcg
cache release path"), and the motivation for that change no longer
applies. So in v2 the existing barrier is reverted to be synchronous,
and the new barrier for kfree_rcu() is also called synchronously.

The new kvfree_rcu_barrier() was provided by Uladzislau Rezki in a patch
[5] carried now by this series.
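
The interaction between batching and the two barriers can be
illustrated with a toy userspace model. This is only a sketch, not
kernel code: every model_* name below is hypothetical, the "cache" is
just a counter of live objects, and the batch is drained only by the
model of kvfree_rcu_barrier(). It assumes, per the analysis in [3], that
rcu_barrier() waits for call_rcu() callbacks but not for objects still
parked in kfree_rcu() batches.

```c
/* Toy userspace model of the problem; none of this is kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PENDING 8

/* Hypothetical stand-in for a kmem_cache: it only counts live objects. */
struct model_cache {
	int live_objects;
};

/* Objects waiting in the two deferral paths of the model. */
struct model_rcu {
	struct model_cache *callbacks[MODEL_MAX_PENDING]; /* call_rcu() path */
	size_t nr_callbacks;
	struct model_cache *batched[MODEL_MAX_PENDING];   /* kfree_rcu() batch */
	size_t nr_batched;
};

static void model_alloc(struct model_cache *c)
{
	c->live_objects++;
}

/* Models call_rcu() with a callback that is a plain kmem_cache_free(). */
static void model_call_rcu_free(struct model_rcu *r, struct model_cache *c)
{
	assert(r->nr_callbacks < MODEL_MAX_PENDING);
	r->callbacks[r->nr_callbacks++] = c;
}

/* Models kfree_rcu() with batching: the object is parked in a batch. */
static void model_kfree_rcu(struct model_rcu *r, struct model_cache *c)
{
	assert(r->nr_batched < MODEL_MAX_PENDING);
	r->batched[r->nr_batched++] = c;
}

/* Models rcu_barrier(): runs queued callbacks, never touches the batch. */
static void model_rcu_barrier(struct model_rcu *r)
{
	for (size_t i = 0; i < r->nr_callbacks; i++)
		r->callbacks[i]->live_objects--;
	r->nr_callbacks = 0;
}

/* Models kvfree_rcu_barrier(): frees everything parked in the batch. */
static void model_kvfree_rcu_barrier(struct model_rcu *r)
{
	for (size_t i = 0; i < r->nr_batched; i++)
		r->batched[i]->live_objects--;
	r->nr_batched = 0;
}

/* Models kmem_cache_destroy(): succeeds only if no objects remain. */
static bool model_cache_destroy(const struct model_cache *c)
{
	return c->live_objects == 0;
}

/*
 * Allocate two objects, free one through each deferral path, then try
 * to destroy the cache. Returns true if the destroy succeeds.
 */
static bool model_scenario(bool use_kvfree_rcu_barrier)
{
	struct model_cache cache = { 0 };
	struct model_rcu rcu = { 0 };

	model_alloc(&cache);
	model_alloc(&cache);
	model_call_rcu_free(&rcu, &cache);
	model_kfree_rcu(&rcu, &cache);

	if (use_kvfree_rcu_barrier)
		model_kvfree_rcu_barrier(&rcu);
	model_rcu_barrier(&rcu);

	return model_cache_destroy(&cache);
}
```

Under these assumptions, model_scenario(false) reports a failed destroy
because one object is still parked in the batch, while
model_scenario(true) reports success. The real kvfree_rcu_barrier() is
the one provided by the patch in [5]; the model does not attempt to
mirror its implementation.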

There are also several preliminary cleanup steps. The potentially
visible one is that the sysfs and debugfs directories, as well as the
/proc/slabinfo record of the cache, are now removed immediately during
kmem_cache_destroy() - previously this would be delayed for
SLAB_TYPESAFE_BY_RCU caches or left around forever if leaked objects
were detected. Even though we no longer have the delayed removal, leaked
objects should not prevent the cache from being recreated, including its
sysfs and debugfs directories, so it's better to make this cleanup
anyway. The immediate removal is the simplest solution (compared to e.g.
renaming the directories) and should not make debugging harder - while
it won't be possible to check debugfs for allocation traces of leaked
objects, they are listed with more detail in dmesg anyway.

[1] https://lore.kernel.org/all/20240609082726.32742-1-Julia.Lawall@inria.fr/
[2] https://lore.kernel.org/all/20240612143305.451abf58@kernel.org/
[3] https://lore.kernel.org/all/Zmo9-YGraiCj5-MI@zx2c4.com/
[4] https://docs.google.com/document/d/1v0rcZLvvjVGejT3523W0rDy_sLFu2LWc_NR3fQItZaA/edit
[5] https://lore.kernel.org/all/20240801111039.79656-1-urezki@gmail.com/

To: Paul E. McKenney <paulmck@kernel.org>
To: Joel Fernandes <joel@joelfernandes.org>
To: Josh Triplett <josh@joshtriplett.org>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
To: Christoph Lameter <cl@linux.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: rcu@vger.kernel.org
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: Jann Horn <jannh@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Changes in v2:
- Include the necessary barrier implementation (by Uladzislau Rezki)
- Switch to synchronous barriers (Mateusz Guzik)
- Move kfence_shutdown_cache() outside slab_mutex in a separate step
  for review and bisectability.
- Add a kunit test for destroying a cache with a leaked object.
- Link to v1: https://lore.kernel.org/r/20240715-b4-slab-kfree_rcu-destroy-v1-0-46b2984c2205@suse.cz

---
Uladzislau Rezki (Sony) (1):
      rcu/kvfree: Add kvfree_rcu_barrier() API

Vlastimil Babka (6):
      mm, slab: dissolve shutdown_cache() into its caller
      mm, slab: unlink slabinfo, sysfs and debugfs immediately
      mm, slab: move kfence_shutdown_cache() outside slab_mutex
      mm, slab: reintroduce rcu_barrier() into kmem_cache_destroy()
      mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
      kunit, slub: add test_kfree_rcu() and test_leak_destroy()

 include/linux/rcutiny.h |   5 +++
 include/linux/rcutree.h |   1 +
 kernel/rcu/tree.c       | 103 ++++++++++++++++++++++++++++++++++++++++----
 lib/slub_kunit.c        |  31 ++++++++++++++
 mm/slab_common.c        | 111 ++++++++++++++----------------------------------
 5 files changed, 163 insertions(+), 88 deletions(-)
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20240715-b4-slab-kfree_rcu-destroy-85dd2b2ded92

Best regards,