Message ID | 20190411032635.10325-1-cai@lca.pw |
---|---|
State | New, archived |
Series | slab: fix an infinite loop in leaks_show() |
On 4/11/19 5:26 AM, Qian Cai wrote:
> "cat /proc/slab_allocators" could hang forever on SMP machines with
> kmemleak or object debugging enabled due to other CPUs running do_drain()
> will keep making kmemleak_object or debug_objects_cache dirty and unable
> to escape the first loop in leaks_show(),

So what if we don't remove SLAB (yet?) but start removing the debugging
functionality that has been broken for years and nobody noticed? I think
Linus already mentioned that we remove at least the
/proc/slab_allocators file...

> [...]
On 4/11/19 4:20 AM, Vlastimil Babka wrote:
> On 4/11/19 5:26 AM, Qian Cai wrote:
>> "cat /proc/slab_allocators" could hang forever on SMP machines with
>> kmemleak or object debugging enabled due to other CPUs running do_drain()
>> will keep making kmemleak_object or debug_objects_cache dirty and unable
>> to escape the first loop in leaks_show(),
>
> So what if we don't remove SLAB (yet?) but start removing the debugging
> functionality that has been broken for years and nobody noticed. I think
> Linus already mentioned that we remove at least the
> /proc/slab_allocators file...

In my experience, two years isn't that long for a debugging feature to stay
silently broken in SLAB: kmemleak was broken there for more than four years.
See commit 92d1d07daad6 ("mm/slab.c: kmemleak no scan alien caches").
```diff
diff --git a/mm/slab.c b/mm/slab.c
index 9142ee992493..3e1b7ff0360c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4328,8 +4328,12 @@ static int leaks_show(struct seq_file *m, void *p)
 	 * whole processing.
 	 */
 	do {
-		set_store_user_clean(cachep);
 		drain_cpu_caches(cachep);
+		/*
+		 * drain_cpu_caches() could always make kmemleak_object and
+		 * debug_objects_cache dirty, so reset afterwards.
+		 */
+		set_store_user_clean(cachep);
 
 		x[1] = 0;
 
```
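Why moving set_store_user_clean() below drain_cpu_caches() lets the loop terminate can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct toy_cache`, `scan()`, `drain()` and `walk()` are hypothetical stand-ins for the cache, the retry loop in leaks_show(), drain_cpu_caches() and the slab walk. Following the report, the model assumes every drain dirties the cache, and it lets a configurable number of walks be disturbed by a concurrent free.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: only the flag behind
 * set_store_user_clean()/is_store_user_clean() is kept. */
struct toy_cache {
	bool store_user_clean;
	int racy_walks;	/* walks disturbed by a free on another CPU */
};

static void set_clean(struct toy_cache *c) { c->store_user_clean = true; }
static void set_dirty(struct toy_cache *c) { c->store_user_clean = false; }
static bool is_clean(const struct toy_cache *c) { return c->store_user_clean; }

/* Stand-in for drain_cpu_caches(): per the report, with kmemleak enabled
 * the drain frees objects back into this same cache, dirtying it. */
static void drain(struct toy_cache *c) { set_dirty(c); }

/* Stand-in for the slab walk: a concurrent free dirties the cache. */
static void walk(struct toy_cache *c)
{
	if (c->racy_walks > 0) {
		c->racy_walks--;
		set_dirty(c);
	}
}

/* Models the retry loop in leaks_show(). Returns the number of passes
 * needed, or -1 once max_passes is exceeded (the kernel loop has no cap). */
static int scan(struct toy_cache *c, bool reset_after_drain, int max_passes)
{
	int passes = 0;

	assert(max_passes > 0);
	do {
		if (++passes > max_passes)
			return -1;
		if (!reset_after_drain)
			set_clean(c);	/* original order */
		drain(c);
		if (reset_after_drain)
			set_clean(c);	/* order used by the patch */
		walk(c);
	} while (!is_clean(c));
	return passes;
}
```

With the original order, `scan(&c, false, 1000)` gives up and returns -1, because the flag is set before the drain dirties it again on every pass. With the patched order, `scan(&c, true, 1000)` returns one pass more than the number of disturbed walks: a free during the walk still forces a retry, while frees inside the drain itself no longer do.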
"cat /proc/slab_allocators" could hang forever on SMP machines with
kmemleak or object debugging enabled, because other CPUs running
do_drain() keep making kmemleak_object or debug_objects_cache dirty, so
leaks_show() can never escape its first loop:

```c
do {
	set_store_user_clean(cachep);
	drain_cpu_caches(cachep);
	...
} while (!is_store_user_clean(cachep));
```

For example:

```
do_drain
slabs_destroy
slab_destroy
kmem_cache_free
__cache_free
___cache_free
kmemleak_free_recursive
delete_object_full
__delete_object
put_object
free_object_rcu
kmem_cache_free
cache_free_debugcheck --> dirty kmemleak_object
```

One approach is to check cachep->name and skip both kmemleak_object and
debug_objects_cache in leaks_show(). The other is to reset
store_user_clean after drain_cpu_caches(). That leaves a small window
between drain_cpu_caches() and set_store_user_clean() in which the
per-CPU caches could become dirty again, so slightly inaccurate
information may be stored, but it also speeds things up significantly,
which sounds like a good compromise. For example:

```
# cat /proc/slab_allocators
0m42.778s # 1st approach
0m0.737s # 2nd approach
```

```
Fixes: d31676dfde25 ("mm/slab: alternative implementation for DEBUG_SLAB_LEAK")
Signed-off-by: Qian Cai <cai@lca.pw>
---
 mm/slab.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
```
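The first approach mentioned above, skipping caches by name, might look roughly like the helper below. It is a hypothetical sketch, not part of this patch: `leaks_show_should_skip()` is an illustrative name, and it only compares a cache name (in the kernel this would be `cachep->name`) against the two caches named in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for the first approach: instead of rescanning,
 * leaks_show() would skip the two caches the report says
 * drain_cpu_caches() keeps dirty. */
static bool leaks_show_should_skip(const char *cache_name)
{
	static const char *const skip[] = {
		"kmemleak_object",
		"debug_objects_cache",
	};

	assert(cache_name != NULL);
	for (size_t i = 0; i < sizeof(skip) / sizeof(skip[0]); i++) {
		if (strcmp(cache_name, skip[i]) == 0)
			return true;
	}
	return false;
}
```

Skipping means no allocation-site information is reported for those two caches at all, and per the timings in the commit message this approach still took about 42 seconds, against under a second for the reordering the patch adopts.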