diff mbox series

slab: fix a crash by reading /proc/slab_allocators

Message ID 20190406225901.35465-1-cai@lca.pw (mailing list archive)
State New, archived
Headers show
Series slab: fix a crash by reading /proc/slab_allocators | expand

Commit Message

Qian Cai April 6, 2019, 10:59 p.m. UTC
The commit 510ded33e075 ("slab: implement slab_root_caches list")
changes the name of the list node within "struct kmem_cache" from
"list" to "root_caches_node", but leaks_show() still use the "list"
which causes a crash when reading /proc/slab_allocators.

BUG: unable to handle kernel NULL pointer dereference at
00000000000000aa
PGD 0 P4D 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
CPU: 3 PID: 5925 Comm: ldd Not tainted 5.1.0-rc3-mm1+ #6
RIP: 0010:__lock_acquire.isra.14+0x4b4/0xa50
Call Trace:
 <IRQ>
 lock_acquire+0xa3/0x180
 _raw_spin_lock+0x2f/0x40
 do_drain+0x61/0xc0
 flush_smp_call_function_queue+0x3a/0x110
 generic_smp_call_function_single_interrupt+0x13/0x2b
 smp_call_function_interrupt+0x66/0x1a0
 call_function_interrupt+0xf/0x20
 </IRQ>
RIP: 0010:__tlb_remove_page_size+0x8c/0xe0
 zap_pte_range+0x39f/0xc80
 unmap_page_range+0x38a/0x550
 unmap_single_vma+0x7d/0xe0
 unmap_vmas+0xae/0xd0
 exit_mmap+0xae/0x190
 mmput+0x7a/0x150
 do_exit+0x2d9/0xd40
 do_group_exit+0x41/0xd0
 __x64_sys_exit_group+0x18/0x20
 do_syscall_64+0x68/0x381
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 510ded33e075 ("slab: implement slab_root_caches list")
Signed-off-by: Qian Cai <cai@lca.pw>
---
 mm/slab.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Tobin Harding April 8, 2019, 1:59 a.m. UTC | #1
On Sat, Apr 06, 2019 at 06:59:01PM -0400, Qian Cai wrote:
> The commit 510ded33e075 ("slab: implement slab_root_caches list")
> changes the name of the list node within "struct kmem_cache" from
> "list" to "root_caches_node"

Are you sure? It looks to me like it adds a member to the memcg_cache_array

diff --git a/include/linux/slab.h b/include/linux/slab.h
index a0cc7a77cda2..af1a5bef80f4 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -556,6 +556,8 @@ struct memcg_cache_array {
  *             used to index child cachces during allocation and cleared
  *             early during shutdown.
  * 
+ * @root_caches_node: List node for slab_root_caches list.
+ * 
  * @children:  List of all child caches.  While the child caches are also
  *             reachable through @memcg_caches, a child cache remains on
  *             this list until it is actually destroyed.
@@ -573,6 +575,7 @@ struct memcg_cache_params {
        union { 
                struct {
                        struct memcg_cache_array __rcu *memcg_caches;
+                       struct list_head __root_caches_node;
                        struct list_head children;
                };

And then defines 'root_caches_node' to be 'memcg_params.__root_caches_node'
if we have CONFIG_MEMCG otherwise defines 'root_caches_node' to be 'list'


> but leaks_show() still use the "list"

I believe it should since 'list' is used to add to slab_caches list.

> which causes a crash when reading /proc/slab_allocators.

I was unable to reproduce this crash, I built with

# CONFIG_MEMCG is not set
CONFIG_SLAB=y
CONFIG_SLAB_MERGE_DEFAULT=y
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

I then booted in Qemu and successfully ran 
$ cat slab_allocators

Perhaps you could post your config?

Hope this helps,
Tobin.
Qian Cai April 8, 2019, 2:18 a.m. UTC | #2
On 4/7/19 9:59 PM, Tobin C. Harding wrote:
> On Sat, Apr 06, 2019 at 06:59:01PM -0400, Qian Cai wrote:
>> The commit 510ded33e075 ("slab: implement slab_root_caches list")
>> changes the name of the list node within "struct kmem_cache" from
>> "list" to "root_caches_node"
> 
> Are you sure? It looks to me like it adds a member to the memcg_cache_array
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index a0cc7a77cda2..af1a5bef80f4 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -556,6 +556,8 @@ struct memcg_cache_array {
>   *             used to index child cachces during allocation and cleared
>   *             early during shutdown.
>   * 
> + * @root_caches_node: List node for slab_root_caches list.
> + * 
>   * @children:  List of all child caches.  While the child caches are also
>   *             reachable through @memcg_caches, a child cache remains on
>   *             this list until it is actually destroyed.
> @@ -573,6 +575,7 @@ struct memcg_cache_params {
>         union { 
>                 struct {
>                         struct memcg_cache_array __rcu *memcg_caches;
> +                       struct list_head __root_caches_node;
>                         struct list_head children;
>                 };
> 
> And then defines 'root_caches_node' to be 'memcg_params.__root_caches_node'
> if we have CONFIG_MEMCG otherwise defines 'root_caches_node' to be 'list'
> 
> 
>> but leaks_show() still use the "list"
> 
> I believe it should since 'list' is used to add to slab_caches list.

See the offensive commit 510ded33e075 which changed those.

@@ -1136,12 +1146,12 @@ static void print_slabinfo_header(struct seq_file *m)
 void *slab_start(struct seq_file *m, loff_t *pos)
 {
        mutex_lock(&slab_mutex);
-       return seq_list_start(&slab_caches, *pos);
+       return seq_list_start(&slab_root_caches, *pos);
 }

 void *slab_next(struct seq_file *m, void *p, loff_t *pos)
 {
-       return seq_list_next(p, &slab_caches, pos);
+       return seq_list_next(p, &slab_root_caches, pos);
 }

and then memcg_link_cache() does,

if (is_root_cache(s)) {
	list_add(&s->root_caches_node, &slab_root_caches);

memcg_unlink_cache() does,

if (is_root_cache(s)) {
	list_del(&s->root_caches_node);

It also changed /proc/slabinfo but forgot to change /proc/slab_allocators.

@@ -1193,12 +1203,11 @@ static void cache_show(struct kmem_cache *s, struct
seq_file *m)

 static int slab_show(struct seq_file *m, void *p)
 {
-       struct kmem_cache *s = list_entry(p, struct kmem_cache, list);
+       struct kmem_cache *s = list_entry(p, struct kmem_cache, root_caches_node);

> 
>> which causes a crash when reading /proc/slab_allocators.
> 
> I was unable to reproduce this crash, I built with
> 
> # CONFIG_MEMCG is not set
> CONFIG_SLAB=y
> CONFIG_SLAB_MERGE_DEFAULT=y
> CONFIG_SLAB_FREELIST_RANDOM=y
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SLAB_LEAK=y
> 
> I then booted in Qemu and successfully ran 
> $ cat slab_allocators
> 
> Perhaps you could post your config?

Yes, it won't be reproducible without CONFIG_MEMCG=y, because it has,

/* If !memcg, all caches are root. */
#define slab_root_caches       slab_caches
#define root_caches_node       list

Anyway,

https://git.sr.ht/~cai/linux-debug/blob/master/config
Tobin Harding April 8, 2019, 5:03 a.m. UTC | #3
On Sat, Apr 06, 2019 at 06:59:01PM -0400, Qian Cai wrote:
> The commit 510ded33e075 ("slab: implement slab_root_caches list")
> changes the name of the list node within "struct kmem_cache" from
> "list" to "root_caches_node", but leaks_show() still use the "list"
> which causes a crash when reading /proc/slab_allocators.
> 
> BUG: unable to handle kernel NULL pointer dereference at
> 00000000000000aa
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> CPU: 3 PID: 5925 Comm: ldd Not tainted 5.1.0-rc3-mm1+ #6
> RIP: 0010:__lock_acquire.isra.14+0x4b4/0xa50
> Call Trace:
>  <IRQ>
>  lock_acquire+0xa3/0x180
>  _raw_spin_lock+0x2f/0x40
>  do_drain+0x61/0xc0
>  flush_smp_call_function_queue+0x3a/0x110
>  generic_smp_call_function_single_interrupt+0x13/0x2b
>  smp_call_function_interrupt+0x66/0x1a0
>  call_function_interrupt+0xf/0x20
>  </IRQ>
> RIP: 0010:__tlb_remove_page_size+0x8c/0xe0
>  zap_pte_range+0x39f/0xc80
>  unmap_page_range+0x38a/0x550
>  unmap_single_vma+0x7d/0xe0
>  unmap_vmas+0xae/0xd0
>  exit_mmap+0xae/0x190
>  mmput+0x7a/0x150
>  do_exit+0x2d9/0xd40
>  do_group_exit+0x41/0xd0
>  __x64_sys_exit_group+0x18/0x20
>  do_syscall_64+0x68/0x381
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Fixes: 510ded33e075 ("slab: implement slab_root_caches list")
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>  mm/slab.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index 46a6e084222b..9142ee992493 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -4307,7 +4307,8 @@ static void show_symbol(struct seq_file *m, unsigned long address)
>  
>  static int leaks_show(struct seq_file *m, void *p)
>  {
> -	struct kmem_cache *cachep = list_entry(p, struct kmem_cache, list);
> +	struct kmem_cache *cachep = list_entry(p, struct kmem_cache,
> +					       root_caches_node);
>  	struct page *page;
>  	struct kmem_cache_node *n;
>  	const char *name;
> -- 
> 2.17.2 (Apple Git-113)
> 

For what its worth

Reviewed-by: Tobin C. Harding <tobin@kernel.org>

thanks,
Tobin.
Linus Torvalds April 8, 2019, 5:35 a.m. UTC | #4
On Sat, Apr 6, 2019 at 12:59 PM Qian Cai <cai@lca.pw> wrote:
>
> The commit 510ded33e075 ("slab: implement slab_root_caches list")
> changes the name of the list node within "struct kmem_cache" from
> "list" to "root_caches_node", but leaks_show() still use the "list"
> which causes a crash when reading /proc/slab_allocators.

The patch does seem to be correct, and I have applied it.

However, it does strike me that apparently this wasn't caught for two
years. Which makes me wonder whether we should (once again) discuss
just removing SLAB entirely, or at least removing the
/proc/slab_allocators file. Apparently it has never been used in the
last two years. At some point a "this can't have worked if  anybody
ever tried to use it" situation means that the code should likely be
excised.

Qian, how did you end up noticing and debugging this?

                 Linus
Qian Cai April 8, 2019, 1:17 p.m. UTC | #5
On Sun, 2019-04-07 at 19:35 -1000, Linus Torvalds wrote:
> On Sat, Apr 6, 2019 at 12:59 PM Qian Cai <cai@lca.pw> wrote:
> > 
> > The commit 510ded33e075 ("slab: implement slab_root_caches list")
> > changes the name of the list node within "struct kmem_cache" from
> > "list" to "root_caches_node", but leaks_show() still use the "list"
> > which causes a crash when reading /proc/slab_allocators.
> 
> The patch does seem to be correct, and I have applied it.
> 
> However, it does strike me that apparently this wasn't caught for two
> years. Which makes me wonder whether we should (once again) discuss
> just removing SLAB entirely, or at least removing the
> /proc/slab_allocators file. Apparently it has never been used in the
> last two years. At some point a "this can't have worked if  anybody
> ever tried to use it" situation means that the code should likely be
> excised.
> 
> Qian, how did you end up noticing and debugging this?

There are some nice texts for CONFIG_SLAB Kconfig written in 2007,

"The regular slab allocator that is established and known to work well in all
environments."

"tricked" me into enabling it in a debug kernel for running testing where LTP
proc01 test case (read all files in procfs) would usually trigger the crash
(Sometimes, "cat /proc/slab_allocators" would just end up printing nothing).

Normally, all those debug kernels would use CONFIG_KASAN which would set
CONFIG_DEBUG_SLAB=n. However, there is no KASAN for powerpc yet, so it selects
CONFIG_DEBUG_SLAB=y there, and then the testing found the issue.
Christoph Lameter (Ampere) April 8, 2019, 3:15 p.m. UTC | #6
On Sun, 7 Apr 2019, Linus Torvalds wrote:

> On Sat, Apr 6, 2019 at 12:59 PM Qian Cai <cai@lca.pw> wrote:
> >
> > The commit 510ded33e075 ("slab: implement slab_root_caches list")
> > changes the name of the list node within "struct kmem_cache" from
> > "list" to "root_caches_node", but leaks_show() still use the "list"
> > which causes a crash when reading /proc/slab_allocators.
>
> The patch does seem to be correct, and I have applied it.
>
> However, it does strike me that apparently this wasn't caught for two
> years. Which makes me wonder whether we should (once again) discuss
> just removing SLAB entirely, or at least removing the
> /proc/slab_allocators file. Apparently it has never been used in the
> last two years. At some point a "this can't have worked if  anybody
> ever tried to use it" situation means that the code should likely be
> excised.

This is only occurring with specially build kernels so that memory leaks
can be investigated. The same is done with other tools (kasan and friends)
today I guess and also the SLUB debugging tools are much more user
friendly. So this means that some esoteric debugging feature of SLAB was
broken.
Tobin Harding April 8, 2019, 11:41 p.m. UTC | #7
On Sun, Apr 07, 2019 at 07:35:34PM -1000, Linus Torvalds wrote:
> On Sat, Apr 6, 2019 at 12:59 PM Qian Cai <cai@lca.pw> wrote:
> >
> > The commit 510ded33e075 ("slab: implement slab_root_caches list")
> > changes the name of the list node within "struct kmem_cache" from
> > "list" to "root_caches_node", but leaks_show() still use the "list"
> > which causes a crash when reading /proc/slab_allocators.
> 
> The patch does seem to be correct, and I have applied it.
> 
> However, it does strike me that apparently this wasn't caught for two
> years. Which makes me wonder whether we should (once again) discuss
> just removing SLAB entirely, or at least removing the
> /proc/slab_allocators file. Apparently it has never been used in the
> last two years. At some point a "this can't have worked if  anybody
> ever tried to use it" situation means that the code should likely be
> excised.

The bug doesn't trigger on every read of /proc/slab_allocators (as noted
later in this thread by Qian).  I tried to repro it with a bunch of
different configs and often `cat /proc/slab_allocators` just returns
empty.

I've got a patchset ready to go sitting in my tree that removes SLAB, I
could just send it to start the conversation :)


	Tobin
diff mbox series

Patch

diff --git a/mm/slab.c b/mm/slab.c
index 46a6e084222b..9142ee992493 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4307,7 +4307,8 @@  static void show_symbol(struct seq_file *m, unsigned long address)
 
 static int leaks_show(struct seq_file *m, void *p)
 {
-	struct kmem_cache *cachep = list_entry(p, struct kmem_cache, list);
+	struct kmem_cache *cachep = list_entry(p, struct kmem_cache,
+					       root_caches_node);
 	struct page *page;
 	struct kmem_cache_node *n;
 	const char *name;