mbox series

[0/5] SLUB debugfs improvements based on stackdepot

Message ID 20220225180318.20594-1-vbabka@suse.cz (mailing list archive)
Headers show
Series SLUB debugfs improvements based on stackdepot | expand

Message

Vlastimil Babka Feb. 25, 2022, 6:03 p.m. UTC
Hi,

this series combines and revives patches from Oliver's last year
bachelor thesis (where I was the advisor) that make SLUB's debugfs
files alloc_traces and free_traces more useful.
The resubmission was blocked on stackdepot changes that are now merged,
as explained in patch 2.

Patch 1 is a new preparatory cleanup.

Patch 2 originally submitted here [1], was merged to mainline but
reverted for stackdepot related issues as explained in the patch.

Patches 3-5 originally submitted as RFC here [2]. In this submission I
have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
be considered too intrusive so I will postpone it for later. The docs
patch is adjusted accordingly.

Also available in git, based on v5.17-rc1:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1

I'd like to ask for some review before I add this to the slab tree.

[1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
[2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/

Oliver Glitta (4):
  mm/slub: use stackdepot to save stack trace in objects
  mm/slub: aggregate and print stack traces in debugfs files
  mm/slub: sort debugfs output by frequency of stack traces
  slab, documentation: add description of debugfs files for SLUB caches

Vlastimil Babka (1):
  mm/slub: move struct track init out of set_track()

 Documentation/vm/slub.rst |  61 +++++++++++++++
 init/Kconfig              |   1 +
 mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
 3 files changed, 162 insertions(+), 52 deletions(-)

Comments

Hyeonggon Yoo Feb. 26, 2022, 7:19 a.m. UTC | #1
On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> Hi,
> 
> this series combines and revives patches from Oliver's last year
> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> files alloc_traces and free_traces more useful.
> The resubmission was blocked on stackdepot changes that are now merged,
> as explained in patch 2.
> 

Hello. I just started review/testing this series.

it crashed on my system (arm64)

I ran with boot parameter slub_debug=U, and without KASAN.
So CONFIG_STACKDEPOT_ALWAYS_INIT=n.

void * __init memblock_alloc_try_nid(
                        phys_addr_t size, phys_addr_t align,
                        phys_addr_t min_addr, phys_addr_t max_addr,
                        int nid)
{
        void *ptr;

        memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
                     __func__, (u64)size, (u64)align, nid, &min_addr,
                     &max_addr, (void *)_RET_IP_);
        ptr = memblock_alloc_internal(size, align,
                                           min_addr, max_addr, nid, false);
        if (ptr)
                memset(ptr, 0, size); <--- Crash Here

        return ptr;
}

It crashed during create_boot_cache() -> stack_depot_init() ->
memblock_alloc().

I think That's because, in kmem_cache_init(), both slab and memblock is not
available. (AFAIU memblock is not available after mem_init() because of
memblock_free_all(), right?)

Thanks!

/*
 * Set up kernel memory allocators
 */
static void __init mm_init(void)
{
        /*
         * page_ext requires contiguous pages,
         * bigger than MAX_ORDER unless SPARSEMEM.
         */
        page_ext_init_flatmem();
        init_mem_debugging_and_hardening();
        kfence_alloc_pool();
        report_meminit();
        stack_depot_early_init();
        mem_init();
        mem_init_print_info();
        kmem_cache_init();
        /*
         * page_owner must be initialized after buddy is ready, and also after
         * slab is ready so that stack_depot_init() works properly
         */)

> Patch 1 is a new preparatory cleanup.
> 
> Patch 2 originally submitted here [1], was merged to mainline but
> reverted for stackdepot related issues as explained in the patch.
> 
> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> be considered too intrusive so I will postpone it for later. The docs
> patch is adjusted accordingly.
> 
> Also available in git, based on v5.17-rc1:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> 
> I'd like to ask for some review before I add this to the slab tree.
> 
> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> 
> Oliver Glitta (4):
>   mm/slub: use stackdepot to save stack trace in objects
>   mm/slub: aggregate and print stack traces in debugfs files
>   mm/slub: sort debugfs output by frequency of stack traces
>   slab, documentation: add description of debugfs files for SLUB caches
> 
> Vlastimil Babka (1):
>   mm/slub: move struct track init out of set_track()
> 
>  Documentation/vm/slub.rst |  61 +++++++++++++++
>  init/Kconfig              |   1 +
>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
>  3 files changed, 162 insertions(+), 52 deletions(-)
> 
> -- 
> 2.35.1
> 
>
Hyeonggon Yoo Feb. 26, 2022, 12:18 p.m. UTC | #2
On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> Hi,
> 
> this series combines and revives patches from Oliver's last year
> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> files alloc_traces and free_traces more useful.
> The resubmission was blocked on stackdepot changes that are now merged,
> as explained in patch 2.
> 
> Patch 1 is a new preparatory cleanup.
> 
> Patch 2 originally submitted here [1], was merged to mainline but
> reverted for stackdepot related issues as explained in the patch.
> 
> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> be considered too intrusive so I will postpone it for later. The docs
> patch is adjusted accordingly.
> 

This problem is not caused by this patch series.
But I think it's worth mentioning...

It's really weird that some stack traces are not recorded
when CONFIG_KASAN=y.

I made sure that:
	- Stack Depot did not reach its limit
	- the free path happen on CONFIG_KASAN=y too.

I have no clue why this happen.

# cat dentry/free_traces (CONFIG_KASAN=y)
   6585 <not-available> age=4294912647 pid=0 cpus=0

# cat dentry/free_traces (CONFIG_KASAN=n)
   1246 <not-available> age=4294906877 pid=0 cpus=0
    379 __d_free+0x20/0x2c age=33/14225/14353 pid=0-122 cpus=0-3
        kmem_cache_free+0x1f4/0x21c
        __d_free+0x20/0x2c
        rcu_core+0x334/0x580
        rcu_core_si+0x14/0x20
        __do_softirq+0x12c/0x2a8

      2 dentry_free+0x58/0xb0 age=14101/14101/14101 pid=158 cpus=0
        kmem_cache_free+0x1f4/0x21c
        dentry_free+0x58/0xb0
        __dentry_kill+0x18c/0x1d0
        dput+0x1c4/0x2fc
        __fput+0xb0/0x230
        ____fput+0x14/0x20
        task_work_run+0x84/0x17c
        do_notify_resume+0x208/0x1330
        el0_svc+0x6c/0x80
        el0t_64_sync_handler+0xa8/0x130
        el0t_64_sync+0x1a0/0x1a4

      1 dentry_free+0x58/0xb0 age=7678 pid=190 cpus=1
        kmem_cache_free+0x1f4/0x21c
        dentry_free+0x58/0xb0
        __dentry_kill+0x18c/0x1d0
        dput+0x1c4/0x2fc
        __fput+0xb0/0x230
        ____fput+0x14/0x20
        task_work_run+0x84/0x17c
        do_exit+0x2dc/0x8e0
        do_group_exit+0x38/0xa4
        __wake_up_parent+0x0/0x34
        invoke_syscall+0x48/0x114
        el0_svc_common.constprop.0+0x44/0xfc
        do_el0_svc+0x2c/0x94
        el0_svc+0x28/0x80
        el0t_64_sync_handler+0xa8/0x130
        el0t_64_sync+0x1a0/0x1a4
Vlastimil Babka Feb. 28, 2022, 7:10 p.m. UTC | #3
On 2/26/22 08:19, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> Hi,
>> 
>> this series combines and revives patches from Oliver's last year
>> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> files alloc_traces and free_traces more useful.
>> The resubmission was blocked on stackdepot changes that are now merged,
>> as explained in patch 2.
>> 
> 
> Hello. I just started review/testing this series.
> 
> it crashed on my system (arm64)

Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
from memblock. arm64 must have memblock freeing happen earlier or something.
(CCing memblock experts)

> I ran with boot parameter slub_debug=U, and without KASAN.
> So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> 
> void * __init memblock_alloc_try_nid(
>                         phys_addr_t size, phys_addr_t align,
>                         phys_addr_t min_addr, phys_addr_t max_addr,
>                         int nid)
> {
>         void *ptr;
> 
>         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>                      __func__, (u64)size, (u64)align, nid, &min_addr,
>                      &max_addr, (void *)_RET_IP_);
>         ptr = memblock_alloc_internal(size, align,
>                                            min_addr, max_addr, nid, false);
>         if (ptr)
>                 memset(ptr, 0, size); <--- Crash Here
> 
>         return ptr;
> }
> 
> It crashed during create_boot_cache() -> stack_depot_init() ->
> memblock_alloc().
> 
> I think That's because, in kmem_cache_init(), both slab and memblock is not
> available. (AFAIU memblock is not available after mem_init() because of
> memblock_free_all(), right?)

Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
But then, I would expect stack_depot_init() to detect that memblock_alloc()
returns NULL, we print ""Stack Depot hash table allocation failed,
disabling" and disable it. Instead it seems memblock_alloc() returns
something that's already potentially used by somebody else? Sounds like a bug?

> Thanks!
> 
> /*
>  * Set up kernel memory allocators
>  */
> static void __init mm_init(void)
> {
>         /*
>          * page_ext requires contiguous pages,
>          * bigger than MAX_ORDER unless SPARSEMEM.
>          */
>         page_ext_init_flatmem();
>         init_mem_debugging_and_hardening();
>         kfence_alloc_pool();
>         report_meminit();
>         stack_depot_early_init();
>         mem_init();
>         mem_init_print_info();
>         kmem_cache_init();
>         /*
>          * page_owner must be initialized after buddy is ready, and also after
>          * slab is ready so that stack_depot_init() works properly
>          */)
> 
>> Patch 1 is a new preparatory cleanup.
>> 
>> Patch 2 originally submitted here [1], was merged to mainline but
>> reverted for stackdepot related issues as explained in the patch.
>> 
>> Patches 3-5 originally submitted as RFC here [2]. In this submission I
>> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> be considered too intrusive so I will postpone it for later. The docs
>> patch is adjusted accordingly.
>> 
>> Also available in git, based on v5.17-rc1:
>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>> 
>> I'd like to ask for some review before I add this to the slab tree.
>> 
>> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
>> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
>> 
>> Oliver Glitta (4):
>>   mm/slub: use stackdepot to save stack trace in objects
>>   mm/slub: aggregate and print stack traces in debugfs files
>>   mm/slub: sort debugfs output by frequency of stack traces
>>   slab, documentation: add description of debugfs files for SLUB caches
>> 
>> Vlastimil Babka (1):
>>   mm/slub: move struct track init out of set_track()
>> 
>>  Documentation/vm/slub.rst |  61 +++++++++++++++
>>  init/Kconfig              |   1 +
>>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
>>  3 files changed, 162 insertions(+), 52 deletions(-)
>> 
>> -- 
>> 2.35.1
>> 
>> 
>
Mike Rapoport Feb. 28, 2022, 8:01 p.m. UTC | #4
On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> Hi,
> >> 
> >> this series combines and revives patches from Oliver's last year
> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> files alloc_traces and free_traces more useful.
> >> The resubmission was blocked on stackdepot changes that are now merged,
> >> as explained in patch 2.
> >> 
> > 
> > Hello. I just started review/testing this series.
> > 
> > it crashed on my system (arm64)
> 
> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> from memblock. arm64 must have memblock freeing happen earlier or something.
> (CCing memblock experts)
> 
> > I ran with boot parameter slub_debug=U, and without KASAN.
> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > 
> > void * __init memblock_alloc_try_nid(
> >                         phys_addr_t size, phys_addr_t align,
> >                         phys_addr_t min_addr, phys_addr_t max_addr,
> >                         int nid)
> > {
> >         void *ptr;
> > 
> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> >                      &max_addr, (void *)_RET_IP_);
> >         ptr = memblock_alloc_internal(size, align,
> >                                            min_addr, max_addr, nid, false);
> >         if (ptr)
> >                 memset(ptr, 0, size); <--- Crash Here
> > 
> >         return ptr;
> > }
> > 
> > It crashed during create_boot_cache() -> stack_depot_init() ->
> > memblock_alloc().
> > 
> > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > available. (AFAIU memblock is not available after mem_init() because of
> > memblock_free_all(), right?)
> 
> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> But then, I would expect stack_depot_init() to detect that memblock_alloc()
> returns NULL, we print ""Stack Depot hash table allocation failed,
> disabling" and disable it. Instead it seems memblock_alloc() returns
> something that's already potentially used by somebody else? Sounds like a bug?

If stack_depot_init() is called from kmem_cache_init(), there will be a
confusion what allocator should be used because we use slab_is_available()
to stop using memblock and start using kmalloc() instead in both
stack_depot_init() and in memblock.

Hyeonggon, did you run your tests with panic on warn at any chance?
 
> > Thanks!
> > 
> > /*
> >  * Set up kernel memory allocators
> >  */
> > static void __init mm_init(void)
> > {
> >         /*
> >          * page_ext requires contiguous pages,
> >          * bigger than MAX_ORDER unless SPARSEMEM.
> >          */
> >         page_ext_init_flatmem();
> >         init_mem_debugging_and_hardening();
> >         kfence_alloc_pool();
> >         report_meminit();
> >         stack_depot_early_init();
> >         mem_init();
> >         mem_init_print_info();
> >         kmem_cache_init();
> >         /*
> >          * page_owner must be initialized after buddy is ready, and also after
> >          * slab is ready so that stack_depot_init() works properly
> >          */)
> > 
> >> Patch 1 is a new preparatory cleanup.
> >> 
> >> Patch 2 originally submitted here [1], was merged to mainline but
> >> reverted for stackdepot related issues as explained in the patch.
> >> 
> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> >> be considered too intrusive so I will postpone it for later. The docs
> >> patch is adjusted accordingly.
> >> 
> >> Also available in git, based on v5.17-rc1:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> >> 
> >> I'd like to ask for some review before I add this to the slab tree.
> >> 
> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> >> 
> >> Oliver Glitta (4):
> >>   mm/slub: use stackdepot to save stack trace in objects
> >>   mm/slub: aggregate and print stack traces in debugfs files
> >>   mm/slub: sort debugfs output by frequency of stack traces
> >>   slab, documentation: add description of debugfs files for SLUB caches
> >> 
> >> Vlastimil Babka (1):
> >>   mm/slub: move struct track init out of set_track()
> >> 
> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> >>  init/Kconfig              |   1 +
> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> >>  3 files changed, 162 insertions(+), 52 deletions(-)
> >> 
> >> -- 
> >> 2.35.1
> >> 
> >> 
> > 
>
Hyeonggon Yoo Feb. 28, 2022, 9:20 p.m. UTC | #5
On Mon, Feb 28, 2022 at 10:01:29PM +0200, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> Hi,
> > >> 
> > >> this series combines and revives patches from Oliver's last year
> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> files alloc_traces and free_traces more useful.
> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> as explained in patch 2.
> > >> 
> > > 
> > > Hello. I just started review/testing this series.
> > > 
> > > it crashed on my system (arm64)
> > 
> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > (CCing memblock experts)
> > 
> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > > 
> > > void * __init memblock_alloc_try_nid(
> > >                         phys_addr_t size, phys_addr_t align,
> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> > >                         int nid)
> > > {
> > >         void *ptr;
> > > 
> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> > >                      &max_addr, (void *)_RET_IP_);
> > >         ptr = memblock_alloc_internal(size, align,
> > >                                            min_addr, max_addr, nid, false);
> > >         if (ptr)
> > >                 memset(ptr, 0, size); <--- Crash Here
> > > 
> > >         return ptr;
> > > }
> > > 
> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > > memblock_alloc().
> > > 
> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > > available. (AFAIU memblock is not available after mem_init() because of
> > > memblock_free_all(), right?)
> > 
> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > something that's already potentially used by somebody else? Sounds like a bug?
> 

It's really weird, but memblock_alloc() did not fail after
memblock_free_all(). it just crashed while initializing memory returned
by memblock.

> If stack_depot_init() is called from kmem_cache_init(), there will be a
> confusion what allocator should be used because we use slab_is_available()
> to stop using memblock and start using kmalloc() instead in both
> stack_depot_init() and in memblock.
> 
> Hyeonggon, did you run your tests with panic on warn at any chance?
>

Yeah, I think this stack trace would help:

[    0.000000] Stack Depot allocating hash table with memblock_alloc
[    0.000000] Unable to handle kernel paging request at virtual address ffff000097400000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000047
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000]   FSC = 0x07: level 3 translation fault
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000047
[    0.000000]   CM = 0, WnR = 1
[    0.000000] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041719000
[    0.000000] [ffff000097400000] pgd=18000000dcff8003, p4d=18000000dcff8003, pud=18000000dcbfe003, pmd=18000000dcb43003, pte=00680000d7400706
[    0.000000] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-11918-gbf5d03166d75 #51
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : __memset+0x16c/0x188
[    0.000000] lr : memblock_alloc_try_nid+0xcc/0xe4
[    0.000000] sp : ffff800009a33cd0
[    0.000000] x29: ffff800009a33cd0 x28: 0000000041720018 x27: ffff800009362640
[    0.000000] x26: ffff800009362640 x25: 0000000000000000 x24: 0000000000000000
[    0.000000] x23: 0000000000002000 x22: ffff80000932bb50 x21: 00000000ffffffff
[    0.000000] x20: ffff000097400000 x19: 0000000000800000 x18: ffffffffffffffff
[    0.000000] x17: 373578302f383278 x16: 302b657461657263 x15: 0000001000000000
[    0.000000] x14: 0000000000000360 x13: 0000000000009f8c x12: 00000000dcb0c070
[    0.000000] x11: 0000001000000000 x10: 00000000004ea000 x9 : 0000000000000000
[    0.000000] x8 : ffff000097400000 x7 : 0000000000000000 x6 : 000000000000003f
[    0.000000] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[    0.000000] x2 : 00000000007fffc0 x1 : 0000000000000000 x0 : ffff000097400000
[    0.000000] Call trace:
[    0.000000]  __memset+0x16c/0x188
[    0.000000]  stack_depot_init+0xc8/0x100
[    0.000000]  __kmem_cache_create+0x454/0x570
[    0.000000]  create_boot_cache+0xa0/0xe0
[    0.000000]  kmem_cache_init+0xf8/0x204
[    0.000000]  start_kernel+0x3ec/0x668
[    0.000000]  __primary_switched+0xc0/0xc8
[    0.000000] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---


Thanks!

> > > Thanks!
> > > 
> > > /*
> > >  * Set up kernel memory allocators
> > >  */
> > > static void __init mm_init(void)
> > > {
> > >         /*
> > >          * page_ext requires contiguous pages,
> > >          * bigger than MAX_ORDER unless SPARSEMEM.
> > >          */
> > >         page_ext_init_flatmem();
> > >         init_mem_debugging_and_hardening();
> > >         kfence_alloc_pool();
> > >         report_meminit();
> > >         stack_depot_early_init();
> > >         mem_init();
> > >         mem_init_print_info();
> > >         kmem_cache_init();
> > >         /*
> > >          * page_owner must be initialized after buddy is ready, and also after
> > >          * slab is ready so that stack_depot_init() works properly
> > >          */)
> > > 
> > >> Patch 1 is a new preparatory cleanup.
> > >> 
> > >> Patch 2 originally submitted here [1], was merged to mainline but
> > >> reverted for stackdepot related issues as explained in the patch.
> > >> 
> > >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> > >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> > >> be considered too intrusive so I will postpone it for later. The docs
> > >> patch is adjusted accordingly.
> > >> 
> > >> Also available in git, based on v5.17-rc1:
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> > >> 
> > >> I'd like to ask for some review before I add this to the slab tree.
> > >> 
> > >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> > >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> > >> 
> > >> Oliver Glitta (4):
> > >>   mm/slub: use stackdepot to save stack trace in objects
> > >>   mm/slub: aggregate and print stack traces in debugfs files
> > >>   mm/slub: sort debugfs output by frequency of stack traces
> > >>   slab, documentation: add description of debugfs files for SLUB caches
> > >> 
> > >> Vlastimil Babka (1):
> > >>   mm/slub: move struct track init out of set_track()
> > >> 
> > >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> > >>  init/Kconfig              |   1 +
> > >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> > >>  3 files changed, 162 insertions(+), 52 deletions(-)
> > >> 
> > >> -- 
> > >> 2.35.1
> > >> 
> > >> 
> > > 
> > 
> 
> -- 
> Sincerely yours,
> Mike.
Hyeonggon Yoo Feb. 28, 2022, 9:27 p.m. UTC | #6
On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> Hi,
> >> 
> >> this series combines and revives patches from Oliver's last year
> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> files alloc_traces and free_traces more useful.
> >> The resubmission was blocked on stackdepot changes that are now merged,
> >> as explained in patch 2.
> >> 
> > 
> > Hello. I just started review/testing this series.
> > 
> > it crashed on my system (arm64)
> 
> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> from memblock. arm64 must have memblock freeing happen earlier or something.
> (CCing memblock experts)
> 
> > I ran with boot parameter slub_debug=U, and without KASAN.
> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > 
> > void * __init memblock_alloc_try_nid(
> >                         phys_addr_t size, phys_addr_t align,
> >                         phys_addr_t min_addr, phys_addr_t max_addr,
> >                         int nid)
> > {
> >         void *ptr;
> > 
> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> >                      &max_addr, (void *)_RET_IP_);
> >         ptr = memblock_alloc_internal(size, align,
> >                                            min_addr, max_addr, nid, false);
> >         if (ptr)
> >                 memset(ptr, 0, size); <--- Crash Here
> > 
> >         return ptr;
> > }
> > 
> > It crashed during create_boot_cache() -> stack_depot_init() ->
> > memblock_alloc().
> > 
> > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > available. (AFAIU memblock is not available after mem_init() because of
> > memblock_free_all(), right?)
> 
> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> But then, I would expect stack_depot_init() to detect that memblock_alloc()
> returns NULL, we print ""Stack Depot hash table allocation failed,
> disabling" and disable it. Instead it seems memblock_alloc() returns
> something that's already potentially used by somebody else? Sounds like a bug?


By the way, I fixed this by allowing stack_depot_init() to be called in
kmem_cache_init() too [1] and Marco suggested that calling
stack_depot_init() depending on slub_debug parameter for simplicity. [2]

I would prefer [2], Would you take a look?

[1] https://lkml.org/lkml/2022/2/27/31

[2] https://lkml.org/lkml/2022/2/28/717

> > Thanks!
> > 
> > /*
> >  * Set up kernel memory allocators
> >  */
> > static void __init mm_init(void)
> > {
> >         /*
> >          * page_ext requires contiguous pages,
> >          * bigger than MAX_ORDER unless SPARSEMEM.
> >          */
> >         page_ext_init_flatmem();
> >         init_mem_debugging_and_hardening();
> >         kfence_alloc_pool();
> >         report_meminit();
> >         stack_depot_early_init();
> >         mem_init();
> >         mem_init_print_info();
> >         kmem_cache_init();
> >         /*
> >          * page_owner must be initialized after buddy is ready, and also after
> >          * slab is ready so that stack_depot_init() works properly
> >          */)
> > 
> >> Patch 1 is a new preparatory cleanup.
> >> 
> >> Patch 2 originally submitted here [1], was merged to mainline but
> >> reverted for stackdepot related issues as explained in the patch.
> >> 
> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> >> be considered too intrusive so I will postpone it for later. The docs
> >> patch is adjusted accordingly.
> >> 
> >> Also available in git, based on v5.17-rc1:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> >> 
> >> I'd like to ask for some review before I add this to the slab tree.
> >> 
> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> >> 
> >> Oliver Glitta (4):
> >>   mm/slub: use stackdepot to save stack trace in objects
> >>   mm/slub: aggregate and print stack traces in debugfs files
> >>   mm/slub: sort debugfs output by frequency of stack traces
> >>   slab, documentation: add description of debugfs files for SLUB caches
> >> 
> >> Vlastimil Babka (1):
> >>   mm/slub: move struct track init out of set_track()
> >> 
> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> >>  init/Kconfig              |   1 +
> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> >>  3 files changed, 162 insertions(+), 52 deletions(-)
> >> 
> >> -- 
> >> 2.35.1
> >> 
> >> 
> > 
>
Vlastimil Babka Feb. 28, 2022, 11:38 p.m. UTC | #7
On 2/28/22 21:01, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> >> Hi,
>> >> 
>> >> this series combines and revives patches from Oliver's last year
>> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> >> files alloc_traces and free_traces more useful.
>> >> The resubmission was blocked on stackdepot changes that are now merged,
>> >> as explained in patch 2.
>> >> 
>> > 
>> > Hello. I just started review/testing this series.
>> > 
>> > it crashed on my system (arm64)
>> 
>> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> from memblock. arm64 must have memblock freeing happen earlier or something.
>> (CCing memblock experts)
>> 
>> > I ran with boot parameter slub_debug=U, and without KASAN.
>> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> > 
>> > void * __init memblock_alloc_try_nid(
>> >                         phys_addr_t size, phys_addr_t align,
>> >                         phys_addr_t min_addr, phys_addr_t max_addr,
>> >                         int nid)
>> > {
>> >         void *ptr;
>> > 
>> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
>> >                      &max_addr, (void *)_RET_IP_);
>> >         ptr = memblock_alloc_internal(size, align,
>> >                                            min_addr, max_addr, nid, false);
>> >         if (ptr)
>> >                 memset(ptr, 0, size); <--- Crash Here
>> > 
>> >         return ptr;
>> > }
>> > 
>> > It crashed during create_boot_cache() -> stack_depot_init() ->
>> > memblock_alloc().
>> > 
>> > I think That's because, in kmem_cache_init(), both slab and memblock is not
>> > available. (AFAIU memblock is not available after mem_init() because of
>> > memblock_free_all(), right?)
>> 
>> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
>> But then, I would expect stack_depot_init() to detect that memblock_alloc()
>> returns NULL, we print ""Stack Depot hash table allocation failed,
>> disabling" and disable it. Instead it seems memblock_alloc() returns
>> something that's already potentially used by somebody else? Sounds like a bug?
> 
> If stack_depot_init() is called from kmem_cache_init(), there will be a
> confusion what allocator should be used because we use slab_is_available()
> to stop using memblock and start using kmalloc() instead in both
> stack_depot_init() and in memblock.

I did check that stack_depot_init() is called from kmem_cache_init()
*before* we make slab_is_available() true, hence assumed that memblock would
be still available at that point and expected no confusion. But seems if
memblock is already beyond memblock_free_all() then it being still available
is just an illusion?

> Hyeonggon, did you run your tests with panic on warn at any chance?
>  
>> > Thanks!
>> > 
>> > /*
>> >  * Set up kernel memory allocators
>> >  */
>> > static void __init mm_init(void)
>> > {
>> >         /*
>> >          * page_ext requires contiguous pages,
>> >          * bigger than MAX_ORDER unless SPARSEMEM.
>> >          */
>> >         page_ext_init_flatmem();
>> >         init_mem_debugging_and_hardening();
>> >         kfence_alloc_pool();
>> >         report_meminit();
>> >         stack_depot_early_init();
>> >         mem_init();
>> >         mem_init_print_info();
>> >         kmem_cache_init();
>> >         /*
>> >          * page_owner must be initialized after buddy is ready, and also after
>> >          * slab is ready so that stack_depot_init() works properly
>> >          */)
>> > 
>> >> Patch 1 is a new preparatory cleanup.
>> >> 
>> >> Patch 2 originally submitted here [1], was merged to mainline but
>> >> reverted for stackdepot related issues as explained in the patch.
>> >> 
>> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
>> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> >> be considered too intrusive so I will postpone it for later. The docs
>> >> patch is adjusted accordingly.
>> >> 
>> >> Also available in git, based on v5.17-rc1:
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>> >> 
>> >> I'd like to ask for some review before I add this to the slab tree.
>> >> 
>> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
>> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
>> >> 
>> >> Oliver Glitta (4):
>> >>   mm/slub: use stackdepot to save stack trace in objects
>> >>   mm/slub: aggregate and print stack traces in debugfs files
>> >>   mm/slub: sort debugfs output by frequency of stack traces
>> >>   slab, documentation: add description of debugfs files for SLUB caches
>> >> 
>> >> Vlastimil Babka (1):
>> >>   mm/slub: move struct track init out of set_track()
>> >> 
>> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
>> >>  init/Kconfig              |   1 +
>> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
>> >>  3 files changed, 162 insertions(+), 52 deletions(-)
>> >> 
>> >> -- 
>> >> 2.35.1
>> >> 
>> >> 
>> > 
>> 
>
Mike Rapoport March 1, 2022, 9:21 a.m. UTC | #8
On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
> On 2/28/22 21:01, Mike Rapoport wrote:
> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> >> On 2/26/22 08:19, Hyeonggon Yoo wrote:
> >> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> >> Hi,
> >> >> 
> >> >> this series combines and revives patches from Oliver's last year
> >> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> >> files alloc_traces and free_traces more useful.
> >> >> The resubmission was blocked on stackdepot changes that are now merged,
> >> >> as explained in patch 2.
> >> >> 
> >> > 
> >> > Hello. I just started review/testing this series.
> >> > 
> >> > it crashed on my system (arm64)
> >> 
> >> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> >> from memblock. arm64 must have memblock freeing happen earlier or something.
> >> (CCing memblock experts)
> >> 
> >> > I ran with boot parameter slub_debug=U, and without KASAN.
> >> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> >> > 
> >> > void * __init memblock_alloc_try_nid(
> >> >                         phys_addr_t size, phys_addr_t align,
> >> >                         phys_addr_t min_addr, phys_addr_t max_addr,
> >> >                         int nid)
> >> > {
> >> >         void *ptr;
> >> > 
> >> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> >> >                      &max_addr, (void *)_RET_IP_);
> >> >         ptr = memblock_alloc_internal(size, align,
> >> >                                            min_addr, max_addr, nid, false);
> >> >         if (ptr)
> >> >                 memset(ptr, 0, size); <--- Crash Here
> >> > 
> >> >         return ptr;
> >> > }
> >> > 
> >> > It crashed during create_boot_cache() -> stack_depot_init() ->
> >> > memblock_alloc().
> >> > 
> >> > I think That's because, in kmem_cache_init(), both slab and memblock is not
> >> > available. (AFAIU memblock is not available after mem_init() because of
> >> > memblock_free_all(), right?)
> >> 
> >> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> >> But then, I would expect stack_depot_init() to detect that memblock_alloc()
> >> returns NULL, we print ""Stack Depot hash table allocation failed,
> >> disabling" and disable it. Instead it seems memblock_alloc() returns
> >> something that's already potentially used by somebody else? Sounds like a bug?
> > 
> > If stack_depot_init() is called from kmem_cache_init(), there will be a
> > confusion what allocator should be used because we use slab_is_available()
> > to stop using memblock and start using kmalloc() instead in both
> > stack_depot_init() and in memblock.
> 
> I did check that stack_depot_init() is called from kmem_cache_init()
> *before* we make slab_is_available() true, hence assumed that memblock would
> be still available at that point and expected no confusion. But seems if
> memblock is already beyond memblock_free_all() then it being still available
> is just an illusion?

Yeah, it appears it is an illusion :)

I think we have to deal with allocations that happen between
memblock_free_all() and slab_is_available() at the memblock level and then
figure out the where to put stack_depot_init() and how to allocate memory
there.

I believe something like this (untested) patch below addresses the first
issue. As for stack_depot_init() I'm still trying to figure out the
possible call paths, but it seems we can use stack_depot_early_init() for
SLUB debugging case. I'll try to come up with something Really Soon (tm).

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 50ad19662a32..4ea89d44d22a 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -90,6 +90,7 @@ struct memblock_type {
  */
 struct memblock {
 	bool bottom_up;  /* is bottom up direction? */
+	bool mem_freed;
 	phys_addr_t current_limit;
 	struct memblock_type memory;
 	struct memblock_type reserved;
diff --git a/mm/memblock.c b/mm/memblock.c
index b12a364f2766..60196dc4980e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -120,6 +120,7 @@ struct memblock memblock __initdata_memblock = {
 	.reserved.name		= "reserved",
 
 	.bottom_up		= false,
+	.mem_freed		= false,
 	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
 };
 
@@ -1487,6 +1488,13 @@ static void * __init memblock_alloc_internal(
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc_node(size, GFP_NOWAIT, nid);
 
+	if (memblock.mem_freed) {
+		unsigned int order = get_order(size);
+
+		pr_warn("memblock: allocating from buddy\n");
+		return __alloc_pages_node(nid, order, GFP_KERNEL);
+	}
+
 	if (max_addr > memblock.current_limit)
 		max_addr = memblock.current_limit;
 
@@ -2116,6 +2124,7 @@ void __init memblock_free_all(void)
 
 	pages = free_low_memory_core_early();
 	totalram_pages_add(pages);
+	memblock.mem_freed = true;
 }
 
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
 
> > Hyeonggon, did you run your tests with panic on warn at any chance?
> >  
> >> > Thanks!
> >> > 
> >> > /*
> >> >  * Set up kernel memory allocators
> >> >  */
> >> > static void __init mm_init(void)
> >> > {
> >> >         /*
> >> >          * page_ext requires contiguous pages,
> >> >          * bigger than MAX_ORDER unless SPARSEMEM.
> >> >          */
> >> >         page_ext_init_flatmem();
> >> >         init_mem_debugging_and_hardening();
> >> >         kfence_alloc_pool();
> >> >         report_meminit();
> >> >         stack_depot_early_init();
> >> >         mem_init();
> >> >         mem_init_print_info();
> >> >         kmem_cache_init();
> >> >         /*
> >> >          * page_owner must be initialized after buddy is ready, and also after
> >> >          * slab is ready so that stack_depot_init() works properly
> >> >          */)
> >> > 
> >> >> Patch 1 is a new preparatory cleanup.
> >> >> 
> >> >> Patch 2 originally submitted here [1], was merged to mainline but
> >> >> reverted for stackdepot related issues as explained in the patch.
> >> >> 
> >> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> >> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> >> >> be considered too intrusive so I will postpone it for later. The docs
> >> >> patch is adjusted accordingly.
> >> >> 
> >> >> Also available in git, based on v5.17-rc1:
> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> >> >> 
> >> >> I'd like to ask for some review before I add this to the slab tree.
> >> >> 
> >> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> >> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> >> >> 
> >> >> Oliver Glitta (4):
> >> >>   mm/slub: use stackdepot to save stack trace in objects
> >> >>   mm/slub: aggregate and print stack traces in debugfs files
> >> >>   mm/slub: sort debugfs output by frequency of stack traces
> >> >>   slab, documentation: add description of debugfs files for SLUB caches
> >> >> 
> >> >> Vlastimil Babka (1):
> >> >>   mm/slub: move struct track init out of set_track()
> >> >> 
> >> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> >> >>  init/Kconfig              |   1 +
> >> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> >> >>  3 files changed, 162 insertions(+), 52 deletions(-)
> >> >> 
> >> >> -- 
> >> >> 2.35.1
> >> >> 
> >> >> 
> >> > 
> >> 
> > 
>
Mike Rapoport March 1, 2022, 9:23 a.m. UTC | #9
Hi,

On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> Hi,
> > >> 
> > >> this series combines and revives patches from Oliver's last year
> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> files alloc_traces and free_traces more useful.
> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> as explained in patch 2.
> > >> 
> > > 
> > > Hello. I just started review/testing this series.
> > > 
> > > it crashed on my system (arm64)
> > 
> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > (CCing memblock experts)
> > 
> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > > 
> > > void * __init memblock_alloc_try_nid(
> > >                         phys_addr_t size, phys_addr_t align,
> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> > >                         int nid)
> > > {
> > >         void *ptr;
> > > 
> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> > >                      &max_addr, (void *)_RET_IP_);
> > >         ptr = memblock_alloc_internal(size, align,
> > >                                            min_addr, max_addr, nid, false);
> > >         if (ptr)
> > >                 memset(ptr, 0, size); <--- Crash Here
> > > 
> > >         return ptr;
> > > }
> > > 
> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > > memblock_alloc().
> > > 
> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > > available. (AFAIU memblock is not available after mem_init() because of
> > > memblock_free_all(), right?)
> > 
> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > something that's already potentially used by somebody else? Sounds like a bug?
> 
> 
> By the way, I fixed this by allowing stack_depot_init() to be called in
> kmem_cache_init() too [1] and Marco suggested that calling
> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
> 
> I would prefer [2], Would you take a look?
> 
> [1] https://lkml.org/lkml/2022/2/27/31
> 
> [2] https://lkml.org/lkml/2022/2/28/717

I'm still looking at stack_depot_init() callers, but I think it's possible
to make stack_depot_early_init() a proper function that calls
memblock_alloc() and only use stack_depot_init() when slab is actually
available.
 
> > > Thanks!
> > > 
> > > /*
> > >  * Set up kernel memory allocators
> > >  */
> > > static void __init mm_init(void)
> > > {
> > >         /*
> > >          * page_ext requires contiguous pages,
> > >          * bigger than MAX_ORDER unless SPARSEMEM.
> > >          */
> > >         page_ext_init_flatmem();
> > >         init_mem_debugging_and_hardening();
> > >         kfence_alloc_pool();
> > >         report_meminit();
> > >         stack_depot_early_init();
> > >         mem_init();
> > >         mem_init_print_info();
> > >         kmem_cache_init();
> > >         /*
> > >          * page_owner must be initialized after buddy is ready, and also after
> > >          * slab is ready so that stack_depot_init() works properly
> > >          */)
> > > 
> > >> Patch 1 is a new preparatory cleanup.
> > >> 
> > >> Patch 2 originally submitted here [1], was merged to mainline but
> > >> reverted for stackdepot related issues as explained in the patch.
> > >> 
> > >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> > >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> > >> be considered too intrusive so I will postpone it for later. The docs
> > >> patch is adjusted accordingly.
> > >> 
> > >> Also available in git, based on v5.17-rc1:
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> > >> 
> > >> I'd like to ask for some review before I add this to the slab tree.
> > >> 
> > >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> > >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> > >> 
> > >> Oliver Glitta (4):
> > >>   mm/slub: use stackdepot to save stack trace in objects
> > >>   mm/slub: aggregate and print stack traces in debugfs files
> > >>   mm/slub: sort debugfs output by frequency of stack traces
> > >>   slab, documentation: add description of debugfs files for SLUB caches
> > >> 
> > >> Vlastimil Babka (1):
> > >>   mm/slub: move struct track init out of set_track()
> > >> 
> > >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> > >>  init/Kconfig              |   1 +
> > >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> > >>  3 files changed, 162 insertions(+), 52 deletions(-)
> > >> 
> > >> -- 
> > >> 2.35.1
> > >> 
> > >> 
> > > 
> > 
> 
> -- 
> Thank you, You are awesome!
> Hyeonggon :-)
Vlastimil Babka March 1, 2022, 9:41 a.m. UTC | #10
On 3/1/22 10:21, Mike Rapoport wrote:
> On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
>> On 2/28/22 21:01, Mike Rapoport wrote:
>> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> >> On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> >> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> >> >> Hi,
>> >> >> 
>> >> >> this series combines and revives patches from Oliver's last year
>> >> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> >> >> files alloc_traces and free_traces more useful.
>> >> >> The resubmission was blocked on stackdepot changes that are now merged,
>> >> >> as explained in patch 2.
>> >> >> 
>> >> > 
>> >> > Hello. I just started review/testing this series.
>> >> > 
>> >> > it crashed on my system (arm64)
>> >> 
>> >> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> >> from memblock. arm64 must have memblock freeing happen earlier or something.
>> >> (CCing memblock experts)
>> >> 
>> >> > I ran with boot parameter slub_debug=U, and without KASAN.
>> >> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> >> > 
>> >> > void * __init memblock_alloc_try_nid(
>> >> >                         phys_addr_t size, phys_addr_t align,
>> >> >                         phys_addr_t min_addr, phys_addr_t max_addr,
>> >> >                         int nid)
>> >> > {
>> >> >         void *ptr;
>> >> > 
>> >> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> >> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
>> >> >                      &max_addr, (void *)_RET_IP_);
>> >> >         ptr = memblock_alloc_internal(size, align,
>> >> >                                            min_addr, max_addr, nid, false);
>> >> >         if (ptr)
>> >> >                 memset(ptr, 0, size); <--- Crash Here
>> >> > 
>> >> >         return ptr;
>> >> > }
>> >> > 
>> >> > It crashed during create_boot_cache() -> stack_depot_init() ->
>> >> > memblock_alloc().
>> >> > 
>> >> > I think That's because, in kmem_cache_init(), both slab and memblock is not
>> >> > available. (AFAIU memblock is not available after mem_init() because of
>> >> > memblock_free_all(), right?)
>> >> 
>> >> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
>> >> But then, I would expect stack_depot_init() to detect that memblock_alloc()
>> >> returns NULL, we print ""Stack Depot hash table allocation failed,
>> >> disabling" and disable it. Instead it seems memblock_alloc() returns
>> >> something that's already potentially used by somebody else? Sounds like a bug?
>> > 
>> > If stack_depot_init() is called from kmem_cache_init(), there will be a
>> > confusion what allocator should be used because we use slab_is_available()
>> > to stop using memblock and start using kmalloc() instead in both
>> > stack_depot_init() and in memblock.
>> 
>> I did check that stack_depot_init() is called from kmem_cache_init()
>> *before* we make slab_is_available() true, hence assumed that memblock would
>> be still available at that point and expected no confusion. But seems if
>> memblock is already beyond memblock_free_all() then it being still available
>> is just an illusion?
> 
> Yeah, it appears it is an illusion :)
> 
> I think we have to deal with allocations that happen between
> memblock_free_all() and slab_is_available() at the memblock level and then
> figure out the where to put stack_depot_init() and how to allocate memory
> there.
> 
> I believe something like this (untested) patch below addresses the first
> issue. As for stack_depot_init() I'm still trying to figure out the
> possible call paths, but it seems we can use stack_depot_early_init() for
> SLUB debugging case. I'll try to come up with something Really Soon (tm).

Yeah as you already noticed, we are pursuing an approach to decide on
calling stack_depot_early_init(), which should be a good way to solve this
given how special slab is in this case. For memblock I just wanted to point
out that it could be more robust, your patch below seems to be on the right
patch. Maybe it just doesn't have to fallback to buddy, which could be
considered a layering violation, but just return NULL that can be
immediately recognized as an error?

> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 50ad19662a32..4ea89d44d22a 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -90,6 +90,7 @@ struct memblock_type {
>   */
>  struct memblock {
>  	bool bottom_up;  /* is bottom up direction? */
> +	bool mem_freed;
>  	phys_addr_t current_limit;
>  	struct memblock_type memory;
>  	struct memblock_type reserved;
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b12a364f2766..60196dc4980e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -120,6 +120,7 @@ struct memblock memblock __initdata_memblock = {
>  	.reserved.name		= "reserved",
>  
>  	.bottom_up		= false,
> +	.mem_freed		= false,
>  	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
>  };
>  
> @@ -1487,6 +1488,13 @@ static void * __init memblock_alloc_internal(
>  	if (WARN_ON_ONCE(slab_is_available()))
>  		return kzalloc_node(size, GFP_NOWAIT, nid);
>  
> +	if (memblock.mem_freed) {
> +		unsigned int order = get_order(size);
> +
> +		pr_warn("memblock: allocating from buddy\n");
> +		return __alloc_pages_node(nid, order, GFP_KERNEL);
> +	}
> +
>  	if (max_addr > memblock.current_limit)
>  		max_addr = memblock.current_limit;
>  
> @@ -2116,6 +2124,7 @@ void __init memblock_free_all(void)
>  
>  	pages = free_low_memory_core_early();
>  	totalram_pages_add(pages);
> +	memblock.mem_freed = true;
>  }
>  
>  #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
>  
>> > Hyeonggon, did you run your tests with panic on warn at any chance?
>> >  
>> >> > Thanks!
>> >> > 
>> >> > /*
>> >> >  * Set up kernel memory allocators
>> >> >  */
>> >> > static void __init mm_init(void)
>> >> > {
>> >> >         /*
>> >> >          * page_ext requires contiguous pages,
>> >> >          * bigger than MAX_ORDER unless SPARSEMEM.
>> >> >          */
>> >> >         page_ext_init_flatmem();
>> >> >         init_mem_debugging_and_hardening();
>> >> >         kfence_alloc_pool();
>> >> >         report_meminit();
>> >> >         stack_depot_early_init();
>> >> >         mem_init();
>> >> >         mem_init_print_info();
>> >> >         kmem_cache_init();
>> >> >         /*
>> >> >          * page_owner must be initialized after buddy is ready, and also after
>> >> >          * slab is ready so that stack_depot_init() works properly
>> >> >          */)
>> >> > 
>> >> >> Patch 1 is a new preparatory cleanup.
>> >> >> 
>> >> >> Patch 2 originally submitted here [1], was merged to mainline but
>> >> >> reverted for stackdepot related issues as explained in the patch.
>> >> >> 
>> >> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
>> >> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> >> >> be considered too intrusive so I will postpone it for later. The docs
>> >> >> patch is adjusted accordingly.
>> >> >> 
>> >> >> Also available in git, based on v5.17-rc1:
>> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>> >> >> 
>> >> >> I'd like to ask for some review before I add this to the slab tree.
>> >> >> 
>> >> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
>> >> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
>> >> >> 
>> >> >> Oliver Glitta (4):
>> >> >>   mm/slub: use stackdepot to save stack trace in objects
>> >> >>   mm/slub: aggregate and print stack traces in debugfs files
>> >> >>   mm/slub: sort debugfs output by frequency of stack traces
>> >> >>   slab, documentation: add description of debugfs files for SLUB caches
>> >> >> 
>> >> >> Vlastimil Babka (1):
>> >> >>   mm/slub: move struct track init out of set_track()
>> >> >> 
>> >> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
>> >> >>  init/Kconfig              |   1 +
>> >> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
>> >> >>  3 files changed, 162 insertions(+), 52 deletions(-)
>> >> >> 
>> >> >> -- 
>> >> >> 2.35.1
>> >> >> 
>> >> >> 
>> >> > 
>> >> 
>> > 
>> 
>
Mike Rapoport March 1, 2022, 2:52 p.m. UTC | #11
On Tue, Mar 01, 2022 at 10:41:10AM +0100, Vlastimil Babka wrote:
> On 3/1/22 10:21, Mike Rapoport wrote:
> > On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
> >> On 2/28/22 21:01, Mike Rapoport wrote:
> >> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> >> > 
> >> > If stack_depot_init() is called from kmem_cache_init(), there will be a
> >> > confusion what allocator should be used because we use slab_is_available()
> >> > to stop using memblock and start using kmalloc() instead in both
> >> > stack_depot_init() and in memblock.
> >> 
> >> I did check that stack_depot_init() is called from kmem_cache_init()
> >> *before* we make slab_is_available() true, hence assumed that memblock would
> >> be still available at that point and expected no confusion. But seems if
> >> memblock is already beyond memblock_free_all() then it being still available
> >> is just an illusion?
> > 
> > Yeah, it appears it is an illusion :)
> > 
> > I think we have to deal with allocations that happen between
> > memblock_free_all() and slab_is_available() at the memblock level and then
> > figure out the where to put stack_depot_init() and how to allocate memory
> > there.
> > 
> > I believe something like this (untested) patch below addresses the first
> > issue. As for stack_depot_init() I'm still trying to figure out the
> > possible call paths, but it seems we can use stack_depot_early_init() for
> > SLUB debugging case. I'll try to come up with something Really Soon (tm).
> 
> Yeah as you already noticed, we are pursuing an approach to decide on
> calling stack_depot_early_init(), which should be a good way to solve this
> given how special slab is in this case. For memblock I just wanted to point
> out that it could be more robust, your patch below seems to be on the right
> patch. Maybe it just doesn't have to fallback to buddy, which could be
> considered a layering violation, but just return NULL that can be
> immediately recognized as an error?

The layering violation is anyway there for slab_is_available() case, so
adding a __alloc_pages() there will be only consistent.
 
> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > index 50ad19662a32..4ea89d44d22a 100644
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -90,6 +90,7 @@ struct memblock_type {
> >   */
> >  struct memblock {
> >  	bool bottom_up;  /* is bottom up direction? */
> > +	bool mem_freed;
> >  	phys_addr_t current_limit;
> >  	struct memblock_type memory;
> >  	struct memblock_type reserved;
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index b12a364f2766..60196dc4980e 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -120,6 +120,7 @@ struct memblock memblock __initdata_memblock = {
> >  	.reserved.name		= "reserved",
> >  
> >  	.bottom_up		= false,
> > +	.mem_freed		= false,
> >  	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
> >  };
> >  
> > @@ -1487,6 +1488,13 @@ static void * __init memblock_alloc_internal(
> >  	if (WARN_ON_ONCE(slab_is_available()))
> >  		return kzalloc_node(size, GFP_NOWAIT, nid);
> >  
> > +	if (memblock.mem_freed) {
> > +		unsigned int order = get_order(size);
> > +
> > +		pr_warn("memblock: allocating from buddy\n");
> > +		return __alloc_pages_node(nid, order, GFP_KERNEL);
> > +	}
> > +
> >  	if (max_addr > memblock.current_limit)
> >  		max_addr = memblock.current_limit;
> >  
> > @@ -2116,6 +2124,7 @@ void __init memblock_free_all(void)
> >  
> >  	pages = free_low_memory_core_early();
> >  	totalram_pages_add(pages);
> > +	memblock.mem_freed = true;
> >  }
> >  
> >  #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
> >
Mike Rapoport March 2, 2022, 8:37 a.m. UTC | #12
On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> Hi,
> > >> 
> > >> this series combines and revives patches from Oliver's last year
> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> files alloc_traces and free_traces more useful.
> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> as explained in patch 2.
> > >> 
> > > 
> > > Hello. I just started review/testing this series.
> > > 
> > > it crashed on my system (arm64)
> > 
> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > (CCing memblock experts)
> > 
> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > > 
> > > void * __init memblock_alloc_try_nid(
> > >                         phys_addr_t size, phys_addr_t align,
> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> > >                         int nid)
> > > {
> > >         void *ptr;
> > > 
> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> > >                      &max_addr, (void *)_RET_IP_);
> > >         ptr = memblock_alloc_internal(size, align,
> > >                                            min_addr, max_addr, nid, false);
> > >         if (ptr)
> > >                 memset(ptr, 0, size); <--- Crash Here
> > > 
> > >         return ptr;
> > > }
> > > 
> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > > memblock_alloc().
> > > 
> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > > available. (AFAIU memblock is not available after mem_init() because of
> > > memblock_free_all(), right?)
> > 
> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > something that's already potentially used by somebody else? Sounds like a bug?
> 
> 
> By the way, I fixed this by allowing stack_depot_init() to be called in
> kmem_cache_init() too [1] and Marco suggested that calling
> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
> 
> I would prefer [2], Would you take a look?
> 
> [1] https://lkml.org/lkml/2022/2/27/31
> 
> [2] https://lkml.org/lkml/2022/2/28/717

I have the third version :)

diff --git a/mm/slub.c b/mm/slub.c
index a74afe59a403..0c3ab2335b46 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
 	}
 out:
 	slub_debug = global_flags;
+
+	if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
+		stack_depot_early_init();
+
 	if (slub_debug != 0 || slub_debug_string)
 		static_branch_enable(&slub_debug_enabled);
 	else
@@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 	s->remote_node_defrag_ratio = 1000;
 #endif
 
-	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
-		stack_depot_init();
-
 	/* Initialize the pre-computed randomized freelist if slab is up */
 	if (slab_state >= UP) {
 		if (init_cache_random_seq(s))
 
> -- 
> Thank you, You are awesome!
> Hyeonggon :-)
Vlastimil Babka March 2, 2022, 9:09 a.m. UTC | #13
On 3/2/22 09:37, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
>> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> > >> Hi,
>> > >> 
>> > >> this series combines and revives patches from Oliver's last year
>> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> > >> files alloc_traces and free_traces more useful.
>> > >> The resubmission was blocked on stackdepot changes that are now merged,
>> > >> as explained in patch 2.
>> > >> 
>> > > 
>> > > Hello. I just started review/testing this series.
>> > > 
>> > > it crashed on my system (arm64)
>> > 
>> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> > from memblock. arm64 must have memblock freeing happen earlier or something.
>> > (CCing memblock experts)
>> > 
>> > > I ran with boot parameter slub_debug=U, and without KASAN.
>> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> > > 
>> > > void * __init memblock_alloc_try_nid(
>> > >                         phys_addr_t size, phys_addr_t align,
>> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
>> > >                         int nid)
>> > > {
>> > >         void *ptr;
>> > > 
>> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
>> > >                      &max_addr, (void *)_RET_IP_);
>> > >         ptr = memblock_alloc_internal(size, align,
>> > >                                            min_addr, max_addr, nid, false);
>> > >         if (ptr)
>> > >                 memset(ptr, 0, size); <--- Crash Here
>> > > 
>> > >         return ptr;
>> > > }
>> > > 
>> > > It crashed during create_boot_cache() -> stack_depot_init() ->
>> > > memblock_alloc().
>> > > 
>> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
>> > > available. (AFAIU memblock is not available after mem_init() because of
>> > > memblock_free_all(), right?)
>> > 
>> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
>> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
>> > returns NULL, we print ""Stack Depot hash table allocation failed,
>> > disabling" and disable it. Instead it seems memblock_alloc() returns
>> > something that's already potentially used by somebody else? Sounds like a bug?
>> 
>> 
>> By the way, I fixed this by allowing stack_depot_init() to be called in
>> kmem_cache_init() too [1] and Marco suggested that calling
>> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
>> 
>> I would prefer [2], Would you take a look?
>> 
>> [1] https://lkml.org/lkml/2022/2/27/31
>> 
>> [2] https://lkml.org/lkml/2022/2/28/717
> 
> I have the third version :)

While simple, it changes the timing of stack_depot_early_init() that was
supposed to be at a single callsite - now it's less predictable and depends
on e.g. kernel parameter ordering. Some arch/config combo could break,
dunno. Setting a variable that stack_depot_early_init() checks should be
more robust.

> 
> diff --git a/mm/slub.c b/mm/slub.c
> index a74afe59a403..0c3ab2335b46 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
>  	}
>  out:
>  	slub_debug = global_flags;
> +
> +	if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> +		stack_depot_early_init();
> +
>  	if (slub_debug != 0 || slub_debug_string)
>  		static_branch_enable(&slub_debug_enabled);
>  	else
> @@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>  	s->remote_node_defrag_ratio = 1000;
>  #endif
>  
> -	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> -		stack_depot_init();
> -
>  	/* Initialize the pre-computed randomized freelist if slab is up */
>  	if (slab_state >= UP) {
>  		if (init_cache_random_seq(s))
>  
>> -- 
>> Thank you, You are awesome!
>> Hyeonggon :-)
>
Mike Rapoport March 2, 2022, 12:30 p.m. UTC | #14
On Wed, Mar 02, 2022 at 10:09:37AM +0100, Vlastimil Babka wrote:
> On 3/2/22 09:37, Mike Rapoport wrote:
> > On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> >> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> >> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> >> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> > >> Hi,
> >> > >> 
> >> > >> this series combines and revives patches from Oliver's last year
> >> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> > >> files alloc_traces and free_traces more useful.
> >> > >> The resubmission was blocked on stackdepot changes that are now merged,
> >> > >> as explained in patch 2.
> >> > >> 
> >> > > 
> >> > > Hello. I just started review/testing this series.
> >> > > 
> >> > > it crashed on my system (arm64)
> >> > 
> >> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> >> > from memblock. arm64 must have memblock freeing happen earlier or something.
> >> > (CCing memblock experts)
> >> > 
> >> > > I ran with boot parameter slub_debug=U, and without KASAN.
> >> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> >> > > 
> >> > > void * __init memblock_alloc_try_nid(
> >> > >                         phys_addr_t size, phys_addr_t align,
> >> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> >> > >                         int nid)
> >> > > {
> >> > >         void *ptr;
> >> > > 
> >> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> >> > >                      &max_addr, (void *)_RET_IP_);
> >> > >         ptr = memblock_alloc_internal(size, align,
> >> > >                                            min_addr, max_addr, nid, false);
> >> > >         if (ptr)
> >> > >                 memset(ptr, 0, size); <--- Crash Here
> >> > > 
> >> > >         return ptr;
> >> > > }
> >> > > 
> >> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> >> > > memblock_alloc().
> >> > > 
> >> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> >> > > available. (AFAIU memblock is not available after mem_init() because of
> >> > > memblock_free_all(), right?)
> >> > 
> >> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> >> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> >> > returns NULL, we print ""Stack Depot hash table allocation failed,
> >> > disabling" and disable it. Instead it seems memblock_alloc() returns
> >> > something that's already potentially used by somebody else? Sounds like a bug?
> >> 
> >> 
> >> By the way, I fixed this by allowing stack_depot_init() to be called in
> >> kmem_cache_init() too [1] and Marco suggested that calling
> >> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
> >> 
> >> I would prefer [2], Would you take a look?
> >> 
> >> [1] https://lkml.org/lkml/2022/2/27/31
> >> 
> >> [2] https://lkml.org/lkml/2022/2/28/717
> > 
> > I have the third version :)
> 
> While simple, it changes the timing of stack_depot_early_init() that was
> supposed to be at a single callsite - now it's less predictable and depends
> on e.g. kernel parameter ordering. Some arch/config combo could break,
> dunno. Setting a variable that stack_depot_early_init() checks should be
> more robust.

Not sure I follow.
stack_depot_early_init() is a wrapper for stack_depot_init() which already
checks 

	if (!stack_depot_disable && !stack_table)

So largely it can be at multiple call sites just like stack_depot_init...

Still, I understand your concern of having multiple call sites for
stack_depot_early_init().

The most robust way I can think of will be to make stack_depot_early_init()
a proper function, move memblock_alloc() there and add a variable, say
stack_depot_needed_early that will be set to 1 if
CONFIG_STACKDEPOT_ALWAYS_INIT=y or by the callers that need to allocate the
stack_table before kmalloc is up.
 
E.g

__init int stack_depot_early_init(void)
{

	if (stack_depot_needed_early && !stack_table) {
		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
		int i;

		pr_info("Stack Depot allocating hash table with memblock_alloc\n");
		stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
	
		if (!stack_table) {
			pr_err("Stack Depot hash table allocation failed, disabling\n");
			stack_depot_disable = true;
			return -ENOMEM;
		}
	}

	return 0;
}

The mutex is not needed here because mm_init() -> stack_depot_early_init()
happens before SMP and setting stack_table[i] to NULL is redundant with
memblock_alloc(). (btw, kvmalloc case could use __GFP_ZERO as well).

I'm not sure if the stack depot should be disabled for good if the early
allocation failed, but that's another story.

> > diff --git a/mm/slub.c b/mm/slub.c
> > index a74afe59a403..0c3ab2335b46 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
> >  	}
> >  out:
> >  	slub_debug = global_flags;
> > +
> > +	if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > +		stack_depot_early_init();
> > +
> >  	if (slub_debug != 0 || slub_debug_string)
> >  		static_branch_enable(&slub_debug_enabled);
> >  	else
> > @@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> >  	s->remote_node_defrag_ratio = 1000;
> >  #endif
> >  
> > -	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > -		stack_depot_init();
> > -
> >  	/* Initialize the pre-computed randomized freelist if slab is up */
> >  	if (slab_state >= UP) {
> >  		if (init_cache_random_seq(s))
> >  
> >> -- 
> >> Thank you, You are awesome!
> >> Hyeonggon :-)
> > 
>
Hyeonggon Yoo March 2, 2022, 5:02 p.m. UTC | #15
On Wed, Mar 02, 2022 at 02:30:56PM +0200, Mike Rapoport wrote:
> On Wed, Mar 02, 2022 at 10:09:37AM +0100, Vlastimil Babka wrote:
> > On 3/2/22 09:37, Mike Rapoport wrote:
> > > On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> > >> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > >> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > >> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> > >> Hi,
> > >> > >> 
> > >> > >> this series combines and revives patches from Oliver's last year
> > >> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> > >> files alloc_traces and free_traces more useful.
> > >> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> > >> as explained in patch 2.
> > >> > >> 
> > >> > > 
> > >> > > Hello. I just started review/testing this series.
> > >> > > 
> > >> > > it crashed on my system (arm64)
> > >> > 
> > >> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > >> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > >> > (CCing memblock experts)
> > >> > 
> > >> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > >> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > >> > > 
> > >> > > void * __init memblock_alloc_try_nid(
> > >> > >                         phys_addr_t size, phys_addr_t align,
> > >> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> > >> > >                         int nid)
> > >> > > {
> > >> > >         void *ptr;
> > >> > > 
> > >> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > >> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> > >> > >                      &max_addr, (void *)_RET_IP_);
> > >> > >         ptr = memblock_alloc_internal(size, align,
> > >> > >                                            min_addr, max_addr, nid, false);
> > >> > >         if (ptr)
> > >> > >                 memset(ptr, 0, size); <--- Crash Here
> > >> > > 
> > >> > >         return ptr;
> > >> > > }
> > >> > > 
> > >> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > >> > > memblock_alloc().
> > >> > > 
> > >> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > >> > > available. (AFAIU memblock is not available after mem_init() because of
> > >> > > memblock_free_all(), right?)
> > >> > 
> > >> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > >> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > >> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > >> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > >> > something that's already potentially used by somebody else? Sounds like a bug?
> > >> 
> > >> 
> > >> By the way, I fixed this by allowing stack_depot_init() to be called in
> > >> kmem_cache_init() too [1] and Marco suggested that calling
> > >> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
> > >> 
> > >> I would prefer [2], Would you take a look?
> > >> 
> > >> [1] https://lkml.org/lkml/2022/2/27/31
> > >> 
> > >> [2] https://lkml.org/lkml/2022/2/28/717
> > > 
> > > I have the third version :)
> > 
> > While simple, it changes the timing of stack_depot_early_init() that was
> > supposed to be at a single callsite - now it's less predictable and depends
> > on e.g. kernel parameter ordering. Some arch/config combo could break,
> > dunno. Setting a variable that stack_depot_early_init() checks should be
> > more robust.
> 
> Not sure I follow.
> stack_depot_early_init() is a wrapper for stack_depot_init() which already
> checks 
> 
> 	if (!stack_depot_disable && !stack_table)
> 
> So largely it can be at multiple call sites just like stack_depot_init...

In my opinion, allowing to call stack_depot_init() in various context is not a good
idea. For another simple example, slub_debug=U,vmap_area can fool the
current code because it's called in context where slab is available,
but vmalloc is unavailable. and stack_depot_init() will try to allocate
using kvmalloc(). Late initialization adds too much complexity.

So IMO we have two solutions.

First solution is only allowing early init and avoiding late init.
(setting a global variable that is visible to stack depot would do this)

And second solution is to make caller allocate and manage its own hash
table. All of this complexity is because we're trying to make stack_table
global.

First solution looks ok if we have few users of stack depot.
But I think we should use second approach if stack depot is growing
more and more callers?

> Still, I understand your concern of having multiple call sites for
> stack_depot_early_init().
> 
> The most robust way I can think of will be to make stack_depot_early_init()
> a proper function, move memblock_alloc() there and add a variable, say
> stack_depot_needed_early that will be set to 1 if
> CONFIG_STACKDEPOT_ALWAYS_INIT=y or by the callers that need to allocate the
> stack_table before kmalloc is up.
>  
> E.g
>
> __init int stack_depot_early_init(void)
> {
> 
> 	if (stack_depot_needed_early && !stack_table) {
> 		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> 		int i;
> 
> 		pr_info("Stack Depot allocating hash table with memblock_alloc\n");
> 		stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> 	
> 		if (!stack_table) {
> 			pr_err("Stack Depot hash table allocation failed, disabling\n");
> 			stack_depot_disable = true;
> 			return -ENOMEM;
> 		}
> 	}
> 
> 	return 0;
> }
>
> The mutex is not needed here because mm_init() -> stack_depot_early_init()
> happens before SMP and setting stack_table[i] to NULL is redundant with
> memblock_alloc(). (btw, kvmalloc case could use __GFP_ZERO as well).
> 
> I'm not sure if the stack depot should be disabled for good if the early
> allocation failed, but that's another story.
> 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index a74afe59a403..0c3ab2335b46 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
> > >  	}
> > >  out:
> > >  	slub_debug = global_flags;
> > > +
> > > +	if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > > +		stack_depot_early_init();
> > > +
> > >  	if (slub_debug != 0 || slub_debug_string)
> > >  		static_branch_enable(&slub_debug_enabled);
> > >  	else
> > > @@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> > >  	s->remote_node_defrag_ratio = 1000;
> > >  #endif
> > >  
> > > -	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > > -		stack_depot_init();
> > > -
> > >  	/* Initialize the pre-computed randomized freelist if slab is up */
> > >  	if (slab_state >= UP) {
> > >  		if (init_cache_random_seq(s))
> > >  
> > >> -- 
> > >> Thank you, You are awesome!
> > >> Hyeonggon :-)
> > > 
> > 
> 
> -- 
> Sincerely yours,
> Mike.
Marco Elver March 2, 2022, 5:27 p.m. UTC | #16
On Wed, 2 Mar 2022 at 18:02, Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
[...]
> So IMO we have two solutions.
>
> First solution is only allowing early init and avoiding late init.
> (setting a global variable that is visible to stack depot would do this)
>
> And second solution is to make caller allocate and manage its own hash
> table. All of this complexity is because we're trying to make stack_table
> global.

I think this would be a mistake, because then we have to continuously
audit all users of stackdepot and make sure that allocation stack
traces don't end up in duplicate hash tables. It's global for a
reason.

> First solution looks ok if we have few users of stack depot.
> But I think we should use second approach if stack depot is growing
> more and more callers?

The problem here really is just that initialization of stackdepot and
slabs can have a cyclic dependency with the changes you're making. I
very much doubt there'll be other cases (beyond the allocator itself
used by stackdepot) which can introduce such a cyclic dependency.

The easiest way to break the cyclic dependency is to initialize
stackdepot earlier, assuming it can be determined it is required (in
this case it can because the command line is parsed before slab
creation). The suggestion with the stack_depot_needed_early variable
(like Mike's suggested code) would solve all that.

I don't understand the concern about multiple contexts. The problem is
just about a cyclic dependency during early init, and I doubt we'll
have more of that.

Thanks,
-- Marco
Vlastimil Babka March 4, 2022, 5:25 p.m. UTC | #17
On 2/26/22 13:18, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> Hi,
>>
>> this series combines and revives patches from Oliver's last year
>> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> files alloc_traces and free_traces more useful.
>> The resubmission was blocked on stackdepot changes that are now merged,
>> as explained in patch 2.
>>
>> Patch 1 is a new preparatory cleanup.
>>
>> Patch 2 originally submitted here [1], was merged to mainline but
>> reverted for stackdepot related issues as explained in the patch.
>>
>> Patches 3-5 originally submitted as RFC here [2]. In this submission I
>> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> be considered too intrusive so I will postpone it for later. The docs
>> patch is adjusted accordingly.
>>
> 
> This problem is not caused by this patch series.
> But I think it's worth mentioning...
> 
> It's really weird that some stack traces are not recorded
> when CONFIG_KASAN=y.
> 
> I made sure that:
> 	- Stack Depot did not reach its limit
> 	- the free path happen on CONFIG_KASAN=y too.
> 
> I have no clue why this happen.
> 
> # cat dentry/free_traces (CONFIG_KASAN=y)
>    6585 <not-available> age=4294912647 pid=0 cpus=0

I think it's some kind of KASAN quarantining of freed objects, so they
haven't been properly freed through the SLUB layer yet.

> # cat dentry/free_traces (CONFIG_KASAN=n)
>    1246 <not-available> age=4294906877 pid=0 cpus=0
>     379 __d_free+0x20/0x2c age=33/14225/14353 pid=0-122 cpus=0-3
>         kmem_cache_free+0x1f4/0x21c
>         __d_free+0x20/0x2c
>         rcu_core+0x334/0x580
>         rcu_core_si+0x14/0x20
>         __do_softirq+0x12c/0x2a8
> 
>       2 dentry_free+0x58/0xb0 age=14101/14101/14101 pid=158 cpus=0
>         kmem_cache_free+0x1f4/0x21c
>         dentry_free+0x58/0xb0
>         __dentry_kill+0x18c/0x1d0
>         dput+0x1c4/0x2fc
>         __fput+0xb0/0x230
>         ____fput+0x14/0x20
>         task_work_run+0x84/0x17c
>         do_notify_resume+0x208/0x1330
>         el0_svc+0x6c/0x80
>         el0t_64_sync_handler+0xa8/0x130
>         el0t_64_sync+0x1a0/0x1a4
> 
>       1 dentry_free+0x58/0xb0 age=7678 pid=190 cpus=1
>         kmem_cache_free+0x1f4/0x21c
>         dentry_free+0x58/0xb0
>         __dentry_kill+0x18c/0x1d0
>         dput+0x1c4/0x2fc
>         __fput+0xb0/0x230
>         ____fput+0x14/0x20
>         task_work_run+0x84/0x17c
>         do_exit+0x2dc/0x8e0
>         do_group_exit+0x38/0xa4
>         __wake_up_parent+0x0/0x34
>         invoke_syscall+0x48/0x114
>         el0_svc_common.constprop.0+0x44/0xfc
>         do_el0_svc+0x2c/0x94
>         el0_svc+0x28/0x80
>         el0t_64_sync_handler+0xa8/0x130
>         el0t_64_sync+0x1a0/0x1a4