[0/3] page_owner: print stacks and their counter

Message ID 20220901044249.4624-1-osalvador@suse.de (mailing list archive)

Message

Oscar Salvador Sept. 1, 2022, 4:42 a.m. UTC
Hi,

page_owner is a great debugging tool that lets us know about all pages
that have been allocated/freed and their stacktraces.
This comes in very handy when, e.g., debugging leaks, as with some scripting
we might be able to see those stacktraces that are allocating pages
but not freeing them.

In my experience, that is one of the most useful cases, but it can get
really tedious to sift through all pages and try to reconstruct the
stack <-> allocated/freed relationship. There is a lot of noise
to filter out.

This series aims to fix that by adding new functionality to page_owner.
It creates a new read-only file, "page_owner_stacks", which prints only
the allocating stacktraces and their count, that being the number of times
the stacktrace has allocated minus the number of times it has freed.

So we have a clear overview of the stack <-> allocated/freed relationship
without the need to fiddle with pages and try to match free stacktraces
with allocated stacktraces.

This is achieved by adding a new refcount_t field in the stack_record struct,
incrementing that refcount_t every time the same stacktrace allocates,
and decrementing it when it frees a page. Details can be seen in the
respective patches.
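
To illustrate the counting idea outside the kernel, here is a minimal
userspace sketch. It is not the code from the patches: the struct and
function names are hypothetical, and it uses a C11 atomic_int where the
series uses refcount_t (which has its own saturation semantics).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a stack depot record; only the counter is modelled. */
struct stack_record_sketch {
	atomic_int count;	/* allocations minus frees attributed to this stack */
};

static void stack_record_init(struct stack_record_sketch *rec)
{
	atomic_init(&rec->count, 0);
}

/* Would be called when a page is allocated from this stacktrace. */
static void stack_count_inc(struct stack_record_sketch *rec)
{
	atomic_fetch_add_explicit(&rec->count, 1, memory_order_relaxed);
}

/* Would be called when a page allocated from this stacktrace is freed. */
static void stack_count_dec(struct stack_record_sketch *rec)
{
	int old = atomic_fetch_sub_explicit(&rec->count, 1, memory_order_relaxed);

	/* A free without a matching allocation would be a bookkeeping bug. */
	assert(old > 0);
	(void)old;
}

/* Current value: how many pages from this stack are still outstanding. */
static int stack_count_read(struct stack_record_sketch *rec)
{
	return atomic_load_explicit(&rec->count, memory_order_relaxed);
}
```

With this model, three allocations and one free from the same stacktrace
leave a count of 2, which is the kind of value reported on a
"stack count:" line.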

We also create another file called "page_owner_threshold", which lets us
specify a threshold, so when reading from "page_owner_stacks",
we will only see those stacktraces whose count goes beyond the
threshold we specified.
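
The filtering step can be sketched in plain C as follows. This is only
an illustration under assumptions: the entry struct and function names
are hypothetical, each stack is reduced to a single label, and "goes
beyond" is read as strictly greater than the threshold, which the patches
may or may not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flattened view of one stack: a label and its count. */
struct stack_entry_sketch {
	const char *top_frame;
	unsigned long count;
};

/* Assumption: a stack is shown only if its count is strictly above the threshold. */
static int exceeds_threshold(unsigned long count, unsigned long threshold)
{
	return count > threshold;
}

/* Print the entries that pass the filter; return how many were printed. */
static size_t print_stacks_above(FILE *out,
				 const struct stack_entry_sketch *entries,
				 size_t n, unsigned long threshold)
{
	size_t printed = 0;

	assert(entries != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!exceeds_threshold(entries[i].count, threshold))
			continue;
		fprintf(out, " %s\nstack count: %lu\n\n",
			entries[i].top_frame, entries[i].count);
		printed++;
	}
	return printed;
}
```

With a threshold of 0 every stack with a non-zero count is printed; with
a threshold of 10000 only the heavy allocators remain.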

A PoC can be found below:

# cat /sys/kernel/debug/page_owner_threshold
 0
# cat /sys/kernel/debug/page_owner_stacks > stacks_full.txt
# head -32 stacks_full.txt
 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 alloc_page_interleave+0x13/0x90
 new_slab+0x31d/0x530
 ___slab_alloc+0x5d7/0x720
 __slab_alloc.isra.85+0x4a/0x90
 kmem_cache_alloc+0x455/0x4a0
 acpi_ps_alloc_op+0x57/0x8f
 acpi_ps_create_scope_op+0x12/0x23
 acpi_ps_execute_method+0x102/0x2c1
 acpi_ns_evaluate+0x343/0x4da
 acpi_evaluate_object+0x1cb/0x392
 acpi_run_osc+0x135/0x260
 acpi_init+0x165/0x4ed
 do_one_initcall+0x3e/0x200
stack count: 2

 free_pcp_prepare+0x287/0x5c0
 free_unref_page+0x1c/0xd0
 __mmdrop+0x50/0x160
 finish_task_switch+0x249/0x2b0
 __schedule+0x2c3/0x960
 schedule+0x44/0xb0
 futex_wait_queue+0x70/0xd0
 futex_wait+0x160/0x250
 do_futex+0x11c/0x1b0
 __x64_sys_futex+0x5e/0x1d0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 1

 

# echo 10000 > /sys/kernel/debug/page_owner_threshold
# cat /sys/kernel/debug/page_owner_stacks > stacks_10000.txt
# cat stacks_10000.txt 
 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0x23d/0x5e0
 filemap_read+0xbf/0x3a0
 __kernel_read+0x136/0x2f0
 kernel_read_file+0x197/0x2d0
 kernel_read_file_from_fd+0x54/0x90
 __do_sys_finit_module+0x89/0x120
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 36195

 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0x23d/0x5e0
 filemap_read+0xbf/0x3a0
 new_sync_read+0x106/0x180
 vfs_read+0x16f/0x190
 ksys_read+0xa5/0xe0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 44484

 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0xdd/0x5e0
 filemap_read+0xbf/0x3a0
 new_sync_read+0x106/0x180
 vfs_read+0x16f/0x190
 ksys_read+0xa5/0xe0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 17874


Oscar Salvador (3):
  lib/stackdepot: Add a refcount field in stack_record
  mm, page_owner: Add page_owner_stacks file to print out only stacks
    and their counter
  mm,page_owner: Filter out stacks by a threshold counter

 include/linux/stackdepot.h |  16 ++++-
 lib/stackdepot.c           | 121 ++++++++++++++++++++++++++++++++-----
 mm/kasan/common.c          |   3 +-
 mm/page_owner.c            | 102 +++++++++++++++++++++++++++++--
 4 files changed, 222 insertions(+), 20 deletions(-)

Comments

Michal Hocko Sept. 1, 2022, 8:32 a.m. UTC | #1
On Thu 01-09-22 06:42:46, Oscar Salvador wrote:
> Hi,
> 
> page_owner is a great debugging tool that lets us know about all pages
> that have been allocated/freed and their stacktraces.
> This comes in very handy when, e.g., debugging leaks, as with some scripting
> we might be able to see those stacktraces that are allocating pages
> but not freeing them.
> 
> In my experience, that is one of the most useful cases, but it can get
> really tedious to sift through all pages and try to reconstruct the
> stack <-> allocated/freed relationship. There is a lot of noise
> to filter out.
> 
> This series aims to fix that by adding new functionality to page_owner.
> It creates a new read-only file, "page_owner_stacks", which prints only
> the allocating stacktraces and their count, that being the number of times
> the stacktrace has allocated minus the number of times it has freed.
> 
> So we have a clear overview of the stack <-> allocated/freed relationship
> without the need to fiddle with pages and try to match free stacktraces
> with allocated stacktraces.
> 
> This is achieved by adding a new refcount_t field in the stack_record struct,
> incrementing that refcount_t every time the same stacktrace allocates,
> and decrementing it when it frees a page. Details can be seen in the
> respective patches.
> 
> We also create another file called "page_owner_threshold", which lets us
> specify a threshold, so when reading from "page_owner_stacks",
> we will only see those stacktraces whose count goes beyond the
> threshold we specified.
> 
> A PoC can be found below:
> 
> # cat /sys/kernel/debug/page_owner_threshold
>  0
> # cat /sys/kernel/debug/page_owner_stacks > stacks_full.txt
> # head -32 stacks_full.txt
>  prep_new_page+0x10d/0x180
>  get_page_from_freelist+0x1bd6/0x1e10
>  __alloc_pages+0x194/0x360
>  alloc_page_interleave+0x13/0x90
>  new_slab+0x31d/0x530
>  ___slab_alloc+0x5d7/0x720
>  __slab_alloc.isra.85+0x4a/0x90
>  kmem_cache_alloc+0x455/0x4a0
>  acpi_ps_alloc_op+0x57/0x8f
>  acpi_ps_create_scope_op+0x12/0x23
>  acpi_ps_execute_method+0x102/0x2c1
>  acpi_ns_evaluate+0x343/0x4da
>  acpi_evaluate_object+0x1cb/0x392
>  acpi_run_osc+0x135/0x260
>  acpi_init+0x165/0x4ed
>  do_one_initcall+0x3e/0x200
> stack count: 2

This is very nice and useful! I guess some people would prefer to have
Memory usage: XYZ kB
dumped instead, but looking at the code this would require tracking the
number of pages rather than calls per stack, which would be more code
and somewhat alien to the concept as well. Practically speaking, when
looking for leakers a high stack count should be indicative enough IMHO.
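
To make the distinction concrete, here is a small hypothetical sketch of
what tracking pages rather than calls would mean. None of these names
come from the patches, and the 4 kB page size is an assumption for the
example only.

```c
#include <assert.h>

/* Assumed page size for the example only (4 kB pages). */
#define SKETCH_PAGE_SIZE_KB 4UL

/* Hypothetical per-stack accounting that tracks both calls and pages. */
struct stack_usage_sketch {
	unsigned long nr_calls;	/* what a per-call stack count reflects */
	unsigned long nr_pages;	/* what a "Memory usage" line would need */
};

/* One allocation of the given order covers 2^order pages. */
static void account_alloc(struct stack_usage_sketch *u, unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	u->nr_calls += 1;
	u->nr_pages += 1UL << order;
}

/* Memory attributed to the stack, in kB, under the assumed page size. */
static unsigned long usage_kb(const struct stack_usage_sketch *u)
{
	return u->nr_pages * SKETCH_PAGE_SIZE_KB;
}
```

An order-0 and an order-3 allocation from the same stack give a call
count of 2 but 9 pages (36 kB here), so the two numbers can diverge for
stacks that make high-order allocations.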

[...]
> Oscar Salvador (3):
>   lib/stackdepot: Add a refcount field in stack_record
>   mm, page_owner: Add page_owner_stacks file to print out only stacks
>     and their counter
>   mm,page_owner: Filter out stacks by a threshold counter
> 
>  include/linux/stackdepot.h |  16 ++++-
>  lib/stackdepot.c           | 121 ++++++++++++++++++++++++++++++++-----
>  mm/kasan/common.c          |   3 +-
>  mm/page_owner.c            | 102 +++++++++++++++++++++++++++++--
>  4 files changed, 222 insertions(+), 20 deletions(-)

The code footprint is also rather low. I am no expert in either
stackdepot or page_owner, but from a very quick glance nothing really
jumped out at me.

Thanks!