[v3,0/6] mm: introduce shrinker debugfs interface

Message ID 20220509183820.573666-1-roman.gushchin@linux.dev (mailing list archive)

Message

Roman Gushchin May 9, 2022, 6:38 p.m. UTC
There are 50+ different shrinkers in the kernel, many with their own bells and
whistles. Under memory pressure, the kernel applies some pressure to each of
them in the order in which they were created/registered in the system. Some
of them contain only a few objects, some can be quite large. Some are
effective at reclaiming memory, some are not.

The only existing debugging mechanism is a couple of tracepoints in
do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They don't
cover everything, though: shrinkers which report 0 objects never show up, and
there is no support for memcg-aware shrinkers. Shrinkers are identified by their
scan function, which is not always enough (e.g. it is hard to guess which super
block's shrinker it is given only "super_cache_scan").
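
For reference, a minimal sketch of how the two existing tracepoints could be
switched on from userspace. It assumes tracefs is mounted at
/sys/kernel/tracing and that both events live in the "vmscan" event group; the
helper names (tracepoint_enable_files, set_tracepoints) are hypothetical and
are not part of this patchset.

```python
from pathlib import Path

# Assumption: tracefs mount point; older setups may use
# /sys/kernel/debug/tracing instead.
TRACEFS = Path("/sys/kernel/tracing")

# The two tracepoints named above, emitted from do_shrink_slab().
EVENTS = ("mm_shrink_slab_start", "mm_shrink_slab_end")


def tracepoint_enable_files(tracefs=TRACEFS):
    """Return the per-event 'enable' control files for both tracepoints."""
    return [Path(tracefs) / "events" / "vmscan" / name / "enable"
            for name in EVENTS]


def set_tracepoints(enabled, tracefs=TRACEFS):
    """Write '1' or '0' to each 'enable' file (requires root on a real system)."""
    for path in tracepoint_enable_files(tracefs):
        path.write_text("1" if enabled else "0")
```

Calling set_tracepoints(True) as root and then reading the trace buffer shows
the events, subject to the limitations described above.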

To provide better visibility and debugging options for memory shrinkers,
this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent
similar to /sys/kernel/slab.

For each shrinker registered in the system, a directory is created.
For now, the directory contains only a "scan" file, which reports
the number of managed objects for each memory cgroup (for memcg-aware shrinkers)
and each NUMA node (for NUMA-aware shrinkers on a NUMA machine). Other
interfaces might be added in the future.
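
As a rough illustration of the layout described above (one directory per
registered shrinker, each holding its interface files), the following sketch
walks the debugfs tree and lists what it finds. It makes no assumption about
the contents or format of the files themselves; the function name
list_shrinkers is hypothetical, and reading debugfs normally requires root.

```python
from pathlib import Path

# Root of the interface introduced by this patchset.
DEBUGFS_ROOT = Path("/sys/kernel/debug/shrinker")


def list_shrinkers(root=DEBUGFS_ROOT):
    """Map each shrinker directory name to the sorted names of its files.

    Returns an empty dict if the root directory does not exist, e.g. when
    the interface is not built in or debugfs is not mounted.
    """
    root = Path(root)
    if not root.is_dir():
        return {}
    shrinkers = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            shrinkers[entry.name] = sorted(
                f.name for f in entry.iterdir() if f.is_file()
            )
    return shrinkers
```

Iterating over list_shrinkers().items() and printing each pair gives a quick
overview of which shrinkers are registered and which files each one exposes.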

To make debugging more pleasant, the patchset also names all shrinkers,
so that debugfs entries can have meaningful names.


v3:
  1) separated the "scan" part into a separate patch, by Dave
  2) merged *_memcg, *_node and *_memcg_node interfaces, by Dave
  3) shrinkers naming enhancements, by Christophe and Dave
  4) added signal_pending() check, by Hillf
  5) enabled by default, by Dave

v2:
  1) switched to debugfs, suggested by Mike, Andrew, Greg and others
  2) switched to seq_file API for output, no PAGE_SIZE limit anymore, by Andrew
  3) switched to down_read_killable(), suggested by Hillf
  4) dropped stateful filtering and "freed" returning, by Kent
  5) added docs, by Andrew
  6) added memcg_shrinker.py tool

rfc:
  https://lwn.net/Articles/891542/


Roman Gushchin (6):
  mm: memcontrol: introduce mem_cgroup_ino() and
    mem_cgroup_get_from_ino()
  mm: shrinkers: introduce debugfs interface for memory shrinkers
  mm: shrinkers: provide shrinkers with names
  mm: docs: document shrinker debugfs
  tools: add memcg_shrinker.py
  mm: shrinkers: add scan interface for shrinker debugfs

 Documentation/admin-guide/mm/index.rst        |   1 +
 .../admin-guide/mm/shrinker_debugfs.rst       | 131 ++++++++
 arch/x86/kvm/mmu/mmu.c                        |   2 +-
 drivers/android/binder_alloc.c                |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |   3 +-
 drivers/gpu/drm/msm/msm_gem_shrinker.c        |   2 +-
 .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |   2 +-
 drivers/gpu/drm/ttm/ttm_pool.c                |   2 +-
 drivers/md/bcache/btree.c                     |   2 +-
 drivers/md/dm-bufio.c                         |   2 +-
 drivers/md/dm-zoned-metadata.c                |   2 +-
 drivers/md/raid5.c                            |   2 +-
 drivers/misc/vmw_balloon.c                    |   2 +-
 drivers/virtio/virtio_balloon.c               |   2 +-
 drivers/xen/xenbus/xenbus_probe_backend.c     |   2 +-
 fs/btrfs/super.c                              |   2 +
 fs/erofs/utils.c                              |   2 +-
 fs/ext4/extents_status.c                      |   3 +-
 fs/f2fs/super.c                               |   2 +-
 fs/gfs2/glock.c                               |   2 +-
 fs/gfs2/main.c                                |   2 +-
 fs/jbd2/journal.c                             |   2 +-
 fs/mbcache.c                                  |   2 +-
 fs/nfs/nfs42xattr.c                           |   7 +-
 fs/nfs/super.c                                |   2 +-
 fs/nfsd/filecache.c                           |   2 +-
 fs/nfsd/nfscache.c                            |   2 +-
 fs/quota/dquot.c                              |   2 +-
 fs/super.c                                    |   6 +-
 fs/ubifs/super.c                              |   2 +-
 fs/xfs/xfs_buf.c                              |   2 +-
 fs/xfs/xfs_icache.c                           |   2 +-
 fs/xfs/xfs_qm.c                               |   2 +-
 include/linux/memcontrol.h                    |  21 ++
 include/linux/shrinker.h                      |  31 +-
 kernel/rcu/tree.c                             |   2 +-
 lib/Kconfig.debug                             |   9 +
 mm/Makefile                                   |   1 +
 mm/huge_memory.c                              |   4 +-
 mm/memcontrol.c                               |  23 ++
 mm/shrinker_debug.c                           | 285 ++++++++++++++++++
 mm/vmscan.c                                   |  64 +++-
 mm/workingset.c                               |   2 +-
 mm/zsmalloc.c                                 |   2 +-
 net/sunrpc/auth.c                             |   2 +-
 tools/cgroup/memcg_shrinker.py                |  71 +++++
 46 files changed, 675 insertions(+), 47 deletions(-)
 create mode 100644 Documentation/admin-guide/mm/shrinker_debugfs.rst
 create mode 100644 mm/shrinker_debug.c
 create mode 100755 tools/cgroup/memcg_shrinker.py

Comments

Roman Gushchin May 19, 2022, 5:15 p.m. UTC | #1
Any comments? Thoughts? Objections?

Thanks!
Dave Chinner May 20, 2022, 4:33 a.m. UTC | #2
On Thu, May 19, 2022 at 10:15:04AM -0700, Roman Gushchin wrote:
> Any comments? Thoughts? Objections?

I have no time available to look at this right now, and won't for a
while.

Cheers,

Dave.