[RFC,v3,00/15] Slab Movable Objects (SMO)

Message ID 20190411013441.5415-1-tobin@kernel.org (mailing list archive)

Tobin C. Harding April 11, 2019, 1:34 a.m. UTC
Hi,

Another iteration of the SMO patch set implementing suggestions from Al
and Willy on the last version as well as some feedback from comments on
the recent LWN article.

Applies on top of Linus' tree (tag: v5.1-rc4).

This is a patch set implementing movable objects within the SLUB
allocator.  This is work based on Christopher Lameter's patch set:

 https://lore.kernel.org/patchwork/project/lkml/list/?series=377335

The original code logic is from that set and implemented by Christopher.
Clean up, refactoring, documentation, and additional features by myself.
Responsibility for any bugs remaining falls solely with myself.

Patch #9 has changes to the XArray migration function as suggested by
Matthew, thank you.

The only other changes to this version are to the dcache code.

dcache
------

It was noted on LWN that calling the dcache migration function
'd_migrate' is a misnomer because we are _not_ trying to migrate the
dentry objects but rather only free them.  As noted by Al, dentry (and
inode) objects are inherently not relocatable.  What we are trying to
achieve here is, rather, to free a select group of dentry objects.  The
dcache patches are not intended to be a silver bullet fixing all
fragmentation within the dentry slab cache.  Instead we are trying to
make a non-invasive attempt at freeing up pages sparsely used by the
dentry slab cache.  This may be useful for a number of reasons, e.g. we
_may_ be able to free a page that is blocking high-order page
allocations.
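
To make the intent concrete, here is a toy userspace model of the idea,
not the kernel implementation.  All names (SlabPage, partial_shrink,
shrink_sparse_pages, max_used_ratio) are hypothetical and exist only for
this illustration.  Each page holds a list of objects, where True means
the object is still referenced and so cannot be freed.

```python
from dataclasses import dataclass, field


@dataclass
class SlabPage:
    """Toy stand-in for one slab page.

    Each entry in `objects` is True if that object is still referenced
    (cannot be freed) and False if it is unused and may be freed.
    """
    objects: list = field(default_factory=list)


def partial_shrink(page):
    """Free the unreferenced objects on one page.

    Returns True only if the page ended up completely empty, i.e. the
    page itself could be handed back to the page allocator.
    """
    page.objects = [pinned for pinned in page.objects if pinned]
    return not page.objects


def shrink_sparse_pages(pages, slots_per_page, max_used_ratio):
    """Attempt a partial shrink on sparsely used pages only.

    Pages whose used ratio is above `max_used_ratio` are left alone.
    Returns the number of pages that became free.
    """
    freed = 0
    for page in pages:
        if len(page.objects) / slots_per_page <= max_used_ratio:
            if partial_shrink(page):
                freed += 1
    return freed
```

In this model a sparse page holding only unreferenced objects is freed,
while a sparse page with a single referenced object stays allocated.
That is the reason the gain is only something that _may_ happen.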

Since this is only something that _may_ help, the aim is to be
non-intrusive.  This version of the set adds a config option to
selectively build in the SMO code for the dcache.  Without this option
the only change this set makes to the dcache is adding a constructor.
With the constructor doing the spin_lock_init(), it is hoped this will at
best be a performance gain and at worst NOT be a performance reduction.
Benchmarking has found this to be the case; results are included below.

Patches #14 and #15 can be rolled into a single patch if #15 is found
favourable.

Changes since v2:

 - Improve the XArray migration function (thanks Matthew)
 - Fix the dcache constructor (thanks Alexander)
 - Rename the d_migrate function to d_partial_shrink (open to
   suggested improvement)
 - Totally re-write the dcache migration function based on schooling by Al


Thanks for looking at this,
Tobin.


=============================
dcache SMO patch benchmarking
=============================

Process
=======

We use v5.1-rc4 as the baseline.  We benchmark the SMO patch set with
and without CONFIG_DCACHE_SMO.  Without CONFIG_DCACHE_SMO the patch set
only adds a constructor to the dcache; no other code is added to the
build.  Building with CONFIG_DCACHE_SMO adds code to enable object
migration for the dcache.

cmd = `time find / -name fname-no-exist`
drop_caches = `echo 2 > /proc/sys/vm/drop_caches`

1. Boot system
2. Run $cmd
3. Run $drop_caches
4. Run $cmd
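
Steps 2 to 4 above can be scripted.  The following is a minimal sketch,
not the script used to produce the numbers below.  The function names
and the `root` and `drop_path` parameters are assumptions added so the
sketch can be tried on a scratch directory; it measures wall-clock time
only (the `real` line of `time`), and writing to the real drop_caches
file requires root.

```python
import subprocess
import time


def timed_run(cmd):
    """Run cmd and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    # find's own output and errors are irrelevant here, so discard them.
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def drop_slab_caches(path="/proc/sys/vm/drop_caches"):
    """Equivalent of `echo 2 > /proc/sys/vm/drop_caches` (needs root)."""
    with open(path, "w") as f:
        f.write("2\n")


def benchmark(root="/", drop_path="/proc/sys/vm/drop_caches"):
    """Time find, drop the slab caches, then time find again."""
    cmd = ["find", root, "-name", "fname-no-exist"]
    first = timed_run(cmd)
    drop_slab_caches(drop_path)
    second = timed_run(cmd)
    return first, second
```

Calling benchmark() with no arguments on a freshly booted system would
follow the process described above.  Repeating it several times per
kernel would give some idea of the run-to-run variation.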


Bare metal results
------------------

Machine: x86_64
Kernel configured with::

	make defconfig


- rc4 kernel (baseline)::

	time find / -name fname-no-exist

	real	0m29.799s
	user	0m1.519s
	sys	0m10.825s

	echo 2 > /proc/sys/vm/drop_caches 

	time find / -name fname-no-exist

	real	0m6.828s
	user	0m0.952s
	sys	0m5.824s


- rc4 kernel with SMO patch set and !CONFIG_DCACHE_SMO::

	time find / -name fname-no-exist

	real	0m30.075s
	user	0m1.480s
	sys	0m10.754s

	echo 2 > /proc/sys/vm/drop_caches 
	time find / -name fname-no-exist

	real	0m6.626s
	user	0m0.917s
	sys	0m5.661s


- rc4 kernel with SMO patch set and CONFIG_DCACHE_SMO::

	time find / -name fname-no-exist

	real	0m30.637s
	user	0m1.516s
	sys	0m11.603s

	echo 2 > /proc/sys/vm/drop_caches 

	time find / -name fname-no-exist

	real	0m6.886s
	user	0m0.932s
	sys	0m5.907s


Qemu results
------------

Host machine: x86_64

Qemu kernel configured with::

	make defconfig
	make kvmconfig

Qemu invoked with::

    qemu-system-x86_64 \
      -enable-kvm \
      -m 4G \
      -hda arch.qcow \
      -kernel $kernel \
      -serial stdio \
      -display none \
      -append 'root=/dev/sda1 console=ttyS0 rw'

- rc4 kernel (baseline)::

	time find / -name fname-no-exist

	real	0m0.929s
	user	0m0.096s
	sys	0m0.168s

	echo 2 > /proc/sys/vm/drop_caches 
	time find / -name fname-no-exist

	real	0m0.249s
	user	0m0.112s
	sys	0m0.133s

- rc4 kernel with SMO patch set and !CONFIG_DCACHE_SMO::

	time find / -name fname-no-exist

	real	0m1.018s
	user	0m0.095s
	sys	0m0.151s

	echo 2 > /proc/sys/vm/drop_caches 
	time find / -name fname-no-exist

	real	0m0.191s
	user	0m0.083s
	sys	0m0.105s


- rc4 kernel with SMO patch set and CONFIG_DCACHE_SMO::

	time find / -name fname-no-exist

	real	0m0.763s
	user	0m0.091s
	sys	0m0.165s

	echo 2 > /proc/sys/vm/drop_caches 
	time find / -name fname-no-exist

	real	0m0.192s
	user	0m0.062s
	sys	0m0.126s
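
To compare the kernels at a glance, the `real` times above can be turned
into relative changes against the baseline.  This is a small sketch; the
helper names (percent_change, compare) are made up for the illustration
and the numbers are copied from the results above.

```python
# `real` times in seconds, copied from the results above.
# Each tuple is (first run after boot, run after dropping caches).
RESULTS = {
    "bare metal": {
        "baseline": (29.799, 6.828),
        "SMO, !CONFIG_DCACHE_SMO": (30.075, 6.626),
        "SMO, CONFIG_DCACHE_SMO": (30.637, 6.886),
    },
    "qemu": {
        "baseline": (0.929, 0.249),
        "SMO, !CONFIG_DCACHE_SMO": (1.018, 0.191),
        "SMO, CONFIG_DCACHE_SMO": (0.763, 0.192),
    },
}


def percent_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def compare(results):
    """Return {machine: {kernel: (first_run_pct, second_run_pct)}}."""
    table = {}
    for machine, kernels in results.items():
        base_first, base_second = kernels["baseline"]
        table[machine] = {
            name: (percent_change(base_first, first),
                   percent_change(base_second, second))
            for name, (first, second) in kernels.items()
            if name != "baseline"
        }
    return table


for machine, rows in compare(RESULTS).items():
    for name, (first, second) in rows.items():
        print(f"{machine:10} {name:24} {first:+6.1f}% {second:+6.1f}%")
```

Each figure comes from a single run, so small percentages are probably
within noise.  This applies in particular to the sub-second Qemu
timings.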


I am not very experienced with benchmarking; if this is grossly
incorrect, please do not hesitate to yell at me.  Any suggestions on
more/better benchmarking are most appreciated.

Thanks,
Tobin.


Tobin C. Harding (15):
  slub: Add isolate() and migrate() methods
  tools/vm/slabinfo: Add support for -C and -M options
  slub: Sort slab cache list
  slub: Slab defrag core
  tools/vm/slabinfo: Add remote node defrag ratio output
  tools/vm/slabinfo: Add defrag_used_ratio output
  tools/testing/slab: Add object migration test module
  tools/testing/slab: Add object migration test suite
  xarray: Implement migration function for objects
  tools/testing/slab: Add XArray movable objects tests
  slub: Enable moving objects to/from specific nodes
  slub: Enable balancing slabs across nodes
  dcache: Provide a dentry constructor
  dcache: Implement partial shrink via Slab Movable Objects
  dcache: Add CONFIG_DCACHE_SMO

 Documentation/ABI/testing/sysfs-kernel-slab |  14 +
 fs/dcache.c                                 | 106 ++-
 include/linux/slab.h                        |  71 ++
 include/linux/slub_def.h                    |  10 +
 lib/radix-tree.c                            |  13 +
 lib/xarray.c                                |  49 ++
 mm/Kconfig                                  |  14 +
 mm/slab_common.c                            |   2 +-
 mm/slub.c                                   | 819 ++++++++++++++++++--
 tools/testing/slab/Makefile                 |  10 +
 tools/testing/slab/slub_defrag.c            | 567 ++++++++++++++
 tools/testing/slab/slub_defrag.py           | 451 +++++++++++
 tools/testing/slab/slub_defrag_xarray.c     | 211 +++++
 tools/vm/slabinfo.c                         |  51 +-
 14 files changed, 2295 insertions(+), 93 deletions(-)
 create mode 100644 tools/testing/slab/Makefile
 create mode 100644 tools/testing/slab/slub_defrag.c
 create mode 100755 tools/testing/slab/slub_defrag.py
 create mode 100644 tools/testing/slab/slub_defrag_xarray.c