
[RFC,04/15] slub: Enable Slab Movable Objects (SMO)

Message ID 20190308041426.16654-5-tobin@kernel.org (mailing list archive)
State New, archived
Series: mm: Implement Slab Movable Objects (SMO)

Commit Message

Tobin C. Harding March 8, 2019, 4:14 a.m. UTC
We have now in place a mechanism for adding callbacks to a cache in
order to be able to implement object migration.

Add a function __move() that implements SMO by moving all objects in a
slab page using the isolate/migrate callback methods.

Co-developed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
---
 mm/slub.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
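
As a rough illustration of the mechanism the commit message refers to, a
cache owner registers a pair of callbacks via kmem_cache_setup_mobility()
(added earlier in the series).  The callback signatures below are inferred
from how __move() invokes them; 'struct foo', its fields and the callback
bodies are made up for illustration only and are not part of this series:

#include <linux/init.h>
#include <linux/slab.h>

/* Hypothetical object type; real users follow in later patches. */
struct foo {
	int	in_use;		/* cleared on free so migrate can skip it */
	int	data;
};

static struct kmem_cache *foo_cache;

/*
 * A real implementation would take a reference on each object here so
 * it cannot go away once the slab lock is dropped; whatever is
 * returned is handed back to the migrate callback as @private.
 */
static void *foo_isolate(struct kmem_cache *s, void **objs, int nr)
{
	return NULL;
}

/*
 * Allocate replacement objects on @node, copy the contents across,
 * repoint any users, then free the old objects.  Because frees may run
 * concurrently once the slab is unlocked, the callback must be able to
 * tell from the object contents whether it has been freed (see the
 * comment in __move()).
 */
static void foo_migrate(struct kmem_cache *s, void **objs, int nr,
			int node, void *private)
{
	int i;

	for (i = 0; i < nr; i++) {
		struct foo *old = objs[i];

		if (!old->in_use)
			continue;
		/* allocate on @node, copy *old, update references, free old */
	}
}

static int __init foo_init(void)
{
	foo_cache = KMEM_CACHE(foo, 0);
	if (!foo_cache)
		return -ENOMEM;

	kmem_cache_setup_mobility(foo_cache, foo_isolate, foo_migrate);
	return 0;
}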

Comments

Roman Gushchin March 11, 2019, 10:48 p.m. UTC | #1
On Fri, Mar 08, 2019 at 03:14:15PM +1100, Tobin C. Harding wrote:
> We have now in place a mechanism for adding callbacks to a cache in
> order to be able to implement object migration.
> 
> Add a function __move() that implements SMO by moving all objects in a
> slab page using the isolate/migrate callback methods.
> 
> Co-developed-by: Christoph Lameter <cl@linux.com>
> Signed-off-by: Tobin C. Harding <tobin@kernel.org>
> ---
>  mm/slub.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 85 insertions(+)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 0133168d1089..6ce866b420f1 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4325,6 +4325,91 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
>  	return err;
>  }
>  
> +/*
> + * Allocate a slab scratch space that is sufficient to keep pointers to
> + * individual objects for all objects in cache and also a bitmap for the
> + * objects (used to mark which objects are active).
> + */
> +static inline void *alloc_scratch(struct kmem_cache *s)
> +{
> +	unsigned int size = oo_objects(s->max);
> +
> +	return kmalloc(size * sizeof(void *) +
> +		       BITS_TO_LONGS(size) * sizeof(unsigned long),
> +		       GFP_KERNEL);

I wonder how big this allocation can be?
Given that the reason for migration is probably highly fragmented memory,
we probably don't want to have a high-order allocation here. So maybe
kvmalloc()?
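
A minimal sketch of that kvmalloc() variant, for reference (not part of
the posted patch; callers would then free the scratch area with kvfree()
instead of kfree()):

static inline void *alloc_scratch(struct kmem_cache *s)
{
	unsigned int size = oo_objects(s->max);

	/*
	 * kvmalloc() falls back to vmalloc() when a physically
	 * contiguous allocation of this size is unavailable, avoiding
	 * a high-order allocation on a fragmented system.
	 */
	return kvmalloc(size * sizeof(void *) +
			BITS_TO_LONGS(size) * sizeof(unsigned long),
			GFP_KERNEL);
}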

> +}
> +
> +/*
> + * __move() - Move all objects in the given slab.
> + * @page: The slab we are working on.
> + * @scratch: Pointer to scratch space.
> + * @node: The target node to move objects to.
> + *
> + * If the target node is not the current node then the object is moved
> + * to the target node.  If the target node is the current node then this
> + * is an effective way of defragmentation since the current slab page
> + * with its object is exempt from allocation.
> + */
> +static void __move(struct page *page, void *scratch, int node)
> +{

__move() isn't a very explanatory name. kmem_cache_move() (as in Christopher's
version) is much better, IMO. Or maybe move_slab_objects()?

Also, it's usually better to avoid adding new functions without calling them.
Maybe it's possible to merge this patch with (9)?

Thanks!


> +	unsigned long objects;
> +	struct kmem_cache *s;
> +	unsigned long flags;
> +	unsigned long *map;
> +	void *private;
> +	int count;
> +	void *p;
> +	void **vector = scratch;
> +	void *addr = page_address(page);
> +
> +	local_irq_save(flags);
> +	slab_lock(page);
> +
> +	BUG_ON(!PageSlab(page)); /* Must be a slab page */
> +	BUG_ON(!page->frozen);	 /* Slab must have been frozen earlier */
> +
> +	s = page->slab_cache;
> +	objects = page->objects;
> +	map = scratch + objects * sizeof(void **);
> +
> +	/* Determine used objects */
> +	bitmap_fill(map, objects);
> +	for (p = page->freelist; p; p = get_freepointer(s, p))
> +		__clear_bit(slab_index(p, s, addr), map);
> +
> +	/* Build vector of pointers to objects */
> +	count = 0;
> +	memset(vector, 0, objects * sizeof(void **));
> +	for_each_object(p, s, addr, objects)
> +		if (test_bit(slab_index(p, s, addr), map))
> +			vector[count++] = p;
> +
> +	if (s->isolate)
> +		private = s->isolate(s, vector, count);
> +	else
> +		/* Objects do not need to be isolated */
> +		private = NULL;
> +
> +	/*
> +	 * Pinned the objects. Now we can drop the slab lock. The slab
> +	 * is frozen so it cannot vanish from under us nor will
> +	 * allocations be performed on the slab. However, unlocking the
> +	 * slab will allow concurrent slab_frees to proceed. So the
> +	 * subsystem must have a way to tell from the content of the
> +	 * object that it was freed.
> +	 *
> +	 * If neither RCU nor ctor is being used then the object may be
> +	 * modified by the allocator after being freed which may disrupt
> +	 * the ability of the migrate function to tell if the object is
> +	 * free or not.
> +	 */
> +	slab_unlock(page);
> +	local_irq_restore(flags);
> +
> +	/* Perform callback to move the objects */
> +	s->migrate(s, vector, count, node, private);
> +}
> +
>  void kmem_cache_setup_mobility(struct kmem_cache *s,
>  			       kmem_cache_isolate_func isolate,
>  			       kmem_cache_migrate_func migrate)
> -- 
> 2.21.0
>
Tobin Harding March 12, 2019, 1:47 a.m. UTC | #2
On Mon, Mar 11, 2019 at 10:48:45PM +0000, Roman Gushchin wrote:
> On Fri, Mar 08, 2019 at 03:14:15PM +1100, Tobin C. Harding wrote:
> > We have now in place a mechanism for adding callbacks to a cache in
> > order to be able to implement object migration.
> > 
> > Add a function __move() that implements SMO by moving all objects in a
> > slab page using the isolate/migrate callback methods.
> > 
> > Co-developed-by: Christoph Lameter <cl@linux.com>
> > Signed-off-by: Tobin C. Harding <tobin@kernel.org>
> > ---
> >  mm/slub.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 85 insertions(+)
> > 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 0133168d1089..6ce866b420f1 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -4325,6 +4325,91 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
> >  	return err;
> >  }
> >  
> > +/*
> > + * Allocate a slab scratch space that is sufficient to keep pointers to
> > + * individual objects for all objects in cache and also a bitmap for the
> > + * objects (used to mark which objects are active).
> > + */
> > +static inline void *alloc_scratch(struct kmem_cache *s)
> > +{
> > +	unsigned int size = oo_objects(s->max);
> > +
> > +	return kmalloc(size * sizeof(void *) +
> > +		       BITS_TO_LONGS(size) * sizeof(unsigned long),
> > +		       GFP_KERNEL);
> 
> I wonder how big this allocation can be?
> Given that the reason for migration is probably highly fragmented memory,
> we probably don't want to have a high-order allocation here. So maybe
> kvmalloc()?
> 
> > +}
> > +
> > +/*
> > + * __move() - Move all objects in the given slab.
> > + * @page: The slab we are working on.
> > + * @scratch: Pointer to scratch space.
> > + * @node: The target node to move objects to.
> > + *
> > + * If the target node is not the current node then the object is moved
> > + * to the target node.  If the target node is the current node then this
> > + * is an effective way of defragmentation since the current slab page
> > + * with its object is exempt from allocation.
> > + */
> > +static void __move(struct page *page, void *scratch, int node)
> > +{
> 
> __move() isn't a very explanatory name. kmem_cache_move() (as in Christopher's
> version) is much better, IMO. Or maybe move_slab_objects()?

How about move_slab_page()?  We use kmem_cache_move() later in the
series.  __move() moves all objects in the given page but not all
objects in this cache (which kmem_cache_move() later does).  Open to
further suggestions though, naming things is hard :)

Christopher's original patch uses kmem_cache_move() for a function that
only moves objects from within partial slabs; I changed it because I did
not think this name suitably describes the behaviour.  So from the
original I rename:

	__move() -> __defrag()
	kmem_cache_move() -> __move()
	
And reuse kmem_cache_move() for moving _all_ objects (including the full list).

With this set applied we have the call chains

kmem_cache_shrink()		# Defined in slab_common.c, exported to kernel.
 -> __kmem_cache_shrink()	# Defined in slub.c
   -> __defrag()		# Unconditionally (i.e 100%)
     -> __move()

kmem_cache_defrag()		# Exported to kernel
 -> __defrag()
   -> __move()

move_store()			# sysfs
 -> kmem_cache_move()
   -> __move()
 or
 -> __move_all_objects_to()
   -> kmem_cache_move()
     -> __move()


Suggested improvements?

> Also, it's usually better to avoid adding new functions without calling them.
> Maybe it's possible to merge this patch with (9)?

Understood.  The reason behind this is that I attempted to break this
set up by separating the implementation of SMO from the addition of each
feature.  This function is only called when features are implemented to
use SMO, and I did not want to elevate any feature above any other by
including it in this patch.  I'm open to suggestions on how to order
things, though; I'm also happy to order it differently if/when we do a
PATCH version.  Is that acceptable for the RFC versions?


thanks,
Tobin.
Christoph Lameter (Ampere) March 12, 2019, 4:39 a.m. UTC | #3
On Mon, 11 Mar 2019, Roman Gushchin wrote:

> > +static inline void *alloc_scratch(struct kmem_cache *s)
> > +{
> > +	unsigned int size = oo_objects(s->max);
> > +
> > +	return kmalloc(size * sizeof(void *) +
> > +		       BITS_TO_LONGS(size) * sizeof(unsigned long),
> > +		       GFP_KERNEL);
>
> I wonder how big this allocation can be?
> Given that the reason for migration is probably highly fragmented memory,
> we probably don't want to have a high-order allocation here. So maybe
> kvmalloc()?

The smallest object size is 8 bytes, which is one word, and such objects
would be placed in an order-0 page. So the scratch space comes out to
about a page again.

Larger allocation orders are possible if the slab pages themselves can
have larger orders of course. If you set the min_order to the huge page
order then we can have similarly sized orders for the allocation of the
scratch space. However, that is not a problem since the allocations for
the slab pages themselves are already of that same order.
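
To make that estimate concrete, a throwaway userspace calculation
(assuming 4 KiB pages and 64-bit pointers and longs) of the scratch size
for an order-0 slab of 8-byte objects:

#include <stdio.h>

/* Userspace stand-ins for the kernel macros used by alloc_scratch(). */
#define BITS_PER_LONG	 (8 * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

int main(void)
{
	unsigned int size = 4096 / 8;	/* 512 objects per order-0 slab */
	size_t scratch = size * sizeof(void *) +
			 BITS_TO_LONGS(size) * sizeof(unsigned long);

	/* 512 * 8 + 8 * 8 = 4160 bytes -- about one page, as noted above. */
	printf("scratch = %zu bytes\n", scratch);
	return 0;
}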
Roman Gushchin March 12, 2019, 6 p.m. UTC | #4
On Tue, Mar 12, 2019 at 12:47:12PM +1100, Tobin C. Harding wrote:
> On Mon, Mar 11, 2019 at 10:48:45PM +0000, Roman Gushchin wrote:
> > On Fri, Mar 08, 2019 at 03:14:15PM +1100, Tobin C. Harding wrote:
> > > We have now in place a mechanism for adding callbacks to a cache in
> > > order to be able to implement object migration.
> > > 
> > > Add a function __move() that implements SMO by moving all objects in a
> > > slab page using the isolate/migrate callback methods.
> > > 
> > > Co-developed-by: Christoph Lameter <cl@linux.com>
> > > Signed-off-by: Tobin C. Harding <tobin@kernel.org>
> > > ---
> > >  mm/slub.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 85 insertions(+)
> > > 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 0133168d1089..6ce866b420f1 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -4325,6 +4325,91 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
> > >  	return err;
> > >  }
> > >  
> > > +/*
> > > + * Allocate a slab scratch space that is sufficient to keep pointers to
> > > + * individual objects for all objects in cache and also a bitmap for the
> > > + * objects (used to mark which objects are active).
> > > + */
> > > +static inline void *alloc_scratch(struct kmem_cache *s)
> > > +{
> > > +	unsigned int size = oo_objects(s->max);
> > > +
> > > +	return kmalloc(size * sizeof(void *) +
> > > +		       BITS_TO_LONGS(size) * sizeof(unsigned long),
> > > +		       GFP_KERNEL);
> > 
> > I wonder how big this allocation can be?
> > Given that the reason for migration is probably highly fragmented memory,
> > we probably don't want to have a high-order allocation here. So maybe
> > kvmalloc()?
> > 
> > > +}
> > > +
> > > +/*
> > > + * __move() - Move all objects in the given slab.
> > > + * @page: The slab we are working on.
> > > + * @scratch: Pointer to scratch space.
> > > + * @node: The target node to move objects to.
> > > + *
> > > + * If the target node is not the current node then the object is moved
> > > + * to the target node.  If the target node is the current node then this
> > > + * is an effective way of defragmentation since the current slab page
> > > + * with its object is exempt from allocation.
> > > + */
> > > +static void __move(struct page *page, void *scratch, int node)
> > > +{
> > 
> > __move() isn't a very explanatory name. kmem_cache_move() (as in Christopher's
> > version) is much better, IMO. Or maybe move_slab_objects()?
> 
> How about move_slab_page()?  We use kmem_cache_move() later in the
> series.  __move() moves all objects in the given page but not all
> objects in this cache (which kmem_cache_move() later does).  Open to
> further suggestions though, naming things is hard :)
> 
> Christopher's original patch uses kmem_cache_move() for a function that
> only moves objects from within partial slabs; I changed it because I did
> not think this name suitably describes the behaviour.  So from the
> original I rename:
> 
> 	__move() -> __defrag()
> 	kmem_cache_move() -> __move()
> 	
> And reuse kmem_cache_move() for moving _all_ objects (including the full list).
> 
> With this set applied we have the call chains
> 
> kmem_cache_shrink()		# Defined in slab_common.c, exported to kernel.
>  -> __kmem_cache_shrink()	# Defined in slub.c
>    -> __defrag()		# Unconditionally (i.e 100%)
>      -> __move()
> 
> kmem_cache_defrag()		# Exported to kernel
>  -> __defrag()
>    -> __move()
> 
> move_store()			# sysfs
>  -> kmem_cache_move()
>    -> __move()
>  or
>  -> __move_all_objects_to()
>    -> kmem_cache_move()
>      -> __move()
> 
> 
> Suggested improvements?

move_slab_page() looks good to me. I don't have a strong opinion on naming here,
I just think that unique names are always preferred even for static functions
(that's why I don't like __move()). Also, it's not clear what '__' means here.
So maybe move_slab_page(), defrag_kmem_node(), etc? Or something like this.

> 
> > Also, it's usually better to avoid adding new functions without calling them.
> > Maybe it's possible to merge this patch with (9)?
> 
> Understood.  The reason behind this is that I attempted to break this
> set up by separating the implementation of SMO from the addition of each
> feature.  This function is only called when features are implemented to
> use SMO, and I did not want to elevate any feature above any other by
> including it in this patch.  I'm open to suggestions on how to order
> things, though; I'm also happy to order it differently if/when we do a
> PATCH version.  Is that acceptable for the RFC versions?

The general rule is that any patch should be self-sufficient.
If it's required to look into further patches to understand why something
has been done, it makes the review process harder.

I think it's possible to make the patchset conform to this rule with
some almost cosmetic changes, e.g. moving some changes from one patch
to another, adding some comments, etc.

Anyway, it's definitely fine for an RFC, just something to think of
in the next version.

Thanks!

Patch

diff --git a/mm/slub.c b/mm/slub.c
index 0133168d1089..6ce866b420f1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4325,6 +4325,91 @@  int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
 	return err;
 }
 
+/*
+ * Allocate a slab scratch space that is sufficient to keep pointers to
+ * individual objects for all objects in cache and also a bitmap for the
+ * objects (used to mark which objects are active).
+ */
+static inline void *alloc_scratch(struct kmem_cache *s)
+{
+	unsigned int size = oo_objects(s->max);
+
+	return kmalloc(size * sizeof(void *) +
+		       BITS_TO_LONGS(size) * sizeof(unsigned long),
+		       GFP_KERNEL);
+}
+
+/*
+ * __move() - Move all objects in the given slab.
+ * @page: The slab we are working on.
+ * @scratch: Pointer to scratch space.
+ * @node: The target node to move objects to.
+ *
+ * If the target node is not the current node then the object is moved
+ * to the target node.  If the target node is the current node then this
+ * is an effective way of defragmentation since the current slab page
+ * with its object is exempt from allocation.
+ */
+static void __move(struct page *page, void *scratch, int node)
+{
+	unsigned long objects;
+	struct kmem_cache *s;
+	unsigned long flags;
+	unsigned long *map;
+	void *private;
+	int count;
+	void *p;
+	void **vector = scratch;
+	void *addr = page_address(page);
+
+	local_irq_save(flags);
+	slab_lock(page);
+
+	BUG_ON(!PageSlab(page)); /* Must be a slab page */
+	BUG_ON(!page->frozen);	 /* Slab must have been frozen earlier */
+
+	s = page->slab_cache;
+	objects = page->objects;
+	map = scratch + objects * sizeof(void **);
+
+	/* Determine used objects */
+	bitmap_fill(map, objects);
+	for (p = page->freelist; p; p = get_freepointer(s, p))
+		__clear_bit(slab_index(p, s, addr), map);
+
+	/* Build vector of pointers to objects */
+	count = 0;
+	memset(vector, 0, objects * sizeof(void **));
+	for_each_object(p, s, addr, objects)
+		if (test_bit(slab_index(p, s, addr), map))
+			vector[count++] = p;
+
+	if (s->isolate)
+		private = s->isolate(s, vector, count);
+	else
+		/* Objects do not need to be isolated */
+		private = NULL;
+
+	/*
+	 * Pinned the objects. Now we can drop the slab lock. The slab
+	 * is frozen so it cannot vanish from under us nor will
+	 * allocations be performed on the slab. However, unlocking the
+	 * slab will allow concurrent slab_frees to proceed. So the
+	 * subsystem must have a way to tell from the content of the
+	 * object that it was freed.
+	 *
+	 * If neither RCU nor ctor is being used then the object may be
+	 * modified by the allocator after being freed which may disrupt
+	 * the ability of the migrate function to tell if the object is
+	 * free or not.
+	 */
+	slab_unlock(page);
+	local_irq_restore(flags);
+
+	/* Perform callback to move the objects */
+	s->migrate(s, vector, count, node, private);
+}
+
 void kmem_cache_setup_mobility(struct kmem_cache *s,
 			       kmem_cache_isolate_func isolate,
 			       kmem_cache_migrate_func migrate)
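
As discussed in the thread above, __move() has no caller in this patch;
the callers arrive later in the series.  Purely to illustrate the
intended shape (the function below is hypothetical and not part of the
series), a caller would pair alloc_scratch() with __move() roughly like
so, with the page already frozen as the BUG_ON()s require:

static void move_one_slab(struct kmem_cache *s, struct page *page, int node)
{
	void *scratch = alloc_scratch(s);

	if (!scratch)
		return;

	/*
	 * The page must already be frozen and off the node's partial
	 * list; how that is arranged is part of the later patches.
	 */
	__move(page, scratch, node);

	kfree(scratch);
}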