[RFC] Heuristic for inode/dentry fragmentation prevention

Message ID alpine.DEB.2.20.1801031332230.10522@nuc-kabylake (mailing list archive)
State New, archived
Commit Message

Christoph Lameter (Ampere) Jan. 3, 2018, 7:39 p.m. UTC
I was looking at the inode/dentry reclaim code today and I thought there
is an obvious and easy to implement way to avoid fragmentation by checking
the number of objects in a slab page.


Subject: Heuristic for fragmentation prevention for inode and dentry caches

When freeing dentries and inodes we often get to the situation
that a slab page cannot be freed because there is only a single
object left in that slab page.

We add a new function to the slab allocators that returns the
number of objects in the same slab page.

Then the dentry and inode logic can check if such a situation
exists and take measures to try to reclaim that entry sooner.

In this patch the check if an inode or dentry has been referenced
(and thus should be kept) is skipped if the freeing of the object
would result in the slab page becoming available.

That will cause overhead in terms of having to re-allocate and
regenerate the inode or dentry but in all likelihood the inode
or dentry will then be allocated in a slab page that already
contains other inodes or dentries. Thus fragmentation is reduced.

Signed-off-by: Christopher Lameter <cl@linux.com>
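
The decision described above can be sketched in userspace as follows. This is
only a model of the heuristic, not the kernel code: model_object,
model_lru_isolate() and the lru_action enum are hypothetical names, and
objects_in_page stands in for whatever kobjects_left_in_slab_page() would
return for the object.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum lru_action { LRU_ACTION_ROTATE, LRU_ACTION_RECLAIM };

struct model_object {
	bool referenced;          /* models DCACHE_REFERENCED / I_REFERENCED */
	unsigned objects_in_page; /* models kobjects_left_in_slab_page() */
};

/*
 * Decide what the LRU walker does with an unused object.
 * A referenced object normally gets one more pass through the LRU,
 * but not when it is the last object keeping its slab page allocated.
 */
static enum lru_action model_lru_isolate(struct model_object *obj)
{
	assert(obj);

	if (obj->referenced && obj->objects_in_page > 1) {
		obj->referenced = false; /* flag is cleared before rotating */
		return LRU_ACTION_ROTATE;
	}

	/* Unreferenced, or the only object left in its slab page. */
	return LRU_ACTION_RECLAIM;
}
```

In this model a referenced object that shares its page with others is rotated
once and reclaimed on the next pass, while a referenced object that is alone
in its page is reclaimed immediately.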

Comments

Matthew Wilcox (Oracle) Jan. 3, 2018, 8:33 p.m. UTC | #1
On Wed, Jan 03, 2018 at 01:39:27PM -0600, Christopher Lameter wrote:
> +/* How many objects left in slab page */
> +unsigned kobjects_left_in_slab_page(const void *object)
> +{
> +	struct page *page;
> +
> +	if (unlikely(ZERO_OR_NULL_PTR(object)))
> +		return 0;
> +
> +	page = virt_to_head_page(object);
> +
> +	if (unlikely(!PageSlab(page))) {
> +		WARN_ON(1);
> +		return 1;
> +	}

I see this construct all over the kernel.  Here's a better one:

	if (WARN_ON(!PageSlab(page)))
		return 1;

There's a built-in unlikely() in the definition of WARN_ON, so this
works nicely.
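
For readers unfamiliar with the idiom, a userspace sketch of why this works:
the warning macro evaluates to the truth value of its condition, so it can sit
directly inside the if. MODEL_WARN_ON, model_unlikely, model_warn_count and
model_objects_left() are hypothetical names for illustration (the kernel's
real WARN_ON is defined differently), and __builtin_expect assumes GCC or
Clang.

```c
#include <assert.h>
#include <stdio.h>

/* Branch hint; assumes a compiler that provides __builtin_expect. */
#define model_unlikely(x) __builtin_expect(!!(x), 0)

static unsigned model_warn_count; /* number of warnings emitted so far */

/* Print a warning if cond is true, then hand cond back to the caller. */
static inline int model_warn_on_slowpath(int cond, const char *expr,
					 const char *file, int line)
{
	assert(expr && file);

	if (model_unlikely(cond)) {
		model_warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

#define MODEL_WARN_ON(cond) \
	model_warn_on_slowpath(!!(cond), #cond, __FILE__, __LINE__)

/* Models the PageSlab() check: warn and return 1 for a non-slab page. */
static unsigned model_objects_left(int is_slab_page, unsigned inuse)
{
	if (MODEL_WARN_ON(!is_slab_page))
		return 1;

	return inuse;
}
```

The test and the warning collapse into one statement, and the unlikely hint
lives in a single place instead of being repeated at every call site.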

Matthew Wilcox (Oracle) Jan. 3, 2018, 9:06 p.m. UTC | #2
On Wed, Jan 03, 2018 at 01:39:27PM -0600, Christopher Lameter wrote:
> +++ linux/fs/dcache.c
> @@ -1074,7 +1074,8 @@ static enum lru_status dentry_lru_isolat
>  		return LRU_REMOVED;
>  	}
> 
> -	if (dentry->d_flags & DCACHE_REFERENCED) {
> +	if (dentry->d_flags & DCACHE_REFERENCED &&
> +	   kobjects_left_in_slab_page(dentry) > 1) {
>  		dentry->d_flags &= ~DCACHE_REFERENCED;
>  		spin_unlock(&dentry->d_lock);
> 

Maybe also update this comment:

        /*
         * Referenced dentries are still in use. If they have active
         * counts, just remove them from the LRU. Otherwise give them
-        * another pass through the LRU.
+	 * another pass through the LRU unless they are the only
+	 * object on their slab page.
         */

Dave Chinner Jan. 4, 2018, 12:08 a.m. UTC | #3
On Wed, Jan 03, 2018 at 01:39:27PM -0600, Christopher Lameter wrote:
> I was looking at the inode/dentry reclaim code today and I thought there
> is an obvious and easy to implement way to avoid fragmentation by checking
> the number of objects in a slab page.
> 
> 
> Subject: Heuristic for fragmentation prevention for inode and dentry caches
> 
> When freeing dentries and inodes we often get to the situation
> that a slab page cannot be freed because there is only a single
> object left in that slab page.
> 
> We add a new function to the slab allocators that returns the
> number of objects in the same slab page.
> 
> Then the dentry and inode logic can check if such a situation
> exists and take measures to try to reclaim that entry sooner.
> 
> In this patch the check if an inode or dentry has been referenced
> (and thus should be kept) is skipped if the freeing of the object
> would result in the slab page becoming available.
> 
> That will cause overhead in terms of having to re-allocate and
> regenerate the inode or dentry but in all likelihood the inode
> or dentry will then be allocated in a slab page that already
> contains other inodes or dentries. Thus fragmentation is reduced.

Please quantify the difference this makes to inode/dentry cache
fragmentation, as well as the overhead of the
kobjects_left_in_slab_page() check on every referenced inode and
dentry we scan.

Basically, if we can't reliably produce and quantify inode/dentry
cache fragmentation on demand, then we've got no way to evaluate the
effect such heuristics will have on cache footprint. I'm happy to run
tests to help develop heuristics, but I don't have time to create
tests to reproduce cache fragmentation issues myself.

IOWs, before we start down this path, we need to create workloads
that reproduce inode/dentry cache fragmentation issues....

Cheers,

Dave.
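
One possible starting point for the quantification asked for above, offered
only as a sketch: compare the active_objs and num_objs columns that
/proc/slabinfo reports for the dentry and inode caches before and after a
workload. The helper names (slab_usage, parse_slabinfo_line(),
slab_utilisation_pct()) are hypothetical, the parser reads only the first
three columns of a data line, and the counters are only as precise as the
allocator makes them.

```c
#include <assert.h>
#include <stdio.h>

/* First three columns of a /proc/slabinfo data line. */
struct slab_usage {
	char name[64];
	unsigned long active_objs; /* objects currently in use */
	unsigned long num_objs;    /* object slots in allocated slabs */
};

/* Parse one data line. Returns 0 on success, -1 for headers or garbage. */
static int parse_slabinfo_line(const char *line, struct slab_usage *out)
{
	assert(line && out);

	if (sscanf(line, "%63s %lu %lu", out->name,
		   &out->active_objs, &out->num_objs) != 3)
		return -1;

	return 0;
}

/* Percentage of allocated object slots that hold a live object. */
static unsigned long slab_utilisation_pct(const struct slab_usage *u)
{
	if (u->num_objs == 0)
		return 100; /* nothing allocated, so nothing is wasted */

	return u->active_objs * 100 / u->num_objs;
}
```

A low percentage after a workload that creates and then drops many dentries
or inodes would suggest sparsely populated slab pages. It says nothing about
how many pages hold exactly one object, which is what the proposed heuristic
targets, so a per-page view would still be needed to evaluate it properly.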
Patch

Index: linux/include/linux/slab.h
===================================================================
--- linux.orig/include/linux/slab.h
+++ linux/include/linux/slab.h
@@ -165,6 +165,7 @@  void * __must_check krealloc(const void
 void kfree(const void *);
 void kzfree(const void *);
 size_t ksize(const void *);
+unsigned kobjects_left_in_slab_page(const void *);

 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
 const char *__check_heap_object(const void *ptr, unsigned long n,
Index: linux/mm/slab.c
===================================================================
--- linux.orig/mm/slab.c
+++ linux/mm/slab.c
@@ -4446,3 +4446,24 @@  size_t ksize(const void *objp)
 	return size;
 }
 EXPORT_SYMBOL(ksize);
+
+/* How many objects left in slab page */
+unsigned kobjects_left_in_slab_page(const void *object)
+{
+	struct page *page;
+
+	if (unlikely(ZERO_OR_NULL_PTR(object)))
+		return 0;
+
+	page = virt_to_head_page(object);
+
+	if (unlikely(!PageSlab(page))) {
+		WARN_ON(1);
+		return 1;
+	}
+
+	return page->active;
+}
+EXPORT_SYMBOL(kobjects_left_in_slab_page);
+
+
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -3879,6 +3879,25 @@  size_t ksize(const void *object)
 }
 EXPORT_SYMBOL(ksize);

+/* How many objects left in slab page */
+unsigned kobjects_left_in_slab_page(const void *object)
+{
+	struct page *page;
+
+	if (unlikely(ZERO_OR_NULL_PTR(object)))
+		return 0;
+
+	page = virt_to_head_page(object);
+
+	if (unlikely(!PageSlab(page))) {
+		WARN_ON(!PageCompound(page));
+		return 1;
+	}
+
+	return page->inuse;
+}
+EXPORT_SYMBOL(kobjects_left_in_slab_page);
+
 void kfree(const void *x)
 {
 	struct page *page;
Index: linux/fs/dcache.c
===================================================================
--- linux.orig/fs/dcache.c
+++ linux/fs/dcache.c
@@ -1074,7 +1074,8 @@  static enum lru_status dentry_lru_isolat
 		return LRU_REMOVED;
 	}

-	if (dentry->d_flags & DCACHE_REFERENCED) {
+	if (dentry->d_flags & DCACHE_REFERENCED &&
+	   kobjects_left_in_slab_page(dentry) > 1) {
 		dentry->d_flags &= ~DCACHE_REFERENCED;
 		spin_unlock(&dentry->d_lock);

Index: linux/fs/inode.c
===================================================================
--- linux.orig/fs/inode.c
+++ linux/fs/inode.c
@@ -725,8 +725,12 @@  static enum lru_status inode_lru_isolate
 		return LRU_REMOVED;
 	}

-	/* recently referenced inodes get one more pass */
-	if (inode->i_state & I_REFERENCED) {
+	/*
+	 * Recently referenced inodes get one more pass
+	 * if they are not the only objects in a slab page
+	 */
+	if (inode->i_state & I_REFERENCED &&
+	    kobjects_left_in_slab_page(inode) > 1) {
 		inode->i_state &= ~I_REFERENCED;
 		spin_unlock(&inode->i_lock);
 		return LRU_ROTATE;