diff mbox series

[v2,1/2] mm: swap: make page_evictable() inline

Message ID 1584397455-28701-1-git-send-email-yang.shi@linux.alibaba.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/2] mm: swap: make page_evictable() inline | expand

Commit Message

Yang Shi March 16, 2020, 10:24 p.m. UTC
When backporting commit 9c4e6b1a7027 ("mm, mlock, vmscan: no more
skipping pagevecs") to our 4.9 kernel, our test bench noticed around 10%
down with a couple of vm-scalability's test cases (lru-file-readonce,
lru-file-readtwice and lru-file-mmap-read).  I didn't see that much down
on my VM (32c-64g-2nodes).  It might be caused by the test configuration,
which is 32c-256g with NUMA disabled and the tests were run in root memcg,
so the tests actually stress only one inactive and active lru.  It
sounds not very usual in mordern production environment.

That commit did two major changes:
1. Call page_evictable()
2. Use smp_mb to force the PG_lru set visible

It looks they contribute the most overhead.  The page_evictable() is a
function which does function prologue and epilogue, and that was used by
page reclaim path only.  However, lru add is a very hot path, so it
sounds better to make it inline.  However, it calls page_mapping() which
is not inlined either, but the disassemble shows it doesn't do push and
pop operations and it sounds not very straightforward to inline it.

Other than this, it sounds smp_mb() is not necessary for x86 since
SetPageLRU is atomic which enforces memory barrier already, replace it
with smp_mb__after_atomic() in the following patch.

With the two fixes applied, the tests can get back around 5% on that
test bench and get back normal on my VM.  Since the test bench
configuration is not that usual and I also saw around 6% up on the
latest upstream, so it sounds good enough IMHO.

The below is test data (lru-file-readtwice throughput) against the v5.6-rc4:
	mainline	w/ inline fix
          150MB            154MB

With this patch the throughput gets 2.67% up.  The data with using
smp_mb__after_atomic() is showed in the following patch.

Fixes: 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs")
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v2: * Solved the comments from Willy.

 include/linux/pagemap.h |  2 +-
 include/linux/swap.h    | 24 +++++++++++++++++++++++-
 mm/vmscan.c             | 23 -----------------------
 3 files changed, 24 insertions(+), 25 deletions(-)

Comments

Shakeel Butt March 16, 2020, 11:46 p.m. UTC | #1
On Mon, Mar 16, 2020 at 3:24 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
> When backporting commit 9c4e6b1a7027 ("mm, mlock, vmscan: no more
> skipping pagevecs") to our 4.9 kernel, our test bench noticed around 10%
> down with a couple of vm-scalability's test cases (lru-file-readonce,
> lru-file-readtwice and lru-file-mmap-read).  I didn't see that much down
> on my VM (32c-64g-2nodes).  It might be caused by the test configuration,
> which is 32c-256g with NUMA disabled and the tests were run in root memcg,
> so the tests actually stress only one inactive and active lru.  It
> sounds not very usual in mordern production environment.
>
> That commit did two major changes:
> 1. Call page_evictable()
> 2. Use smp_mb to force the PG_lru set visible
>
> It looks they contribute the most overhead.  The page_evictable() is a
> function which does function prologue and epilogue, and that was used by
> page reclaim path only.  However, lru add is a very hot path, so it
> sounds better to make it inline.  However, it calls page_mapping() which
> is not inlined either, but the disassemble shows it doesn't do push and
> pop operations and it sounds not very straightforward to inline it.
>
> Other than this, it sounds smp_mb() is not necessary for x86 since
> SetPageLRU is atomic which enforces memory barrier already, replace it
> with smp_mb__after_atomic() in the following patch.
>
> With the two fixes applied, the tests can get back around 5% on that
> test bench and get back normal on my VM.  Since the test bench
> configuration is not that usual and I also saw around 6% up on the
> latest upstream, so it sounds good enough IMHO.
>
> The below is test data (lru-file-readtwice throughput) against the v5.6-rc4:
>         mainline        w/ inline fix
>           150MB            154MB
>
> With this patch the throughput gets 2.67% up.  The data with using
> smp_mb__after_atomic() is showed in the following patch.
>
> Fixes: 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs")
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>


So, I tested on a real machine with limiting the 'dd' on a single node
and reading 100 GiB sparse file (less than a single node). I just ran
a single instance to not cause the lru lock contention. The cmd I used
is "dd if=file-100GiB of=/dev/null bs=4k". I ran the cmd 10 times with
drop_caches in between and measured the time it took.

Without patch: 56.64143 +- 0.672 sec

With patches: 56.10 +- 0.21 sec

Reviewed-and-Tested-by: Shakeel Butt <shakeelb@google.com>
Yang Shi March 17, 2020, 2:05 a.m. UTC | #2
On 3/16/20 4:46 PM, Shakeel Butt wrote:
> On Mon, Mar 16, 2020 at 3:24 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>> When backporting commit 9c4e6b1a7027 ("mm, mlock, vmscan: no more
>> skipping pagevecs") to our 4.9 kernel, our test bench noticed around 10%
>> down with a couple of vm-scalability's test cases (lru-file-readonce,
>> lru-file-readtwice and lru-file-mmap-read).  I didn't see that much down
>> on my VM (32c-64g-2nodes).  It might be caused by the test configuration,
>> which is 32c-256g with NUMA disabled and the tests were run in root memcg,
>> so the tests actually stress only one inactive and active lru.  It
>> sounds not very usual in mordern production environment.
>>
>> That commit did two major changes:
>> 1. Call page_evictable()
>> 2. Use smp_mb to force the PG_lru set visible
>>
>> It looks they contribute the most overhead.  The page_evictable() is a
>> function which does function prologue and epilogue, and that was used by
>> page reclaim path only.  However, lru add is a very hot path, so it
>> sounds better to make it inline.  However, it calls page_mapping() which
>> is not inlined either, but the disassemble shows it doesn't do push and
>> pop operations and it sounds not very straightforward to inline it.
>>
>> Other than this, it sounds smp_mb() is not necessary for x86 since
>> SetPageLRU is atomic which enforces memory barrier already, replace it
>> with smp_mb__after_atomic() in the following patch.
>>
>> With the two fixes applied, the tests can get back around 5% on that
>> test bench and get back normal on my VM.  Since the test bench
>> configuration is not that usual and I also saw around 6% up on the
>> latest upstream, so it sounds good enough IMHO.
>>
>> The below is test data (lru-file-readtwice throughput) against the v5.6-rc4:
>>          mainline        w/ inline fix
>>            150MB            154MB
>>
>> With this patch the throughput gets 2.67% up.  The data with using
>> smp_mb__after_atomic() is showed in the following patch.
>>
>> Fixes: 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs")
>> Cc: Shakeel Butt <shakeelb@google.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>
> So, I tested on a real machine with limiting the 'dd' on a single node
> and reading 100 GiB sparse file (less than a single node). I just ran
> a single instance to not cause the lru lock contention. The cmd I used
> is "dd if=file-100GiB of=/dev/null bs=4k". I ran the cmd 10 times with
> drop_caches in between and measured the time it took.
>
> Without patch: 56.64143 +- 0.672 sec
>
> With patches: 56.10 +- 0.21 sec
>
> Reviewed-and-Tested-by: Shakeel Butt <shakeelb@google.com>

Thanks Shakeel. It'd better to add your test result in the commit log too.
kernel test robot March 17, 2020, 3 a.m. UTC | #3
Hi Yang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on linus/master v5.6-rc6 next-20200316]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Yang-Shi/mm-swap-make-page_evictable-inline/20200317-094836
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/suspend.h:5:0,
                    from arch/x86/kernel/asm-offsets.c:13:
   include/linux/swap.h: In function 'page_evictable':
>> include/linux/swap.h:395:9: error: implicit declaration of function 'mapping_unevictable'; did you mean 'mapping_deny_writable'? [-Werror=implicit-function-declaration]
     ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
            ^~~~~~~~~~~~~~~~~~~
            mapping_deny_writable
   cc1: some warnings being treated as errors
   make[2]: *** [scripts/Makefile.build:99: arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1139: prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:179: sub-make] Error 2
   11 real  4 user  5 sys  89.51% cpu 	make prepare

vim +395 include/linux/swap.h

   376	
   377	/**
   378	 * page_evictable - test whether a page is evictable
   379	 * @page: the page to test
   380	 *
   381	 * Test whether page is evictable--i.e., should be placed on active/inactive
   382	 * lists vs unevictable list.
   383	 *
   384	 * Reasons page might not be evictable:
   385	 * (1) page's mapping marked unevictable
   386	 * (2) page is part of an mlocked VMA
   387	 *
   388	 */
   389	static inline bool page_evictable(struct page *page)
   390	{
   391		bool ret;
   392	
   393		/* Prevent address_space of inode and swap cache from being freed */
   394		rcu_read_lock();
 > 395		ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
   396		rcu_read_unlock();
   397		return ret;
   398	}
   399	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot March 17, 2020, 3:15 a.m. UTC | #4
Hi Yang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on linus/master v5.6-rc6 next-20200316]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Yang-Shi/mm-swap-make-page_evictable-inline/20200317-094836
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: openrisc-simple_smp_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 9.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.2.0 make.cross ARCH=openrisc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/suspend.h:5,
                    from init/do_mounts.c:7:
   include/linux/swap.h: In function 'page_evictable':
   include/linux/swap.h:395:9: error: implicit declaration of function 'mapping_unevictable'; did you mean 'mapping_deny_writable'? [-Werror=implicit-function-declaration]
     395 |  ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
         |         ^~~~~~~~~~~~~~~~~~~
         |         mapping_deny_writable
   In file included from include/linux/mempolicy.h:16,
                    from include/linux/shmem_fs.h:7,
                    from init/do_mounts.c:21:
   include/linux/pagemap.h: At top level:
>> include/linux/pagemap.h:73:20: error: conflicting types for 'mapping_unevictable'
      73 | static inline bool mapping_unevictable(struct address_space *mapping)
         |                    ^~~~~~~~~~~~~~~~~~~
   In file included from include/linux/suspend.h:5,
                    from init/do_mounts.c:7:
   include/linux/swap.h:395:9: note: previous implicit declaration of 'mapping_unevictable' was here
     395 |  ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
         |         ^~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/mapping_unevictable +73 include/linux/pagemap.h

    72	
  > 73	static inline bool mapping_unevictable(struct address_space *mapping)
    74	{
    75		if (mapping)
    76			return test_bit(AS_UNEVICTABLE, &mapping->flags);
    77		return !!mapping;
    78	}
    79	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Yang Shi March 17, 2020, 3:24 a.m. UTC | #5
On 3/16/20 8:00 PM, kbuild test robot wrote:
> Hi Yang,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on mmotm/master]
> [also build test ERROR on linus/master v5.6-rc6 next-20200316]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
> url:    https://github.com/0day-ci/linux/commits/Yang-Shi/mm-swap-make-page_evictable-inline/20200317-094836
> base:   git://git.cmpxchg.org/linux-mmotm.git master
> config: i386-tinyconfig (attached as .config)
> compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
> reproduce:
>          # save the attached .config to linux build tree
>          make ARCH=i386
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>     In file included from include/linux/suspend.h:5:0,
>                      from arch/x86/kernel/asm-offsets.c:13:
>     include/linux/swap.h: In function 'page_evictable':
>>> include/linux/swap.h:395:9: error: implicit declaration of function 'mapping_unevictable'; did you mean 'mapping_deny_writable'? [-Werror=implicit-function-declaration]

It looks pagemap.h need to be included. The below patch should be able 
to fix the build error:

diff --git a/include/linux/swap.h b/include/linux/swap.h
index d296c70..e8b8bbe 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -12,6 +12,7 @@
  #include <linux/fs.h>
  #include <linux/atomic.h>
  #include <linux/page-flags.h>
+#include <linux/pagemap.h>
  #include <asm/page.h>

  struct notifier_block;


>       ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
>              ^~~~~~~~~~~~~~~~~~~
>              mapping_deny_writable
>     cc1: some warnings being treated as errors
>     make[2]: *** [scripts/Makefile.build:99: arch/x86/kernel/asm-offsets.s] Error 1
>     make[2]: Target '__build' not remade because of errors.
>     make[1]: *** [Makefile:1139: prepare0] Error 2
>     make[1]: Target 'prepare' not remade because of errors.
>     make: *** [Makefile:179: sub-make] Error 2
>     11 real  4 user  5 sys  89.51% cpu 	make prepare
>
> vim +395 include/linux/swap.h
>
>     376	
>     377	/**
>     378	 * page_evictable - test whether a page is evictable
>     379	 * @page: the page to test
>     380	 *
>     381	 * Test whether page is evictable--i.e., should be placed on active/inactive
>     382	 * lists vs unevictable list.
>     383	 *
>     384	 * Reasons page might not be evictable:
>     385	 * (1) page's mapping marked unevictable
>     386	 * (2) page is part of an mlocked VMA
>     387	 *
>     388	 */
>     389	static inline bool page_evictable(struct page *page)
>     390	{
>     391		bool ret;
>     392	
>     393		/* Prevent address_space of inode and swap cache from being freed */
>     394		rcu_read_lock();
>   > 395		ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
>     396		rcu_read_unlock();
>     397		return ret;
>     398	}
>     399	
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Vlastimil Babka March 17, 2020, 8:59 a.m. UTC | #6
On 3/16/20 11:24 PM, Yang Shi wrote:
> When backporting commit 9c4e6b1a7027 ("mm, mlock, vmscan: no more
> skipping pagevecs") to our 4.9 kernel, our test bench noticed around 10%
> down with a couple of vm-scalability's test cases (lru-file-readonce,
> lru-file-readtwice and lru-file-mmap-read).  I didn't see that much down
> on my VM (32c-64g-2nodes).  It might be caused by the test configuration,
> which is 32c-256g with NUMA disabled and the tests were run in root memcg,
> so the tests actually stress only one inactive and active lru.  It
> sounds not very usual in mordern production environment.
> 
> That commit did two major changes:
> 1. Call page_evictable()
> 2. Use smp_mb to force the PG_lru set visible
> 
> It looks they contribute the most overhead.  The page_evictable() is a
> function which does function prologue and epilogue, and that was used by
> page reclaim path only.  However, lru add is a very hot path, so it
> sounds better to make it inline.  However, it calls page_mapping() which
> is not inlined either, but the disassemble shows it doesn't do push and
> pop operations and it sounds not very straightforward to inline it.
> 
> Other than this, it sounds smp_mb() is not necessary for x86 since
> SetPageLRU is atomic which enforces memory barrier already, replace it
> with smp_mb__after_atomic() in the following patch.
> 
> With the two fixes applied, the tests can get back around 5% on that
> test bench and get back normal on my VM.  Since the test bench
> configuration is not that usual and I also saw around 6% up on the
> latest upstream, so it sounds good enough IMHO.
> 
> The below is test data (lru-file-readtwice throughput) against the v5.6-rc4:
> 	mainline	w/ inline fix
>           150MB            154MB
> 
> With this patch the throughput gets 2.67% up.  The data with using
> smp_mb__after_atomic() is showed in the following patch.
> 
> Fixes: 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs")
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks.
diff mbox series

Patch

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ccb14b6..654ce76 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -70,7 +70,7 @@  static inline void mapping_clear_unevictable(struct address_space *mapping)
 	clear_bit(AS_UNEVICTABLE, &mapping->flags);
 }
 
-static inline int mapping_unevictable(struct address_space *mapping)
+static inline bool mapping_unevictable(struct address_space *mapping)
 {
 	if (mapping)
 		return test_bit(AS_UNEVICTABLE, &mapping->flags);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1e99f7a..d296c70 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -374,7 +374,29 @@  extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
 #define node_reclaim_mode 0
 #endif
 
-extern int page_evictable(struct page *page);
+/**
+ * page_evictable - test whether a page is evictable
+ * @page: the page to test
+ *
+ * Test whether page is evictable--i.e., should be placed on active/inactive
+ * lists vs unevictable list.
+ *
+ * Reasons page might not be evictable:
+ * (1) page's mapping marked unevictable
+ * (2) page is part of an mlocked VMA
+ *
+ */
+static inline bool page_evictable(struct page *page)
+{
+	bool ret;
+
+	/* Prevent address_space of inode and swap cache from being freed */
+	rcu_read_lock();
+	ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
+	rcu_read_unlock();
+	return ret;
+}
+
 extern void check_move_unevictable_pages(struct pagevec *pvec);
 
 extern int kswapd_run(int nid);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8763705..855c395 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4277,29 +4277,6 @@  int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
 }
 #endif
 
-/*
- * page_evictable - test whether a page is evictable
- * @page: the page to test
- *
- * Test whether page is evictable--i.e., should be placed on active/inactive
- * lists vs unevictable list.
- *
- * Reasons page might not be evictable:
- * (1) page's mapping marked unevictable
- * (2) page is part of an mlocked VMA
- *
- */
-int page_evictable(struct page *page)
-{
-	int ret;
-
-	/* Prevent address_space of inode and swap cache from being freed */
-	rcu_read_lock();
-	ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
-	rcu_read_unlock();
-	return ret;
-}
-
 /**
  * check_move_unevictable_pages - check pages for evictability and move to
  * appropriate zone lru list