Message ID | 1584397455-28701-1-git-send-email-yang.shi@linux.alibaba.com (mailing list archive)
State      | New, archived
Series     | [v2,1/2] mm: swap: make page_evictable() inline
On Mon, Mar 16, 2020 at 3:24 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
> When backporting commit 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs") to our 4.9 kernel, our test bench noticed around a 10% drop with a couple of vm-scalability's test cases (lru-file-readonce, lru-file-readtwice and lru-file-mmap-read). I didn't see that much of a drop on my VM (32c-64g-2nodes). It might be caused by the test configuration, which is 32c-256g with NUMA disabled and the tests run in the root memcg, so the tests actually stress only one inactive and one active lru. That is not very common in a modern production environment.
>
> That commit made two major changes:
> 1. Call page_evictable()
> 2. Use smp_mb() to make the PG_lru set visible
>
> It looks like these contribute most of the overhead. page_evictable() is an out-of-line function with a prologue and epilogue, and it used to be called from the page reclaim path only. However, lru add is a very hot path, so it is better to make it inline. It also calls page_mapping(), which is not inlined either, but the disassembly shows that function does no push and pop operations, and inlining it is not as straightforward.
>
> Other than this, smp_mb() is not necessary on x86 since SetPageLRU is an atomic operation that already implies a full memory barrier there; the following patch replaces it with smp_mb__after_atomic().
>
> With the two fixes applied, the tests get back around 5% on that test bench and return to normal on my VM. Since the test bench configuration is unusual and I also saw around a 6% improvement on the latest upstream, that sounds good enough IMHO.
>
> Below is the test data (lru-file-readtwice throughput) against v5.6-rc4:
>
>     mainline    w/ inline fix
>     150MB       154MB
>
> With this patch the throughput goes up 2.67%. The data with smp_mb__after_atomic() is shown in the following patch.
> Fixes: 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs")
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>

So, I tested on a real machine, limiting the 'dd' to a single node and reading a 100 GiB sparse file (less than a single node's memory). I ran just a single instance so as not to cause lru lock contention. The command I used is "dd if=file-100GiB of=/dev/null bs=4k". I ran it 10 times with drop_caches in between and measured the time each run took.

Without patch: 56.64143 +- 0.672 sec

With patches: 56.10 +- 0.21 sec

Reviewed-and-Tested-by: Shakeel Butt <shakeelb@google.com>
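[Editor's note] Shakeel's measurement loop can be sketched as a script. This is an illustration, not from the thread: the file name, run count, and timing method are assumptions, and a small stand-in file is created when the real 100 GiB sparse file is absent. Dropping caches needs root and is skipped silently otherwise.

```shell
#!/bin/sh
# Repeatedly time a full sequential read of a file, flushing the page
# cache between runs, as in the dd benchmark described in the thread.
FILE=${FILE:-/tmp/readbench-demo.img}
RUNS=${RUNS:-10}

# Create a small stand-in file if the target does not already exist.
[ -e "$FILE" ] || dd if=/dev/zero of="$FILE" bs=4k count=256 2>/dev/null

i=1
while [ "$i" -le "$RUNS" ]; do
    sync
    # Flush the page cache so every pass re-reads from storage (root only).
    [ -w /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches
    t0=$(date +%s%N)
    dd if="$FILE" of=/dev/null bs=4k 2>/dev/null
    t1=$(date +%s%N)
    # Elapsed wall-clock time for this pass, in milliseconds.
    echo "run $i: $(( (t1 - t0) / 1000000 )) ms"
    i=$((i + 1))
done
```

On a real reproduction one would point FILE at a node-sized sparse file and pin dd to a single NUMA node (e.g. with numactl), as Shakeel describes.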
On 3/16/20 4:46 PM, Shakeel Butt wrote:
> On Mon, Mar 16, 2020 at 3:24 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>> [...]
>
> So, I tested on a real machine, limiting the 'dd' to a single node and reading a 100 GiB sparse file (less than a single node's memory). I ran just a single instance so as not to cause lru lock contention. The command I used is "dd if=file-100GiB of=/dev/null bs=4k". I ran it 10 times with drop_caches in between and measured the time each run took.
>
> Without patch: 56.64143 +- 0.672 sec
>
> With patches: 56.10 +- 0.21 sec
>
> Reviewed-and-Tested-by: Shakeel Butt <shakeelb@google.com>

Thanks Shakeel. It'd be better to add your test result to the commit log too.
Hi Yang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on linus/master v5.6-rc6 next-20200316]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Yang-Shi/mm-swap-make-page_evictable-inline/20200317-094836
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/suspend.h:5:0,
                    from arch/x86/kernel/asm-offsets.c:13:
   include/linux/swap.h: In function 'page_evictable':
>> include/linux/swap.h:395:9: error: implicit declaration of function 'mapping_unevictable'; did you mean 'mapping_deny_writable'? [-Werror=implicit-function-declaration]
     ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
           ^~~~~~~~~~~~~~~~~~~
           mapping_deny_writable
   cc1: some warnings being treated as errors
   make[2]: *** [scripts/Makefile.build:99: arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1139: prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:179: sub-make] Error 2
   11 real  4 user  5 sys  89.51% cpu  make prepare

vim +395 include/linux/swap.h

   376	
   377	/**
   378	 * page_evictable - test whether a page is evictable
   379	 * @page: the page to test
   380	 *
   381	 * Test whether page is evictable--i.e., should be placed on active/inactive
   382	 * lists vs unevictable list.
   383	 *
   384	 * Reasons page might not be evictable:
   385	 * (1) page's mapping marked unevictable
   386	 * (2) page is part of an mlocked VMA
   387	 *
   388	 */
   389	static inline bool page_evictable(struct page *page)
   390	{
   391		bool ret;
   392	
   393		/* Prevent address_space of inode and swap cache from being freed */
   394		rcu_read_lock();
 > 395		ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
   396		rcu_read_unlock();
   397		return ret;
   398	}
   399	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Yang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on linus/master v5.6-rc6 next-20200316]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Yang-Shi/mm-swap-make-page_evictable-inline/20200317-094836
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: openrisc-simple_smp_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 9.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.2.0 make.cross ARCH=openrisc

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/suspend.h:5,
                    from init/do_mounts.c:7:
   include/linux/swap.h: In function 'page_evictable':
   include/linux/swap.h:395:9: error: implicit declaration of function 'mapping_unevictable'; did you mean 'mapping_deny_writable'? [-Werror=implicit-function-declaration]
     395 |  ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
         |        ^~~~~~~~~~~~~~~~~~~
         |        mapping_deny_writable
   In file included from include/linux/mempolicy.h:16,
                    from include/linux/shmem_fs.h:7,
                    from init/do_mounts.c:21:
   include/linux/pagemap.h: At top level:
>> include/linux/pagemap.h:73:20: error: conflicting types for 'mapping_unevictable'
      73 | static inline bool mapping_unevictable(struct address_space *mapping)
         |                    ^~~~~~~~~~~~~~~~~~~
   In file included from include/linux/suspend.h:5,
                    from init/do_mounts.c:7:
   include/linux/swap.h:395:9: note: previous implicit declaration of 'mapping_unevictable' was here
     395 |  ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
         |        ^~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/mapping_unevictable +73 include/linux/pagemap.h

    72	
  > 73	static inline bool mapping_unevictable(struct address_space *mapping)
    74	{
    75		if (mapping)
    76			return test_bit(AS_UNEVICTABLE, &mapping->flags);
    77		return !!mapping;
    78	}
    79	
On 3/16/20 8:00 PM, kbuild test robot wrote:
> Hi Yang,
>
> Thank you for the patch! Yet something to improve:
>
> [...]
>
>>> include/linux/swap.h:395:9: error: implicit declaration of function 'mapping_unevictable'; did you mean 'mapping_deny_writable'? [-Werror=implicit-function-declaration]

It looks like pagemap.h needs to be included. The below patch should fix the build error:

diff --git a/include/linux/swap.h b/include/linux/swap.h
index d296c70..e8b8bbe 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -12,6 +12,7 @@
 #include <linux/fs.h>
 #include <linux/atomic.h>
 #include <linux/page-flags.h>
+#include <linux/pagemap.h>
 #include <asm/page.h>
 
 struct notifier_block;
On 3/16/20 11:24 PM, Yang Shi wrote:
> When backporting commit 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs") to our 4.9 kernel, our test bench noticed around a 10% drop with a couple of vm-scalability's test cases (lru-file-readonce, lru-file-readtwice and lru-file-mmap-read).
>
> [...]
>
> Fixes: 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs")
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks.
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ccb14b6..654ce76 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -70,7 +70,7 @@ static inline void mapping_clear_unevictable(struct address_space *mapping)
 	clear_bit(AS_UNEVICTABLE, &mapping->flags);
 }
 
-static inline int mapping_unevictable(struct address_space *mapping)
+static inline bool mapping_unevictable(struct address_space *mapping)
 {
 	if (mapping)
 		return test_bit(AS_UNEVICTABLE, &mapping->flags);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1e99f7a..d296c70 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -374,7 +374,29 @@ extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
 #define node_reclaim_mode 0
 #endif
 
-extern int page_evictable(struct page *page);
+/**
+ * page_evictable - test whether a page is evictable
+ * @page: the page to test
+ *
+ * Test whether page is evictable--i.e., should be placed on active/inactive
+ * lists vs unevictable list.
+ *
+ * Reasons page might not be evictable:
+ * (1) page's mapping marked unevictable
+ * (2) page is part of an mlocked VMA
+ *
+ */
+static inline bool page_evictable(struct page *page)
+{
+	bool ret;
+
+	/* Prevent address_space of inode and swap cache from being freed */
+	rcu_read_lock();
+	ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
+	rcu_read_unlock();
+	return ret;
+}
+
 extern void check_move_unevictable_pages(struct pagevec *pvec);
 
 extern int kswapd_run(int nid);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8763705..855c395 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4277,29 +4277,6 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
 }
 #endif
 
-/*
- * page_evictable - test whether a page is evictable
- * @page: the page to test
- *
- * Test whether page is evictable--i.e., should be placed on active/inactive
- * lists vs unevictable list.
- *
- * Reasons page might not be evictable:
- * (1) page's mapping marked unevictable
- * (2) page is part of an mlocked VMA
- *
- */
-int page_evictable(struct page *page)
-{
-	int ret;
-
-	/* Prevent address_space of inode and swap cache from being freed */
-	rcu_read_lock();
-	ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
-	rcu_read_unlock();
-	return ret;
-}
-
 /**
  * check_move_unevictable_pages - check pages for evictability and move to
  * appropriate zone lru list
When backporting commit 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs") to our 4.9 kernel, our test bench noticed around a 10% drop with a couple of vm-scalability's test cases (lru-file-readonce, lru-file-readtwice and lru-file-mmap-read). I didn't see that much of a drop on my VM (32c-64g-2nodes). It might be caused by the test configuration, which is 32c-256g with NUMA disabled and the tests run in the root memcg, so the tests actually stress only one inactive and one active lru. That is not very common in a modern production environment.

That commit made two major changes:
1. Call page_evictable()
2. Use smp_mb() to make the PG_lru set visible

It looks like these contribute most of the overhead. page_evictable() is an out-of-line function with a prologue and epilogue, and it used to be called from the page reclaim path only. However, lru add is a very hot path, so it is better to make it inline. It also calls page_mapping(), which is not inlined either, but the disassembly shows that function does no push and pop operations, and inlining it is not as straightforward.

Other than this, smp_mb() is not necessary on x86 since SetPageLRU is an atomic operation that already implies a full memory barrier there; the following patch replaces it with smp_mb__after_atomic().

With the two fixes applied, the tests get back around 5% on that test bench and return to normal on my VM. Since the test bench configuration is unusual and I also saw around a 6% improvement on the latest upstream, that sounds good enough IMHO.

Below is the test data (lru-file-readtwice throughput) against v5.6-rc4:

    mainline    w/ inline fix
    150MB       154MB

With this patch the throughput goes up 2.67%. The data with smp_mb__after_atomic() is shown in the following patch.
Fixes: 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs")
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v2: * Solved the comments from Willy.

 include/linux/pagemap.h |  2 +-
 include/linux/swap.h    | 24 +++++++++++++++++++++++-
 mm/vmscan.c             | 23 -----------------------
 3 files changed, 24 insertions(+), 25 deletions(-)