
[v4] mm/vmstat: add events for ksm cow

Message ID 20220330082640.2381401-1-yang.yang29@zte.com.cn (mailing list archive)
State New
Series [v4] mm/vmstat: add events for ksm cow

Commit Message

CGEL March 30, 2022, 8:26 a.m. UTC
From: Yang Yang <yang.yang29@zte.com.cn>

Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they want
to save memory; the trade-off is a possible delay on ksm cow. Users can
find out how much memory ksm has saved by reading
/sys/kernel/mm/ksm/pages_sharing, but they cannot tell the cost of ksm
cow, and this is important for some delay-sensitive tasks.

So add ksm cow events to help users evaluate whether or how to use ksm.
Also update Documentation/admin-guide/mm/ksm.rst with the newly added
events.

Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
v2:
- fix compile error when CONFIG_KSM is not set
v3:
- delete KSM_COW_FAIL event
v4:
- modify Documentation/admin-guide/mm/ksm.rst. Also place cow_ksm before
  ksm_swpin_copy, so that new cow_* events can be added before cow_ksm.
---
 Documentation/admin-guide/mm/ksm.rst | 18 ++++++++++++++++++
 include/linux/vm_event_item.h        |  1 +
 mm/memory.c                          |  4 ++++
 mm/vmstat.c                          |  1 +
 4 files changed, 24 insertions(+)
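[Editor's aside, not part of the patch: once this series is applied, the
counters it exposes can be read from userspace. The sketch below is a
hypothetical illustration of parsing /proc/vmstat's "name value" format;
the sample text stands in for the real file.]

```python
# Sketch: read the cost counters this patch exposes from /proc/vmstat.
# read_vmstat() parses "name value" lines into a dict of ints; the
# sample string below stands in for reading the real /proc/vmstat.

def read_vmstat(text):
    """Parse /proc/vmstat-style "name value" lines into {name: int}."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[name] = int(value)
    return stats

# Fabricated sample; on a real system: open("/proc/vmstat").read()
sample = """pgpgin 12345
cow_ksm 42
ksm_swpin_copy 7"""

stats = read_vmstat(sample)
print(stats["cow_ksm"])         # KSM copy-on-write events
print(stats["ksm_swpin_copy"])  # KSM copies made on swap-in
```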

Comments

kernel test robot March 30, 2022, 1:29 p.m. UTC | #1
Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url:    https://github.com/intel-lab-lkp/linux/commits/cgel-zte-gmail-com/mm-vmstat-add-events-for-ksm-cow/20220330-162859
base:   https://github.com/hnaz/linux-mm master
config: nios2-buildonly-randconfig-r001-20220330 (https://download.01.org/0day-ci/archive/20220330/202203302122.cFR0tu2M-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/48bd13bff24d30af750dd9429638a2563b758611
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review cgel-zte-gmail-com/mm-vmstat-add-events-for-ksm-cow/20220330-162859
        git checkout 48bd13bff24d30af750dd9429638a2563b758611
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=nios2 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/memory.c: In function 'do_wp_page':
>> mm/memory.c:3336:32: error: 'COW_KSM' undeclared (first use in this function)
    3336 |                 count_vm_event(COW_KSM);
         |                                ^~~~~~~
   mm/memory.c:3336:32: note: each undeclared identifier is reported only once for each function it appears in


vim +/COW_KSM +3336 mm/memory.c

  3229	
  3230	/*
  3231	 * This routine handles present pages, when users try to write
  3232	 * to a shared page. It is done by copying the page to a new address
  3233	 * and decrementing the shared-page counter for the old page.
  3234	 *
  3235	 * Note that this routine assumes that the protection checks have been
  3236	 * done by the caller (the low-level page fault routine in most cases).
  3237	 * Thus we can safely just mark it writable once we've done any necessary
  3238	 * COW.
  3239	 *
  3240	 * We also mark the page dirty at this point even though the page will
  3241	 * change only once the write actually happens. This avoids a few races,
  3242	 * and potentially makes it more efficient.
  3243	 *
  3244	 * We enter with non-exclusive mmap_lock (to exclude vma changes,
  3245	 * but allow concurrent faults), with pte both mapped and locked.
  3246	 * We return with mmap_lock still held, but pte unmapped and unlocked.
  3247	 */
  3248	static vm_fault_t do_wp_page(struct vm_fault *vmf)
  3249		__releases(vmf->ptl)
  3250	{
  3251		struct vm_area_struct *vma = vmf->vma;
  3252	
  3253		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
  3254			pte_unmap_unlock(vmf->pte, vmf->ptl);
  3255			return handle_userfault(vmf, VM_UFFD_WP);
  3256		}
  3257	
  3258		/*
  3259		 * Userfaultfd write-protect can defer flushes. Ensure the TLB
  3260		 * is flushed in this case before copying.
  3261		 */
  3262		if (unlikely(userfaultfd_wp(vmf->vma) &&
  3263			     mm_tlb_flush_pending(vmf->vma->vm_mm)))
  3264			flush_tlb_page(vmf->vma, vmf->address);
  3265	
  3266		vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
  3267		if (!vmf->page) {
  3268			/*
  3269			 * VM_MIXEDMAP !pfn_valid() case, or VM_SOFTDIRTY clear on a
  3270			 * VM_PFNMAP VMA.
  3271			 *
  3272			 * We should not cow pages in a shared writeable mapping.
  3273			 * Just mark the pages writable and/or call ops->pfn_mkwrite.
  3274			 */
  3275			if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
  3276					     (VM_WRITE|VM_SHARED))
  3277				return wp_pfn_shared(vmf);
  3278	
  3279			pte_unmap_unlock(vmf->pte, vmf->ptl);
  3280			return wp_page_copy(vmf);
  3281		}
  3282	
  3283		/*
  3284		 * Take out anonymous pages first, anonymous shared vmas are
  3285		 * not dirty accountable.
  3286		 */
  3287		if (PageAnon(vmf->page)) {
  3288			struct page *page = vmf->page;
  3289	
  3290			/*
  3291			 * We have to verify under page lock: these early checks are
  3292			 * just an optimization to avoid locking the page and freeing
  3293			 * the swapcache if there is little hope that we can reuse.
  3294			 *
  3295			 * PageKsm() doesn't necessarily raise the page refcount.
  3296			 */
  3297			if (PageKsm(page) || page_count(page) > 3)
  3298				goto copy;
  3299			if (!PageLRU(page))
  3300				/*
  3301				 * Note: We cannot easily detect+handle references from
  3302				 * remote LRU pagevecs or references to PageLRU() pages.
  3303				 */
  3304				lru_add_drain();
  3305			if (page_count(page) > 1 + PageSwapCache(page))
  3306				goto copy;
  3307			if (!trylock_page(page))
  3308				goto copy;
  3309			if (PageSwapCache(page))
  3310				try_to_free_swap(page);
  3311			if (PageKsm(page) || page_count(page) != 1) {
  3312				unlock_page(page);
  3313				goto copy;
  3314			}
  3315			/*
  3316			 * Ok, we've got the only page reference from our mapping
  3317			 * and the page is locked, it's dark out, and we're wearing
  3318			 * sunglasses. Hit it.
  3319			 */
  3320			unlock_page(page);
  3321			wp_page_reuse(vmf);
  3322			return VM_FAULT_WRITE;
  3323		} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
  3324						(VM_WRITE|VM_SHARED))) {
  3325			return wp_page_shared(vmf);
  3326		}
  3327	copy:
  3328		/*
  3329		 * Ok, we need to copy. Oh, well..
  3330		 */
  3331		get_page(vmf->page);
  3332	
  3333		pte_unmap_unlock(vmf->pte, vmf->ptl);
  3334	#ifdef CONFIG_KSM
  3335		if (PageKsm(vmf->page))
> 3336			count_vm_event(COW_KSM);
  3337	#endif
  3338		return wp_page_copy(vmf);
  3339	}
  3340
kernel test robot March 30, 2022, 1:40 p.m. UTC | #2
Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url:    https://github.com/intel-lab-lkp/linux/commits/cgel-zte-gmail-com/mm-vmstat-add-events-for-ksm-cow/20220330-162859
base:   https://github.com/hnaz/linux-mm master
config: hexagon-buildonly-randconfig-r006-20220330 (https://download.01.org/0day-ci/archive/20220330/202203302113.QzkSLwzk-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 0f6d9501cf49ce02937099350d08f20c4af86f3d)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/48bd13bff24d30af750dd9429638a2563b758611
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review cgel-zte-gmail-com/mm-vmstat-add-events-for-ksm-cow/20220330-162859
        git checkout 48bd13bff24d30af750dd9429638a2563b758611
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/memory.c:3336:18: error: use of undeclared identifier 'COW_KSM'
                   count_vm_event(COW_KSM);
                                  ^
   1 error generated.


vim +/COW_KSM +3336 mm/memory.c

[same do_wp_page() context as quoted in #1 above]

Patch

diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index 97d816791aca..b244f0202a03 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -184,6 +184,24 @@  The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
 ``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
 be increased accordingly.
 
+Monitoring KSM events
+=====================
+
+There are some counters in /proc/vmstat that may be used to monitor KSM events.
+KSM might help save memory, but the trade-off is a possible delay on KSM COW or
+on swapping in a copy. These events can help users evaluate whether or how to
+use KSM. For example, if cow_ksm increases too fast, users may decrease the
+range of madvise(, , MADV_MERGEABLE).
+
+cow_ksm
+	is incremented every time a KSM page triggers copy-on-write (COW):
+	when users try to write to a KSM page, we have to make a copy.
+
+ksm_swpin_copy
+	is incremented every time a KSM page is copied when swapping in.
+	Note that a KSM page might be copied when swapping in, because
+	do_swap_page() cannot do all the locking needed to reconstitute a
+	cross-anon_vma KSM page.
+
 --
 Izik Eidus,
 Hugh Dickins, 17 Nov 2009
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 16a0a4fd000b..74ec4b6a9ed0 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -130,6 +130,7 @@  enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		SWAP_RA,
 		SWAP_RA_HIT,
 #ifdef CONFIG_KSM
+		COW_KSM,
 		KSM_SWPIN_COPY,
 #endif
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index 4111f97c91a0..12925ceaf745 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3339,6 +3339,10 @@  static vm_fault_t do_wp_page(struct vm_fault *vmf)
 	get_page(vmf->page);
 
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
+#ifdef CONFIG_KSM
+	if (PageKsm(vmf->page))
+		count_vm_event(COW_KSM);
+#endif
 	return wp_page_copy(vmf);
 }
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d5cc8d739fac..250ae0652740 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1389,6 +1389,7 @@  const char * const vmstat_text[] = {
 	"swap_ra",
 	"swap_ra_hit",
 #ifdef CONFIG_KSM
+	"cow_ksm",
 	"ksm_swpin_copy",
 #endif
 #endif
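
[Editor's aside, not part of the patch: the documentation above suggests
backing off from MADV_MERGEABLE when cow_ksm climbs too fast. A hypothetical
userspace sketch of that decision, using fabricated snapshots and an
illustrative threshold:]

```python
# Sketch: estimate the KSM COW rate from two cow_ksm samples taken
# `interval` seconds apart, and decide whether to shrink the
# MADV_MERGEABLE range. Threshold and snapshots are illustrative.

def cow_rate(before, after, interval):
    """Return cow_ksm events per second between two vmstat snapshots."""
    return (after["cow_ksm"] - before["cow_ksm"]) / interval

def should_back_off(before, after, interval, threshold=100.0):
    """True if the KSM COW rate exceeds `threshold` events/sec."""
    return cow_rate(before, after, interval) > threshold

# Two fabricated snapshots, 10 seconds apart:
t0 = {"cow_ksm": 1000}
t1 = {"cow_ksm": 3500}
print(cow_rate(t0, t1, 10))         # 250.0 events/sec
print(should_back_off(t0, t1, 10))  # True -> consider narrowing the range
```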