[-next,2/3] mm: speed up mremap by 20x on large regions (v4)

Message ID 20181103040041.7085-3-joelaf@google.com (mailing list archive)
State New, archived
Series Add support for fast mremap

Commit Message

Joel Fernandes Nov. 3, 2018, 4 a.m. UTC
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Android needs to mremap large regions of memory during memory-management
operations. The mremap system call can be really slow if THP is not
enabled. The bottleneck is move_page_tables, which copies one pte at a
time and is therefore slow across a large mapping. Turning on THP may not
be a viable option, and is not for us. This patch speeds up mremap on
non-THP systems by copying at the PMD level when possible.
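
To make the saving concrete, the following userspace sketch counts how many
entries each approach has to touch for a given length. The shifts assume
x86-64 with 4 KiB pages and 2 MiB PMDs; the EX_* names and entries_for() are
illustrative and are not part of this patch.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed geometry (x86-64, 4 KiB base pages); not taken from the patch. */
#define EX_PAGE_SHIFT 12 /* one pte maps 4 KiB */
#define EX_PMD_SHIFT  21 /* one pmd entry maps 2 MiB */

/* Number of entries of size (1 << shift) needed to cover len bytes. */
unsigned long entries_for(unsigned long len, unsigned int shift)
{
	assert(shift < 8 * sizeof(unsigned long));
	return (len + (1UL << shift) - 1) >> shift;
}

/* Print how many entries a pte-level and a pmd-level move would touch. */
void print_entry_counts(unsigned long len)
{
	printf("ptes: %lu, pmds: %lu\n",
	       entries_for(len, EX_PAGE_SHIFT),
	       entries_for(len, EX_PMD_SHIFT));
}
```

Under these assumptions a 1GB region needs 262144 pte copies but only 512
pmd copies, provided both addresses are PMD-aligned.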

The speedup is an order of magnitude on x86 (~20x). On a 1GB mremap,
the mremap completion time drops from 3.4-3.6 milliseconds to 144-160
microseconds.
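
A measurement of this kind could be sketched in userspace as follows. This
is not the test program used for the numbers in this message; it is a
minimal example that assumes a 2 MiB PMD size (EX_ALIGN), aligns source and
destination to that size so whole PMD entries are eligible for the fast
path, and uses hypothetical helper names (ex_align_up, ex_time_mremap_ns).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes mremap() and MREMAP_* in <sys/mman.h> */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define EX_ALIGN (2UL << 20) /* assumed PMD size: 2 MiB */

/* Round p up to the next EX_ALIGN boundary. */
static unsigned char *ex_align_up(void *p)
{
	uintptr_t v = ((uintptr_t)p + EX_ALIGN - 1) & ~((uintptr_t)EX_ALIGN - 1);

	return (unsigned char *)v;
}

/*
 * Move len bytes of populated anonymous memory with mremap() and return
 * the elapsed time in nanoseconds, or -1 on failure.
 */
long long ex_time_mremap_ns(size_t len)
{
	struct timespec t0, t1;
	size_t span = len + EX_ALIGN; /* slack so we can align inside it */
	void *src_base, *dst_base, *moved;
	unsigned char *src, *dst;
	long long ns = -1;

	if (len == 0)
		return -1;
	src_base = mmap(NULL, span, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src_base == MAP_FAILED)
		return -1;
	/* Reserve the destination; MREMAP_FIXED replaces what is there. */
	dst_base = mmap(NULL, span, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst_base == MAP_FAILED) {
		munmap(src_base, span);
		return -1;
	}
	src = ex_align_up(src_base);
	dst = ex_align_up(dst_base);
	assert((uintptr_t)src % EX_ALIGN == 0 && (uintptr_t)dst % EX_ALIGN == 0);

	/* Fault in every page so there are page tables to move. */
	memset(src, 0xab, len);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	moved = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Only report a time if the data really arrived at the destination. */
	if (moved != MAP_FAILED && dst[0] == 0xab && dst[len - 1] == 0xab)
		ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL +
		     (t1.tv_nsec - t0.tv_nsec);

	munmap(src_base, span);
	munmap(dst_base, span);
	return ns;
}

/* Print one result line in the same style as the figures in this message. */
void ex_report_mremap(size_t len)
{
	long long ns = ex_time_mremap_ns(len);

	if (ns < 0)
		printf("mremap benchmark failed\n");
	else
		printf("Total mremap time for %zu bytes: %lld nanoseconds.\n",
		       len, ns);
}
```

Calling ex_report_mremap((size_t)1 << 30) would exercise a 1GB move, given
enough free memory; absolute times depend on the machine and kernel.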

Before:
Total mremap time for 1GB data: 3521942 nanoseconds.
Total mremap time for 1GB data: 3449229 nanoseconds.
Total mremap time for 1GB data: 3488230 nanoseconds.

After:
Total mremap time for 1GB data: 150279 nanoseconds.
Total mremap time for 1GB data: 144665 nanoseconds.
Total mremap time for 1GB data: 158708 nanoseconds.
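
As a cross-check on the "~20x" figure, the three runs on each side can be
averaged and divided. The sketch below only restates the numbers listed
above; mean_ns() and speedup() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Figures copied from the Before/After runs above, in nanoseconds. */
const long before_ns[] = { 3521942, 3449229, 3488230 };
const long after_ns[]  = { 150279, 144665, 158708 };

/* Arithmetic mean of n samples; n must be non-zero. */
double mean_ns(const long *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += (double)v[i];
	return sum / (double)n;
}

/* Ratio of the mean "before" time to the mean "after" time. */
double speedup(void)
{
	return mean_ns(before_ns, 3) / mean_ns(after_ns, 3);
}
```

The means come out at about 3.49 ms and 0.151 ms, a ratio of roughly 23,
which is consistent with the rounded ~20x quoted in the subject.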

In case THP is enabled, the optimization is mostly skipped except in
certain situations.

Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---

Note that since the bug fix in [1], we now have to flush the TLB on every
PMD move. The above numbers were obtained on x86 with a flush done on
every move. For arm64, I previously encountered performance issues when
flushing on every move; however, Will Deacon says [2] the performance
should be better with recent releases. Until we can evaluate arm64, I am
dropping the patch that enables HAVE_MOVE_PMD for arm64. It can be added
back once we finish the performance evaluation. Also of note: the speedup
on arm64 with this patch, but without the TLB flush on every PMD move, is
around 500x.

[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=1695
[2] https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg140837.html

 arch/Kconfig |  5 +++++
 mm/mremap.c  | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

Comments

kernel test robot Nov. 3, 2018, 4:45 p.m. UTC | #1
Hi Joel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20181102]

url:    https://github.com/0day-ci/linux/commits/Joel-Fernandes/Add-support-for-fast-mremap/20181103-224908
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   mm/mremap.c: In function 'move_normal_pmd':
>> mm/mremap.c:229:2: error: implicit declaration of function 'set_pmd_at'; did you mean 'set_pte_at'? [-Werror=implicit-function-declaration]
     set_pmd_at(mm, new_addr, new_pmd, pmd);
     ^~~~~~~~~~
     set_pte_at
   cc1: some warnings being treated as errors

vim +229 mm/mremap.c

   193	
   194	static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
   195			  unsigned long new_addr, unsigned long old_end,
   196			  pmd_t *old_pmd, pmd_t *new_pmd)
   197	{
   198		spinlock_t *old_ptl, *new_ptl;
   199		struct mm_struct *mm = vma->vm_mm;
   200		pmd_t pmd;
   201	
   202		if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
   203		    || old_end - old_addr < PMD_SIZE)
   204			return false;
   205	
   206		/*
   207		 * The destination pmd shouldn't be established, free_pgtables()
   208		 * should have released it.
   209		 */
   210		if (WARN_ON(!pmd_none(*new_pmd)))
   211			return false;
   212	
   213		/*
   214		 * We don't have to worry about the ordering of src and dst
   215		 * ptlocks because exclusive mmap_sem prevents deadlock.
   216		 */
   217		old_ptl = pmd_lock(vma->vm_mm, old_pmd);
   218		new_ptl = pmd_lockptr(mm, new_pmd);
   219		if (new_ptl != old_ptl)
   220			spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
   221	
   222		/* Clear the pmd */
   223		pmd = *old_pmd;
   224		pmd_clear(old_pmd);
   225	
   226		VM_BUG_ON(!pmd_none(*new_pmd));
   227	
   228		/* Set the new pmd */
 > 229		set_pmd_at(mm, new_addr, new_pmd, pmd);
   230		flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
   231		if (new_ptl != old_ptl)
   232			spin_unlock(new_ptl);
   233		spin_unlock(old_ptl);
   234	
   235		return true;
   236	}
   237	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
kernel test robot Nov. 3, 2018, 4:56 p.m. UTC | #2
Hi Joel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20181102]

url:    https://github.com/0day-ci/linux/commits/Joel-Fernandes/Add-support-for-fast-mremap/20181103-224908
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All errors (new ones prefixed by >>):

   mm/mremap.c: In function 'move_normal_pmd':
>> mm/mremap.c:229:2: error: implicit declaration of function 'set_pmd_at' [-Werror=implicit-function-declaration]
     set_pmd_at(mm, new_addr, new_pmd, pmd);
     ^~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/set_pmd_at +229 mm/mremap.c

   193	
   194	static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
   195			  unsigned long new_addr, unsigned long old_end,
   196			  pmd_t *old_pmd, pmd_t *new_pmd)
   197	{
   198		spinlock_t *old_ptl, *new_ptl;
   199		struct mm_struct *mm = vma->vm_mm;
   200		pmd_t pmd;
   201	
   202		if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
   203		    || old_end - old_addr < PMD_SIZE)
   204			return false;
   205	
   206		/*
   207		 * The destination pmd shouldn't be established, free_pgtables()
   208		 * should have released it.
   209		 */
   210		if (WARN_ON(!pmd_none(*new_pmd)))
   211			return false;
   212	
   213		/*
   214		 * We don't have to worry about the ordering of src and dst
   215		 * ptlocks because exclusive mmap_sem prevents deadlock.
   216		 */
   217		old_ptl = pmd_lock(vma->vm_mm, old_pmd);
   218		new_ptl = pmd_lockptr(mm, new_pmd);
   219		if (new_ptl != old_ptl)
   220			spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
   221	
   222		/* Clear the pmd */
   223		pmd = *old_pmd;
   224		pmd_clear(old_pmd);
   225	
   226		VM_BUG_ON(!pmd_none(*new_pmd));
   227	
   228		/* Set the new pmd */
 > 229		set_pmd_at(mm, new_addr, new_pmd, pmd);
   230		flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
   231		if (new_ptl != old_ptl)
   232			spin_unlock(new_ptl);
   233		spin_unlock(old_ptl);
   234	
   235		return true;
   236	}
   237	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
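
Both reports point at the same cause: move_normal_pmd() is compiled on every
architecture, while set_pmd_at() is not declared in these two
configurations. The IS_ENABLED(CONFIG_HAVE_MOVE_PMD) test in the patch only
guards the call site, so the function body still has to build. One common
way to handle this, shown here as a userspace sketch and not as the fix
chosen in this thread, is to compile the real helper only when the option is
selected and provide a stub otherwise. EX_HAVE_MOVE_PMD and all ex_* names
are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * EX_HAVE_MOVE_PMD is a hypothetical stand-in for CONFIG_HAVE_MOVE_PMD.
 * It is left undefined here to model an architecture that does not
 * select the option.
 */
#ifdef EX_HAVE_MOVE_PMD
/* Stand-in for the arch helper; it only exists when the option is set. */
static void ex_set_pmd_at(unsigned long *pmdp, unsigned long pmd)
{
	*pmdp = pmd;
}

bool ex_move_normal_pmd(unsigned long *old_pmd, unsigned long *new_pmd)
{
	unsigned long pmd;

	if (*new_pmd != 0) /* destination must be empty */
		return false;
	pmd = *old_pmd;
	*old_pmd = 0;
	ex_set_pmd_at(new_pmd, pmd);
	return true;
}
#else
/* Stub: references no arch helper, so it builds everywhere. */
bool ex_move_normal_pmd(unsigned long *old_pmd, unsigned long *new_pmd)
{
	(void)old_pmd;
	(void)new_pmd;
	return false;
}
#endif

/* Caller: try the fast path and report whether the pte path is needed. */
bool ex_needs_pte_move(unsigned long *old_pmd, unsigned long *new_pmd)
{
	assert(old_pmd && new_pmd);
	return !ex_move_normal_pmd(old_pmd, new_pmd);
}
```

With the macro undefined the stub always returns false, so the caller falls
back to the per-pte path and the entries are left untouched.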
Patch

diff --git a/arch/Kconfig b/arch/Kconfig
index e1e540ffa979..b70c952ac838 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -535,6 +535,11 @@  config HAVE_IRQ_TIME_ACCOUNTING
 	  Archs need to ensure they use a high enough resolution clock to
 	  support irq time accounting and then call enable_sched_clock_irqtime().
 
+config HAVE_MOVE_PMD
+	bool
+	help
+	  Archs that select this are able to move page tables at the PMD level.
+
 config HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	bool
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 7c9ab747f19d..7cf6b0943090 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -191,6 +191,50 @@  static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		drop_rmap_locks(vma);
 }
 
+static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
+		  unsigned long new_addr, unsigned long old_end,
+		  pmd_t *old_pmd, pmd_t *new_pmd)
+{
+	spinlock_t *old_ptl, *new_ptl;
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t pmd;
+
+	if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
+	    || old_end - old_addr < PMD_SIZE)
+		return false;
+
+	/*
+	 * The destination pmd shouldn't be established, free_pgtables()
+	 * should have released it.
+	 */
+	if (WARN_ON(!pmd_none(*new_pmd)))
+		return false;
+
+	/*
+	 * We don't have to worry about the ordering of src and dst
+	 * ptlocks because exclusive mmap_sem prevents deadlock.
+	 */
+	old_ptl = pmd_lock(vma->vm_mm, old_pmd);
+	new_ptl = pmd_lockptr(mm, new_pmd);
+	if (new_ptl != old_ptl)
+		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+
+	/* Clear the pmd */
+	pmd = *old_pmd;
+	pmd_clear(old_pmd);
+
+	VM_BUG_ON(!pmd_none(*new_pmd));
+
+	/* Set the new pmd */
+	set_pmd_at(mm, new_addr, new_pmd, pmd);
+	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+	if (new_ptl != old_ptl)
+		spin_unlock(new_ptl);
+	spin_unlock(old_ptl);
+
+	return true;
+}
+
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
@@ -237,7 +281,23 @@  unsigned long move_page_tables(struct vm_area_struct *vma,
 			split_huge_pmd(vma, old_pmd, old_addr);
 			if (pmd_trans_unstable(old_pmd))
 				continue;
+		} else if (extent == PMD_SIZE && IS_ENABLED(CONFIG_HAVE_MOVE_PMD)) {
+			/*
+			 * If the extent is PMD-sized, try to speed the move by
+			 * moving at the PMD level if possible.
+			 */
+			bool moved;
+
+			if (need_rmap_locks)
+				take_rmap_locks(vma);
+			moved = move_normal_pmd(vma, old_addr, new_addr,
+					old_end, old_pmd, new_pmd);
+			if (need_rmap_locks)
+				drop_rmap_locks(vma);
+			if (moved)
+				continue;
 		}
+
 		if (pte_alloc(new_vma->vm_mm, new_pmd))
 			break;
 		next = (new_addr + PMD_SIZE) & PMD_MASK;
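
The interaction between the extent == PMD_SIZE test in move_page_tables()
and the alignment checks in move_normal_pmd() can be modelled in userspace
with plain address arithmetic. The sketch below is a simplified model, not
kernel code: it assumes 4 KiB pages and 2 MiB PMDs, ignores locking, TLB
flushing, huge pmds and the latency limit, and the source-side clamping of
the extent is an assumption about the part of the loop that is not visible
in the hunk. All ex_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_PAGE_SIZE (1UL << 12)
#define EX_PMD_SIZE  (1UL << 21)
#define EX_PMD_MASK  (~(EX_PMD_SIZE - 1))

struct ex_move_stats {
	unsigned long pmd_moves; /* whole PMD entries moved */
	unsigned long pte_moves; /* individual ptes moved */
};

/* Mirrors the eligibility test at the top of move_normal_pmd(). */
bool ex_can_move_pmd(unsigned long old_addr, unsigned long new_addr,
		     unsigned long old_end)
{
	if ((old_addr & ~EX_PMD_MASK) || (new_addr & ~EX_PMD_MASK)
	    || old_end - old_addr < EX_PMD_SIZE)
		return false;
	return true;
}

/* Walk the range and count how many steps take each path. */
struct ex_move_stats ex_move_page_tables(unsigned long old_addr,
					 unsigned long new_addr,
					 unsigned long len)
{
	struct ex_move_stats st = { 0, 0 };
	unsigned long old_end = old_addr + len;
	unsigned long extent, next;

	assert(len % EX_PAGE_SIZE == 0);
	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
		/* Stop at the next source PMD boundary or the range end. */
		next = (old_addr + EX_PMD_SIZE) & EX_PMD_MASK;
		extent = next - old_addr;
		if (extent > old_end - old_addr)
			extent = old_end - old_addr;

		/* Fast path: one entry moves a whole PMD worth of ptes. */
		if (extent == EX_PMD_SIZE &&
		    ex_can_move_pmd(old_addr, new_addr, old_end)) {
			st.pmd_moves++;
			continue;
		}

		/* Slow path: also stop at the next destination boundary. */
		next = (new_addr + EX_PMD_SIZE) & EX_PMD_MASK;
		if (extent > next - new_addr)
			extent = next - new_addr;
		st.pte_moves += extent / EX_PAGE_SIZE;
	}
	return st;
}
```

In this model a 1GB move between two PMD-aligned addresses takes 512 fast
steps and no pte copies, while a destination that is off by a single page
forces all 262144 ptes through the slow path. That is why the alignment of
both addresses matters for the speedup.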