
mm: eliminate function call overhead during copy_page_range()

Message ID 20230205150602.GA25866@haolee.io
State New
Series mm: eliminate function call overhead during copy_page_range()

Commit Message

Hao Lee Feb. 5, 2023, 3:06 p.m. UTC
vm_normal_page() is called so many times that its overhead is very high.
After changing this call site to an inline function, copy_page_range()
runs 3~5 times faster than before.

Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
---
 mm/memory.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Matthew Wilcox Feb. 5, 2023, 9:53 p.m. UTC | #1
On Sun, Feb 05, 2023 at 03:06:02PM +0000, Hao Lee wrote:
> vm_normal_page() is called so many times that its overhead is very high.
> After changing this call site to an inline function, copy_page_range()
> runs 3~5 times faster than before.

So you're saying that your compiler is making bad decisions?  What
architecture, what compiler, what version?  Do you have
CONFIG_ARCH_HAS_PTE_SPECIAL set?

Is there something about inlining it that makes the compiler able to
optimise away code, or is it really the function call overhead?  Can
you share any perf results?
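
For context, on architectures with CONFIG_ARCH_HAS_PTE_SPECIAL the common path
through vm_normal_page() is short: a pte_special() test, a pfn sanity check,
and pfn_to_page(). A rough, heavily abridged sketch of that fast path follows
(a paraphrase, not the exact mm/memory.c source; the helper name is invented
for illustration):

/*
 * Rough sketch of the CONFIG_ARCH_HAS_PTE_SPECIAL fast path of
 * vm_normal_page().  Paraphrased and heavily abridged: error reporting
 * and the !CONFIG_ARCH_HAS_PTE_SPECIAL variant are omitted.  Not the
 * exact upstream source.
 */
static struct page *vm_normal_page_sketch(struct vm_area_struct *vma,
					  unsigned long addr, pte_t pte)
{
	unsigned long pfn = pte_pfn(pte);

	if (likely(!pte_special(pte))) {
		/* Common case: bounds-check the pfn and return its page. */
		if (unlikely(pfn > highest_memmap_pfn))
			return NULL;	/* corrupt PTE; real code warns here */
		return pfn_to_page(pfn);
	}

	/*
	 * Special mappings (zero page, raw VM_PFNMAP/VM_MIXEDMAP PFNs, ...)
	 * have no struct page for the caller to use.
	 */
	return NULL;
}
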
Hao Lee Feb. 6, 2023, 9:17 a.m. UTC | #2
On Sun, Feb 05, 2023 at 09:53:53PM +0000, Matthew Wilcox wrote:
> On Sun, Feb 05, 2023 at 03:06:02PM +0000, Hao Lee wrote:
> > vm_normal_page() is called so many times that its overhead is very high.
> > After changing this call site to an inline function, copy_page_range()
> > runs 3~5 times faster than before.
> 
> So you're saying that your compiler is making bad decisions?  What
> architecture, what compiler, what version?  Do you have
> CONFIG_ARCH_HAS_PTE_SPECIAL set?
> 
> Is there something about inlining it that makes the compiler able to
> optimise away code, or is it really the function call overhead?  Can
> you share any perf results?

I am so embarrassed; I forgot to disable the function_graph tracer when
timing the non-inlined function, so its overhead skewed my measurements.
The actual performance improvement is only ~3%.
Please ignore this patch. Sorry...

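
For anyone trying to reproduce the numbers: one way to approximate the cost of
copy_page_range() from userspace, without any in-kernel tracing in the way, is
to time fork() of a process holding a large populated anonymous mapping, since
page-table copying dominates that path. A minimal sketch follows (the 1 GiB
mapping size and iteration count are arbitrary choices, not values from this
thread):

/*
 * fork_bench.c - rough userspace probe of copy_page_range() cost.
 * Times fork() of a parent holding a large populated anonymous mapping,
 * so page-table copying dominates the measured latency.
 * Build: gcc -O2 fork_bench.c -o fork_bench
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAP_SIZE	(1UL << 30)	/* 1 GiB of anonymous memory */
#define ITERATIONS	20

static double now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e9 + ts.tv_nsec;
}

int main(void)
{
	double total = 0;
	char *buf;
	int i;

	buf = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 1, MAP_SIZE);	/* populate PTEs so fork has work to do */

	for (i = 0; i < ITERATIONS; i++) {
		double start = now_ns();
		pid_t pid = fork();

		if (pid == 0)
			_exit(0);	/* child: exit immediately */
		if (pid < 0) {
			perror("fork");
			return 1;
		}
		total += now_ns() - start;
		waitpid(pid, NULL, 0);
	}

	printf("avg fork() latency: %.0f ns over %d runs\n",
	       total / ITERATIONS, ITERATIONS);
	return 0;
}
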

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 7a04a1130ec1..2084bb7aff85 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -562,7 +562,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+static inline struct page *__vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			    pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
@@ -625,6 +625,12 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	return pfn_to_page(pfn);
 }
 
+struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+			    pte_t pte)
+{
+	return __vm_normal_page(vma, addr, pte);
+}
+
 struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
 			    pte_t pte)
 {
@@ -908,7 +914,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	struct page *page;
 	struct folio *folio;
 
-	page = vm_normal_page(src_vma, addr, pte);
+	page = __vm_normal_page(src_vma, addr, pte);
 	if (page)
 		folio = page_folio(page);
 	if (page && folio_test_anon(folio)) {