| Message ID | 20230205150602.GA25866@haolee.io (mailing list archive) |
|---|---|
| State | New |
| Series | mm: eliminate function call overhead during copy_page_range() |
On Sun, Feb 05, 2023 at 03:06:02PM +0000, Hao Lee wrote:
> vm_normal_page() is called so many times that its overhead is very high.
> After changing this call site to an inline function, copy_page_range()
> runs 3~5 times faster than before.

So you're saying that your compiler is making bad decisions? What
architecture, what compiler, what version? Do you have
CONFIG_ARCH_HAS_PTE_SPECIAL set?

Is there something about inlining it that makes the compiler able to
optimise away code, or is it really the function call overhead? Can
you share any perf results?
On Sun, Feb 05, 2023 at 09:53:53PM +0000, Matthew Wilcox wrote:
> On Sun, Feb 05, 2023 at 03:06:02PM +0000, Hao Lee wrote:
> > vm_normal_page() is called so many times that its overhead is very high.
> > After changing this call site to an inline function, copy_page_range()
> > runs 3~5 times faster than before.
>
> So you're saying that your compiler is making bad decisions? What
> architecture, what compiler, what version? Do you have
> CONFIG_ARCH_HAS_PTE_SPECIAL set?
>
> Is there something about inlining it that makes the compiler able to
> optimise away code, or is it really the function call overhead? Can
> you share any perf results?

I am so embarrassed; I forgot to disable the function_graph tracer when
timing the non-inlined function, so my measurements were skewed. The actual
performance improvement is only ~3%. Please ignore this patch. Sorry...
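To make the question above concrete: whether inlining helps usually depends on what the compiler can prove at the call site, not only on the saved call and return. Below is a minimal userspace sketch of that effect, assuming GCC/Clang semantics for static inline; every name in it is invented for illustration and has nothing to do with mm/memory.c.

    /* Minimal sketch: with the body visible at the call site, a constant
     * argument lets the compiler fold the branch and drop the multiply, so
     * the win can be real code elimination, not just avoided call overhead. */
    #include <stdio.h>

    static inline long scale(long x, long factor)
    {
            if (factor == 1)
                    return x;
            return x * factor;
    }

    long caller(long x)
    {
            /* With inlining this typically reduces to "return x;"; an
             * out-of-line scale() would still branch at run time. */
            return scale(x, 1);
    }

    int main(void)
    {
            printf("%ld\n", caller(7));
            return 0;
    }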
diff --git a/mm/memory.c b/mm/memory.c
index 7a04a1130ec1..2084bb7aff85 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -562,7 +562,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+static inline struct page *__vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			    pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
@@ -625,6 +625,12 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	return pfn_to_page(pfn);
 }
 
+struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+			    pte_t pte)
+{
+	return __vm_normal_page(vma, addr, pte);
+}
+
 struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
 			      pte_t pte)
 {
@@ -908,7 +914,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	struct page *page;
 	struct folio *folio;
 
-	page = vm_normal_page(src_vma, addr, pte);
+	page = __vm_normal_page(src_vma, addr, pte);
 	if (page)
 		folio = page_folio(page);
 	if (page && folio_test_anon(folio)) {
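For reference, the diff above follows a common pattern: keep the exported out-of-line symbol, but let hot callers in the same file use a static inline helper. A rough userspace sketch of the same idea, with hypothetical names rather than kernel API:

    #include <stdio.h>

    /* Shared implementation, visible for inlining within this file. */
    static inline int __lookup(int key)
    {
            return key * 2 + 1;     /* stand-in for the real lookup logic */
    }

    /* Out-of-line wrapper: preserves the existing symbol for external callers. */
    int lookup(int key)
    {
            return __lookup(key);
    }

    /* Hot caller in the same file uses the inline helper directly. */
    static long hot_loop(int n)
    {
            long sum = 0;

            for (int i = 0; i < n; i++)
                    sum += __lookup(i);     /* no call overhead on this path */
            return sum;
    }

    int main(void)
    {
            printf("%d %ld\n", lookup(3), hot_loop(10));
            return 0;
    }

The wrapper keeps the symbol unchanged for other translation units, while the hot loop in the same file gets the inlined copy.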
vm_normal_page() is called so many times that its overhead is very high.
After changing this call site to an inline function, copy_page_range()
runs 3~5 times faster than before.

Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
---
 mm/memory.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)