Message ID | 20210413194625.1472345-1-willy@infradead.org
---|---
State | New, archived
Series | mm: Optimise nth_page for contiguous memmap
On 13.04.21 21:46, Matthew Wilcox (Oracle) wrote:
> If the memmap is virtually contiguous (either because we're using
> a virtually mapped memmap or because we don't support a discontig
> memmap at all), then we can implement nth_page() by simple addition.
> Contrary to popular belief, the compiler is not able to optimise this
> itself for a vmemmap configuration. This reduces one example user (sg.c)
> by four instructions:
>
> 	struct page *page = nth_page(rsv_schp->pages[k], offset >> PAGE_SHIFT);
>
> before:
>    49 8b 45 70             mov    0x70(%r13),%rax
>    48 63 c9                movslq %ecx,%rcx
>    48 c1 eb 0c             shr    $0xc,%rbx
>    48 8b 04 c8             mov    (%rax,%rcx,8),%rax
>    48 2b 05 00 00 00 00    sub    0x0(%rip),%rax
>                            R_X86_64_PC32    vmemmap_base-0x4
>    48 c1 f8 06             sar    $0x6,%rax
>    48 01 d8                add    %rbx,%rax
>    48 c1 e0 06             shl    $0x6,%rax
>    48 03 05 00 00 00 00    add    0x0(%rip),%rax
>                            R_X86_64_PC32    vmemmap_base-0x4
>
> after:
>    49 8b 45 70             mov    0x70(%r13),%rax
>    48 63 c9                movslq %ecx,%rcx
>    48 c1 eb 0c             shr    $0xc,%rbx
>    48 c1 e3 06             shl    $0x6,%rbx
>    48 03 1c c8             add    (%rax,%rcx,8),%rbx
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/mm.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 25b9041f9925..2327f99b121f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -234,7 +234,11 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
>  int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  		pgoff_t index, gfp_t gfp, void **shadowp);
>
> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>  #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> +#else
> +#define nth_page(page,n) ((page) + (n))
> +#endif

For sparsemem we could optimize within a single memory section. But not sure
if it's worth the trouble.

Reviewed-by: David Hildenbrand <david@redhat.com>
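For reference on the commit message's claim that the compiler cannot perform this optimisation itself: under CONFIG_SPARSEMEM_VMEMMAP the pfn conversions are plain offsets from the virtual memmap base (shown simplified from include/asm-generic/memory_model.h below), so the old nth_page() is mathematically equivalent to pointer addition, yet the round-trip passes through a divide and re-multiply by sizeof(struct page) that the compiler does not fold away:

```c
/* Simplified from include/asm-generic/memory_model.h under
 * CONFIG_SPARSEMEM_VMEMMAP; "vmemmap" is the virtual memmap base. */
#define __pfn_to_page(pfn)	(vmemmap + (pfn))
#define __page_to_pfn(page)	(unsigned long)((page) - vmemmap)

/*
 * The pre-patch nth_page(page, n) therefore expands to
 *
 *	vmemmap + (((page) - vmemmap) + (n))
 *
 * The pointer subtraction divides by sizeof(struct page) and the
 * re-addition multiplies by it again -- the sar $0x6 / shl $0x6 pair
 * in the "before" disassembly -- and the compiler does not fold the
 * whole expression back into the plain "(page) + (n)" of the patch.
 */
```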
On 13 Apr 2021, at 15:46, Matthew Wilcox (Oracle) wrote:
> If the memmap is virtually contiguous (either because we're using
> a virtually mapped memmap or because we don't support a discontig
> memmap at all), then we can implement nth_page() by simple addition.
> Contrary to popular belief, the compiler is not able to optimise this
> itself for a vmemmap configuration. This reduces one example user (sg.c)
> by four instructions:
>
> 	struct page *page = nth_page(rsv_schp->pages[k], offset >> PAGE_SHIFT);
>
> before:
>    49 8b 45 70             mov    0x70(%r13),%rax
>    48 63 c9                movslq %ecx,%rcx
>    48 c1 eb 0c             shr    $0xc,%rbx
>    48 8b 04 c8             mov    (%rax,%rcx,8),%rax
>    48 2b 05 00 00 00 00    sub    0x0(%rip),%rax
>                            R_X86_64_PC32    vmemmap_base-0x4
>    48 c1 f8 06             sar    $0x6,%rax
>    48 01 d8                add    %rbx,%rax
>    48 c1 e0 06             shl    $0x6,%rax
>    48 03 05 00 00 00 00    add    0x0(%rip),%rax
>                            R_X86_64_PC32    vmemmap_base-0x4
>
> after:
>    49 8b 45 70             mov    0x70(%r13),%rax
>    48 63 c9                movslq %ecx,%rcx
>    48 c1 eb 0c             shr    $0xc,%rbx
>    48 c1 e3 06             shl    $0x6,%rbx
>    48 03 1c c8             add    (%rax,%rcx,8),%rbx
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/mm.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 25b9041f9925..2327f99b121f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -234,7 +234,11 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
>  int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  		pgoff_t index, gfp_t gfp, void **shadowp);
>
> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>  #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> +#else
> +#define nth_page(page,n) ((page) + (n))
> +#endif
>
>  /* to align the pointer to the (next) page boundary */
>  #define PAGE_ALIGN(addr)	ALIGN(addr, PAGE_SIZE)
> --
> 2.30.2

LGTM. Thanks.

Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan Zi
On Wed, Apr 14, 2021 at 05:24:42PM +0200, David Hildenbrand wrote:
> On 13.04.21 21:46, Matthew Wilcox (Oracle) wrote:
> > +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> >  #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> > +#else
> > +#define nth_page(page,n) ((page) + (n))
> > +#endif
>
> For sparsemem we could optimize within a single memory section. But not sure
> if it's worth the trouble.

Not only is it not worth the trouble, I suspect it's more expensive to
test-and-branch than just unconditionally call pfn_to_page() and
page_to_pfn().  That said, I haven't measured.

SPARSEMEM_VMEMMAP is default Y, and enabled by arm64, ia64, powerpc,
riscv, s390, sparc and x86.  I mean ... do we care any more?

> Reviewed-by: David Hildenbrand <david@redhat.com>
>
> --
> Thanks,
>
> David / dhildenb
Matthew Wilcox <willy@infradead.org> wrote on Wed, 14 Apr 2021 at 20:52:

> On Wed, Apr 14, 2021 at 05:24:42PM +0200, David Hildenbrand wrote:
> > On 13.04.21 21:46, Matthew Wilcox (Oracle) wrote:
> > > +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> > >  #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> > > +#else
> > > +#define nth_page(page,n) ((page) + (n))
> > > +#endif
> >
> > For sparsemem we could optimize within a single memory section. But not
> > sure if it's worth the trouble.
>
> Not only is it not worth the trouble, I suspect it's more expensive to
> test-and-branch than just unconditionally call pfn_to_page() and
> page_to_pfn(). That said, I haven't measured.

My thinking was that in most cases we might stay within the section, such
that there are barely any actual branches.

> SPARSEMEM_VMEMMAP is default Y, and enabled by arm64, ia64, powerpc,
> riscv, s390, sparc and x86. I mean ... do we care any more?

Also true.
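To make the test-and-branch being weighed in this exchange concrete, a section-bounded variant along the lines David suggests might look like the sketch below. This is an illustration only, not proposed kernel code: the macro name is invented, while PAGES_PER_SECTION and PAGE_SECTION_MASK are the real constants from include/linux/mmzone.h. Within one sparsemem section the memmap is contiguous, so plain addition is safe whenever page + n stays inside the section holding page:

```c
/* Hypothetical sketch only; nth_page_sec() is not a real kernel macro.
 * pfn & ~PAGE_SECTION_MASK is the pfn's offset within its section, so
 * the comparison checks whether page + n crosses a section boundary.
 * (A real version would use a statement expression to avoid
 * evaluating "page" more than once.) */
#define nth_page_sec(page, n)						\
	(((page_to_pfn(page) & ~PAGE_SECTION_MASK) + (n)) <		\
			PAGES_PER_SECTION				\
		? (page) + (n)						\
		: pfn_to_page(page_to_pfn(page) + (n)))
```

Even when the branch predicts well, the bounds check itself still costs a page_to_pfn(), so the savings over the unconditional round-trip would likely be modest.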
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 25b9041f9925..2327f99b121f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -234,7 +234,11 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
 int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 		pgoff_t index, gfp_t gfp, void **shadowp);
 
+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
+#else
+#define nth_page(page,n) ((page) + (n))
+#endif
 
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr)	ALIGN(addr, PAGE_SIZE)
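The codegen claim is straightforward to check outside the kernel tree. The following standalone model is a sketch, not kernel code: struct page here is a stand-in padded to the kernel's 64-byte size, and vmemmap_base mimics the kernel's memmap base variable. Depending on compiler version and flags, comparing the two functions' assembly at -O2 should show the unfolded sub/sar/shl/add sequence for the old form against a single scaled add for the new one:

```c
/* Standalone model of both nth_page() forms.  Build with e.g.
 * "cc -O2 -S" and compare the generated assembly. */
struct page { unsigned long pad[8]; };	/* 64 bytes, like the kernel's */
extern struct page *vmemmap_base;	/* stand-in for the memmap base */

/* pre-patch: pfn_to_page(page_to_pfn(page) + n) */
struct page *nth_page_old(struct page *page, unsigned long n)
{
	unsigned long pfn = page - vmemmap_base;	/* page_to_pfn() */
	return vmemmap_base + (pfn + n);		/* pfn_to_page() */
}

/* post-patch: ((page) + (n)) */
struct page *nth_page_new(struct page *page, unsigned long n)
{
	return page + n;
}
```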