
[RFC] mm: add pad in struct page

Message ID 1545703503-20939-1-git-send-email-zhangkehong@hisilicon.com (mailing list archive)
State RFC
Series [RFC] mm: add pad in struct page

Commit Message

Kehong Zhang Dec. 25, 2018, 2:05 a.m. UTC
When analyzing the performance of nginx serving static websites, I found
a high CPU load in the code below:

	mm/filemap.c, find_get_entry(), around line 1425:

		head = compound_head(page);
		if (!page_cache_get_speculative(head))
			goto repeat;

This code reads page->compound_head (see the sketch below) and
atomically increments page->_refcount, and the two fields sit in the
same cache line. Multiple cores reading and writing fields that share a
cache line can cause a serious slowdown.
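
For reference, compound_head() boils down to a single read of
page->compound_head. A simplified sketch, condensed from
include/linux/page-flags.h (details vary by kernel version):

	static inline struct page *compound_head(struct page *page)
	{
		unsigned long head = READ_ONCE(page->compound_head);

		/* Bit 0 set means "tail page"; the remaining bits are a
		 * pointer to the head page.  Either way, the field is
		 * read on every lookup. */
		if (unlikely(head & 1))
			return (struct page *)(head - 1);
		return page;
	}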

I tested nginx serving static websites in two cases, with the padding
patch below applied and without:

	cores        with pad        without pad
	8            317654          344414
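
As a sanity check that the effect is cache-line contention rather than
something nginx-specific, here is a minimal userspace sketch of the same
access pattern. All names, the 64-byte line size, and the iteration
count are assumptions for illustration; it is not part of this patch:

	/* false_sharing.c - toy reproduction of the effect above.
	 * Build: gcc -O2 -pthread false_sharing.c
	 * One thread repeatedly reads "head" while another atomically
	 * increments "refcount"; putting the two fields on the same
	 * cache line makes the threads fight over line ownership. */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>
	#include <time.h>

	#define LINE	64			/* assumed cache-line size */
	#define ITERS	(100 * 1000 * 1000UL)

	static _Alignas(LINE) struct {
		unsigned long head;		/* read-mostly, like compound_head */
		atomic_ulong refcount;		/* hot write, like _refcount */
	} same;

	static _Alignas(LINE) struct {
		unsigned long head;
		char pad[LINE];			/* push refcount to the next line */
		atomic_ulong refcount;
	} apart;

	static void *reader(void *arg)
	{
		volatile unsigned long *head = arg;
		unsigned long sink = 0;

		for (unsigned long i = 0; i < ITERS; i++)
			sink += *head;		/* re-fetches the line each time */
		return (void *)sink;
	}

	static void *writer(void *arg)
	{
		atomic_ulong *ref = arg;

		for (unsigned long i = 0; i < ITERS; i++)
			atomic_fetch_add(ref, 1);	/* invalidates the line */
		return NULL;
	}

	static double run(void *head, void *ref)
	{
		struct timespec a, b;
		pthread_t t1, t2;

		clock_gettime(CLOCK_MONOTONIC, &a);
		pthread_create(&t1, NULL, reader, head);
		pthread_create(&t2, NULL, writer, ref);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		clock_gettime(CLOCK_MONOTONIC, &b);
		return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
	}

	int main(void)
	{
		printf("same cache line:      %.2fs\n",
		       run(&same.head, &same.refcount));
		printf("separate cache lines: %.2fs\n",
		       run(&apart.head, &apart.refcount));
		return 0;
	}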

As we know, struct page is one of the most heavily used structures in
the kernel and should be kept as small as possible. I don't have a
better way to solve this; does anybody have an idea?

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Kehong Zhang <zhangkehong@hisilicon.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Kehong Zhang <zhangkehong@hisilicon.com>
---
 arch/arm64/include/asm/memory.h | 2 +-
 include/linux/mm_types.h        | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox Dec. 25, 2018, 1:59 p.m. UTC | #1
On Tue, Dec 25, 2018 at 10:05:03AM +0800, Kehong Zhang wrote:
> When analyzing the performance of nginx serving static websites, I found
> a high CPU load in the code below:
> 
> 	mm/filemap.c, find_get_entry(), around line 1425:
> 
> 		head = compound_head(page);
> 		if (!page_cache_get_speculative(head))
> 			goto repeat;
> 
> This code reads page->compound_head and atomically increments
> page->_refcount, and the two fields sit in the same cache line.
> Multiple cores reading and writing fields that share a cache line can
> cause a serious slowdown.

page->compound_head is written at allocation.  It's not modified during
use, so your analysis is wrong here.  What you're seeing is the first
attempt to access this struct page, so it's a cache-miss.  If you take
away the access to page->compound_head, you'll instead see all of the
penalty incurred on the next attempt to access the struct page.
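
For reference, that one-time write happens when the compound page is
assembled; a condensed sketch of prep_compound_page() from
mm/page_alloc.c (simplified, details vary by kernel version):

	static void prep_compound_page(struct page *page, unsigned int order)
	{
		int i;
		int nr_pages = 1 << order;

		set_compound_order(page, order);
		__SetPageHead(page);
		for (i = 1; i < nr_pages; i++) {
			struct page *p = page + i;

			/* The only write to compound_head: each tail page
			 * is pointed at its head once, here. */
			set_compound_head(p, page);
		}
	}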

> I tested nginx serving static websites in two cases, with the padding
> patch below applied and without:
> 
> 	cores        with pad        without pad
> 	8            317654          344414

These numbers aren't particularly useful.  How stable are they from run
to run?  That is, if you do five runs, what range do they fall in?  Also,
is higher better or is lower better?  I can't tell if this is requests
per second or latency or something else being measured.

> As we know, struct page is one of the most heavily used structures in
> the kernel and should be kept as small as possible. I don't have a
> better way to solve this; does anybody have an idea?

Obviously we're not going to grow struct page from 64 bytes to 256 bytes.
I have briefly investigated an idea to eliminate the reference counting
for page cache accesses that don't need to sleep, but it's a long way
down my todo list.
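
For context, the reference count in question is taken roughly like
this; a condensed sketch of page_cache_get_speculative() from
include/linux/pagemap.h (the real version has RCU-specific variants):

	static inline int page_cache_get_speculative(struct page *page)
	{
		/* Atomically bump page->_refcount unless the page is
		 * already being freed; on failure, find_get_entry()
		 * jumps back to "repeat".  This increment is the write
		 * half of the contention discussed above. */
		if (unlikely(!get_page_unless_zero(page)))
			return 0;
		return 1;
	}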

Patch

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index b9644296..652ccf6 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -41,7 +41,7 @@ 
  * requires its definition to be available at this point in the inclusion
  * chain, and it may not be a power of 2 in the first place.
  */
-#define STRUCT_PAGE_MAX_SHIFT	6
+#define STRUCT_PAGE_MAX_SHIFT	8
 
 /*
  * VMEMMAP_SIZE - allows the whole linear region to be covered by
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5ed8f62..0a11c5b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -179,6 +179,8 @@ struct page {
 		int units;			/* SLOB */
 	};
 
+	unsigned char pad[128];	/* push _refcount onto a separate cache line */
+
 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
 	atomic_t _refcount;
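
For context on the STRUCT_PAGE_MAX_SHIFT bump: arm64 sizes its vmemmap
region from this constant, and the build asserts that struct page still
fits within 1 << STRUCT_PAGE_MAX_SHIFT. With the 128-byte pad,
sizeof(struct page) grows from 64 to roughly 192 bytes, which no longer
fits under 1 << 6. A sketch of the invariant (the exact check lives in
arch code and varies by kernel version):

	BUILD_BUG_ON(sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT));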