Message ID | 1597816075-61091-1-git-send-email-alex.shi@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2,1/2] mm/pageblock: mitigation cmpxchg false sharing in pageblock flags |
On 08/19/2020 11:17 AM, Alex Shi wrote:
> pageblock_flags is used as long, since every pageblock_flags is just 4
> bits, 'long' size will include 8(32bit machine) or 16 pageblocks' flags,
> that flag setting has to sync in cmpxchg with 7 or 15 other pageblock
> flags. It would cause long waiting for sync.
>
> If we could change the pageblock_flags variable as char, we could use
> char size cmpxchg, which just sync up with 2 pageblock flags. it could
> relief much false sharing in cmpxchg.

Do you have numbers demonstrating claimed performance improvement
after this change ?
On 19.08.20 09:55, Anshuman Khandual wrote:
>
>
> On 08/19/2020 11:17 AM, Alex Shi wrote:
>> pageblock_flags is used as long, since every pageblock_flags is just 4
>> bits, 'long' size will include 8(32bit machine) or 16 pageblocks' flags,
>> that flag setting has to sync in cmpxchg with 7 or 15 other pageblock
>> flags. It would cause long waiting for sync.
>>
>> If we could change the pageblock_flags variable as char, we could use
>> char size cmpxchg, which just sync up with 2 pageblock flags. it could
>> relief much false sharing in cmpxchg.
>
> Do you have numbers demonstrating claimed performance improvement
> after this change ?
>

I asked for that in v1 and there are no performance numbers to justify
the change. IMHO, that will be required to consider this for inclusion,
otherwise it's just code churn resulting in an (although minimal)
additional memory consumption.
On 2020/8/19 4:04 PM, David Hildenbrand wrote:
> On 19.08.20 09:55, Anshuman Khandual wrote:
>>
>>
>> On 08/19/2020 11:17 AM, Alex Shi wrote:
>>> pageblock_flags is used as long, since every pageblock_flags is just 4
>>> bits, 'long' size will include 8(32bit machine) or 16 pageblocks' flags,
>>> that flag setting has to sync in cmpxchg with 7 or 15 other pageblock
>>> flags. It would cause long waiting for sync.
>>>
>>> If we could change the pageblock_flags variable as char, we could use
>>> char size cmpxchg, which just sync up with 2 pageblock flags. it could
>>> relief much false sharing in cmpxchg.
>>
>> Do you have numbers demonstrating claimed performance improvement
>> after this change ?
>>
>
> I asked for that in v1 and there are no performance numbers to justify
> the change. IMHO, that will be required to consider this for inclusion,
> otherwise it's just code churn resulting in an (although minimal)
> additional memory consumption.
>

I just got some time to run thpscale on my 4*HT cores machine; here is
the data. I ran each kernel 3 times: 'pageblock' is 5.9-rc2 with both
patches applied, 'plp1' is 5.9-rc2 with only the first patch, and 'rc2'
is plain 5.9-rc2. The system and total times are slightly lower than on
the original kernel.

                   pageblock pageblock pageblock   plp1   plp1   plp1    rc2    rc2    rc2 pageblock
                          16      16-2      16-3      1      2      3      a      b      c         a
Duration User          14.81     15.24     14.55  15.28  14.66  14.63  14.76  14.97  14.38     15.07
Duration System        84.44     88.38     90.64  92.65  94.01  90.58 100.43  89.15  88.89     84.04
Duration Elapsed       98.83     99.06     99.81  99.65 100.26  99.90 100.30  99.24  99.14     98.87

I also added tracing to measure the effect of the patchset; it shows
that cmpxchg fails clearly less often.
 Performance counter stats for './run-mmtests.sh -c configs/config-workload-thpscale rc2-b':

             6,720      compaction:mm_compaction_isolate_migratepages
            13,526      compaction:mm_compaction_isolate_freepages
             4,052      compaction:mm_compaction_migratepages
            34,199      compaction:mm_compaction_begin
            34,199      compaction:mm_compaction_end
            21,784      compaction:mm_compaction_try_to_compact_pages
            71,606      compaction:mm_compaction_finished
           106,545      compaction:mm_compaction_suitable
                 0      compaction:mm_compaction_deferred
                 0      compaction:mm_compaction_defer_compaction
             2,977      compaction:mm_compaction_defer_reset
                 0      compaction:mm_compaction_kcompactd_sleep
                 0      compaction:mm_compaction_wakeup_kcompactd
                 0      compaction:mm_compaction_kcompactd_wake
             1,046      pageblock:hit_cmpxchg

     114.914303988 seconds time elapsed

      15.754797000 seconds user
      89.712251000 seconds sys

 Performance counter stats for './run-mmtests.sh -c configs/config-workload-thpscale pageblock-a':

               602      compaction:mm_compaction_isolate_migratepages
             3,710      compaction:mm_compaction_isolate_freepages
               402      compaction:mm_compaction_migratepages
            43,116      compaction:mm_compaction_begin
            43,116      compaction:mm_compaction_end
            24,810      compaction:mm_compaction_try_to_compact_pages
            86,527      compaction:mm_compaction_finished
           125,819      compaction:mm_compaction_suitable
                 2      compaction:mm_compaction_deferred
                 0      compaction:mm_compaction_defer_compaction
               271      compaction:mm_compaction_defer_reset
                 0      compaction:mm_compaction_kcompactd_sleep
                 0      compaction:mm_compaction_wakeup_kcompactd
                 0      compaction:mm_compaction_kcompactd_wake
               369      pageblock:hit_cmpxchg

     107.405499745 seconds time elapsed

      15.830967000 seconds user
      84.559767000 seconds sys

commit 36cea76895637c0c18ce8590c0f43a3e453fbf8f
Author: Alex Shi <alex.shi@linux.alibaba.com>
Date:   Wed Aug 19 17:26:26 2020 +0800

    add cmpxchg tracing

    Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>

diff --git a/include/trace/events/pageblock.h b/include/trace/events/pageblock.h
new file mode 100644
index 000000000000..003c2d716f82
--- /dev/null
+++ b/include/trace/events/pageblock.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pageblock
+
+#if !defined(_TRACE_PAGEBLOCK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGEBLOCK_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(hit_cmpxchg,
+
+	TP_PROTO(char byte),
+
+	TP_ARGS(byte),
+
+	TP_STRUCT__entry(
+		__field(char, byte)
+	),
+
+	TP_fast_assign(
+		__entry->byte = byte;
+	),
+
+	TP_printk("%d", __entry->byte)
+);
+
+#endif /* _TRACE_PAGE_ISOLATION_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 60342e764090..2422dec00484 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -509,6 +509,9 @@ static __always_inline int get_pfnblock_migratetype(struct page *page, unsigned
  * @pfn: The target page frame number
  * @mask: mask of bits that the caller is interested in
  */
+#define CREATE_TRACE_POINTS
+#include <trace/events/pageblock.h>
+
 void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
 					unsigned long pfn,
 					unsigned long mask)
@@ -536,6 +539,7 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
 		if (byte == old_byte)
 			break;
 		byte = old_byte;
+		trace_hit_cmpxchg(byte);
 	}
 }
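As a rough sanity check on the figures reported above, the following sketch (not part of the posted patches) averages the three "Duration System" samples per kernel and computes the relative drop in `pageblock:hit_cmpxchg` events between the `rc2-b` and `pageblock-a` runs. The helper names `mean` and `relative_reduction` and the array names are hypothetical; the numbers are copied from the message. With only three runs per kernel and one perf run each, the result is indicative at best and is not a statistical conclusion.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
double mean(const double *samples, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/* Fractional reduction when a count goes from `before` to `after`. */
double relative_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before;
}

/* "Duration System" seconds, copied from the thpscale table above. */
const double sys_pageblock[] = { 84.44, 88.38, 90.64 };	/* both patches */
const double sys_plp1[]      = { 92.65, 94.01, 90.58 };	/* first patch only */
const double sys_rc2[]       = { 100.43, 89.15, 88.89 };	/* plain 5.9-rc2 */

/* pageblock:hit_cmpxchg counts from the two perf runs above. */
const double hit_cmpxchg_rc2 = 1046.0;
const double hit_cmpxchg_pageblock = 369.0;
```

Calling `mean(sys_rc2, 3)` and `mean(sys_pageblock, 3)` gives the per-kernel averages, and `relative_reduction(hit_cmpxchg_rc2, hit_cmpxchg_pageblock)` gives the fraction of cmpxchg retries that disappeared in this one pair of runs.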
On 2020/8/19 3:55 PM, Anshuman Khandual wrote:
>
>
> On 08/19/2020 11:17 AM, Alex Shi wrote:
>> pageblock_flags is used as long, since every pageblock_flags is just 4
>> bits, 'long' size will include 8(32bit machine) or 16 pageblocks' flags,
>> that flag setting has to sync in cmpxchg with 7 or 15 other pageblock
>> flags. It would cause long waiting for sync.
>>
>> If we could change the pageblock_flags variable as char, we could use
>> char size cmpxchg, which just sync up with 2 pageblock flags. it could
>> relief much false sharing in cmpxchg.
>
> Do you have numbers demonstrating claimed performance improvement
> after this change ?
>

The performance data is in another email.

LKP reported that this patchset breaks the build on ARMv6, since ARMv6
has no byte-sized cmpxchg; so maybe we should fall back to the
word-sized cmpxchg there.

From db3d97ba8cc5e206b440bd40a92ef6955ad86bc0 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linux.alibaba.com>
Date: Tue, 18 Aug 2020 15:51:18 +0800
Subject: [PATCH v2 3/3] armv6: fix armv6 build issue

Arm v6 can not simulate cmpxchg1 func, so we have to use cmpxchg4 on it.

arm-linux-gnueabi-ld: mm/page_alloc.o: in function `set_pfnblock_flags_mask':
(.text+0x32b4): undefined reference to `__bad_cmpxchg'
arm-linux-gnueabi-ld: (.text+0x32e0): undefined reference to `__bad_cmpxchg'
arm-linux-gnueabi-ld: drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.o: in function `hw_atl_b0_get_mac_temp':
hw_atl_b0.c:(.text+0x30fc): undefined reference to `__bad_udelay'

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
---
 mm/page_alloc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7da09d66233b..c09146a8946c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -517,7 +517,11 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
 {
 	unsigned char *bitmap;
 	unsigned long bitidx, byte_bitidx;
+#ifdef CONFIG_CPU_V6
+	unsigned long old_byte, byte;
+#else
 	unsigned char old_byte, byte;
+#endif
 
 	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != BITS_PER_BYTE);
 	BUILD_BUG_ON(MIGRATE_TYPES > (1 << PB_migratetype_bits));
@@ -532,9 +536,18 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
 	mask <<= bitidx;
 	flags <<= bitidx;
 
+#ifdef CONFIG_CPU_V6
+	byte = (unsigned long)READ_ONCE(bitmap[byte_bitidx]);
+#else
 	byte = READ_ONCE(bitmap[byte_bitidx]);
+#endif
 	for (;;) {
+#ifdef CONFIG_CPU_V6
+		/* arm v6 has no cmpxchgb function, so still false sharing long word */
+		old_byte = cmpxchg((unsigned long*)&bitmap[byte_bitidx], byte, (byte & ~mask) | flags);
+#else
 		old_byte = cmpxchg(&bitmap[byte_bitidx], byte, (byte & ~mask) | flags);
+#endif
 		if (byte == old_byte)
 			break;
 		byte = old_byte;
On Sun, Aug 30, 2020 at 06:14:33PM +0800, Alex Shi wrote:
> +++ b/mm/page_alloc.c
> @@ -532,9 +536,18 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
>  	mask <<= bitidx;
>  	flags <<= bitidx;
>
> +#ifdef CONFIG_CPU_V6
> +	byte = (unsigned long)READ_ONCE(bitmap[byte_bitidx]);
> +#else
>  	byte = READ_ONCE(bitmap[byte_bitidx]);
> +#endif
>  	for (;;) {
> +#ifdef CONFIG_CPU_V6
> +		/* arm v6 has no cmpxchgb function, so still false sharing long word */
> +		old_byte = cmpxchg((unsigned long*)&bitmap[byte_bitidx], byte, (byte & ~mask) | flags);
> +#else
>  		old_byte = cmpxchg(&bitmap[byte_bitidx], byte, (byte & ~mask) | flags);
> +#endif

Good grief, no. Either come up with an appropriate abstraction or
abandon this patch. We can't possibly put this kind of ifdef in the
memory allocator!
On 2020/8/30 6:18 PM, Matthew Wilcox wrote:
> On Sun, Aug 30, 2020 at 06:14:33PM +0800, Alex Shi wrote:
>> +++ b/mm/page_alloc.c
>> @@ -532,9 +536,18 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
>>  	mask <<= bitidx;
>>  	flags <<= bitidx;
>>
>> +#ifdef CONFIG_CPU_V6
>> +	byte = (unsigned long)READ_ONCE(bitmap[byte_bitidx]);
>> +#else
>>  	byte = READ_ONCE(bitmap[byte_bitidx]);
>> +#endif
>>  	for (;;) {
>> +#ifdef CONFIG_CPU_V6
>> +		/* arm v6 has no cmpxchgb function, so still false sharing long word */
>> +		old_byte = cmpxchg((unsigned long*)&bitmap[byte_bitidx], byte, (byte & ~mask) | flags);
>> +#else
>>  		old_byte = cmpxchg(&bitmap[byte_bitidx], byte, (byte & ~mask) | flags);
>> +#endif
>
> Good grief, no. Either come up with an appropriate abstraction or
> abandon this patch. We can't possibly put this kind of ifdef in the
> memory allocator!
>

Hi Matthew,

Thanks a lot for the comments!

How about the following patch?

From 5f61b91351461084c5bb410025965a3b4d2f7206 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linux.alibaba.com>
Date: Mon, 31 Aug 2020 15:41:20 +0800
Subject: [PATCH 3/3] mm/armv6: work around armv6 cmpxchg support issue

Armv6 can not simulate cmpxchg1 func, so we have to use cmpxchg4 on it.

arm-linux-gnueabi-ld: mm/page_alloc.o: in function `set_pfnblock_flags_mask':
(.text+0x32b4): undefined reference to `__bad_cmpxchg'
arm-linux-gnueabi-ld: (.text+0x32e0): undefined reference to `__bad_cmpxchg'
arm-linux-gnueabi-ld: drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.o: in function `hw_atl_b0_get_mac_temp':
hw_atl_b0.c:(.text+0x30fc): undefined reference to `__bad_udelay'

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
---
 include/linux/mmzone.h | 15 ++++++++++++---
 mm/page_alloc.c        | 24 ++++++++++++------------
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index be676e659fb7..c1bb904bcad8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -406,6 +406,15 @@ enum zone_type {
 
 #ifndef __GENERATING_BOUNDS_H
 
+#ifdef CONFIG_CPU_V6
+/* cmpxchg only support 32-bits operands on ARMv6. */
+typedef unsigned long pageblockflags_t;
+#define BITS_PER_FLAGS BITS_PER_LONG
+#else
+typedef unsigned char pageblockflags_t;
+#define BITS_PER_FLAGS BITS_PER_BYTE
+#endif
+
 struct zone {
 	/* Read-mostly fields */
 
@@ -437,7 +446,7 @@ struct zone {
 	 * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
 	 * In SPARSEMEM, this map is stored in struct mem_section
 	 */
-	unsigned char *pageblock_flags;
+	pageblockflags_t *pageblock_flags;
 #endif /* CONFIG_SPARSEMEM */
 
 	/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
@@ -1159,7 +1168,7 @@ struct mem_section_usage {
 	DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
 #endif
 	/* See declaration of similar field in struct zone */
-	unsigned char pageblock_flags[0];
+	pageblockflags_t pageblock_flags[0];
 };
 
 void subsection_map_init(unsigned long pfn, unsigned long nr_pages);
@@ -1212,7 +1221,7 @@ struct mem_section {
 extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
 #endif
 
-static inline unsigned char *section_to_usemap(struct mem_section *ms)
+static inline pageblockflags_t *section_to_usemap(struct mem_section *ms)
 {
 	return ms->usage->pageblock_flags;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 60342e764090..9a41c5dc78eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -445,7 +445,7 @@ static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
 #endif
 
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
-static inline unsigned char *get_pageblock_bitmap(struct page *page,
+static inline pageblockflags_t *get_pageblock_bitmap(struct page *page,
 							unsigned long pfn)
 {
 #ifdef CONFIG_SPARSEMEM
@@ -474,24 +474,24 @@ static inline int pfn_to_bitidx(struct page *page, unsigned long pfn)
  * Return: pageblock_bits flags
  */
 static __always_inline
-unsigned char __get_pfnblock_flags_mask(struct page *page,
+pageblockflags_t __get_pfnblock_flags_mask(struct page *page,
 					unsigned long pfn,
 					unsigned long mask)
 {
-	unsigned char *bitmap;
+	pageblockflags_t *bitmap;
 	unsigned long bitidx, byte_bitidx;
-	unsigned char byte;
+	pageblockflags_t byte;
 
 	bitmap = get_pageblock_bitmap(page, pfn);
 	bitidx = pfn_to_bitidx(page, pfn);
-	byte_bitidx = bitidx / BITS_PER_BYTE;
-	bitidx &= (BITS_PER_BYTE-1);
+	byte_bitidx = bitidx / BITS_PER_FLAGS;
+	bitidx &= (BITS_PER_FLAGS - 1);
 
 	byte = bitmap[byte_bitidx];
 	return (byte >> bitidx) & mask;
 }
 
-unsigned char get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
+pageblockflags_t get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
 					unsigned long mask)
 {
 	return __get_pfnblock_flags_mask(page, pfn, mask);
@@ -513,17 +513,17 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
 					unsigned long pfn,
 					unsigned long mask)
 {
-	unsigned char *bitmap;
+	pageblockflags_t *bitmap;
 	unsigned long bitidx, byte_bitidx;
-	unsigned char old_byte, byte;
+	pageblockflags_t old_byte, byte;
 
-	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != BITS_PER_BYTE);
+	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != BITS_PER_FLAGS);
 	BUILD_BUG_ON(MIGRATE_TYPES > (1 << PB_migratetype_bits));
 
 	bitmap = get_pageblock_bitmap(page, pfn);
 	bitidx = pfn_to_bitidx(page, pfn);
-	byte_bitidx = bitidx / BITS_PER_BYTE;
-	bitidx &= (BITS_PER_BYTE-1);
+	byte_bitidx = bitidx / BITS_PER_FLAGS;
+	bitidx &= (BITS_PER_FLAGS - 1);
 
 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0ed520954843..c92d6d24527d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -438,7 +438,7 @@ struct zone {
 	 * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
 	 * In SPARSEMEM, this map is stored in struct mem_section
 	 */
-	unsigned long *pageblock_flags;
+	unsigned char *pageblock_flags;
 #endif /* CONFIG_SPARSEMEM */
 
 	/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
@@ -1159,7 +1159,7 @@ struct mem_section_usage {
 	DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
 #endif
 	/* See declaration of similar field in struct zone */
-	unsigned long pageblock_flags[0];
+	unsigned char pageblock_flags[0];
 };
 
 void subsection_map_init(unsigned long pfn, unsigned long nr_pages);
@@ -1212,7 +1212,7 @@ struct mem_section {
 extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
 #endif
 
-static inline unsigned long *section_to_usemap(struct mem_section *ms)
+static inline unsigned char *section_to_usemap(struct mem_section *ms)
 {
 	return ms->usage->pageblock_flags;
 }
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index fff52ad370c1..d189441568eb 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -54,7 +54,7 @@ enum pageblock_bits {
 /* Forward declaration */
 struct page;
 
-unsigned long get_pfnblock_flags_mask(struct page *page,
+unsigned char get_pfnblock_flags_mask(struct page *page,
 				unsigned long pfn,
 				unsigned long mask);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c0b0b32de50e..f60071e8a4e1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -445,7 +445,7 @@ static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
 #endif
 
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
-static inline unsigned long *get_pageblock_bitmap(struct page *page,
+static inline unsigned char *get_pageblock_bitmap(struct page *page,
 							unsigned long pfn)
 {
 #ifdef CONFIG_SPARSEMEM
@@ -474,24 +474,24 @@ static inline int pfn_to_bitidx(struct page *page, unsigned long pfn)
  * Return: pageblock_bits flags
  */
 static __always_inline
-unsigned long __get_pfnblock_flags_mask(struct page *page,
+unsigned char __get_pfnblock_flags_mask(struct page *page,
 					unsigned long pfn,
 					unsigned long mask)
 {
-	unsigned long *bitmap;
-	unsigned long bitidx, word_bitidx;
-	unsigned long word;
+	unsigned char *bitmap;
+	unsigned long bitidx, byte_bitidx;
+	unsigned char byte;
 
 	bitmap = get_pageblock_bitmap(page, pfn);
 	bitidx = pfn_to_bitidx(page, pfn);
-	word_bitidx = bitidx / BITS_PER_LONG;
-	bitidx &= (BITS_PER_LONG-1);
+	byte_bitidx = bitidx / BITS_PER_BYTE;
+	bitidx &= (BITS_PER_BYTE-1);
 
-	word = bitmap[word_bitidx];
-	return (word >> bitidx) & mask;
+	byte = bitmap[byte_bitidx];
+	return (byte >> bitidx) & mask;
 }
 
-unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
+unsigned char get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
 					unsigned long mask)
 {
 	return __get_pfnblock_flags_mask(page, pfn, mask);
@@ -513,29 +513,29 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
 					unsigned long pfn,
 					unsigned long mask)
 {
-	unsigned long *bitmap;
-	unsigned long bitidx, word_bitidx;
-	unsigned long old_word, word;
+	unsigned char *bitmap;
+	unsigned long bitidx, byte_bitidx;
+	unsigned char old_byte, byte;
 
 	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
 	BUILD_BUG_ON(MIGRATE_TYPES > (1 << PB_migratetype_bits));
 
 	bitmap = get_pageblock_bitmap(page, pfn);
 	bitidx = pfn_to_bitidx(page, pfn);
-	word_bitidx = bitidx / BITS_PER_LONG;
-	bitidx &= (BITS_PER_LONG-1);
+	byte_bitidx = bitidx / BITS_PER_BYTE;
+	bitidx &= (BITS_PER_BYTE-1);
 
 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
 
 	mask <<= bitidx;
 	flags <<= bitidx;
 
-	word = READ_ONCE(bitmap[word_bitidx]);
+	byte = READ_ONCE(bitmap[byte_bitidx]);
 	for (;;) {
-		old_word = cmpxchg(&bitmap[word_bitidx], word, (word & ~mask) | flags);
-		if (word == old_word)
+		old_byte = cmpxchg(&bitmap[byte_bitidx], byte, (byte & ~mask) | flags);
+		if (byte == old_byte)
 			break;
-		word = old_word;
+		byte = old_byte;
 	}
 }
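For readers who want to experiment with the byte-wide retry loop outside the kernel, here is a minimal userspace sketch using C11 `<stdatomic.h>` in place of the kernel's `cmpxchg()` and `READ_ONCE()`. It assumes the 4-bits-per-pageblock layout that the diff asserts with `BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4)`. The names `set_block_flags`, `get_block_flags` and `FLAG_BITS` are hypothetical, and masking `flags` with `mask` before shifting is a defensive addition that the kernel function does not perform. This is an illustration of the technique, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FLAG_BITS 4u		/* hypothetical name: flag bits per pageblock */
#define BITS_PER_BYTE 8u

/* Update the flag bits of one block with a byte-sized compare-and-swap. */
void set_block_flags(_Atomic unsigned char *bitmap, size_t block,
		     unsigned char flags, unsigned char mask)
{
	size_t bitidx = block * FLAG_BITS;
	size_t byte_idx = bitidx / BITS_PER_BYTE;
	unsigned int shift = (unsigned int)(bitidx % BITS_PER_BYTE);
	unsigned char m, f, old;

	assert(mask <= 0xF);
	m = (unsigned char)(mask << shift);
	/* Defensive: drop flag bits outside the mask (not done in the kernel). */
	f = (unsigned char)((flags & mask) << shift);

	old = atomic_load(&bitmap[byte_idx]);
	/* On failure the CAS stores the current byte in `old`, so just retry. */
	while (!atomic_compare_exchange_weak(&bitmap[byte_idx], &old,
			(unsigned char)((old & ~m) | f)))
		;
}

/* Read back the masked flag bits of one block. */
unsigned char get_block_flags(_Atomic unsigned char *bitmap, size_t block,
			      unsigned char mask)
{
	size_t bitidx = block * FLAG_BITS;
	unsigned char byte = atomic_load(&bitmap[bitidx / BITS_PER_BYTE]);

	return (unsigned char)((byte >> (bitidx % BITS_PER_BYTE)) & mask);
}
```

Because each compare-and-swap covers one byte, a concurrent update can only force a retry when it touches the other block packed into the same byte, which is the effect the patch is after.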
pageblock_flags is used as long, since every pageblock_flags is just 4
bits, 'long' size will include 8(32bit machine) or 16 pageblocks' flags,
that flag setting has to sync in cmpxchg with 7 or 15 other pageblock
flags. It would cause long waiting for sync.

If we could change the pageblock_flags variable as char, we could use
char size cmpxchg, which just sync up with 2 pageblock flags. it could
relief much false sharing in cmpxchg.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/mmzone.h          |  6 +++---
 include/linux/pageblock-flags.h |  2 +-
 mm/page_alloc.c                 | 38 +++++++++++++++++++-------------------
 3 files changed, 23 insertions(+), 23 deletions(-)
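The counts in the commit message follow from simple arithmetic: with 4 flag bits per pageblock, the number of pageblocks whose flags share one cmpxchg operand is the operand width in bits divided by 4. The sketch below spells this out. The names `blocks_per_unit`, `neighbours_per_unit` and `PAGEBLOCK_FLAG_BITS` are hypothetical, and the 4-bit figure is taken from the commit message rather than from a kernel header.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define PAGEBLOCK_FLAG_BITS 4u	/* per the commit message; hypothetical name */

/* Number of pageblocks whose flags fit in one cmpxchg operand. */
size_t blocks_per_unit(size_t unit_bytes)
{
	assert(unit_bytes > 0);
	return unit_bytes * CHAR_BIT / PAGEBLOCK_FLAG_BITS;
}

/* Number of other pageblocks an update to one block has to sync with. */
size_t neighbours_per_unit(size_t unit_bytes)
{
	return blocks_per_unit(unit_bytes) - 1;
}
```

For a 4-byte long this gives 8 blocks (7 neighbours), for an 8-byte long 16 blocks (15 neighbours), and for a single byte 2 blocks (1 neighbour), matching the figures quoted in the message on platforms where `CHAR_BIT` is 8.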