From patchwork Wed Apr 7 20:24:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189287 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86A52C433B4 for ; Wed, 7 Apr 2021 20:24:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1F9006100A for ; Wed, 7 Apr 2021 20:24:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1F9006100A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id ABAD56B0078; Wed, 7 Apr 2021 16:24:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A91BF6B007D; Wed, 7 Apr 2021 16:24:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 932846B007E; Wed, 7 Apr 2021 16:24:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0179.hostedemail.com [216.40.44.179]) by kanga.kvack.org (Postfix) with ESMTP id 719876B0078 for ; Wed, 7 Apr 2021 16:24:47 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2B3838248047 for ; Wed, 7 Apr 2021 20:24:47 +0000 (UTC) X-FDA: 78006699414.07.2849488 Received: from outbound-smtp38.blacknight.com (outbound-smtp38.blacknight.com [46.22.139.221]) by imf01.hostedemail.com (Postfix) with ESMTP id F03DE500152B for ; Wed, 7 Apr 2021 20:24:45 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp38.blacknight.com (Postfix) with ESMTPS id 183B51B17 for ; Wed, 7 Apr 2021 21:24:45 +0100 (IST) Received: (qmail 14034 invoked from network); 7 Apr 2021 20:24:44 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:24:44 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 01/11] mm/page_alloc: Split per cpu page lists and zone stats Date: Wed, 7 Apr 2021 21:24:13 +0100 Message-Id: <20210407202423.16022-2-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Stat-Signature: 3puxtgeec41chgtzhfrabnme89ffbchd X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: F03DE500152B Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf01; identity=mailfrom; envelope-from=""; helo=outbound-smtp38.blacknight.com; client-ip=46.22.139.221 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827085-312336 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The per-cpu page allocator lists and the per-cpu vmstat deltas are stored in the same struct per_cpu_pages even though vmstats have no direct impact on the per-cpu page lists. This is inconsistent because the vmstats for a node are stored on a dedicated structure. The bigger issue is that the per_cpu_pages structure is not cache-aligned and stat updates either cache conflict with adjacent per-cpu lists incurring a runtime cost or padding is required incurring a memory cost. This patch splits the per-cpu pagelists and the vmstat deltas into separate structures. It's mostly a mechanical conversion but some variable renaming is done to clearly distinguish the per-cpu pages structure (pcp) from the vmstats (pzstats). Superficially, this appears to increase the size of the per_cpu_pages structure but the movement of expire fills a structure hole so there is no impact overall. [lkp@intel.com: Check struct per_cpu_zonestat has a non-zero size] Signed-off-by: Mel Gorman --- include/linux/mmzone.h | 18 ++++---- include/linux/vmstat.h | 8 ++-- mm/page_alloc.c | 84 +++++++++++++++++++----------------- mm/vmstat.c | 96 ++++++++++++++++++++++-------------------- 4 files changed, 110 insertions(+), 96 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 47946cec7584..a4393ac27336 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -341,20 +341,21 @@ struct per_cpu_pages { int count; /* number of pages in the list */ int high; /* high watermark, emptying needed */ int batch; /* chunk size for buddy add/remove */ +#ifdef CONFIG_NUMA + int expire; /* When 0, remote pagesets are drained */ +#endif /* Lists of pages, one per migrate type stored on the pcp-lists */ struct list_head lists[MIGRATE_PCPTYPES]; }; -struct per_cpu_pageset { - struct per_cpu_pages pcp; -#ifdef CONFIG_NUMA - s8 expire; - u16 vm_numa_stat_diff[NR_VM_NUMA_STAT_ITEMS]; -#endif +struct per_cpu_zonestat { #ifdef CONFIG_SMP - s8 stat_threshold; s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS]; + s8 stat_threshold; +#endif +#ifdef CONFIG_NUMA + u16 vm_numa_stat_diff[NR_VM_NUMA_STAT_ITEMS]; #endif }; @@ -470,7 +471,8 @@ struct zone { int node; #endif struct pglist_data *zone_pgdat; - struct per_cpu_pageset __percpu *pageset; + struct per_cpu_pages __percpu *per_cpu_pageset; + struct per_cpu_zonestat __percpu *per_cpu_zonestats; /* * the high and batch values are copied to individual pagesets for * faster access diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 506d625163a1..1736ea9d24a7 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -163,7 +163,7 @@ static inline unsigned long zone_numa_state_snapshot(struct zone *zone, int cpu; for_each_online_cpu(cpu) - x += per_cpu_ptr(zone->pageset, cpu)->vm_numa_stat_diff[item]; + x += per_cpu_ptr(zone->per_cpu_zonestats, cpu)->vm_numa_stat_diff[item]; return x; } @@ -236,7 +236,7 @@ static inline unsigned long zone_page_state_snapshot(struct zone *zone, #ifdef CONFIG_SMP int cpu; for_each_online_cpu(cpu) - x += per_cpu_ptr(zone->pageset, cpu)->vm_stat_diff[item]; + x += per_cpu_ptr(zone->per_cpu_zonestats, cpu)->vm_stat_diff[item]; if (x < 0) x = 0; @@ -291,7 +291,7 @@ struct ctl_table; int vmstat_refresh(struct ctl_table *, int write, void *buffer, size_t *lenp, loff_t *ppos); -void drain_zonestat(struct zone *zone, struct per_cpu_pageset *); +void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *); int calculate_pressure_threshold(struct zone *zone); int calculate_normal_threshold(struct zone *zone); @@ -399,7 +399,7 @@ static inline void cpu_vm_stats_fold(int cpu) { } static inline void quiet_vmstat(void) { } static inline void drain_zonestat(struct zone *zone, - struct per_cpu_pageset *pset) { } + struct per_cpu_zonestat *pzstats) { } #endif /* CONFIG_SMP */ static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5e8aedb64b57..a68bacddcae0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2981,15 +2981,14 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) static void drain_pages_zone(unsigned int cpu, struct zone *zone) { unsigned long flags; - struct per_cpu_pageset *pset; struct per_cpu_pages *pcp; local_irq_save(flags); - pset = per_cpu_ptr(zone->pageset, cpu); - pcp = &pset->pcp; + pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); if (pcp->count) free_pcppages_bulk(zone, pcp->count, pcp); + local_irq_restore(flags); } @@ -3088,7 +3087,7 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus) * disables preemption as part of its processing */ for_each_online_cpu(cpu) { - struct per_cpu_pageset *pcp; + struct per_cpu_pages *pcp; struct zone *z; bool has_pcps = false; @@ -3099,13 +3098,13 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus) */ has_pcps = true; } else if (zone) { - pcp = per_cpu_ptr(zone->pageset, cpu); - if (pcp->pcp.count) + pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); + if (pcp->count) has_pcps = true; } else { for_each_populated_zone(z) { - pcp = per_cpu_ptr(z->pageset, cpu); - if (pcp->pcp.count) { + pcp = per_cpu_ptr(z->per_cpu_pageset, cpu); + if (pcp->count) { has_pcps = true; break; } @@ -3235,7 +3234,7 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn) migratetype = MIGRATE_MOVABLE; } - pcp = &this_cpu_ptr(zone->pageset)->pcp; + pcp = this_cpu_ptr(zone->per_cpu_pageset); list_add(&page->lru, &pcp->lists[migratetype]); pcp->count++; if (pcp->count >= READ_ONCE(pcp->high)) @@ -3451,7 +3450,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, unsigned long flags; local_irq_save(flags); - pcp = &this_cpu_ptr(zone->pageset)->pcp; + pcp = this_cpu_ptr(zone->per_cpu_pageset); list = &pcp->lists[migratetype]; page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list); if (page) { @@ -5054,7 +5053,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, /* Attempt the batch allocation */ local_irq_save(flags); - pcp = &this_cpu_ptr(zone->pageset)->pcp; + pcp = this_cpu_ptr(zone->per_cpu_pageset); pcp_list = &pcp->lists[ac.migratetype]; while (nr_populated < nr_pages) { @@ -5667,7 +5666,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask) continue; for_each_online_cpu(cpu) - free_pcp += per_cpu_ptr(zone->pageset, cpu)->pcp.count; + free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count; } printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n" @@ -5759,7 +5758,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask) free_pcp = 0; for_each_online_cpu(cpu) - free_pcp += per_cpu_ptr(zone->pageset, cpu)->pcp.count; + free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count; show_node(zone); printk(KERN_CONT @@ -5800,7 +5799,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask) K(zone_page_state(zone, NR_MLOCK)), K(zone_page_state(zone, NR_BOUNCE)), K(free_pcp), - K(this_cpu_read(zone->pageset->pcp.count)), + K(this_cpu_read(zone->per_cpu_pageset->count)), K(zone_page_state(zone, NR_FREE_CMA_PAGES))); printk("lowmem_reserve[]:"); for (i = 0; i < MAX_NR_ZONES; i++) @@ -6127,11 +6126,12 @@ static void build_zonelists(pg_data_t *pgdat) * not check if the processor is online before following the pageset pointer. * Other parts of the kernel may not check if the zone is available. */ -static void pageset_init(struct per_cpu_pageset *p); +static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonestat *pzstats); /* These effectively disable the pcplists in the boot pageset completely */ #define BOOT_PAGESET_HIGH 0 #define BOOT_PAGESET_BATCH 1 -static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset); +static DEFINE_PER_CPU(struct per_cpu_pages, boot_pageset); +static DEFINE_PER_CPU(struct per_cpu_zonestat, boot_zonestats); static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); static void __build_all_zonelists(void *data) @@ -6198,7 +6198,7 @@ build_all_zonelists_init(void) * (a chicken-egg dilemma). */ for_each_possible_cpu(cpu) - pageset_init(&per_cpu(boot_pageset, cpu)); + per_cpu_pages_init(&per_cpu(boot_pageset, cpu), &per_cpu(boot_zonestats, cpu)); mminit_verify_zonelist(); cpuset_init_current_mems_allowed(); @@ -6576,14 +6576,13 @@ static void pageset_update(struct per_cpu_pages *pcp, unsigned long high, WRITE_ONCE(pcp->high, high); } -static void pageset_init(struct per_cpu_pageset *p) +static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonestat *pzstats) { - struct per_cpu_pages *pcp; int migratetype; - memset(p, 0, sizeof(*p)); + memset(pcp, 0, sizeof(*pcp)); + memset(pzstats, 0, sizeof(*pzstats)); - pcp = &p->pcp; for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++) INIT_LIST_HEAD(&pcp->lists[migratetype]); @@ -6600,12 +6599,12 @@ static void pageset_init(struct per_cpu_pageset *p) static void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long high, unsigned long batch) { - struct per_cpu_pageset *p; + struct per_cpu_pages *pcp; int cpu; for_each_possible_cpu(cpu) { - p = per_cpu_ptr(zone->pageset, cpu); - pageset_update(&p->pcp, high, batch); + pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); + pageset_update(pcp, high, batch); } } @@ -6640,13 +6639,20 @@ static void zone_set_pageset_high_and_batch(struct zone *zone) void __meminit setup_zone_pageset(struct zone *zone) { - struct per_cpu_pageset *p; int cpu; - zone->pageset = alloc_percpu(struct per_cpu_pageset); + /* Size may be 0 on !SMP && !NUMA */ + if (sizeof(struct per_cpu_zonestat) > 0) + zone->per_cpu_zonestats = alloc_percpu(struct per_cpu_zonestat); + + zone->per_cpu_pageset = alloc_percpu(struct per_cpu_pages); for_each_possible_cpu(cpu) { - p = per_cpu_ptr(zone->pageset, cpu); - pageset_init(p); + struct per_cpu_pages *pcp; + struct per_cpu_zonestat *pzstats; + + pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); + pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); + per_cpu_pages_init(pcp, pzstats); } zone_set_pageset_high_and_batch(zone); @@ -6673,9 +6679,9 @@ void __init setup_per_cpu_pageset(void) * the nodes these zones are associated with. */ for_each_possible_cpu(cpu) { - struct per_cpu_pageset *pcp = &per_cpu(boot_pageset, cpu); - memset(pcp->vm_numa_stat_diff, 0, - sizeof(pcp->vm_numa_stat_diff)); + struct per_cpu_zonestat *pzstats = &per_cpu(boot_zonestats, cpu); + memset(pzstats->vm_numa_stat_diff, 0, + sizeof(pzstats->vm_numa_stat_diff)); } #endif @@ -6691,7 +6697,7 @@ static __meminit void zone_pcp_init(struct zone *zone) * relies on the ability of the linker to provide the * offset of a (static) per cpu variable into the per cpu area. */ - zone->pageset = &boot_pageset; + zone->per_cpu_pageset = &boot_pageset; zone->pageset_high = BOOT_PAGESET_HIGH; zone->pageset_batch = BOOT_PAGESET_BATCH; @@ -8954,17 +8960,19 @@ void zone_pcp_reset(struct zone *zone) { unsigned long flags; int cpu; - struct per_cpu_pageset *pset; + struct per_cpu_zonestat *pzstats; /* avoid races with drain_pages() */ local_irq_save(flags); - if (zone->pageset != &boot_pageset) { + if (zone->per_cpu_pageset != &boot_pageset) { for_each_online_cpu(cpu) { - pset = per_cpu_ptr(zone->pageset, cpu); - drain_zonestat(zone, pset); + pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); + drain_zonestat(zone, pzstats); } - free_percpu(zone->pageset); - zone->pageset = &boot_pageset; + free_percpu(zone->per_cpu_pageset); + free_percpu(zone->per_cpu_zonestats); + zone->per_cpu_pageset = &boot_pageset; + zone->per_cpu_zonestats = &boot_zonestats; } local_irq_restore(flags); } diff --git a/mm/vmstat.c b/mm/vmstat.c index 74b2c374b86c..8a8f1a26b231 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -44,7 +44,7 @@ static void zero_zone_numa_counters(struct zone *zone) for (item = 0; item < NR_VM_NUMA_STAT_ITEMS; item++) { atomic_long_set(&zone->vm_numa_stat[item], 0); for_each_online_cpu(cpu) - per_cpu_ptr(zone->pageset, cpu)->vm_numa_stat_diff[item] + per_cpu_ptr(zone->per_cpu_zonestats, cpu)->vm_numa_stat_diff[item] = 0; } } @@ -266,7 +266,7 @@ void refresh_zone_stat_thresholds(void) for_each_online_cpu(cpu) { int pgdat_threshold; - per_cpu_ptr(zone->pageset, cpu)->stat_threshold + per_cpu_ptr(zone->per_cpu_zonestats, cpu)->stat_threshold = threshold; /* Base nodestat threshold on the largest populated zone. */ @@ -303,7 +303,7 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat, threshold = (*calculate_pressure)(zone); for_each_online_cpu(cpu) - per_cpu_ptr(zone->pageset, cpu)->stat_threshold + per_cpu_ptr(zone->per_cpu_zonestats, cpu)->stat_threshold = threshold; } } @@ -316,7 +316,7 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat, void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item, long delta) { - struct per_cpu_pageset __percpu *pcp = zone->pageset; + struct per_cpu_zonestat __percpu *pcp = zone->per_cpu_zonestats; s8 __percpu *p = pcp->vm_stat_diff + item; long x; long t; @@ -389,7 +389,7 @@ EXPORT_SYMBOL(__mod_node_page_state); */ void __inc_zone_state(struct zone *zone, enum zone_stat_item item) { - struct per_cpu_pageset __percpu *pcp = zone->pageset; + struct per_cpu_zonestat __percpu *pcp = zone->per_cpu_zonestats; s8 __percpu *p = pcp->vm_stat_diff + item; s8 v, t; @@ -435,7 +435,7 @@ EXPORT_SYMBOL(__inc_node_page_state); void __dec_zone_state(struct zone *zone, enum zone_stat_item item) { - struct per_cpu_pageset __percpu *pcp = zone->pageset; + struct per_cpu_zonestat __percpu *pcp = zone->per_cpu_zonestats; s8 __percpu *p = pcp->vm_stat_diff + item; s8 v, t; @@ -495,7 +495,7 @@ EXPORT_SYMBOL(__dec_node_page_state); static inline void mod_zone_state(struct zone *zone, enum zone_stat_item item, long delta, int overstep_mode) { - struct per_cpu_pageset __percpu *pcp = zone->pageset; + struct per_cpu_zonestat __percpu *pcp = zone->per_cpu_zonestats; s8 __percpu *p = pcp->vm_stat_diff + item; long o, n, t, z; @@ -781,19 +781,20 @@ static int refresh_cpu_vm_stats(bool do_pagesets) int changes = 0; for_each_populated_zone(zone) { - struct per_cpu_pageset __percpu *p = zone->pageset; + struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats; + struct per_cpu_pages __percpu *pcp = zone->per_cpu_pageset; for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) { int v; - v = this_cpu_xchg(p->vm_stat_diff[i], 0); + v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0); if (v) { atomic_long_add(v, &zone->vm_stat[i]); global_zone_diff[i] += v; #ifdef CONFIG_NUMA /* 3 seconds idle till flush */ - __this_cpu_write(p->expire, 3); + __this_cpu_write(pcp->expire, 3); #endif } } @@ -801,12 +802,12 @@ static int refresh_cpu_vm_stats(bool do_pagesets) for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) { int v; - v = this_cpu_xchg(p->vm_numa_stat_diff[i], 0); + v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0); if (v) { atomic_long_add(v, &zone->vm_numa_stat[i]); global_numa_diff[i] += v; - __this_cpu_write(p->expire, 3); + __this_cpu_write(pcp->expire, 3); } } @@ -819,23 +820,23 @@ static int refresh_cpu_vm_stats(bool do_pagesets) * Check if there are pages remaining in this pageset * if not then there is nothing to expire. */ - if (!__this_cpu_read(p->expire) || - !__this_cpu_read(p->pcp.count)) + if (!__this_cpu_read(pcp->expire) || + !__this_cpu_read(pcp->count)) continue; /* * We never drain zones local to this processor. */ if (zone_to_nid(zone) == numa_node_id()) { - __this_cpu_write(p->expire, 0); + __this_cpu_write(pcp->expire, 0); continue; } - if (__this_cpu_dec_return(p->expire)) + if (__this_cpu_dec_return(pcp->expire)) continue; - if (__this_cpu_read(p->pcp.count)) { - drain_zone_pages(zone, this_cpu_ptr(&p->pcp)); + if (__this_cpu_read(pcp->count)) { + drain_zone_pages(zone, this_cpu_ptr(pcp)); changes++; } } @@ -882,27 +883,27 @@ void cpu_vm_stats_fold(int cpu) int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, }; for_each_populated_zone(zone) { - struct per_cpu_pageset *p; + struct per_cpu_zonestat *pzstats; - p = per_cpu_ptr(zone->pageset, cpu); + pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) - if (p->vm_stat_diff[i]) { + if (pzstats->vm_stat_diff[i]) { int v; - v = p->vm_stat_diff[i]; - p->vm_stat_diff[i] = 0; + v = pzstats->vm_stat_diff[i]; + pzstats->vm_stat_diff[i] = 0; atomic_long_add(v, &zone->vm_stat[i]); global_zone_diff[i] += v; } #ifdef CONFIG_NUMA for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) - if (p->vm_numa_stat_diff[i]) { + if (pzstats->vm_numa_stat_diff[i]) { int v; - v = p->vm_numa_stat_diff[i]; - p->vm_numa_stat_diff[i] = 0; + v = pzstats->vm_numa_stat_diff[i]; + pzstats->vm_numa_stat_diff[i] = 0; atomic_long_add(v, &zone->vm_numa_stat[i]); global_numa_diff[i] += v; } @@ -936,24 +937,24 @@ void cpu_vm_stats_fold(int cpu) * this is only called if !populated_zone(zone), which implies no other users of * pset->vm_stat_diff[] exsist. */ -void drain_zonestat(struct zone *zone, struct per_cpu_pageset *pset) +void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats) { int i; for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) - if (pset->vm_stat_diff[i]) { - int v = pset->vm_stat_diff[i]; - pset->vm_stat_diff[i] = 0; + if (pzstats->vm_stat_diff[i]) { + int v = pzstats->vm_stat_diff[i]; + pzstats->vm_stat_diff[i] = 0; atomic_long_add(v, &zone->vm_stat[i]); atomic_long_add(v, &vm_zone_stat[i]); } #ifdef CONFIG_NUMA for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) - if (pset->vm_numa_stat_diff[i]) { - int v = pset->vm_numa_stat_diff[i]; + if (pzstats->vm_numa_stat_diff[i]) { + int v = pzstats->vm_numa_stat_diff[i]; - pset->vm_numa_stat_diff[i] = 0; + pzstats->vm_numa_stat_diff[i] = 0; atomic_long_add(v, &zone->vm_numa_stat[i]); atomic_long_add(v, &vm_numa_stat[i]); } @@ -965,8 +966,8 @@ void drain_zonestat(struct zone *zone, struct per_cpu_pageset *pset) void __inc_numa_state(struct zone *zone, enum numa_stat_item item) { - struct per_cpu_pageset __percpu *pcp = zone->pageset; - u16 __percpu *p = pcp->vm_numa_stat_diff + item; + struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats; + u16 __percpu *p = pzstats->vm_numa_stat_diff + item; u16 v; v = __this_cpu_inc_return(*p); @@ -1685,21 +1686,23 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, seq_printf(m, "\n pagesets"); for_each_online_cpu(i) { - struct per_cpu_pageset *pageset; + struct per_cpu_pages *pcp; + struct per_cpu_zonestat *pzstats; - pageset = per_cpu_ptr(zone->pageset, i); + pcp = per_cpu_ptr(zone->per_cpu_pageset, i); + pzstats = per_cpu_ptr(zone->per_cpu_zonestats, i); seq_printf(m, "\n cpu: %i" "\n count: %i" "\n high: %i" "\n batch: %i", i, - pageset->pcp.count, - pageset->pcp.high, - pageset->pcp.batch); + pcp->count, + pcp->high, + pcp->batch); #ifdef CONFIG_SMP seq_printf(m, "\n vm stats threshold: %d", - pageset->stat_threshold); + pzstats->stat_threshold); #endif } seq_printf(m, @@ -1910,17 +1913,18 @@ static bool need_update(int cpu) struct zone *zone; for_each_populated_zone(zone) { - struct per_cpu_pageset *p = per_cpu_ptr(zone->pageset, cpu); + struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); struct per_cpu_nodestat *n; + /* * The fast way of checking if there are any vmstat diffs. */ - if (memchr_inv(p->vm_stat_diff, 0, NR_VM_ZONE_STAT_ITEMS * - sizeof(p->vm_stat_diff[0]))) + if (memchr_inv(pzstats->vm_stat_diff, 0, NR_VM_ZONE_STAT_ITEMS * + sizeof(pzstats->vm_stat_diff[0]))) return true; #ifdef CONFIG_NUMA - if (memchr_inv(p->vm_numa_stat_diff, 0, NR_VM_NUMA_STAT_ITEMS * - sizeof(p->vm_numa_stat_diff[0]))) + if (memchr_inv(pzstats->vm_numa_stat_diff, 0, NR_VM_NUMA_STAT_ITEMS * + sizeof(pzstats->vm_numa_stat_diff[0]))) return true; #endif if (last_pgdat == zone->zone_pgdat) From patchwork Wed Apr 7 20:24:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189289 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35797C433ED for ; Wed, 7 Apr 2021 20:24:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C301F611EE for ; Wed, 7 Apr 2021 20:24:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C301F611EE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0D01B6B007D; Wed, 7 Apr 2021 16:24:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A6286B007E; Wed, 7 Apr 2021 16:24:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E62886B0080; Wed, 7 Apr 2021 16:24:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0064.hostedemail.com [216.40.44.64]) by kanga.kvack.org (Postfix) with ESMTP id CBEDA6B007D for ; Wed, 7 Apr 2021 16:24:57 -0400 (EDT) Received: from smtpin36.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 9052C81D8 for ; Wed, 7 Apr 2021 20:24:57 +0000 (UTC) X-FDA: 78006699834.36.B907DB9 Received: from outbound-smtp47.blacknight.com (outbound-smtp47.blacknight.com [46.22.136.64]) by imf17.hostedemail.com (Postfix) with ESMTP id 5B0BA40002DB for ; Wed, 7 Apr 2021 20:24:55 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp47.blacknight.com (Postfix) with ESMTPS id D44DCFB3E9 for ; Wed, 7 Apr 2021 21:24:55 +0100 (IST) Received: (qmail 14561 invoked from network); 7 Apr 2021 20:24:55 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:24:55 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 02/11] mm/page_alloc: Convert per-cpu list protection to local_lock Date: Wed, 7 Apr 2021 21:24:14 +0100 Message-Id: <20210407202423.16022-3-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 5B0BA40002DB X-Stat-Signature: t5dhqhcqnehn78gy7eadzpeuuo8nwrhg Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf17; identity=mailfrom; envelope-from=""; helo=outbound-smtp47.blacknight.com; client-ip=46.22.136.64 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827095-335038 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There is a lack of clarity of what exactly local_irq_save/local_irq_restore protects in page_alloc.c . It conflates the protection of per-cpu page allocation structures with per-cpu vmstat deltas. This patch protects the PCP structure using local_lock which for most configurations is identical to IRQ enabling/disabling. The scope of the lock is still wider than it should be but this is decreased laster. [lkp@intel.com: Make pagesets static] Signed-off-by: Mel Gorman --- include/linux/mmzone.h | 2 ++ mm/page_alloc.c | 50 +++++++++++++++++++++++++++++------------- 2 files changed, 37 insertions(+), 15 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index a4393ac27336..106da8fbc72a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -20,6 +20,7 @@ #include #include #include +#include #include /* Free memory management - zoned buddy allocator. */ @@ -337,6 +338,7 @@ enum zone_watermarks { #define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boost) #define wmark_pages(z, i) (z->_watermark[i] + z->watermark_boost) +/* Fields and list protected by pagesets local_lock in page_alloc.c */ struct per_cpu_pages { int count; /* number of pages in the list */ int high; /* high watermark, emptying needed */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a68bacddcae0..e9e60d1a85d4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -112,6 +112,13 @@ typedef int __bitwise fpi_t; static DEFINE_MUTEX(pcp_batch_high_lock); #define MIN_PERCPU_PAGELIST_FRACTION (8) +struct pagesets { + local_lock_t lock; +}; +static DEFINE_PER_CPU(struct pagesets, pagesets) = { + .lock = INIT_LOCAL_LOCK(lock), +}; + #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID DEFINE_PER_CPU(int, numa_node); EXPORT_PER_CPU_SYMBOL(numa_node); @@ -1421,6 +1428,10 @@ static void free_pcppages_bulk(struct zone *zone, int count, } while (--count && --batch_free && !list_empty(list)); } + /* + * local_lock_irq held so equivalent to spin_lock_irqsave for + * both PREEMPT_RT and non-PREEMPT_RT configurations. + */ spin_lock(&zone->lock); isolated_pageblocks = has_isolate_pageblock(zone); @@ -1541,6 +1552,11 @@ static void __free_pages_ok(struct page *page, unsigned int order, return; migratetype = get_pfnblock_migratetype(page, pfn); + + /* + * TODO FIX: Disable IRQs before acquiring IRQ-safe zone->lock + * and protect vmstat updates. + */ local_irq_save(flags); __count_vm_events(PGFREE, 1 << order); free_one_page(page_zone(page), page, pfn, order, migratetype, @@ -2910,6 +2926,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, { int i, allocated = 0; + /* + * local_lock_irq held so equivalent to spin_lock_irqsave for + * both PREEMPT_RT and non-PREEMPT_RT configurations. + */ spin_lock(&zone->lock); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, @@ -2962,12 +2982,12 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) unsigned long flags; int to_drain, batch; - local_irq_save(flags); + local_lock_irqsave(&pagesets.lock, flags); batch = READ_ONCE(pcp->batch); to_drain = min(pcp->count, batch); if (to_drain > 0) free_pcppages_bulk(zone, to_drain, pcp); - local_irq_restore(flags); + local_unlock_irqrestore(&pagesets.lock, flags); } #endif @@ -2983,13 +3003,13 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone) unsigned long flags; struct per_cpu_pages *pcp; - local_irq_save(flags); + local_lock_irqsave(&pagesets.lock, flags); pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); if (pcp->count) free_pcppages_bulk(zone, pcp->count, pcp); - local_irq_restore(flags); + local_unlock_irqrestore(&pagesets.lock, flags); } /* @@ -3252,9 +3272,9 @@ void free_unref_page(struct page *page) if (!free_unref_page_prepare(page, pfn)) return; - local_irq_save(flags); + local_lock_irqsave(&pagesets.lock, flags); free_unref_page_commit(page, pfn); - local_irq_restore(flags); + local_unlock_irqrestore(&pagesets.lock, flags); } /* @@ -3274,7 +3294,7 @@ void free_unref_page_list(struct list_head *list) set_page_private(page, pfn); } - local_irq_save(flags); + local_lock_irqsave(&pagesets.lock, flags); list_for_each_entry_safe(page, next, list, lru) { unsigned long pfn = page_private(page); @@ -3287,12 +3307,12 @@ void free_unref_page_list(struct list_head *list) * a large list of pages to free. */ if (++batch_count == SWAP_CLUSTER_MAX) { - local_irq_restore(flags); + local_unlock_irqrestore(&pagesets.lock, flags); batch_count = 0; - local_irq_save(flags); + local_lock_irqsave(&pagesets.lock, flags); } } - local_irq_restore(flags); + local_unlock_irqrestore(&pagesets.lock, flags); } /* @@ -3449,7 +3469,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, struct page *page; unsigned long flags; - local_irq_save(flags); + local_lock_irqsave(&pagesets.lock, flags); pcp = this_cpu_ptr(zone->per_cpu_pageset); list = &pcp->lists[migratetype]; page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list); @@ -3457,7 +3477,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, __count_zid_vm_events(PGALLOC, page_zonenum(page), 1); zone_statistics(preferred_zone, zone); } - local_irq_restore(flags); + local_unlock_irqrestore(&pagesets.lock, flags); return page; } @@ -5052,7 +5072,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, goto failed; /* Attempt the batch allocation */ - local_irq_save(flags); + local_lock_irqsave(&pagesets.lock, flags); pcp = this_cpu_ptr(zone->per_cpu_pageset); pcp_list = &pcp->lists[ac.migratetype]; @@ -5090,12 +5110,12 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, nr_populated++; } - local_irq_restore(flags); + local_unlock_irqrestore(&pagesets.lock, flags); return nr_populated; failed_irq: - local_irq_restore(flags); + local_unlock_irqrestore(&pagesets.lock, flags); failed: page = __alloc_pages(gfp, 0, preferred_nid, nodemask); From patchwork Wed Apr 7 20:24:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189291 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21FA5C433B4 for ; Wed, 7 Apr 2021 20:25:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C64DB6100A for ; Wed, 7 Apr 2021 20:25:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C64DB6100A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6747D6B007E; Wed, 7 Apr 2021 16:25:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 64B526B0080; Wed, 7 Apr 2021 16:25:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 513746B0081; Wed, 7 Apr 2021 16:25:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0197.hostedemail.com [216.40.44.197]) by kanga.kvack.org (Postfix) with ESMTP id 368A56B007E for ; Wed, 7 Apr 2021 16:25:08 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id EA32C815E for ; Wed, 7 Apr 2021 20:25:07 +0000 (UTC) X-FDA: 78006700254.03.0C49710 Received: from outbound-smtp32.blacknight.com (outbound-smtp32.blacknight.com [81.17.249.64]) by imf20.hostedemail.com (Postfix) with ESMTP id C976F132 for ; Wed, 7 Apr 2021 20:25:04 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp32.blacknight.com (Postfix) with ESMTPS id 2B166BECAC for ; Wed, 7 Apr 2021 21:25:06 +0100 (IST) Received: (qmail 15188 invoked from network); 7 Apr 2021 20:25:06 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:25:05 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 03/11] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove Date: Wed, 7 Apr 2021 21:24:15 +0100 Message-Id: <20210407202423.16022-4-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Rspamd-Queue-Id: C976F132 X-Stat-Signature: wfzepyoebm148yequkwi933hadcxo3g4 X-Rspamd-Server: rspam02 Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf20; identity=mailfrom; envelope-from=""; helo=outbound-smtp32.blacknight.com; client-ip=81.17.249.64 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827104-222328 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: zone_pcp_reset allegedly protects against a race with drain_pages using local_irq_save but this is bogus. local_irq_save only operates on the local CPU. If memory hotplug is running on CPU A and drain_pages is running on CPU B, disabling IRQs on CPU A does not affect CPU B and offers no protection. This patch reorders memory hotremove such that the PCP structures relevant to the zone are no longer reachable by the time the structures are freed. With this reordering, no protection is required to prevent a use-after-free and the IRQs can be left enabled. zone_pcp_reset is renamed to zone_pcp_destroy to make it clear that the per-cpu structures are deleted when the function returns. Signed-off-by: Mel Gorman --- mm/internal.h | 2 +- mm/memory_hotplug.c | 10 +++++++--- mm/page_alloc.c | 22 ++++++++++++++++------ 3 files changed, 24 insertions(+), 10 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 09adf152a10b..cc34ce4461b7 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -203,7 +203,7 @@ extern void free_unref_page(struct page *page); extern void free_unref_page_list(struct list_head *list); extern void zone_pcp_update(struct zone *zone); -extern void zone_pcp_reset(struct zone *zone); +extern void zone_pcp_destroy(struct zone *zone); extern void zone_pcp_disable(struct zone *zone); extern void zone_pcp_enable(struct zone *zone); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0cdbbfbc5757..3d059c9f9c2d 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1687,12 +1687,16 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages) zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages; spin_unlock_irqrestore(&zone->lock, flags); - zone_pcp_enable(zone); - /* removal success */ adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages); zone->present_pages -= nr_pages; + /* + * Restore PCP after managed pages has been updated. Unpopulated + * zones PCP structures will remain unusable. + */ + zone_pcp_enable(zone); + pgdat_resize_lock(zone->zone_pgdat, &flags); zone->zone_pgdat->node_present_pages -= nr_pages; pgdat_resize_unlock(zone->zone_pgdat, &flags); @@ -1700,8 +1704,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages) init_per_zone_wmark_min(); if (!populated_zone(zone)) { - zone_pcp_reset(zone); build_all_zonelists(NULL); + zone_pcp_destroy(zone); } else zone_pcp_update(zone); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e9e60d1a85d4..a8630003612b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8972,18 +8972,29 @@ void zone_pcp_disable(struct zone *zone) void zone_pcp_enable(struct zone *zone) { - __zone_set_pageset_high_and_batch(zone, zone->pageset_high, zone->pageset_batch); + /* + * If the zone is populated, restore the high and batch counts. + * If unpopulated, leave the high and batch count as 0 and 1 + * respectively as done by zone_pcp_disable. The per-cpu + * structures will later be freed by zone_pcp_destroy. + */ + if (populated_zone(zone)) + __zone_set_pageset_high_and_batch(zone, zone->pageset_high, zone->pageset_batch); + mutex_unlock(&pcp_batch_high_lock); } -void zone_pcp_reset(struct zone *zone) +/* + * Called when a zone has been hot-removed. At this point, the PCP has been + * drained, disabled and the zone is removed from the zonelists so the + * structures are no longer in use. PCP was disabled/drained by + * zone_pcp_disable. This function will drain any remaining vmstat deltas. + */ +void zone_pcp_destroy(struct zone *zone) { - unsigned long flags; int cpu; struct per_cpu_zonestat *pzstats; - /* avoid races with drain_pages() */ - local_irq_save(flags); if (zone->per_cpu_pageset != &boot_pageset) { for_each_online_cpu(cpu) { pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); @@ -8994,7 +9005,6 @@ void zone_pcp_reset(struct zone *zone) zone->per_cpu_pageset = &boot_pageset; zone->per_cpu_zonestats = &boot_zonestats; } - local_irq_restore(flags); } #ifdef CONFIG_MEMORY_HOTREMOVE From patchwork Wed Apr 7 20:24:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189293 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7EA2FC43460 for ; Wed, 7 Apr 2021 20:25:19 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EE77661184 for ; Wed, 7 Apr 2021 20:25:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE77661184 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 822B66B0080; Wed, 7 Apr 2021 16:25:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7D17A6B0081; Wed, 7 Apr 2021 16:25:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 673036B0082; Wed, 7 Apr 2021 16:25:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0188.hostedemail.com [216.40.44.188]) by kanga.kvack.org (Postfix) with ESMTP id 488C86B0080 for ; Wed, 7 Apr 2021 16:25:18 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id F040118099AA8 for ; Wed, 7 Apr 2021 20:25:17 +0000 (UTC) X-FDA: 78006700674.22.A048533 Received: from outbound-smtp09.blacknight.com (outbound-smtp09.blacknight.com [46.22.139.14]) by imf19.hostedemail.com (Postfix) with ESMTP id E7D8C90009E2 for ; Wed, 7 Apr 2021 20:25:08 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp09.blacknight.com (Postfix) with ESMTPS id 4B8891C4856 for ; Wed, 7 Apr 2021 21:25:16 +0100 (IST) Received: (qmail 15718 invoked from network); 7 Apr 2021 20:25:16 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:25:16 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 04/11] mm/vmstat: Convert NUMA statistics to basic NUMA counters Date: Wed, 7 Apr 2021 21:24:16 +0100 Message-Id: <20210407202423.16022-5-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Stat-Signature: p8pb9mbr1ubncfchyagxjcti5xzdpe1c X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: E7D8C90009E2 Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf19; identity=mailfrom; envelope-from=""; helo=outbound-smtp09.blacknight.com; client-ip=46.22.139.14 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827108-864092 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: NUMA statistics are maintained on the zone level for hits, misses, foreign etc but nothing relies on them being perfectly accurate for functional correctness. The counters are used by userspace to get a general overview of a workloads NUMA behaviour but the page allocator incurs a high cost to maintain perfect accuracy similar to what is required for a vmstat like NR_FREE_PAGES. There even is a sysctl vm.numa_stat to allow userspace to turn off the collection of NUMA statistics like NUMA_HIT. This patch converts NUMA_HIT and friends to be NUMA events with similar accuracy to VM events. There is a possibility that slight errors will be introduced but the overall trend as seen by userspace will be similar. Note that while these counters could be maintained at the node level that it would have a user-visible impact. Signed-off-by: Mel Gorman --- drivers/base/node.c | 18 +++-- include/linux/mmzone.h | 11 ++- include/linux/vmstat.h | 42 +++++----- mm/mempolicy.c | 2 +- mm/page_alloc.c | 12 +-- mm/vmstat.c | 175 ++++++++++++----------------------------- 6 files changed, 93 insertions(+), 167 deletions(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index f449dbb2c746..443a609db428 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -484,6 +484,7 @@ static DEVICE_ATTR(meminfo, 0444, node_read_meminfo, NULL); static ssize_t node_read_numastat(struct device *dev, struct device_attribute *attr, char *buf) { + fold_vm_numa_events(); return sysfs_emit(buf, "numa_hit %lu\n" "numa_miss %lu\n" @@ -491,12 +492,12 @@ static ssize_t node_read_numastat(struct device *dev, "interleave_hit %lu\n" "local_node %lu\n" "other_node %lu\n", - sum_zone_numa_state(dev->id, NUMA_HIT), - sum_zone_numa_state(dev->id, NUMA_MISS), - sum_zone_numa_state(dev->id, NUMA_FOREIGN), - sum_zone_numa_state(dev->id, NUMA_INTERLEAVE_HIT), - sum_zone_numa_state(dev->id, NUMA_LOCAL), - sum_zone_numa_state(dev->id, NUMA_OTHER)); + sum_zone_numa_event_state(dev->id, NUMA_HIT), + sum_zone_numa_event_state(dev->id, NUMA_MISS), + sum_zone_numa_event_state(dev->id, NUMA_FOREIGN), + sum_zone_numa_event_state(dev->id, NUMA_INTERLEAVE_HIT), + sum_zone_numa_event_state(dev->id, NUMA_LOCAL), + sum_zone_numa_event_state(dev->id, NUMA_OTHER)); } static DEVICE_ATTR(numastat, 0444, node_read_numastat, NULL); @@ -514,10 +515,11 @@ static ssize_t node_read_vmstat(struct device *dev, sum_zone_node_page_state(nid, i)); #ifdef CONFIG_NUMA - for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) + fold_vm_numa_events(); + for (i = 0; i < NR_VM_NUMA_EVENT_ITEMS; i++) len += sysfs_emit_at(buf, len, "%s %lu\n", numa_stat_name(i), - sum_zone_numa_state(nid, i)); + sum_zone_numa_event_state(nid, i)); #endif for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 106da8fbc72a..693cd5f24f7d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -135,10 +135,10 @@ enum numa_stat_item { NUMA_INTERLEAVE_HIT, /* interleaver preferred this zone */ NUMA_LOCAL, /* allocation from local node */ NUMA_OTHER, /* allocation from other node */ - NR_VM_NUMA_STAT_ITEMS + NR_VM_NUMA_EVENT_ITEMS }; #else -#define NR_VM_NUMA_STAT_ITEMS 0 +#define NR_VM_NUMA_EVENT_ITEMS 0 #endif enum zone_stat_item { @@ -357,7 +357,10 @@ struct per_cpu_zonestat { s8 stat_threshold; #endif #ifdef CONFIG_NUMA - u16 vm_numa_stat_diff[NR_VM_NUMA_STAT_ITEMS]; + u16 vm_numa_stat_diff[NR_VM_NUMA_EVENT_ITEMS]; +#endif +#ifdef CONFIG_NUMA + unsigned long vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; #endif }; @@ -609,7 +612,7 @@ struct zone { ZONE_PADDING(_pad3_) /* Zone statistics */ atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; - atomic_long_t vm_numa_stat[NR_VM_NUMA_STAT_ITEMS]; + atomic_long_t vm_numa_events[NR_VM_NUMA_EVENT_ITEMS]; } ____cacheline_internodealigned_in_smp; enum pgdat_flags { diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 1736ea9d24a7..fc14415223c5 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -138,35 +138,27 @@ static inline void vm_events_fold_cpu(int cpu) * Zone and node-based page accounting with per cpu differentials. */ extern atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS]; -extern atomic_long_t vm_numa_stat[NR_VM_NUMA_STAT_ITEMS]; extern atomic_long_t vm_node_stat[NR_VM_NODE_STAT_ITEMS]; #ifdef CONFIG_NUMA -static inline void zone_numa_state_add(long x, struct zone *zone, - enum numa_stat_item item) -{ - atomic_long_add(x, &zone->vm_numa_stat[item]); - atomic_long_add(x, &vm_numa_stat[item]); -} - -static inline unsigned long global_numa_state(enum numa_stat_item item) +static inline unsigned long zone_numa_event_state(struct zone *zone, + enum numa_stat_item item) { - long x = atomic_long_read(&vm_numa_stat[item]); - - return x; + return atomic_long_read(&zone->vm_numa_events[item]); } -static inline unsigned long zone_numa_state_snapshot(struct zone *zone, - enum numa_stat_item item) +static inline unsigned long +global_numa_event_state(enum numa_stat_item item) { - long x = atomic_long_read(&zone->vm_numa_stat[item]); - int cpu; + struct zone *zone; + unsigned long x = 0; - for_each_online_cpu(cpu) - x += per_cpu_ptr(zone->per_cpu_zonestats, cpu)->vm_numa_stat_diff[item]; + for_each_populated_zone(zone) + x += zone_numa_event_state(zone, item); return x; } + #endif /* CONFIG_NUMA */ static inline void zone_page_state_add(long x, struct zone *zone, @@ -245,18 +237,22 @@ static inline unsigned long zone_page_state_snapshot(struct zone *zone, } #ifdef CONFIG_NUMA -extern void __inc_numa_state(struct zone *zone, enum numa_stat_item item); +extern void __count_numa_event(struct zone *zone, enum numa_stat_item item); extern unsigned long sum_zone_node_page_state(int node, enum zone_stat_item item); -extern unsigned long sum_zone_numa_state(int node, enum numa_stat_item item); +extern unsigned long sum_zone_numa_event_state(int node, enum numa_stat_item item); extern unsigned long node_page_state(struct pglist_data *pgdat, enum node_stat_item item); extern unsigned long node_page_state_pages(struct pglist_data *pgdat, enum node_stat_item item); +extern void fold_vm_numa_events(void); #else #define sum_zone_node_page_state(node, item) global_zone_page_state(item) #define node_page_state(node, item) global_node_page_state(item) #define node_page_state_pages(node, item) global_node_page_state_pages(item) +static inline void fold_vm_numa_events(void) +{ +} #endif /* CONFIG_NUMA */ #ifdef CONFIG_SMP @@ -428,7 +424,7 @@ static inline const char *numa_stat_name(enum numa_stat_item item) static inline const char *node_stat_name(enum node_stat_item item) { return vmstat_text[NR_VM_ZONE_STAT_ITEMS + - NR_VM_NUMA_STAT_ITEMS + + NR_VM_NUMA_EVENT_ITEMS + item]; } @@ -440,7 +436,7 @@ static inline const char *lru_list_name(enum lru_list lru) static inline const char *writeback_stat_name(enum writeback_stat_item item) { return vmstat_text[NR_VM_ZONE_STAT_ITEMS + - NR_VM_NUMA_STAT_ITEMS + + NR_VM_NUMA_EVENT_ITEMS + NR_VM_NODE_STAT_ITEMS + item]; } @@ -449,7 +445,7 @@ static inline const char *writeback_stat_name(enum writeback_stat_item item) static inline const char *vm_event_name(enum vm_event_item item) { return vmstat_text[NR_VM_ZONE_STAT_ITEMS + - NR_VM_NUMA_STAT_ITEMS + + NR_VM_NUMA_EVENT_ITEMS + NR_VM_NODE_STAT_ITEMS + NR_VM_WRITEBACK_STAT_ITEMS + item]; diff --git a/mm/mempolicy.c b/mm/mempolicy.c index cd0295567a04..99c06a9ae7ee 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2146,7 +2146,7 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, return page; if (page && page_to_nid(page) == nid) { preempt_disable(); - __inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT); + __count_numa_event(page_zone(page), NUMA_INTERLEAVE_HIT); preempt_enable(); } return page; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a8630003612b..73e618d06315 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3424,12 +3424,12 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z) local_stat = NUMA_OTHER; if (zone_to_nid(z) == zone_to_nid(preferred_zone)) - __inc_numa_state(z, NUMA_HIT); + __count_numa_event(z, NUMA_HIT); else { - __inc_numa_state(z, NUMA_MISS); - __inc_numa_state(preferred_zone, NUMA_FOREIGN); + __count_numa_event(z, NUMA_MISS); + __count_numa_event(preferred_zone, NUMA_FOREIGN); } - __inc_numa_state(z, local_stat); + __count_numa_event(z, local_stat); #endif } @@ -6700,8 +6700,8 @@ void __init setup_per_cpu_pageset(void) */ for_each_possible_cpu(cpu) { struct per_cpu_zonestat *pzstats = &per_cpu(boot_zonestats, cpu); - memset(pzstats->vm_numa_stat_diff, 0, - sizeof(pzstats->vm_numa_stat_diff)); + memset(pzstats->vm_numa_event, 0, + sizeof(pzstats->vm_numa_event)); } #endif diff --git a/mm/vmstat.c b/mm/vmstat.c index 8a8f1a26b231..63bd84d122c0 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -41,38 +41,24 @@ static void zero_zone_numa_counters(struct zone *zone) { int item, cpu; - for (item = 0; item < NR_VM_NUMA_STAT_ITEMS; item++) { - atomic_long_set(&zone->vm_numa_stat[item], 0); - for_each_online_cpu(cpu) - per_cpu_ptr(zone->per_cpu_zonestats, cpu)->vm_numa_stat_diff[item] + for (item = 0; item < NR_VM_NUMA_EVENT_ITEMS; item++) { + atomic_long_set(&zone->vm_numa_events[item], 0); + for_each_online_cpu(cpu) { + per_cpu_ptr(zone->per_cpu_zonestats, cpu)->vm_numa_event[item] = 0; + } } } -/* zero numa counters of all the populated zones */ -static void zero_zones_numa_counters(void) +static void invalidate_numa_statistics(void) { struct zone *zone; + /* zero numa counters of all the populated zones */ for_each_populated_zone(zone) zero_zone_numa_counters(zone); } -/* zero global numa counters */ -static void zero_global_numa_counters(void) -{ - int item; - - for (item = 0; item < NR_VM_NUMA_STAT_ITEMS; item++) - atomic_long_set(&vm_numa_stat[item], 0); -} - -static void invalid_numa_statistics(void) -{ - zero_zones_numa_counters(); - zero_global_numa_counters(); -} - static DEFINE_MUTEX(vm_numa_stat_lock); int sysctl_vm_numa_stat_handler(struct ctl_table *table, int write, @@ -94,7 +80,7 @@ int sysctl_vm_numa_stat_handler(struct ctl_table *table, int write, pr_info("enable numa statistics\n"); } else { static_branch_disable(&vm_numa_stat_key); - invalid_numa_statistics(); + invalidate_numa_statistics(); pr_info("disable numa statistics, and clear numa counters\n"); } @@ -161,10 +147,8 @@ void vm_events_fold_cpu(int cpu) * vm_stat contains the global counters */ atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp; -atomic_long_t vm_numa_stat[NR_VM_NUMA_STAT_ITEMS] __cacheline_aligned_in_smp; atomic_long_t vm_node_stat[NR_VM_NODE_STAT_ITEMS] __cacheline_aligned_in_smp; EXPORT_SYMBOL(vm_zone_stat); -EXPORT_SYMBOL(vm_numa_stat); EXPORT_SYMBOL(vm_node_stat); #ifdef CONFIG_SMP @@ -706,8 +690,7 @@ EXPORT_SYMBOL(dec_node_page_state); * Fold a differential into the global counters. * Returns the number of counters updated. */ -#ifdef CONFIG_NUMA -static int fold_diff(int *zone_diff, int *numa_diff, int *node_diff) +static int fold_diff(int *zone_diff, int *node_diff) { int i; int changes = 0; @@ -718,12 +701,6 @@ static int fold_diff(int *zone_diff, int *numa_diff, int *node_diff) changes++; } - for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) - if (numa_diff[i]) { - atomic_long_add(numa_diff[i], &vm_numa_stat[i]); - changes++; - } - for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) if (node_diff[i]) { atomic_long_add(node_diff[i], &vm_node_stat[i]); @@ -731,26 +708,36 @@ static int fold_diff(int *zone_diff, int *numa_diff, int *node_diff) } return changes; } -#else -static int fold_diff(int *zone_diff, int *node_diff) + +#ifdef CONFIG_NUMA +static void fold_vm_zone_numa_events(struct zone *zone) { - int i; - int changes = 0; + int zone_numa_events[NR_VM_NUMA_EVENT_ITEMS] = { 0, }; + int cpu; + enum numa_stat_item item; - for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) - if (zone_diff[i]) { - atomic_long_add(zone_diff[i], &vm_zone_stat[i]); - changes++; + for_each_online_cpu(cpu) { + struct per_cpu_zonestat *pzstats; + + pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); + for (item = 0; item < NR_VM_NUMA_EVENT_ITEMS; item++) { + zone_numa_events[item] += pzstats->vm_numa_event[item]; + } } - for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) - if (node_diff[i]) { - atomic_long_add(node_diff[i], &vm_node_stat[i]); - changes++; + for (item = 0; item < NR_VM_NUMA_EVENT_ITEMS; item++) { + atomic_long_set(&zone->vm_numa_events[item], zone_numa_events[item]); } - return changes; } -#endif /* CONFIG_NUMA */ + +void fold_vm_numa_events(void) +{ + struct zone *zone; + + for_each_populated_zone(zone) + fold_vm_zone_numa_events(zone); +} +#endif /* * Update the zone counters for the current cpu. @@ -774,9 +761,6 @@ static int refresh_cpu_vm_stats(bool do_pagesets) struct zone *zone; int i; int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, }; -#ifdef CONFIG_NUMA - int global_numa_diff[NR_VM_NUMA_STAT_ITEMS] = { 0, }; -#endif int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, }; int changes = 0; @@ -799,17 +783,6 @@ static int refresh_cpu_vm_stats(bool do_pagesets) } } #ifdef CONFIG_NUMA - for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) { - int v; - - v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0); - if (v) { - - atomic_long_add(v, &zone->vm_numa_stat[i]); - global_numa_diff[i] += v; - __this_cpu_write(pcp->expire, 3); - } - } if (do_pagesets) { cond_resched(); @@ -857,12 +830,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets) } } -#ifdef CONFIG_NUMA - changes += fold_diff(global_zone_diff, global_numa_diff, - global_node_diff); -#else changes += fold_diff(global_zone_diff, global_node_diff); -#endif return changes; } @@ -877,9 +845,6 @@ void cpu_vm_stats_fold(int cpu) struct zone *zone; int i; int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, }; -#ifdef CONFIG_NUMA - int global_numa_diff[NR_VM_NUMA_STAT_ITEMS] = { 0, }; -#endif int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, }; for_each_populated_zone(zone) { @@ -887,7 +852,7 @@ void cpu_vm_stats_fold(int cpu) pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); - for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) + for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) { if (pzstats->vm_stat_diff[i]) { int v; @@ -896,18 +861,7 @@ void cpu_vm_stats_fold(int cpu) atomic_long_add(v, &zone->vm_stat[i]); global_zone_diff[i] += v; } - -#ifdef CONFIG_NUMA - for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) - if (pzstats->vm_numa_stat_diff[i]) { - int v; - - v = pzstats->vm_numa_stat_diff[i]; - pzstats->vm_numa_stat_diff[i] = 0; - atomic_long_add(v, &zone->vm_numa_stat[i]); - global_numa_diff[i] += v; - } -#endif + } } for_each_online_pgdat(pgdat) { @@ -926,11 +880,7 @@ void cpu_vm_stats_fold(int cpu) } } -#ifdef CONFIG_NUMA - fold_diff(global_zone_diff, global_numa_diff, global_node_diff); -#else fold_diff(global_zone_diff, global_node_diff); -#endif } /* @@ -948,34 +898,17 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats) atomic_long_add(v, &zone->vm_stat[i]); atomic_long_add(v, &vm_zone_stat[i]); } - -#ifdef CONFIG_NUMA - for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) - if (pzstats->vm_numa_stat_diff[i]) { - int v = pzstats->vm_numa_stat_diff[i]; - - pzstats->vm_numa_stat_diff[i] = 0; - atomic_long_add(v, &zone->vm_numa_stat[i]); - atomic_long_add(v, &vm_numa_stat[i]); - } -#endif } #endif #ifdef CONFIG_NUMA -void __inc_numa_state(struct zone *zone, +/* See __count_vm_event comment on why raw_cpu_inc is used. */ +void __count_numa_event(struct zone *zone, enum numa_stat_item item) { struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats; - u16 __percpu *p = pzstats->vm_numa_stat_diff + item; - u16 v; - v = __this_cpu_inc_return(*p); - - if (unlikely(v > NUMA_STATS_THRESHOLD)) { - zone_numa_state_add(v, zone, item); - __this_cpu_write(*p, 0); - } + raw_cpu_inc(pzstats->vm_numa_event[item]); } /* @@ -1000,15 +933,15 @@ unsigned long sum_zone_node_page_state(int node, * Determine the per node value of a numa stat item. To avoid deviation, * the per cpu stat number in vm_numa_stat_diff[] is also included. */ -unsigned long sum_zone_numa_state(int node, +unsigned long sum_zone_numa_event_state(int node, enum numa_stat_item item) { struct zone *zones = NODE_DATA(node)->node_zones; - int i; unsigned long count = 0; + int i; for (i = 0; i < MAX_NR_ZONES; i++) - count += zone_numa_state_snapshot(zones + i, item); + count += zone_numa_event_state(zones + i, item); return count; } @@ -1679,9 +1612,9 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, zone_page_state(zone, i)); #ifdef CONFIG_NUMA - for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) + for (i = 0; i < NR_VM_NUMA_EVENT_ITEMS; i++) seq_printf(m, "\n %-12s %lu", numa_stat_name(i), - zone_numa_state_snapshot(zone, i)); + zone_numa_event_state(zone, i)); #endif seq_printf(m, "\n pagesets"); @@ -1735,7 +1668,7 @@ static const struct seq_operations zoneinfo_op = { }; #define NR_VMSTAT_ITEMS (NR_VM_ZONE_STAT_ITEMS + \ - NR_VM_NUMA_STAT_ITEMS + \ + NR_VM_NUMA_EVENT_ITEMS + \ NR_VM_NODE_STAT_ITEMS + \ NR_VM_WRITEBACK_STAT_ITEMS + \ (IS_ENABLED(CONFIG_VM_EVENT_COUNTERS) ? \ @@ -1750,6 +1683,7 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos) return NULL; BUILD_BUG_ON(ARRAY_SIZE(vmstat_text) < NR_VMSTAT_ITEMS); + fold_vm_numa_events(); v = kmalloc_array(NR_VMSTAT_ITEMS, sizeof(unsigned long), GFP_KERNEL); m->private = v; if (!v) @@ -1759,9 +1693,9 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos) v += NR_VM_ZONE_STAT_ITEMS; #ifdef CONFIG_NUMA - for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) - v[i] = global_numa_state(i); - v += NR_VM_NUMA_STAT_ITEMS; + for (i = 0; i < NR_VM_NUMA_EVENT_ITEMS; i++) + v[i] = global_numa_event_state(i); + v += NR_VM_NUMA_EVENT_ITEMS; #endif for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { @@ -1864,16 +1798,6 @@ int vmstat_refresh(struct ctl_table *table, int write, err = -EINVAL; } } -#ifdef CONFIG_NUMA - for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) { - val = atomic_long_read(&vm_numa_stat[i]); - if (val < 0) { - pr_warn("%s: %s %ld\n", - __func__, numa_stat_name(i), val); - err = -EINVAL; - } - } -#endif if (err) return err; if (write) @@ -1922,8 +1846,9 @@ static bool need_update(int cpu) if (memchr_inv(pzstats->vm_stat_diff, 0, NR_VM_ZONE_STAT_ITEMS * sizeof(pzstats->vm_stat_diff[0]))) return true; + #ifdef CONFIG_NUMA - if (memchr_inv(pzstats->vm_numa_stat_diff, 0, NR_VM_NUMA_STAT_ITEMS * + if (memchr_inv(pzstats->vm_numa_stat_diff, 0, NR_VM_NUMA_EVENT_ITEMS * sizeof(pzstats->vm_numa_stat_diff[0]))) return true; #endif From patchwork Wed Apr 7 20:24:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189295 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75008C433B4 for ; Wed, 7 Apr 2021 20:25:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1F3146120E for ; Wed, 7 Apr 2021 20:25:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1F3146120E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A92986B0081; Wed, 7 Apr 2021 16:25:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A695B6B0082; Wed, 7 Apr 2021 16:25:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9317F6B0083; Wed, 7 Apr 2021 16:25:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0006.hostedemail.com [216.40.44.6]) by kanga.kvack.org (Postfix) with ESMTP id 772B06B0081 for ; Wed, 7 Apr 2021 16:25:28 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 176A66D9A for ; Wed, 7 Apr 2021 20:25:28 +0000 (UTC) X-FDA: 78006701136.12.5BEFED6 Received: from outbound-smtp15.blacknight.com (outbound-smtp15.blacknight.com [46.22.139.232]) by imf29.hostedemail.com (Postfix) with ESMTP id 0F295DD for ; Wed, 7 Apr 2021 20:25:25 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp15.blacknight.com (Postfix) with ESMTPS id 701181C4857 for ; Wed, 7 Apr 2021 21:25:26 +0100 (IST) Received: (qmail 16150 invoked from network); 7 Apr 2021 20:25:26 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:25:26 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 05/11] mm/vmstat: Inline NUMA event counter updates Date: Wed, 7 Apr 2021 21:24:17 +0100 Message-Id: <20210407202423.16022-6-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Stat-Signature: 7f4bdp3dru8quq9u5ub3kef8xun1u53m X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 0F295DD Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf29; identity=mailfrom; envelope-from=""; helo=outbound-smtp15.blacknight.com; client-ip=46.22.139.232 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827125-302727 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: __count_numa_event is small enough to be treated similarly to __count_vm_event so inline it. Signed-off-by: Mel Gorman --- include/linux/vmstat.h | 9 +++++++++ mm/vmstat.c | 9 --------- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index fc14415223c5..dde4dec4e7dd 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -237,6 +237,15 @@ static inline unsigned long zone_page_state_snapshot(struct zone *zone, } #ifdef CONFIG_NUMA +/* See __count_vm_event comment on why raw_cpu_inc is used. */ +static inline void +__count_numa_event(struct zone *zone, enum numa_stat_item item) +{ + struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats; + + raw_cpu_inc(pzstats->vm_numa_event[item]); +} + extern void __count_numa_event(struct zone *zone, enum numa_stat_item item); extern unsigned long sum_zone_node_page_state(int node, enum zone_stat_item item); diff --git a/mm/vmstat.c b/mm/vmstat.c index 63bd84d122c0..b853df95ed0c 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -902,15 +902,6 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats) #endif #ifdef CONFIG_NUMA -/* See __count_vm_event comment on why raw_cpu_inc is used. */ -void __count_numa_event(struct zone *zone, - enum numa_stat_item item) -{ - struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats; - - raw_cpu_inc(pzstats->vm_numa_event[item]); -} - /* * Determine the per node value of a stat item. This function * is called frequently in a NUMA machine, so try to be as From patchwork Wed Apr 7 20:24:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189297 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 823F5C433ED for ; Wed, 7 Apr 2021 20:25:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1E0E86100A for ; Wed, 7 Apr 2021 20:25:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1E0E86100A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id AE1306B0082; Wed, 7 Apr 2021 16:25:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A90096B0083; Wed, 7 Apr 2021 16:25:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9586B6B0085; Wed, 7 Apr 2021 16:25:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0194.hostedemail.com [216.40.44.194]) by kanga.kvack.org (Postfix) with ESMTP id 7C13B6B0082 for ; Wed, 7 Apr 2021 16:25:38 -0400 (EDT) Received: from smtpin35.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 1F0761812D6C8 for ; Wed, 7 Apr 2021 20:25:38 +0000 (UTC) X-FDA: 78006701556.35.89280CD Received: from outbound-smtp09.blacknight.com (outbound-smtp09.blacknight.com [46.22.139.14]) by imf24.hostedemail.com (Postfix) with ESMTP id EB260A0003A4 for ; Wed, 7 Apr 2021 20:25:33 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp09.blacknight.com (Postfix) with ESMTPS id A20A21C4866 for ; Wed, 7 Apr 2021 21:25:36 +0100 (IST) Received: (qmail 16603 invoked from network); 7 Apr 2021 20:25:36 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:25:36 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 06/11] mm/page_alloc: Batch the accounting updates in the bulk allocator Date: Wed, 7 Apr 2021 21:24:18 +0100 Message-Id: <20210407202423.16022-7-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Stat-Signature: f3afqkazhdcasbjfrjapuazyrc6wxzi9 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: EB260A0003A4 Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf24; identity=mailfrom; envelope-from=""; helo=outbound-smtp09.blacknight.com; client-ip=46.22.139.14 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827133-858823 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that the zone_statistics are simple counters that do not require special protection, the bulk allocator accounting updates can be batch updated without adding too much complexity with protected RMW updates or using xchg. Signed-off-by: Mel Gorman --- include/linux/vmstat.h | 8 ++++++++ mm/page_alloc.c | 30 +++++++++++++----------------- 2 files changed, 21 insertions(+), 17 deletions(-) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index dde4dec4e7dd..8473b8fa9756 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -246,6 +246,14 @@ __count_numa_event(struct zone *zone, enum numa_stat_item item) raw_cpu_inc(pzstats->vm_numa_event[item]); } +static inline void +__count_numa_events(struct zone *zone, enum numa_stat_item item, long delta) +{ + struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats; + + raw_cpu_add(pzstats->vm_numa_event[item], delta); +} + extern void __count_numa_event(struct zone *zone, enum numa_stat_item item); extern unsigned long sum_zone_node_page_state(int node, enum zone_stat_item item); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 73e618d06315..defb0e436fac 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3411,7 +3411,8 @@ void __putback_isolated_page(struct page *page, unsigned int order, int mt) * * Must be called with interrupts disabled. */ -static inline void zone_statistics(struct zone *preferred_zone, struct zone *z) +static inline void zone_statistics(struct zone *preferred_zone, struct zone *z, + long nr_account) { #ifdef CONFIG_NUMA enum numa_stat_item local_stat = NUMA_LOCAL; @@ -3424,12 +3425,12 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z) local_stat = NUMA_OTHER; if (zone_to_nid(z) == zone_to_nid(preferred_zone)) - __count_numa_event(z, NUMA_HIT); + __count_numa_events(z, NUMA_HIT, nr_account); else { - __count_numa_event(z, NUMA_MISS); - __count_numa_event(preferred_zone, NUMA_FOREIGN); + __count_numa_events(z, NUMA_MISS, nr_account); + __count_numa_events(preferred_zone, NUMA_FOREIGN, nr_account); } - __count_numa_event(z, local_stat); + __count_numa_events(z, local_stat, nr_account); #endif } @@ -3475,7 +3476,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1); - zone_statistics(preferred_zone, zone); + zone_statistics(preferred_zone, zone, 1); } local_unlock_irqrestore(&pagesets.lock, flags); return page; @@ -3536,7 +3537,7 @@ struct page *rmqueue(struct zone *preferred_zone, get_pcppage_migratetype(page)); __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); - zone_statistics(preferred_zone, zone); + zone_statistics(preferred_zone, zone, 1); local_irq_restore(flags); out: @@ -5019,7 +5020,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, struct alloc_context ac; gfp_t alloc_gfp; unsigned int alloc_flags = ALLOC_WMARK_LOW; - int nr_populated = 0; + int nr_populated = 0, nr_account = 0; if (unlikely(nr_pages <= 0)) return 0; @@ -5092,15 +5093,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, goto failed_irq; break; } - - /* - * Ideally this would be batched but the best way to do - * that cheaply is to first convert zone_statistics to - * be inaccurate per-cpu counter like vm_events to avoid - * a RMW cycle then do the accounting with IRQs enabled. - */ - __count_zid_vm_events(PGALLOC, zone_idx(zone), 1); - zone_statistics(ac.preferred_zoneref->zone, zone); + nr_account++; prep_new_page(page, 0, gfp, 0); if (page_list) @@ -5110,6 +5103,9 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, nr_populated++; } + __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); + zone_statistics(ac.preferred_zoneref->zone, zone, nr_account); + local_unlock_irqrestore(&pagesets.lock, flags); return nr_populated; From patchwork Wed Apr 7 20:24:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189299 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94918C433B4 for ; Wed, 7 Apr 2021 20:25:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3DB4B611C1 for ; Wed, 7 Apr 2021 20:25:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3DB4B611C1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D2A2B6B0073; Wed, 7 Apr 2021 16:25:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D00C86B0083; Wed, 7 Apr 2021 16:25:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C184B6B0085; Wed, 7 Apr 2021 16:25:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0027.hostedemail.com [216.40.44.27]) by kanga.kvack.org (Postfix) with ESMTP id AB08A6B0073 for ; Wed, 7 Apr 2021 16:25:48 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 68AE8824805A for ; Wed, 7 Apr 2021 20:25:48 +0000 (UTC) X-FDA: 78006701976.17.0C69172 Received: from outbound-smtp24.blacknight.com (outbound-smtp24.blacknight.com [81.17.249.192]) by imf30.hostedemail.com (Postfix) with ESMTP id 1A6B2E000114 for ; Wed, 7 Apr 2021 20:25:41 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp24.blacknight.com (Postfix) with ESMTPS id C6FB7C0D79 for ; Wed, 7 Apr 2021 21:25:46 +0100 (IST) Received: (qmail 16972 invoked from network); 7 Apr 2021 20:25:46 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:25:46 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 07/11] mm/page_alloc: Reduce duration that IRQs are disabled for VM counters Date: Wed, 7 Apr 2021 21:24:19 +0100 Message-Id: <20210407202423.16022-8-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Stat-Signature: rf89q3rfq8ekiqhhjzsnzefx4yp88a1e X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 1A6B2E000114 Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf30; identity=mailfrom; envelope-from=""; helo=outbound-smtp24.blacknight.com; client-ip=81.17.249.192 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827141-33162 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: IRQs are left disabled for the zone and node VM event counters. This is unnecessary as the affected counters are allowed to race for preemmption and IRQs. This patch reduces the scope of IRQs being disabled via local_[lock|unlock]_irq on !PREEMPT_RT kernels. One __mod_zone_freepage_state is still called with IRQs disabled. While this could be moved out, it's not free on all architectures as some require IRQs to be disabled for mod_zone_page_state on !PREEMPT_RTkernels. Signed-off-by: Mel Gorman --- mm/page_alloc.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index defb0e436fac..bd75102ef1e1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3474,11 +3474,11 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, pcp = this_cpu_ptr(zone->per_cpu_pageset); list = &pcp->lists[migratetype]; page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list); + local_unlock_irqrestore(&pagesets.lock, flags); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1); zone_statistics(preferred_zone, zone, 1); } - local_unlock_irqrestore(&pagesets.lock, flags); return page; } @@ -3530,15 +3530,15 @@ struct page *rmqueue(struct zone *preferred_zone, if (!page) page = __rmqueue(zone, order, migratetype, alloc_flags); } while (page && check_new_pages(page, order)); - spin_unlock(&zone->lock); if (!page) goto failed; + __mod_zone_freepage_state(zone, -(1 << order), get_pcppage_migratetype(page)); + spin_unlock_irqrestore(&zone->lock, flags); __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); zone_statistics(preferred_zone, zone, 1); - local_irq_restore(flags); out: /* Separate test+clear to avoid unnecessary atomics */ @@ -3551,7 +3551,7 @@ struct page *rmqueue(struct zone *preferred_zone, return page; failed: - local_irq_restore(flags); + spin_unlock_irqrestore(&zone->lock, flags); return NULL; } @@ -5103,11 +5103,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, nr_populated++; } + local_unlock_irqrestore(&pagesets.lock, flags); + __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); zone_statistics(ac.preferred_zoneref->zone, zone, nr_account); - local_unlock_irqrestore(&pagesets.lock, flags); - return nr_populated; failed_irq: From patchwork Wed Apr 7 20:24:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189301 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E089DC433ED for ; Wed, 7 Apr 2021 20:25:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8EE2661184 for ; Wed, 7 Apr 2021 20:25:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8EE2661184 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 318B96B0078; Wed, 7 Apr 2021 16:25:59 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2EEA76B0083; Wed, 7 Apr 2021 16:25:59 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1B73B6B0085; Wed, 7 Apr 2021 16:25:59 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0049.hostedemail.com [216.40.44.49]) by kanga.kvack.org (Postfix) with ESMTP id 022C96B0078 for ; Wed, 7 Apr 2021 16:25:58 -0400 (EDT) Received: from smtpin39.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id B0B268248047 for ; Wed, 7 Apr 2021 20:25:58 +0000 (UTC) X-FDA: 78006702396.39.F84A692 Received: from outbound-smtp53.blacknight.com (outbound-smtp53.blacknight.com [46.22.136.237]) by imf20.hostedemail.com (Postfix) with ESMTP id 79352DC for ; Wed, 7 Apr 2021 20:25:55 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp53.blacknight.com (Postfix) with ESMTPS id 0054BFB3EB for ; Wed, 7 Apr 2021 21:25:56 +0100 (IST) Received: (qmail 17364 invoked from network); 7 Apr 2021 20:25:56 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:25:56 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 08/11] mm/page_alloc: Remove duplicate checks if migratetype should be isolated Date: Wed, 7 Apr 2021 21:24:20 +0100 Message-Id: <20210407202423.16022-9-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Rspamd-Queue-Id: 79352DC X-Stat-Signature: qkbdqq9aeb1zkafzc7ui6yfqbtrabrgj X-Rspamd-Server: rspam02 Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf20; identity=mailfrom; envelope-from=""; helo=outbound-smtp53.blacknight.com; client-ip=46.22.136.237 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827155-33651 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Both free_pcppages_bulk() and free_one_page() have very similar checks about whether a pages migratetype has changed under the zone lock. Use a common helper. Signed-off-by: Mel Gorman --- mm/page_alloc.c | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bd75102ef1e1..1bb5b522a0f9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1354,6 +1354,23 @@ static inline void prefetch_buddy(struct page *page) prefetch(buddy); } +/* + * The migratetype of a page may have changed due to isolation so check. + * Assumes the caller holds the zone->lock to serialise against page + * isolation. + */ +static inline int +check_migratetype_isolated(struct zone *zone, struct page *page, unsigned long pfn, int migratetype) +{ + /* If isolating, check if the migratetype has changed */ + if (unlikely(has_isolate_pageblock(zone) || + is_migrate_isolate(migratetype))) { + migratetype = get_pfnblock_migratetype(page, pfn); + } + + return migratetype; +} + /* * Frees a number of pages from the PCP lists * Assumes all pages on list are in same zone, and of same order. @@ -1371,7 +1388,6 @@ static void free_pcppages_bulk(struct zone *zone, int count, int migratetype = 0; int batch_free = 0; int prefetch_nr = READ_ONCE(pcp->batch); - bool isolated_pageblocks; struct page *page, *tmp; LIST_HEAD(head); @@ -1433,21 +1449,20 @@ static void free_pcppages_bulk(struct zone *zone, int count, * both PREEMPT_RT and non-PREEMPT_RT configurations. */ spin_lock(&zone->lock); - isolated_pageblocks = has_isolate_pageblock(zone); /* * Use safe version since after __free_one_page(), * page->lru.next will not point to original list. */ list_for_each_entry_safe(page, tmp, &head, lru) { + unsigned long pfn = page_to_pfn(page); int mt = get_pcppage_migratetype(page); + /* MIGRATE_ISOLATE page should not go to pcplists */ VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); - /* Pageblock could have been isolated meanwhile */ - if (unlikely(isolated_pageblocks)) - mt = get_pageblock_migratetype(page); - __free_one_page(page, page_to_pfn(page), zone, 0, mt, FPI_NONE); + mt = check_migratetype_isolated(zone, page, pfn, mt); + __free_one_page(page, pfn, zone, 0, mt, FPI_NONE); trace_mm_page_pcpu_drain(page, 0, mt); } spin_unlock(&zone->lock); @@ -1459,10 +1474,7 @@ static void free_one_page(struct zone *zone, int migratetype, fpi_t fpi_flags) { spin_lock(&zone->lock); - if (unlikely(has_isolate_pageblock(zone) || - is_migrate_isolate(migratetype))) { - migratetype = get_pfnblock_migratetype(page, pfn); - } + migratetype = check_migratetype_isolated(zone, page, pfn, migratetype); __free_one_page(page, pfn, zone, order, migratetype, fpi_flags); spin_unlock(&zone->lock); } From patchwork Wed Apr 7 20:24:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189303 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15243C433B4 for ; Wed, 7 Apr 2021 20:26:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9E2BA61184 for ; Wed, 7 Apr 2021 20:26:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9E2BA61184 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 38E486B007D; Wed, 7 Apr 2021 16:26:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 364D86B0083; Wed, 7 Apr 2021 16:26:09 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 204E66B0085; Wed, 7 Apr 2021 16:26:09 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0173.hostedemail.com [216.40.44.173]) by kanga.kvack.org (Postfix) with ESMTP id 07FE06B007D for ; Wed, 7 Apr 2021 16:26:09 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id B51DF1806CE17 for ; Wed, 7 Apr 2021 20:26:08 +0000 (UTC) X-FDA: 78006702816.19.4353AA9 Received: from outbound-smtp57.blacknight.com (outbound-smtp57.blacknight.com [46.22.136.241]) by imf08.hostedemail.com (Postfix) with ESMTP id C1D6B80192E8 for ; Wed, 7 Apr 2021 20:26:00 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp57.blacknight.com (Postfix) with ESMTPS id 23CD3FB3EB for ; Wed, 7 Apr 2021 21:26:07 +0100 (IST) Received: (qmail 17805 invoked from network); 7 Apr 2021 20:26:07 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:26:06 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 09/11] mm/page_alloc: Explicitly acquire the zone lock in __free_pages_ok Date: Wed, 7 Apr 2021 21:24:21 +0100 Message-Id: <20210407202423.16022-10-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: C1D6B80192E8 X-Stat-Signature: k8h37zcas615q56cw3jhhsmxnshfabbt Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf08; identity=mailfrom; envelope-from=""; helo=outbound-smtp57.blacknight.com; client-ip=46.22.136.241 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827160-154488 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: __free_pages_ok() disables IRQs before calling a common helper free_one_page() that acquires the zone lock. While this is safe, it unnecessarily disables IRQs on PREEMPT_RT kernels. This patch explicitly acquires the lock with spin_lock_irqsave instead of relying on a helper. This removes the last instance of local_irq_save() in page_alloc.c. Signed-off-by: Mel Gorman --- mm/page_alloc.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1bb5b522a0f9..d94ec53367bd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1559,21 +1559,18 @@ static void __free_pages_ok(struct page *page, unsigned int order, unsigned long flags; int migratetype; unsigned long pfn = page_to_pfn(page); + struct zone *zone = page_zone(page); if (!free_pages_prepare(page, order, true)) return; migratetype = get_pfnblock_migratetype(page, pfn); - /* - * TODO FIX: Disable IRQs before acquiring IRQ-safe zone->lock - * and protect vmstat updates. - */ - local_irq_save(flags); + spin_lock_irqsave(&zone->lock, flags); __count_vm_events(PGFREE, 1 << order); - free_one_page(page_zone(page), page, pfn, order, migratetype, - fpi_flags); - local_irq_restore(flags); + migratetype = check_migratetype_isolated(zone, page, pfn, migratetype); + __free_one_page(page, pfn, zone, order, migratetype, fpi_flags); + spin_unlock_irqrestore(&zone->lock, flags); } void __free_pages_core(struct page *page, unsigned int order) From patchwork Wed Apr 7 20:24:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189305 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B989FC433B4 for ; Wed, 7 Apr 2021 20:26:20 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5D6A0611C1 for ; Wed, 7 Apr 2021 20:26:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5D6A0611C1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F275A6B007E; Wed, 7 Apr 2021 16:26:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EFC666B0083; Wed, 7 Apr 2021 16:26:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DC53E6B0085; Wed, 7 Apr 2021 16:26:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id C43626B007E for ; Wed, 7 Apr 2021 16:26:19 -0400 (EDT) Received: from smtpin38.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 87A8D1814C3D3 for ; Wed, 7 Apr 2021 20:26:19 +0000 (UTC) X-FDA: 78006703278.38.1C008F0 Received: from outbound-smtp62.blacknight.com (outbound-smtp62.blacknight.com [46.22.136.251]) by imf17.hostedemail.com (Postfix) with ESMTP id 56FE840002D8 for ; Wed, 7 Apr 2021 20:26:17 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp62.blacknight.com (Postfix) with ESMTPS id CD16EFB3F4 for ; Wed, 7 Apr 2021 21:26:17 +0100 (IST) Received: (qmail 18464 invoked from network); 7 Apr 2021 20:26:17 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:26:17 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 10/11] mm/page_alloc: Avoid conflating IRQs disabled with zone->lock Date: Wed, 7 Apr 2021 21:24:22 +0100 Message-Id: <20210407202423.16022-11-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Rspamd-Queue-Id: 56FE840002D8 X-Stat-Signature: wchjgrumy83ow53yunc3iibh4gbzctdn X-Rspamd-Server: rspam02 Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf17; identity=mailfrom; envelope-from=""; helo=outbound-smtp62.blacknight.com; client-ip=46.22.136.251 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827177-559271 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Historically when freeing pages, free_one_page() assumed that callers had IRQs disabled and the zone->lock could be acquired with spin_lock(). This confuses the scope of what local_lock_irq is protecting and what zone->lock is protecting in free_unref_page_list in particular. This patch uses spin_lock_irqsave() for the zone->lock in free_one_page() instead of relying on callers to have disabled IRQs. free_unref_page_commit() is changed to only deal with PCP pages protected by the local lock. free_unref_page_list() then first frees isolated pages to the buddy lists with free_one_page() and frees the rest of the pages to the PCP via free_unref_page_commit(). The end result is that free_one_page() is no longer depending on side-effects of local_lock to be correct. Note that this may incur a performance penalty while memory hot-remove is running but that is not a common operation. Signed-off-by: Mel Gorman --- mm/page_alloc.c | 67 ++++++++++++++++++++++++++++++------------------- 1 file changed, 41 insertions(+), 26 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d94ec53367bd..6d98d97b6cf5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1473,10 +1473,12 @@ static void free_one_page(struct zone *zone, unsigned int order, int migratetype, fpi_t fpi_flags) { - spin_lock(&zone->lock); + unsigned long flags; + + spin_lock_irqsave(&zone->lock, flags); migratetype = check_migratetype_isolated(zone, page, pfn, migratetype); __free_one_page(page, pfn, zone, order, migratetype, fpi_flags); - spin_unlock(&zone->lock); + spin_unlock_irqrestore(&zone->lock, flags); } static void __meminit __init_single_page(struct page *page, unsigned long pfn, @@ -3238,31 +3240,13 @@ static bool free_unref_page_prepare(struct page *page, unsigned long pfn) return true; } -static void free_unref_page_commit(struct page *page, unsigned long pfn) +static void free_unref_page_commit(struct page *page, unsigned long pfn, + int migratetype) { struct zone *zone = page_zone(page); struct per_cpu_pages *pcp; - int migratetype; - migratetype = get_pcppage_migratetype(page); __count_vm_event(PGFREE); - - /* - * We only track unmovable, reclaimable and movable on pcp lists. - * Free ISOLATE pages back to the allocator because they are being - * offlined but treat HIGHATOMIC as movable pages so we can get those - * areas back if necessary. Otherwise, we may have to free - * excessively into the page allocator - */ - if (migratetype >= MIGRATE_PCPTYPES) { - if (unlikely(is_migrate_isolate(migratetype))) { - free_one_page(zone, page, pfn, 0, migratetype, - FPI_NONE); - return; - } - migratetype = MIGRATE_MOVABLE; - } - pcp = this_cpu_ptr(zone->per_cpu_pageset); list_add(&page->lru, &pcp->lists[migratetype]); pcp->count++; @@ -3277,12 +3261,29 @@ void free_unref_page(struct page *page) { unsigned long flags; unsigned long pfn = page_to_pfn(page); + int migratetype; if (!free_unref_page_prepare(page, pfn)) return; + /* + * We only track unmovable, reclaimable and movable on pcp lists. + * Place ISOLATE pages on the isolated list because they are being + * offlined but treat HIGHATOMIC as movable pages so we can get those + * areas back if necessary. Otherwise, we may have to free + * excessively into the page allocator + */ + migratetype = get_pcppage_migratetype(page); + if (unlikely(migratetype >= MIGRATE_PCPTYPES)) { + if (unlikely(is_migrate_isolate(migratetype))) { + free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE); + return; + } + migratetype = MIGRATE_MOVABLE; + } + local_lock_irqsave(&pagesets.lock, flags); - free_unref_page_commit(page, pfn); + free_unref_page_commit(page, pfn, migratetype); local_unlock_irqrestore(&pagesets.lock, flags); } @@ -3294,6 +3295,7 @@ void free_unref_page_list(struct list_head *list) struct page *page, *next; unsigned long flags, pfn; int batch_count = 0; + int migratetype; /* Prepare pages for freeing */ list_for_each_entry_safe(page, next, list, lru) { @@ -3301,15 +3303,28 @@ void free_unref_page_list(struct list_head *list) if (!free_unref_page_prepare(page, pfn)) list_del(&page->lru); set_page_private(page, pfn); + + /* + * Free isolated pages directly to the allocator, see + * comment in free_unref_page. + */ + migratetype = get_pcppage_migratetype(page); + if (unlikely(migratetype >= MIGRATE_PCPTYPES)) { + if (unlikely(is_migrate_isolate(migratetype))) { + free_one_page(page_zone(page), page, pfn, 0, + migratetype, FPI_NONE); + list_del(&page->lru); + } + } } local_lock_irqsave(&pagesets.lock, flags); list_for_each_entry_safe(page, next, list, lru) { - unsigned long pfn = page_private(page); - + pfn = page_private(page); set_page_private(page, 0); + migratetype = get_pcppage_migratetype(page); trace_mm_page_free_batched(page); - free_unref_page_commit(page, pfn); + free_unref_page_commit(page, pfn, migratetype); /* * Guard against excessive IRQ disabled times when we get From patchwork Wed Apr 7 20:24:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12189307 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3C26C433B4 for ; Wed, 7 Apr 2021 20:26:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 602A661184 for ; Wed, 7 Apr 2021 20:26:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 602A661184 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 00A026B0080; Wed, 7 Apr 2021 16:26:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EAF976B0083; Wed, 7 Apr 2021 16:26:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D50786B0085; Wed, 7 Apr 2021 16:26:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0083.hostedemail.com [216.40.44.83]) by kanga.kvack.org (Postfix) with ESMTP id BA4496B0080 for ; Wed, 7 Apr 2021 16:26:30 -0400 (EDT) Received: from smtpin40.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 70F668248047 for ; Wed, 7 Apr 2021 20:26:30 +0000 (UTC) X-FDA: 78006703740.40.3116B2A Received: from outbound-smtp21.blacknight.com (outbound-smtp21.blacknight.com [81.17.249.41]) by imf05.hostedemail.com (Postfix) with ESMTP id 7ED2AE00010E for ; Wed, 7 Apr 2021 20:26:29 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp21.blacknight.com (Postfix) with ESMTPS id C2D3118E0E4 for ; Wed, 7 Apr 2021 21:26:28 +0100 (IST) Received: (qmail 18921 invoked from network); 7 Apr 2021 20:26:28 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.22.4]) by 81.17.254.9 with ESMTPA; 7 Apr 2021 20:26:28 -0000 From: Mel Gorman To: Linux-MM , Linux-RT-Users Cc: LKML , Chuck Lever , Jesper Dangaard Brouer , Matthew Wilcox , Thomas Gleixner , Peter Zijlstra , Ingo Molnar , Michal Hocko , Oscar Salvador , Mel Gorman Subject: [PATCH 11/11] mm/page_alloc: Update PGFREE outside the zone lock in __free_pages_ok Date: Wed, 7 Apr 2021 21:24:23 +0100 Message-Id: <20210407202423.16022-12-mgorman@techsingularity.net> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210407202423.16022-1-mgorman@techsingularity.net> References: <20210407202423.16022-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 7ED2AE00010E X-Stat-Signature: utg9dgawuitroedg6ixkca84st5uxfru Received-SPF: none (techsingularity.net>: No applicable sender policy available) receiver=imf05; identity=mailfrom; envelope-from=""; helo=outbound-smtp21.blacknight.com; client-ip=81.17.249.41 X-HE-DKIM-Result: none/none X-HE-Tag: 1617827189-658661 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: VM events do not need explicit protection by disabling IRQs so update the counter with IRQs enabled in __free_pages_ok. Signed-off-by: Mel Gorman --- mm/page_alloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6d98d97b6cf5..49951dd841fa 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1569,10 +1569,11 @@ static void __free_pages_ok(struct page *page, unsigned int order, migratetype = get_pfnblock_migratetype(page, pfn); spin_lock_irqsave(&zone->lock, flags); - __count_vm_events(PGFREE, 1 << order); migratetype = check_migratetype_isolated(zone, page, pfn, migratetype); __free_one_page(page, pfn, zone, order, migratetype, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); + + __count_vm_events(PGFREE, 1 << order); } void __free_pages_core(struct page *page, unsigned int order)