From patchwork Tue Sep 22 14:37:04 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11792691
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [PATCH 1/9] mm, page_alloc: clean up pageset high and batch update
Date: Tue, 22 Sep 2020 16:37:04 +0200
Message-Id: <20200922143712.12048-2-vbabka@suse.cz>
In-Reply-To: <20200922143712.12048-1-vbabka@suse.cz>
References: <20200922143712.12048-1-vbabka@suse.cz>

The updates to pcplists' high and batch values are handled by multiple
functions that make the calculations hard to follow. Consolidate everything
to pageset_set_high_and_batch() and remove the pageset_set_batch() and
pageset_set_high() wrappers.

The only special case using one of the removed wrappers was:

  build_all_zonelists_init()
    setup_pageset()
      pageset_set_batch()

which was hardcoding batch as 0, so we can just open-code a call to
pageset_update() with constant parameters instead.

No functional change.

Signed-off-by: Vlastimil Babka
Reviewed-by: Oscar Salvador
Reviewed-by: David Hildenbrand
Acked-by: Michal Hocko
---
 mm/page_alloc.c | 49 ++++++++++++++++++++-----------------------------
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 60a0e94645a6..a163c5e561f2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5823,7 +5823,7 @@ static void build_zonelists(pg_data_t *pgdat)
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
+static void setup_pageset(struct per_cpu_pageset *p);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
@@ -5891,7 +5891,7 @@ build_all_zonelists_init(void)
 	 * (a chicken-egg dilemma).
 	 */
 	for_each_possible_cpu(cpu)
-		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
+		setup_pageset(&per_cpu(boot_pageset, cpu));
 
 	mminit_verify_zonelist();
 	cpuset_init_current_mems_allowed();
@@ -6200,12 +6200,6 @@ static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
 	pcp->batch = batch;
 }
 
-/* a companion to pageset_set_high() */
-static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
-{
-	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
-}
-
 static void pageset_init(struct per_cpu_pageset *p)
 {
 	struct per_cpu_pages *pcp;
@@ -6218,35 +6212,32 @@ static void pageset_init(struct per_cpu_pageset *p)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+static void setup_pageset(struct per_cpu_pageset *p)
 {
 	pageset_init(p);
-	pageset_set_batch(p, batch);
+	pageset_update(&p->pcp, 0, 1);
 }
 
 /*
- * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
- * to the value high for the pageset p.
+ * Calculate and set new high and batch values for given per-cpu pageset of a
+ * zone, based on the zone's size and the percpu_pagelist_fraction sysctl.
  */
-static void pageset_set_high(struct per_cpu_pageset *p,
-				unsigned long high)
-{
-	unsigned long batch = max(1UL, high / 4);
-	if ((high / 4) > (PAGE_SHIFT * 8))
-		batch = PAGE_SHIFT * 8;
-
-	pageset_update(&p->pcp, high, batch);
-}
-
 static void pageset_set_high_and_batch(struct zone *zone,
-				       struct per_cpu_pageset *pcp)
+				       struct per_cpu_pageset *p)
 {
-	if (percpu_pagelist_fraction)
-		pageset_set_high(pcp,
-			(zone_managed_pages(zone) /
-				percpu_pagelist_fraction));
-	else
-		pageset_set_batch(pcp, zone_batchsize(zone));
+	unsigned long new_high, new_batch;
+
+	if (percpu_pagelist_fraction) {
+		new_high = zone_managed_pages(zone) / percpu_pagelist_fraction;
+		new_batch = max(1UL, new_high / 4);
+		if ((new_high / 4) > (PAGE_SHIFT * 8))
+			new_batch = PAGE_SHIFT * 8;
+	} else {
+		new_batch = zone_batchsize(zone);
+		new_high = 6 * new_batch;
+		new_batch = max(1UL, 1 * new_batch);
+	}
+	pageset_update(&p->pcp, new_high, new_batch);
 }
 
 static void __meminit zone_pageset_init(struct zone *zone, int cpu)
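To make the consolidated calculation easy to check, it can be lifted into a
standalone C sketch. This is an illustration only, not kernel code: PAGE_SHIFT,
the max_ul() helper, the sample inputs and the printed values are assumptions
standing in for the kernel's macros and per-zone data.

#include <stdio.h>

#define PAGE_SHIFT 12UL

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* mirrors the logic of the new pageset_set_high_and_batch() */
static void high_and_batch(unsigned long managed_pages,
			   unsigned long pagelist_fraction,
			   unsigned long zone_batchsize,
			   unsigned long *high, unsigned long *batch)
{
	if (pagelist_fraction) {
		/* sysctl path: high derived from zone size, batch capped */
		*high = managed_pages / pagelist_fraction;
		*batch = max_ul(1UL, *high / 4);
		if ((*high / 4) > (PAGE_SHIFT * 8))
			*batch = PAGE_SHIFT * 8;
	} else {
		/* default path: high is six times the zone's batch size */
		*batch = zone_batchsize;
		*high = 6 * *batch;
		*batch = max_ul(1UL, *batch);
	}
}

int main(void)
{
	unsigned long high, batch;

	high_and_batch(1UL << 18, 0, 63, &high, &batch);
	printf("default:  high=%lu batch=%lu\n", high, batch);	/* 378, 63 */
	high_and_batch(1UL << 18, 8, 63, &high, &batch);
	printf("fraction: high=%lu batch=%lu\n", high, batch);	/* 32768, 96 */
	return 0;
}

With a zone_batchsize() of 63 the default path reproduces what
pageset_set_batch() used to compute, which is exactly the point of the
consolidation.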
From patchwork Tue Sep 22 14:37:05 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11792687
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [PATCH 2/9] mm, page_alloc: calculate pageset high and batch once per zone
Date: Tue, 22 Sep 2020 16:37:05 +0200
Message-Id: <20200922143712.12048-3-vbabka@suse.cz>
In-Reply-To: <20200922143712.12048-1-vbabka@suse.cz>
References: <20200922143712.12048-1-vbabka@suse.cz>

We currently call pageset_set_high_and_batch() for each possible cpu, which
repeats the same calculations of high and batch values. Instead call the
function just once per zone, and make it apply the calculated values to all
per-cpu pagesets of the zone.

This also allows removing the zone_pageset_init() and __zone_pcp_update()
wrappers.

No functional change.

Signed-off-by: Vlastimil Babka
Reviewed-by: Oscar Salvador
Reviewed-by: David Hildenbrand
Acked-by: Michal Hocko
---
 mm/page_alloc.c | 42 ++++++++++++++++++------------------------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a163c5e561f2..26069c8d1b19 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6219,13 +6219,14 @@ static void setup_pageset(struct per_cpu_pageset *p)
 }
 
 /*
- * Calculate and set new high and batch values for given per-cpu pageset of a
+ * Calculate and set new high and batch values for all per-cpu pagesets of a
  * zone, based on the zone's size and the percpu_pagelist_fraction sysctl.
  */
-static void pageset_set_high_and_batch(struct zone *zone,
-				       struct per_cpu_pageset *p)
+static void zone_set_pageset_high_and_batch(struct zone *zone)
 {
 	unsigned long new_high, new_batch;
+	struct per_cpu_pageset *p;
+	int cpu;
 
 	if (percpu_pagelist_fraction) {
 		new_high = zone_managed_pages(zone) / percpu_pagelist_fraction;
@@ -6237,23 +6238,25 @@ static void pageset_set_high_and_batch(struct zone *zone,
 		new_high = 6 * new_batch;
 		new_batch = max(1UL, 1 * new_batch);
 	}
-	pageset_update(&p->pcp, new_high, new_batch);
-}
-
-static void __meminit zone_pageset_init(struct zone *zone, int cpu)
-{
-	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
 
-	pageset_init(pcp);
-	pageset_set_high_and_batch(zone, pcp);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(zone->pageset, cpu);
+		pageset_update(&p->pcp, new_high, new_batch);
+	}
 }
 
 void __meminit setup_zone_pageset(struct zone *zone)
 {
+	struct per_cpu_pageset *p;
 	int cpu;
+
 	zone->pageset = alloc_percpu(struct per_cpu_pageset);
-	for_each_possible_cpu(cpu)
-		zone_pageset_init(zone, cpu);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(zone->pageset, cpu);
+		pageset_init(p);
+	}
+
+	zone_set_pageset_high_and_batch(zone);
 }
 
 /*
@@ -7985,15 +7988,6 @@ int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
 	return 0;
 }
 
-static void __zone_pcp_update(struct zone *zone)
-{
-	unsigned int cpu;
-
-	for_each_possible_cpu(cpu)
-		pageset_set_high_and_batch(zone,
-				per_cpu_ptr(zone->pageset, cpu));
-}
-
 /*
  * percpu_pagelist_fraction - changes the pcp->high for each zone on each
  * cpu. It is the fraction of total pages in each zone that a hot per cpu
@@ -8026,7 +8020,7 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
 		goto out;
 
 	for_each_populated_zone(zone)
-		__zone_pcp_update(zone);
+		zone_set_pageset_high_and_batch(zone);
 out:
 	mutex_unlock(&pcp_batch_high_lock);
 	return ret;
@@ -8633,7 +8627,7 @@ EXPORT_SYMBOL(free_contig_range);
 void __meminit zone_pcp_update(struct zone *zone)
 {
 	mutex_lock(&pcp_batch_high_lock);
-	__zone_pcp_update(zone);
+	zone_set_pageset_high_and_batch(zone);
 	mutex_unlock(&pcp_batch_high_lock);
 }
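The refactor is the usual hoisting of a loop-invariant computation out of a
per-CPU loop; a minimal userspace sketch of the shape (NR_CPUS, the pcp struct
and the placeholder batch value are illustrative assumptions, not the kernel's
types):

#include <stdio.h>

#define NR_CPUS 4

struct pcp { unsigned long high, batch; };

static struct pcp pcp_of[NR_CPUS];	/* stand-in for the per-cpu pagesets */

static void zone_set_high_and_batch(void)
{
	/* the calculation runs exactly once per zone ... */
	unsigned long batch = 63;	/* placeholder for zone_batchsize() */
	unsigned long high = 6 * batch;
	int cpu;

	/* ... and only the two cheap stores are repeated for each CPU */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		pcp_of[cpu].high = high;
		pcp_of[cpu].batch = batch;
	}
}

int main(void)
{
	zone_set_high_and_batch();
	printf("cpu0: high=%lu batch=%lu\n", pcp_of[0].high, pcp_of[0].batch);
	return 0;
}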
From patchwork Tue Sep 22 14:37:06 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11792685
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [PATCH 3/9] mm, page_alloc: remove setup_pageset()
Date: Tue, 22 Sep 2020 16:37:06 +0200
Message-Id: <20200922143712.12048-4-vbabka@suse.cz>
In-Reply-To: <20200922143712.12048-1-vbabka@suse.cz>
References: <20200922143712.12048-1-vbabka@suse.cz>

We initialize boot-time pagesets with setup_pageset(), which sets high and
batch values that effectively disable pcplists. We can remove this wrapper if
we just set these values for all pagesets in pageset_init(). Non-boot
pagesets then subsequently update them to the proper values.

No functional change.

Signed-off-by: Vlastimil Babka
Reviewed-by: David Hildenbrand
---
 mm/page_alloc.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 26069c8d1b19..76c2b4578723 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5823,7 +5823,7 @@ static void build_zonelists(pg_data_t *pgdat)
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p);
+static void pageset_init(struct per_cpu_pageset *p);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
@@ -5891,7 +5891,7 @@ build_all_zonelists_init(void)
 	 * (a chicken-egg dilemma).
 	 */
 	for_each_possible_cpu(cpu)
-		setup_pageset(&per_cpu(boot_pageset, cpu));
+		pageset_init(&per_cpu(boot_pageset, cpu));
 
 	mminit_verify_zonelist();
 	cpuset_init_current_mems_allowed();
@@ -6210,12 +6210,15 @@ static void pageset_init(struct per_cpu_pageset *p)
 	pcp = &p->pcp;
 	for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
-}
 
-static void setup_pageset(struct per_cpu_pageset *p)
-{
-	pageset_init(p);
-	pageset_update(&p->pcp, 0, 1);
+	/*
+	 * Set batch and high values safe for a boot pageset. A true percpu
+	 * pageset's initialization will update them subsequently. Here we don't
+	 * need to be as careful as pageset_update() as nobody can access the
+	 * pageset yet.
+	 */
+	pcp->high = 0;
+	pcp->batch = 1;
 }
 
 /*
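Why high == 0 with batch == 1 "effectively disables" a pcplist follows from
the flush condition in free_unref_page_commit(): after a single free, count
(now 1) already reaches high (0), so the page is immediately returned to the
buddy allocator. A toy model of just that condition (struct layout and the
printout are assumptions):

#include <stdio.h>

struct pcp { int count, high, batch; };

static void free_to_pcp(struct pcp *pcp)
{
	pcp->count++;			/* page added to the pcplist */
	if (pcp->count >= pcp->high) {	/* 1 >= 0: always true */
		printf("flushing %d page(s) to buddy\n", pcp->batch);
		pcp->count = 0;
	}
}

int main(void)
{
	struct pcp boot = { .count = 0, .high = 0, .batch = 1 };

	free_to_pcp(&boot);	/* every free is flushed right away */
	return 0;
}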
From patchwork Tue Sep 22 14:37:07 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11792693
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [PATCH 4/9] mm, page_alloc: simplify pageset_update()
Date: Tue, 22 Sep 2020 16:37:07 +0200
Message-Id: <20200922143712.12048-5-vbabka@suse.cz>
In-Reply-To: <20200922143712.12048-1-vbabka@suse.cz>
References: <20200922143712.12048-1-vbabka@suse.cz>

pageset_update() attempts to update pcplist's high and batch values in a way
that readers don't observe batch > high. It uses smp_wmb() to order the
updates accordingly. However, without proper pairing read barriers in readers
this guarantee doesn't hold, and there are no such barriers in e.g.
free_unref_page_commit().

Commit 88e8ac11d2ea ("mm, page_alloc: fix core hung in free_pcppages_bulk()")
already showed this is problematic, and solved this by ultimately only
trusting pcp->count of the current cpu with interrupts disabled.

The update dance with unpaired write barriers thus makes no sense. Replace
them with plain WRITE_ONCE to prevent store tearing, and document that the
values can change asynchronously and should not be trusted for correctness.

All current readers appear to be OK after 88e8ac11d2ea. Convert them to
READ_ONCE to prevent unnecessary read tearing, but mainly to alert anybody
making future changes to the code that special care is needed.

Signed-off-by: Vlastimil Babka
Acked-by: David Hildenbrand
Acked-by: Michal Hocko
---
 mm/page_alloc.c | 40 ++++++++++++++++++----------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 76c2b4578723..99b74c1c2b0a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1297,7 +1297,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 {
 	int migratetype = 0;
 	int batch_free = 0;
-	int prefetch_nr = 0;
+	int prefetch_nr = READ_ONCE(pcp->batch);
 	bool isolated_pageblocks;
 	struct page *page, *tmp;
 	LIST_HEAD(head);
@@ -1348,8 +1348,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			 * avoid excessive prefetching due to large count, only
 			 * prefetch buddy for the first pcp->batch nr of pages.
 			 */
-			if (prefetch_nr++ < pcp->batch)
+			if (prefetch_nr) {
 				prefetch_buddy(page);
+				prefetch_nr--;
+			}
 		} while (--count && --batch_free && !list_empty(list));
 	}
 
@@ -3131,10 +3133,8 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	list_add(&page->lru, &pcp->lists[migratetype]);
 	pcp->count++;
-	if (pcp->count >= pcp->high) {
-		unsigned long batch = READ_ONCE(pcp->batch);
-		free_pcppages_bulk(zone, batch, pcp);
-	}
+	if (pcp->count >= READ_ONCE(pcp->high))
+		free_pcppages_bulk(zone, READ_ONCE(pcp->batch), pcp);
 }
 
 /*
@@ -3318,7 +3318,7 @@ static struct page *__rmqueue_pcplist(struct zone *zone, int migratetype,
 	do {
 		if (list_empty(list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, list,
+					READ_ONCE(pcp->batch), list,
 					migratetype, alloc_flags);
 			if (unlikely(list_empty(list)))
 				return NULL;
@@ -6174,13 +6174,16 @@ static int zone_batchsize(struct zone *zone)
 }
 
 /*
- * pcp->high and pcp->batch values are related and dependent on one another:
- * ->batch must never be higher then ->high.
- * The following function updates them in a safe manner without read side
- * locking.
+ * pcp->high and pcp->batch values are related and generally batch is lower
+ * than high. They are also related to pcp->count such that count is lower
+ * than high, and as soon as it reaches high, the pcplist is flushed.
  *
- * Any new users of pcp->batch and pcp->high should ensure they can cope with
- * those fields changing asynchronously (acording to the above rule).
+ * However, guaranteeing these relations at all times would require e.g. write
+ * barriers here but also careful usage of read barriers at the read side, and
+ * thus be prone to error and bad for performance. Thus the update only prevents
+ * store tearing. Any new users of pcp->batch and pcp->high should ensure they
+ * can cope with those fields changing asynchronously, and fully trust only the
+ * pcp->count field on the local CPU with interrupts disabled.
  *
 * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
 * outside of boot time (or some other assurance that no concurrent updaters
@@ -6189,15 +6192,8 @@ static int zone_batchsize(struct zone *zone)
 static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
 		unsigned long batch)
 {
-	/* start with a fail safe value for batch */
-	pcp->batch = 1;
-	smp_wmb();
-
-	/* Update high, then batch, in order */
-	pcp->high = high;
-	smp_wmb();
-
-	pcp->batch = batch;
+	WRITE_ONCE(pcp->batch, batch);
+	WRITE_ONCE(pcp->high, high);
 }
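For readers less familiar with the primitives: WRITE_ONCE()/READ_ONCE() stop
the compiler from tearing, fusing or re-reading the accesses, but impose no
ordering between CPUs, much like C11 relaxed atomics. A userspace analogue of
the simplified pageset_update() (a sketch; the struct and sample values are
assumptions):

#include <stdatomic.h>
#include <stdio.h>

struct pcp {
	_Atomic unsigned long high;
	_Atomic unsigned long batch;
};

static void pageset_update(struct pcp *pcp, unsigned long high,
			   unsigned long batch)
{
	/* no barriers: a reader may observe the two stores in any order */
	atomic_store_explicit(&pcp->batch, batch, memory_order_relaxed);
	atomic_store_explicit(&pcp->high, high, memory_order_relaxed);
}

int main(void)
{
	struct pcp pcp = { 0, 1 };	/* boot values: high=0, batch=1 */

	pageset_update(&pcp, 378, 63);

	/* a reader must tolerate stale or mixed values ... */
	printf("observed high=%lu batch=%lu\n",
	       atomic_load_explicit(&pcp.high, memory_order_relaxed),
	       atomic_load_explicit(&pcp.batch, memory_order_relaxed));
	return 0;
}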
From patchwork Tue Sep 22 14:37:08 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11792695
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [PATCH 5/9] mm, page_alloc: make per_cpu_pageset accessible only after init
Date: Tue, 22 Sep 2020 16:37:08 +0200
Message-Id: <20200922143712.12048-6-vbabka@suse.cz>
In-Reply-To: <20200922143712.12048-1-vbabka@suse.cz>
References: <20200922143712.12048-1-vbabka@suse.cz>

setup_zone_pageset() replaces the boot_pageset by allocating and initializing
a proper percpu one. Currently it assigns the newly allocated one to
zone->pageset before initializing it. That's currently not an issue, because
the zone should not be in any zonelist, thus not visible to allocators at
this point. Memory ordering between the pcplist contents and its visibility
is also not guaranteed here, but that also shouldn't be an issue because
online_pages() does a spin_unlock(pgdat->node_size_lock) before building the
zonelists.

However it's best that we don't silently rely on operations that can be
changed in the future. Make sure only properly initialized pcplists are
visible, using smp_store_release(). The read side has a data dependency via
the zone->pageset pointer instead of an explicit read barrier.

Signed-off-by: Vlastimil Babka
Acked-by: David Hildenbrand
---
 mm/page_alloc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 99b74c1c2b0a..de3b48bda45c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6246,15 +6246,17 @@ static void zone_set_pageset_high_and_batch(struct zone *zone)
 
 void __meminit setup_zone_pageset(struct zone *zone)
 {
+	struct per_cpu_pageset __percpu *new_pageset;
 	struct per_cpu_pageset *p;
 	int cpu;
 
-	zone->pageset = alloc_percpu(struct per_cpu_pageset);
+	new_pageset = alloc_percpu(struct per_cpu_pageset);
 	for_each_possible_cpu(cpu) {
-		p = per_cpu_ptr(zone->pageset, cpu);
+		p = per_cpu_ptr(new_pageset, cpu);
 		pageset_init(p);
 	}
 
+	smp_store_release(&zone->pageset, new_pageset);
 	zone_set_pageset_high_and_batch(zone);
 }
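The publish pattern used here, initialize fully and then make the pointer
visible with a release store while relying on the reader's data dependency,
can be modelled in userspace C11 as follows (a sketch only; the types and the
consume-order load standing in for the kernel's dependency ordering are
assumptions):

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct pageset { int high, batch; };

static _Atomic(struct pageset *) zone_pageset;

static void setup_zone_pageset(void)
{
	struct pageset *new_ps = malloc(sizeof(*new_ps));

	new_ps->high = 0;	/* full initialization happens first ... */
	new_ps->batch = 1;

	/* ... then the release store publishes it; anyone who loads the
	 * pointer and dereferences it cannot see pre-init contents */
	atomic_store_explicit(&zone_pageset, new_ps, memory_order_release);
}

int main(void)
{
	struct pageset *p;

	setup_zone_pageset();
	/* consume order models the read side's data dependency */
	p = atomic_load_explicit(&zone_pageset, memory_order_consume);
	printf("high=%d batch=%d\n", p->high, p->batch);
	return 0;
}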
From patchwork Tue Sep 22 14:37:09 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11792697
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [PATCH 6/9] mm, page_alloc: cache pageset high and batch in struct zone
Date: Tue, 22 Sep 2020 16:37:09 +0200
Message-Id: <20200922143712.12048-7-vbabka@suse.cz>
In-Reply-To: <20200922143712.12048-1-vbabka@suse.cz>
References: <20200922143712.12048-1-vbabka@suse.cz>

All per-cpu pagesets for a zone use the same high and batch values that are
duplicated there just for performance (locality) reasons. This patch adds the
same variables also to struct zone as a shared copy.

This will be useful later for making it possible to disable pcplists
temporarily by setting the high value to 0, while remembering the values for
restoring them later. But we can also immediately benefit from not updating
pagesets of all possible cpus in case the newly recalculated values (after
sysctl change or memory online/offline) are actually unchanged from the
previous ones.

Signed-off-by: Vlastimil Babka
---
 include/linux/mmzone.h |  6 ++++++
 mm/page_alloc.c        | 16 ++++++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 90721f3156bc..7ad3f14dbe88 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -470,6 +470,12 @@ struct zone {
 #endif
 	struct pglist_data	*zone_pgdat;
 	struct per_cpu_pageset __percpu *pageset;
+	/*
+	 * the high and batch values are copied to individual pagesets for
+	 * faster access
+	 */
+	int pageset_high;
+	int pageset_batch;
 
 #ifndef CONFIG_SPARSEMEM
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index de3b48bda45c..901907799bdc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5824,6 +5824,8 @@ static void build_zonelists(pg_data_t *pgdat)
  * Other parts of the kernel may not check if the zone is available.
  */
 static void pageset_init(struct per_cpu_pageset *p);
+#define BOOT_PAGESET_HIGH	0
+#define BOOT_PAGESET_BATCH	1
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
@@ -6213,8 +6215,8 @@ static void pageset_init(struct per_cpu_pageset *p)
 	 * need to be as careful as pageset_update() as nobody can access the
 	 * pageset yet.
 	 */
-	pcp->high = 0;
-	pcp->batch = 1;
+	pcp->high = BOOT_PAGESET_HIGH;
+	pcp->batch = BOOT_PAGESET_BATCH;
 }
 
 /*
@@ -6238,6 +6240,14 @@ static void zone_set_pageset_high_and_batch(struct zone *zone)
 		new_batch = max(1UL, 1 * new_batch);
 	}
 
+	if (zone->pageset_high != new_high ||
+	    zone->pageset_batch != new_batch) {
+		zone->pageset_high = new_high;
+		zone->pageset_batch = new_batch;
+	} else {
+		return;
+	}
+
 	for_each_possible_cpu(cpu) {
 		p = per_cpu_ptr(zone->pageset, cpu);
 		pageset_update(&p->pcp, new_high, new_batch);
@@ -6300,6 +6310,8 @@ static __meminit void zone_pcp_init(struct zone *zone)
 	 * offset of a (static) per cpu variable into the per cpu area.
 	 */
 	zone->pageset = &boot_pageset;
+	zone->pageset_high = BOOT_PAGESET_HIGH;
+	zone->pageset_batch = BOOT_PAGESET_BATCH;
 
 	if (populated_zone(zone))
 		printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%u\n",
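The "skip the broadcast when nothing changed" logic reduces to a comparison
against the zone-level cache before walking the CPUs; a minimal sketch (the
struct name and sample values are assumptions):

#include <stdio.h>

struct zone_cache { unsigned long high, batch; };

/* returns 1 when the per-CPU pagesets would actually be rewritten */
static int update_if_changed(struct zone_cache *z,
			     unsigned long new_high, unsigned long new_batch)
{
	if (z->high == new_high && z->batch == new_batch)
		return 0;	/* recalculation gave the same pair: no walk */

	z->high = new_high;
	z->batch = new_batch;
	/* the for_each_possible_cpu() broadcast would run here */
	return 1;
}

int main(void)
{
	struct zone_cache z = { 0, 1 };		/* boot values */

	printf("first update:  %d\n", update_if_changed(&z, 378, 63));
	printf("second update: %d\n", update_if_changed(&z, 378, 63));
	return 0;
}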
From patchwork Tue Sep 22 14:37:10 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11792699
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [PATCH 7/9] mm, page_alloc: move draining pcplists to page isolation users
Date: Tue, 22 Sep 2020 16:37:10 +0200
Message-Id: <20200922143712.12048-8-vbabka@suse.cz>
In-Reply-To: <20200922143712.12048-1-vbabka@suse.cz>
References: <20200922143712.12048-1-vbabka@suse.cz>

Currently, pcplists are drained during set_migratetype_isolate(), which means
once per pageblock processed by start_isolate_page_range(). This is somewhat
wasteful. Moreover, the callers might need different guarantees, and the
draining is currently prone to races and does not guarantee that no page from
an isolated pageblock will end up on the pcplist after the drain.

Better guarantees are added by later patches and require explicit actions by
page isolation users that need them. Thus it makes sense to move the current
imperfect draining to the callers also as a preparation step.

Suggested-by: Pavel Tatashin
Signed-off-by: Vlastimil Babka
Reviewed-by: David Hildenbrand
Acked-by: Michal Hocko
---
 mm/memory_hotplug.c | 11 ++++++-----
 mm/page_alloc.c     |  2 ++
 mm/page_isolation.c | 10 +++++-----
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9db80ee29caa..08f729922e18 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1524,6 +1524,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 		goto failed_removal;
 	}
 
+	drain_all_pages(zone);
+
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
 	node_states_check_changes_offline(nr_pages, zone, &arg);
@@ -1574,11 +1576,10 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 		}
 
 		/*
-		 * per-cpu pages are drained in start_isolate_page_range, but if
-		 * there are still pages that are not free, make sure that we
-		 * drain again, because when we isolated range we might
-		 * have raced with another thread that was adding pages to pcp
-		 * list.
+		 * per-cpu pages are drained after start_isolate_page_range, but
+		 * if there are still pages that are not free, make sure that we
+		 * drain again, because when we isolated range we might have
+		 * raced with another thread that was adding pages to pcp list.
 		 *
 		 * Forward progress should be still guaranteed because
 		 * pages on the pcp list can only belong to MOVABLE_ZONE
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 901907799bdc..4e37bc3f6077 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8432,6 +8432,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	if (ret)
 		return ret;
 
+	drain_all_pages(cc.zone);
+
 	/*
 	 * In case of -EBUSY, we'd like to know which page causes problem.
 	 * So, just fall through. test_pages_isolated() has a tracepoint
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index abfe26ad59fd..57d8bc1e7760 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -49,7 +49,6 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 
 		__mod_zone_freepage_state(zone, -nr_pages, mt);
 		spin_unlock_irqrestore(&zone->lock, flags);
-		drain_all_pages(zone);
 		return 0;
 	}
 
@@ -167,11 +166,12 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  *
  * Please note that there is no strong synchronization with the page allocator
  * either. Pages might be freed while their page blocks are marked ISOLATED.
- * In some cases pages might still end up on pcp lists and that would allow
+ * A call to drain_all_pages() after isolation can flush most of them. However
+ * in some cases pages might still end up on pcp lists and that would allow
  * for their allocation even when they are in fact isolated already. Depending
- * on how strong of a guarantee the caller needs drain_all_pages might be needed
- * (e.g. __offline_pages will need to call it after check for isolated range for
- * a next retry).
+ * on how strong of a guarantee the caller needs, further drain_all_pages()
+ * might be needed (e.g. __offline_pages will need to call it after check for
+ * isolated range for a next retry).
  *
  * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
  */
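The new responsibility split, where isolation only marks pageblocks and the
caller decides when to drain, can be sketched with stubs. The names mirror the
kernel functions, but the bodies here are placeholder prints, not the real
implementations:

#include <stdio.h>

static int start_isolate_page_range(void)
{
	/* set_migratetype_isolate() no longer drains in here */
	puts("pageblocks marked MIGRATE_ISOLATE");
	return 0;
}

static void drain_all_pages(void)
{
	puts("pcplists flushed to buddy");
}

static void offline_pages(void)
{
	if (start_isolate_page_range())
		return;
	/* the (still imperfect) drain now happens once, where the caller,
	 * knowing the guarantees it needs, chooses to place it */
	drain_all_pages();
}

int main(void)
{
	offline_pages();
	return 0;
}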
From patchwork Tue Sep 22 14:37:11 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11792703
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [PATCH 8/9] mm, page_alloc: drain all pcplists during memory offline
Date: Tue, 22 Sep 2020 16:37:11 +0200
Message-Id: <20200922143712.12048-9-vbabka@suse.cz>
In-Reply-To: <20200922143712.12048-1-vbabka@suse.cz>
References: <20200922143712.12048-1-vbabka@suse.cz>

drain_all_pages() is optimized to only execute on cpus where pcplists are not
empty. The check can however race with a free to pcplist that has not yet
increased the pcp->count from 0 to 1. Make the drain optionally skip the racy
check and drain on all cpus, and use it in the memory offline context, where
we want to make sure no isolated pages are left behind on pcplists.

Signed-off-by: Vlastimil Babka
---
 include/linux/gfp.h |  1 +
 mm/memory_hotplug.c |  4 ++--
 mm/page_alloc.c     | 29 ++++++++++++++++++++---------
 3 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 67a0774e080b..cc52c5cc9fab 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -592,6 +592,7 @@ extern void page_frag_free(void *addr);
 
 void page_alloc_init(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
+void __drain_all_pages(struct zone *zone, bool page_isolation);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 08f729922e18..bbde415b558b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1524,7 +1524,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 		goto failed_removal;
 	}
 
-	drain_all_pages(zone);
+	__drain_all_pages(zone, true);
 
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
@@ -1588,7 +1588,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 		 */
 		ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
 		if (ret)
-			drain_all_pages(zone);
+			__drain_all_pages(zone, true);
 	} while (ret);
 
 	/* Mark all sections offline and remove free pages from the buddy. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e37bc3f6077..33cc35d152b1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2960,14 +2960,7 @@ static void drain_local_pages_wq(struct work_struct *work)
 	preempt_enable();
 }
 
-/*
- * Spill all the per-cpu pages from all CPUs back into the buddy allocator.
- *
- * When zone parameter is non-NULL, spill just the single zone's pages.
- *
- * Note that this can be extremely slow as the draining happens in a workqueue.
- */
-void drain_all_pages(struct zone *zone)
+void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 {
 	int cpu;
 
@@ -3006,7 +2999,13 @@ void drain_all_pages(struct zone *zone)
 		struct zone *z;
 		bool has_pcps = false;
 
-		if (zone) {
+		if (force_all_cpus) {
+			/*
+			 * The pcp.count check is racy, some callers need a
+			 * guarantee that no cpu is missed.
+			 */
+			has_pcps = true;
+		} else if (zone) {
 			pcp = per_cpu_ptr(zone->pageset, cpu);
 			if (pcp->pcp.count)
 				has_pcps = true;
@@ -3039,6 +3038,18 @@ void drain_all_pages(struct zone *zone)
 	mutex_unlock(&pcpu_drain_mutex);
 }
 
+/*
+ * Spill all the per-cpu pages from all CPUs back into the buddy allocator.
+ *
+ * When zone parameter is non-NULL, spill just the single zone's pages.
+ *
+ * Note that this can be extremely slow as the draining happens in a workqueue.
+ */
+void drain_all_pages(struct zone *zone)
+{
+	__drain_all_pages(zone, false);
+}
+
 #ifdef CONFIG_HIBERNATION
 
 /*
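The window being closed is narrow: the drainer samples pcp->count while a
racing free has been committed to the pcplist but has not yet made the count
visible. A single-threaded timeline sketch of that interleaving (the struct
and the prints are assumptions, and the real concurrency is elided):

#include <stdio.h>

struct pcp { int count; };

int main(void)
{
	struct pcp cpu0 = { .count = 0 };

	/* drainer: the optimized check sees an empty pcplist ... */
	if (cpu0.count == 0)
		puts("drain: skipping cpu0 (count == 0)");

	/* ... while a free on cpu0 was already past the point of no return
	 * and only now bumps the count from 0 to 1 */
	cpu0.count = 1;

	printf("cpu0 count after the drain ran: %d\n", cpu0.count);
	/* __drain_all_pages(zone, true) instead queues work on every cpu */
	return 0;
}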
owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Page isolation can race with process freeing pages to pcplists in a way that a page from isolated pageblock can end up on pcplist. This can be fixed by repeated draining of pcplists, as done by patch "mm/memory_hotplug: drain per-cpu pages again during memory offline" in [1]. David and Michal would prefer that this race was closed in a way that callers of page isolation who need stronger guarantees don't need to repeatedly drain. David suggested disabling pcplists usage completely during page isolation, instead of repeatedly draining them. To achieve this without adding special cases in alloc/free fastpath, we can use the same approach as boot pagesets - when pcp->high is 0, any pcplist addition will be immediately flushed. The race can thus be closed by setting pcp->high to 0 and draining pcplists once, before calling start_isolate_page_range(). The draining will serialize after processes that already disabled interrupts and read the old value of pcp->high in free_unref_page_commit(), and processes that have not yet disabled interrupts, will observe pcp->high == 0 when they are rescheduled, and skip pcplists. This guarantees no stray pages on pcplists in zones where isolation happens. This patch thus adds zone_pcplist_disable() and zone_pcplist_enable() functions that page isolation users can call before start_isolate_page_range() and after unisolating (or offlining) the isolated pages. A new zone->pcplist_disabled atomic variable makes sure we disable only pcplists once and don't enable them prematurely in case there are multiple users in parallel. We however have to avoid external updates to high and batch by taking pcp_batch_high_lock. To allow multiple isolations in parallel, change this lock from mutex to rwsem. Currently the only user of this functionality is offline_pages(). [1] https://lore.kernel.org/linux-mm/20200903140032.380431-1-pasha.tatashin@soleen.com/ Suggested-by: David Hildenbrand Suggested-by: Michal Hocko Signed-off-by: Vlastimil Babka --- include/linux/mmzone.h | 2 ++ include/linux/page-isolation.h | 2 ++ mm/internal.h | 4 ++++ mm/memory_hotplug.c | 28 ++++++++++++---------------- mm/page_alloc.c | 29 ++++++++++++++++++----------- mm/page_isolation.c | 22 +++++++++++++++++++--- 6 files changed, 57 insertions(+), 30 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 7ad3f14dbe88..3c653d6348b1 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -536,6 +536,8 @@ struct zone { * of pageblock. Protected by zone->lock. */ unsigned long nr_isolate_pageblock; + /* Track requests for disabling pcplists */ + atomic_t pcplist_disabled; #endif #ifdef CONFIG_MEMORY_HOTPLUG diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 572458016331..1dec5d0f62a6 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -38,6 +38,8 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page, void set_pageblock_migratetype(struct page *page, int migratetype); int move_freepages_block(struct zone *zone, struct page *page, int migratetype, int *num_movable); +void zone_pcplist_disable(struct zone *zone); +void zone_pcplist_enable(struct zone *zone); /* * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE. 
diff --git a/mm/internal.h b/mm/internal.h
index ea66ef45da6c..154e38e702dd 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -7,6 +7,7 @@
 #ifndef __MM_INTERNAL_H
 #define __MM_INTERNAL_H

+#include <linux/rwsem.h>
 #include
 #include
 #include
@@ -199,8 +200,11 @@ extern void post_alloc_hook(struct page *page, unsigned int order,
					gfp_t gfp_flags);
 extern int user_min_free_kbytes;

+extern struct rw_semaphore pcp_batch_high_lock;
 extern void zone_pcp_update(struct zone *zone);
 extern void zone_pcp_reset(struct zone *zone);
+extern void zone_update_pageset_high_and_batch(struct zone *zone,
+		unsigned long high, unsigned long batch);

 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index bbde415b558b..6fe9be550160 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1515,17 +1515,21 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	}
 	node = zone_to_nid(zone);

+	/*
+	 * Disable pcplists so that page isolation cannot race with freeing
+	 * in a way that pages from isolated pageblock are left on pcplists.
+	 */
+	zone_pcplist_disable(zone);
+
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
				       MIGRATE_MOVABLE,
				       MEMORY_OFFLINE | REPORT_FAILURE);
 	if (ret) {
 		reason = "failure to isolate range";
-		goto failed_removal;
+		goto failed_removal_pcplists_disabled;
 	}

-	__drain_all_pages(zone, true);
-
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
 	node_states_check_changes_offline(nr_pages, zone, &arg);
@@ -1575,20 +1579,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 			goto failed_removal_isolated;
 		}

-		/*
-		 * per-cpu pages are drained after start_isolate_page_range, but
-		 * if there are still pages that are not free, make sure that we
-		 * drain again, because when we isolated range we might have
-		 * raced with another thread that was adding pages to pcp list.
-		 *
-		 * Forward progress should be still guaranteed because
-		 * pages on the pcp list can only belong to MOVABLE_ZONE
-		 * because has_unmovable_pages explicitly checks for
-		 * PageBuddy on freed pages on other zones.
-		 */
 		ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
-		if (ret)
-			__drain_all_pages(zone, true);
 	} while (ret);

 	/* Mark all sections offline and remove free pages from the buddy. */
@@ -1604,6 +1596,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages;
 	spin_unlock_irqrestore(&zone->lock, flags);

+	zone_pcplist_enable(zone);
+
 	/* removal success */
 	adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
 	zone->present_pages -= nr_pages;
@@ -1637,6 +1631,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 failed_removal_isolated:
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
+failed_removal_pcplists_disabled:
+	zone_pcplist_enable(zone);
 failed_removal:
 	pr_debug("memory offlining [mem %#010llx-%#010llx] failed due to %s\n",
		 (unsigned long long) start_pfn << PAGE_SHIFT,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 33cc35d152b1..673d353f9311 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -78,7 +78,7 @@
 #include "page_reporting.h"

 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
-static DEFINE_MUTEX(pcp_batch_high_lock);
+DECLARE_RWSEM(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_FRACTION	(8)

 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
@@ -6230,6 +6230,18 @@ static void pageset_init(struct per_cpu_pageset *p)
 	pcp->batch = BOOT_PAGESET_BATCH;
 }

+void zone_update_pageset_high_and_batch(struct zone *zone, unsigned long high,
+		unsigned long batch)
+{
+	struct per_cpu_pageset *p;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(zone->pageset, cpu);
+		pageset_update(&p->pcp, high, batch);
+	}
+}
+
 /*
  * Calculate and set new high and batch values for all per-cpu pagesets of a
  * zone, based on the zone's size and the percpu_pagelist_fraction sysctl.
@@ -6237,8 +6249,6 @@ static void pageset_init(struct per_cpu_pageset *p)
 static void zone_set_pageset_high_and_batch(struct zone *zone)
 {
 	unsigned long new_high, new_batch;
-	struct per_cpu_pageset *p;
-	int cpu;

 	if (percpu_pagelist_fraction) {
 		new_high = zone_managed_pages(zone) / percpu_pagelist_fraction;
@@ -6259,10 +6269,7 @@ static void zone_set_pageset_high_and_batch(struct zone *zone)
 		return;
 	}

-	for_each_possible_cpu(cpu) {
-		p = per_cpu_ptr(zone->pageset, cpu);
-		pageset_update(&p->pcp, new_high, new_batch);
-	}
+	zone_update_pageset_high_and_batch(zone, new_high, new_batch);
 }

 void __meminit setup_zone_pageset(struct zone *zone)
@@ -8024,7 +8031,7 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
 	int old_percpu_pagelist_fraction;
 	int ret;

-	mutex_lock(&pcp_batch_high_lock);
+	down_write(&pcp_batch_high_lock);
 	old_percpu_pagelist_fraction = percpu_pagelist_fraction;

 	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
@@ -8046,7 +8053,7 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
 		for_each_populated_zone(zone)
 			zone_set_pageset_high_and_batch(zone);
 out:
-	mutex_unlock(&pcp_batch_high_lock);
+	up_write(&pcp_batch_high_lock);
 	return ret;
 }

@@ -8652,9 +8659,9 @@ EXPORT_SYMBOL(free_contig_range);
 */
 void __meminit zone_pcp_update(struct zone *zone)
 {
-	mutex_lock(&pcp_batch_high_lock);
+	down_write(&pcp_batch_high_lock);
 	zone_set_pageset_high_and_batch(zone);
-	mutex_unlock(&pcp_batch_high_lock);
+	up_write(&pcp_batch_high_lock);
 }

 void zone_pcp_reset(struct zone *zone)
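
Why is setting high/batch to 0/1 via pageset_update() enough to bypass the
pcplists entirely? Schematically, the free path makes the decision like this
(a simplified sketch of the free_unref_page_commit() logic, not the verbatim
kernel code):

	/* sketch: freeing a page with pcp->high == 0 flushes immediately */
	list_add(&page->lru, &pcp->lists[migratetype]);
	pcp->count++;
	/*
	 * With pcp->high == 0 this condition is always true, so every
	 * freed page goes straight to the buddy allocator and none can
	 * linger on a pcplist while isolation is in progress.
	 */
	if (pcp->count >= READ_ONCE(pcp->high))
		free_pcppages_bulk(zone, READ_ONCE(pcp->batch), pcp);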
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 57d8bc1e7760..c0e895e8f322 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,22 @@
 #define CREATE_TRACE_POINTS
 #include

+void zone_pcplist_disable(struct zone *zone)
+{
+	down_read(&pcp_batch_high_lock);
+	if (atomic_inc_return(&zone->pcplist_disabled) == 1) {
+		zone_update_pageset_high_and_batch(zone, 0, 1);
+		__drain_all_pages(zone, true);
+	}
+}
+
+void zone_pcplist_enable(struct zone *zone)
+{
+	if (atomic_dec_return(&zone->pcplist_disabled) == 0)
+		zone_update_pageset_high_and_batch(zone, zone->pageset_high,
+						   zone->pageset_batch);
+	up_read(&pcp_batch_high_lock);
+}
+
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
 	struct zone *zone = page_zone(page);
@@ -169,9 +185,9 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 * A call to drain_all_pages() after isolation can flush most of them. However
 * in some cases pages might still end up on pcp lists and that would allow
 * for their allocation even when they are in fact isolated already. Depending
- * on how strong of a guarantee the caller needs, further drain_all_pages()
- * might be needed (e.g. __offline_pages will need to call it after check for
- * isolated range for a next retry).
+ * on how strong a guarantee the caller needs, zone_pcplist_disable() and
+ * zone_pcplist_enable() can be used to flush and disable the pcplists before
+ * isolation and re-enable them after unisolation.
 *
 * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
 */
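
To summarize the concurrency design: isolation users take pcp_batch_high_lock
for read and can therefore run in parallel with each other, while updaters of
high and batch (the sysctl handler, zone_pcp_update()) take it for write and
are excluded until the last user re-enables the pcplists. The atomic counter
then makes overlapping users compose; one possible interleaving of two
hypothetical users A and B on the same zone, shown as a sketch:

/*
 * A: zone_pcplist_disable()   counter 0 -> 1: set high/batch to 0/1, drain
 * B: zone_pcplist_disable()   counter 1 -> 2: pcplists already disabled
 * A: zone_pcplist_enable()    counter 2 -> 1: pcplists stay disabled
 * B: zone_pcplist_enable()    counter 1 -> 0: restore zone->pageset_high/batch
 */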