From patchwork Mon Sep 7 16:36:24 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11761591
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [RFC 1/5] mm, page_alloc: clean up pageset high and batch update
Date: Mon, 7 Sep 2020 18:36:24 +0200
Message-Id: <20200907163628.26495-2-vbabka@suse.cz>
In-Reply-To: <20200907163628.26495-1-vbabka@suse.cz>
References: <20200907163628.26495-1-vbabka@suse.cz>
The updates to pcplists' high and batch values are handled by multiple
functions that make the calculations hard to follow. Consolidate everything
to pageset_set_high_and_batch() and remove the pageset_set_batch() and
pageset_set_high() wrappers.

The only special case using one of the removed wrappers was:

  build_all_zonelists_init()
    setup_pageset()
      pageset_set_batch()

which was hardcoding batch as 0, so we can just open-code a call to
pageset_update() with constant parameters instead.

No functional change.

Signed-off-by: Vlastimil Babka
Reviewed-by: Oscar Salvador
---
 mm/page_alloc.c | 51 +++++++++++++++++++------------------------------
 1 file changed, 20 insertions(+), 31 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97dc9ca..0b516208afda 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5834,7 +5834,7 @@ static void build_zonelists(pg_data_t *pgdat)
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
+static void setup_pageset(struct per_cpu_pageset *p);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
@@ -5902,7 +5902,7 @@ build_all_zonelists_init(void)
          * (a chicken-egg dilemma).
          */
         for_each_possible_cpu(cpu)
-                setup_pageset(&per_cpu(boot_pageset, cpu), 0);
+                setup_pageset(&per_cpu(boot_pageset, cpu));
 
         mminit_verify_zonelist();
         cpuset_init_current_mems_allowed();
@@ -6218,12 +6218,6 @@ static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
         pcp->batch = batch;
 }
 
-/* a companion to pageset_set_high() */
-static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
-{
-        pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
-}
-
 static void pageset_init(struct per_cpu_pageset *p)
 {
         struct per_cpu_pages *pcp;
@@ -6236,35 +6230,30 @@ static void pageset_init(struct per_cpu_pageset *p)
                 INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+static void setup_pageset(struct per_cpu_pageset *p)
 {
         pageset_init(p);
-        pageset_set_batch(p, batch);
-}
-
-/*
- * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
- * to the value high for the pageset p.
- */
-static void pageset_set_high(struct per_cpu_pageset *p,
-                                unsigned long high)
-{
-        unsigned long batch = max(1UL, high / 4);
-        if ((high / 4) > (PAGE_SHIFT * 8))
-                batch = PAGE_SHIFT * 8;
-
-        pageset_update(&p->pcp, high, batch);
+        pageset_update(&p->pcp, 0, 1);
 }
 
 static void pageset_set_high_and_batch(struct zone *zone,
-                struct per_cpu_pageset *pcp)
+                struct per_cpu_pageset *p)
 {
-        if (percpu_pagelist_fraction)
-                pageset_set_high(pcp,
-                        (zone_managed_pages(zone) /
-                                percpu_pagelist_fraction));
-        else
-                pageset_set_batch(pcp, zone_batchsize(zone));
+        unsigned long new_high;
+        unsigned long new_batch;
+        int fraction = READ_ONCE(percpu_pagelist_fraction);
+
+        if (fraction) {
+                new_high = zone_managed_pages(zone) / fraction;
+                new_batch = max(1UL, new_high / 4);
+                if ((new_high / 4) > (PAGE_SHIFT * 8))
+                        new_batch = PAGE_SHIFT * 8;
+        } else {
+                new_batch = zone_batchsize(zone);
+                new_high = 6 * new_batch;
+                new_batch = max(1UL, 1 * new_batch);
+        }
+        pageset_update(&p->pcp, new_high, new_batch);
 }
 
 static void __meminit zone_pageset_init(struct zone *zone, int cpu)
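
[Editor's note: to make the two consolidated calculation modes easier to
follow, here is a minimal, userspace-only C sketch of the logic that the
new pageset_set_high_and_batch() performs. The stub zone_batchsize_stub()
and the example managed-pages values are assumptions for illustration,
not kernel code.]

#include <stdio.h>

#define PAGE_SHIFT 12

/* stand-in for zone_batchsize(zone); the real value depends on zone size */
static unsigned long zone_batchsize_stub(void) { return 63; }

static void calc_high_and_batch(unsigned long managed_pages,
                                unsigned long fraction,
                                unsigned long *high, unsigned long *batch)
{
        if (fraction) {
                /* sysctl percpu_pagelist_fraction mode */
                *high = managed_pages / fraction;
                *batch = *high / 4;
                if (*batch < 1)
                        *batch = 1;
                if ((*high / 4) > (PAGE_SHIFT * 8))
                        *batch = PAGE_SHIFT * 8;
        } else {
                /* default mode: high is six times the zone's batch size */
                *batch = zone_batchsize_stub();
                *high = 6 * *batch;
                if (*batch < 1)
                        *batch = 1;
        }
}

int main(void)
{
        unsigned long high, batch;

        /* 1 GB zone with 4 KB pages = 262144 managed pages (assumed) */
        calc_high_and_batch(1UL << 18, 0, &high, &batch);
        printf("default:    high=%lu batch=%lu\n", high, batch);
        calc_high_and_batch(1UL << 18, 8, &high, &batch);
        printf("fraction=8: high=%lu batch=%lu\n", high, batch);
        return 0;
}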
From patchwork Mon Sep 7 16:36:25 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11761595
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [RFC 2/5] mm, page_alloc: calculate pageset high and batch once per zone
Date: Mon, 7 Sep 2020 18:36:25 +0200
Message-Id: <20200907163628.26495-3-vbabka@suse.cz>
In-Reply-To: <20200907163628.26495-1-vbabka@suse.cz>
References: <20200907163628.26495-1-vbabka@suse.cz>

We currently call pageset_set_high_and_batch() for each possible cpu, which
repeats the same calculations of high and batch values. Instead, call it once
per zone and have it apply the calculated values to all per-cpu pagesets of
the zone.

This also allows removing the zone_pageset_init() and __zone_pcp_update()
wrappers.

No functional change.

Signed-off-by: Vlastimil Babka
Reviewed-by: Oscar Salvador
Reviewed-by: David Hildenbrand
---
 mm/page_alloc.c | 40 +++++++++++++++++-----------------------
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b516208afda..f669a251f654 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6236,12 +6236,13 @@ static void setup_pageset(struct per_cpu_pageset *p)
         pageset_update(&p->pcp, 0, 1);
 }
 
-static void pageset_set_high_and_batch(struct zone *zone,
-                struct per_cpu_pageset *p)
+static void zone_set_pageset_high_and_batch(struct zone *zone)
 {
         unsigned long new_high;
         unsigned long new_batch;
         int fraction = READ_ONCE(percpu_pagelist_fraction);
+        int cpu;
+        struct per_cpu_pageset *p;
 
         if (fraction) {
                 new_high = zone_managed_pages(zone) / fraction;
@@ -6253,23 +6254,25 @@ static void pageset_set_high_and_batch(struct zone *zone,
                 new_high = 6 * new_batch;
                 new_batch = max(1UL, 1 * new_batch);
         }
-        pageset_update(&p->pcp, new_high, new_batch);
-}
-
-static void __meminit zone_pageset_init(struct zone *zone, int cpu)
-{
-        struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
 
-        pageset_init(pcp);
-        pageset_set_high_and_batch(zone, pcp);
+        for_each_possible_cpu(cpu) {
+                p = per_cpu_ptr(zone->pageset, cpu);
+                pageset_update(&p->pcp, new_high, new_batch);
+        }
 }
 
 void __meminit setup_zone_pageset(struct zone *zone)
 {
         int cpu;
+        struct per_cpu_pageset *p;
+
         zone->pageset = alloc_percpu(struct per_cpu_pageset);
-        for_each_possible_cpu(cpu)
-                zone_pageset_init(zone, cpu);
+        for_each_possible_cpu(cpu) {
+                p = per_cpu_ptr(zone->pageset, cpu);
+                pageset_init(p);
+        }
+
+        zone_set_pageset_high_and_batch(zone);
 }
 
 /*
@@ -8002,15 +8005,6 @@ int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
         return 0;
 }
 
-static void __zone_pcp_update(struct zone *zone)
-{
-        unsigned int cpu;
-
-        for_each_possible_cpu(cpu)
-                pageset_set_high_and_batch(zone,
-                                per_cpu_ptr(zone->pageset, cpu));
-}
-
 /*
  * percpu_pagelist_fraction - changes the pcp->high for each zone on each
  * cpu.  It is the fraction of total pages in each zone that a hot per cpu
@@ -8043,7 +8037,7 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
                 goto out;
 
         for_each_populated_zone(zone)
-                __zone_pcp_update(zone);
+                zone_set_pageset_high_and_batch(zone);
 out:
         mutex_unlock(&pcp_batch_high_lock);
         return ret;
@@ -8659,7 +8653,7 @@ EXPORT_SYMBOL(free_contig_range);
 void __meminit zone_pcp_update(struct zone *zone)
 {
         mutex_lock(&pcp_batch_high_lock);
-        __zone_pcp_update(zone);
+        zone_set_pageset_high_and_batch(zone);
         mutex_unlock(&pcp_batch_high_lock);
 }
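
[Editor's note: the structural change here is "compute once per zone, then
fan out to every CPU's pageset" instead of recomputing per CPU. A minimal
plain-C sketch of that pattern, with a fixed array standing in for the
kernel's per-cpu pageset data — NR_CPUS and struct pcp are assumptions:]

#include <stddef.h>

struct pcp { unsigned long high, batch; };

#define NR_CPUS 8
static struct pcp pagesets[NR_CPUS]; /* stand-in for zone->pageset */

/* before this patch the calculation ran once per CPU; now once per zone */
static void zone_fan_out_high_and_batch(unsigned long high, unsigned long batch)
{
        for (size_t cpu = 0; cpu < NR_CPUS; cpu++) {
                pagesets[cpu].high = high;   /* same value everywhere ... */
                pagesets[cpu].batch = batch; /* ... duplicated for locality */
        }
}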
From patchwork Mon Sep 7 16:36:26 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11761597
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [RFC 3/5] mm, page_alloc: remove setup_pageset()
Date: Mon, 7 Sep 2020 18:36:26 +0200
Message-Id: <20200907163628.26495-4-vbabka@suse.cz>
In-Reply-To: <20200907163628.26495-1-vbabka@suse.cz>
References: <20200907163628.26495-1-vbabka@suse.cz>

We initialize the boot-time pagesets with setup_pageset(), which sets high
and batch values that effectively disable pcplists. We can remove this
wrapper if we just set these values for all pagesets in pageset_init().
Non-boot pagesets then subsequently update them to their specific values.

Signed-off-by: Vlastimil Babka
Reviewed-by: Oscar Salvador
Reviewed-by: David Hildenbrand
---
 mm/page_alloc.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f669a251f654..a0cab2c6055e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5902,7 +5902,7 @@ build_all_zonelists_init(void)
          * (a chicken-egg dilemma).
          */
         for_each_possible_cpu(cpu)
-                setup_pageset(&per_cpu(boot_pageset, cpu));
+                pageset_init(&per_cpu(boot_pageset, cpu));
 
         mminit_verify_zonelist();
         cpuset_init_current_mems_allowed();
@@ -6228,12 +6228,13 @@ static void pageset_init(struct per_cpu_pageset *p)
         pcp = &p->pcp;
         for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
                 INIT_LIST_HEAD(&pcp->lists[migratetype]);
-}
 
-static void setup_pageset(struct per_cpu_pageset *p)
-{
-        pageset_init(p);
-        pageset_update(&p->pcp, 0, 1);
+        /*
+         * Set batch and high values safe for a boot pageset. Proper pageset's
+         * initialization will update them.
+         */
+        pcp->high = 0;
+        pcp->batch = 1;
 }
 
 static void zone_set_pageset_high_and_batch(struct zone *zone)
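
[Editor's note: the reason high == 0 / batch == 1 is safe for a boot
pageset is how the free path consumes these values: a freed page goes onto
the pcplist, and the list is trimmed whenever count exceeds high, so high
of 0 means every free is flushed straight to the buddy allocator. A
simplified model of that decision follows; the real logic lives in
free_unref_page_commit(), and the names and structure here are
illustrative assumptions, not kernel code:]

/* illustrative model of the pcplist free path */
struct pcp_model { int count, high, batch; };

static void free_to_pcplist(struct pcp_model *pcp)
{
        pcp->count++;                   /* page added to the per-cpu list */
        if (pcp->count >= pcp->high) {
                /*
                 * With high == 0 this always triggers: 'batch' pages are
                 * returned to the buddy allocator, and batch == 1 covers
                 * at least the page just added.
                 */
                pcp->count -= pcp->batch;
        }
}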
From patchwork Mon Sep 7 16:36:27 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11761589
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [RFC 4/5] mm, page_alloc: cache pageset high and batch in struct zone
Date: Mon, 7 Sep 2020 18:36:27 +0200
Message-Id: <20200907163628.26495-5-vbabka@suse.cz>
In-Reply-To: <20200907163628.26495-1-vbabka@suse.cz>
References: <20200907163628.26495-1-vbabka@suse.cz>

All per-cpu pagesets for a zone use the same high and batch values, which are
duplicated there just for performance (locality) reasons. This patch adds the
same variables also to struct zone as 'central' ones.

This will be useful later for making it possible to disable pcplists
temporarily by setting the high value to 0, while remembering the values for
restoring them later.

But we can also immediately benefit from not updating the pagesets of all
possible cpus in case the newly recalculated values (after sysctl change or
memory online/offline) are actually unchanged from the previous ones.

Signed-off-by: Vlastimil Babka
---
 include/linux/mmzone.h |  2 ++
 mm/page_alloc.c        | 18 +++++++++++++-----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8379432f4f2f..15582ca368b9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -431,6 +431,8 @@ struct zone {
 #endif
         struct pglist_data      *zone_pgdat;
         struct per_cpu_pageset __percpu *pageset;
+        int pageset_high;
+        int pageset_batch;
 
 #ifndef CONFIG_SPARSEMEM
         /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a0cab2c6055e..004350a2b6ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5834,7 +5834,7 @@ static void build_zonelists(pg_data_t *pgdat)
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p);
+static void pageset_init(struct per_cpu_pageset *p);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
@@ -6237,7 +6237,7 @@ static void pageset_init(struct per_cpu_pageset *p)
         pcp->batch = 1;
 }
 
-static void zone_set_pageset_high_and_batch(struct zone *zone)
+static void zone_set_pageset_high_and_batch(struct zone *zone, bool force_update)
 {
         unsigned long new_high;
         unsigned long new_batch;
@@ -6256,6 +6256,14 @@ static void zone_set_pageset_high_and_batch(struct zone *zone)
                 new_batch = max(1UL, 1 * new_batch);
         }
 
+        if (zone->pageset_high != new_high ||
+            zone->pageset_batch != new_batch) {
+                zone->pageset_high = new_high;
+                zone->pageset_batch = new_batch;
+        } else if (!force_update) {
+                return;
+        }
+
         for_each_possible_cpu(cpu) {
                 p = per_cpu_ptr(zone->pageset, cpu);
                 pageset_update(&p->pcp, new_high, new_batch);
@@ -6273,7 +6281,7 @@ void __meminit setup_zone_pageset(struct zone *zone)
                 pageset_init(p);
         }
 
-        zone_set_pageset_high_and_batch(zone);
+        zone_set_pageset_high_and_batch(zone, true);
 }
 
 /*
@@ -8038,7 +8046,7 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
                 goto out;
 
         for_each_populated_zone(zone)
-                zone_set_pageset_high_and_batch(zone);
+                zone_set_pageset_high_and_batch(zone, false);
 out:
         mutex_unlock(&pcp_batch_high_lock);
         return ret;
@@ -8654,7 +8662,7 @@ EXPORT_SYMBOL(free_contig_range);
 void __meminit zone_pcp_update(struct zone *zone)
 {
         mutex_lock(&pcp_batch_high_lock);
-        zone_set_pageset_high_and_batch(zone);
+        zone_set_pageset_high_and_batch(zone, false);
         mutex_unlock(&pcp_batch_high_lock);
 }
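
[Editor's note: the early return added above is a standard
"compare cached values before an expensive fan-out" pattern. A condensed
plain-C model, where struct zone_model and update_if_changed() are
illustrative stand-ins for the patched kernel code:]

struct zone_model { unsigned long pageset_high, pageset_batch; };

static int update_if_changed(struct zone_model *z, unsigned long high,
                             unsigned long batch, int force)
{
        if (z->pageset_high == high && z->pageset_batch == batch && !force)
                return 0;          /* skip touching every CPU's pageset */
        z->pageset_high = high;    /* cache the 'central' values ... */
        z->pageset_batch = batch;
        /* ... then fan out to all per-cpu pagesets (omitted here) */
        return 1;
}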
From patchwork Mon Sep 7 16:36:28 2020
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 11761593
From: Vlastimil Babka
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, Pavel Tatashin,
 David Hildenbrand, Oscar Salvador, Joonsoo Kim, Vlastimil Babka
Subject: [RFC 5/5] mm, page_alloc: disable pcplists during page isolation
Date: Mon, 7 Sep 2020 18:36:28 +0200
Message-Id: <20200907163628.26495-6-vbabka@suse.cz>
In-Reply-To: <20200907163628.26495-1-vbabka@suse.cz>
References: <20200907163628.26495-1-vbabka@suse.cz>

Page isolation can race with a process freeing pages to pcplists in a way
that a page from an isolated pageblock can end up on a pcplist.
This can be fixed by repeated draining of pcplists, as done by the patch
"mm/memory_hotplug: drain per-cpu pages again during memory offline" in [1].

David and Michal would prefer that this race were closed in a way that
callers of page isolation don't need to care about draining. David suggested
disabling pcplists usage completely during page isolation, instead of
repeatedly draining them.

To achieve this without adding special cases in the alloc/free fastpaths, we
can use the same 'trick' as the boot pagesets: when pcp->high is 0, any
pcplist addition is immediately flushed.

The race can thus be closed by setting pcp->high to 0 and draining pcplists
once in start_isolate_page_range(). The draining serializes after processes
that already disabled interrupts and read the old value of pcp->high in
free_unref_page_commit(); processes that have not yet disabled interrupts
will observe pcp->high == 0 when they are rescheduled, and skip pcplists.
This guarantees no stray pages on pcplists in zones where isolation happens.

We can use the variable zone->nr_isolate_pageblock (protected by zone->lock)
to detect transitions from 0 to 1 (to change pcp->high to 0 and issue a
drain) and from 1 to 0 (to restore the original pcp->high and batch values
cached in struct zone). We have to avoid external updates to high and batch
by taking pcp_batch_high_lock. To allow multiple isolations in parallel,
change this lock from a mutex to an rwsem.

For callers that pair start_isolate_page_range() with
undo_isolate_page_range() properly, this is transparent. Currently that's
alloc_contig_range(). __offline_pages() doesn't call
undo_isolate_page_range() in the success case, so it has to be careful to
handle restoring pcp->high and batch and unlocking pcp_batch_high_lock
itself.

This commit also changes drain_all_pages() to not trust reading pcp->count
during the drain for page isolation - I believe that could be racy and lead
to missing some CPUs in the drain. If others agree, this can be separated
out and potentially backported.
[1] https://lore.kernel.org/linux-mm/20200903140032.380431-1-pasha.tatashin@soleen.com/

Suggested-by: David Hildenbrand
Suggested-by: Michal Hocko
Signed-off-by: Vlastimil Babka
---
 include/linux/gfp.h |  1 +
 mm/internal.h       |  4 ++++
 mm/memory_hotplug.c | 24 ++++++++-----------
 mm/page_alloc.c     | 58 +++++++++++++++++++++++++++++----------------
 mm/page_isolation.c | 45 ++++++++++++++++++++++++++++-------
 5 files changed, 89 insertions(+), 43 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 67a0774e080b..cc52c5cc9fab 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -592,6 +592,7 @@ extern void page_frag_free(void *addr);
 
 void page_alloc_init(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
+void __drain_all_pages(struct zone *zone, bool page_isolation);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
diff --git a/mm/internal.h b/mm/internal.h
index 10c677655912..c157af87a9ed 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -7,6 +7,7 @@
 #ifndef __MM_INTERNAL_H
 #define __MM_INTERNAL_H
 
+#include <linux/rwsem.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
@@ -201,8 +202,11 @@ extern void post_alloc_hook(struct page *page, unsigned int order,
                                         gfp_t gfp_flags);
 extern int user_min_free_kbytes;
 
+extern struct rw_semaphore pcp_batch_high_lock;
 extern void zone_pcp_update(struct zone *zone);
 extern void zone_pcp_reset(struct zone *zone);
+extern void zone_update_pageset_high_and_batch(struct zone *zone,
+                unsigned long high, unsigned long batch);
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b11a269e2356..a978ac32279b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1485,6 +1485,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
         struct zone *zone;
         struct memory_notify arg;
         char *reason;
+        bool unisolated_last = false;
 
         mem_hotplug_begin();
 
@@ -1575,20 +1576,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
                 /* check again */
                 ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
                                             NULL, check_pages_isolated_cb);
-                /*
-                 * per-cpu pages are drained in start_isolate_page_range, but if
-                 * there are still pages that are not free, make sure that we
-                 * drain again, because when we isolated range we might
-                 * have raced with another thread that was adding pages to pcp
-                 * list.
-                 *
-                 * Forward progress should be still guaranteed because
-                 * pages on the pcp list can only belong to MOVABLE_ZONE
-                 * because has_unmovable_pages explicitly checks for
-                 * PageBuddy on freed pages on other zones.
-                 */
-                if (ret)
-                        drain_all_pages(zone);
         } while (ret);
 
         /* Ok, all of our target is isolated.
@@ -1602,8 +1589,17 @@ static int __ref __offline_pages(unsigned long start_pfn,
          * pageblocks zone counter here.
          */
         spin_lock_irqsave(&zone->lock, flags);
+        if (nr_isolate_pageblock && nr_isolate_pageblock ==
+                        zone->nr_isolate_pageblock)
+                unisolated_last = true;
         zone->nr_isolate_pageblock -= nr_isolate_pageblock;
         spin_unlock_irqrestore(&zone->lock, flags);
+        if (unisolated_last) {
+                zone_update_pageset_high_and_batch(zone, zone->pageset_high,
+                                zone->pageset_batch);
+        }
+        /* pairs with start_isolate_page_range() */
+        up_read(&pcp_batch_high_lock);
 
         /* removal success */
         adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 004350a2b6ca..d82f3bec7953 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -78,7 +78,7 @@
 #include "page_reporting.h"
 
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
-static DEFINE_MUTEX(pcp_batch_high_lock);
+DECLARE_RWSEM(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_FRACTION    (8)
 
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
@@ -2958,14 +2958,7 @@ static void drain_local_pages_wq(struct work_struct *work)
         preempt_enable();
 }
 
-/*
- * Spill all the per-cpu pages from all CPUs back into the buddy allocator.
- *
- * When zone parameter is non-NULL, spill just the single zone's pages.
- *
- * Note that this can be extremely slow as the draining happens in a workqueue.
- */
-void drain_all_pages(struct zone *zone)
+void __drain_all_pages(struct zone *zone, bool page_isolation)
 {
         int cpu;
 
@@ -3004,7 +2997,13 @@ void drain_all_pages(struct zone *zone)
                 struct zone *z;
                 bool has_pcps = false;
 
-                if (zone) {
+                if (page_isolation) {
+                        /*
+                         * For page isolation, don't trust the racy pcp.count
+                         * check. We need to flush really everything.
+                         */
+                        has_pcps = true;
+                } else if (zone) {
                         pcp = per_cpu_ptr(zone->pageset, cpu);
                         if (pcp->pcp.count)
                                 has_pcps = true;
@@ -3037,6 +3036,18 @@ void drain_all_pages(struct zone *zone)
         mutex_unlock(&pcpu_drain_mutex);
 }
 
+/*
+ * Spill all the per-cpu pages from all CPUs back into the buddy allocator.
+ *
+ * When zone parameter is non-NULL, spill just the single zone's pages.
+ *
+ * Note that this can be extremely slow as the draining happens in a workqueue.
+ */
+void drain_all_pages(struct zone *zone)
+{
+        __drain_all_pages(zone, false);
+}
+
 #ifdef CONFIG_HIBERNATION
 
 /*
@@ -6237,13 +6248,23 @@ static void pageset_init(struct per_cpu_pageset *p)
         pcp->batch = 1;
 }
 
+void zone_update_pageset_high_and_batch(struct zone *zone, unsigned long high,
+                unsigned long batch)
+{
+        struct per_cpu_pageset *p;
+        int cpu;
+
+        for_each_possible_cpu(cpu) {
+                p = per_cpu_ptr(zone->pageset, cpu);
+                pageset_update(&p->pcp, high, batch);
+        }
+}
+
 static void zone_set_pageset_high_and_batch(struct zone *zone, bool force_update)
 {
         unsigned long new_high;
         unsigned long new_batch;
         int fraction = READ_ONCE(percpu_pagelist_fraction);
-        int cpu;
-        struct per_cpu_pageset *p;
 
         if (fraction) {
                 new_high = zone_managed_pages(zone) / fraction;
@@ -6264,10 +6285,7 @@ static void zone_set_pageset_high_and_batch(struct zone *zone, bool force_update
                 return;
         }
 
-        for_each_possible_cpu(cpu) {
-                p = per_cpu_ptr(zone->pageset, cpu);
-                pageset_update(&p->pcp, new_high, new_batch);
-        }
+        zone_update_pageset_high_and_batch(zone, new_high, new_batch);
 }
 
 void __meminit setup_zone_pageset(struct zone *zone)
@@ -8026,7 +8044,7 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
         int old_percpu_pagelist_fraction;
         int ret;
 
-        mutex_lock(&pcp_batch_high_lock);
+        down_write(&pcp_batch_high_lock);
         old_percpu_pagelist_fraction = percpu_pagelist_fraction;
 
         ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
@@ -8048,7 +8066,7 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
         for_each_populated_zone(zone)
                 zone_set_pageset_high_and_batch(zone, false);
 out:
-        mutex_unlock(&pcp_batch_high_lock);
+        up_write(&pcp_batch_high_lock);
         return ret;
 }
 
@@ -8661,9 +8679,9 @@ EXPORT_SYMBOL(free_contig_range);
  */
 void __meminit zone_pcp_update(struct zone *zone)
 {
-        mutex_lock(&pcp_batch_high_lock);
+        down_write(&pcp_batch_high_lock);
         zone_set_pageset_high_and_batch(zone, false);
-        mutex_unlock(&pcp_batch_high_lock);
+        up_write(&pcp_batch_high_lock);
 }
 
 void zone_pcp_reset(struct zone *zone)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 63a3db10a8c0..ceada64abd1f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -21,6 +21,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
         struct zone *zone;
         unsigned long flags;
         int ret = -EBUSY;
+        bool first_isolated_pageblock = false;
 
         zone = page_zone(page);
 
@@ -45,6 +46,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 
                 set_pageblock_migratetype(page, MIGRATE_ISOLATE);
                 zone->nr_isolate_pageblock++;
+                if (zone->nr_isolate_pageblock == 1)
+                        first_isolated_pageblock = true;
                 nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
                                                                         NULL);
 
@@ -54,8 +57,9 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 
 out:
         spin_unlock_irqrestore(&zone->lock, flags);
-        if (!ret) {
-                drain_all_pages(zone);
+        if (!ret && first_isolated_pageblock) {
+                zone_update_pageset_high_and_batch(zone, 0, 1);
+                __drain_all_pages(zone, true);
         } else {
                 WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
 
@@ -78,6 +82,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
         unsigned int order;
         unsigned long pfn, buddy_pfn;
         struct page *buddy;
+        bool unisolated_last = false;
 
         zone = page_zone(page);
         spin_lock_irqsave(&zone->lock, flags);
@@ -120,8 +125,14 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
         if (isolated_page)
                 __putback_isolated_page(page, order, migratetype);
         zone->nr_isolate_pageblock--;
+        if (zone->nr_isolate_pageblock == 0)
+                unisolated_last = true;
 out:
         spin_unlock_irqrestore(&zone->lock, flags);
+        if (unisolated_last) {
+                zone_update_pageset_high_and_batch(zone, zone->pageset_high,
+                                zone->pageset_batch);
+        }
 }
 
 static inline struct page *
@@ -170,14 +181,17 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * pageblocks we may have modified and return -EBUSY to caller. This
  * prevents two threads from simultaneously working on overlapping ranges.
  *
- * Please note that there is no strong synchronization with the page allocator
- * either. Pages might be freed while their page blocks are marked ISOLATED.
- * In some cases pages might still end up on pcp lists and that would allow
- * for their allocation even when they are in fact isolated already. Depending
- * on how strong of a guarantee the caller needs drain_all_pages might be needed
- * (e.g. __offline_pages will need to call it after check for isolated range for
- * a next retry).
+ * To synchronize with page allocator users freeing pages on the pcplists, we
+ * disable them by setting their allowed usage (pcp->high) to 0, and issue a
+ * drain. This is only needed when isolating the first pageblock of a zone.
  *
+ * Successful call to start_isolate_page_range() has to be paired with
+ * undo_isolate_page_range() for proper accounting of zone->nr_isolate_pageblock
+ * (which controls pcplist enabling/disabling discussed above, including
+ * handling of pcp_batch_high_lock).
+ * If undo_isolate_page_range() is not used, this has to be handled manually
+ * by caller.
+ *
  * Return: the number of isolated pageblocks on success and -EBUSY if any part
  * of range cannot be isolated.
  */
@@ -192,6 +206,13 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
         BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
         BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
 
+        /*
+         * We are going to change pcplists's high and batch values temporarily,
+         * so block any updates via sysctl. Caller must unlock by
+         * undo_isolate_page_range() or finish_isolate_page_range().
+         */
+        down_read(&pcp_batch_high_lock);
+
         for (pfn = start_pfn;
              pfn < end_pfn;
              pfn += pageblock_nr_pages) {
@@ -215,6 +236,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
                 unset_migratetype_isolate(page, migratetype);
         }
 
+        up_read(&pcp_batch_high_lock);
+
         return -EBUSY;
 }
 
@@ -238,7 +261,11 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
                         continue;
                 unset_migratetype_isolate(page, migratetype);
         }
+
+        up_read(&pcp_batch_high_lock);
 }
+
+
 /*
  * Test all pages in the range is free(means isolated) or not.
  * all pages in [start_pfn...end_pfn) must be in the same zone.
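
[Editor's note: the state machine patch 5 implements can be summarized in
a short sketch - pcplists are disabled on the zone's 0->1
isolated-pageblock transition and restored on the 1->0 transition, under a
reader-held pcp_batch_high_lock. This is a plain-C model with locking and
draining reduced to comments; struct zone_model and the function names
here are illustrative assumptions, not the kernel functions:]

struct zone_model {
        int nr_isolate_pageblock;                  /* protected by zone->lock */
        unsigned long pageset_high, pageset_batch; /* cached by patch 4 */
};

static void isolate_pageblock(struct zone_model *z)
{
        /* caller holds zone->lock and a read lock on pcp_batch_high_lock */
        if (++z->nr_isolate_pageblock == 1) {
                /*
                 * First isolated pageblock in the zone: disable pcplists
                 * (high = 0, batch = 1) and drain once; subsequent frees
                 * bypass the pcplists entirely.
                 */
        }
}

static void unisolate_pageblock(struct zone_model *z)
{
        /* caller holds zone->lock */
        if (--z->nr_isolate_pageblock == 0) {
                /*
                 * Last isolated pageblock gone: restore the cached
                 * pageset_high / pageset_batch values to every CPU's
                 * pageset, re-enabling the pcplists.
                 */
        }
}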