From patchwork Wed Nov 11 09:28:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11897267 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5D558139F for ; Wed, 11 Nov 2020 09:28:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 13F882072C for ; Wed, 11 Nov 2020 09:28:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 13F882072C Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 241ED6B0071; Wed, 11 Nov 2020 04:28:25 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1A4116B0074; Wed, 11 Nov 2020 04:28:25 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EC1336B0071; Wed, 11 Nov 2020 04:28:24 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0142.hostedemail.com [216.40.44.142]) by kanga.kvack.org (Postfix) with ESMTP id BAFC56B0068 for ; Wed, 11 Nov 2020 04:28:24 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 655875853 for ; Wed, 11 Nov 2020 09:28:24 +0000 (UTC) X-FDA: 77471611728.10.dolls90_590dc33272fc Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin10.hostedemail.com (Postfix) with ESMTP id 41B6C16A0B9 for ; Wed, 11 Nov 2020 09:28:24 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,,RULES_HIT:30054:30070:30090,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201;04y8o7opt55iksxok6eugjap3ewn1opcqesnmt3meja8so8mjnsm936u7uduuun.hzqez133nxpbkq9hg4j8sgmk6oioz4p87sxn1137mjkifh3xtuzmk94htg63o7q.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: dolls90_590dc33272fc X-Filterd-Recvd-Size: 5084 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf06.hostedemail.com (Postfix) with ESMTP for ; Wed, 11 Nov 2020 09:28:23 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 5EDC9AC35; Wed, 11 Nov 2020 09:28:22 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko , Pavel Tatashin , David Hildenbrand , Oscar Salvador , Joonsoo Kim , Vlastimil Babka , Michal Hocko Subject: [PATCH v3 1/7] mm, page_alloc: clean up pageset high and batch update Date: Wed, 11 Nov 2020 10:28:06 +0100 Message-Id: <20201111092812.11329-2-vbabka@suse.cz> X-Mailer: git-send-email 2.29.1 In-Reply-To: <20201111092812.11329-1-vbabka@suse.cz> References: <20201111092812.11329-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The updates to pcplists' high and batch values are handled by multiple functions that make the calculations hard to follow. Consolidate everything to pageset_set_high_and_batch() and remove pageset_set_batch() and pageset_set_high() wrappers. The only special case using one of the removed wrappers was: build_all_zonelists_init() setup_pageset() pageset_set_batch() which was hardcoding batch as 0, so we can just open-code a call to pageset_update() with constant parameters instead. No functional change. Signed-off-by: Vlastimil Babka Reviewed-by: Oscar Salvador Reviewed-by: David Hildenbrand Acked-by: Michal Hocko Acked-by: Pankaj Gupta --- mm/page_alloc.c | 49 ++++++++++++++++++++----------------------------- 1 file changed, 20 insertions(+), 29 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7560240fcf7d..3f1c57344e73 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5911,7 +5911,7 @@ static void build_zonelists(pg_data_t *pgdat) * not check if the processor is online before following the pageset pointer. * Other parts of the kernel may not check if the zone is available. */ -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch); +static void setup_pageset(struct per_cpu_pageset *p); static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset); static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); @@ -5979,7 +5979,7 @@ build_all_zonelists_init(void) * (a chicken-egg dilemma). */ for_each_possible_cpu(cpu) - setup_pageset(&per_cpu(boot_pageset, cpu), 0); + setup_pageset(&per_cpu(boot_pageset, cpu)); mminit_verify_zonelist(); cpuset_init_current_mems_allowed(); @@ -6288,12 +6288,6 @@ static void pageset_update(struct per_cpu_pages *pcp, unsigned long high, pcp->batch = batch; } -/* a companion to pageset_set_high() */ -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch) -{ - pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch)); -} - static void pageset_init(struct per_cpu_pageset *p) { struct per_cpu_pages *pcp; @@ -6306,35 +6300,32 @@ static void pageset_init(struct per_cpu_pageset *p) INIT_LIST_HEAD(&pcp->lists[migratetype]); } -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch) +static void setup_pageset(struct per_cpu_pageset *p) { pageset_init(p); - pageset_set_batch(p, batch); + pageset_update(&p->pcp, 0, 1); } /* - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist - * to the value high for the pageset p. + * Calculate and set new high and batch values for given per-cpu pageset of a + * zone, based on the zone's size and the percpu_pagelist_fraction sysctl. */ -static void pageset_set_high(struct per_cpu_pageset *p, - unsigned long high) -{ - unsigned long batch = max(1UL, high / 4); - if ((high / 4) > (PAGE_SHIFT * 8)) - batch = PAGE_SHIFT * 8; - - pageset_update(&p->pcp, high, batch); -} - static void pageset_set_high_and_batch(struct zone *zone, - struct per_cpu_pageset *pcp) + struct per_cpu_pageset *p) { - if (percpu_pagelist_fraction) - pageset_set_high(pcp, - (zone_managed_pages(zone) / - percpu_pagelist_fraction)); - else - pageset_set_batch(pcp, zone_batchsize(zone)); + unsigned long new_high, new_batch; + + if (percpu_pagelist_fraction) { + new_high = zone_managed_pages(zone) / percpu_pagelist_fraction; + new_batch = max(1UL, new_high / 4); + if ((new_high / 4) > (PAGE_SHIFT * 8)) + new_batch = PAGE_SHIFT * 8; + } else { + new_batch = zone_batchsize(zone); + new_high = 6 * new_batch; + new_batch = max(1UL, 1 * new_batch); + } + pageset_update(&p->pcp, new_high, new_batch); } static void __meminit zone_pageset_init(struct zone *zone, int cpu) From patchwork Wed Nov 11 09:28:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11897269 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D23BA921 for ; Wed, 11 Nov 2020 09:28:27 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 76AD32075A for ; Wed, 11 Nov 2020 09:28:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 76AD32075A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4BEBA6B0073; Wed, 11 Nov 2020 04:28:25 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2DB1D6B0072; Wed, 11 Nov 2020 04:28:25 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 01BB16B0070; Wed, 11 Nov 2020 04:28:24 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0204.hostedemail.com [216.40.44.204]) by kanga.kvack.org (Postfix) with ESMTP id BBD686B006C for ; Wed, 11 Nov 2020 04:28:24 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 573718249980 for ; Wed, 11 Nov 2020 09:28:24 +0000 (UTC) X-FDA: 77471611728.11.wrist93_0514866272fc Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 39CF1180F8B80 for ; Wed, 11 Nov 2020 09:28:24 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,,RULES_HIT:30054:30070,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201;04yg7j31jrhf6xkxje61w9qe5afegoph4qb9p66wa88rjb8kaa8g93p3uy465j3.wcff8bfshz3zppjz9cs9c3yu1dsdaomh11p1kzj1wfjdrd68tuscp799c3hatdh.y-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: wrist93_0514866272fc X-Filterd-Recvd-Size: 4652 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Wed, 11 Nov 2020 09:28:23 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 5CA44AC2E; Wed, 11 Nov 2020 09:28:22 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko , Pavel Tatashin , David Hildenbrand , Oscar Salvador , Joonsoo Kim , Vlastimil Babka , Michal Hocko Subject: [PATCH v3 2/7] mm, page_alloc: calculate pageset high and batch once per zone Date: Wed, 11 Nov 2020 10:28:07 +0100 Message-Id: <20201111092812.11329-3-vbabka@suse.cz> X-Mailer: git-send-email 2.29.1 In-Reply-To: <20201111092812.11329-1-vbabka@suse.cz> References: <20201111092812.11329-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We currently call pageset_set_high_and_batch() for each possible cpu, which repeats the same calculations of high and batch values. Instead call the function just once per zone, and make it apply the calculated values to all per-cpu pagesets of the zone. This also allows removing the zone_pageset_init() and __zone_pcp_update() wrappers. No functional change. Signed-off-by: Vlastimil Babka Reviewed-by: Oscar Salvador Reviewed-by: David Hildenbrand Acked-by: Michal Hocko --- mm/page_alloc.c | 42 ++++++++++++++++++------------------------ 1 file changed, 18 insertions(+), 24 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3f1c57344e73..2fa432762908 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6307,13 +6307,14 @@ static void setup_pageset(struct per_cpu_pageset *p) } /* - * Calculate and set new high and batch values for given per-cpu pageset of a + * Calculate and set new high and batch values for all per-cpu pagesets of a * zone, based on the zone's size and the percpu_pagelist_fraction sysctl. */ -static void pageset_set_high_and_batch(struct zone *zone, - struct per_cpu_pageset *p) +static void zone_set_pageset_high_and_batch(struct zone *zone) { unsigned long new_high, new_batch; + struct per_cpu_pageset *p; + int cpu; if (percpu_pagelist_fraction) { new_high = zone_managed_pages(zone) / percpu_pagelist_fraction; @@ -6325,23 +6326,25 @@ static void pageset_set_high_and_batch(struct zone *zone, new_high = 6 * new_batch; new_batch = max(1UL, 1 * new_batch); } - pageset_update(&p->pcp, new_high, new_batch); -} - -static void __meminit zone_pageset_init(struct zone *zone, int cpu) -{ - struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu); - pageset_init(pcp); - pageset_set_high_and_batch(zone, pcp); + for_each_possible_cpu(cpu) { + p = per_cpu_ptr(zone->pageset, cpu); + pageset_update(&p->pcp, new_high, new_batch); + } } void __meminit setup_zone_pageset(struct zone *zone) { + struct per_cpu_pageset *p; int cpu; + zone->pageset = alloc_percpu(struct per_cpu_pageset); - for_each_possible_cpu(cpu) - zone_pageset_init(zone, cpu); + for_each_possible_cpu(cpu) { + p = per_cpu_ptr(zone->pageset, cpu); + pageset_init(p); + } + + zone_set_pageset_high_and_batch(zone); } /* @@ -8080,15 +8083,6 @@ int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write, return 0; } -static void __zone_pcp_update(struct zone *zone) -{ - unsigned int cpu; - - for_each_possible_cpu(cpu) - pageset_set_high_and_batch(zone, - per_cpu_ptr(zone->pageset, cpu)); -} - /* * percpu_pagelist_fraction - changes the pcp->high for each zone on each * cpu. It is the fraction of total pages in each zone that a hot per cpu @@ -8121,7 +8115,7 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write, goto out; for_each_populated_zone(zone) - __zone_pcp_update(zone); + zone_set_pageset_high_and_batch(zone); out: mutex_unlock(&pcp_batch_high_lock); return ret; @@ -8746,7 +8740,7 @@ EXPORT_SYMBOL(free_contig_range); void __meminit zone_pcp_update(struct zone *zone) { mutex_lock(&pcp_batch_high_lock); - __zone_pcp_update(zone); + zone_set_pageset_high_and_batch(zone); mutex_unlock(&pcp_batch_high_lock); } From patchwork Wed Nov 11 09:28:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11897271 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 22D8A921 for ; Wed, 11 Nov 2020 09:28:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D98D520756 for ; Wed, 11 Nov 2020 09:28:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D98D520756 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7F38F6B0068; Wed, 11 Nov 2020 04:28:25 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 43D5D6B006C; Wed, 11 Nov 2020 04:28:25 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 23E8A6B006C; Wed, 11 Nov 2020 04:28:25 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0162.hostedemail.com [216.40.44.162]) by kanga.kvack.org (Postfix) with ESMTP id D67B06B006E for ; Wed, 11 Nov 2020 04:28:24 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 7E1D3180AD801 for ; Wed, 11 Nov 2020 09:28:24 +0000 (UTC) X-FDA: 77471611728.14.pipe61_3903a27272fc Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id 59E0418229835 for ; Wed, 11 Nov 2020 09:28:24 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,,RULES_HIT:30054:30070:30090,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201;04ygrkmhjrgan1b687irny85ik56poc3kwbt9i5gcooska7pjj7xgnia6mnxjzg.nn5w5dthcicxgh63hbc95kt6gmm871zyikxaezjwfowz1rnyqksyme3xyae9yir.e-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:72,LUA_SUMMARY:none X-HE-Tag: pipe61_3903a27272fc X-Filterd-Recvd-Size: 3287 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Wed, 11 Nov 2020 09:28:23 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 75FAFAC65; Wed, 11 Nov 2020 09:28:22 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko , Pavel Tatashin , David Hildenbrand , Oscar Salvador , Joonsoo Kim , Vlastimil Babka , Michal Hocko Subject: [PATCH v3 3/7] mm, page_alloc: remove setup_pageset() Date: Wed, 11 Nov 2020 10:28:08 +0100 Message-Id: <20201111092812.11329-4-vbabka@suse.cz> X-Mailer: git-send-email 2.29.1 In-Reply-To: <20201111092812.11329-1-vbabka@suse.cz> References: <20201111092812.11329-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We initialize boot-time pagesets with setup_pageset(), which sets high and batch values that effectively disable pcplists. We can remove this wrapper if we just set these values for all pagesets in pageset_init(). Non-boot pagesets then subsequently update them to the proper values. No functional change. Signed-off-by: Vlastimil Babka Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Acked-by: Michal Hocko --- mm/page_alloc.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2fa432762908..5a8ec7d94884 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5911,7 +5911,7 @@ static void build_zonelists(pg_data_t *pgdat) * not check if the processor is online before following the pageset pointer. * Other parts of the kernel may not check if the zone is available. */ -static void setup_pageset(struct per_cpu_pageset *p); +static void pageset_init(struct per_cpu_pageset *p); static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset); static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); @@ -5979,7 +5979,7 @@ build_all_zonelists_init(void) * (a chicken-egg dilemma). */ for_each_possible_cpu(cpu) - setup_pageset(&per_cpu(boot_pageset, cpu)); + pageset_init(&per_cpu(boot_pageset, cpu)); mminit_verify_zonelist(); cpuset_init_current_mems_allowed(); @@ -6298,12 +6298,15 @@ static void pageset_init(struct per_cpu_pageset *p) pcp = &p->pcp; for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++) INIT_LIST_HEAD(&pcp->lists[migratetype]); -} -static void setup_pageset(struct per_cpu_pageset *p) -{ - pageset_init(p); - pageset_update(&p->pcp, 0, 1); + /* + * Set batch and high values safe for a boot pageset. A true percpu + * pageset's initialization will update them subsequently. Here we don't + * need to be as careful as pageset_update() as nobody can access the + * pageset yet. + */ + pcp->high = 0; + pcp->batch = 1; } /* From patchwork Wed Nov 11 09:28:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11897275 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CCDDD921 for ; Wed, 11 Nov 2020 09:28:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8863A20756 for ; Wed, 11 Nov 2020 09:28:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8863A20756 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E485B6B0070; Wed, 11 Nov 2020 04:28:25 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7BBFF6B006E; Wed, 11 Nov 2020 04:28:25 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4AE9B6B0070; Wed, 11 Nov 2020 04:28:25 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0092.hostedemail.com [216.40.44.92]) by kanga.kvack.org (Postfix) with ESMTP id 108626B0073 for ; Wed, 11 Nov 2020 04:28:25 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id A19F598B9 for ; Wed, 11 Nov 2020 09:28:24 +0000 (UTC) X-FDA: 77471611728.15.tree95_630236e272fc Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin15.hostedemail.com (Postfix) with ESMTP id 664A81814B0C9 for ; Wed, 11 Nov 2020 09:28:24 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,,RULES_HIT:30012:30054:30056:30070,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201;04yrqm1w7zfh8uqhnubp9c4nkhh8joc58ucfhu7gzpegtuw747xbeynmi3aizkj.wbi9kdbfxt5mmcjd6bpiu3eza59s5jpero4a4rd3rycru3zif9drmbhdqwu17hf.s-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:1:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: tree95_630236e272fc X-Filterd-Recvd-Size: 5884 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf04.hostedemail.com (Postfix) with ESMTP for ; Wed, 11 Nov 2020 09:28:23 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 914EDAD19; Wed, 11 Nov 2020 09:28:22 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko , Pavel Tatashin , David Hildenbrand , Oscar Salvador , Joonsoo Kim , Vlastimil Babka , Michal Hocko Subject: [PATCH v3 4/7] mm, page_alloc: simplify pageset_update() Date: Wed, 11 Nov 2020 10:28:09 +0100 Message-Id: <20201111092812.11329-5-vbabka@suse.cz> X-Mailer: git-send-email 2.29.1 In-Reply-To: <20201111092812.11329-1-vbabka@suse.cz> References: <20201111092812.11329-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: pageset_update() attempts to update pcplist's high and batch values in a way that readers don't observe batch > high. It uses smp_wmb() to order the updates in a way to achieve this. However, without proper pairing read barriers in readers this guarantee doesn't hold, and there are no such barriers in e.g. free_unref_page_commit(). Commit 88e8ac11d2ea ("mm, page_alloc: fix core hung in free_pcppages_bulk()") already showed this is problematic, and solved this by ultimately only trusing pcp->count of the current cpu with interrupts disabled. The update dance with unpaired write barriers thus makes no sense. Replace them with plain WRITE_ONCE to prevent store tearing, and document that the values can change asynchronously and should not be trusted for correctness. All current readers appear to be OK after 88e8ac11d2ea. Convert them to READ_ONCE to prevent unnecessary read tearing, but mainly to alert anybody making future changes to the code that special care is needed. Signed-off-by: Vlastimil Babka Reviewed-by: Oscar Salvador Acked-by: David Hildenbrand Acked-by: Michal Hocko --- mm/page_alloc.c | 40 ++++++++++++++++++---------------------- 1 file changed, 18 insertions(+), 22 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5a8ec7d94884..8b43d6d1a288 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1353,7 +1353,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, { int migratetype = 0; int batch_free = 0; - int prefetch_nr = 0; + int prefetch_nr = READ_ONCE(pcp->batch); bool isolated_pageblocks; struct page *page, *tmp; LIST_HEAD(head); @@ -1404,8 +1404,10 @@ static void free_pcppages_bulk(struct zone *zone, int count, * avoid excessive prefetching due to large count, only * prefetch buddy for the first pcp->batch nr of pages. */ - if (prefetch_nr++ < pcp->batch) + if (prefetch_nr) { prefetch_buddy(page); + prefetch_nr--; + } } while (--count && --batch_free && !list_empty(list)); } @@ -3202,10 +3204,8 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn) pcp = &this_cpu_ptr(zone->pageset)->pcp; list_add(&page->lru, &pcp->lists[migratetype]); pcp->count++; - if (pcp->count >= pcp->high) { - unsigned long batch = READ_ONCE(pcp->batch); - free_pcppages_bulk(zone, batch, pcp); - } + if (pcp->count >= READ_ONCE(pcp->high)) + free_pcppages_bulk(zone, READ_ONCE(pcp->batch), pcp); } /* @@ -3390,7 +3390,7 @@ static struct page *__rmqueue_pcplist(struct zone *zone, int migratetype, do { if (list_empty(list)) { pcp->count += rmqueue_bulk(zone, 0, - pcp->batch, list, + READ_ONCE(pcp->batch), list, migratetype, alloc_flags); if (unlikely(list_empty(list))) return NULL; @@ -6262,13 +6262,16 @@ static int zone_batchsize(struct zone *zone) } /* - * pcp->high and pcp->batch values are related and dependent on one another: - * ->batch must never be higher then ->high. - * The following function updates them in a safe manner without read side - * locking. + * pcp->high and pcp->batch values are related and generally batch is lower + * than high. They are also related to pcp->count such that count is lower + * than high, and as soon as it reaches high, the pcplist is flushed. * - * Any new users of pcp->batch and pcp->high should ensure they can cope with - * those fields changing asynchronously (acording to the above rule). + * However, guaranteeing these relations at all times would require e.g. write + * barriers here but also careful usage of read barriers at the read side, and + * thus be prone to error and bad for performance. Thus the update only prevents + * store tearing. Any new users of pcp->batch and pcp->high should ensure they + * can cope with those fields changing asynchronously, and fully trust only the + * pcp->count field on the local CPU with interrupts disabled. * * mutex_is_locked(&pcp_batch_high_lock) required when calling this function * outside of boot time (or some other assurance that no concurrent updaters @@ -6277,15 +6280,8 @@ static int zone_batchsize(struct zone *zone) static void pageset_update(struct per_cpu_pages *pcp, unsigned long high, unsigned long batch) { - /* start with a fail safe value for batch */ - pcp->batch = 1; - smp_wmb(); - - /* Update high, then batch, in order */ - pcp->high = high; - smp_wmb(); - - pcp->batch = batch; + WRITE_ONCE(pcp->batch, batch); + WRITE_ONCE(pcp->high, high); } static void pageset_init(struct per_cpu_pageset *p) From patchwork Wed Nov 11 09:28:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11897277 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1062E139F for ; Wed, 11 Nov 2020 09:28:35 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BA45D20756 for ; Wed, 11 Nov 2020 09:28:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BA45D20756 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 68A746B006E; Wed, 11 Nov 2020 04:28:26 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5C1FC6B0072; Wed, 11 Nov 2020 04:28:26 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 467EE6B0074; Wed, 11 Nov 2020 04:28:26 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0235.hostedemail.com [216.40.44.235]) by kanga.kvack.org (Postfix) with ESMTP id 0AAD86B0072 for ; Wed, 11 Nov 2020 04:28:25 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id AF142181AC9CC for ; Wed, 11 Nov 2020 09:28:25 +0000 (UTC) X-FDA: 77471611770.06.waste53_100b361272fc Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin06.hostedemail.com (Postfix) with ESMTP id 90C0710045630 for ; Wed, 11 Nov 2020 09:28:25 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,,RULES_HIT:30054:30056:30070:30079:30090:30091,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201;04yrp3zjeyk1hnfo4zw5aabtqsk48opiqmo9u8tijzgrsbnde4mtfsi5kgwou79.ifx6ejh9sajtfq5xsr9ppud1ip71oyj4c8j4gy61ec7zpigz8pjcbwew6yzi9zp.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:106,LUA_SUMMARY:none X-HE-Tag: waste53_100b361272fc X-Filterd-Recvd-Size: 4222 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf42.hostedemail.com (Postfix) with ESMTP for ; Wed, 11 Nov 2020 09:28:25 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id B0ACFAD1A; Wed, 11 Nov 2020 09:28:22 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko , Pavel Tatashin , David Hildenbrand , Oscar Salvador , Joonsoo Kim , Vlastimil Babka , Michal Hocko Subject: [PATCH v3 5/7] mm, page_alloc: cache pageset high and batch in struct zone Date: Wed, 11 Nov 2020 10:28:10 +0100 Message-Id: <20201111092812.11329-6-vbabka@suse.cz> X-Mailer: git-send-email 2.29.1 In-Reply-To: <20201111092812.11329-1-vbabka@suse.cz> References: <20201111092812.11329-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: All per-cpu pagesets for a zone use the same high and batch values, that are duplicated there just for performance (locality) reasons. This patch adds the same variables also to struct zone as a shared copy. This will be useful later for making possible to disable pcplists temporarily by setting high value to 0, while remembering the values for restoring them later. But we can also immediately benefit from not updating pagesets of all possible cpus in case the newly recalculated values (after sysctl change or memory online/offline) are actually unchanged from the previous ones. Signed-off-by: Vlastimil Babka Reviewed-by: Oscar Salvador Acked-by: Michal Hocko Reviewed-by: David Hildenbrand --- include/linux/mmzone.h | 6 ++++++ mm/page_alloc.c | 16 ++++++++++++++-- 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 7385871768d4..bb9188de2718 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -470,6 +470,12 @@ struct zone { #endif struct pglist_data *zone_pgdat; struct per_cpu_pageset __percpu *pageset; + /* + * the high and batch values are copied to individual pagesets for + * faster access + */ + int pageset_high; + int pageset_batch; #ifndef CONFIG_SPARSEMEM /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8b43d6d1a288..25bc9bb77696 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5912,6 +5912,9 @@ static void build_zonelists(pg_data_t *pgdat) * Other parts of the kernel may not check if the zone is available. */ static void pageset_init(struct per_cpu_pageset *p); +/* These effectively disable the pcplists in the boot pageset completely */ +#define BOOT_PAGESET_HIGH 0 +#define BOOT_PAGESET_BATCH 1 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset); static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); @@ -6301,8 +6304,8 @@ static void pageset_init(struct per_cpu_pageset *p) * need to be as careful as pageset_update() as nobody can access the * pageset yet. */ - pcp->high = 0; - pcp->batch = 1; + pcp->high = BOOT_PAGESET_HIGH; + pcp->batch = BOOT_PAGESET_BATCH; } /* @@ -6326,6 +6329,13 @@ static void zone_set_pageset_high_and_batch(struct zone *zone) new_batch = max(1UL, 1 * new_batch); } + if (zone->pageset_high == new_high && + zone->pageset_batch == new_batch) + return; + + zone->pageset_high = new_high; + zone->pageset_batch = new_batch; + for_each_possible_cpu(cpu) { p = per_cpu_ptr(zone->pageset, cpu); pageset_update(&p->pcp, new_high, new_batch); @@ -6386,6 +6396,8 @@ static __meminit void zone_pcp_init(struct zone *zone) * offset of a (static) per cpu variable into the per cpu area. */ zone->pageset = &boot_pageset; + zone->pageset_high = BOOT_PAGESET_HIGH; + zone->pageset_batch = BOOT_PAGESET_BATCH; if (populated_zone(zone)) printk(KERN_DEBUG " %s zone: %lu pages, LIFO batch:%u\n", From patchwork Wed Nov 11 09:28:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11897279 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BAA8A139F for ; Wed, 11 Nov 2020 09:28:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 77F4D20756 for ; Wed, 11 Nov 2020 09:28:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 77F4D20756 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D437B6B007B; Wed, 11 Nov 2020 04:28:26 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id BE1B76B0074; Wed, 11 Nov 2020 04:28:26 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A0E496B007B; Wed, 11 Nov 2020 04:28:26 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0184.hostedemail.com [216.40.44.184]) by kanga.kvack.org (Postfix) with ESMTP id 5E4826B0074 for ; Wed, 11 Nov 2020 04:28:26 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id F3BE6879E for ; Wed, 11 Nov 2020 09:28:25 +0000 (UTC) X-FDA: 77471611770.03.lead83_191470d272fc Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id C281F28A4E9 for ; Wed, 11 Nov 2020 09:28:25 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,,RULES_HIT:30054:30070,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201;04yran1r1qz48i1seqt3rh6n59h13optna4pmdmd4ji1uzg9f8j144jap7hb3kz.g6fnobuoqiotr8za8k65yo4sj1pkr9az637rw41m7uumga1r8hdt65yof5uioy4.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: lead83_191470d272fc X-Filterd-Recvd-Size: 5323 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Wed, 11 Nov 2020 09:28:25 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id CAF00AD29; Wed, 11 Nov 2020 09:28:22 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko , Pavel Tatashin , David Hildenbrand , Oscar Salvador , Joonsoo Kim , Vlastimil Babka , Michal Hocko Subject: [PATCH v3 6/7] mm, page_alloc: move draining pcplists to page isolation users Date: Wed, 11 Nov 2020 10:28:11 +0100 Message-Id: <20201111092812.11329-7-vbabka@suse.cz> X-Mailer: git-send-email 2.29.1 In-Reply-To: <20201111092812.11329-1-vbabka@suse.cz> References: <20201111092812.11329-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently, pcplists are drained during set_migratetype_isolate() which means once per pageblock processed start_isolate_page_range(). This is somewhat wasteful. Moreover, the callers might need different guarantees, and the draining is currently prone to races and does not guarantee that no page from isolated pageblock will end up on the pcplist after the drain. Better guarantees are added by later patches and require explicit actions by page isolation users that need them. Thus it makes sense to move the current imperfect draining to the callers also as a preparation step. Suggested-by: David Hildenbrand Suggested-by: Pavel Tatashin Signed-off-by: Vlastimil Babka Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Acked-by: Michal Hocko --- mm/memory_hotplug.c | 11 ++++++----- mm/page_alloc.c | 2 ++ mm/page_isolation.c | 10 +++++----- 3 files changed, 13 insertions(+), 10 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 41c62295292b..3c494ab0d075 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1500,6 +1500,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages) goto failed_removal; } + drain_all_pages(zone); + arg.start_pfn = start_pfn; arg.nr_pages = nr_pages; node_states_check_changes_offline(nr_pages, zone, &arg); @@ -1550,11 +1552,10 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages) } /* - * per-cpu pages are drained in start_isolate_page_range, but if - * there are still pages that are not free, make sure that we - * drain again, because when we isolated range we might - * have raced with another thread that was adding pages to pcp - * list. + * per-cpu pages are drained after start_isolate_page_range, but + * if there are still pages that are not free, make sure that we + * drain again, because when we isolated range we might have + * raced with another thread that was adding pages to pcp list. * * Forward progress should be still guaranteed because * pages on the pcp list can only belong to MOVABLE_ZONE diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 25bc9bb77696..2ec3e1e27169 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8539,6 +8539,8 @@ int alloc_contig_range(unsigned long start, unsigned long end, if (ret) return ret; + drain_all_pages(cc.zone); + /* * In case of -EBUSY, we'd like to know which page causes problem. * So, just fall through. test_pages_isolated() has a tracepoint diff --git a/mm/page_isolation.c b/mm/page_isolation.c index abbf42214485..feab446d1982 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -49,7 +49,6 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ __mod_zone_freepage_state(zone, -nr_pages, mt); spin_unlock_irqrestore(&zone->lock, flags); - drain_all_pages(zone); return 0; } @@ -172,11 +171,12 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) * * Please note that there is no strong synchronization with the page allocator * either. Pages might be freed while their page blocks are marked ISOLATED. - * In some cases pages might still end up on pcp lists and that would allow + * A call to drain_all_pages() after isolation can flush most of them. However + * in some cases pages might still end up on pcp lists and that would allow * for their allocation even when they are in fact isolated already. Depending - * on how strong of a guarantee the caller needs drain_all_pages might be needed - * (e.g. __offline_pages will need to call it after check for isolated range for - * a next retry). + * on how strong of a guarantee the caller needs, further drain_all_pages() + * might be needed (e.g. __offline_pages will need to call it after check for + * isolated range for a next retry). * * Return: 0 on success and -EBUSY if any part of range cannot be isolated. */ From patchwork Wed Nov 11 09:28:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 11897281 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D232A921 for ; Wed, 11 Nov 2020 09:28:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7BC4F20756 for ; Wed, 11 Nov 2020 09:28:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7BC4F20756 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 19B566B0072; Wed, 11 Nov 2020 04:28:27 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 0E84E6B0078; Wed, 11 Nov 2020 04:28:26 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D40D06B0072; Wed, 11 Nov 2020 04:28:26 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0111.hostedemail.com [216.40.44.111]) by kanga.kvack.org (Postfix) with ESMTP id 922E56B0078 for ; Wed, 11 Nov 2020 04:28:26 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 3CC7C98BF for ; Wed, 11 Nov 2020 09:28:26 +0000 (UTC) X-FDA: 77471611812.25.boat22_510da8a272fc Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id 18C2C1804E3A0 for ; Wed, 11 Nov 2020 09:28:26 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,vbabka@suse.cz,,RULES_HIT:30051:30054:30070:30091,0,RBL:195.135.220.15:@suse.cz:.lbl8.mailshell.net-62.2.6.2 64.100.201.201;04y8o34omixxryjcfp7d6ksssw84oocomh5haf711efep78mdzw5wpejczi9p7m.y5dzcpurbre93fahzwd5dyywmppq3cwawtbftdbt7he3st7uj478pap8aeixt45.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: boat22_510da8a272fc X-Filterd-Recvd-Size: 11579 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf48.hostedemail.com (Postfix) with ESMTP for ; Wed, 11 Nov 2020 09:28:25 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id DB20FAD5F; Wed, 11 Nov 2020 09:28:22 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko , Pavel Tatashin , David Hildenbrand , Oscar Salvador , Joonsoo Kim , Vlastimil Babka , Michal Hocko Subject: [PATCH v3 7/7] mm, page_alloc: disable pcplists during memory offline Date: Wed, 11 Nov 2020 10:28:12 +0100 Message-Id: <20201111092812.11329-8-vbabka@suse.cz> X-Mailer: git-send-email 2.29.1 In-Reply-To: <20201111092812.11329-1-vbabka@suse.cz> References: <20201111092812.11329-1-vbabka@suse.cz> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Memory offlining relies on page isolation to guarantee a forward progress because pages cannot be reused while they are isolated. But the page isolation itself doesn't prevent from races while freed pages are stored on pcp lists and thus can be reused. This can be worked around by repeated draining of pcplists, as done by commit 968318261221 ("mm/memory_hotplug: drain per-cpu pages again during memory offline"). David and Michal would prefer that this race was closed in a way that callers of page isolation who need stronger guarantees don't need to repeatedly drain. David suggested disabling pcplists usage completely during page isolation, instead of repeatedly draining them. To achieve this without adding special cases in alloc/free fastpath, we can use the same approach as boot pagesets - when pcp->high is 0, any pcplist addition will be immediately flushed. The race can thus be closed by setting pcp->high to 0 and draining pcplists once, before calling start_isolate_page_range(). The draining will serialize after processes that already disabled interrupts and read the old value of pcp->high in free_unref_page_commit(), and processes that have not yet disabled interrupts, will observe pcp->high == 0 when they are rescheduled, and skip pcplists. This guarantees no stray pages on pcplists in zones where isolation happens. This patch thus adds zone_pcp_disable() and zone_pcp_enable() functions that page isolation users can call before start_isolate_page_range() and after unisolating (or offlining) the isolated pages. Also, drain_all_pages() is optimized to only execute on cpus where pcplists are not empty. The check can however race with a free to pcplist that has not yet increased the pcp->count from 0 to 1. Thus make the drain optionally skip the racy check and drain on all cpus, and use this option in zone_pcp_disable(). As we have to avoid external updates to high and batch while pcplists are disabled, we take pcp_batch_high_lock in zone_pcp_disable() and release it in zone_pcp_enable(). This also synchronizes multiple users of zone_pcp_disable()/enable(). Currently the only user of this functionality is offline_pages(). Suggested-by: David Hildenbrand Suggested-by: Michal Hocko Signed-off-by: Vlastimil Babka Reviewed-by: Oscar Salvador Acked-by: Michal Hocko Signed-off-by: Vlastimil Babka Reviewed-by: Oscar Salvador Acked-by: Michal Hocko Reviewed-by: David Hildenbrand --- mm/internal.h | 2 ++ mm/memory_hotplug.c | 28 ++++++++---------- mm/page_alloc.c | 69 +++++++++++++++++++++++++++++++++++---------- mm/page_isolation.c | 6 ++-- 4 files changed, 71 insertions(+), 34 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index c43ccdddb0f6..2966496680bc 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -201,6 +201,8 @@ extern int user_min_free_kbytes; extern void zone_pcp_update(struct zone *zone); extern void zone_pcp_reset(struct zone *zone); +extern void zone_pcp_disable(struct zone *zone); +extern void zone_pcp_enable(struct zone *zone); #if defined CONFIG_COMPACTION || defined CONFIG_CMA diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 3c494ab0d075..e0a561c550b3 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1491,17 +1491,21 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages) } node = zone_to_nid(zone); + /* + * Disable pcplists so that page isolation cannot race with freeing + * in a way that pages from isolated pageblock are left on pcplists. + */ + zone_pcp_disable(zone); + /* set above range as isolated */ ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, MEMORY_OFFLINE | REPORT_FAILURE); if (ret) { reason = "failure to isolate range"; - goto failed_removal; + goto failed_removal_pcplists_disabled; } - drain_all_pages(zone); - arg.start_pfn = start_pfn; arg.nr_pages = nr_pages; node_states_check_changes_offline(nr_pages, zone, &arg); @@ -1551,20 +1555,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages) goto failed_removal_isolated; } - /* - * per-cpu pages are drained after start_isolate_page_range, but - * if there are still pages that are not free, make sure that we - * drain again, because when we isolated range we might have - * raced with another thread that was adding pages to pcp list. - * - * Forward progress should be still guaranteed because - * pages on the pcp list can only belong to MOVABLE_ZONE - * because has_unmovable_pages explicitly checks for - * PageBuddy on freed pages on other zones. - */ ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE); - if (ret) - drain_all_pages(zone); + } while (ret); /* Mark all sections offline and remove free pages from the buddy. */ @@ -1580,6 +1572,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages) zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages; spin_unlock_irqrestore(&zone->lock, flags); + zone_pcp_enable(zone); + /* removal success */ adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages); zone->present_pages -= nr_pages; @@ -1612,6 +1606,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages) failed_removal_isolated: undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); memory_notify(MEM_CANCEL_OFFLINE, &arg); +failed_removal_pcplists_disabled: + zone_pcp_enable(zone); failed_removal: pr_debug("memory offlining [mem %#010llx-%#010llx] failed due to %s\n", (unsigned long long) start_pfn << PAGE_SHIFT, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2ec3e1e27169..ba4943727793 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3030,14 +3030,7 @@ static void drain_local_pages_wq(struct work_struct *work) preempt_enable(); } -/* - * Spill all the per-cpu pages from all CPUs back into the buddy allocator. - * - * When zone parameter is non-NULL, spill just the single zone's pages. - * - * Note that this can be extremely slow as the draining happens in a workqueue. - */ -void drain_all_pages(struct zone *zone) +void __drain_all_pages(struct zone *zone, bool force_all_cpus) { int cpu; @@ -3076,7 +3069,13 @@ void drain_all_pages(struct zone *zone) struct zone *z; bool has_pcps = false; - if (zone) { + if (force_all_cpus) { + /* + * The pcp.count check is racy, some callers need a + * guarantee that no cpu is missed. + */ + has_pcps = true; + } else if (zone) { pcp = per_cpu_ptr(zone->pageset, cpu); if (pcp->pcp.count) has_pcps = true; @@ -3109,6 +3108,18 @@ void drain_all_pages(struct zone *zone) mutex_unlock(&pcpu_drain_mutex); } +/* + * Spill all the per-cpu pages from all CPUs back into the buddy allocator. + * + * When zone parameter is non-NULL, spill just the single zone's pages. + * + * Note that this can be extremely slow as the draining happens in a workqueue. + */ +void drain_all_pages(struct zone *zone) +{ + __drain_all_pages(zone, false); +} + #ifdef CONFIG_HIBERNATION /* @@ -6308,6 +6319,18 @@ static void pageset_init(struct per_cpu_pageset *p) pcp->batch = BOOT_PAGESET_BATCH; } +void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long high, + unsigned long batch) +{ + struct per_cpu_pageset *p; + int cpu; + + for_each_possible_cpu(cpu) { + p = per_cpu_ptr(zone->pageset, cpu); + pageset_update(&p->pcp, high, batch); + } +} + /* * Calculate and set new high and batch values for all per-cpu pagesets of a * zone, based on the zone's size and the percpu_pagelist_fraction sysctl. @@ -6315,8 +6338,6 @@ static void pageset_init(struct per_cpu_pageset *p) static void zone_set_pageset_high_and_batch(struct zone *zone) { unsigned long new_high, new_batch; - struct per_cpu_pageset *p; - int cpu; if (percpu_pagelist_fraction) { new_high = zone_managed_pages(zone) / percpu_pagelist_fraction; @@ -6336,10 +6357,7 @@ static void zone_set_pageset_high_and_batch(struct zone *zone) zone->pageset_high = new_high; zone->pageset_batch = new_batch; - for_each_possible_cpu(cpu) { - p = per_cpu_ptr(zone->pageset, cpu); - pageset_update(&p->pcp, new_high, new_batch); - } + __zone_set_pageset_high_and_batch(zone, new_high, new_batch); } void __meminit setup_zone_pageset(struct zone *zone) @@ -8757,6 +8775,27 @@ void __meminit zone_pcp_update(struct zone *zone) mutex_unlock(&pcp_batch_high_lock); } +/* + * Effectively disable pcplists for the zone by setting the high limit to 0 + * and draining all cpus. A concurrent page freeing on another CPU that's about + * to put the page on pcplist will either finish before the drain and the page + * will be drained, or observe the new high limit and skip the pcplist. + * + * Must be paired with a call to zone_pcp_enable(). + */ +void zone_pcp_disable(struct zone *zone) +{ + mutex_lock(&pcp_batch_high_lock); + __zone_set_pageset_high_and_batch(zone, 0, 1); + __drain_all_pages(zone, true); +} + +void zone_pcp_enable(struct zone *zone) +{ + __zone_set_pageset_high_and_batch(zone, zone->pageset_high, zone->pageset_batch); + mutex_unlock(&pcp_batch_high_lock); +} + void zone_pcp_reset(struct zone *zone) { unsigned long flags; diff --git a/mm/page_isolation.c b/mm/page_isolation.c index feab446d1982..a254e1f370a3 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -174,9 +174,9 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) * A call to drain_all_pages() after isolation can flush most of them. However * in some cases pages might still end up on pcp lists and that would allow * for their allocation even when they are in fact isolated already. Depending - * on how strong of a guarantee the caller needs, further drain_all_pages() - * might be needed (e.g. __offline_pages will need to call it after check for - * isolated range for a next retry). + * on how strong of a guarantee the caller needs, zone_pcp_disable/enable() + * might be used to flush and disable pcplist before isolation and enable after + * unisolation. * * Return: 0 on success and -EBUSY if any part of range cannot be isolated. */