From patchwork Tue Jun 29 02:42:12 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12349217
Date: Mon, 28 Jun 2021 19:42:12 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, dave.hansen@linux.intel.com, hdanton@sina.com,
 linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org,
 mm-commits@vger.kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 171/192] mm/page_alloc: disassociate the pcp->high from pcp->batch
Message-ID: <20210629024212.Bu1ghfkao%akpm@linux-foundation.org>
In-Reply-To: <20210628193256.008961950a714730751c1423@linux-foundation.org>
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: disassociate the pcp->high from pcp->batch

The pcp high watermark is based on the batch size, but there is no
relationship between them other than that the batch value is convenient
to use early in boot.

This patch takes the first step and bases pcp->high on the zone low
watermark, split across the number of CPUs local to a zone, while the
batch size remains the same to avoid increasing allocation latencies.
The intent behind the default pcp->high is "set the number of PCP pages
such that, if they are all full, background reclaim is not started
prematurely".

Note that in this patch the pcp->high values are adjusted after memory
hotplug events, min_free_kbytes adjustments and watermark scale factor
adjustments, but not after CPU hotplug events, which are handled later
in the series.

On a test KVM instance:

Before: grep -E "high:|batch" /proc/zoneinfo | tail -2
              high:  378
              batch: 63

After: grep -E "high:|batch" /proc/zoneinfo | tail -2
              high:  649
              batch: 63

[mgorman@techsingularity.net: fix __setup_per_zone_wmarks for parallel memory hotplug]
  Link: https://lkml.kernel.org/r/20210528105925.GN30378@techsingularity.net
Link: https://lkml.kernel.org/r/20210525080119.5455-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    6 ++--
 mm/page_alloc.c     |   62 +++++++++++++++++++++++++++++-------------
 2 files changed, 47 insertions(+), 21 deletions(-)
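As an aside, the new sizing rule is easy to see in isolation. The
following standalone userspace sketch (not part of the patch, not
kernel code) mirrors the zone_highsize() logic added in the diff below;
the low watermark (5192 pages) and local CPU count (8) are hypothetical
values, chosen only so the result lines up with the "high: 649" output
above:

    #include <stdio.h>

    /*
     * Standalone sketch of the new pcp->high sizing: split the zone
     * low watermark across the CPUs local to the zone, with a floor
     * of batch * 4 preserving the historical high/batch relationship.
     */
    static int sketch_zone_highsize(int low_wmark_pages, int nr_local_cpus,
                                    int batch)
    {
            int high;

            /* Early in boot, CPUs may not be online yet; act as if one is. */
            if (nr_local_cpus < 1)
                    nr_local_cpus = 1;

            high = low_wmark_pages / nr_local_cpus;

            /* Ensure high is at least batch * 4. */
            if (high < batch * 4)
                    high = batch * 4;

            return high;
    }

    int main(void)
    {
            /* Hypothetical zone: low watermark 5192 pages, 8 local CPUs. */
            printf("high = %d\n", sketch_zone_highsize(5192, 8, 63)); /* 649 */
            return 0;
    }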
--- a/mm/memory_hotplug.c~mm-page_alloc-disassociate-the-pcp-high-from-pcp-batch
+++ a/mm/memory_hotplug.c
@@ -961,7 +961,6 @@ int __ref online_pages(unsigned long pfn
 	node_states_set_node(nid, &arg);
 	if (need_zonelists_rebuild)
 		build_all_zonelists(NULL);
-	zone_pcp_update(zone);
 
 	/* Basic onlining is complete, allow allocation of onlined pages. */
 	undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE);
@@ -974,6 +973,7 @@ int __ref online_pages(unsigned long pfn
 	 */
 	shuffle_zone(zone);
 
+	/* reinitialise watermarks and update pcp limits */
 	init_per_zone_wmark_min();
 
 	kswapd_run(nid);
@@ -1829,13 +1829,13 @@ int __ref offline_pages(unsigned long st
 	adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
 	adjust_present_page_count(zone, -nr_pages);
 
+	/* reinitialise watermarks and update pcp limits */
 	init_per_zone_wmark_min();
 
 	if (!populated_zone(zone)) {
 		zone_pcp_reset(zone);
 		build_all_zonelists(NULL);
-	} else
-		zone_pcp_update(zone);
+	}
 
 	node_states_clear_node(node, &arg);
 	if (arg.status_change_nid >= 0) {
--- a/mm/page_alloc.c~mm-page_alloc-disassociate-the-pcp-high-from-pcp-batch
+++ a/mm/page_alloc.c
@@ -2175,14 +2175,6 @@ void __init page_alloc_init_late(void)
 		wait_for_completion(&pgdat_init_all_done_comp);
 
 	/*
-	 * The number of managed pages has changed due to the initialisation
-	 * so the pcpu batch and high limits needs to be updated or the limits
-	 * will be artificially small.
-	 */
-	for_each_populated_zone(zone)
-		zone_pcp_update(zone);
-
-	/*
 	 * We initialized the rest of the deferred pages.  Permanently disable
 	 * on-demand struct page initialization.
 	 */
@@ -6633,13 +6625,12 @@ static int zone_batchsize(struct zone *z
 	int batch;
 
 	/*
-	 * The per-cpu-pages pools are set to around 1000th of the
-	 * size of the zone.
+	 * The number of pages to batch allocate is either ~0.1%
+	 * of the zone or 1MB, whichever is smaller. The batch
+	 * size is striking a balance between allocation latency
+	 * and zone lock contention.
 	 */
-	batch = zone_managed_pages(zone) / 1024;
-	/* But no more than a meg. */
-	if (batch * PAGE_SIZE > 1024 * 1024)
-		batch = (1024 * 1024) / PAGE_SIZE;
+	batch = min(zone_managed_pages(zone) >> 10, (1024 * 1024) / PAGE_SIZE);
 	batch /= 4;		/* We effectively *= 4 below */
 	if (batch < 1)
 		batch = 1;
@@ -6676,6 +6667,34 @@ static int zone_batchsize(struct zone *z
 #endif
 }
 
+static int zone_highsize(struct zone *zone, int batch)
+{
+#ifdef CONFIG_MMU
+	int high;
+	int nr_local_cpus;
+
+	/*
+	 * The high value of the pcp is based on the zone low watermark
+	 * so that if they are full then background reclaim will not be
+	 * started prematurely. The value is split across all online CPUs
+	 * local to the zone. Note that early in boot that CPUs may not be
+	 * online yet.
+	 */
+	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));
+	high = low_wmark_pages(zone) / nr_local_cpus;
+
+	/*
+	 * Ensure high is at least batch*4. The multiple is based on the
+	 * historical relationship between high and batch.
+	 */
+	high = max(high, batch << 2);
+
+	return high;
+#else
+	return 0;
+#endif
+}
+
 /*
  * pcp->high and pcp->batch values are related and generally batch is lower
  * than high. They are also related to pcp->count such that count is lower
@@ -6737,11 +6756,10 @@ static void __zone_set_pageset_high_and_
  */
 static void zone_set_pageset_high_and_batch(struct zone *zone)
 {
-	unsigned long new_high, new_batch;
-
-	new_batch = zone_batchsize(zone);
-	new_high = 6 * new_batch;
-	new_batch = max(1UL, 1 * new_batch);
+	int new_high, new_batch;
+
+	new_batch = max(1, zone_batchsize(zone));
+	new_high = zone_highsize(zone, new_batch);
 
 	if (zone->pageset_high == new_high &&
 	    zone->pageset_batch == new_batch)
@@ -8222,11 +8240,19 @@ static void __setup_per_zone_wmarks(void
  */
 void setup_per_zone_wmarks(void)
 {
+	struct zone *zone;
 	static DEFINE_SPINLOCK(lock);
 
 	spin_lock(&lock);
 	__setup_per_zone_wmarks();
 	spin_unlock(&lock);
+
+	/*
+	 * The watermark size have changed so update the pcpu batch
+	 * and high limits or the limits may be inappropriate.
+	 */
+	for_each_zone(zone)
+		zone_pcp_update(zone);
 }
 
 /*
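A similar standalone sketch (again, not part of the patch) shows how
the unchanged "batch: 63" above arises. The final clamp to a 2^n - 1
value is performed by the portion of zone_batchsize() outside the hunk
shown; the 4KiB page size and the 1GiB zone are assumptions made only
for illustration:

    #include <stdio.h>

    #define SKETCH_PAGE_SIZE 4096   /* assumed 4KiB pages */

    /*
     * Standalone sketch of the batch sizing: ~0.1% of the zone,
     * capped at 1MB, scaled down by 4, then clamped to a 2^n - 1
     * value as the full zone_batchsize() does.
     */
    static long sketch_zone_batchsize(long managed_pages)
    {
            long batch, pow2;

            /* Either ~0.1% of the zone or 1MB, whichever is smaller. */
            batch = managed_pages >> 10;
            if (batch > (1024 * 1024) / SKETCH_PAGE_SIZE)
                    batch = (1024 * 1024) / SKETCH_PAGE_SIZE;

            batch /= 4;     /* "We effectively *= 4 below", per the kernel comment */
            if (batch < 1)
                    batch = 1;

            /* Clamp to 2^n - 1, i.e. rounddown_pow_of_two(batch * 1.5) - 1. */
            pow2 = 1;
            while (pow2 * 2 <= batch + batch / 2)
                    pow2 *= 2;

            return pow2 - 1;
    }

    int main(void)
    {
            /* Hypothetical 1GiB zone: 262144 pages of 4KiB. */
            printf("batch = %ld\n", sketch_zone_batchsize(262144)); /* 63 */
            return 0;
    }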