From patchwork Tue Jun 29 02:43:11 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12349255
Date: Mon, 28 Jun 2021 19:43:11 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, dave.hansen@intel.com, feng.tang@intel.com,
 hdanton@sina.com, linux-mm@kvack.org, mgorman@techsingularity.net,
 mhocko@kernel.org, mm-commits@vger.kernel.org,
 torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 190/192] mm/page_alloc: split pcp->high across all online
 CPUs for cpuless nodes
Message-ID: <20210629024311.e8wr0CToB%akpm@linux-foundation.org>
In-Reply-To: <20210628193256.008961950a714730751c1423@linux-foundation.org>

From: Mel Gorman
Subject: mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes

Dave Hansen reported the following about Feng Tang's tests on a machine
with persistent memory onlined as a DRAM-like device.

  Feng Tang tossed these on a "Cascade Lake" system with 96 threads and
  ~512G of persistent memory and 128G of DRAM.  The PMEM is in "volatile
  use" mode and being managed via the buddy just like the normal RAM.

  The PMEM zones are big ones:

        present  65011712 = 248 G
        high       134595 = 525 M

  The PMEM nodes, of course, don't have any CPUs in them.

  With your series, the pcp->high value per-cpu is 69584 pages or about
  270MB per CPU.  Scaled up by the 96 CPU threads, that's ~26GB of
  worst-case memory in the pcps per zone, or roughly 10% of the size of
  the zone.

This should not cause a problem as such although it could trigger reclaim
due to pages being stored on per-cpu lists for CPUs remote to a node.  It
is not possible to treat cpuless nodes exactly the same as normal nodes
but the worst-case scenario can be mitigated by splitting pcp->high across
all online CPUs for cpuless memory nodes.

Link: https://lkml.kernel.org/r/20210616110743.GK30378@techsingularity.net
Suggested-by: Dave Hansen
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Acked-by: Dave Hansen
Cc: Hillf Danton
Cc: Michal Hocko
Cc: "Tang, Feng"
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-split-pcp-high-across-all-online-cpus-for-cpuless-nodes
+++ a/mm/page_alloc.c
@@ -6790,7 +6790,7 @@ static int zone_highsize(struct zone *zo
 {
 #ifdef CONFIG_MMU
 	int high;
-	int nr_local_cpus;
+	int nr_split_cpus;
 	unsigned long total_pages;
 
 	if (!percpu_pagelist_high_fraction) {
@@ -6813,10 +6813,14 @@ static int zone_highsize(struct zone *zo
 	 * Split the high value across all online CPUs local to the zone. Note
 	 * that early in boot that CPUs may not be online yet and that during
 	 * CPU hotplug that the cpumask is not yet updated when a CPU is being
-	 * onlined.
+	 * onlined. For memory nodes that have no CPUs, split pcp->high across
+	 * all online CPUs to mitigate the risk that reclaim is triggered
+	 * prematurely due to pages stored on pcp lists.
 	 */
-	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
-	high = total_pages / nr_local_cpus;
+	nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) + cpu_online;
+	if (!nr_split_cpus)
+		nr_split_cpus = num_online_cpus();
+	high = total_pages / nr_split_cpus;
 
 	/*
 	 * Ensure high is at least batch*4. The multiple is based on the
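
The effect of the change on the numbers quoted in the changelog can be sketched in userspace. This is an illustrative model only, not kernel code: `model_split_high()` and `model_worst_case_pages()` are hypothetical helpers, the reported 69584 pages is assumed to be the whole budget being split, and the later "at least batch*4" clamp from the kernel function is deliberately left out.

```c
#include <stdio.h>

/*
 * Hypothetical model of the divisor selection in the patch: use the
 * number of CPUs local to the node, and fall back to all online CPUs
 * when the node has none. Plain integers stand in for the kernel's
 * cpumask helpers.
 */
static unsigned long model_split_high(unsigned long total_pages,
				      unsigned int nr_local_cpus,
				      unsigned int nr_online_cpus)
{
	unsigned int nr_split_cpus = nr_local_cpus;

	if (!nr_split_cpus)
		nr_split_cpus = nr_online_cpus;
	if (!nr_split_cpus)	/* guard for this model only */
		nr_split_cpus = 1;

	return total_pages / nr_split_cpus;
}

/* Worst case: every online CPU fills its pcp list up to 'high'. */
static unsigned long model_worst_case_pages(unsigned long high,
					    unsigned int nr_online_cpus)
{
	return high * nr_online_cpus;
}

/* Print the before/after figures for the reported 96-thread machine. */
static void model_report(void)
{
	unsigned long budget = 69584;	/* pages, from the changelog */
	unsigned long high = model_split_high(budget, 0, 96);

	printf("before: %lu pages per CPU, %lu pages worst case\n",
	       budget, model_worst_case_pages(budget, 96));
	printf("after:  %lu pages per CPU, %lu pages worst case\n",
	       high, model_worst_case_pages(high, 96));
}
```

Under these assumptions a cpuless node's per-CPU limit drops from 69584 pages to 724, so the worst case across 96 CPUs stays within the original 69584-page budget instead of being multiplied by the CPU count. Nodes that do have local CPUs are unaffected by the fallback.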