From patchwork Thu Oct 19 10:43:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13428634
From: Qi Zheng
To: akpm@linux-foundation.org, rppt@kernel.org, david@redhat.com,
 vbabka@suse.cz, mhocko@suse.com
Cc: willy@infradead.org, mgorman@techsingularity.net, mingo@kernel.org,
 aneesh.kumar@linux.ibm.com, ying.huang@intel.com, hannes@cmpxchg.org,
 osalvador@suse.de, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Qi Zheng
Subject: [PATCH v3 1/2] mm: page_alloc: skip memoryless nodes entirely
Date: Thu, 19 Oct 2023 18:43:54 +0800
Message-Id: <157013e978468241de4a4c05d5337a44638ecb0e.1697711415.git.zhengqi.arch@bytedance.com>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
Sender: owner-linux-mm@kvack.org
Precedence: bulk

In find_next_best_node(), we skip memoryless nodes when building the
zonelists of other normal nodes (N_NORMAL), but we do not skip the
memoryless node itself when building its own zonelist. As a result, the
memoryless node is still traversed at runtime.

For example, say we have node0 and node1, where node0 is a memoryless
node. The fallback order of node0 and node1 is then as follows:

[ 0.153005] Fallback order for Node 0: 0 1
[ 0.153564] Fallback order for Node 1: 1

After this patch, we skip the memoryless node0 entirely, and the
fallback order of node0 and node1 becomes:

[ 0.155236] Fallback order for Node 0: 1
[ 0.155806] Fallback order for Node 1: 1

So node0 becomes completely invisible, which reduces runtime overhead.

In this way we also no longer try to allocate pages from the memoryless
node0, so the panic mentioned in [1] is fixed as well. Even though this
problem has been solved by dropping the NODE_MIN_SIZE constraint in
x86 [2], it would be better to fix it in core MM as well.

[1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/
[2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/

Signed-off-by: Qi Zheng
Acked-by: David Hildenbrand
Acked-by: Ingo Molnar
---
 mm/page_alloc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee392a324802..e978272699d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5052,8 +5052,11 @@ int find_next_best_node(int node, nodemask_t *used_node_mask)
 	int min_val = INT_MAX;
 	int best_node = NUMA_NO_NODE;
 
-	/* Use the local node if we haven't already */
-	if (!node_isset(node, *used_node_mask)) {
+	/*
+	 * Use the local node if we haven't already. But for memoryless local
+	 * node, we should skip it and fallback to other nodes.
+	 */
+	if (!node_isset(node, *used_node_mask) && node_state(node, N_MEMORY)) {
 		node_set(node, *used_node_mask);
 		return node;
 	}
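
For readers who want to reproduce the fallback orders quoted above
without booting a kernel, here is a minimal userspace sketch of the
selection logic. It is not the kernel code: has_memory[], distance[],
next_best_node(), build_order() and the skip_memoryless_local flag are
hypothetical stand-ins, the two-node topology (node 0 memoryless) is an
assumption taken from the example in the changelog, and the model picks
the remaining nodes by distance only, leaving out the other penalties
the real find_next_best_node() applies.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_NODES 2
#define NO_NODE   (-1)

/* Assumed topology: node 0 is memoryless, node 1 has memory. */
static const bool has_memory[MAX_NODES] = { false, true };
static const int distance[MAX_NODES][MAX_NODES] = { { 10, 20 }, { 20, 10 } };

/*
 * Simplified model of the node selection step. When
 * skip_memoryless_local is true, the local node is only returned if it
 * has memory, which mirrors the extra check added by the patch.
 */
static int next_best_node(int node, bool used[MAX_NODES],
			  bool skip_memoryless_local)
{
	int best = NO_NODE;
	int min_val = INT_MAX;

	/* Use the local node if we haven't already. */
	if (!used[node] && (!skip_memoryless_local || has_memory[node])) {
		used[node] = true;
		return node;
	}

	/* Otherwise pick the closest unused node that has memory. */
	for (int n = 0; n < MAX_NODES; n++) {
		if (!has_memory[n] || used[n])
			continue;
		if (distance[node][n] < min_val) {
			min_val = distance[node][n];
			best = n;
		}
	}

	if (best != NO_NODE)
		used[best] = true;
	return best;
}

/* Fill order[] with the fallback order for node; return its length. */
static int build_order(int node, bool skip_memoryless_local,
		       int order[MAX_NODES])
{
	bool used[MAX_NODES] = { false };
	int count = 0;
	int n;

	while ((n = next_best_node(node, used, skip_memoryless_local)) != NO_NODE) {
		assert(count < MAX_NODES);
		order[count++] = n;
	}
	return count;
}

/* Print one line in the same shape as the boot log quoted above. */
static void print_order(int node, bool skip_memoryless_local)
{
	int order[MAX_NODES];
	int count = build_order(node, skip_memoryless_local, order);

	printf("Fallback order for Node %d:", node);
	for (int i = 0; i < count; i++)
		printf(" %d", order[i]);
	printf("\n");
}
```

Calling print_order(0, false) and print_order(0, true) from a small
main() shows the before and after behaviour for the memoryless node: in
this model the first call lists node 0 followed by node 1, and the
second lists node 1 only, which matches the two logs in the changelog.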