From patchwork Thu Apr 11 03:56:51 2019
From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com
Cc:
yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH 1/9] mm: define N_CPU_MEM node states
Date: Thu, 11 Apr 2019 11:56:51 +0800
Message-Id: <1554955019-29472-2-git-send-email-yang.shi@linux.alibaba.com>

The kernel has some pre-defined node masks, called node states, e.g. N_MEMORY, N_CPU, etc. But there may be cpuless nodes, e.g. PMEM nodes, and some architectures, e.g. Power, may have memoryless nodes. It is not straightforward to get the nodes with both CPUs and memory, so define a new N_CPU_MEM node state. The nodes with both CPUs and memory are called "primary" nodes; /sys/devices/system/node/primary shows the currently online "primary" nodes.
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 drivers/base/node.c      |  2 ++
 include/linux/nodemask.h |  3 ++-
 mm/memory_hotplug.c      |  6 ++++++
 mm/page_alloc.c          |  1 +
 mm/vmstat.c              | 11 +++++++++--
 5 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 86d6cd9..1b963b2 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -634,6 +634,7 @@ static ssize_t show_node_state(struct device *dev,
 #endif
 	[N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
 	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
+	[N_CPU_MEM] = _NODE_ATTR(primary, N_CPU_MEM),
 };
 
 static struct attribute *node_state_attrs[] = {
@@ -645,6 +646,7 @@ static ssize_t show_node_state(struct device *dev,
 #endif
 	&node_state_attr[N_MEMORY].attr.attr,
 	&node_state_attr[N_CPU].attr.attr,
+	&node_state_attr[N_CPU_MEM].attr.attr,
 	NULL
 };
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 27e7fa3..66a8964 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -398,7 +398,8 @@ enum node_states {
 	N_HIGH_MEMORY = N_NORMAL_MEMORY,
 #endif
 	N_MEMORY,		/* The node has memory(regular, high, movable) */
-	N_CPU,		/* The node has one or more cpus */
+	N_CPU,			/* The node has one or more cpus */
+	N_CPU_MEM,		/* The node has both cpus and memory */
 	NR_NODE_STATES
 };
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f767582..1140f3b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -729,6 +729,9 @@ static void node_states_set_node(int node, struct memory_notify *arg)
 
 	if (arg->status_change_nid >= 0)
 		node_set_state(node, N_MEMORY);
+
+	if (node_state(node, N_CPU))
+		node_set_state(node, N_CPU_MEM);
 }
 
 static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
@@ -1569,6 +1572,9 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
 
 	if (arg->status_change_nid >= 0)
 		node_clear_state(node, N_MEMORY);
+
+	if (node_state(node, N_CPU))
+		node_clear_state(node, N_CPU_MEM);
 }
 
 static int __ref __offline_pages(unsigned long start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03fcf73..7cd88a4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,6 +122,7 @@ struct pcpu_drain {
 #endif
 	[N_MEMORY] = { { [0] = 1UL } },
 	[N_CPU] = { { [0] = 1UL } },
+	[N_CPU_MEM] = { { [0] = 1UL } },
 #endif	/* NUMA */
 };
 EXPORT_SYMBOL(node_states);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 36b56f8..1a431dc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1910,15 +1910,22 @@ static void __init init_cpu_node_state(void)
 	int node;
 
 	for_each_online_node(node) {
-		if (cpumask_weight(cpumask_of_node(node)) > 0)
+		if (cpumask_weight(cpumask_of_node(node)) > 0) {
 			node_set_state(node, N_CPU);
+			if (node_state(node, N_MEMORY))
+				node_set_state(node, N_CPU_MEM);
+		}
 	}
 }
 
 static int vmstat_cpu_online(unsigned int cpu)
 {
+	int node = cpu_to_node(cpu);
+
 	refresh_zone_stat_thresholds();
-	node_set_state(cpu_to_node(cpu), N_CPU);
+	node_set_state(node, N_CPU);
+	if (node_state(node, N_MEMORY))
+		node_set_state(node, N_CPU_MEM);
 
 	return 0;
 }

From patchwork Thu Apr 11 03:56:52 2019
From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH 2/9] mm: page_alloc: make find_next_best_node return cpuless node
Date: Thu, 11 Apr 2019 11:56:52 +0800
Message-Id: <1554955019-29472-3-git-send-email-yang.shi@linux.alibaba.com>

We need to find the closest cpuless node to demote DRAM pages to.
Add a "cpuless" parameter to find_next_best_node() to skip DRAM nodes on demand.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 mm/internal.h   | 11 +++++++++++
 mm/page_alloc.c | 14 ++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 9eeaf2b..a514808 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -292,6 +292,17 @@ static inline bool is_data_mapping(vm_flags_t flags)
 	return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE;
 }
 
+#ifdef CONFIG_NUMA
+extern int find_next_best_node(int node, nodemask_t *used_node_mask,
+			       bool cpuless);
+#else
+static inline int find_next_best_node(int node, nodemask_t *used_node_mask,
+				      bool cpuless)
+{
+	return 0;
+}
+#endif
+
 /* mm/util.c */
 void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
 		struct vm_area_struct *prev, struct rb_node *rb_parent);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7cd88a4..bda17c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5362,6 +5362,7 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
  * find_next_best_node - find the next node that should appear in a given node's fallback list
  * @node: node whose fallback list we're appending
  * @used_node_mask: nodemask_t of already used nodes
+ * @cpuless: find next best cpuless node
  *
  * We use a number of factors to determine which is the next node that should
  * appear on a given node's fallback list. The node should not have appeared
@@ -5373,7 +5374,8 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
  *
  * Return: node id of the found node or %NUMA_NO_NODE if no node is found.
  */
-static int find_next_best_node(int node, nodemask_t *used_node_mask)
+int find_next_best_node(int node, nodemask_t *used_node_mask,
+			bool cpuless)
 {
 	int n, val;
 	int min_val = INT_MAX;
@@ -5381,13 +5383,18 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 	const struct cpumask *tmp = cpumask_of_node(0);
 
 	/* Use the local node if we haven't already */
-	if (!node_isset(node, *used_node_mask)) {
+	if (!node_isset(node, *used_node_mask) &&
+	    !cpuless) {
 		node_set(node, *used_node_mask);
 		return node;
 	}
 
 	for_each_node_state(n, N_MEMORY) {
 
+		/* Find next best cpuless node */
+		if (cpuless && (node_state(n, N_CPU)))
+			continue;
+
 		/* Don't want a node to appear more than once */
 		if (node_isset(n, *used_node_mask))
 			continue;
@@ -5419,7 +5426,6 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 	return best_node;
 }
 
-
 /*
  * Build zonelists ordered by node and zones within node.
  * This results in maximum locality--normal zone overflows into local
@@ -5481,7 +5487,7 @@ static void build_zonelists(pg_data_t *pgdat)
 	nodes_clear(used_mask);
 	memset(node_order, 0, sizeof(node_order));
 
-	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
+	while ((node = find_next_best_node(local_node, &used_mask, false)) >= 0) {
 		/*
 		 * We don't want to pressure a particular node.
		 * So adding penalty to the first node in same

From patchwork Thu Apr 11 03:56:53 2019
From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com,
fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH 3/9] mm: numa: promote pages to DRAM when it gets accessed twice
Date: Thu, 11 Apr 2019 11:56:53 +0800
Message-Id: <1554955019-29472-4-git-send-email-yang.shi@linux.alibaba.com>

NUMA balancing promotes a page to DRAM as soon as it is accessed, but that may be just a one-off access. To reduce migration thrashing and memory bandwidth pressure, promote only pages that get accessed twice, by extending page_check_references() to support a second-reference algorithm for anonymous pages.

page_check_references() normally walks all mapped PTEs or PMDs to check whether the page is referenced, but such a walk is unnecessary for NUMA balancing: NUMA balancing keeps the PTE or PMD referenced bit set all the time, so an anonymous page always has at least one referenced PTE or PMD. The NUMA balancing path is distinguished from the page reclaim path via scan_control, which is NULL in the NUMA balancing path.

This is not necessarily the optimal way to distinguish hot pages from cold ones accurately; a much more sophisticated algorithm may be needed.
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 mm/huge_memory.c |  11 ++++++
 mm/internal.h    |  80 ++++++++++++++++++++++++++++++++++++++
 mm/memory.c      |  21 ++++++++++
 mm/vmscan.c      | 116 ++++++++++++++++---------------------------------------
 4 files changed, 146 insertions(+), 82 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 404acdc..0b18ac45 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1590,6 +1590,17 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	}
 
 	/*
+	 * Promote the page when it gets NUMA fault twice.
+	 * It is safe to set page flag since the page is locked now.
+	 */
+	if (!node_state(page_nid, N_CPU_MEM) &&
+	    page_check_references(page, NULL) != PAGEREF_PROMOTE) {
+		put_page(page);
+		page_nid = NUMA_NO_NODE;
+		goto clear_pmdnuma;
+	}
+
+	/*
 	 * Migrate the THP to the requested node, returns with page unlocked
 	 * and access rights restored.
 	 */
diff --git a/mm/internal.h b/mm/internal.h
index a514808..bee4d6c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -89,8 +89,88 @@ static inline void set_page_refcounted(struct page *page)
 /*
  * in mm/vmscan.c:
  */
+struct scan_control {
+	/* How many pages shrink_list() should reclaim */
+	unsigned long nr_to_reclaim;
+
+	/*
+	 * Nodemask of nodes allowed by the caller. If NULL, all nodes
+	 * are scanned.
+	 */
+	nodemask_t *nodemask;
+
+	/*
+	 * The memory cgroup that hit its limit and as a result is the
+	 * primary target of this reclaim invocation.
+	 */
+	struct mem_cgroup *target_mem_cgroup;
+
+	/* Writepage batching in laptop mode; RECLAIM_WRITE */
+	unsigned int may_writepage:1;
+
+	/* Can mapped pages be reclaimed? */
+	unsigned int may_unmap:1;
+
+	/* Can pages be swapped as part of reclaim? */
+	unsigned int may_swap:1;
+
+	/* e.g. boosted watermark reclaim leaves slabs alone */
+	unsigned int may_shrinkslab:1;
+
+	/*
+	 * Cgroups are not reclaimed below their configured memory.low,
+	 * unless we threaten to OOM. If any cgroups are skipped due to
+	 * memory.low and nothing was reclaimed, go back for memory.low.
+	 */
+	unsigned int memcg_low_reclaim:1;
+	unsigned int memcg_low_skipped:1;
+
+	unsigned int hibernation_mode:1;
+
+	/* One of the zones is ready for compaction */
+	unsigned int compaction_ready:1;
+
+	/* Allocation order */
+	s8 order;
+
+	/* Scan (total_size >> priority) pages at once */
+	s8 priority;
+
+	/* The highest zone to isolate pages for reclaim from */
+	s8 reclaim_idx;
+
+	/* This context's GFP mask */
+	gfp_t gfp_mask;
+
+	/* Incremented by the number of inactive pages that were scanned */
+	unsigned long nr_scanned;
+
+	/* Number of pages freed so far during a call to shrink_zones() */
+	unsigned long nr_reclaimed;
+
+	struct {
+		unsigned int dirty;
+		unsigned int unqueued_dirty;
+		unsigned int congested;
+		unsigned int writeback;
+		unsigned int immediate;
+		unsigned int file_taken;
+		unsigned int taken;
+	} nr;
+};
+
+enum page_references {
+	PAGEREF_RECLAIM,
+	PAGEREF_RECLAIM_CLEAN,
+	PAGEREF_KEEP,
+	PAGEREF_ACTIVATE,
+	PAGEREF_PROMOTE = PAGEREF_ACTIVATE,
+};
+
 extern int isolate_lru_page(struct page *page);
 extern void putback_lru_page(struct page *page);
+enum page_references page_check_references(struct page *page,
+					   struct scan_control *sc);
 
 /*
  * in mm/rmap.c:
diff --git a/mm/memory.c b/mm/memory.c
index 47fe250..01c1ead 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3680,6 +3680,27 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 		goto out;
 	}
 
+	/*
+	 * Promote the page when it gets NUMA fault twice.
+	 * Need lock the page before check its references.
+	 */
+	if (!node_state(page_nid, N_CPU_MEM)) {
+		if (!trylock_page(page)) {
+			put_page(page);
+			target_nid = NUMA_NO_NODE;
+			goto out;
+		}
+
+		if (page_check_references(page, NULL) != PAGEREF_PROMOTE) {
+			unlock_page(page);
+			put_page(page);
+			target_nid = NUMA_NO_NODE;
+			goto out;
+		}
+
+		unlock_page(page);
+	}
+
 	/* Migrate to the requested node */
 	migrated = migrate_misplaced_page(page, vma, target_nid);
 	if (migrated) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a5ad0b3..0504845 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -63,76 +63,6 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
-struct scan_control {
-	/* How many pages shrink_list() should reclaim */
-	unsigned long nr_to_reclaim;
-
-	/*
-	 * Nodemask of nodes allowed by the caller. If NULL, all nodes
-	 * are scanned.
-	 */
-	nodemask_t *nodemask;
-
-	/*
-	 * The memory cgroup that hit its limit and as a result is the
-	 * primary target of this reclaim invocation.
-	 */
-	struct mem_cgroup *target_mem_cgroup;
-
-	/* Writepage batching in laptop mode; RECLAIM_WRITE */
-	unsigned int may_writepage:1;
-
-	/* Can mapped pages be reclaimed? */
-	unsigned int may_unmap:1;
-
-	/* Can pages be swapped as part of reclaim? */
-	unsigned int may_swap:1;
-
-	/* e.g. boosted watermark reclaim leaves slabs alone */
-	unsigned int may_shrinkslab:1;
-
-	/*
-	 * Cgroups are not reclaimed below their configured memory.low,
-	 * unless we threaten to OOM. If any cgroups are skipped due to
-	 * memory.low and nothing was reclaimed, go back for memory.low.
-	 */
-	unsigned int memcg_low_reclaim:1;
-	unsigned int memcg_low_skipped:1;
-
-	unsigned int hibernation_mode:1;
-
-	/* One of the zones is ready for compaction */
-	unsigned int compaction_ready:1;
-
-	/* Allocation order */
-	s8 order;
-
-	/* Scan (total_size >> priority) pages at once */
-	s8 priority;
-
-	/* The highest zone to isolate pages for reclaim from */
-	s8 reclaim_idx;
-
-	/* This context's GFP mask */
-	gfp_t gfp_mask;
-
-	/* Incremented by the number of inactive pages that were scanned */
-	unsigned long nr_scanned;
-
-	/* Number of pages freed so far during a call to shrink_zones() */
-	unsigned long nr_reclaimed;
-
-	struct {
-		unsigned int dirty;
-		unsigned int unqueued_dirty;
-		unsigned int congested;
-		unsigned int writeback;
-		unsigned int immediate;
-		unsigned int file_taken;
-		unsigned int taken;
-	} nr;
-};
-
 #ifdef ARCH_HAS_PREFETCH
 #define prefetch_prev_lru_page(_page, _base, _field)			\
 	do {								\
@@ -1002,21 +932,32 @@ void putback_lru_page(struct page *page)
 	put_page(page);		/* drop ref from isolate */
 }
 
-enum page_references {
-	PAGEREF_RECLAIM,
-	PAGEREF_RECLAIM_CLEAN,
-	PAGEREF_KEEP,
-	PAGEREF_ACTIVATE,
-};
-
-static enum page_references page_check_references(struct page *page,
-						  struct scan_control *sc)
+/*
+ * Called by NUMA balancing to implement access twice check for
+ * promoting pages from cpuless nodes.
+ *
+ * The sc would be NULL in NUMA balancing path.
+ */
+enum page_references page_check_references(struct page *page,
+					   struct scan_control *sc)
 {
 	int referenced_ptes, referenced_page;
 	unsigned long vm_flags;
+	struct mem_cgroup *memcg = sc ? sc->target_mem_cgroup : NULL;
+
+	if (sc)
+		referenced_ptes = page_referenced(page, 1, memcg, &vm_flags);
+	else
+		/*
+		 * The page should always has at least one referenced pte
+		 * in NUMA balancing path since NUMA balancing set referenced
+		 * bit by default in PAGE_NONE.
+		 * So, it sounds unnecessary to walk rmap to get the number of
+		 * referenced ptes. This also help avoid potential ptl
+		 * deadlock for huge pmd.
+		 */
+		referenced_ptes = 1;
 
-	referenced_ptes = page_referenced(page, 1, sc->target_mem_cgroup,
-					  &vm_flags);
 	referenced_page = TestClearPageReferenced(page);
 
 	/*
@@ -1027,8 +968,19 @@ static enum page_references page_check_references(struct page *page,
 		return PAGEREF_RECLAIM;
 
 	if (referenced_ptes) {
-		if (PageSwapBacked(page))
+		if (PageSwapBacked(page)) {
+			if (!sc) {
+				if (referenced_page)
+					return PAGEREF_ACTIVATE;
+
+				SetPageReferenced(page);
+
+				return PAGEREF_KEEP;
+			}
+
 			return PAGEREF_ACTIVATE;
+		}
+
 		/*
 		 * All mapped pages start out with page table
 		 * references from the instantiating fault, so we need

From patchwork Thu Apr 11 03:56:54 2019
From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH 4/9] mm: migrate: make migrate_pages() return nr_succeeded
Date: Thu, 11 Apr 2019 11:56:54 +0800
Message-Id: <1554955019-29472-5-git-send-email-yang.shi@linux.alibaba.com>
In-Reply-To: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com>

The migrate_pages() returns the number of pages that were not migrated, or an error code.
When returning an error code, there is no way to know how many pages were migrated or not migrated.

In the following patch, migrate_pages() is used to demote pages to a PMEM node, and we need to account for how many pages are reclaimed (demoted), since page reclaim behavior depends on this. Add a *nr_succeeded parameter to make migrate_pages() return how many pages are demoted successfully in all cases.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 include/linux/migrate.h | 5 +++--
 mm/compaction.c | 3 ++-
 mm/gup.c | 4 +++-
 mm/memory-failure.c | 7 +++++--
 mm/memory_hotplug.c | 4 +++-
 mm/mempolicy.c | 7 +++++--
 mm/migrate.c | 18 ++++++++++--------
 mm/page_alloc.c | 4 +++-
 8 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e13d9bf..837fdd1 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -66,7 +66,8 @@ extern int migrate_page(struct address_space *mapping,
 		struct page *newpage, struct page *page,
 		enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
-		unsigned long private, enum migrate_mode mode, int reason);
+		unsigned long private, enum migrate_mode mode, int reason,
+		unsigned int *nr_succeeded);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
@@ -84,7 +85,7 @@ extern int migrate_page_move_mapping(struct address_space *mapping,
 static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t new,
 		free_page_t free, unsigned long private, enum migrate_mode mode,
-		int reason)
+		int reason, unsigned int *nr_succeeded)
 	{ return -ENOSYS; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
 	{ return -EBUSY; }
diff --git a/mm/compaction.c b/mm/compaction.c
index f171a83..c6a0ec4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2065,6 +2065,7 @@ bool compaction_zonelist_suitable(struct
alloc_context *ac, int order, unsigned long last_migrated_pfn; const bool sync = cc->mode != MIGRATE_ASYNC; bool update_cached; + unsigned int nr_succeeded = 0; cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask); ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags, @@ -2173,7 +2174,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order, err = migrate_pages(&cc->migratepages, compaction_alloc, compaction_free, (unsigned long)cc, cc->mode, - MR_COMPACTION); + MR_COMPACTION, &nr_succeeded); trace_mm_compaction_migratepages(cc->nr_migratepages, err, &cc->migratepages); diff --git a/mm/gup.c b/mm/gup.c index f84e226..b482b8c 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1217,6 +1217,7 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages, long i; bool drain_allow = true; bool migrate_allow = true; + unsigned int nr_succeeded = 0; LIST_HEAD(cma_page_list); check_again: @@ -1257,7 +1258,8 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages, put_page(pages[i]); if (migrate_pages(&cma_page_list, new_non_cma_page, - NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) { + NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE, + &nr_succeeded)) { /* * some of the pages failed migration. Do get_user_pages * without migration. 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index fc8b517..b5d8a8f 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1686,6 +1686,7 @@ static int soft_offline_huge_page(struct page *page, int flags) int ret; unsigned long pfn = page_to_pfn(page); struct page *hpage = compound_head(page); + unsigned int nr_succeeded = 0; LIST_HEAD(pagelist); /* @@ -1713,7 +1714,7 @@ static int soft_offline_huge_page(struct page *page, int flags) } ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, - MIGRATE_SYNC, MR_MEMORY_FAILURE); + MIGRATE_SYNC, MR_MEMORY_FAILURE, &nr_succeeded); if (ret) { pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n", pfn, ret, page->flags, &page->flags); @@ -1742,6 +1743,7 @@ static int __soft_offline_page(struct page *page, int flags) { int ret; unsigned long pfn = page_to_pfn(page); + unsigned int nr_succeeded = 0; /* * Check PageHWPoison again inside page lock because PageHWPoison @@ -1801,7 +1803,8 @@ static int __soft_offline_page(struct page *page, int flags) page_is_file_cache(page)); list_add(&page->lru, &pagelist); ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, - MIGRATE_SYNC, MR_MEMORY_FAILURE); + MIGRATE_SYNC, MR_MEMORY_FAILURE, + &nr_succeeded); if (ret) { if (!list_empty(&pagelist)) putback_movable_pages(&pagelist); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 1140f3b..29414a4 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1375,6 +1375,7 @@ static struct page *new_node_page(struct page *page, unsigned long private) unsigned long pfn; struct page *page; int ret = 0; + unsigned int nr_succeeded = 0; LIST_HEAD(source); for (pfn = start_pfn; pfn < end_pfn; pfn++) { @@ -1435,7 +1436,8 @@ static struct page *new_node_page(struct page *page, unsigned long private) if (!list_empty(&source)) { /* Allocate a new page from the nearest neighbor node */ ret = migrate_pages(&source, new_node_page, NULL, 0, - MIGRATE_SYNC, MR_MEMORY_HOTPLUG); + 
MIGRATE_SYNC, MR_MEMORY_HOTPLUG, + &nr_succeeded); if (ret) { list_for_each_entry(page, &source, lru) { pr_warn("migrating pfn %lx failed ret:%d ", diff --git a/mm/mempolicy.c b/mm/mempolicy.c index af171cc..96d6e2e 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -962,6 +962,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, nodemask_t nmask; LIST_HEAD(pagelist); int err = 0; + unsigned int nr_succeeded = 0; nodes_clear(nmask); node_set(source, nmask); @@ -977,7 +978,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, if (!list_empty(&pagelist)) { err = migrate_pages(&pagelist, alloc_new_node_page, NULL, dest, - MIGRATE_SYNC, MR_SYSCALL); + MIGRATE_SYNC, MR_SYSCALL, &nr_succeeded); if (err) putback_movable_pages(&pagelist); } @@ -1156,6 +1157,7 @@ static long do_mbind(unsigned long start, unsigned long len, struct mempolicy *new; unsigned long end; int err; + unsigned int nr_succeeded = 0; LIST_HEAD(pagelist); if (flags & ~(unsigned long)MPOL_MF_VALID) @@ -1228,7 +1230,8 @@ static long do_mbind(unsigned long start, unsigned long len, if (!list_empty(&pagelist)) { WARN_ON_ONCE(flags & MPOL_MF_LAZY); nr_failed = migrate_pages(&pagelist, new_page, NULL, - start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND); + start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND, + &nr_succeeded); if (nr_failed) putback_movable_pages(&pagelist); } diff --git a/mm/migrate.c b/mm/migrate.c index ac6f493..84bba47 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1387,6 +1387,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, * @mode: The migration mode that specifies the constraints for * page migration, if any. * @reason: The reason for page migration. + * @nr_succeeded: The number of pages migrated successfully. * * The function returns after 10 attempts or if no pages are movable any more * because the list has become empty or no retryable pages exist any more. 
@@ -1397,11 +1398,10 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, */ int migrate_pages(struct list_head *from, new_page_t get_new_page, free_page_t put_new_page, unsigned long private, - enum migrate_mode mode, int reason) + enum migrate_mode mode, int reason, unsigned int *nr_succeeded) { int retry = 1; int nr_failed = 0; - int nr_succeeded = 0; int pass = 0; struct page *page; struct page *page2; @@ -1455,7 +1455,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, retry++; break; case MIGRATEPAGE_SUCCESS: - nr_succeeded++; + (*nr_succeeded)++; break; default: /* @@ -1472,11 +1472,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, nr_failed += retry; rc = nr_failed; out: - if (nr_succeeded) - count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded); + if (*nr_succeeded) + count_vm_events(PGMIGRATE_SUCCESS, *nr_succeeded); if (nr_failed) count_vm_events(PGMIGRATE_FAIL, nr_failed); - trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason); + trace_mm_migrate_pages(*nr_succeeded, nr_failed, mode, reason); if (!swapwrite) current->flags &= ~PF_SWAPWRITE; @@ -1501,12 +1501,13 @@ static int do_move_pages_to_node(struct mm_struct *mm, struct list_head *pagelist, int node) { int err; + unsigned int nr_succeeded = 0; if (list_empty(pagelist)) return 0; err = migrate_pages(pagelist, alloc_new_node_page, NULL, node, - MIGRATE_SYNC, MR_SYSCALL); + MIGRATE_SYNC, MR_SYSCALL, &nr_succeeded); if (err) putback_movable_pages(pagelist); return err; @@ -1939,6 +1940,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, pg_data_t *pgdat = NODE_DATA(node); int isolated; int nr_remaining; + unsigned int nr_succeeded = 0; LIST_HEAD(migratepages); /* @@ -1963,7 +1965,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, list_add(&page->lru, &migratepages); nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_page, NULL, node, MIGRATE_ASYNC, - MR_NUMA_MISPLACED); + 
MR_NUMA_MISPLACED, &nr_succeeded); if (nr_remaining) { if (!list_empty(&migratepages)) { list_del(&page->lru); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bda17c2..e53cc96 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8139,6 +8139,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, unsigned long pfn = start; unsigned int tries = 0; int ret = 0; + unsigned int nr_succeeded = 0; migrate_prep(); @@ -8166,7 +8167,8 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, cc->nr_migratepages -= nr_reclaimed; ret = migrate_pages(&cc->migratepages, alloc_migrate_target, - NULL, 0, cc->mode, MR_CONTIG_RANGE); + NULL, 0, cc->mode, MR_CONTIG_RANGE, + &nr_succeeded); } if (ret < 0) { putback_movable_pages(&cc->migratepages); From patchwork Thu Apr 11 03:56:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 10895083 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 272351669 for ; Thu, 11 Apr 2019 03:58:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 115DF2018E for ; Thu, 11 Apr 2019 03:58:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 056F81FF73; Thu, 11 Apr 2019 03:58:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0B2BC2018E for ; Thu, 11 Apr 2019 03:58:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D7C7D6B0010; Wed, 10 Apr 2019 23:58:14 -0400 
From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH 5/9] mm: vmscan: demote anon DRAM pages to PMEM node
Date: Thu, 11 Apr 2019 11:56:55 +0800
Message-Id: <1554955019-29472-6-git-send-email-yang.shi@linux.alibaba.com>
In-Reply-To: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com>

PMEM provides larger capacity than DRAM and much lower access latency than disk, so it is a
good choice to use as a middle tier between DRAM and disk in the page reclaim path.

With PMEM nodes, the demotion path of anonymous pages could be:

    DRAM -> PMEM -> swap device

This patch demotes anonymous pages only for the time being, and demotes THP to PMEM as a whole. To avoid expensive page reclaim and/or compaction on the PMEM node if there is memory pressure on it, the most conservative gfp flag is used, which fails quickly if there is memory pressure and just wakes up kswapd on failure. migrate_pages() would split the THP and migrate the base pages one by one upon THP allocation failure.

Demote pages to the closest non-DRAM node even when the system is swapless. The current page reclaim logic only scans the anon LRU when swap is on and swappiness is set properly. Demoting to PMEM doesn't need to care whether swap is available or not, but reclaiming from PMEM still skips the anon LRU if swap is not available.

Demotion only happens from a DRAM node to its closest PMEM node. Demoting to a remote PMEM node, or migrating from PMEM back to DRAM on reclaim, is not allowed for now.

Also, define a new migration reason for demotion, called MR_DEMOTE. Demote pages via async migration to avoid blocking.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 include/linux/gfp.h | 12 ++++
 include/linux/migrate.h | 1 +
 include/trace/events/migrate.h | 3 +-
 mm/debug.c | 1 +
 mm/internal.h | 13 +++++
 mm/migrate.c | 15 ++++-
 mm/vmscan.c | 127 +++++++++++++++++++++++++++++++++++------
 7 files changed, 149 insertions(+), 23 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de..57ced51 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -285,6 +285,14 @@
  * available and will not wake kswapd/kcompactd on failure. The _LIGHT
  * version does not attempt reclaim/compaction at all and is by default used
  * in page fault path, while the non-light is used by khugepaged.
+ *
+ * %GFP_DEMOTE is for migration on memory reclaim (a.k.a demotion) allocations.
+ * The allocation might happen in kswapd or direct reclaim, so assuming + * __GFP_IO and __GFP_FS are not allowed looks safer. Demotion happens for + * user pages (on LRU) only and on specific node. Generally it will fail + * quickly if memory is not available, but may wake up kswapd on failure. + * + * %GFP_TRANSHUGE_DEMOTE is used for THP demotion allocation. */ #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) @@ -300,6 +308,10 @@ #define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \ __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM) #define GFP_TRANSHUGE (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) +#define GFP_DEMOTE (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_NORETRY | \ + __GFP_NOMEMALLOC | __GFP_NOWARN | __GFP_THISNODE | \ + GFP_NOWAIT) +#define GFP_TRANSHUGE_DEMOTE (GFP_DEMOTE | __GFP_COMP) /* Convert GFP flags to their corresponding migrate type */ #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 837fdd1..cfb1f57 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -25,6 +25,7 @@ enum migrate_reason { MR_MEMPOLICY_MBIND, MR_NUMA_MISPLACED, MR_CONTIG_RANGE, + MR_DEMOTE, MR_TYPES }; diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h index 705b33d..c1d5b36 100644 --- a/include/trace/events/migrate.h +++ b/include/trace/events/migrate.h @@ -20,7 +20,8 @@ EM( MR_SYSCALL, "syscall_or_cpuset") \ EM( MR_MEMPOLICY_MBIND, "mempolicy_mbind") \ EM( MR_NUMA_MISPLACED, "numa_misplaced") \ - EMe(MR_CONTIG_RANGE, "contig_range") + EM( MR_CONTIG_RANGE, "contig_range") \ + EMe(MR_DEMOTE, "demote") /* * First define the enums in the above macros to be exported to userspace diff --git a/mm/debug.c b/mm/debug.c index c0b31b6..cc0d7df 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -25,6 +25,7 @@ "mempolicy_mbind", "numa_misplaced", "cma", + "demote", }; const struct 
trace_print_flags pageflag_names[] = { diff --git a/mm/internal.h b/mm/internal.h index bee4d6c..8c424b5 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -383,6 +383,19 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask, } #endif +static inline bool has_cpuless_node_online(void) +{ + nodemask_t nmask; + + nodes_andnot(nmask, node_states[N_MEMORY], + node_states[N_CPU_MEM]); + + if (nodes_empty(nmask)) + return false; + + return true; +} + /* mm/util.c */ void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma, struct vm_area_struct *prev, struct rb_node *rb_parent); diff --git a/mm/migrate.c b/mm/migrate.c index 84bba47..c97a739 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1001,7 +1001,8 @@ static int move_to_new_page(struct page *newpage, struct page *page, } static int __unmap_and_move(struct page *page, struct page *newpage, - int force, enum migrate_mode mode) + int force, enum migrate_mode mode, + enum migrate_reason reason) { int rc = -EAGAIN; int page_was_mapped = 0; @@ -1138,8 +1139,16 @@ static int __unmap_and_move(struct page *page, struct page *newpage, if (rc == MIGRATEPAGE_SUCCESS) { if (unlikely(!is_lru)) put_page(newpage); - else + else { + /* + * Put demoted pages on the target node's + * active LRU. 
+ */ + if (!PageUnevictable(newpage) && + reason == MR_DEMOTE) + SetPageActive(newpage); putback_lru_page(newpage); + } } return rc; @@ -1193,7 +1202,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, goto out; } - rc = __unmap_and_move(page, newpage, force, mode); + rc = __unmap_and_move(page, newpage, force, mode, reason); if (rc == MIGRATEPAGE_SUCCESS) set_page_owner_migrate_reason(newpage, reason); diff --git a/mm/vmscan.c b/mm/vmscan.c index 0504845..2a96609 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1046,6 +1046,45 @@ static void page_check_dirty_writeback(struct page *page, mapping->a_ops->is_dirty_writeback(page, dirty, writeback); } +static inline bool is_demote_ok(int nid) +{ + /* Current node is cpuless node */ + if (!node_state(nid, N_CPU_MEM)) + return false; + + /* No online PMEM node */ + if (!has_cpuless_node_online()) + return false; + + return true; +} + +#ifdef CONFIG_NUMA +static struct page *alloc_demote_page(struct page *page, unsigned long node) +{ + if (unlikely(PageHuge(page))) + /* HugeTLB demotion is not supported for now */ + BUG(); + else if (PageTransHuge(page)) { + struct page *thp; + + thp = alloc_pages_node(node, GFP_TRANSHUGE_DEMOTE, + HPAGE_PMD_ORDER); + if (!thp) + return NULL; + prep_transhuge_page(thp); + return thp; + } else + return __alloc_pages_node(node, GFP_DEMOTE, 0); +} +#else +static inline struct page *alloc_demote_page(struct page *page, + unsigned long node) +{ + return NULL; +} +#endif + /* * shrink_page_list() returns the number of reclaimed pages */ @@ -1058,6 +1097,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, { LIST_HEAD(ret_pages); LIST_HEAD(free_pages); + LIST_HEAD(demote_pages); unsigned nr_reclaimed = 0; memset(stat, 0, sizeof(*stat)); @@ -1220,6 +1260,18 @@ static unsigned long shrink_page_list(struct list_head *page_list, */ if (PageAnon(page) && PageSwapBacked(page)) { if (!PageSwapCache(page)) { + /* + * Demote anonymous pages only for now and + * skip 
MADV_FREE pages. + * + * Demotion only happens from primary nodes + * to cpuless nodes. + */ + if (is_demote_ok(page_to_nid(page))) { + list_add(&page->lru, &demote_pages); + unlock_page(page); + continue; + } if (!(sc->gfp_mask & __GFP_IO)) goto keep_locked; if (PageTransHuge(page)) { @@ -1429,6 +1481,29 @@ static unsigned long shrink_page_list(struct list_head *page_list, VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page); } + /* Demote pages to PMEM */ + if (!list_empty(&demote_pages)) { + int err, target_nid; + unsigned int nr_succeeded = 0; + nodemask_t used_mask; + + nodes_clear(used_mask); + target_nid = find_next_best_node(pgdat->node_id, &used_mask, + true); + + err = migrate_pages(&demote_pages, alloc_demote_page, NULL, + target_nid, MIGRATE_ASYNC, MR_DEMOTE, + &nr_succeeded); + + nr_reclaimed += nr_succeeded; + + if (err) { + putback_movable_pages(&demote_pages); + + list_splice(&ret_pages, &demote_pages); + } + } + mem_cgroup_uncharge_list(&free_pages); try_to_unmap_flush(); free_unref_page_list(&free_pages); @@ -2140,10 +2215,11 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file, unsigned long gb; /* - * If we don't have swap space, anonymous page deactivation - * is pointless. + * If we don't have swap space or PMEM online, anonymous page + * deactivation is pointless. */ - if (!file && !total_swap_pages) + if (!file && !total_swap_pages && + !is_demote_ok(pgdat->node_id)) return false; inactive = lruvec_lru_size(lruvec, inactive_lru, sc->reclaim_idx); @@ -2223,22 +2299,34 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, unsigned long ap, fp; enum lru_list lru; - /* If we have no swap space, do not bother scanning anon pages.
*/ - if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) { - scan_balance = SCAN_FILE; - goto out; - } - /* - * Global reclaim will swap to prevent OOM even with no - * swappiness, but memcg users want to use this knob to - * disable swapping for individual groups completely when - * using the memory controller's swap limit feature would be - * too expensive. + * Anon pages can be demoted to PMEM. If there is PMEM node online, + * still scan anonymous LRU even though the system is swapless or + * swapping is disabled by memcg. + * + * If current node is already PMEM node, demotion is not applicable. */ - if (!global_reclaim(sc) && !swappiness) { - scan_balance = SCAN_FILE; - goto out; + if (!is_demote_ok(pgdat->node_id)) { + /* + * If we have no swap space, do not bother scanning + * anon pages.
+ */ + if (!global_reclaim(sc) && !swappiness) { + scan_balance = SCAN_FILE; + goto out; + } } /* @@ -2587,7 +2675,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat, */ pages_for_compaction = compact_gap(sc->order); inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE); - if (get_nr_swap_pages() > 0) + if (get_nr_swap_pages() > 0 || is_demote_ok(pgdat->node_id)) inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON); if (sc->nr_reclaimed < pages_for_compaction && inactive_lru_pages > pages_for_compaction) @@ -3284,7 +3372,8 @@ static void age_active_anon(struct pglist_data *pgdat, { struct mem_cgroup *memcg; - if (!total_swap_pages) + /* Aging anon page as long as demotion is fine */ + if (!total_swap_pages && !is_demote_ok(pgdat->node_id)) return; memcg = mem_cgroup_iter(NULL, NULL, NULL);
From patchwork Thu Apr 11 03:56:56 2019 X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 10895079
From: Yang Shi To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [v2 PATCH 6/9] mm: vmscan: don't demote for memcg reclaim Date: Thu, 11 Apr 2019 11:56:56 +0800 Message-Id: <1554955019-29472-7-git-send-email-yang.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com> References: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com> Memcg reclaim happens when the limit is breached, but demotion just migrates pages to another node
instead of reclaiming them. This is pointless for memcg reclaim, since the memcg's usage is not reduced at all. Signed-off-by: Yang Shi --- mm/vmscan.c | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2a96609..80cd624 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1046,8 +1046,12 @@ static void page_check_dirty_writeback(struct page *page, mapping->a_ops->is_dirty_writeback(page, dirty, writeback); } -static inline bool is_demote_ok(int nid) +static inline bool is_demote_ok(int nid, struct scan_control *sc) { + /* It is pointless to do demotion in memcg reclaim */ + if (!global_reclaim(sc)) + return false; + /* Current node is cpuless node */ if (!node_state(nid, N_CPU_MEM)) return false; @@ -1267,7 +1271,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, * Demotion only happens from primary nodes * to cpuless nodes. */ - if (is_demote_ok(page_to_nid(page))) { + if (is_demote_ok(page_to_nid(page), sc)) { list_add(&page->lru, &demote_pages); unlock_page(page); continue; @@ -2219,7 +2223,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file, * deactivation is pointless. */ if (!file && !total_swap_pages && - !is_demote_ok(pgdat->node_id)) + !is_demote_ok(pgdat->node_id, sc)) return false; inactive = lruvec_lru_size(lruvec, inactive_lru, sc->reclaim_idx); @@ -2306,7 +2310,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, * * If current node is already PMEM node, demotion is not applicable. */ - if (!is_demote_ok(pgdat->node_id)) { + if (!is_demote_ok(pgdat->node_id, sc)) { /* * If we have no swap space, do not bother scanning * anon pages.
@@ -2315,18 +2319,18 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, scan_balance = SCAN_FILE; goto out; } + } - /* - * Global reclaim will swap to prevent OOM even with no - * swappiness, but memcg users want to use this knob to - * disable swapping for individual groups completely when - * using the memory controller's swap limit feature would be - * too expensive. - */ - if (!global_reclaim(sc) && !swappiness) { - scan_balance = SCAN_FILE; - goto out; - } + /* + * Global reclaim will swap to prevent OOM even with no + * swappiness, but memcg users want to use this knob to + * disable swapping for individual groups completely when + * using the memory controller's swap limit feature would be + * too expensive. + */ + if (!global_reclaim(sc) && !swappiness) { + scan_balance = SCAN_FILE; + goto out; } /* @@ -2675,7 +2679,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat, */ pages_for_compaction = compact_gap(sc->order); inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE); - if (get_nr_swap_pages() > 0 || is_demote_ok(pgdat->node_id)) + if (get_nr_swap_pages() > 0 || is_demote_ok(pgdat->node_id, sc)) inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON); if (sc->nr_reclaimed < pages_for_compaction && inactive_lru_pages > pages_for_compaction) @@ -3373,7 +3377,7 @@ static void age_active_anon(struct pglist_data *pgdat, struct mem_cgroup *memcg; /* Aging anon page as long as demotion is fine */ - if (!total_swap_pages && !is_demote_ok(pgdat->node_id)) + if (!total_swap_pages && !is_demote_ok(pgdat->node_id, sc)) return; memcg = mem_cgroup_iter(NULL, NULL, NULL);
From patchwork Thu Apr 11 03:56:57 2019 X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 10895085
From: Yang Shi To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [v2 PATCH 7/9] mm: vmscan: check if the demote target node is contended or not Date: Thu, 11 Apr 2019 11:56:57 +0800 Message-Id: <1554955019-29472-8-git-send-email-yang.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To:
<1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com> References: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com> When demoting to a PMEM node, the target node may itself be under memory pressure, and that pressure may cause migrate_pages() to fail. If the failure is caused by memory pressure (i.e. it returns -ENOMEM), tag the node with PGDAT_CONTENDED. The tag will be cleared once the target node is balanced again. Check whether the target node is tagged PGDAT_CONTENDED; if it is, just skip demotion. Signed-off-by: Yang Shi --- include/linux/mmzone.h | 3 +++ mm/vmscan.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fba7741..de534db 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -520,6 +520,9 @@ enum pgdat_flags { * many pages under writeback */ PGDAT_RECLAIM_LOCKED, /* prevents concurrent reclaim */ + PGDAT_CONTENDED, /* the node does not have enough free + * memory available + */ }; enum zone_flags { diff --git a/mm/vmscan.c b/mm/vmscan.c index 80cd624..50cde53 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1048,6 +1048,9 @@ static void page_check_dirty_writeback(struct page *page, static inline bool is_demote_ok(int nid, struct scan_control *sc) { + int node; + nodemask_t used_mask; + /* It is pointless to do demotion in memcg reclaim */ if (!global_reclaim(sc)) return false; @@ -1060,6 +1063,13 @@ static inline bool is_demote_ok(int nid, struct scan_control *sc) if (!has_cpuless_node_online()) return false; + /* Check if the demote target node is contended or not */ + nodes_clear(used_mask); + node = find_next_best_node(nid, &used_mask, true); + + if (test_bit(PGDAT_CONTENDED, &NODE_DATA(node)->flags)) + return false; + return true; } @@ -1502,6
+1512,10 @@ static unsigned long shrink_page_list(struct list_head *page_list, nr_reclaimed += nr_succeeded; if (err) { + if (err == -ENOMEM) + set_bit(PGDAT_CONTENDED, + &NODE_DATA(target_nid)->flags); + putback_movable_pages(&demote_pages); list_splice(&ret_pages, &demote_pages); @@ -2596,6 +2610,19 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc * scan target and the percentage scanning already complete */ lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE; + + /* + * shrink_page_list() may find that the demote target node is + * contended; if so it doesn't make sense to scan the anonymous + * LRU again. + * + * We also need to check whether swap is available, since + * demotion may happen on a swapless system. + */ + if (!is_demote_ok(pgdat->node_id, sc) && + (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0)) + lru = LRU_FILE; + nr_scanned = targets[lru] - nr[lru]; nr[lru] = targets[lru] * (100 - percentage) / 100; nr[lru] -= min(nr[lru], nr_scanned); @@ -3458,6 +3485,7 @@ static void clear_pgdat_congested(pg_data_t *pgdat) clear_bit(PGDAT_CONGESTED, &pgdat->flags); clear_bit(PGDAT_DIRTY, &pgdat->flags); clear_bit(PGDAT_WRITEBACK, &pgdat->flags); + clear_bit(PGDAT_CONTENDED, &pgdat->flags); } /*
From patchwork Thu Apr 11 03:56:58 2019 X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 10895077
From: Yang Shi To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [v2 PATCH 8/9] mm: vmscan: add page demotion counter Date: Thu, 11 Apr 2019 11:56:58 +0800 Message-Id: <1554955019-29472-9-git-send-email-yang.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com> References: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com> Account the number of demoted pages into reclaim_state->nr_demoted.
Add pgdemote_kswapd and pgdemote_direct VM counters shown in /proc/vmstat. Signed-off-by: Yang Shi --- include/linux/vm_event_item.h | 2 ++ include/linux/vmstat.h | 1 + mm/internal.h | 1 + mm/vmscan.c | 7 +++++++ mm/vmstat.c | 2 ++ 5 files changed, 13 insertions(+) diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 47a3441..499a3aa 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -32,6 +32,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, PGREFILL, PGSTEAL_KSWAPD, PGSTEAL_DIRECT, + PGDEMOTE_KSWAPD, + PGDEMOTE_DIRECT, PGSCAN_KSWAPD, PGSCAN_DIRECT, PGSCAN_DIRECT_THROTTLE, diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 2db8d60..eb5d21c 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -29,6 +29,7 @@ struct reclaim_stat { unsigned nr_activate; unsigned nr_ref_keep; unsigned nr_unmap_fail; + unsigned nr_demoted; }; #ifdef CONFIG_VM_EVENT_COUNTERS diff --git a/mm/internal.h b/mm/internal.h index 8c424b5..8ba4853 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -156,6 +156,7 @@ struct scan_control { unsigned int immediate; unsigned int file_taken; unsigned int taken; + unsigned int demoted; } nr; }; diff --git a/mm/vmscan.c b/mm/vmscan.c index 50cde53..a52c8248 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1511,6 +1511,12 @@ static unsigned long shrink_page_list(struct list_head *page_list, nr_reclaimed += nr_succeeded; + stat->nr_demoted = nr_succeeded; + if (current_is_kswapd()) + __count_vm_events(PGDEMOTE_KSWAPD, stat->nr_demoted); + else + __count_vm_events(PGDEMOTE_DIRECT, stat->nr_demoted); + if (err) { if (err == -ENOMEM) set_bit(PGDAT_CONTENDED, @@ -2019,6 +2025,7 @@ static int current_may_throttle(void) sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; sc->nr.writeback += stat.nr_writeback; sc->nr.immediate += stat.nr_immediate; + sc->nr.demoted += stat.nr_demoted; sc->nr.taken += nr_taken; if (file) sc->nr.file_taken += nr_taken; diff --git
a/mm/vmstat.c b/mm/vmstat.c index 1a431dc..d1e4993 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1192,6 +1192,8 @@ int fragmentation_index(struct zone *zone, unsigned int order) "pgrefill", "pgsteal_kswapd", "pgsteal_direct", + "pgdemote_kswapd", + "pgdemote_direct", "pgscan_kswapd", "pgscan_direct", "pgscan_direct_throttle", From patchwork Thu Apr 11 03:56:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 10895081 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3F12E17E0 for ; Thu, 11 Apr 2019 03:58:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2A1891FF73 for ; Thu, 11 Apr 2019 03:58:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1DA4927F85; Thu, 11 Apr 2019 03:58:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B17D61FF73 for ; Thu, 11 Apr 2019 03:58:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CBB996B000E; Wed, 10 Apr 2019 23:58:10 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C6A686B0010; Wed, 10 Apr 2019 23:58:10 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B0C666B0266; Wed, 10 Apr 2019 23:58:10 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from 
From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH 9/9] mm: numa: add page promotion counter
Date: Thu, 11 Apr 2019 11:56:59 +0800
Message-Id: <1554955019-29472-10-git-send-email-yang.shi@linux.alibaba.com>
In-Reply-To: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com>

Add a counter for page promotion for NUMA balancing.
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 include/linux/vm_event_item.h | 1 +
 mm/huge_memory.c              | 4 ++++
 mm/memory.c                   | 4 ++++
 mm/vmstat.c                   | 1 +
 4 files changed, 10 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 499a3aa..9f52a62 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -51,6 +51,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 	NUMA_HINT_FAULTS,
 	NUMA_HINT_FAULTS_LOCAL,
 	NUMA_PAGE_MIGRATE,
+	NUMA_PAGE_PROMOTE,
 #endif
 #ifdef CONFIG_MIGRATION
 	PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0b18ac45..ca9d688 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1609,6 +1609,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	migrated = migrate_misplaced_transhuge_page(vma->vm_mm, vma,
 				vmf->pmd, pmd, vmf->address, page, target_nid);
 	if (migrated) {
+		if (!node_state(page_nid, N_CPU_MEM) &&
+		    node_state(target_nid, N_CPU_MEM))
+			count_vm_numa_events(NUMA_PAGE_PROMOTE, HPAGE_PMD_NR);
+
 		flags |= TNF_MIGRATED;
 		page_nid = target_nid;
 	} else
diff --git a/mm/memory.c b/mm/memory.c
index 01c1ead..7b1218b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3704,6 +3704,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	/* Migrate to the requested node */
 	migrated = migrate_misplaced_page(page, vma, target_nid);
 	if (migrated) {
+		if (!node_state(page_nid, N_CPU_MEM) &&
+		    node_state(target_nid, N_CPU_MEM))
+			count_vm_numa_event(NUMA_PAGE_PROMOTE);
+
 		page_nid = target_nid;
 		flags |= TNF_MIGRATED;
 	} else
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d1e4993..fd194e3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1220,6 +1220,7 @@ int fragmentation_index(struct zone *zone, unsigned int order)
 	"numa_hint_faults",
 	"numa_hint_faults_local",
 	"numa_pages_migrated",
+	"numa_pages_promoted",
 #endif
 #ifdef CONFIG_MIGRATION
 	"pgmigrate_success",
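With both patches applied, the new events appear as plain "name value" lines in /proc/vmstat, alongside the existing reclaim and NUMA-balancing counters. A minimal sketch of pulling them out follows; note the counter values here are fabricated sample data, since on a patched kernel you would feed /proc/vmstat itself through the same awk:

```shell
# Sum the demotion events and read the promotion event from
# vmstat-style "name value" lines. The numbers are made-up sample
# data; on a patched kernel, run the awk program on /proc/vmstat.
printf '%s\n' \
    'pgsteal_kswapd 104200' \
    'pgdemote_kswapd 88210' \
    'pgdemote_direct 1024' \
    'numa_pages_migrated 50000' \
    'numa_pages_promoted 20480' |
awk '/^pgdemote_/            { demoted += $2 }
     /^numa_pages_promoted / { promoted = $2 }
     END { printf "demoted=%d promoted=%d\n", demoted, promoted }'
```

This prints `demoted=89234 promoted=20480` for the sample input: the two pgdemote_* events are summed to give total pages demoted, regardless of whether kswapd or direct reclaim did the work.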