From patchwork Wed Sep 19 03:17:46 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pingfan Liu
X-Patchwork-Id: 10605209
From: Pingfan Liu
To: linux-mm@kvack.org
Cc: Pingfan Liu, Andrew Morton, KAMEZAWA Hiroyuki, Mel Gorman,
 Greg Kroah-Hartman, Pavel Tatashin, Michal Hocko, Bharata B Rao,
 Dan Williams, "H. Peter Anvin", "Kirill A. Shutemov"
Subject: [PATCH 3/3] drivers/base/node: create a partial offline hint under
 each node
Date: Wed, 19 Sep 2018 11:17:46 +0800
Message-Id: <1537327066-27852-4-git-send-email-kernelfans@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1537327066-27852-1-git-send-email-kernelfans@gmail.com>
References: <1537327066-27852-1-git-send-email-kernelfans@gmail.com>

When offlining memory, there are two cases. In the first, all memory
blocks under a node are offlined. In the second, only part of the
memory under a node is offlined and replaced.

In the second case there is no need to allocate new pages from other
nodes. Doing so may incur extra NUMA faults to resolve the resulting
misplacement, and it places unnecessary memory pressure on the other
nodes.

This patch introduces an interface, /sys/../node/nodeX/partial_offline,
that lets the user choose where new pages are allocated during
migration, i.e. from the local node or from other nodes. Writing "1"
keeps the local node in the allowed nodemask; writing "0" keeps the
existing behaviour of excluding the local node unless it is the only
node with memory.

Signed-off-by: Pingfan Liu
Cc: Andrew Morton
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Greg Kroah-Hartman
Cc: Pavel Tatashin
Cc: Michal Hocko
Cc: Bharata B Rao
Cc: Dan Williams
Cc: "H. Peter Anvin"
Cc: Kirill A. Shutemov
---
 drivers/base/node.c    | 33 +++++++++++++++++++++++++++++++++
 include/linux/mmzone.h |  1 +
 mm/memory_hotplug.c    | 31 +++++++++++++++++++------------
 3 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 1ac4c36..64b0cb8 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -25,6 +25,36 @@ static struct bus_type node_subsys = {
 	.dev_name = "node",
 };
 
+static ssize_t read_partial_offline(struct device *dev,
+	struct device_attribute *attr, char *buf)
+{
+	int nid = dev->id;
+	struct pglist_data *pgdat = NODE_DATA(nid);
+	ssize_t len = 0;
+
+	if (pgdat->partial_offline)
+		len = sprintf(buf, "1\n");
+	else
+		len = sprintf(buf, "0\n");
+
+	return len;
+}
+
+static ssize_t write_partial_offline(struct device *dev,
+	struct device_attribute *attr, const char *buf, size_t count)
+{
+	int nid = dev->id;
+	struct pglist_data *pgdat = NODE_DATA(nid);
+
+	if (sysfs_streq(buf, "1"))
+		pgdat->partial_offline = true;
+	else if (sysfs_streq(buf, "0"))
+		pgdat->partial_offline = false;
+	else
+		return -EINVAL;
+
+	return strlen(buf);
+}
 
 static ssize_t node_read_cpumap(struct device *dev, bool list, char *buf)
 {
@@ -56,6 +86,8 @@ static inline ssize_t node_read_cpulist(struct device *dev,
 	return node_read_cpumap(dev, true, buf);
 }
 
+static DEVICE_ATTR(partial_offline, 0600, read_partial_offline,
+	write_partial_offline);
 static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
 static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
 
@@ -235,6 +267,7 @@ static struct attribute *node_dev_attrs[] = {
 	&dev_attr_numastat.attr,
 	&dev_attr_distance.attr,
 	&dev_attr_vmstat.attr,
+	&dev_attr_partial_offline.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(node_dev);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1e22d96..80c44c8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -722,6 +722,7 @@ typedef struct pglist_data {
 	/* Per-node vmstats */
 	struct per_cpu_nodestat __percpu *per_cpu_nodestats;
 	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
+	bool partial_offline;
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 228de4d..3c66075 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1346,18 +1346,10 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
 
 static struct page *new_node_page(struct page *page, unsigned long private)
 {
-	int nid = page_to_nid(page);
-	nodemask_t nmask = node_states[N_MEMORY];
-
-	/*
-	 * try to allocate from a different node but reuse this node if there
-	 * are no other online nodes to be used (e.g. we are offlining a part
-	 * of the only existing node)
-	 */
-	node_clear(nid, nmask);
-	if (nodes_empty(nmask))
-		node_set(nid, nmask);
+	nodemask_t nmask = *(nodemask_t *)private;
+	int nid;
 
+	nid = page_to_nid(page);
 	return new_page_nodemask(page, nid, &nmask);
 }
 
@@ -1371,6 +1363,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 	int not_managed = 0;
 	int ret = 0;
 	LIST_HEAD(source);
+	int nid;
+	nodemask_t nmask = node_states[N_MEMORY];
 
 	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
 		if (!pfn_valid(pfn))
@@ -1430,8 +1424,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			goto out;
 		}
 
+		page = list_entry(source.next, struct page, lru);
+		nid = page_to_nid(page);
+		if (!NODE_DATA(nid)->partial_offline) {
+			/*
+			 * try to allocate from a different node but reuse this
+			 * node if there are no other online nodes to be used
+			 * (e.g. we are offlining a part of the only existing
+			 * node)
+			 */
+			node_clear(nid, nmask);
+			if (nodes_empty(nmask))
+				node_set(nid, nmask);
+		}
 		/* Allocate a new page from the nearest neighbor node */
-		ret = migrate_pages(&source, new_node_page, NULL, 0,
+		ret = migrate_pages(&source, new_node_page, NULL, &nmask,
 					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
 		if (ret)
 			putback_movable_pages(&source);
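
Note for reviewers: the target-node selection that this patch moves from
new_node_page() into do_migrate_range() can be modelled in userspace.
The sketch below is not kernel code and is not part of the patch.
pick_target_nodes() is a hypothetical helper, and a plain unsigned long
with one bit per node is assumed as a stand-in for nodemask_t and
node_states[N_MEMORY].

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only: bit N set means "node N has memory".
 * pick_target_nodes() is a hypothetical name, not a kernel function.
 */
static unsigned long pick_target_nodes(unsigned long memory_nodes, int nid,
				       bool partial_offline)
{
	unsigned long nmask = memory_nodes;

	if (!partial_offline) {
		/* Prefer other nodes, mirroring node_clear(nid, nmask). */
		nmask &= ~(1UL << nid);
		/* No other node has memory: fall back to the source node. */
		if (nmask == 0)
			nmask |= 1UL << nid;
	}

	/* With the hint set, the source node stays an allowed target. */
	return nmask;
}
```

With nodes 0 and 1 populated (mask 0x3) and pages migrating off node 0,
the model yields 0x2 when the hint is clear and 0x3 when it is set. With
node 0 as the only populated node it yields 0x1 in both cases.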