From patchwork Thu Nov 19 04:22:11 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: tangchen
X-Patchwork-Id: 7655021
Return-Path:
X-Original-To: patchwork-linux-acpi@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork2.web.kernel.org (Postfix) with ESMTP id 4B918BF90C
	for ; Thu, 19 Nov 2015 04:27:40 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 6A80E204D1
	for ; Thu, 19 Nov 2015 04:27:39 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 79E3520256
	for ; Thu, 19 Nov 2015 04:27:38 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933876AbbKSE1X (ORCPT ); Wed, 18 Nov 2015 23:27:23 -0500
Received: from cn.fujitsu.com ([59.151.112.132]:19167 "EHLO
	heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL)
	by vger.kernel.org with ESMTP id S933737AbbKSEZJ (ORCPT );
	Wed, 18 Nov 2015 23:25:09 -0500
X-IronPort-AV: E=Sophos;i="5.20,242,1444665600"; d="scan'208";a="611375"
Received: from unknown (HELO edo.cn.fujitsu.com) ([10.167.33.5])
	by heian.cn.fujitsu.com with ESMTP; 19 Nov 2015 12:24:56 +0800
Received: from G08CNEXCHPEKD01.g08.fujitsu.local (localhost.localdomain
	[127.0.0.1])
	by edo.cn.fujitsu.com (8.14.3/8.13.1) with ESMTP id tAJ4OOd9031743;
	Thu, 19 Nov 2015 12:24:24 +0800
Received: from tangchen.g08.fujitsu.local (10.167.226.71) by
	G08CNEXCHPEKD01.g08.fujitsu.local (10.167.33.89) with Microsoft SMTP
	Server (TLS) id 14.3.181.6; Thu, 19 Nov 2015 12:25:06 +0800
From: Tang Chen
To: , , , , , , , , , , , ,
CC: , , , ,
Subject: [PATCH v3 1/5] x86, memhp, numa: Online memory-less nodes at boot time.
Date: Thu, 19 Nov 2015 12:22:11 +0800
Message-ID: <1447906935-31899-2-git-send-email-tangchen@cn.fujitsu.com>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1447906935-31899-1-git-send-email-tangchen@cn.fujitsu.com>
References: <1447906935-31899-1-git-send-email-tangchen@cn.fujitsu.com>
MIME-Version: 1.0
X-Originating-IP: [10.167.226.71]
Sender: linux-acpi-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-acpi@vger.kernel.org
X-Spam-Status: No, score=-7.5 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Currently, x86 does not support memory-less nodes. A node without memory
is not onlined, and in init_cpu_to_node() the cpus on it are mapped to
other online nodes that have memory. The reason for doing this is to
ensure that each cpu is mapped to a node with memory, so that local
memory can be allocated for that cpu.

But we don't have to do it this way.

In this series of patches, we construct the cpu <-> node mapping for all
possible cpus at boot time, which is a 1-1 mapping. This means each cpu
is mapped to the node it belongs to, and the mapping never changes. If a
node has only cpus but no memory, the cpus on it are mapped to that
memory-less node, and the memory-less node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them at
boot time, then builds zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from the local node,
the allocation automatically falls back to the proper zones in the
zonelists.
---
 arch/x86/mm/numa.c     | 30 ++++++++++++++++--------------
 include/linux/mmzone.h |  1 +
 mm/page_alloc.c        |  2 +-
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index c3b3f65..3537c31 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -704,22 +704,22 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-	int n, val;
-	int min_val = INT_MAX;
-	int best_node = -1;
+	unsigned long zones_size[MAX_NR_ZONES] = {0};
+	unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-	for_each_online_node(n) {
-		val = node_distance(node, n);
+	/* Allocate and initialize node data. Memory-less node is now online.*/
+	alloc_node_data(nid);
+	free_area_init_node(nid, zones_size, 0, zholes_size);
 
-		if (val < min_val) {
-			min_val = val;
-			best_node = n;
-		}
-	}
-
-	return best_node;
+	/*
+	 * Build zonelist so that when the cpus try to allocate memory on local
+	 * node, which has no memory, it will fall back to the best near node.
+	 * No need to rebuild zonelist for the other nodes since memory-less
+	 * node has no memory. And no need to lock at boot time.
+	 */
+	build_zonelists(NODE_DATA(nid));
 }
 
 /*
@@ -748,8 +748,10 @@ void __init init_cpu_to_node(void)
 
 		if (node == NUMA_NO_NODE)
 			continue;
+
 		if (!node_online(node))
-			node = find_near_online_node(node);
+			init_memory_less_node(node);
+
 		numa_set_node(cpu, node);
 	}
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e23a9e7..9c4d4d5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -736,6 +736,7 @@ static inline bool is_dev_zone(const struct zone *zone)
 
 extern struct mutex zonelists_mutex;
 void build_all_zonelists(pg_data_t *pgdat, struct zone *zone);
+void build_zonelists(pg_data_t *pgdat);
 void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx);
 bool zone_watermark_ok(struct zone *z, unsigned int order,
 		unsigned long mark, int classzone_idx, int alloc_flags);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 17a3c66..761f302 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4144,7 +4144,7 @@ static void set_zonelist_order(void)
 		current_zonelist_order = user_zonelist_order;
 }
 
-static void build_zonelists(pg_data_t *pgdat)
+void build_zonelists(pg_data_t *pgdat)
 {
 	int j, node, load;
 	enum zone_type i;
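
For illustration only, the difference between the old remapping and the zonelist fallback can be sketched in user-space C. This is not kernel code: the three-node topology, the distance table, and the names has_memory, nearest_node_with_memory() and build_fallback_order() are all hypothetical, and the ordering is a plain nearest-first sort rather than the node selection that build_zonelists() really performs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 3

/* Hypothetical SLIT-style distance table; the values are made up. */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 15 },
	{ 30, 15, 10 },
};

/* Hypothetical topology: node 1 has cpus but no memory. */
static const bool has_memory[NR_NODES] = { true, false, true };

/*
 * Old scheme, modelled on the removed find_near_online_node(): the cpu is
 * remapped to the nearest node that has memory.
 */
static int nearest_node_with_memory(int node)
{
	int n, best = -1, min_val = INT_MAX;

	assert(node >= 0 && node < NR_NODES);
	for (n = 0; n < NR_NODES; n++) {
		if (!has_memory[n])
			continue;
		if (distance[node][n] < min_val) {
			min_val = distance[node][n];
			best = n;
		}
	}
	return best;
}

/*
 * New scheme, simplified: the cpu stays on its own node, and that node gets
 * a fallback list of every node with memory, nearest first. Returns the
 * number of entries written to order[].
 */
static int build_fallback_order(int node, int order[NR_NODES])
{
	bool used[NR_NODES] = { false };
	int count = 0;

	assert(node >= 0 && node < NR_NODES);
	for (;;) {
		int n, best = -1, min_val = INT_MAX;

		/* Pick the nearest node with memory not yet in the list. */
		for (n = 0; n < NR_NODES; n++) {
			if (used[n] || !has_memory[n])
				continue;
			if (distance[node][n] < min_val) {
				min_val = distance[node][n];
				best = n;
			}
		}
		if (best < 0)
			break;
		used[best] = true;
		order[count++] = best;
	}
	return count;
}
```

With this made-up table, the old scheme would remap a cpu on node 1 to node 2. The sketched new scheme leaves the cpu on node 1 and gives that node the fallback order node 2, then node 0, so an allocation can still succeed if the nearest node runs out of memory.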