From patchwork Sun Oct 6 14:43:54 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Yanfei X-Patchwork-Id: 2992801 Return-Path: X-Original-To: patchwork-linux-acpi@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id BF81A9F243 for ; Sun, 6 Oct 2013 14:44:21 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id BCC02201F5 for ; Sun, 6 Oct 2013 14:44:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6F6D3201BA for ; Sun, 6 Oct 2013 14:44:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753550Ab3JFOoR (ORCPT ); Sun, 6 Oct 2013 10:44:17 -0400 Received: from mail-pd0-f176.google.com ([209.85.192.176]:57174 "EHLO mail-pd0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753546Ab3JFOoP (ORCPT ); Sun, 6 Oct 2013 10:44:15 -0400 Received: by mail-pd0-f176.google.com with SMTP id q10so6052049pdj.21 for ; Sun, 06 Oct 2013 07:44:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=3nKMJOJMquL4mOtMIPI7Cl+0CAB7GEf7/88haOjMFBc=; b=rtkKgJCWQKV8TNZ4l/v5WcL/wEc9nSrcMZkDSb/7szblKgHVAfv6muNd24Bwr/cUc8 rEoM3pBH31yUffiD6LDDKCrF09nckGJ4mg8mf6CGU2R3+pr7mtoSyzanz2ib14UxCpKs BF4CAVHSdm9Zj0/1oPLbE9SmviDKPmu7R09VTAJH9opYwREEv50mzdqHRZNGyFYezTa5 sGV+wM2frdPSEth0CfYkjyTxD8HvENvYK8NykwEQjOYxeI5KIwfvAhSXxwCyM/XnL7n1 kRc39gtYk5AL7Wa2xjVe4u8WwmV+1nUKZ5MfE7t/2iea/8timsh+fEET5MAtLXodA4OA uoGQ== X-Received: by 10.68.163.132 with SMTP id yi4mr1731202pbb.158.1381070655000; Sun, 06 Oct 2013 07:44:15 -0700 (PDT) Received: from localhost.localdomain ([49.74.64.53]) by mx.google.com with ESMTPSA id xn12sm32630843pac.12.1969.12.31.16.00.00 (version=TLSv1 cipher=RC4-SHA bits=128/128); Sun, 06 Oct 2013 07:44:14 -0700 (PDT) Message-ID: <5251772A.2050509@gmail.com> Date: Sun, 06 Oct 2013 22:43:54 +0800 From: Zhang Yanfei User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.5) Gecko/20120607 Thunderbird/10.0.5 MIME-Version: 1.0 To: Toshi Kani , Andrew Morton CC: "Rafael J . Wysocki" , lenb@kernel.org, Thomas Gleixner , mingo@elte.hu, "H. Peter Anvin" , Tejun Heo , Wanpeng Li , Thomas Renninger , Yinghai Lu , Jiang Liu , Wen Congyang , Lai Jiangshan , isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com, Mel Gorman , Minchan Kim , mina86@mina86.com, gong.chen@linux.intel.com, vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com, Rik van Riel , jweiner@redhat.com, prarit@redhat.com, "x86@kernel.org" , linux-doc@vger.kernel.org, "linux-kernel@vger.kernel.org" , Linux MM , linux-acpi@vger.kernel.org, imtangchen@gmail.com, Zhang Yanfei , Tang Chen Subject: [PATCH part1 v6 update 6/6] mem-hotplug: Introduce movable_node boot option References: <524E2032.4020106@gmail.com> <524E21BC.7090104@gmail.com> <1381012134.5429.86.camel@misato.fc.hp.com> In-Reply-To: <1381012134.5429.86.camel@misato.fc.hp.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Tang Chen The hot-Pluggable field in SRAT specifies which memory is hotpluggable. As we mentioned before, if hotpluggable memory is used by the kernel, it cannot be hot-removed. So memory hotplug users may want to set all hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it. Memory hotplug users may also set a node as movable node, which has ZONE_MOVABLE only, so that the whole node can be hot-removed. But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the kernel cannot use memory in movable nodes. This will cause NUMA performance down. And other users may be unhappy. So we need a way to allow users to enable and disable this functionality. In this patch, we introduce movable_node boot option to allow users to choose to not to consume hotpluggable memory at early boot time and later we can set it as ZONE_MOVABLE. To achieve this, the movable_node boot option will control the memblock allocation direction. That said, after memblock is ready, before SRAT is parsed, we should allocate memory near the kernel image as we explained in the previous patches. So if movable_node boot option is set, the kernel does the following: 1. After memblock is ready, make memblock allocate memory bottom up. 2. After SRAT is parsed, make memblock behave as default, allocate memory top down. Users can specify "movable_node" in kernel commandline to enable this functionality. For those who don't use memory hotplug or who don't want to lose their NUMA performance, just don't specify anything. The kernel will work as before. Suggested-by: Kamezawa Hiroyuki Acked-by: Tejun Heo Signed-off-by: Tang Chen Signed-off-by: Zhang Yanfei --- Documentation/kernel-parameters.txt | 3 +++ arch/x86/mm/numa.c | 11 +++++++++++ mm/Kconfig | 17 ++++++++++++----- mm/memory_hotplug.c | 31 +++++++++++++++++++++++++++++++ 4 files changed, 57 insertions(+), 5 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 539a236..13201d4 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1769,6 +1769,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. that the amount of memory usable for all allocations is not too small. + movable_node [KNL,X86] Boot-time switch to enable the effects + of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details. + MTD_Partition= [MTD] Format: ,,, diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 8bf93ba..24aec58 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -567,6 +567,17 @@ static int __init numa_init(int (*init_func)(void)) ret = init_func(); if (ret < 0) return ret; + + /* + * We reset memblock back to the top-down direction + * here because if we configured ACPI_NUMA, we have + * parsed SRAT in init_func(). It is ok to have the + * reset here even if we did't configure ACPI_NUMA + * or acpi numa init fails and fallbacks to dummy + * numa init. + */ + memblock_set_bottom_up(false); + ret = numa_cleanup_meminfo(&numa_meminfo); if (ret < 0) return ret; diff --git a/mm/Kconfig b/mm/Kconfig index 026771a..0db1cc6 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -153,11 +153,18 @@ config MOVABLE_NODE help Allow a node to have only movable memory. Pages used by the kernel, such as direct mapping pages cannot be migrated. So the corresponding - memory device cannot be hotplugged. This option allows users to - online all the memory of a node as movable memory so that the whole - node can be hotplugged. Users who don't use the memory hotplug - feature are fine with this option on since they don't online memory - as movable. + memory device cannot be hotplugged. This option allows the following + two things: + - When the system is booting, node full of hotpluggable memory can + be arranged to have only movable memory so that the whole node can + be hot-removed. (need movable_node boot option specified). + - After the system is up, the option allows users to online all the + memory of a node as movable memory so that the whole node can be + hot-removed. + + Users who don't use the memory hotplug feature are fine with this + option on since they don't specify movable_node boot option or they + don't online memory as movable. Say Y here if you want to hotplug a whole node. Say N here if you want kernel to use memory on all nodes evenly. diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index ed85fe3..6874c31 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -31,6 +31,7 @@ #include #include #include +#include #include @@ -1412,6 +1413,36 @@ static bool can_offline_normal(struct zone *zone, unsigned long nr_pages) } #endif /* CONFIG_MOVABLE_NODE */ +static int __init cmdline_parse_movable_node(char *p) +{ +#ifdef CONFIG_MOVABLE_NODE + /* + * Memory used by the kernel cannot be hot-removed because Linux + * cannot migrate the kernel pages. When memory hotplug is + * enabled, we should prevent memblock from allocating memory + * for the kernel. + * + * ACPI SRAT records all hotpluggable memory ranges. But before + * SRAT is parsed, we don't know about it. + * + * The kernel image is loaded into memory at very early time. We + * cannot prevent this anyway. So on NUMA system, we set any + * node the kernel resides in as un-hotpluggable. + * + * Since on modern servers, one node could have double-digit + * gigabytes memory, we can assume the memory around the kernel + * image is also un-hotpluggable. So before SRAT is parsed, just + * allocate memory near the kernel image to try the best to keep + * the kernel away from hotpluggable memory. + */ + memblock_set_bottom_up(true); +#else + pr_warn("movable_node option not supported\n"); +#endif + return 0; +} +early_param("movable_node", cmdline_parse_movable_node); + /* check which state of node_states will be changed when offline memory */ static void node_states_check_changes_offline(unsigned long nr_pages, struct zone *zone, struct memory_notify *arg)