From patchwork Tue Sep 24 18:35:14 2013
X-Patchwork-Submitter: Zhang Yanfei
X-Patchwork-Id: 2935351
Message-ID: <5241DB62.2090300@gmail.com>
Date: Wed, 25 Sep 2013 02:35:14 +0800
From: Zhang Yanfei
To: "Rafael J . Wysocki", lenb@kernel.org, Thomas Gleixner, mingo@elte.hu,
 "H. Peter Anvin", Andrew Morton, Tejun Heo, Toshi Kani, Wanpeng Li,
 Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang, Lai Jiangshan,
 isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com, Mel Gorman,
 Minchan Kim, mina86@mina86.com, gong.chen@linux.intel.com,
 vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com, Rik van Riel,
 jweiner@redhat.com, prarit@redhat.com
CC: "x86@kernel.org", linux-doc@vger.kernel.org,
 "linux-kernel@vger.kernel.org", Linux MM, linux-acpi@vger.kernel.org,
 imtangchen@gmail.com, Zhang Yanfei
Subject: [PATCH v5 6/6] mem-hotplug: Introduce movablenode boot option
References: <5241D897.1090905@gmail.com>
In-Reply-To: <5241D897.1090905@gmail.com>
X-Mailing-List: linux-acpi@vger.kernel.org

From: Tang Chen

The Hot Pluggable field in SRAT specifies which memory is hotpluggable.
As we mentioned before, if hotpluggable memory is used by the kernel, it
cannot be hot-removed. So memory hotplug users may want to place all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.

Memory hotplug users may also set a node as a movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.

But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the
kernel cannot use memory in movable nodes. This degrades NUMA
performance, and other users may be unhappy.

So we need a way to allow users to enable and disable this functionality.
In this patch, we introduce the movablenode boot option to allow users to
choose not to consume hotpluggable memory at early boot time, so that
later it can be set as ZONE_MOVABLE.

To achieve this, the movablenode boot option controls the memblock
allocation direction. That is, after memblock is ready and before SRAT is
parsed, we should allocate memory near the kernel image, as we explained
in the previous patches. So if the movablenode boot option is set, the
kernel does the following:

1. After memblock is ready, make memblock allocate memory bottom up.
2. After SRAT is parsed, make memblock behave as default, allocating
   memory top down.

Users can specify "movablenode" on the kernel command line to enable this
functionality. Those who don't use memory hotplug or who don't want to
lose their NUMA performance can simply leave it out; the kernel will
work as before.

Suggested-by: Kamezawa Hiroyuki
Signed-off-by: Tang Chen
Signed-off-by: Zhang Yanfei
---
 Documentation/kernel-parameters.txt | 15 +++++++++++++++
 arch/x86/kernel/setup.c             |  7 +++++++
 mm/memory_hotplug.c                 | 31 +++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1a036cd..8c056c4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1769,6 +1769,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablenode	[KNL,X86] This parameter controls whether the
+			kernel arranges hotpluggable memory ranges recorded
+			in ACPI SRAT (System Resource Affinity Table) as
+			ZONE_MOVABLE. Such memory can be hot-removed when
+			the system is up.
+			By specifying this option, all the hotpluggable memory
+			will be in ZONE_MOVABLE, which the kernel cannot use.
+			This degrades NUMA performance. Users who care
+			about NUMA performance should not use it.
+			If all the memory ranges in the system are hotpluggable,
+			then the ones used by the kernel at early boot, such as
+			kernel code and data segments, initrd file and so on,
+			won't be set as ZONE_MOVABLE, and won't be hotpluggable.
+			Otherwise the kernel won't have enough memory to boot.
+
 	MTD_Partition=	[MTD]
 			Format: ,,,
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 36cfce3..b8fefb7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1132,6 +1132,13 @@ void __init setup_arch(char **cmdline_p)
 	early_acpi_boot_init();
 
 	initmem_init();
+
+	/*
+	 * When ACPI SRAT is parsed, which is done in initmem_init(),
+	 * set memblock back to the top-down direction.
+	 */
+	memblock_set_bottom_up(false);
+
 	memblock_find_dma_reserve();
 
 	/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ed85fe3..dcd819a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include <linux/memblock.h>
 
 #include
 
@@ -1412,6 +1413,36 @@ static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
 }
 #endif /* CONFIG_MOVABLE_NODE */
 
+static int __init cmdline_parse_movablenode(char *p)
+{
+#ifdef CONFIG_MOVABLE_NODE
+	/*
+	 * Memory used by the kernel cannot be hot-removed because Linux
+	 * cannot migrate the kernel pages. When memory hotplug is
+	 * enabled, we should prevent memblock from allocating memory
+	 * for the kernel.
+	 *
+	 * ACPI SRAT records all hotpluggable memory ranges. But before
+	 * SRAT is parsed, we don't know about it.
+	 *
+	 * The kernel image is loaded into memory very early. We
+	 * cannot prevent this anyway. So on a NUMA system, we set any
+	 * node the kernel resides in as un-hotpluggable.
+	 *
+	 * Since on modern servers, one node could have double-digit
+	 * gigabytes of memory, we can assume the memory around the kernel
+	 * image is also un-hotpluggable. So before SRAT is parsed, just
+	 * allocate memory near the kernel image to do our best to keep
+	 * the kernel away from hotpluggable memory.
+	 */
+	memblock_set_bottom_up(true);
+#else
+	pr_warn("movablenode option not supported\n");
+#endif
+	return 0;
+}
+early_param("movablenode", cmdline_parse_movablenode);
+
 /* check which state of node_states will be changed when offline memory */
 static void node_states_check_changes_offline(unsigned long nr_pages,
 	struct zone *zone, struct memory_notify *arg)
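
The effect of the allocation direction described in the changelog can be
illustrated with a small userspace sketch. This is not the kernel's memblock
code: struct toy_range, toy_find() and the "floor" parameter (standing in for
the end of the kernel image) are hypothetical names made up for illustration,
and the model assumes a sorted, non-overlapping list of free ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of a free physical range; not struct memblock_region. */
struct toy_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Pick an address for 'size' bytes from free ranges sorted by address.
 *
 * bottom_up == true:  return the lowest fitting address at or above 'floor'
 *                     (think: just above the kernel image).
 * bottom_up == false: return the highest fitting address; 'floor' is ignored.
 *
 * Returns 0 if nothing fits.
 */
static uint64_t toy_find(const struct toy_range *free_ranges, size_t n,
			 uint64_t size, uint64_t floor, bool bottom_up)
{
	assert(size > 0);

	if (bottom_up) {
		for (size_t i = 0; i < n; i++) {
			uint64_t lo = free_ranges[i].start > floor ?
				      free_ranges[i].start : floor;

			/* Check lo < end first so the subtraction cannot wrap. */
			if (lo < free_ranges[i].end &&
			    free_ranges[i].end - lo >= size)
				return lo;
		}
		return 0;
	}

	/* Top-down: walk the ranges from the highest one downwards. */
	for (size_t i = n; i-- > 0; ) {
		if (free_ranges[i].end - free_ranges[i].start >= size)
			return free_ranges[i].end - size;
	}
	return 0;
}
```

With two free ranges, a low one holding the kernel image and a high one
standing in for a hotpluggable node, the bottom-up search returns an address
right at the floor, while the top-down search lands at the top of the high
range. That is the situation the movablenode option is meant to avoid before
SRAT has been parsed.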