From patchwork Tue Sep 24 10:05:03 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yanfei Zhang
X-Patchwork-Id: 2933201
Message-ID: <524163CF.3010303@cn.fujitsu.com>
Date: Tue, 24 Sep 2013 18:05:03 +0800
From: Zhang Yanfei
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6
To: "Rafael J. Wysocki", lenb@kernel.org, Thomas Gleixner, mingo@elte.hu,
 "H. Peter Anvin", Andrew Morton, Tejun Heo, Toshi Kani, Wanpeng Li,
 Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang, Lai Jiangshan,
 isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com, Mel Gorman,
 Minchan Kim, mina86@mina86.com, gong.chen@linux.intel.com,
 vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com, Rik van Riel,
 jweiner@redhat.com, prarit@redhat.com
CC: Zhang Yanfei, "x86@kernel.org", linux-doc@vger.kernel.org,
 "linux-kernel@vger.kernel.org", Linux MM, linux-acpi@vger.kernel.org,
 imtangchen@gmail.com, Zhang Yanfei
Subject: [PATCH 2/6] memblock: Introduce bottom-up allocation mode
References: <524162DA.30004@cn.fujitsu.com>
In-Reply-To: <524162DA.30004@cn.fujitsu.com>
Sender: linux-acpi-owner@vger.kernel.org
X-Mailing-List: linux-acpi@vger.kernel.org

From: Tang Chen

The Linux kernel cannot migrate pages used by the kernel. As a result,
kernel pages cannot be hot-removed, so we cannot allocate hotpluggable
memory for the kernel.

The ACPI SRAT (System Resource Affinity Table) contains the memory hotplug
info. But memblock starts allocating memory for the kernel before the SRAT
is parsed, so we need to keep these early allocations out of hotpluggable
memory.

In a memory hotplug system, any NUMA node the kernel resides in should be
unhotpluggable. And on a modern server, each node could have at least 16GB
of memory.
So memory around the kernel image is very likely unhotpluggable.

So the basic idea is: allocate memory upward, starting from the end of the
kernel image. Since not much memory is allocated before the SRAT is parsed,
these allocations are very likely to land in the same node as the kernel
image.

Currently memblock can only allocate memory top-down. So this patch
introduces a new bottom-up allocation mode. Later, when we use this
allocation direction, we will limit the start address to above the kernel.

Signed-off-by: Tang Chen
Signed-off-by: Zhang Yanfei
---
 include/linux/memblock.h |   26 +++++++++++++++++++
 mm/memblock.c            |   63 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 31e95ac..c14bca5 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -19,6 +19,13 @@
 
 #define INIT_MEMBLOCK_REGIONS	128
 
+/* Allocation direction */
+enum {
+	MEMBLOCK_DIRECTION_TOP_DOWN,
+	MEMBLOCK_DIRECTION_BOTTOM_UP,
+	NR_MEMBLOCK_DIRECTIONS
+};
+
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
@@ -35,6 +42,7 @@ struct memblock_type {
 };
 
 struct memblock {
+	int current_direction;      /* current allocation direction */
 	phys_addr_t current_limit;
 	struct memblock_type memory;
 	struct memblock_type reserved;
@@ -148,6 +156,24 @@ phys_addr_t memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
 
 phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
 
+#ifdef CONFIG_MOVABLE_NODE
+static inline void memblock_set_bottom_up(bool enable)
+{
+	if (enable)
+		memblock.current_direction = MEMBLOCK_DIRECTION_BOTTOM_UP;
+	else
+		memblock.current_direction = MEMBLOCK_DIRECTION_TOP_DOWN;
+}
+
+static inline bool memblock_bottom_up(void)
+{
+	return memblock.current_direction == MEMBLOCK_DIRECTION_BOTTOM_UP;
+}
+#else
+static inline void memblock_set_bottom_up(bool enable) {}
+static inline bool memblock_bottom_up(void) { return false; }
+#endif
+
 /* Flags for memblock_alloc_base() amd __memblock_alloc_base() */
 #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
 #define MEMBLOCK_ALLOC_ACCESSIBLE	0
diff --git a/mm/memblock.c b/mm/memblock.c
index 3d80c74..5859b8e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -20,6 +20,8 @@
 #include
 #include
 
+#include
+
 static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
 static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
 
@@ -32,6 +34,7 @@ struct memblock memblock __initdata_memblock = {
 	.reserved.cnt		= 1,	/* empty dummy entry */
 	.reserved.max		= INIT_MEMBLOCK_REGIONS,
 
+	.current_direction	= MEMBLOCK_DIRECTION_TOP_DOWN,
 	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
 };
 
@@ -83,6 +86,38 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
 }
 
 /**
+ * __memblock_find_range - find free area utility
+ * @start: start of candidate range
+ * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
+ * @size: size of free area to find
+ * @align: alignment of free area to find
+ * @nid: nid of the free area to find, %MAX_NUMNODES for any node
+ *
+ * Utility called from memblock_find_in_range_node(), find free area bottom-up.
+ *
+ * RETURNS:
+ * Found address on success, %0 on failure.
+ */
+static phys_addr_t __init_memblock
+__memblock_find_range(phys_addr_t start, phys_addr_t end, phys_addr_t size,
+			phys_addr_t align, int nid)
+{
+	phys_addr_t this_start, this_end, cand;
+	u64 i;
+
+	for_each_free_mem_range(i, nid, &this_start, &this_end, NULL) {
+		this_start = clamp(this_start, start, end);
+		this_end = clamp(this_end, start, end);
+
+		cand = round_up(this_start, align);
+		if (cand < this_end && this_end - cand >= size)
+			return cand;
+	}
+
+	return 0;
+}
+
+/**
  * __memblock_find_range_rev - find free area utility, in reverse order
  * @start: start of candidate range
  * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
@@ -127,6 +162,10 @@ __memblock_find_range_rev(phys_addr_t start, phys_addr_t end,
  *
  * Find @size free area aligned to @align in the specified range and node.
  *
+ * When the allocation direction is bottom-up, @start should be greater
+ * than the end of the kernel image; otherwise it is raised to that address.
+ * If bottom-up allocation fails, fall back to top-down allocation.
+ *
  * RETURNS:
  * Found address on success, %0 on failure.
  */
@@ -134,6 +173,8 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 					phys_addr_t end, phys_addr_t size,
 					phys_addr_t align, int nid)
 {
+	phys_addr_t ret;
+
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
@@ -142,6 +183,28 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 	start = max_t(phys_addr_t, start, PAGE_SIZE);
 	end = max(start, end);
 
+	if (memblock_bottom_up()) {
+		phys_addr_t bottom_up_start;
+
+		/* make sure we will allocate above the kernel */
+		bottom_up_start = max_t(phys_addr_t, start, __pa_symbol(_end));
+
+		/* ok, try bottom-up allocation first */
+		ret = __memblock_find_range(bottom_up_start, end,
+					    size, align, nid);
+		if (ret)
+			return ret;
+
+		/*
+		 * we always limit bottom-up allocation above the kernel,
+		 * but top-down allocation doesn't have the limit, so
+		 * retrying top-down allocation may succeed when bottom-up
+		 * allocation failed.
+		 */
+		pr_warn("memblock: Failed to allocate memory in bottom up "
+			"direction. Now try top down direction.\n");
+	}
+
 	return __memblock_find_range_rev(start, end, size, align, nid);
 }
 
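
For readers who want to experiment with the search order outside the kernel, the following is a minimal userspace sketch of the bottom-up search with top-down fallback described in this patch. It is not kernel code: struct free_range, clamp_u64(), find_bottom_up(), find_top_down(), find_range() and the kernel_end parameter are all hypothetical stand-ins (for the free ranges walked by for_each_free_mem_range(), the two __memblock_find_range*() helpers, and __pa_symbol(_end)). It assumes the free ranges are sorted by address and that align is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one free range [start, end); not a kernel type. */
struct free_range {
	uint64_t start;
	uint64_t end;
};

static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Bottom-up: scan ranges from lowest to highest, return the first fit. */
static uint64_t find_bottom_up(const struct free_range *r, size_t n,
			       uint64_t start, uint64_t end,
			       uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t s = clamp_u64(r[i].start, start, end);
		uint64_t e = clamp_u64(r[i].end, start, end);
		uint64_t cand = (s + align - 1) & ~(align - 1); /* round up */

		if (cand < e && e - cand >= size)
			return cand;
	}
	return 0;
}

/* Top-down: scan ranges from highest to lowest, return the highest fit. */
static uint64_t find_top_down(const struct free_range *r, size_t n,
			      uint64_t start, uint64_t end,
			      uint64_t size, uint64_t align)
{
	for (size_t i = n; i-- > 0; ) {
		uint64_t s = clamp_u64(r[i].start, start, end);
		uint64_t e = clamp_u64(r[i].end, start, end);
		uint64_t cand;

		if (e < size)
			continue;
		cand = (e - size) & ~(align - 1); /* round down */
		if (cand >= s)
			return cand;
	}
	return 0;
}

/*
 * Try bottom-up above kernel_end first (when enabled), then fall back to
 * an unrestricted top-down search. Returns 0 on failure.
 */
static uint64_t find_range(const struct free_range *r, size_t n,
			   uint64_t start, uint64_t end,
			   uint64_t size, uint64_t align,
			   bool bottom_up, uint64_t kernel_end)
{
	assert(align && (align & (align - 1)) == 0);

	if (bottom_up) {
		uint64_t lo = start > kernel_end ? start : kernel_end;
		uint64_t ret = lo < end ?
			find_bottom_up(r, n, lo, end, size, align) : 0;

		if (ret)
			return ret;
	}
	return find_top_down(r, n, start, end, size, align);
}
```

With two free ranges, [0x1000, 0x100000) and [0x200000, 0x800000), and a pretend kernel end of 0x300000, a bottom-up request for one 4KB page lands right at 0x300000, while the same request top-down lands at 0x7ff000, the top of the highest range. If the pretend kernel end is moved so close to the top that nothing fits above it, the bottom-up pass fails and the top-down pass still succeeds, which is the fallback the patch adds.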