From patchwork Sat Oct 12 06:03:19 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yanfei Zhang
X-Patchwork-Id: 3030851
Message-ID: <5258E627.7020303@cn.fujitsu.com>
Date: Sat, 12 Oct 2013 14:03:19 +0800
From: Zhang Yanfei
To: Andrew Morton, "Rafael J . Wysocki", Len Brown, Thomas Gleixner,
 Ingo Molnar, "H. Peter Anvin", Tejun Heo, Toshi Kani, Wanpeng Li,
 Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang, Lai Jiangshan,
 Yasuaki Ishimatsu, Taku Izumi, Mel Gorman, Minchan Kim,
 "mina86@mina86.com", "gong.chen@linux.intel.com", Vasilis Liaskovitis,
 "lwoodman@redhat.com", Rik van Riel, "jweiner@redhat.com",
 Prarit Bhargava
CC: "x86@kernel.org", "linux-kernel@vger.kernel.org", Linux MM,
 ACPI Devel Maling List, Chen Tang, Tang Chen, Zhang Yanfei
Subject: [PATCH part2 v2 1/8] x86: get pg_data_t's memory from other node
References: <5258E560.5050506@cn.fujitsu.com>
In-Reply-To: <5258E560.5050506@cn.fujitsu.com>
X-Mailing-List: linux-acpi@vger.kernel.org

From: Yasuaki Ishimatsu

If the system creates a movable node, in which all of the node's memory
is allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory
for the node's pg_data_t from that node. So invoke
memblock_alloc_nid(...MAX_NUMNODES) again to retry when the first
allocation fails. Otherwise, the system could fail to boot.
(We don't use memblock_alloc_try_nid() to retry because that function
panics the system if the allocation fails.)

The node_data could be placed on a hotpluggable node, and so could the
page tables and vmemmap. But for now, doing so would break the memory
hot-remove path.

A node could have several memory devices.
And the device that holds the node data should be hot-removed last.
But at the NUMA level, we don't know which memory_block
(/sys/devices/system/node/nodeX/memoryXXX) belongs to which memory
device; we only have the node. So we can only do node hotplug.

But for virtualization, developers are now working on memory hotplug in
qemu, which supports hotplugging a single memory device. So whole-node
hotplug will not satisfy virtualization users.

So in the end, we concluded that we'd better do memory hotplug and the
node-local things (node-local node data, page tables, vmemmap, ...) in
two steps.
Please refer to https://lkml.org/lkml/2013/6/19/73

For now, we put the node_data of a movable node on another node, and
will improve this in the future.

Signed-off-by: Yasuaki Ishimatsu
Signed-off-by: Lai Jiangshan
Signed-off-by: Tang Chen
Signed-off-by: Jiang Liu
Signed-off-by: Tang Chen
Signed-off-by: Zhang Yanfei
Reviewed-by: Wanpeng Li
Acked-by: Toshi Kani
---
 arch/x86/mm/numa.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 24aec58..e17db5d 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -211,9 +211,14 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 	 */
 	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
 	if (!nd_pa) {
-		pr_err("Cannot find %zu bytes in node %d\n",
-		       nd_size, nid);
-		return;
+		pr_warn("Cannot find %zu bytes in node %d, so try other nodes",
+			nd_size, nid);
+		nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES,
+					   MAX_NUMNODES);
+		if (!nd_pa) {
+			pr_err("Cannot find %zu bytes in any node\n", nd_size);
+			return;
+		}
 	}
 	nd = __va(nd_pa);
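
For readers who want to try the retry logic outside the kernel, here is
a minimal user-space sketch of the "node-local first, then any node"
fallback the hunk above introduces. It is not kernel code: NR_NODES,
ANY_NODE, node_free[], alloc_on_node() and alloc_node_data() are all
hypothetical names made up for this illustration. ANY_NODE stands in for
MAX_NUMNODES, and a simple per-node byte counter stands in for memblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NR_NODES 2
#define ANY_NODE (-1)	/* hypothetical stand-in for MAX_NUMNODES */

/*
 * Hypothetical per-node pools: bytes still available for boot-time
 * allocations. Node 1 models a movable node, so it offers nothing.
 */
static size_t node_free[NR_NODES] = { 4096, 0 };

/* Returns the node that served the request, or -1 if none could. */
static int alloc_on_node(size_t size, int nid)
{
	int first = (nid == ANY_NODE) ? 0 : nid;
	int last = (nid == ANY_NODE) ? NR_NODES - 1 : nid;

	assert(nid == ANY_NODE || (nid >= 0 && nid < NR_NODES));

	for (int i = first; i <= last; i++) {
		if (node_free[i] >= size) {
			node_free[i] -= size;
			return i;
		}
	}
	return -1;
}

/* Try the node itself first; on failure, retry on any node. */
static int alloc_node_data(size_t size, int nid)
{
	int got = alloc_on_node(size, nid);

	if (got < 0) {
		fprintf(stderr,
			"Cannot find %zu bytes in node %d, so try other nodes\n",
			size, nid);
		got = alloc_on_node(size, ANY_NODE);
		if (got < 0)
			fprintf(stderr,
				"Cannot find %zu bytes in any node\n", size);
	}
	return got;
}
```

With the pools above, alloc_node_data(1024, 1) finds nothing on node 1
and is served from node 0 instead, which mirrors how the pg_data_t of a
movable node ends up on another node. A request that no node can satisfy
returns -1 rather than aborting, matching the patch's choice to log an
error and return instead of panicking.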