From patchwork Fri Oct 4 02:00:07 2013
X-Patchwork-Submitter: Zhang Yanfei
X-Patchwork-Id: 2986951
Message-ID: <524E2127.4090904@gmail.com>
Date: Fri, 04 Oct 2013 10:00:07 +0800
From: Zhang Yanfei
To: Andrew Morton, "Rafael J. Wysocki", lenb@kernel.org, Thomas Gleixner,
 mingo@elte.hu, "H. Peter Anvin", Tejun Heo, Toshi Kani, Wanpeng Li,
 Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang, Lai Jiangshan,
 isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com, Mel Gorman,
 Minchan Kim, mina86@mina86.com, gong.chen@linux.intel.com,
 vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com, Rik van Riel,
 jweiner@redhat.com, prarit@redhat.com
Peter Anvin" , Tejun Heo , Toshi Kani , Wanpeng Li , Thomas Renninger , Yinghai Lu , Jiang Liu , Wen Congyang , Lai Jiangshan , isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com, Mel Gorman , Minchan Kim , mina86@mina86.com, gong.chen@linux.intel.com, vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com, Rik van Riel , jweiner@redhat.com, prarit@redhat.com CC: "x86@kernel.org" , linux-doc@vger.kernel.org, "linux-kernel@vger.kernel.org" , Linux MM , linux-acpi@vger.kernel.org, imtangchen@gmail.com, Zhang Yanfei , Tang Chen Subject: [PATCH part1 v6 4/6] x86/mem-hotplug: Support initialize page tables in bottom-up References: <524E2032.4020106@gmail.com> In-Reply-To: <524E2032.4020106@gmail.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Spam-Status: No, score=-4.0 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, KHOP_BIG_TO_CC, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Tang Chen The Linux kernel cannot migrate pages used by the kernel. As a result, kernel pages cannot be hot-removed. So we cannot allocate hotpluggable memory for the kernel. In a memory hotplug system, any numa node the kernel resides in should be unhotpluggable. And for a modern server, each node could have at least 16GB memory. So memory around the kernel image is highly likely unhotpluggable. ACPI SRAT (System Resource Affinity Table) contains the memory hotplug info. But before SRAT is parsed, memblock has already started to allocate memory for the kernel. So we need to prevent memblock from doing this. So direct memory mapping page tables setup is the case. init_mem_mapping() is called before SRAT is parsed. To prevent page tables being allocated within hotpluggable memory, we will use bottom-up direction to allocate page tables from the end of kernel image to the higher memory. Acked-by: Tejun Heo Signed-off-by: Tang Chen Signed-off-by: Zhang Yanfei --- arch/x86/mm/init.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 69 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index ea2be79..5cea9ed 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -458,6 +458,51 @@ static void __init memory_map_top_down(unsigned long map_start, init_range_memory_mapping(real_end, map_end); } +/** + * memory_map_bottom_up - Map [map_start, map_end) bottom up + * @map_start: start address of the target memory range + * @map_end: end address of the target memory range + * + * This function will setup direct mapping for memory range + * [map_start, map_end) in bottom-up. Since we have limited the + * bottom-up allocation above the kernel, the page tables will + * be allocated just above the kernel and we map the memory + * in [map_start, map_end) in bottom-up. + */ +static void __init memory_map_bottom_up(unsigned long map_start, + unsigned long map_end) +{ + unsigned long next, new_mapped_ram_size, start; + unsigned long mapped_ram_size = 0; + /* step_size need to be small so pgt_buf from BRK could cover it */ + unsigned long step_size = PMD_SIZE; + + start = map_start; + min_pfn_mapped = start >> PAGE_SHIFT; + + /* + * We start from the bottom (@map_start) and go to the top (@map_end). 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index ea2be79..5cea9ed 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -458,6 +458,51 @@ static void __init memory_map_top_down(unsigned long map_start,
 	init_range_memory_mapping(real_end, map_end);
 }
 
+/**
+ * memory_map_bottom_up - Map [map_start, map_end) bottom up
+ * @map_start: start address of the target memory range
+ * @map_end: end address of the target memory range
+ *
+ * This function will set up the direct mapping for the memory range
+ * [map_start, map_end) in bottom-up direction. Since we have limited
+ * the bottom-up allocation to be above the kernel, the page tables
+ * will be allocated just above the kernel, and we map the memory in
+ * [map_start, map_end) in bottom-up direction as well.
+ */
+static void __init memory_map_bottom_up(unsigned long map_start,
+					unsigned long map_end)
+{
+	unsigned long next, new_mapped_ram_size, start;
+	unsigned long mapped_ram_size = 0;
+	/* step_size needs to be small so the pgt_buf from BRK can cover it */
+	unsigned long step_size = PMD_SIZE;
+
+	start = map_start;
+	min_pfn_mapped = start >> PAGE_SHIFT;
+
+	/*
+	 * We start from the bottom (@map_start) and go to the top (@map_end).
+	 * The memblock_find_in_range() call gets us a block of RAM in
+	 * [min_pfn_mapped, max_pfn_mapped) that is used as new pages
+	 * for the page tables.
+	 */
+	while (start < map_end) {
+		if (map_end - start > step_size) {
+			next = round_up(start + 1, step_size);
+			if (next > map_end)
+				next = map_end;
+		} else
+			next = map_end;
+
+		new_mapped_ram_size = init_range_memory_mapping(start, next);
+		start = next;
+
+		if (new_mapped_ram_size > mapped_ram_size)
+			step_size <<= STEP_SIZE_SHIFT;
+		mapped_ram_size += new_mapped_ram_size;
+	}
+}
+
 void __init init_mem_mapping(void)
 {
 	unsigned long end;
@@ -473,8 +518,30 @@ void __init init_mem_mapping(void)
 	/* the ISA range is always mapped regardless of memory holes */
 	init_memory_mapping(0, ISA_END_ADDRESS);
 
-	/* setup direct mapping for range [ISA_END_ADDRESS, end) in top-down*/
-	memory_map_top_down(ISA_END_ADDRESS, end);
+	/*
+	 * If the allocation is in bottom-up direction, we set up the direct
+	 * mapping in bottom-up, otherwise we set it up in top-down.
+	 */
+	if (memblock_bottom_up()) {
+		unsigned long kernel_end;
+
+#ifdef CONFIG_X86
+		kernel_end = __pa_symbol(_end);
+#else
+		kernel_end = __pa(RELOC_HIDE((unsigned long)(_end), 0));
+#endif
+		/*
+		 * We need two separate calls here because we want to allocate
+		 * the page tables above the kernel. So we first map
+		 * [kernel_end, end) to make the memory above the kernel mapped
+		 * as soon as possible, and then use the page tables allocated
+		 * above the kernel to map [ISA_END_ADDRESS, kernel_end).
+		 */
+		memory_map_bottom_up(kernel_end, end);
+		memory_map_bottom_up(ISA_END_ADDRESS, kernel_end);
+	} else {
+		memory_map_top_down(ISA_END_ADDRESS, end);
+	}
 
 #ifdef CONFIG_X86_64
 	if (max_pfn > max_low_pfn) {
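One more note on the loop in memory_map_bottom_up(): step_size starts
at PMD_SIZE so the BRK-backed pgt_buf can satisfy the very first
page-table allocations, and it is multiplied by 2^STEP_SIZE_SHIFT
whenever a round maps more memory than all previous rounds combined, so
the mapping window grows quickly once enough memory is mapped to hold
new page tables. A rough user-space sketch of that progression follows;
the concrete values are assumptions (PMD_SIZE = 2MB and
STEP_SIZE_SHIFT = 5, as in arch/x86/mm/init.c at the time of this
series, plus a made-up kernel_end and RAM size):

/*
 * Standalone sketch (not kernel code) of how the bottom-up loop grows
 * its mapping window. init_range_memory_mapping() is stood in for by
 * assuming the whole range is usable RAM. Assumes a 64-bit
 * unsigned long.
 */
#include <stdio.h>

#define PMD_SIZE	(2UL << 20)	/* 2MB, x86_64 value */
#define STEP_SIZE_SHIFT	5

/* round_up() for power-of-two alignment, as in the kernel */
#define round_up(x, y)	(((x) + (y) - 1) & ~((y) - 1))

int main(void)
{
	unsigned long map_start = 16UL << 20;	/* pretend kernel_end: 16MB */
	unsigned long map_end = 64UL << 30;	/* pretend end of RAM: 64GB */
	unsigned long step_size = PMD_SIZE;
	unsigned long mapped_ram_size = 0;
	unsigned long start = map_start, next, new_mapped;

	while (start < map_end) {
		if (map_end - start > step_size) {
			next = round_up(start + 1, step_size);
			if (next > map_end)
				next = map_end;
		} else {
			next = map_end;
		}

		/* stand-in for init_range_memory_mapping(start, next) */
		new_mapped = next - start;

		printf("map [%#012lx, %#012lx) step_size=%luMB\n",
		       start, next, step_size >> 20);
		start = next;

		/* only grow the step after a big range got mapped */
		if (new_mapped > mapped_ram_size)
			step_size <<= STEP_SIZE_SHIFT;
		mapped_ram_size += new_mapped;
	}
	return 0;
}

With these assumed values, the windows come out as 2MB, 62MB, ~2GB and
then the rest of RAM, i.e. the whole of a 64GB machine is covered in
four rounds while the very first allocations stay small enough for the
BRK-backed pgt_buf.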