From patchwork Fri Jan 11 05:12:56 2019
X-Patchwork-Submitter: Pingfan Liu <kernelfans@gmail.com>
X-Patchwork-Id: 10757377
From: Pingfan Liu <kernelfans@gmail.com>
To: linux-kernel@vger.kernel.org
Cc: Pingfan Liu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 "H. Peter Anvin", Dave Hansen, Andy Lutomirski, Peter Zijlstra,
 "Rafael J. Wysocki", Len Brown, Yinghai Lu, Tejun Heo, Chao Fan,
 Baoquan He, Juergen Gross, Andrew Morton, Mike Rapoport,
 Vlastimil Babka, Michal Hocko, x86@kernel.org,
 linux-acpi@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64
Date: Fri, 11 Jan 2019 13:12:56 +0800
Message-Id: <1547183577-20309-7-git-send-email-kernelfans@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1547183577-20309-1-git-send-email-kernelfans@gmail.com>
References: <1547183577-20309-1-git-send-email-kernelfans@gmail.com>

Wysocki" , Len Brown , Yinghai Lu , Tejun Heo , Chao Fan , Baoquan He , Juergen Gross , Andrew Morton , Mike Rapoport , Vlastimil Babka , Michal Hocko , x86@kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Subject: [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64 Date: Fri, 11 Jan 2019 13:12:56 +0800 Message-Id: <1547183577-20309-7-git-send-email-kernelfans@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1547183577-20309-1-git-send-email-kernelfans@gmail.com> References: <1547183577-20309-1-git-send-email-kernelfans@gmail.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Although kaslr-kernel can avoid to stain the movable node. [1] But the pgtable can still stain the movable node. That is a probability problem, although low, but exist. This patch tries to make it certainty by allocating pgtable on unmovable node, instead of following kernel end. There are two acheivements by this patch: -1st. keep the subtree of pgtable away from movable node. With the previous patch, at the point of init_mem_mapping(), memblock allocator can work with the knowledge of acpi memory hotmovable info, and avoid to stain the movable node. As a result, memory_map_bottom_up() is not needed any more. The following figure show the defection of current bottom-up style: [startA, endA][startB, "kaslr kernel verly close to" endB][startC, endC] If nodeA,B is unmovable, while nodeC is movable, then init_mem_mapping() can generate pgtable on nodeC, which stain movable node. For more lengthy background, please refer to Background section -2nd. simplify the logic of memory_map_top_down() Thanks to the help of early_make_pgtable(), x86_64 can directly set up the subtree of pgtable at any place, hence the careful iteration in memory_map_top_down() can be discard. *Background section* When kaslr kernel can be guaranteed to sit inside unmovable node after [1]. But if kaslr kernel is located near the end of the movable node, then bottom-up allocator may create pagetable which crosses the boundary between unmovable node and movable node. It is a probability issue, two factors include -1. how big the gap between kernel end and unmovable node's end. -2. how many memory does the system own. Alternative way to fix this issue is by increasing the gap by boot/compressed/kaslr*. But taking the scenario of PB level memory, the pagetable will take server MB even if using 1GB page, different page attr and fragment will make things worse. So it is hard to decide how much should the gap increase. [1]: https://lore.kernel.org/patchwork/patch/1029376/ Signed-off-by: Pingfan Liu Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Dave Hansen Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: "Rafael J. 
Wysocki" Cc: Len Brown Cc: Yinghai Lu Cc: Tejun Heo Cc: Chao Fan Cc: Baoquan He Cc: Juergen Gross Cc: Andrew Morton Cc: Mike Rapoport Cc: Vlastimil Babka Cc: Michal Hocko Cc: x86@kernel.org Cc: linux-acpi@vger.kernel.org Cc: linux-mm@kvack.org --- arch/x86/kernel/setup.c | 4 ++-- arch/x86/mm/init.c | 56 ++++++++++++++++++++++++++++++------------------- 2 files changed, 36 insertions(+), 24 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 9b57e01..00a1b84 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -827,7 +827,7 @@ static void early_acpi_parse(void) early_acpi_boot_init(); initmem_init(); /* check whether memory is returned or not */ - start = memblock_find_in_range(start, end, 1<<24, 1); + start = memblock_find_in_range(start, end, 1 << 24, 1); if (!start) pr_warn("the above acpi routines change and consume memory\n"); memblock_set_current_limit(orig_start, orig_end, enforcing); @@ -1135,7 +1135,7 @@ void __init setup_arch(char **cmdline_p) trim_platform_memory_ranges(); trim_low_memory_range(); -#ifdef CONFIG_MEMORY_HOTPLUG +#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_X86_32) /* * Memory used by the kernel cannot be hot-removed because Linux * cannot migrate the kernel pages. When memory hotplug is diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 385b9cd..003ad77 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -72,8 +72,6 @@ static unsigned long __initdata pgt_buf_start; static unsigned long __initdata pgt_buf_end; static unsigned long __initdata pgt_buf_top; -static unsigned long min_pfn_mapped; - static bool __initdata can_use_brk_pgt = true; static unsigned long min_pfn_allowed; @@ -532,6 +530,10 @@ static unsigned long __init init_range_memory_mapping( return mapped_ram_size; } +#ifdef CONFIG_X86_32 + +static unsigned long min_pfn_mapped; + static unsigned long __init get_new_step_size(unsigned long step_size) { /* @@ -653,6 +655,32 @@ static void __init memory_map_bottom_up(unsigned long map_start, } } +static unsigned long __init init_range_memory_mapping32( + unsigned long r_start, unsigned long r_end) +{ + /* + * If the allocation is in bottom-up direction, we setup direct mapping + * in bottom-up, otherwise we setup direct mapping in top-down. + */ + if (memblock_bottom_up()) { + unsigned long kernel_end = __pa_symbol(_end); + + /* + * we need two separate calls here. This is because we want to + * allocate page tables above the kernel. So we first map + * [kernel_end, end) to make memory above the kernel be mapped + * as soon as possible. And then use page tables allocated above + * the kernel to map [ISA_END_ADDRESS, kernel_end). + */ + memory_map_bottom_up(kernel_end, r_end); + memory_map_bottom_up(r_start, kernel_end); + } else { + memory_map_top_down(r_start, r_end); + } +} + +#endif + void __init init_mem_mapping(void) { unsigned long end; @@ -663,6 +691,8 @@ void __init init_mem_mapping(void) #ifdef CONFIG_X86_64 end = max_pfn << PAGE_SHIFT; + /* allow alloc_low_pages() to allocate from memblock */ + set_alloc_range(ISA_END_ADDRESS, end); #else end = max_low_pfn << PAGE_SHIFT; #endif @@ -673,32 +703,14 @@ void __init init_mem_mapping(void) /* Init the trampoline, possibly with KASLR memory offset */ init_trampoline(); - /* - * If the allocation is in bottom-up direction, we setup direct mapping - * in bottom-up, otherwise we setup direct mapping in top-down. - */ - if (memblock_bottom_up()) { - unsigned long kernel_end = __pa_symbol(_end); - - /* - * we need two separate calls here. 
This is because we want to - * allocate page tables above the kernel. So we first map - * [kernel_end, end) to make memory above the kernel be mapped - * as soon as possible. And then use page tables allocated above - * the kernel to map [ISA_END_ADDRESS, kernel_end). - */ - memory_map_bottom_up(kernel_end, end); - memory_map_bottom_up(ISA_END_ADDRESS, kernel_end); - } else { - memory_map_top_down(ISA_END_ADDRESS, end); - } - #ifdef CONFIG_X86_64 + init_range_memory_mapping(ISA_END_ADDRESS, end); if (max_pfn > max_low_pfn) { /* can we preseve max_low_pfn ?*/ max_low_pfn = max_pfn; } #else + init_range_memory_mapping32(ISA_END_ADDRESS, end); early_ioremap_page_table_range_init(); #endif