From patchwork Wed Apr 25 13:22:50 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Morse
X-Patchwork-Id: 10363231
From: James Morse
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: export memblock_reserve()d regions via /proc/iomem
Date: Wed, 25 Apr 2018 14:22:50 +0100
Message-Id: <20180425132250.9397-1-james.morse@arm.com>
X-Mailer: git-send-email 2.16.2
Cc: Mark Rutland, Ard Biesheuvel, Catalin Marinas, Tyler Baicar,
 Will Deacon, Akashi Takahiro, James Morse, Bhupesh Sharma

There has been some confusion around what is necessary to prevent kexec
overwriting important memory regions. memblock: reserve, or nomap?
Only memblock nomap regions are reported via /proc/iomem; kexec's
user-space doesn't know about memblock_reserve()d regions.

Until commit f56ab9a5b73ca ("efi/arm: Don't mark ACPI reclaim memory
as MEMBLOCK_NOMAP") the ACPI tables were nomap. Now they are reserved,
so kexec can overwrite them with the new kernel or initrd.
But this was always broken, as the UEFI memory map is also reserved
and not marked as nomap.

It turns out that while kexec-tools will pick up reserved sections
in iomem that look like:
| 80000000-dfffffff : System RAM
|   81000000-8158ffff : reserved
the reserved section is ignored by its 'locate_hole()' code.

To fix this, we need to describe memblock_reserve()d and nomap regions
as 'reserved' at the top level:
| 80000000-80ffffff : System RAM
| 81000000-8158ffff : reserved
| 81590000-dfffffff : System RAM

To complicate matters, our existing named sections are described as
being part of 'System RAM', but they are also memblock_reserve()d.
We need to keep this in case something is depending on it.

Doing this involves walking memblock multiple times.

First, add the 'System RAM' sections that are memory and not reserved.
These may be smaller than a page if part of the page is reserved. In this
case we want to describe the page as reserved, so we round these regions
down to the smallest page-size region, which may be empty. (We round up
the memblock_reserve()d regions to fill in the gaps.)

The boundaries for kernel_data are changed because paging_init() punches
holes in the _sdata -> _edata region, and this code can't add a named
region that crosses the boundary between memblock_reserve()d and normal
memory. The new helpers will catch any further overlapping regions.

Lastly, we add the memblock_reserve()d regions using
reserve_region_with_split(), which will fill in the gaps between the
existing named regions (e.g. the regions occupied by the __init code).
This call uses the slab allocator, so it has to run from an initcall.
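As a rough illustration of why the top-level layout matters, here is a
minimal user-space sketch. It is not kexec-tools code:
parse_top_level_ram() and top_level_ram_bytes() are hypothetical helpers,
and the assumption that a hole finder only trusts unindented 'System RAM'
lines is made purely for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from kexec-tools): parse one /proc/iomem style
 * line and report whether it is a top-level "System RAM" entry.
 * On success *start and *end hold the inclusive range.
 */
static bool parse_top_level_ram(const char *line, uint64_t *start,
				uint64_t *end)
{
	char name[64];

	assert(line != NULL);

	/* Nested entries are indented; this sketch skips them entirely. */
	if (line[0] == ' ' || line[0] == '\t')
		return false;

	if (sscanf(line, "%" SCNx64 "-%" SCNx64 " : %63[^\n]",
		   start, end, name) != 3)
		return false;

	return strcmp(name, "System RAM") == 0;
}

/* Sum the bytes that a top-level-only scan would treat as usable RAM. */
static uint64_t top_level_ram_bytes(const char *const *lines, size_t count)
{
	uint64_t total = 0;
	uint64_t start, end;
	size_t i;

	for (i = 0; i < count; i++) {
		if (parse_top_level_ram(lines[i], &start, &end))
			total += end - start + 1;
	}

	return total;
}
```

Fed the nested layout from the first listing above, this scan treats the
whole 80000000-dfffffff range as usable (0x60000000 bytes). Fed the
top-level layout from the second listing, it counts 0x5fa70000 bytes,
because the 0x590000-byte reserved range is no longer covered by a
'System RAM' line.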

Reported-by: Bhupesh Sharma
Reported-by: Tyler Baicar
Suggested-by: Akashi Takahiro
Signed-off-by: James Morse
CC: Ard Biesheuvel
CC: Mark Rutland
Tested-by: Tyler Baicar
---
If we do send this to stable:
Fixes: d28f6df1305a ("arm64/kexec: Add core kexec support")

If we're happy to modify user-space, we can do much neater things.

It looks like UEFI's careful 'memory map not mapped' code had me convinced
it was nomap.

 arch/arm64/kernel/setup.c | 136 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 113 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 30ad2f085d1f..e82c0d5c70f8 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -202,45 +202,135 @@ static void __init setup_machine_fdt(phys_addr_t dt_phys)
 	dump_stack_set_arch_desc("%s (DT)", name);
 }
 
+static struct resource * __init add_standard_resources(phys_addr_t start,
+							phys_addr_t end,
+							bool reserved)
+{
+	struct resource *res;
+
+	res = alloc_bootmem_low(sizeof(*res));
+
+	if (reserved) {
+		res->name = "reserved";
+		res->flags = IORESOURCE_MEM;
+	} else {
+		res->name = "System RAM";
+		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+	}
+	res->start = start;
+	res->end = end;
+
+	if (request_resource_conflict(&iomem_resource, res)) {
+		pr_warn_once("Attempted to add overlapping resources\n");
+		return NULL;
+	}
+
+	return res;
+}
+
+static void __init add_named_resources(struct resource *named_resource)
+{
+	phys_addr_t start, end;
+	struct resource *res;
+
+	start = __pfn_to_phys(PFN_DOWN(named_resource->start));
+	end = __pfn_to_phys(PFN_UP(named_resource->end)) - 1;
+	res = add_standard_resources(start, end, false);
+	if (res)
+		request_resource(res, named_resource);
+}
+
 static void __init request_standard_resources(void)
 {
+	phys_addr_t start, end;
 	struct memblock_region *region;
 	struct resource *res;
+	u64 i;
+	int num_res = 0;
 
 	kernel_code.start = __pa_symbol(_text);
 	kernel_code.end = __pa_symbol(__init_begin - 1);
 	kernel_data.start = __pa_symbol(_sdata);
-	kernel_data.end = __pa_symbol(_end - 1);
+	kernel_data.end = __pa_symbol(_edata - 1);
 
-	for_each_memblock(memory, region) {
-		res = alloc_bootmem_low(sizeof(*res));
-		if (memblock_is_nomap(region)) {
-			res->name = "reserved";
-			res->flags = IORESOURCE_MEM;
-		} else {
-			res->name = "System RAM";
-			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-		}
-		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
-		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+	/*
+	 * We can't allocate memory while walking free memory, count the number
+	 * of struct resource's we will need. Round start/end to the smallest
+	 * page-size region as we round the reserved regions up.
+	 */
+	for_each_free_mem_range(i, NUMA_NO_NODE, 0, &start, &end, NULL) {
+		start = ALIGN(start, PAGE_SIZE);
+		end = ALIGN_DOWN(end, PAGE_SIZE) - 1;
+		if (end > start)
+			num_res++;
+	}
+
+	/* our allocation may split a free memblock */
+	num_res++;
+	res = alloc_bootmem_low(num_res * sizeof(*res));
 
-		request_resource(&iomem_resource, res);
+	/*
+	 * Add the non-reserved memory regions. flag=0 means we skip nomap
+	 * regions too.
+	 */
+	for_each_free_mem_range(i, NUMA_NO_NODE, 0, &start, &end, NULL) {
+		if (WARN_ON(!num_res))
+			return;
+
+		res->name = "System RAM";
+		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+		res->start = ALIGN(start, PAGE_SIZE);
+		res->end = ALIGN_DOWN(end, PAGE_SIZE) - 1;
+		if (res->end > res->start) {
+			request_resource(&iomem_resource, res);
+			res++;
+			num_res--;
+		}
+	}
 
-		if (kernel_code.start >= res->start &&
-		    kernel_code.end <= res->end)
-			request_resource(res, &kernel_code);
-		if (kernel_data.start >= res->start &&
-		    kernel_data.end <= res->end)
-			request_resource(res, &kernel_data);
+	/* Add the named reserved regions and their system-ram parents */
+	add_named_resources(&kernel_code);
+	add_named_resources(&kernel_data);
 #ifdef CONFIG_KEXEC_CORE
-		/* Userspace will find "Crash kernel" region in /proc/iomem. */
-		if (crashk_res.end && crashk_res.start >= res->start &&
-		    crashk_res.end <= res->end)
-			request_resource(res, &crashk_res);
+	if (crashk_res.end)
+		add_named_resources(&crashk_res);
 #endif
+
+	/* Add the nomap regions */
+	for_each_memblock(memory, region) {
+		if (!memblock_is_nomap(region))
+			continue;
+
+		start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
+		end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+		add_standard_resources(start, end, true);
 	}
 }
 
+static int __init reserve_memblock_reserved_regions(void)
+{
+	phys_addr_t start, end, roundup_end = 0;
+	u64 i;
+
+	for_each_reserved_mem_region(i, &start, &end) {
+		if (end <= roundup_end)
+			continue; /* done already */
+
+		start = __pfn_to_phys(PFN_DOWN(start));
+		end = __pfn_to_phys(PFN_UP(end)) - 1;
+		roundup_end = end;
+
+		reserve_region_with_split(&iomem_resource, start, end,
+					  "reserved");
+	}
+
+	return 0;
+}
+/* reserve_region_with_split() requires the slab allocator: */
+arch_initcall(reserve_memblock_reserved_regions);
+
+
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
 
 void __init setup_arch(char **cmdline_p)
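
The rounding idea used in the hunks above (free ranges shrink inwards to
whole pages, reserved ranges grow outwards) can be sketched in user space
as follows. This is a sketch under stated assumptions, not the kernel
code: SKETCH_PAGE_SIZE is fixed at 4K, input ranges use an exclusive end,
the emptiness check is simplified, and shrink_free_range() and
grow_reserved_range() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4K pages. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* A physical range with an inclusive end, as printed in /proc/iomem. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Shrink a free range [start, end_excl) to the whole pages it fully
 * contains. Returns false when no complete page is left.
 */
static bool shrink_free_range(uint64_t start, uint64_t end_excl,
			      struct range *out)
{
	uint64_t first = (start + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
	uint64_t last_excl = end_excl & SKETCH_PAGE_MASK;

	if (last_excl <= first)
		return false;

	out->start = first;
	out->end = last_excl - 1;
	return true;
}

/*
 * Grow a reserved range [start, end_excl) outwards so that it covers
 * every page it touches.
 */
static struct range grow_reserved_range(uint64_t start, uint64_t end_excl)
{
	struct range r;

	assert(start < end_excl);

	r.start = start & SKETCH_PAGE_MASK;
	r.end = ((end_excl + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK) - 1;
	return r;
}
```

With these helpers, a free range ending part-way into a page gives that
page up, and the reserved range that starts in the same page grows to
claim all of it, so the two results never overlap and leave no gap.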