From patchwork Thu Oct 23 19:14:34 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Rutland X-Patchwork-Id: 5142761 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id B9587C11AC for ; Thu, 23 Oct 2014 19:21:48 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 992532015E for ; Thu, 23 Oct 2014 19:21:47 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5677A200ED for ; Thu, 23 Oct 2014 19:21:46 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1XhNqv-0000FJ-6Z; Thu, 23 Oct 2014 19:15:17 +0000 Received: from cam-admin0.cambridge.arm.com ([217.140.96.50]) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1XhNqr-0007Yo-Ox for linux-arm-kernel@lists.infradead.org; Thu, 23 Oct 2014 19:15:15 +0000 Received: from leverpostej (leverpostej.cambridge.arm.com [10.1.205.151]) by cam-admin0.cambridge.arm.com (8.12.6/8.12.6) with ESMTP id s9NJEgwo008947; Thu, 23 Oct 2014 20:14:42 +0100 (BST) Date: Thu, 23 Oct 2014 20:14:34 +0100 From: Mark Rutland To: "msalter@redhat.com" Subject: Re: [PATCH 06/10] arm64/efi: use UEFI memory map unconditionally if available Message-ID: <20141023191433.GB20121@leverpostej> References: <1413987713-30528-1-git-send-email-ard.biesheuvel@linaro.org> <1413987713-30528-7-git-send-email-ard.biesheuvel@linaro.org> <1413997616.2985.74.camel@deneb.redhat.com> <20141023155428.GA977@leverpostej> <1414081198.6829.12.camel@deneb.redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1414081198.6829.12.camel@deneb.redhat.com> Thread-Topic: [PATCH 06/10] arm64/efi: use UEFI memory map unconditionally if available Accept-Language: en-GB, en-US Content-Language: en-US User-Agent: Mutt/1.5.21 (2010-09-15) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20141023_121514_228180_039998D9 X-CRM114-Status: GOOD ( 34.72 ) X-Spam-Score: -6.4 (------) Cc: "matt.fleming@intel.com" , "yi.li@linaro.org" , Ard Biesheuvel , Catalin Marinas , Will Deacon , "leif.lindholm@linaro.org" , "roy.franz@linaro.org" , "linux-efi@vger.kernel.org" , "dyoung@redhat.com" , "linux-arm-kernel@lists.infradead.org" X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-3.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_NONE, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP [...] > > > It also looks like EFI_BOOT flag will be set even if uefi_init fails. > > > If uefi_init fails, we only need reserve_regions() for the purpose > > > of adding memblocks. Otherwise, we end up wasting a lot of memory. > > > > What memory are we wasting in that case? Surely the only items that we > > could choose to throw away if we failed uefi_init are > > EFI_ACPI_RECLAIM_MEMORY and EFI_MEMORY_RUNTIME? > > > > We might want to keep those around so we can kexec into a kernel where > > we can make use of them. Surely they shouldn't take up a significant > > proportion of the available memory? > > In addition, reserve_regions() also reserves BOOT_SERVICES (which gets > freed up later after set_virtual_address_map(). That won't happen if > uefi_init() fails. In any case, if uefi_init() fails, we won't be able > to find the ACPI tables kexec kernel won't be able to either. If you need the ACPI tables, the first kernel won't boot. However, if we fail to map the runtime services data and code, that doesn't mean the next kernel can't. We might have failed to map the runtime services due to an early_ioremap bug or similar, and that could be fixed in the kernel we're about to kexec to. If we're going to nuke runtime regions, we should nuke them in the memory map descriptors as a courtesy to the next kernel. Otherwise we could be more courteous and simply leave them be (which is my preference). > The BOOT_SERVICES stuff is the big chunk. There was some discussion of > not reserving that but no one has gotten around to writing the patch > to clean out the code that does it. I've just spent some time doing so; patch below. It boots happily on my Juno (with 4KiB pages at least), but I haven't given it much of a stress test. I wasn't sure if unusable memory could show up with EFI_MEMORY_WB attributes. If so, this patch still won't prevent that from being mapped as cacheable, but that should be easy to fix. Thanks, Mark. ---->8---- From 581d5bac5b4a6f93e23737012b71c58d809af6bb Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Thu, 23 Oct 2014 19:41:55 +0100 Subject: [PATCH] arm64: efi: Simplify memory descriptor handling Currently discovering the memory map from UEFI is a multi-stage affair, where the boot services code and data are kept around for much longer than necessary. Additionally, potential overlap between memory and IO mappings at kernel page granularity is not handled. This patch attempts to solve both of these issues, with the addition and filtering of memory regions being handled in a single function. First, all normal memory (i.e. anything which can be mapped writeback cacheable) is added to memblock. Second, IO and reserved regions are filtered out of memblock at kernel page granularity, to prevent potential conflicting mappings. As some reserved regions must be present in the idmap these are reserved. Everything else that's not normal memory is removed, preventing the possibility of a cacheable mapping covering something it shouldn't. Signed-off-by: Mark Rutland Cc: Ard Biesheuvel Cc: Leif Lindholm --- arch/arm64/kernel/efi.c | 181 +++++++++++------------------------------------- 1 file changed, 42 insertions(+), 139 deletions(-) diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index 03aaa99..8873c28 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -154,161 +154,66 @@ static __init int is_reserve_region(efi_memory_desc_t *md) return 0; } -static __init void reserve_regions(void) +static __init void efi_add_memory(void) { efi_memory_desc_t *md; - u64 paddr, npages, size; + u64 paddr, size; if (uefi_debug) - pr_info("Processing EFI memory map:\n"); + pr_info("EFI: adding normal memory:\n"); for_each_efi_memory_desc(&memmap, md) { + if (!is_normal_ram(md)) + continue; + paddr = md->phys_addr; - npages = md->num_pages; + size = md->num_pages << EFI_PAGE_SHIFT; if (uefi_debug) - pr_info(" 0x%012llx-0x%012llx [%s]", - paddr, paddr + (npages << EFI_PAGE_SHIFT) - 1, + pr_info(" 0x%012llx-0x%012llx [%s]\n", + paddr, size - 1, memory_type_name[md->type]); - memrange_efi_to_native(&paddr, &npages); - size = npages << PAGE_SHIFT; + early_init_dt_add_memory_arch(paddr, size); + } + + /* + * Regions describe in an EFI_MEMORY_DESCRIPTOR are only guaranteed to + * be 4KiB aligned but the kernel page size could be larger, and + * regions with different attributes could fall into the same kernel + * page. Thus we must remove any memory in the same kernel page as an + * IO region. + * + * Reserved regions of memory need to be in the idmap, so just reserve + * them. + */ + if (uefi_debug) + pr_info("EFI: correcting for overlapping regions:\n"); + + for_each_efi_memory_desc(&memmap, md) { + if (!is_reserve_region(md) && is_normal_ram(md)) + continue; + + paddr = md->phys_addr; + size = md->num_pages << EFI_PAGE_SHIFT; + + size = PAGE_ALIGN(size + (paddr & ~PAGE_MASK)); + paddr &= PAGE_MASK; - if (is_normal_ram(md)) - early_init_dt_add_memory_arch(paddr, size); + if (uefi_debug) + pr_info(" 0x%012llx-0x%012llx [%s]\n", + paddr, size - 1, + memory_type_name[md->type]); - if (is_reserve_region(md) || - md->type == EFI_BOOT_SERVICES_CODE || - md->type == EFI_BOOT_SERVICES_DATA) { + if (is_reserve_region(md)) memblock_reserve(paddr, size); - if (uefi_debug) - pr_cont("*"); - } - - if (uefi_debug) - pr_cont("\n"); + else + memblock_remove(paddr, size); } set_bit(EFI_MEMMAP, &efi.flags); } - -static u64 __init free_one_region(u64 start, u64 end) -{ - u64 size = end - start; - - if (uefi_debug) - pr_info(" EFI freeing: 0x%012llx-0x%012llx\n", start, end - 1); - - free_bootmem_late(start, size); - return size; -} - -static u64 __init free_region(u64 start, u64 end) -{ - u64 map_start, map_end, total = 0; - - if (end <= start) - return total; - - map_start = (u64)memmap.phys_map; - map_end = PAGE_ALIGN(map_start + (memmap.map_end - memmap.map)); - map_start &= PAGE_MASK; - - if (start < map_end && end > map_start) { - /* region overlaps UEFI memmap */ - if (start < map_start) - total += free_one_region(start, map_start); - - if (map_end < end) - total += free_one_region(map_end, end); - } else - total += free_one_region(start, end); - - return total; -} - -static void __init free_boot_services(void) -{ - u64 total_freed = 0; - u64 keep_end, free_start, free_end; - efi_memory_desc_t *md; - - /* - * If kernel uses larger pages than UEFI, we have to be careful - * not to inadvertantly free memory we want to keep if there is - * overlap at the kernel page size alignment. We do not want to - * free is_reserve_region() memory nor the UEFI memmap itself. - * - * The memory map is sorted, so we keep track of the end of - * any previous region we want to keep, remember any region - * we want to free and defer freeing it until we encounter - * the next region we want to keep. This way, before freeing - * it, we can clip it as needed to avoid freeing memory we - * want to keep for UEFI. - */ - - keep_end = 0; - free_start = 0; - - for_each_efi_memory_desc(&memmap, md) { - u64 paddr, npages, size; - - if (is_reserve_region(md)) { - /* - * We don't want to free any memory from this region. - */ - if (free_start) { - /* adjust free_end then free region */ - if (free_end > md->phys_addr) - free_end -= PAGE_SIZE; - total_freed += free_region(free_start, free_end); - free_start = 0; - } - keep_end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT); - continue; - } - - if (md->type != EFI_BOOT_SERVICES_CODE && - md->type != EFI_BOOT_SERVICES_DATA) { - /* no need to free this region */ - continue; - } - - /* - * We want to free memory from this region. - */ - paddr = md->phys_addr; - npages = md->num_pages; - memrange_efi_to_native(&paddr, &npages); - size = npages << PAGE_SHIFT; - - if (free_start) { - if (paddr <= free_end) - free_end = paddr + size; - else { - total_freed += free_region(free_start, free_end); - free_start = paddr; - free_end = paddr + size; - } - } else { - free_start = paddr; - free_end = paddr + size; - } - if (free_start < keep_end) { - free_start += PAGE_SIZE; - if (free_start >= free_end) - free_start = 0; - } - } - if (free_start) - total_freed += free_region(free_start, free_end); - - if (total_freed) - pr_info("Freed 0x%llx bytes of EFI boot services memory", - total_freed); -} - void __init efi_init(void) { struct efi_fdt_params params; @@ -330,7 +235,7 @@ void __init efi_init(void) if (uefi_init() < 0) return; - reserve_regions(); + efi_add_memory(); } void __init efi_idmap_init(void) @@ -452,8 +357,6 @@ static int __init arm64_enter_virtual_mode(void) kfree(virtmap); - free_boot_services(); - if (status != EFI_SUCCESS) { pr_err("Failed to set EFI virtual address map! [%lx]\n", status);