From patchwork Tue Aug 13 22:29:36 2024
X-Patchwork-Submitter: Thomas Gleixner <tglx@linutronix.de>
X-Patchwork-Id: 13762635
From: Thomas Gleixner <tglx@linutronix.de>
To: Alistair Popple
Cc: x86@kernel.org, Dan Williams, dave.hansen@linux.intel.com,
 luto@kernel.org, peterz@infradead.org, max8rr8@gmail.com,
 linux-kernel@vger.kernel.org, jhubbard@nvidia.com, Kees Cook,
 Andrew Morton, David Hildenbrand, Oscar Salvador, linux-mm@kvack.org
Subject: x86/kaslr: Expose and use the end of the physical memory address space
In-Reply-To: <87le10p3ak.ffs@tglx>
References: <20230810100011.14552-1-max8rr8@gmail.com>
 <87le17yu5y.ffs@tglx>
 <66b4eb2a62f6_c1448294b0@dwillia2-xfh.jf.intel.com.notmuch>
 <877ccryor7.ffs@tglx>
 <66b4f305eb227_c144829443@dwillia2-xfh.jf.intel.com.notmuch>
 <66b4f4a522508_c1448294f2@dwillia2-xfh.jf.intel.com.notmuch>
 <87zfpmyhvr.ffs@tglx>
 <66b523ac448e2_c1448294ec@dwillia2-xfh.jf.intel.com.notmuch>
 <87seve4e37.fsf@nvdebian.thelocal>
 <66b59314b3d4_c1448294d3@dwillia2-xfh.jf.intel.com.notmuch>
 <87zfpks23v.ffs@tglx>
 <87o75y428z.fsf@nvdebian.thelocal>
 <87ikw6rrau.ffs@tglx>
 <87frr9swmw.ffs@tglx>
 <87bk1x42vk.fsf@nvdebian.thelocal>
 <87sev8rfyx.ffs@tglx>
 <87le10p3ak.ffs@tglx>
Date: Wed, 14 Aug 2024 00:29:36 +0200
Message-ID: <87ed6soy3z.ffs@tglx>
MIME-Version: 1.0

iounmap() on x86 occasionally fails to unmap because the provided valid
ioremap address is not below high_memory. It turned out that this
happens due to KASLR.
KASLR uses the full address space between PAGE_OFFSET and vaddr_end to
randomize the starting points of the direct map, vmalloc and vmemmap
regions. It thereby limits the size of the direct map by using the
installed memory size plus an extra configurable margin for hot-plug
memory. This limitation is done to gain more randomization space,
because otherwise only the holes between the direct map, vmalloc,
vmemmap and vaddr_end would be usable for randomization.

The limited direct map size is not exposed to the rest of the kernel, so
the memory hot-plug and resource management related code paths still
operate under the assumption that the available address space can be
determined with MAX_PHYSMEM_BITS.

request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of the
direct map and, if unlucky, this address is in the vmalloc space, which
causes high_memory to become greater than VMALLOC_START and consequently
causes iounmap() to fail for valid ioremap addresses.

MAX_PHYSMEM_BITS cannot be changed for that because the randomization
does not align with address bit boundaries and there are other places
which actually require to know the maximum number of address bits. All
remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found
to be correct.

Cure this by exposing the end of the direct map via PHYSMEM_END and
using that for the memory hot-plug and resource management related
places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case
PHYSMEM_END maps to a variable which is initialized by the KASLR
initialization code; otherwise it is based on MAX_PHYSMEM_BITS as
before.

To prevent future hiccups, add a check into add_pages() to catch callers
trying to add memory above PHYSMEM_END.

Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions")
Reported-by: Max Ramanouski
Reported-by: Alistair Popple
Signed-off-by: Thomas Gleixner
Reviewed-by: Alistair Popple
Tested-by: Alistair Popple
Reviewed-by: Dan Williams
Reviewed-by: Kees Cook
Tested-by: Max Ramanouski
---
 arch/x86/include/asm/page_64.h          |    1 +
 arch/x86/include/asm/pgtable_64_types.h |    4 ++++
 arch/x86/mm/init_64.c                   |    4 ++++
 arch/x86/mm/kaslr.c                     |   21 ++++++++++++++++++---
 include/linux/mm.h                      |    4 ++++
 kernel/resource.c                       |    6 ++----
 mm/memory_hotplug.c                     |    2 +-
 mm/sparse.c                             |    2 +-
 8 files changed, 35 insertions(+), 9 deletions(-)
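For illustration only (editor's note, not part of the patch): a minimal
user-space sketch of the arithmetic described above. MAX_PHYSMEM_BITS,
the 64GB direct map size and the 1GB request are made-up example values,
and request_free_mem_region()'s top-down search is reduced to a single
subtraction:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define MAX_PHYSMEM_BITS	46	/* x86-64 with 4-level paging */

int main(void)
{
	/* Made-up machine: 64GB of RAM, so the KASLR-limited direct
	 * map ends far below the architectural maximum. */
	uint64_t physmem_end  = (64ULL << 30) - 1;
	uint64_t legacy_limit = (1ULL << MAX_PHYSMEM_BITS) - 1;
	uint64_t size         = 1ULL << 30;	/* 1GB request */

	/* Pre-fix behaviour: the first candidate is taken from the
	 * architectural limit downwards. */
	uint64_t first_alloc = legacy_limit - size + 1;

	printf("direct map end:   %#" PRIx64 "\n", physmem_end);
	printf("first allocation: %#" PRIx64 "\n", first_alloc);

	if (first_alloc > physmem_end)
		printf("allocation lands past the direct map; its virtual\n"
		       "alias can fall into vmalloc space and push\n"
		       "high_memory above VMALLOC_START\n");
	return 0;
}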
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -17,6 +17,7 @@ extern unsigned long phys_base;
 extern unsigned long page_offset_base;
 extern unsigned long vmalloc_base;
 extern unsigned long vmemmap_base;
+extern unsigned long physmem_end;
 
 static __always_inline unsigned long __phys_addr_nodebug(unsigned long x)
 {
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -140,6 +140,10 @@ extern unsigned int ptrs_per_p4d;
 # define VMEMMAP_START		__VMEMMAP_BASE_L4
 #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
 
+#ifdef CONFIG_RANDOMIZE_MEMORY
+# define PHYSMEM_END		physmem_end
+#endif
+
 /*
  * End of the region for which vmalloc page tables are pre-allocated.
  * For non-KMSAN builds, this is the same as VMALLOC_END.
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -958,8 +958,12 @@ static void update_end_of_memory_vars(u6
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 	      struct mhp_params *params)
 {
+	unsigned long end = ((start_pfn + nr_pages) << PAGE_SHIFT) - 1;
 	int ret;
 
+	if (WARN_ON_ONCE(end > PHYSMEM_END))
+		return -ERANGE;
+
 	ret = __add_pages(nid, start_pfn, nr_pages, params);
 	WARN_ON_ONCE(ret);
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -47,13 +47,24 @@ static const unsigned long vaddr_end = C
  */
 static __initdata struct kaslr_memory_region {
 	unsigned long *base;
+	unsigned long *end;
 	unsigned long size_tb;
 } kaslr_regions[] = {
-	{ &page_offset_base, 0 },
-	{ &vmalloc_base, 0 },
-	{ &vmemmap_base, 0 },
+	{
+		.base = &page_offset_base,
+		.end = &physmem_end,
+	},
+	{
+		.base = &vmalloc_base,
+	},
+	{
+		.base = &vmemmap_base,
+	},
 };
 
+/* The end of the possible address space for physical memory */
+unsigned long physmem_end __ro_after_init;
+
 /* Get size in bytes used by the memory region */
 static inline unsigned long get_padding(struct kaslr_memory_region *region)
 {
@@ -82,6 +93,8 @@ void __init kernel_randomize_memory(void
 	BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE);
 	BUILD_BUG_ON(vaddr_end > __START_KERNEL_map);
 
+	/* Preset the end of the possible address space for physical memory */
+	physmem_end = ((1ULL << MAX_PHYSMEM_BITS) - 1);
 	if (!kaslr_memory_enabled())
 		return;
 
@@ -134,6 +147,8 @@ void __init kernel_randomize_memory(void
 		 */
 		vaddr += get_padding(&kaslr_regions[i]);
 		vaddr = round_up(vaddr + 1, PUD_SIZE);
+		if (kaslr_regions[i].end)
+			*kaslr_regions[i].end = __pa(vaddr) - 1;
 		remain_entropy -= entropy;
 	}
 }
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -97,6 +97,10 @@ extern const int mmap_rnd_compat_bits_ma
 extern int mmap_rnd_compat_bits __read_mostly;
 #endif
 
+#ifndef PHYSMEM_END
+# define PHYSMEM_END	((1ULL << MAX_PHYSMEM_BITS) - 1)
+#endif
+
 #include <asm/page.h>
 #include <asm/processor.h>
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1826,8 +1826,7 @@ static resource_size_t gfr_start(struct
 	if (flags & GFR_DESCENDING) {
 		resource_size_t end;
 
-		end = min_t(resource_size_t, base->end,
-			    (1ULL << MAX_PHYSMEM_BITS) - 1);
+		end = min_t(resource_size_t, base->end, PHYSMEM_END);
 		return end - size + 1;
 	}
 
@@ -1844,8 +1843,7 @@ static bool gfr_continue(struct resource
 	 * @size did not wrap 0.
 	 */
 	return addr > addr - size &&
-	       addr <= min_t(resource_size_t, base->end,
-			     (1ULL << MAX_PHYSMEM_BITS) - 1);
+	       addr <= min_t(resource_size_t, base->end, PHYSMEM_END);
 }
 
 static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1681,7 +1681,7 @@ struct range __weak arch_get_mappable_ra
 
 struct range mhp_get_pluggable_range(bool need_mapping)
 {
-	const u64 max_phys = (1ULL << MAX_PHYSMEM_BITS) - 1;
+	const u64 max_phys = PHYSMEM_END;
 	struct range mhp_range;
 
 	if (need_mapping) {
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -129,7 +129,7 @@ static inline int sparse_early_nid(struc
 static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
 						unsigned long *end_pfn)
 {
-	unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
+	unsigned long max_sparsemem_pfn = (PHYSMEM_END + 1) >> PAGE_SHIFT;
 
 	/*
 	 * Sanity checks - do not allow an architecture to pass
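As a usage note (editor's sketch, not from the patch): generic code that
needs the last valid physical address should now clamp against
PHYSMEM_END instead of recomputing the MAX_PHYSMEM_BITS limit, so the
KASLR-aware x86 definition is picked up automatically. phys_range_end()
below is a hypothetical helper, not a kernel API:

#include <linux/minmax.h>	/* min_t() */
#include <linux/types.h>	/* resource_size_t */
#include <linux/mm.h>		/* PHYSMEM_END fallback definition */

/*
 * Hypothetical helper showing the consumption pattern the patch
 * establishes: with CONFIG_RANDOMIZE_MEMORY on x86, PHYSMEM_END
 * resolves to the KASLR-initialized physmem_end variable; everywhere
 * else it stays (1ULL << MAX_PHYSMEM_BITS) - 1 as before.
 */
static inline resource_size_t phys_range_end(resource_size_t res_end)
{
	/* Never hand out addresses beyond the usable physical space */
	return min_t(resource_size_t, res_end, PHYSMEM_END);
}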