From patchwork Fri Jan 10 03:09:11 2020
X-Patchwork-Submitter: Anshuman Khandual
X-Patchwork-Id: 11326541
From: Anshuman Khandual
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
 catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com, ira.weiny@intel.com, david@redhat.com,
 mgorman@techsingularity.net, steve.capper@arm.com, Robin.Murphy@arm.com,
 steven.price@arm.com, broonie@kernel.org, cai@lca.pw,
 ard.biesheuvel@arm.com, cpandya@codeaurora.org, arunks@codeaurora.org,
 dan.j.williams@intel.com, Anshuman Khandual, logang@deltatee.com,
 valentin.schneider@arm.com, suzuki.poulose@arm.com, osalvador@suse.de
Subject: [PATCH V11 1/5] mm/hotplug: Introduce arch callback validating the hot remove range
Date: Fri, 10 Jan 2020 08:39:11 +0530
Message-Id: <1578625755-11792-2-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

Currently there are two interfaces to initiate memory range hot removal,
i.e. remove_memory() and __remove_memory(), both of which call
try_remove_memory(). The platform is then called through
arch_remove_memory() to tear down the required kernel page tables and
perform other arch specific procedures. But there are platforms like arm64
which might want to prevent removal of certain specific memory ranges
irrespective of their present usage or movability properties.

The current arch callback arch_remove_memory() comes too late in the
process to abort memory hot removal, as memory block devices and firmware
memory map entries would have already been removed. Platforms should be
able to abort the process before taking the mem_hotplug_lock with
mem_hotplug_begin(). This essentially requires a new arch callback for
memory range validation.

This differentiates memory range validation between the memory hot add and
hot remove paths by carving out a new helper
check_hotremove_memory_range() which incorporates a new arch callback.
This callback provides platforms an opportunity to refuse memory removal
at the very onset. In future the same principle can be extended to the
memory hot add path if required.
Platforms can choose to override this callback in order to reject
specific memory ranges from removal, or can just fall back to a default
implementation which allows removal of all memory ranges.

Cc: Andrew Morton
Signed-off-by: Anshuman Khandual
Reported-by: kbuild test robot
Reported-by: kbuild test robot
---
 include/linux/memory_hotplug.h |  7 +++++++
 mm/memory_hotplug.c            | 21 ++++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ba0dca6..f661bd5 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -305,6 +305,13 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 
+#ifndef arch_memory_removable
+static inline bool arch_memory_removable(u64 base, u64 size)
+{
+	return true;
+}
+#endif
+
 extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a91a072..7cdf800 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1014,6 +1014,23 @@ static int check_hotplug_memory_range(u64 start, u64 size)
 	return 0;
 }
 
+static int check_hotremove_memory_range(u64 start, u64 size)
+{
+	int rc;
+
+	BUG_ON(check_hotplug_memory_range(start, size));
+
+	/*
+	 * First check if the platform is willing to have this
+	 * memory range removed else just abort.
+	 */
+	rc = arch_memory_removable(start, size);
+	if (!rc)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int online_memory_block(struct memory_block *mem, void *arg)
 {
 	return device_online(&mem->dev);
@@ -1762,7 +1779,9 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
 {
 	int rc = 0;
 
-	BUG_ON(check_hotplug_memory_range(start, size));
+	rc = check_hotremove_memory_range(start, size);
+	if (rc)
+		return rc;
 
 	mem_hotplug_begin();
 

From patchwork Fri Jan 10 03:09:12 2020
X-Patchwork-Submitter: Anshuman Khandual
X-Patchwork-Id: 11326547
From: Anshuman Khandual
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
 catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com, Mike Rapoport, ira.weiny@intel.com,
 david@redhat.com, mgorman@techsingularity.net, steve.capper@arm.com,
 Robin.Murphy@arm.com, steven.price@arm.com, broonie@kernel.org,
 cai@lca.pw, ard.biesheuvel@arm.com, cpandya@codeaurora.org,
 arunks@codeaurora.org, dan.j.williams@intel.com, Anshuman Khandual,
 logang@deltatee.com, valentin.schneider@arm.com, suzuki.poulose@arm.com,
 osalvador@suse.de
Subject: [PATCH V11 2/5] mm/memblock: Introduce MEMBLOCK_BOOT flag
Date: Fri, 10 Jan 2020 08:39:12 +0530
Message-Id: <1578625755-11792-3-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

On the arm64 platform, boot memory should never be hot removed due to
certain platform specific constraints. Hence the platform would like to
override the earlier added arch callback arch_memory_removable() for this
purpose. In order to reject boot memory hot removal requests, it first
needs to track boot memory at runtime. In the future, there might be other
platforms requiring runtime boot memory enumeration. Hence let's expand
the existing generic memblock framework for this purpose rather than
creating one just for arm64 platforms.

This introduces a new memblock flag MEMBLOCK_BOOT along with helpers,
which a given platform can use to mark all memory regions discovered
during boot.
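As a rough illustration of how the flag and the removal callback could fit together, here is a minimal userspace sketch. It is not kernel code: every model_* name, the two-entry region table and the overlap policy are assumptions made for the example; only the 0x8 flag value and the -EINVAL rejection mirror what the patches in this series show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 0x8 matches MEMBLOCK_BOOT in this series; the names are hypothetical. */
enum model_flags {
	MODEL_NONE = 0x0,
	MODEL_BOOT = 0x8,
};

struct model_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/* Hypothetical stand-in for memblock.memory: two adjacent regions. */
static struct model_region model_regions[] = {
	{ 0x80000000ULL, 0x40000000ULL, MODEL_NONE },	/* seen at boot */
	{ 0xc0000000ULL, 0x10000000ULL, MODEL_NONE },	/* hot added later */
};

#define MODEL_NR_REGIONS (sizeof(model_regions) / sizeof(model_regions[0]))

/* Set MODEL_BOOT on every region lying fully inside [base, base + size). */
static int model_mark_boot(uint64_t base, uint64_t size)
{
	int marked = 0;

	for (size_t i = 0; i < MODEL_NR_REGIONS; i++) {
		struct model_region *r = &model_regions[i];

		if (r->base >= base && r->base + r->size <= base + size) {
			r->flags |= MODEL_BOOT;
			marked++;
		}
	}
	return marked ? 0 : -EINVAL;
}

/* True if addr falls inside a region that carries MODEL_BOOT. */
static bool model_is_boot_memory(uint64_t addr)
{
	for (size_t i = 0; i < MODEL_NR_REGIONS; i++) {
		const struct model_region *r = &model_regions[i];

		if (addr >= r->base && addr - r->base < r->size)
			return r->flags & MODEL_BOOT;
	}
	return false;
}

/* Assumed arch policy: refuse any range overlapping a boot region. */
static bool model_memory_removable(uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < MODEL_NR_REGIONS; i++) {
		const struct model_region *r = &model_regions[i];
		bool overlaps = base < r->base + r->size &&
				r->base < base + size;

		if (overlaps && (r->flags & MODEL_BOOT))
			return false;
	}
	return true;
}

/* Mirrors the shape of check_hotremove_memory_range(): 0 or -EINVAL. */
static int model_check_hotremove(uint64_t start, uint64_t size)
{
	return model_memory_removable(start, size) ? 0 : -EINVAL;
}
```

After marking the first region as boot memory, a removal request covering only the second region would be accepted, while any request touching the first would fail with -EINVAL before any hotplug lock is taken.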
Cc: Mike Rapoport
Cc: Andrew Morton
Signed-off-by: Anshuman Khandual
---
 include/linux/memblock.h | 10 ++++++++++
 mm/memblock.c            | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index b38bbef..fb04c87 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -31,12 +31,14 @@ extern unsigned long long max_possible_pfn;
  * @MEMBLOCK_HOTPLUG: hotpluggable region
  * @MEMBLOCK_MIRROR: mirrored region
  * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
+ * @MEMBLOCK_BOOT: memory received from firmware during boot
  */
 enum memblock_flags {
 	MEMBLOCK_NONE		= 0x0,	/* No special request */
 	MEMBLOCK_HOTPLUG	= 0x1,	/* hotpluggable region */
 	MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
 	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
+	MEMBLOCK_BOOT		= 0x8,	/* memory received from firmware during boot */
 };
 
 /**
@@ -116,6 +118,8 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 bool memblock_overlaps_region(struct memblock_type *type,
 			      phys_addr_t base, phys_addr_t size);
+int memblock_mark_boot(phys_addr_t base, phys_addr_t size);
+int memblock_clear_boot(phys_addr_t base, phys_addr_t size);
 int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
@@ -216,6 +220,11 @@ static inline bool memblock_is_nomap(struct memblock_region *m)
 	return m->flags & MEMBLOCK_NOMAP;
 }
 
+static inline bool memblock_is_boot(struct memblock_region *m)
+{
+	return m->flags & MEMBLOCK_BOOT;
+}
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
 			    unsigned long *end_pfn);
@@ -449,6 +458,7 @@ void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
 void memblock_mem_limit_remove_map(phys_addr_t limit);
 bool memblock_is_memory(phys_addr_t addr);
 bool memblock_is_map_memory(phys_addr_t addr);
+bool memblock_is_boot_memory(phys_addr_t addr);
 bool memblock_is_region_memory(phys_addr_t base, phys_addr_t size);
 bool memblock_is_reserved(phys_addr_t addr);
 bool memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index 4bc2c7d..e10207f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -865,6 +865,30 @@ static int __init_memblock memblock_setclr_flag(phys_addr_t base,
 }
 
 /**
+ * memblock_mark_boot - Mark boot memory with flag MEMBLOCK_BOOT.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_mark_boot(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(base, size, 1, MEMBLOCK_BOOT);
+}
+
+/**
+ * memblock_clear_boot - Clear flag MEMBLOCK_BOOT for a specified region.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_clear_boot(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(base, size, 0, MEMBLOCK_BOOT);
+}
+
+/**
  * memblock_mark_hotplug - Mark hotpluggable memory with flag MEMBLOCK_HOTPLUG.
  * @base: the base phys addr of the region
  * @size: the size of the region
@@ -974,6 +998,10 @@ static bool should_skip_region(struct memblock_region *m, int nid, int flags)
 	if ((flags & MEMBLOCK_MIRROR) && !memblock_is_mirror(m))
 		return true;
 
+	/* if we want boot memory skip non-boot memory regions */
+	if ((flags & MEMBLOCK_BOOT) && !memblock_is_boot(m))
+		return true;
+
 	/* skip nomap memory unless we were asked for it explicitly */
 	if (!(flags & MEMBLOCK_NOMAP) && memblock_is_nomap(m))
 		return true;
@@ -1785,6 +1813,15 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
 	return !memblock_is_nomap(&memblock.memory.regions[i]);
 }
 
+bool __init_memblock memblock_is_boot_memory(phys_addr_t addr)
+{
+	int i = memblock_search(&memblock.memory, addr);
+
+	if (i == -1)
+		return false;
+	return memblock_is_boot(&memblock.memory.regions[i]);
+}
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
 			 unsigned long *start_pfn, unsigned long *end_pfn)

From patchwork Fri Jan 10 03:09:13 2020
X-Patchwork-Submitter: Anshuman Khandual
X-Patchwork-Id: 11326553
From: Anshuman Khandual
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
 catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com, devicetree@vger.kernel.org, Frank Rowand,
 ira.weiny@intel.com, david@redhat.com, mgorman@techsingularity.net,
 steve.capper@arm.com, Robin.Murphy@arm.com, Rob Herring,
 steven.price@arm.com, broonie@kernel.org, cai@lca.pw,
 ard.biesheuvel@arm.com, cpandya@codeaurora.org, arunks@codeaurora.org,
 dan.j.williams@intel.com, Anshuman Khandual, logang@deltatee.com,
 valentin.schneider@arm.com, suzuki.poulose@arm.com, osalvador@suse.de
Subject: [PATCH V11 3/5] of/fdt: Mark boot memory with MEMBLOCK_BOOT
Date: Fri, 10 Jan 2020 08:39:13 +0530
Message-Id: <1578625755-11792-4-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

early_init_dt_add_memory_arch() adds memory into memblock on both UEFI and
DT based arm64 systems. Let's mark these ranges as boot memory right after
they get into memblock.
All other platforms using this default implementation of
early_init_dt_add_memory_arch() will also have this memblock flag set on
boot memory ranges, but it will be up to those platforms whether they use
it or not.

On the arm64 platform this flag will be used to identify boot memory at
runtime and reject any attempt to remove it.

Cc: Rob Herring
Cc: Frank Rowand
Cc: devicetree@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 drivers/of/fdt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 2cdf64d..a2ae2c88 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1143,6 +1143,7 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
 		base = phys_offset;
 	}
 	memblock_add(base, size);
+	memblock_mark_boot(base, size);
 }
 
 int __init __weak early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size)

From patchwork Fri Jan 10 03:09:14 2020
X-Patchwork-Submitter: Anshuman Khandual
X-Patchwork-Id: 11326555
From: Anshuman Khandual
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
 catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com, ira.weiny@intel.com, david@redhat.com,
 mgorman@techsingularity.net, steve.capper@arm.com, Robin.Murphy@arm.com,
 steven.price@arm.com, broonie@kernel.org, cai@lca.pw,
 ard.biesheuvel@arm.com, cpandya@codeaurora.org, arunks@codeaurora.org,
 dan.j.williams@intel.com, Anshuman Khandual, logang@deltatee.com,
 valentin.schneider@arm.com, suzuki.poulose@arm.com, osalvador@suse.de
Subject: [PATCH V11 4/5] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
Date: Fri, 10 Jan 2020 08:39:14 +0530
Message-Id: <1578625755-11792-5-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

The arm64 page table dump code can race with concurrent modification of
the kernel page tables. When leaf entries are modified concurrently, the
dump code may log stale or inconsistent information for a VA range, but
this is otherwise not harmful.
When intermediate levels of table are freed, the dump code will continue
to use memory which has been freed and potentially reallocated for another
purpose. In such cases, the dump code may dereference bogus addresses,
leading to a number of potential problems.

Intermediate levels of table may be freed during memory hot-remove, which
will be enabled by a subsequent patch. To avoid racing with this, take the
memory hotplug lock when walking the kernel page table.

Cc: Catalin Marinas
Cc: Will Deacon
Acked-by: David Hildenbrand
Acked-by: Mark Rutland
Signed-off-by: Anshuman Khandual
---
 arch/arm64/mm/ptdump_debugfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 064163f..b5eebc8 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
+#include
 #include
 
 #include
@@ -7,7 +8,10 @@ static int ptdump_show(struct seq_file *m, void *v)
 {
 	struct ptdump_info *info = m->private;
+
+	get_online_mems();
 	ptdump_walk_pgd(m, info);
+	put_online_mems();
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);
 

From patchwork Fri Jan 10 03:09:15 2020
X-Patchwork-Submitter: Anshuman Khandual
X-Patchwork-Id: 11326557
From: Anshuman Khandual
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
 catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com, ira.weiny@intel.com, david@redhat.com,
 mgorman@techsingularity.net, steve.capper@arm.com, Robin.Murphy@arm.com,
 steven.price@arm.com, broonie@kernel.org, cai@lca.pw,
 ard.biesheuvel@arm.com, cpandya@codeaurora.org, arunks@codeaurora.org,
 dan.j.williams@intel.com, Anshuman Khandual, logang@deltatee.com,
 valentin.schneider@arm.com, suzuki.poulose@arm.com, osalvador@suse.de
Subject: [PATCH V11 5/5] arm64/mm: Enable memory hot remove
Date: Fri, 10 Jan 2020 08:39:15 +0530
Message-Id: <1578625755-11792-6-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

The arch code for hot-remove must tear down portions of the linear map and
vmemmap corresponding to memory being removed.
In both cases the page tables mapping these regions must be freed, and when
sparse vmemmap is in use the memory backing the vmemmap must also be freed.

This patch adds unmap_hotplug_range() and free_empty_tables() helpers which
can be used to tear down either region, and calls them from vmemmap_free()
and __remove_pgd_mapping(). The free_mapped argument determines whether the
backing memory will be freed.

It makes two distinct passes over the kernel page table. In the first pass,
unmap_hotplug_range() unmaps each mapped leaf entry, invalidates the
applicable TLB entries and frees the backing memory if required (vmemmap).
In the second pass, free_empty_tables() looks for empty page table sections
whose page table page can be unmapped, TLB invalidated and freed.

While freeing intermediate level page table pages, bail out if any of their
entries are still valid. This can happen for a partially filled kernel page
table, either from a previously attempted failed memory hot add or while
removing an address range which does not span the entire page table page
range.

The vmemmap region may share levels of table with the vmalloc region. There
can be conflicts between hot remove freeing page table pages and a
concurrent vmalloc() walking the kernel page table. This conflict cannot be
solved just by taking the init_mm ptl because of the existing locking scheme
in vmalloc(). So free_empty_tables() implements a floor and ceiling method,
borrowed from the user page table teardown in free_pgd_range(), which skips
freeing a page table page if the intermediate address range is not aligned
or the floor-ceiling range might not own the entire page table page.

Boot memory on arm64 cannot be removed. Hence subscribe to the earlier added
platform callback mechanism arch_memory_removable() and reject any boot
memory removal requests. While here, update arch_add_memory() to handle
__add_pages() failures by just unmapping the recently added kernel linear
mapping.
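For illustration only (this is not part of the patch), the floor and ceiling
check can be tried out in user space. The sketch below assumes a 2 MiB table
span; the SKETCH_* constants and the sketch_* names are hypothetical, and the
addresses are arbitrary example values rather than the arm64 layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not kernel values: one table maps 2 MiB. */
#define SKETCH_TABLE_SHIFT	21UL
#define SKETCH_TABLE_MASK	(~((1UL << SKETCH_TABLE_SHIFT) - 1))

/*
 * Returns true when the table covering [start, end) lies entirely
 * inside [floor, ceiling), so nothing outside that window can share it.
 */
static bool sketch_range_owns_table(unsigned long start, unsigned long end,
				    unsigned long floor, unsigned long ceiling,
				    unsigned long mask)
{
	/* Round start down to the first address mapped by its table. */
	start &= mask;
	if (start < floor)
		return false;	/* the table also maps addresses below floor */

	if (ceiling) {
		/* Round the ceiling down to a table boundary. */
		ceiling &= mask;
		if (!ceiling)
			return false;	/* no whole table fits below the ceiling */
	}

	/*
	 * Unsigned wraparound: with ceiling == 0, ceiling - 1 is the highest
	 * address, so a zero ceiling means "no upper limit".
	 */
	if (end - 1 > ceiling - 1)
		return false;
	return true;
}

/* Small self-check with arbitrary example addresses. */
void sketch_alignment_selfcheck(void)
{
	const unsigned long mask = SKETCH_TABLE_MASK;

	/* Table fully inside the window: may be freed. */
	assert(sketch_range_owns_table(0x40200000UL, 0x40400000UL,
				       0x40000000UL, 0x80000000UL, mask));
	/* Floor cuts through the table that maps start: keep it. */
	assert(!sketch_range_owns_table(0x40001000UL, 0x40200000UL,
					0x40100000UL, 0x80000000UL, mask));
	/* Range runs past the rounded-down ceiling: keep it. */
	assert(!sketch_range_owns_table(0x7fe00000UL, 0x80200000UL,
					0x40000000UL, 0x80100000UL, mask));
	/* A ceiling of zero places no upper limit on the range. */
	assert(sketch_range_owns_table(0x40200000UL, 0x40400000UL,
				       0UL, 0UL, mask));
}
```

Calling sketch_alignment_selfcheck() from a test program exercises the four
cases: owned, cut by the floor, cut by the ceiling, and unbounded above.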
Now enable memory hot remove on arm64 platforms by default with
ARCH_ENABLE_MEMORY_HOTREMOVE.

This implementation is inspired by the kernel page table teardown procedure
on x86 and by the user page table teardown method.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Steve Capper
Cc: Mark Rutland
Reviewed-by: Catalin Marinas
Signed-off-by: Anshuman Khandual
---
 arch/arm64/Kconfig              |   3 +
 arch/arm64/include/asm/memory.h |   6 +
 arch/arm64/mm/mmu.c             | 334 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 334 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1b4476..402a114 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -277,6 +277,9 @@ config ZONE_DMA32
 config ARCH_ENABLE_MEMORY_HOTPLUG
 	def_bool y
 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+	def_bool y
+
 config SMP
 	def_bool y
 
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index a4f9ca5..045a512 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -54,6 +54,7 @@
 #define MODULES_VADDR		(BPF_JIT_REGION_END)
 #define MODULES_VSIZE		(SZ_128M)
 #define VMEMMAP_START		(-VMEMMAP_SIZE - SZ_2M)
+#define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
 #define PCI_IO_END		(VMEMMAP_START - SZ_2M)
 #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
 #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
@@ -292,6 +293,11 @@ static inline void *phys_to_virt(phys_addr_t x)
 	return (void *)(__phys_to_virt(x));
 }
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+#define arch_memory_removable arch_memory_removable
+extern bool arch_memory_removable(u64 base, u64 size);
+#endif
+
 /*
  * Drivers should NOT use these either.
  */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 40797cb..2cb1b2e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -724,6 +725,275 @@ int kern_addr_valid(unsigned long addr)
 
 	return pfn_valid(pte_pfn(pte));
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void free_hotplug_page_range(struct page *page, size_t size)
+{
+	WARN_ON(PageReserved(page));
+	free_pages((unsigned long)page_address(page), get_order(size));
+}
+
+static void free_hotplug_pgtable_page(struct page *page)
+{
+	free_hotplug_page_range(page, PAGE_SIZE);
+}
+
+static bool pgtable_range_aligned(unsigned long start, unsigned long end,
+				  unsigned long floor, unsigned long ceiling,
+				  unsigned long mask)
+{
+	start &= mask;
+	if (start < floor)
+		return false;
+
+	if (ceiling) {
+		ceiling &= mask;
+		if (!ceiling)
+			return false;
+	}
+
+	if (end - 1 > ceiling - 1)
+		return false;
+	return true;
+}
+
+static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	pte_t *ptep, pte;
+
+	do {
+		ptep = pte_offset_kernel(pmdp, addr);
+		pte = READ_ONCE(*ptep);
+		if (pte_none(pte))
+			continue;
+
+		WARN_ON(!pte_present(pte));
+		pte_clear(&init_mm, addr, ptep);
+		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+		if (free_mapped)
+			free_hotplug_page_range(pte_page(pte), PAGE_SIZE);
+	} while (addr += PAGE_SIZE, addr < end);
+}
+
+static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	unsigned long next;
+	pmd_t *pmdp, pmd;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pmdp = pmd_offset(pudp, addr);
+		pmd = READ_ONCE(*pmdp);
+		if (pmd_none(pmd))
+			continue;
+
+		WARN_ON(!pmd_present(pmd));
+		if (pmd_sect(pmd)) {
+			pmd_clear(pmdp);
+
+			/*
+			 * One TLBI should be sufficient here as the PMD_SIZE
+			 * range is mapped with a single block entry.
+			 */
+			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+			if (free_mapped)
+				free_hotplug_page_range(pmd_page(pmd),
+							PMD_SIZE);
+			continue;
+		}
+		WARN_ON(!pmd_table(pmd));
+		unmap_hotplug_pte_range(pmdp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void unmap_hotplug_pud_range(pgd_t *pgdp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	unsigned long next;
+	pud_t *pudp, pud;
+
+	do {
+		next = pud_addr_end(addr, end);
+		pudp = pud_offset(pgdp, addr);
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
+			continue;
+
+		WARN_ON(!pud_present(pud));
+		if (pud_sect(pud)) {
+			pud_clear(pudp);
+
+			/*
+			 * One TLBI should be sufficient here as the PUD_SIZE
+			 * range is mapped with a single block entry.
+			 */
+			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+			if (free_mapped)
+				free_hotplug_page_range(pud_page(pud),
+							PUD_SIZE);
+			continue;
+		}
+		WARN_ON(!pud_table(pud));
+		unmap_hotplug_pmd_range(pudp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void unmap_hotplug_range(unsigned long addr, unsigned long end,
+				bool free_mapped)
+{
+	unsigned long next;
+	pgd_t *pgdp, pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		pgdp = pgd_offset_k(addr);
+		pgd = READ_ONCE(*pgdp);
+		if (pgd_none(pgd))
+			continue;
+
+		WARN_ON(!pgd_present(pgd));
+		unmap_hotplug_pud_range(pgdp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	pte_t *ptep, pte;
+	unsigned long i, start = addr;
+
+	do {
+		ptep = pte_offset_kernel(pmdp, addr);
+		pte = READ_ONCE(*ptep);
+
+		/*
+		 * This is just a sanity check here which verifies that
+		 * pte clearing has been done by earlier unmap loops.
+		 */
+		WARN_ON(!pte_none(pte));
+	} while (addr += PAGE_SIZE, addr < end);
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PMD_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the pte page if the rest of the
+	 * entries are empty. Overlap with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	ptep = pte_offset_kernel(pmdp, 0UL);
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (!pte_none(READ_ONCE(ptep[i])))
+			return;
+	}
+
+	pmd_clear(pmdp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(ptep));
+}
+
+static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	pmd_t *pmdp, pmd;
+	unsigned long i, next, start = addr;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pmdp = pmd_offset(pudp, addr);
+		pmd = READ_ONCE(*pmdp);
+		if (pmd_none(pmd))
+			continue;
+
+		WARN_ON(!pmd_present(pmd) || !pmd_table(pmd) || pmd_sect(pmd));
+		free_empty_pte_table(pmdp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+
+	if (CONFIG_PGTABLE_LEVELS <= 2)
+		return;
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PUD_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the pmd page if the rest of the
+	 * entries are empty. Overlap with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	pmdp = pmd_offset(pudp, 0UL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (!pmd_none(READ_ONCE(pmdp[i])))
+			return;
+	}
+
+	pud_clear(pudp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(pmdp));
+}
+
+static void free_empty_pud_table(pgd_t *pgdp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	pud_t *pudp, pud;
+	unsigned long i, next, start = addr;
+
+	do {
+		next = pud_addr_end(addr, end);
+		pudp = pud_offset(pgdp, addr);
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
+			continue;
+
+		WARN_ON(!pud_present(pud) || !pud_table(pud) || pud_sect(pud));
+		free_empty_pmd_table(pudp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+
+	if (CONFIG_PGTABLE_LEVELS <= 3)
+		return;
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PGDIR_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the pud page if the rest of the
+	 * entries are empty. Overlap with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	pudp = pud_offset(pgdp, 0UL);
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		if (!pud_none(READ_ONCE(pudp[i])))
+			return;
+	}
+
+	pgd_clear(pgdp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(pudp));
+}
+
+static void free_empty_tables(unsigned long addr, unsigned long end,
+			      unsigned long floor, unsigned long ceiling)
+{
+	unsigned long next;
+	pgd_t *pgdp, pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		pgdp = pgd_offset_k(addr);
+		pgd = READ_ONCE(*pgdp);
+		if (pgd_none(pgd))
+			continue;
+
+		WARN_ON(!pgd_present(pgd));
+		free_empty_pud_table(pgdp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+}
+#endif
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 #if !ARM64_SWAPPER_USES_SECTION_MAPS
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
@@ -771,6 +1041,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 void vmemmap_free(unsigned long start, unsigned long end,
 		struct vmem_altmap *altmap)
 {
+#ifdef CONFIG_MEMORY_HOTPLUG
+	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+
+	unmap_hotplug_range(start, end, true);
+	free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
+#endif
 }
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
 
@@ -1049,10 +1325,41 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+static bool range_overlaps_bootmem(u64 base, u64 size)
+{
+	unsigned long addr, end = base + size;
+	unsigned long mem_block_size = memory_block_size_bytes();
+
+	WARN_ON(!IS_ALIGNED(base, mem_block_size));
+	WARN_ON(!IS_ALIGNED(size, mem_block_size));
+
+	/*
+	 * Both memory hot add and remove happens in memory block
+	 * units. Any given memory block on the system was either
+	 * added during boot or at runtime via hotplug.
+	 */
+	for (addr = base; addr < end; addr += mem_block_size) {
+		if (memblock_is_boot_memory(addr))
+			return true;
+	}
+	return false;
+}
+
+static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
+{
+	unsigned long end = start + size;
+
+	WARN_ON(pgdir != init_mm.pgd);
+	WARN_ON((start < PAGE_OFFSET) || (end > PAGE_END));
+
+	unmap_hotplug_range(start, end, false);
+	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
+}
+
 int arch_add_memory(int nid, u64 start, u64 size,
 			struct mhp_restrictions *restrictions)
 {
-	int flags = 0;
+	int ret, flags = 0;
 
 	if (rodata_full || debug_pagealloc_enabled())
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
@@ -1062,8 +1369,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 
 	memblock_clear_nomap(start, size);
 
-	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
+	ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
 			   restrictions);
+	if (ret)
+		__remove_pgd_mapping(swapper_pg_dir,
+				     __phys_to_virt(start), size);
+	return ret;
+
 }
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
@@ -1071,13 +1383,17 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	/*
-	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
-	 * adding fails). Until then, this function should only be used
-	 * during memory hotplug (adding memory), not for memory
-	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
-	 * unlocked yet.
-	 */
+	WARN_ON(range_overlaps_bootmem(start, size));
 	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
+}
+#endif
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+bool arch_memory_removable(u64 base, u64 size)
+{
+	if (range_overlaps_bootmem(base, size))
+		return false;
+	return true;
 }
 #endif
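
Illustration only, not part of the patch: a user-space sketch of rejecting a
removal request when any memory block in the half-open range
[base, base + size) is boot memory. The block size, the boot memory window
and every sketch_* name are hypothetical assumptions; in the kernel the
lookup would go through memblock rather than a fixed window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values chosen for the example only. */
#define SKETCH_BLOCK_SIZE	(128ULL << 20)	/* assume 128 MiB memory blocks */
#define SKETCH_BOOT_START	(1024ULL << 20)	/* assume boot memory is */
#define SKETCH_BOOT_END		(2048ULL << 20)	/* [1 GiB, 2 GiB) */

/* Hypothetical stand-in for a boot memory lookup. */
static bool sketch_is_boot_memory(uint64_t addr)
{
	return addr >= SKETCH_BOOT_START && addr < SKETCH_BOOT_END;
}

/* Visit the start of each block in [base, base + size); end is excluded. */
static bool sketch_range_overlaps_boot(uint64_t base, uint64_t size)
{
	uint64_t addr, end = base + size;

	for (addr = base; addr < end; addr += SKETCH_BLOCK_SIZE) {
		if (sketch_is_boot_memory(addr))
			return true;
	}
	return false;
}

void sketch_bootmem_selfcheck(void)
{
	/* Two blocks ending exactly where boot memory starts: removable. */
	assert(!sketch_range_overlaps_boot(768ULL << 20, 256ULL << 20));
	/* Second block is the first boot memory block: not removable. */
	assert(sketch_range_overlaps_boot(896ULL << 20, 256ULL << 20));
	/* Range starting right after boot memory: removable. */
	assert(!sketch_range_overlaps_boot(SKETCH_BOOT_END, 256ULL << 20));
}
```

The first case is the interesting one: the block that starts at the end of
the range belongs to boot memory here, but it lies outside the request and
so must not cause a rejection.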