From patchwork Fri Jan 10 03:09:11 2020
X-Patchwork-Submitter: Anshuman Khandual <anshuman.khandual@arm.com>
X-Patchwork-Id: 11326539
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
    catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com,
    david@redhat.com, cai@lca.pw, logang@deltatee.com,
    cpandya@codeaurora.org, arunks@codeaurora.org, dan.j.williams@intel.com,
    mgorman@techsingularity.net, osalvador@suse.de, ard.biesheuvel@arm.com,
    steve.capper@arm.com, broonie@kernel.org, valentin.schneider@arm.com,
    Robin.Murphy@arm.com, steven.price@arm.com, suzuki.poulose@arm.com,
    ira.weiny@intel.com, Anshuman Khandual <anshuman.khandual@arm.com>
Subject: [PATCH V11 1/5] mm/hotplug: Introduce arch callback validating the
 hot remove range
Date: Fri, 10 Jan 2020 08:39:11 +0530
Message-Id: <1578625755-11792-2-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

Currently there are two interfaces to initiate memory range hot removal,
i.e. remove_memory() and __remove_memory(), both of which end up calling
try_remove_memory(). The platform is then called via arch_remove_memory()
to tear down the required kernel page tables and carry out other arch
specific procedures. But there are platforms like arm64 which might want
to prevent removal of certain specific memory ranges irrespective of
their present usage or movability properties.

The existing arch callback arch_remove_memory() comes too late in the
process to abort memory hot removal, as memory block devices and firmware
memory map entries would have already been removed by then. Platforms
should be able to abort the process before taking the mem_hotplug_lock
with mem_hotplug_begin(). This essentially requires a new arch callback
for memory range validation.

This differentiates memory range validation between the memory hot add
and hot remove paths by carving out a new helper,
check_hotremove_memory_range(), which incorporates the new arch callback.
This callback provides platforms an opportunity to refuse memory removal
at the very onset. In the future, the same principle can be extended to
the memory hot add path if required. Platforms can choose to override
this callback in order to reject specific memory ranges from removal, or
just fall back to the default implementation, which allows removal of all
memory ranges.
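As a minimal sketch (the names below are illustrative, not part of this
patch; patch 5/5 carries the actual arm64 wiring), a platform would
subscribe to the callback by defining the macro in its asm/memory.h,
which suppresses the generic fallback, and then providing an
implementation:

/* Hypothetical arch/foo/include/asm/memory.h */
#ifdef CONFIG_MEMORY_HOTREMOVE
#define arch_memory_removable arch_memory_removable
extern bool arch_memory_removable(u64 base, u64 size);
#endif

/* Hypothetical arch/foo/mm/init.c */
#ifdef CONFIG_MEMORY_HOTREMOVE
bool arch_memory_removable(u64 base, u64 size)
{
	/*
	 * foo_range_is_protected() is a made-up stand-in for
	 * whatever platform specific test applies here.
	 */
	return !foo_range_is_protected(base, size);
}
#endif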
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reported-by: kbuild test robot
---
 include/linux/memory_hotplug.h |  7 +++++++
 mm/memory_hotplug.c            | 21 ++++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ba0dca6..f661bd5 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -305,6 +305,13 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
+#ifndef arch_memory_removable
+static inline bool arch_memory_removable(u64 base, u64 size)
+{
+	return true;
+}
+#endif
+
 extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a91a072..7cdf800 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1014,6 +1014,23 @@ static int check_hotplug_memory_range(u64 start, u64 size)
 	return 0;
 }
 
+static int check_hotremove_memory_range(u64 start, u64 size)
+{
+	int rc;
+
+	BUG_ON(check_hotplug_memory_range(start, size));
+
+	/*
+	 * First check if the platform is willing to have this
+	 * memory range removed else just abort.
+	 */
+	rc = arch_memory_removable(start, size);
+	if (!rc)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int online_memory_block(struct memory_block *mem, void *arg)
 {
 	return device_online(&mem->dev);
@@ -1762,7 +1779,9 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
 {
 	int rc = 0;
 
-	BUG_ON(check_hotplug_memory_range(start, size));
+	rc = check_hotremove_memory_range(start, size);
+	if (rc)
+		return rc;
 
 	mem_hotplug_begin();

From patchwork Fri Jan 10 03:09:12 2020
X-Patchwork-Submitter: Anshuman Khandual <anshuman.khandual@arm.com>
X-Patchwork-Id: 11326543
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
    catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com, david@redhat.com, cai@lca.pw,
    logang@deltatee.com, cpandya@codeaurora.org, arunks@codeaurora.org,
    dan.j.williams@intel.com, mgorman@techsingularity.net,
    osalvador@suse.de, ard.biesheuvel@arm.com, steve.capper@arm.com,
    broonie@kernel.org, valentin.schneider@arm.com, Robin.Murphy@arm.com,
    steven.price@arm.com, suzuki.poulose@arm.com, ira.weiny@intel.com,
    Anshuman Khandual <anshuman.khandual@arm.com>,
    Mike Rapoport <rppt@linux.ibm.com>
Subject: [PATCH V11 2/5] mm/memblock: Introduce MEMBLOCK_BOOT flag
Date: Fri, 10 Jan 2020 08:39:12 +0530
Message-Id: <1578625755-11792-3-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

On the arm64 platform, boot memory should never be hot removed due to
certain platform specific constraints. Hence the platform needs to
override the previously added arch callback arch_memory_removable() for
this purpose. In order to reject boot memory hot removal requests, it
first needs to track boot memory at runtime. In the future, there might
be other platforms requiring runtime boot memory enumeration as well.
Hence let's expand the existing generic memblock framework for this
purpose rather than creating one just for arm64.

This introduces a new memblock flag, MEMBLOCK_BOOT, along with helpers,
which a given platform can set on all memory regions discovered during
boot.
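As a usage sketch (hypothetical caller; the next patch does the
equivalent in the generic FDT path), a platform would flag everything
memblock knows about once early memory discovery is complete, here using
the for_each_memblock() iterator:

/* Hypothetical early boot hook on some platform */
static void __init foo_mark_boot_memory(void)
{
	struct memblock_region *region;

	/*
	 * Flag every region present at this point as boot memory, so
	 * it can be recognised later via memblock_is_boot_memory().
	 */
	for_each_memblock(memory, region)
		memblock_mark_boot(region->base, region->size);
}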
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 include/linux/memblock.h | 10 ++++++++++
 mm/memblock.c            | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index b38bbef..fb04c87 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -31,12 +31,14 @@ extern unsigned long long max_possible_pfn;
  * @MEMBLOCK_HOTPLUG: hotpluggable region
  * @MEMBLOCK_MIRROR: mirrored region
  * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
+ * @MEMBLOCK_BOOT: memory received from firmware during boot
  */
 enum memblock_flags {
 	MEMBLOCK_NONE		= 0x0,	/* No special request */
 	MEMBLOCK_HOTPLUG	= 0x1,	/* hotpluggable region */
 	MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
 	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
+	MEMBLOCK_BOOT		= 0x8,	/* memory received from firmware during boot */
 };
 
 /**
@@ -116,6 +118,8 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 bool memblock_overlaps_region(struct memblock_type *type,
 			      phys_addr_t base, phys_addr_t size);
+int memblock_mark_boot(phys_addr_t base, phys_addr_t size);
+int memblock_clear_boot(phys_addr_t base, phys_addr_t size);
 int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
@@ -216,6 +220,11 @@ static inline bool memblock_is_nomap(struct memblock_region *m)
 	return m->flags & MEMBLOCK_NOMAP;
 }
 
+static inline bool memblock_is_boot(struct memblock_region *m)
+{
+	return m->flags & MEMBLOCK_BOOT;
+}
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
 			    unsigned long *end_pfn);
@@ -449,6 +458,7 @@ void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
 void memblock_mem_limit_remove_map(phys_addr_t limit);
 bool memblock_is_memory(phys_addr_t addr);
 bool memblock_is_map_memory(phys_addr_t addr);
+bool memblock_is_boot_memory(phys_addr_t addr);
 bool memblock_is_region_memory(phys_addr_t base, phys_addr_t size);
 bool memblock_is_reserved(phys_addr_t addr);
 bool memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index 4bc2c7d..e10207f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -865,6 +865,30 @@ static int __init_memblock memblock_setclr_flag(phys_addr_t base,
 }
 
 /**
+ * memblock_mark_boot - Mark boot memory with flag MEMBLOCK_BOOT.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_mark_boot(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(base, size, 1, MEMBLOCK_BOOT);
+}
+
+/**
+ * memblock_clear_boot - Clear flag MEMBLOCK_BOOT for a specified region.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_clear_boot(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(base, size, 0, MEMBLOCK_BOOT);
+}
+
+/**
  * memblock_mark_hotplug - Mark hotpluggable memory with flag MEMBLOCK_HOTPLUG.
  * @base: the base phys addr of the region
  * @size: the size of the region
@@ -974,6 +998,10 @@ static bool should_skip_region(struct memblock_region *m, int nid, int flags)
 	if ((flags & MEMBLOCK_MIRROR) && !memblock_is_mirror(m))
 		return true;
 
+	/* if we want boot memory skip non-boot memory regions */
+	if ((flags & MEMBLOCK_BOOT) && !memblock_is_boot(m))
+		return true;
+
 	/* skip nomap memory unless we were asked for it explicitly */
 	if (!(flags & MEMBLOCK_NOMAP) && memblock_is_nomap(m))
 		return true;
@@ -1785,6 +1813,15 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
 	return !memblock_is_nomap(&memblock.memory.regions[i]);
 }
 
+bool __init_memblock memblock_is_boot_memory(phys_addr_t addr)
+{
+	int i = memblock_search(&memblock.memory, addr);
+
+	if (i == -1)
+		return false;
+	return memblock_is_boot(&memblock.memory.regions[i]);
+}
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
 			 unsigned long *start_pfn, unsigned long *end_pfn)

From patchwork Fri Jan 10 03:09:13 2020
X-Patchwork-Submitter: Anshuman Khandual <anshuman.khandual@arm.com>
X-Patchwork-Id: 11326545
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
    catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com, david@redhat.com, cai@lca.pw,
    logang@deltatee.com, cpandya@codeaurora.org, arunks@codeaurora.org,
    dan.j.williams@intel.com, mgorman@techsingularity.net,
    osalvador@suse.de, ard.biesheuvel@arm.com, steve.capper@arm.com,
    broonie@kernel.org, valentin.schneider@arm.com, Robin.Murphy@arm.com,
    steven.price@arm.com, suzuki.poulose@arm.com, ira.weiny@intel.com,
    Anshuman Khandual <anshuman.khandual@arm.com>,
    Rob Herring <robh+dt@kernel.org>, Frank Rowand <frowand.list@gmail.com>,
    devicetree@vger.kernel.org
Subject: [PATCH V11 3/5] of/fdt: Mark boot memory with MEMBLOCK_BOOT
Date: Fri, 10 Jan 2020 08:39:13 +0530
Message-Id: <1578625755-11792-4-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

early_init_dt_add_memory_arch() adds memory into memblock on both UEFI
and DT based arm64 systems. Let's mark these ranges as boot memory right
after they get added into memblock. All other platforms using this
default implementation of early_init_dt_add_memory_arch() will also get
this memblock flag set on their boot memory ranges; it is up to each
platform whether to make use of it. On arm64, this flag will be used to
identify boot memory at runtime and reject any attempt to remove it.
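One caveat: an architecture that overrides this __weak default must set
the flag itself, otherwise memblock_is_boot_memory() will return false
for its boot ranges. A minimal sketch of such an override (hypothetical;
it simply mirrors the one line hunk below):

/* Hypothetical arch override of the __weak default */
void __init early_init_dt_add_memory_arch(u64 base, u64 size)
{
	/* platform specific base/size fixups would go here */
	memblock_add(base, size);
	memblock_mark_boot(base, size);
}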
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: devicetree@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/of/fdt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 2cdf64d..a2ae2c88 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1143,6 +1143,7 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
 		base = phys_offset;
 	}
 	memblock_add(base, size);
+	memblock_mark_boot(base, size);
 }
 
 int __init __weak early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size)

From patchwork Fri Jan 10 03:09:14 2020
X-Patchwork-Submitter: Anshuman Khandual <anshuman.khandual@arm.com>
X-Patchwork-Id: 11326549
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
    catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com, david@redhat.com, cai@lca.pw,
    logang@deltatee.com, cpandya@codeaurora.org, arunks@codeaurora.org,
    dan.j.williams@intel.com, mgorman@techsingularity.net,
    osalvador@suse.de, ard.biesheuvel@arm.com, steve.capper@arm.com,
    broonie@kernel.org, valentin.schneider@arm.com, Robin.Murphy@arm.com,
    steven.price@arm.com, suzuki.poulose@arm.com, ira.weiny@intel.com,
    Anshuman Khandual <anshuman.khandual@arm.com>
Subject: [PATCH V11 4/5] arm64/mm: Hold memory hotplug lock while walking
 for kernel page table dump
Date: Fri, 10 Jan 2020 08:39:14 +0530
Message-Id: <1578625755-11792-5-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

The arm64 page table dump code can race with concurrent modification of
the kernel page tables. When leaf entries are modified concurrently, the
dump code may log stale or inconsistent information for a VA range, but
this is otherwise not harmful.

When intermediate levels of table are freed, however, the dump code will
continue to use memory which has been freed and potentially reallocated
for another purpose. In such cases, the dump code may dereference bogus
addresses, leading to a number of potential problems.

Intermediate levels of table may be freed during memory hot remove, which
will be enabled by a subsequent patch. To avoid racing with this, take
the memory hotplug lock when walking the kernel page table.
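The synchronization relied upon here is the usual memory hotplug
reader/writer pairing: hot remove frees page table pages only inside a
mem_hotplug_begin()/mem_hotplug_done() section, which takes
mem_hotplug_lock for writing, so a walker bracketed by the reader side
cannot observe tables disappearing. Restated standalone (same calls as
the hunk below):

static int ptdump_show(struct seq_file *m, void *v)
{
	struct ptdump_info *info = m->private;

	get_online_mems();		/* excludes mem_hotplug_begin() writers */
	ptdump_walk_pgd(m, info);	/* tables cannot be freed while held */
	put_online_mems();
	return 0;
}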
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/ptdump_debugfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 064163f..b5eebc8 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/debugfs.h>
+#include <linux/memory_hotplug.h>
 #include <linux/seq_file.h>
 
 #include <asm/ptdump.h>
@@ -7,7 +8,10 @@
 static int ptdump_show(struct seq_file *m, void *v)
 {
 	struct ptdump_info *info = m->private;
+
+	get_online_mems();
 	ptdump_walk_pgd(m, info);
+	put_online_mems();
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);

From patchwork Fri Jan 10 03:09:15 2020
X-Patchwork-Submitter: Anshuman Khandual <anshuman.khandual@arm.com>
X-Patchwork-Id: 11326551
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
    catalin.marinas@arm.com, will@kernel.org
Cc: mark.rutland@arm.com, david@redhat.com, cai@lca.pw,
    logang@deltatee.com, cpandya@codeaurora.org, arunks@codeaurora.org,
    dan.j.williams@intel.com, mgorman@techsingularity.net,
    osalvador@suse.de, ard.biesheuvel@arm.com, steve.capper@arm.com,
    broonie@kernel.org, valentin.schneider@arm.com, Robin.Murphy@arm.com,
    steven.price@arm.com, suzuki.poulose@arm.com, ira.weiny@intel.com,
    Anshuman Khandual <anshuman.khandual@arm.com>
Subject: [PATCH V11 5/5] arm64/mm: Enable memory hot remove
Date: Fri, 10 Jan 2020 08:39:15 +0530
Message-Id: <1578625755-11792-6-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>
References: <1578625755-11792-1-git-send-email-anshuman.khandual@arm.com>

The arch code for hot remove must tear down portions of the linear map
and vmemmap corresponding to the memory being removed. In both cases the
page tables mapping these regions must be freed, and when sparse vmemmap
is in use the memory backing the vmemmap must also be freed.

This patch adds the unmap_hotplug_range() and free_empty_tables()
helpers, which can be used to tear down either region, and calls them
from vmemmap_free() and __remove_pgd_mapping(). The free_mapped argument
determines whether the backing memory will be freed.

Two distinct passes are made over the kernel page table. In the first
pass, unmap_hotplug_range() unmaps each mapped leaf entry, invalidates
the applicable TLB entries, and frees the backing memory if required
(vmemmap). In the second pass, free_empty_tables() looks for empty page
table sections whose page table pages can be unmapped, TLB invalidated,
and freed. While freeing an intermediate level page table page, it bails
out if any of the entries are still valid. This can happen for a
partially filled kernel page table, either from a previous failed memory
hot add attempt or when removing an address range which does not span
the entire page table page.

The vmemmap region may share levels of table with the vmalloc region.
There can be conflicts between hot remove freeing page table pages and a
concurrent vmalloc() walking the kernel page table. This conflict cannot
simply be solved by taking the init_mm ptl because of the existing
locking scheme in vmalloc(). So free_empty_tables() implements a floor
and ceiling method, borrowed from the user page table tear down in
free_pgd_range(), which skips freeing a page table page if the
intermediate address range is not suitably aligned or the floor-ceiling
span might not own the entire page table page (see the worked example
below).

Boot memory on arm64 cannot be removed. Hence subscribe to the earlier
added arch callback mechanism arch_memory_removable() and reject any
boot memory removal request.
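To make the floor/ceiling rule mentioned above concrete, a worked example
(illustrative only; figures assume 4K pages with a 48-bit VA, where one
PGD entry spans PGDIR_SIZE = 512G):

/*
 * The vmemmap region is far smaller than 512G and can share its top
 * level table page with neighbouring regions such as vmalloc. When
 * free_empty_pud_table() considers freeing that shared table page
 * with floor = VMEMMAP_START, pgtable_range_aligned() computes:
 *
 *	start &= PGDIR_MASK;	// base of this table's 512G reach
 *	start < floor		// the reach begins below VMEMMAP_START
 *
 * and returns false, so the table page is kept even if every entry
 * inside [floor, ceiling] happens to be empty. A concurrent
 * vmalloc(), which might be about to install entries through the
 * same table page, therefore stays safe without taking the init_mm
 * ptl.
 */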
While here, update arch_add_memory() to handle __add_pages() failures by
unmapping the just-added kernel linear mapping. Now enable memory hot
remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.

This implementation is overall inspired by the kernel page table tear
down procedure on x86 and by the user page table tear down method.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/Kconfig              |   3 +
 arch/arm64/include/asm/memory.h |   6 +
 arch/arm64/mm/mmu.c             | 334 ++++++++++++++++++++++++++++++++++++++--
 3 files changed, 334 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1b4476..402a114 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -277,6 +277,9 @@ config ZONE_DMA32
 config ARCH_ENABLE_MEMORY_HOTPLUG
 	def_bool y
 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+	def_bool y
+
 config SMP
 	def_bool y
 
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index a4f9ca5..045a512 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -54,6 +54,7 @@
 #define MODULES_VADDR		(BPF_JIT_REGION_END)
 #define MODULES_VSIZE		(SZ_128M)
 #define VMEMMAP_START		(-VMEMMAP_SIZE - SZ_2M)
+#define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
 #define PCI_IO_END		(VMEMMAP_START - SZ_2M)
 #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
 #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
@@ -292,6 +293,11 @@ static inline void *phys_to_virt(phys_addr_t x)
 	return (void *)(__phys_to_virt(x));
 }
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+#define arch_memory_removable arch_memory_removable
+extern bool arch_memory_removable(u64 base, u64 size);
+#endif
+
 /*
  * Drivers should NOT use these either.
  */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 40797cb..2cb1b2e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -17,6 +17,7 @@
 #include <linux/mman.h>
 #include <linux/nodemask.h>
 #include <linux/memblock.h>
+#include <linux/memory.h>
 #include <linux/fs.h>
 #include <linux/io.h>
 #include <linux/mm.h>
@@ -724,6 +725,275 @@ int kern_addr_valid(unsigned long addr)
 
 	return pfn_valid(pte_pfn(pte));
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void free_hotplug_page_range(struct page *page, size_t size)
+{
+	WARN_ON(PageReserved(page));
+	free_pages((unsigned long)page_address(page), get_order(size));
+}
+
+static void free_hotplug_pgtable_page(struct page *page)
+{
+	free_hotplug_page_range(page, PAGE_SIZE);
+}
+
+static bool pgtable_range_aligned(unsigned long start, unsigned long end,
+				  unsigned long floor, unsigned long ceiling,
+				  unsigned long mask)
+{
+	start &= mask;
+	if (start < floor)
+		return false;
+
+	if (ceiling) {
+		ceiling &= mask;
+		if (!ceiling)
+			return false;
+	}
+
+	if (end - 1 > ceiling - 1)
+		return false;
+	return true;
+}
+
+static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	pte_t *ptep, pte;
+
+	do {
+		ptep = pte_offset_kernel(pmdp, addr);
+		pte = READ_ONCE(*ptep);
+		if (pte_none(pte))
+			continue;
+
+		WARN_ON(!pte_present(pte));
+		pte_clear(&init_mm, addr, ptep);
+		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+		if (free_mapped)
+			free_hotplug_page_range(pte_page(pte), PAGE_SIZE);
+	} while (addr += PAGE_SIZE, addr < end);
+}
+
+static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	unsigned long next;
+	pmd_t *pmdp, pmd;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pmdp = pmd_offset(pudp, addr);
+		pmd = READ_ONCE(*pmdp);
+		if (pmd_none(pmd))
+			continue;
+
+		WARN_ON(!pmd_present(pmd));
+		if (pmd_sect(pmd)) {
+			pmd_clear(pmdp);
+
+			/*
+			 * One TLBI should be sufficient here as the PMD_SIZE
+			 * range is mapped with a single block entry.
+			 */
+			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+			if (free_mapped)
+				free_hotplug_page_range(pmd_page(pmd),
+							PMD_SIZE);
+			continue;
+		}
+		WARN_ON(!pmd_table(pmd));
+		unmap_hotplug_pte_range(pmdp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void unmap_hotplug_pud_range(pgd_t *pgdp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	unsigned long next;
+	pud_t *pudp, pud;
+
+	do {
+		next = pud_addr_end(addr, end);
+		pudp = pud_offset(pgdp, addr);
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
+			continue;
+
+		WARN_ON(!pud_present(pud));
+		if (pud_sect(pud)) {
+			pud_clear(pudp);
+
+			/*
+			 * One TLBI should be sufficient here as the PUD_SIZE
+			 * range is mapped with a single block entry.
+			 */
+			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+			if (free_mapped)
+				free_hotplug_page_range(pud_page(pud),
+							PUD_SIZE);
+			continue;
+		}
+		WARN_ON(!pud_table(pud));
+		unmap_hotplug_pmd_range(pudp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void unmap_hotplug_range(unsigned long addr, unsigned long end,
+				bool free_mapped)
+{
+	unsigned long next;
+	pgd_t *pgdp, pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		pgdp = pgd_offset_k(addr);
+		pgd = READ_ONCE(*pgdp);
+		if (pgd_none(pgd))
+			continue;
+
+		WARN_ON(!pgd_present(pgd));
+		unmap_hotplug_pud_range(pgdp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	pte_t *ptep, pte;
+	unsigned long i, start = addr;
+
+	do {
+		ptep = pte_offset_kernel(pmdp, addr);
+		pte = READ_ONCE(*ptep);
+
+		/*
+		 * This is just a sanity check here which verifies that
+		 * pte clearing has been done by earlier unmap loops.
+		 */
+		WARN_ON(!pte_none(pte));
+	} while (addr += PAGE_SIZE, addr < end);
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PMD_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the pte page if the rest of the
+	 * entries are empty. Overlaps with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	ptep = pte_offset_kernel(pmdp, 0UL);
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (!pte_none(READ_ONCE(ptep[i])))
+			return;
+	}
+
+	pmd_clear(pmdp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(ptep));
+}
+
+static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	pmd_t *pmdp, pmd;
+	unsigned long i, next, start = addr;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pmdp = pmd_offset(pudp, addr);
+		pmd = READ_ONCE(*pmdp);
+		if (pmd_none(pmd))
+			continue;
+
+		WARN_ON(!pmd_present(pmd) || !pmd_table(pmd) || pmd_sect(pmd));
+		free_empty_pte_table(pmdp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+
+	if (CONFIG_PGTABLE_LEVELS <= 2)
+		return;
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PUD_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the pmd page if the rest of the
+	 * entries are empty. Overlaps with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	pmdp = pmd_offset(pudp, 0UL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (!pmd_none(READ_ONCE(pmdp[i])))
+			return;
+	}
+
+	pud_clear(pudp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(pmdp));
+}
+
+static void free_empty_pud_table(pgd_t *pgdp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	pud_t *pudp, pud;
+	unsigned long i, next, start = addr;
+
+	do {
+		next = pud_addr_end(addr, end);
+		pudp = pud_offset(pgdp, addr);
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
+			continue;
+
+		WARN_ON(!pud_present(pud) || !pud_table(pud) || pud_sect(pud));
+		free_empty_pmd_table(pudp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+
+	if (CONFIG_PGTABLE_LEVELS <= 3)
+		return;
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PGDIR_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the pud page if the rest of the
+	 * entries are empty. Overlaps with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	pudp = pud_offset(pgdp, 0UL);
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		if (!pud_none(READ_ONCE(pudp[i])))
+			return;
+	}
+
+	pgd_clear(pgdp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(pudp));
+}
+
+static void free_empty_tables(unsigned long addr, unsigned long end,
+			      unsigned long floor, unsigned long ceiling)
+{
+	unsigned long next;
+	pgd_t *pgdp, pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		pgdp = pgd_offset_k(addr);
+		pgd = READ_ONCE(*pgdp);
+		if (pgd_none(pgd))
+			continue;
+
+		WARN_ON(!pgd_present(pgd));
+		free_empty_pud_table(pgdp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+}
+#endif
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 #if !ARM64_SWAPPER_USES_SECTION_MAPS
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
@@ -771,6 +1041,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 void vmemmap_free(unsigned long start, unsigned long end,
 		struct vmem_altmap *altmap)
 {
+#ifdef CONFIG_MEMORY_HOTPLUG
+	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+
+	unmap_hotplug_range(start, end, true);
+	free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
+#endif
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
@@ -1049,10 +1325,41 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+static bool range_overlaps_bootmem(u64 base, u64 size)
+{
+	unsigned long addr, end = base + size;
+	unsigned long mem_block_size = memory_block_size_bytes();
+
+	WARN_ON(!IS_ALIGNED(base, mem_block_size));
+	WARN_ON(!IS_ALIGNED(size, mem_block_size));
+
+	/*
+	 * Both memory hot add and remove happen in memory block
+	 * units. Any given memory block on the system was either
+	 * added during boot or at runtime via hotplug.
+	 */
+	for (addr = base; addr < end; addr += mem_block_size) {
+		if (memblock_is_boot_memory(addr))
+			return true;
+	}
+	return false;
+}
+
+static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
+{
+	unsigned long end = start + size;
+
+	WARN_ON(pgdir != init_mm.pgd);
+	WARN_ON((start < PAGE_OFFSET) || (end > PAGE_END));
+
+	unmap_hotplug_range(start, end, false);
+	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
+}
+
 int arch_add_memory(int nid, u64 start, u64 size,
 		    struct mhp_restrictions *restrictions)
 {
-	int flags = 0;
+	int ret, flags = 0;
 
 	if (rodata_full || debug_pagealloc_enabled())
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
@@ -1062,8 +1369,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 
 	memblock_clear_nomap(start, size);
 
-	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
+	ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
 			   restrictions);
+	if (ret)
+		__remove_pgd_mapping(swapper_pg_dir,
+				     __phys_to_virt(start), size);
+	return ret;
 }
 
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
@@ -1071,13 +1383,17 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	/*
-	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
-	 * adding fails). Until then, this function should only be used
-	 * during memory hotplug (adding memory), not for memory
-	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
-	 * unlocked yet.
-	 */
+	WARN_ON(range_overlaps_bootmem(start, size));
 	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
+}
+#endif
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+bool arch_memory_removable(u64 base, u64 size)
+{
+	if (range_overlaps_bootmem(base, size))
+		return false;
+	return true;
 }
 #endif
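Putting the series together, a hot remove request against boot memory now
fails at the very onset. The call chain (reconstructed from patches 1/5
and 5/5, simplified) looks like:

/*
 * try_remove_memory(nid, start, size)
 *   check_hotremove_memory_range(start, size)
 *     arch_memory_removable(start, size)		// arm64 override
 *       range_overlaps_bootmem(start, size)
 *         memblock_is_boot_memory(addr)		// per memory block
 *
 * If any memory block in the range carries MEMBLOCK_BOOT, the arm64
 * callback returns false and try_remove_memory() returns -EINVAL
 * before mem_hotplug_begin() is taken, i.e. before any memory block
 * device or firmware memory map entry has been removed.
 */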