From patchwork Thu Jul 5 06:49:12 2018
X-Patchwork-Submitter: Dan Williams <dan.j.williams@intel.com>
X-Patchwork-Id: 10508221
Subject: [PATCH 02/13] mm: Enable asynchronous __add_pages() and vmemmap_populate_hugepages()
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Tony Luck, Fenghua Yu, Benjamin Herrenschmidt, Paul Mackerras,
 Michael Ellerman, Martin Schwidefsky, Heiko Carstens, Yoshinori Sato,
 Rich Felker, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
 x86@kernel.org, Michal Hocko, Vlastimil Babka, vishal.l.verma@intel.com,
 hch@lst.de, linux-nvdimm@lists.01.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Wed, 04 Jul 2018 23:49:12 -0700
Message-ID: <153077335235.40830.14632656289865731741.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <153077334130.40830.2714147692560185329.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <153077334130.40830.2714147692560185329.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f

In preparation for allowing all ZONE_DEVICE page init to happen in the
background, enable multiple vmemmap_populate_hugepages() invocations to
run in parallel.

To date the big memory-hotplug lock has been used to serialize changes
to the linear map and vmemmap. Finer-grained locking is needed to
prevent two parallel invocations of vmemmap_populate_hugepages() from
colliding. Given that populating the vmemmap has architecture-specific
implications, this new asynchronous support is only added for the
x86_64 arch_add_memory(); all other implementations indicate no support
for async operations by returning -EWOULDBLOCK.
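[Editorial note, not part of the patch: a minimal sketch of the calling
convention implied above. A hypothetical caller, here named
example_add_device_memory(), passes a memmap_async_state to request
background memmap initialization and falls back to the synchronous path
when the architecture reports no async support via -EWOULDBLOCK. In the
patch itself the existing callers in kernel/memremap.c and
add_memory_resource() simply pass NULL.]

	/* Hypothetical caller, for illustration only. */
	static int example_add_device_memory(int nid, u64 start, u64 size,
			struct vmem_altmap *altmap,
			struct memmap_async_state *async)
	{
		int rc;

		/* Ask the arch to initialize the memmap in the background. */
		rc = arch_add_memory(nid, start, size, altmap, false, async);
		if (rc == -EWOULDBLOCK) {
			/*
			 * This arch has no async memmap support; retry with
			 * async == NULL to take the synchronous path.
			 */
			rc = arch_add_memory(nid, start, size, altmap, false,
					NULL);
		}
		return rc;
	}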
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: x86@kernel.org
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Andrew Morton
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/ia64/mm/init.c            |    5 ++-
 arch/powerpc/mm/mem.c          |    5 ++-
 arch/s390/mm/init.c            |    8 +++--
 arch/sh/mm/init.c              |    5 ++-
 arch/x86/mm/init_32.c          |    8 +++--
 arch/x86/mm/init_64.c          |   27 ++++++++++------
 drivers/nvdimm/pfn_devs.c      |    1 +
 include/linux/memmap_async.h   |   28 ++++++++++++++++
 include/linux/memory_hotplug.h |   15 ++++++---
 include/linux/memremap.h       |    2 +
 include/linux/mm.h             |    6 ++-
 kernel/memremap.c              |    4 +-
 mm/memory_hotplug.c            |   69 ++++++++++++++++++++++++++++++----------
 mm/page_alloc.c                |    3 ++
 mm/sparse-vmemmap.c            |   56 +++++++++++++++++++++++++-------
 15 files changed, 184 insertions(+), 58 deletions(-)
 create mode 100644 include/linux/memmap_async.h

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 18278b448530..d331488dd76f 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -649,12 +649,15 @@ mem_init (void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		bool want_memblock, struct memmap_async_state *async)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
+	if (async)
+		return -EWOULDBLOCK;
+
 	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 	if (ret)
 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 5c8530d0c611..3205a361e37a 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -118,12 +118,15 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
 }
 
 int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		bool want_memblock, struct memmap_async_state *async)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int rc;
 
+	if (async)
+		return -EWOULDBLOCK;
+
 	resize_hpt_for_hotplug(memblock_phys_mem_size());
 
 	start = (unsigned long)__va(start);
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 3fa3e5323612..ee87085a3a58 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -223,17 +223,21 @@ device_initcall(s390_cma_mem_init);
 #endif /* CONFIG_CMA */
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		bool want_memblock, struct memmap_async_state *async)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long size_pages = PFN_DOWN(size);
 	int rc;
 
+	if (async)
+		return -EWOULDBLOCK;
+
 	rc = vmem_add_mapping(start, size);
 	if (rc)
 		return rc;
 
-	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
+	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock,
+			async);
 	if (rc)
 		vmem_remove_mapping(start, size);
 	return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 4034035fbede..534303de3ec2 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -430,12 +430,15 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		bool want_memblock, struct memmap_async_state *async)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
+	if (async)
+		return -EWOULDBLOCK;
+
 	/* We only have ZONE_NORMAL, so this is easy.. */
 	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 	if (unlikely(ret))
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 979e0a02cbe1..1be538746010 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -852,12 +852,16 @@ void __init mem_init(void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		bool want_memblock, struct memmap_async_state *async)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	if (async)
+		return -EWOULDBLOCK;
+
+	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock,
+			async);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a688617c727e..40bd9ba052fe 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -784,11 +784,13 @@ static void update_end_of_memory_vars(u64 start, u64 size)
 }
 
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock)
+		struct vmem_altmap *altmap, bool want_memblock,
+		struct memmap_async_state *async)
 {
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock,
+			async);
 	WARN_ON_ONCE(ret);
 
 	/* update max_pfn, max_low_pfn and high_memory */
@@ -799,14 +801,15 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 }
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		bool want_memblock, struct memmap_async_state *async)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
 	init_memory_mapping(start, start + size);
 
-	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock,
+			async);
 }
 
 #define PAGE_INUSE 0xFD
@@ -1412,26 +1415,30 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 {
 	unsigned long addr;
 	unsigned long next;
-	pgd_t *pgd;
-	p4d_t *p4d;
-	pud_t *pud;
+	pgd_t *pgd = NULL;
+	p4d_t *p4d = NULL;
+	pud_t *pud = NULL;
 	pmd_t *pmd;
 
 	for (addr = start; addr < end; addr = next) {
 		next = pmd_addr_end(addr, end);
 
-		pgd = vmemmap_pgd_populate(addr, node);
+		pgd = vmemmap_pgd_populate(addr, node, pgd);
 		if (!pgd)
 			return -ENOMEM;
 
-		p4d = vmemmap_p4d_populate(pgd, addr, node);
+		p4d = vmemmap_p4d_populate(pgd, addr, node, p4d);
 		if (!p4d)
 			return -ENOMEM;
 
-		pud = vmemmap_pud_populate(p4d, addr, node);
+		pud = vmemmap_pud_populate(p4d, addr, node, pud);
 		if (!pud)
 			return -ENOMEM;
 
+		/*
+		 * No lock required here as sections do not collide
+		 * below the pud level.
+		 */
 		pmd = pmd_offset(pud, addr);
 		if (pmd_none(*pmd)) {
 			void *p;
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 3f7ad5bc443e..147c62e2ef2b 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -577,6 +577,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 		memcpy(altmap, &__altmap, sizeof(*altmap));
 		altmap->free = PHYS_PFN(offset - SZ_8K);
 		altmap->alloc = 0;
+		spin_lock_init(&altmap->lock);
 		pgmap->altmap_valid = true;
 	} else
 		return -ENXIO;
diff --git a/include/linux/memmap_async.h b/include/linux/memmap_async.h
new file mode 100644
index 000000000000..11aa9f3a523e
--- /dev/null
+++ b/include/linux/memmap_async.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_MEMMAP_ASYNC_H
+#define __LINUX_MEMMAP_ASYNC_H
+#include
+
+struct vmem_altmap;
+
+struct memmap_init_env {
+	struct vmem_altmap *altmap;
+	bool want_memblock;
+	int nid;
+};
+
+struct memmap_init_memmap {
+	struct memmap_init_env *env;
+	async_cookie_t cookie;
+	int start_sec;
+	int end_sec;
+	int result;
+};
+
+struct memmap_async_state {
+	struct memmap_init_env env;
+	struct memmap_init_memmap memmap;
+};
+
+extern struct async_domain memmap_init_domain;
+#endif /* __LINUX_MEMMAP_ASYNC_H */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e60085b2824d..7565b2675863 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -15,6 +15,7 @@ struct memory_block;
 struct resource;
 struct vmem_altmap;
 struct dev_pagemap;
+struct memmap_async_state;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 /*
@@ -116,18 +117,21 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock);
+		struct vmem_altmap *altmap, bool want_memblock,
+		struct memmap_async_state *async);
 
 #ifndef CONFIG_ARCH_HAS_ADD_PAGES
 static inline int add_pages(int nid, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap,
-		bool want_memblock)
+		bool want_memblock, struct memmap_async_state *async)
 {
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock,
+			async);
 }
 #else /* ARCH_HAS_ADD_PAGES */
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-	struct vmem_altmap *altmap, bool want_memblock);
+	struct vmem_altmap *altmap, bool want_memblock,
+	struct memmap_async_state *async);
 #endif /* ARCH_HAS_ADD_PAGES */
 
 #ifdef CONFIG_NUMA
@@ -325,7 +329,8 @@ extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource, bool online);
 extern int arch_add_memory(int nid, u64 start, u64 size,
-		struct vmem_altmap *altmap, bool want_memblock);
+		struct vmem_altmap *altmap, bool want_memblock,
+		struct memmap_async_state *async);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct dev_pagemap *pgmap);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 71f5e7c7dfb9..bfdc7363b13b 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -16,6 +16,7 @@ struct device;
  * @free: free pages set aside in the mapping for memmap storage
  * @align: pages reserved to meet allocation alignments
  * @alloc: track pages consumed, private to vmemmap_populate()
+ * @lock: enable parallel allocations
  */
 struct vmem_altmap {
 	const unsigned long base_pfn;
@@ -23,6 +24,7 @@ struct vmem_altmap {
 	unsigned long free;
 	unsigned long align;
 	unsigned long alloc;
+	spinlock_t lock;
 };
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 319d01372efa..0fac83ff21c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2654,9 +2654,9 @@ void sparse_mem_maps_populate_node(struct page **map_map,
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap);
-pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
-p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
-pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
+pgd_t *vmemmap_pgd_populate(unsigned long addr, int node, pgd_t *);
+p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node, p4d_t *);
+pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node, pud_t *);
 pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
 pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
 void *vmemmap_alloc_block(unsigned long size, int node);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 58327259420d..b861fe909932 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -235,12 +235,12 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap,
 	 */
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		error = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, NULL, false);
+				align_size >> PAGE_SHIFT, NULL, false, NULL);
 	} else {
 		struct zone *zone;
 
 		error = arch_add_memory(nid, align_start, align_size, altmap,
-				false);
+				false, NULL);
 		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
 		if (!error)
 			move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index aae4e6cc65e9..18f8e2c49089 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -34,6 +34,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include
 #include
@@ -264,6 +266,32 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
 }
 
+static void __ref section_init_async(void *data, async_cookie_t cookie)
+{
+	unsigned long i;
+	struct memmap_init_memmap *args = data;
+	struct memmap_init_env *env = args->env;
+	int start_sec = args->start_sec, end_sec = args->end_sec, err;
+
+	args->result = 0;
+	for (i = start_sec; i <= end_sec; i++) {
+		err = __add_section(env->nid, section_nr_to_pfn(i), env->altmap,
+				env->want_memblock);
+
+		/*
+		 * EEXIST is finally dealt with by ioresource collision
+		 * check. see add_memory() => register_memory_resource()
+		 * Warning will be printed if there is collision.
+		 */
+		if (err && (err != -EEXIST)) {
+			args->result = err;
+			break;
+		}
+		args->result = 0;
+		cond_resched();
+	}
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -272,11 +300,12 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
  */
 int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap,
-		bool want_memblock)
+		bool want_memblock, struct memmap_async_state *async)
 {
-	unsigned long i;
 	int err = 0;
 	int start_sec, end_sec;
+	struct memmap_init_env _env, *env;
+	struct memmap_init_memmap _args, *args;
 
 	/* during initialize mem_map, align hot-added range to section */
 	start_sec = pfn_to_section_nr(phys_start_pfn);
@@ -289,28 +318,32 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 		if (altmap->base_pfn != phys_start_pfn
 				|| vmem_altmap_offset(altmap) > nr_pages) {
 			pr_warn_once("memory add fail, invalid altmap\n");
-			err = -EINVAL;
-			goto out;
+			return -EINVAL;
 		}
 		altmap->alloc = 0;
 	}
 
-	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
-				want_memblock);
+	env = async ? &async->env : &_env;
+	args = async ? &async->memmap : &_args;
 
-		/*
-		 * EEXIST is finally dealt with by ioresource collision
-		 * check. see add_memory() => register_memory_resource()
-		 * Warning will be printed if there is collision.
-		 */
-		if (err && (err != -EEXIST))
-			break;
-		err = 0;
-		cond_resched();
+	env->nid = nid;
+	env->altmap = altmap;
+	env->want_memblock = want_memblock;
+
+	args->env = env;
+	args->end_sec = end_sec;
+	args->start_sec = start_sec;
+
+	if (async)
+		args->cookie = async_schedule_domain(section_init_async, args,
+				&memmap_init_domain);
+	else {
+		/* call the 'async' routine synchronously */
+		section_init_async(args, 0);
+		err = args->result;
 	}
+
 	vmemmap_populate_print_last();
-out:
 	return err;
 }
 
@@ -1135,7 +1168,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	}
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, true);
+	ret = arch_add_memory(nid, start, size, NULL, true, NULL);
 	if (ret < 0)
 		goto error;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 545a5860cce7..f83682ef006e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -66,6 +66,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -5452,6 +5453,8 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
 #endif
 }
 
+ASYNC_DOMAIN_EXCLUSIVE(memmap_init_domain);
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by free_all_bootmem() once the early boot process is
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index bd0276d5f66b..9cdd82fb595d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -93,6 +93,7 @@ void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node)
 
 static unsigned long __meminit vmem_altmap_next_pfn(struct vmem_altmap *altmap)
 {
+	lockdep_assert_held(&altmap->lock);
 	return altmap->base_pfn + altmap->reserve + altmap->alloc
 		+ altmap->align;
 }
@@ -101,6 +102,7 @@ static unsigned long __meminit vmem_altmap_nr_free(struct vmem_altmap *altmap)
 {
 	unsigned long allocated = altmap->alloc + altmap->align;
 
+	lockdep_assert_held(&altmap->lock);
 	if (altmap->free > allocated)
 		return altmap->free - allocated;
 	return 0;
@@ -124,16 +126,20 @@ void * __meminit altmap_alloc_block_buf(unsigned long size,
 		return NULL;
 	}
 
+	spin_lock(&altmap->lock);
 	pfn = vmem_altmap_next_pfn(altmap);
 	nr_pfns = size >> PAGE_SHIFT;
 	nr_align = 1UL << find_first_bit(&nr_pfns, BITS_PER_LONG);
 	nr_align = ALIGN(pfn, nr_align) - pfn;
-	if (nr_pfns + nr_align > vmem_altmap_nr_free(altmap))
+	if (nr_pfns + nr_align > vmem_altmap_nr_free(altmap)) {
+		spin_unlock(&altmap->lock);
 		return NULL;
+	}
 
 	altmap->alloc += nr_pfns;
 	altmap->align += nr_align;
 	pfn += nr_align;
+	spin_unlock(&altmap->lock);
 
 	pr_debug("%s: pfn: %#lx alloc: %ld align: %ld nr: %#lx\n",
 			__func__, pfn, altmap->alloc, altmap->align, nr_pfns);
@@ -188,39 +194,63 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
 	return pmd;
 }
 
-pud_t * __meminit vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node)
+static DEFINE_MUTEX(vmemmap_pgd_lock);
+static DEFINE_MUTEX(vmemmap_p4d_lock);
+static DEFINE_MUTEX(vmemmap_pud_lock);
+
+pud_t * __meminit vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node,
+		pud_t *pud)
 {
-	pud_t *pud = pud_offset(p4d, addr);
+	pud_t *new = pud_offset(p4d, addr);
+
+	if (new == pud)
+		return pud;
+	pud = new;
+	mutex_lock(&vmemmap_pud_lock);
 	if (pud_none(*pud)) {
 		void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
 		pud_populate(&init_mm, pud, p);
 	}
+	mutex_unlock(&vmemmap_pud_lock);
 	return pud;
 }
 
-p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
+p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node,
+		p4d_t * p4d)
 {
-	p4d_t *p4d = p4d_offset(pgd, addr);
+	p4d_t *new = p4d_offset(pgd, addr);
+
+	if (new == p4d)
+		return p4d;
+	p4d = new;
+	mutex_lock(&vmemmap_p4d_lock);
 	if (p4d_none(*p4d)) {
 		void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
 		p4d_populate(&init_mm, p4d, p);
 	}
+	mutex_unlock(&vmemmap_p4d_lock);
 	return p4d;
 }
 
-pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
+pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node, pgd_t *pgd)
 {
-	pgd_t *pgd = pgd_offset_k(addr);
+	pgd_t *new = pgd_offset_k(addr);
+
+	if (new == pgd)
+		return pgd;
+	pgd = new;
+	mutex_lock(&vmemmap_pgd_lock);
 	if (pgd_none(*pgd)) {
 		void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
 		pgd_populate(&init_mm, pgd, p);
 	}
+	mutex_unlock(&vmemmap_pgd_lock);
 	return pgd;
 }
 
@@ -228,20 +258,20 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
 					 unsigned long end, int node)
 {
 	unsigned long addr = start;
-	pgd_t *pgd;
-	p4d_t *p4d;
-	pud_t *pud;
+	pgd_t *pgd = NULL;
+	p4d_t *p4d = NULL;
+	pud_t *pud = NULL;
 	pmd_t *pmd;
 	pte_t *pte;
 
 	for (; addr < end; addr += PAGE_SIZE) {
-		pgd = vmemmap_pgd_populate(addr, node);
+		pgd = vmemmap_pgd_populate(addr, node, pgd);
 		if (!pgd)
 			return -ENOMEM;
-		p4d = vmemmap_p4d_populate(pgd, addr, node);
+		p4d = vmemmap_p4d_populate(pgd, addr, node, p4d);
 		if (!p4d)
 			return -ENOMEM;
-		pud = vmemmap_pud_populate(p4d, addr, node);
+		pud = vmemmap_pud_populate(p4d, addr, node, pud);
 		if (!pud)
 			return -ENOMEM;
 		pmd = vmemmap_pmd_populate(pud, addr, node);