From patchwork Wed Jun 5 21:57:54 2019
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Michal Hocko, Vlastimil Babka, Logan Gunthorpe, Oscar Salvador,
 Pavel Tatashin, Benjamin Herrenschmidt, Paul Mackerras,
 Michael Ellerman, linux-mm@kvack.org, linux-nvdimm@lists.01.org,
 linux-kernel@vger.kernel.org, osalvador@suse.de, mhocko@suse.com
Subject: [PATCH v9 01/12] mm/sparsemem: Introduce struct mem_section_usage
Date: Wed, 05 Jun 2019 14:57:54 -0700
Message-ID: <155977187407.2443951.16503493275720588454.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>

Towards enabling memory hotplug to track partial population of a
section, introduce 'struct mem_section_usage'.

A pointer to a 'struct mem_section_usage' instance replaces the
existing pointer to a 'pageblock_flags' bitmap. Effectively it adds
one more 'unsigned long' beyond the 'pageblock_flags' (usemap)
allocation to house a new 'subsection_map' bitmap.
The new bitmap enables the memory hot{plug,remove} implementation to
act on incremental sub-divisions of a section.

The default SUBSECTION_SHIFT is chosen to keep the 'subsection_map' no
larger than a single 'unsigned long' on the major architectures.
Alternatively, an architecture can define ARCH_SUBSECTION_SHIFT to
override the default of PMD_SHIFT. Note that PowerPC needs
ARCH_SUBSECTION_SHIFT to work around PMD_SHIFT being a non-constant
expression on that architecture.

The primary motivation for this functionality is to support platforms
that mix "System RAM" and "Persistent Memory" within a single section,
or multiple PMEM ranges with different mapping lifetimes within a
single section. The section restriction for hotplug has caused an
ongoing saga of hacks and bugs for devm_memremap_pages() users.

Beyond the fixups to teach existing paths how to retrieve the 'usemap'
from a section, and updates to the usemap allocation path, there are
no expected behavior changes.

Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Logan Gunthorpe
Cc: Oscar Salvador
Cc: Pavel Tatashin
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Oscar Salvador
Reviewed-by: Wei Yang
---
 arch/powerpc/include/asm/sparsemem.h |    3 +
 include/linux/mmzone.h               |   48 +++++++++++++++++++-
 mm/memory_hotplug.c                  |   18 ++++----
 mm/page_alloc.c                      |    2 -
 mm/sparse.c                          |   81 +++++++++++++++++-----------------
 5 files changed, 99 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index 3192d454a733..1aa3c9303bf8 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -10,6 +10,9 @@
  */
 #define SECTION_SIZE_BITS       24

+/* Reflect the largest possible PMD-size as the subsection-size constant */
+#define ARCH_SUBSECTION_SHIFT 24
+
 #endif /* CONFIG_SPARSEMEM */

 #ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 427b79c39b3c..ac163f2f274f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1161,6 +1161,44 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 #define SECTION_ALIGN_UP(pfn)	(((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
 #define SECTION_ALIGN_DOWN(pfn)	((pfn) & PAGE_SECTION_MASK)

+/*
+ * SUBSECTION_SHIFT must be constant since it is used to declare
+ * subsection_map and related bitmaps without triggering the generation
+ * of variable-length arrays. The most natural size for a subsection is
+ * a PMD-page. For architectures that do not have a constant PMD-size
+ * ARCH_SUBSECTION_SHIFT can be set to a constant max size, or otherwise
+ * fallback to 2MB.
+ */
+#if defined(ARCH_SUBSECTION_SHIFT)
+#define SUBSECTION_SHIFT (ARCH_SUBSECTION_SHIFT)
+#elif defined(PMD_SHIFT)
+#define SUBSECTION_SHIFT (PMD_SHIFT)
+#else
+/*
+ * Memory hotplug enabled platforms avoid this default because they
+ * either define ARCH_SUBSECTION_SHIFT, or PMD_SHIFT is a constant, but
+ * this is kept as a backstop to allow compilation on
+ * !ARCH_ENABLE_MEMORY_HOTPLUG archs.
+ */
+#define SUBSECTION_SHIFT 21
+#endif
+
+#define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
+#define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)
+#define PAGE_SUBSECTION_MASK ((~(PAGES_PER_SUBSECTION-1)))
+
+#if SUBSECTION_SHIFT > SECTION_SIZE_BITS
+#error Subsection size exceeds section size
+#else
+#define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))
+#endif
+
+struct mem_section_usage {
+	DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
+	/* See declaration of similar field in struct zone */
+	unsigned long pageblock_flags[0];
+};
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1178,8 +1216,7 @@ struct mem_section {
 	 */
 	unsigned long section_mem_map;

-	/* See declaration of similar field in struct zone */
-	unsigned long *pageblock_flags;
+	struct mem_section_usage *usage;
 #ifdef CONFIG_PAGE_EXTENSION
 	/*
 	 * If SPARSEMEM, pgdat doesn't have page_ext pointer. We use
@@ -1210,6 +1247,11 @@ extern struct mem_section **mem_section;
 extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
 #endif

+static inline unsigned long *section_to_usemap(struct mem_section *ms)
+{
+	return ms->usage->pageblock_flags;
+}
+
 static inline struct mem_section *__nr_to_section(unsigned long nr)
 {
 #ifdef CONFIG_SPARSEMEM_EXTREME
@@ -1221,7 +1263,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
 	return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
 }
 extern int __section_nr(struct mem_section* ms);
-extern unsigned long usemap_size(void);
+extern size_t mem_section_usage_size(void);

 /*
  * We use the lower bits of the mem_map pointer to store
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a88c5f334e5a..7b963c2d3a0d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -166,9 +166,10 @@ void put_page_bootmem(struct page *page)
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
-	unsigned long *usemap, mapsize, section_nr, i;
+	unsigned long mapsize, section_nr, i;
 	struct mem_section *ms;
 	struct page *page, *memmap;
+	struct mem_section_usage *usage;

 	section_nr = pfn_to_section_nr(start_pfn);
 	ms = __nr_to_section(section_nr);
@@ -188,10 +189,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
 	for (i = 0; i < mapsize; i++, page++)
 		get_page_bootmem(section_nr, page, SECTION_INFO);

-	usemap = ms->pageblock_flags;
-	page = virt_to_page(usemap);
+	usage = ms->usage;
+	page = virt_to_page(usage);

-	mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+	mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;

 	for (i = 0; i < mapsize; i++, page++)
 		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
@@ -200,9 +201,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
 #else /* CONFIG_SPARSEMEM_VMEMMAP */
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
-	unsigned long *usemap, mapsize, section_nr, i;
+	unsigned long mapsize, section_nr, i;
 	struct mem_section *ms;
 	struct page *page, *memmap;
+	struct mem_section_usage *usage;

 	section_nr = pfn_to_section_nr(start_pfn);
 	ms = __nr_to_section(section_nr);
@@ -211,10 +213,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)

 	register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);

-	usemap = ms->pageblock_flags;
-	page = virt_to_page(usemap);
+	usage = ms->usage;
+	page = virt_to_page(usage);

-	mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+	mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;

 	for (i = 0; i < mapsize; i++, page++)
 		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c061f66c2d0c..c6d8224d792e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -404,7 +404,7 @@ static inline unsigned long *get_pageblock_bitmap(struct page *page,
 							unsigned long pfn)
 {
 #ifdef CONFIG_SPARSEMEM
-	return __pfn_to_section(pfn)->pageblock_flags;
+	return section_to_usemap(__pfn_to_section(pfn));
 #else
 	return page_zone(page)->pageblock_flags;
 #endif /* CONFIG_SPARSEMEM */
diff --git a/mm/sparse.c b/mm/sparse.c
index 1552c855d62a..71da15cc7432 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -288,33 +288,31 @@ struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pn

 static void __meminit sparse_init_one_section(struct mem_section *ms,
 		unsigned long pnum, struct page *mem_map,
-		unsigned long *pageblock_bitmap)
+		struct mem_section_usage *usage)
 {
 	ms->section_mem_map &= ~SECTION_MAP_MASK;
 	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) |
 							SECTION_HAS_MEM_MAP;
-	ms->pageblock_flags = pageblock_bitmap;
+	ms->usage = usage;
 }

-unsigned long usemap_size(void)
+static unsigned long usemap_size(void)
 {
 	return BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS) * sizeof(unsigned long);
 }

-#ifdef CONFIG_MEMORY_HOTPLUG
-static unsigned long *__kmalloc_section_usemap(void)
+size_t mem_section_usage_size(void)
 {
-	return kmalloc(usemap_size(), GFP_KERNEL);
+	return sizeof(struct mem_section_usage) + usemap_size();
 }
-#endif /* CONFIG_MEMORY_HOTPLUG */

 #ifdef CONFIG_MEMORY_HOTREMOVE
-static unsigned long * __init
+static struct mem_section_usage * __init
 sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
 					 unsigned long size)
 {
+	struct mem_section_usage *usage;
 	unsigned long goal, limit;
-	unsigned long *p;
 	int nid;
 	/*
 	 * A page may contain usemaps for other sections preventing the
@@ -330,15 +328,16 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
 	limit = goal + (1UL << PA_SECTION_SHIFT);
 	nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
again:
-	p = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid);
-	if (!p && limit) {
+	usage = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid);
+	if (!usage && limit) {
 		limit = 0;
 		goto again;
 	}
-	return p;
+	return usage;
 }

-static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
+static void __init check_usemap_section_nr(int nid,
+		struct mem_section_usage *usage)
 {
 	unsigned long usemap_snr, pgdat_snr;
 	static unsigned long old_usemap_snr;
@@ -352,7 +351,7 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
 		old_pgdat_snr = NR_MEM_SECTIONS;
 	}

-	usemap_snr = pfn_to_section_nr(__pa(usemap) >> PAGE_SHIFT);
+	usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT);
 	pgdat_snr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT);
 	if (usemap_snr == pgdat_snr)
 		return;
@@ -380,14 +379,15 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
 		usemap_snr, pgdat_snr, nid);
 }
 #else
-static unsigned long * __init
+static struct mem_section_usage * __init
 sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
 					 unsigned long size)
 {
 	return memblock_alloc_node(size, SMP_CACHE_BYTES, pgdat->node_id);
 }

-static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
+static void __init check_usemap_section_nr(int nid,
+		struct mem_section_usage *usage)
 {
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
@@ -474,14 +474,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 				   unsigned long pnum_end,
 				   unsigned long map_count)
 {
-	unsigned long pnum, usemap_longs, *usemap;
+	struct mem_section_usage *usage;
+	unsigned long pnum;
 	struct page *map;

-	usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS);
-	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
-							  usemap_size() *
-							  map_count);
-	if (!usemap) {
+	usage = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
+			mem_section_usage_size() * map_count);
+	if (!usage) {
 		pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
 		goto failed;
 	}
@@ -497,9 +496,9 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 			pnum_begin = pnum;
 			goto failed;
 		}
-		check_usemap_section_nr(nid, usemap);
-		sparse_init_one_section(__nr_to_section(pnum), pnum, map, usemap);
-		usemap += usemap_longs;
+		check_usemap_section_nr(nid, usage);
+		sparse_init_one_section(__nr_to_section(pnum), pnum, map, usage);
+		usage = (void *) usage + mem_section_usage_size();
 	}
 	sparse_buffer_fini();
 	return;
@@ -697,9 +696,9 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 				     struct vmem_altmap *altmap)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
+	struct mem_section_usage *usage;
 	struct mem_section *ms;
 	struct page *memmap;
-	unsigned long *usemap;
 	int ret;

 	/*
@@ -713,8 +712,8 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 	memmap = kmalloc_section_memmap(section_nr, nid, altmap);
 	if (!memmap)
 		return -ENOMEM;
-	usemap = __kmalloc_section_usemap();
-	if (!usemap) {
+	usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
+	if (!usage) {
 		__kfree_section_memmap(memmap, altmap);
 		return -ENOMEM;
 	}
@@ -732,11 +731,11 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 	page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION);

 	section_mark_present(ms);
-	sparse_init_one_section(ms, section_nr, memmap, usemap);
+	sparse_init_one_section(ms, section_nr, memmap, usage);

 out:
 	if (ret < 0) {
-		kfree(usemap);
+		kfree(usage);
 		__kfree_section_memmap(memmap, altmap);
 	}
 	return ret;
@@ -772,20 +771,20 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 }
 #endif

-static void free_section_usemap(struct page *memmap, unsigned long *usemap,
-		struct vmem_altmap *altmap)
+static void free_section_usage(struct page *memmap,
+		struct mem_section_usage *usage, struct vmem_altmap *altmap)
 {
-	struct page *usemap_page;
+	struct page *usage_page;

-	if (!usemap)
+	if (!usage)
 		return;

-	usemap_page = virt_to_page(usemap);
+	usage_page = virt_to_page(usage);
 	/*
 	 * Check to see if allocation came from hot-plug-add
 	 */
-	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
-		kfree(usemap);
+	if (PageSlab(usage_page) || PageCompound(usage_page)) {
+		kfree(usage);
 		if (memmap)
 			__kfree_section_memmap(memmap, altmap);
 		return;
 	}
@@ -804,18 +803,18 @@ void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,
 			       struct vmem_altmap *altmap)
 {
 	struct page *memmap = NULL;
-	unsigned long *usemap = NULL;
+	struct mem_section_usage *usage = NULL;

 	if (ms->section_mem_map) {
-		usemap = ms->pageblock_flags;
+		usage = ms->usage;
 		memmap = sparse_decode_mem_map(ms->section_mem_map,
 						__section_nr(ms));
 		ms->section_mem_map = 0;
-		ms->pageblock_flags = NULL;
+		ms->usage = NULL;
 	}

 	clear_hwpoisoned_pages(memmap + map_offset,
 			PAGES_PER_SECTION - map_offset);
-	free_section_usemap(memmap, usemap, altmap);
+	free_section_usage(memmap, usage, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
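
The sizing consequences of the SUBSECTION_SHIFT definitions above can
be checked with a short userspace program. The sketch below is an
illustration only, not kernel code; the constants are assumptions
mirroring an x86_64-like configuration (PAGE_SHIFT 12,
SECTION_SIZE_BITS 27, SUBSECTION_SHIFT 21 via a constant PMD_SHIFT):

/* subsection-size.c: standalone illustration (not kernel code) */
#include <stdio.h>

/* Assumed x86_64-like constants for illustration */
#define PAGE_SHIFT		12	/* 4KB pages */
#define SECTION_SIZE_BITS	27	/* 128MB sections */
#define SUBSECTION_SHIFT	21	/* PMD_SHIFT: 2MB subsections */

/* Same derivations as the mmzone.h hunk above */
#define PFN_SUBSECTION_SHIFT	(SUBSECTION_SHIFT - PAGE_SHIFT)
#define PAGES_PER_SUBSECTION	(1UL << PFN_SUBSECTION_SHIFT)
#define SUBSECTIONS_PER_SECTION	(1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))

int main(void)
{
	printf("pages per subsection:    %lu\n", PAGES_PER_SUBSECTION);
	printf("subsections per section: %lu\n", SUBSECTIONS_PER_SECTION);
	/* prints 512 and 64: the subsection_map fits in one 64-bit long,
	 * matching the "no larger than a single 'unsigned long'" claim */
	return 0;
}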

From patchwork Wed Jun 5 21:57:59 2019
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Michal Hocko, Vlastimil Babka, Logan Gunthorpe, Oscar Salvador,
 Pavel Tatashin, Jane Chu, linux-mm@kvack.org,
 linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
 osalvador@suse.de, mhocko@suse.com
Subject: [PATCH v9 02/12] mm/sparsemem: Add helpers track active portions of a section at boot
Date: Wed, 05 Jun 2019 14:57:59 -0700
Message-ID: <155977187919.2443951.8925592545929008845.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>

Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
sub-section active bitmask, each bit representing a PMD_SIZE span of
the architecture's memory hotplug section size. The implication of a
partially populated section is that pfn_valid() needs to go beyond a
valid_section() check and read the sub-section active ranges from the
bitmask.
The expectation is that the bitmask (subsection_map) fits in the same
cacheline as the valid_section() data, so the incremental performance
overhead to pfn_valid() should be negligible.

Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Logan Gunthorpe
Cc: Oscar Salvador
Cc: Pavel Tatashin
Tested-by: Jane Chu
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Oscar Salvador
---
 include/linux/mmzone.h |   29 ++++++++++++++++++++++++++++-
 mm/page_alloc.c        |    4 +++-
 mm/sparse.c            |   35 +++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ac163f2f274f..6dd52d544857 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1199,6 +1199,8 @@ struct mem_section_usage {
 	unsigned long pageblock_flags[0];
 };

+void subsection_map_init(unsigned long pfn, unsigned long nr_pages);
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1336,12 +1338,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)

 extern int __highest_present_section_nr;

+static inline int subsection_map_index(unsigned long pfn)
+{
+	return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION;
+}
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+	int idx = subsection_map_index(pfn);
+
+	return test_bit(idx, ms->usage->subsection_map);
+}
+#else
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+	return 1;
+}
+#endif
+
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
 {
+	struct mem_section *ms;
+
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
-	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+	ms = __nr_to_section(pfn_to_section_nr(pfn));
+	if (!valid_section(ms))
+		return 0;
+	return pfn_section_valid(ms, pfn);
 }
 #endif

@@ -1373,6 +1399,7 @@ void sparse_init(void);
 #define sparse_init()	do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #define pfn_present pfn_valid
+#define subsection_map_init(_pfn, _nr_pages) do {} while (0)
 #endif /* CONFIG_SPARSEMEM */

 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c6d8224d792e..bd773efe5b82 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7292,10 +7292,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)

 	/* Print out the early node map */
 	pr_info("Early memory node ranges\n");
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
 		pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
 			(u64)start_pfn << PAGE_SHIFT,
 			((u64)end_pfn << PAGE_SHIFT) - 1);
+		subsection_map_init(start_pfn, end_pfn - start_pfn);
+	}

 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
diff --git a/mm/sparse.c b/mm/sparse.c
index 71da15cc7432..0baa2e55cfdd 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -210,6 +210,41 @@ static inline unsigned long first_present_section_nr(void)
 	return next_present_section_nr(-1);
 }

+void subsection_mask_set(unsigned long *map, unsigned long pfn,
+		unsigned long nr_pages)
+{
+	int idx = subsection_map_index(pfn);
+	int end = subsection_map_index(pfn + nr_pages - 1);
+
+	bitmap_set(map, idx, end - idx + 1);
+}
+
+void subsection_map_init(unsigned long pfn, unsigned long nr_pages)
+{
+	int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
+	int i, start_sec = pfn_to_section_nr(pfn);
+
+	if (!nr_pages)
+		return;
+
+	for (i = start_sec; i <= end_sec; i++) {
+		struct mem_section *ms;
+		unsigned long pfns;
+
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		ms = __nr_to_section(i);
+		subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
+
+		pr_debug("%s: sec: %d pfns: %ld set(%d, %d)\n", __func__, i,
+				pfns, subsection_map_index(pfn),
+				subsection_map_index(pfn + pfns - 1));
+
+		pfn += pfns;
+		nr_pages -= pfns;
+	}
+}
+
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
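
To see how a pfn lands in the subsection_map, the index computation can
be exercised in isolation. The sketch below is an illustration only,
again assuming 128MB sections of 4KB pages and 2MB subsections (so
32768 pages per section, 512 per subsection); the helper mirrors the
subsection_map_index() added above:

/* subsection-index.c: standalone illustration (not kernel code) */
#include <stdio.h>

/* Assumed constants: 128MB section, 4KB page, 2MB subsection */
#define PAGES_PER_SECTION	(1UL << 15)	/* 32768 pages */
#define PAGE_SECTION_MASK	(~(PAGES_PER_SECTION - 1))
#define PAGES_PER_SUBSECTION	512UL

/* Mirrors the subsection_map_index() helper in the mmzone.h hunk */
static int subsection_map_index(unsigned long pfn)
{
	/* offset within the section, in units of subsections */
	return (pfn & ~PAGE_SECTION_MASK) / PAGES_PER_SUBSECTION;
}

int main(void)
{
	/* pfn 0x8300 is 0x300 pages into its section: bit 1 of the map */
	printf("index: %d\n", subsection_map_index(0x8300));
	return 0;
}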

From patchwork Wed Jun 5 21:58:04 2019
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Michal Hocko, Vlastimil Babka, Logan Gunthorpe, Pavel Tatashin,
 Oscar Salvador, linux-mm@kvack.org, linux-nvdimm@lists.01.org,
 linux-kernel@vger.kernel.org, osalvador@suse.de, mhocko@suse.com
Subject: [PATCH v9 03/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
Date: Wed, 05 Jun 2019 14:58:04 -0700
Message-ID: <155977188458.2443951.9573565800736334460.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>

Sub-section hotplug support reduces the unit of operation of hotplug
from section-sized units (PAGES_PER_SECTION) to sub-section-sized
units (PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to
consider PAGES_PER_SUBSECTION boundaries as the points where
pfn_valid(), not valid_section(), can toggle.

Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Logan Gunthorpe
Reviewed-by: Pavel Tatashin
Reviewed-by: Oscar Salvador
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/memory_hotplug.c |   29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7b963c2d3a0d..647859a1d119 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -318,12 +318,8 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 				     unsigned long start_pfn,
 				     unsigned long end_pfn)
 {
-	struct mem_section *ms;
-
-	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
-		ms = __pfn_to_section(start_pfn);
-
-		if (unlikely(!valid_section(ms)))
+	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) {
+		if (unlikely(!pfn_valid(start_pfn)))
 			continue;

 		if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -343,15 +339,12 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 				    unsigned long start_pfn,
 				    unsigned long end_pfn)
 {
-	struct mem_section *ms;
 	unsigned long pfn;

 	/* pfn is the end pfn of a memory section. */
 	pfn = end_pfn - 1;
-	for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
-		ms = __pfn_to_section(pfn);
-
-		if (unlikely(!valid_section(ms)))
+	for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) {
+		if (unlikely(!pfn_valid(pfn)))
 			continue;

 		if (unlikely(pfn_to_nid(pfn) != nid))
@@ -373,7 +366,6 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */
 	unsigned long zone_end_pfn = z;
 	unsigned long pfn;
-	struct mem_section *ms;
 	int nid = zone_to_nid(zone);

 	zone_span_writelock(zone);
@@ -410,10 +402,8 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	 * it check the zone has only hole or not.
 	 */
 	pfn = zone_start_pfn;
-	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
-		ms = __pfn_to_section(pfn);
-
-		if (unlikely(!valid_section(ms)))
+	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
+		if (unlikely(!pfn_valid(pfn)))
 			continue;

 		if (page_zone(pfn_to_page(pfn)) != zone)
@@ -441,7 +431,6 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
 	unsigned long p = pgdat_end_pfn(pgdat); /* pgdat_end_pfn namespace clash */
 	unsigned long pgdat_end_pfn = p;
 	unsigned long pfn;
-	struct mem_section *ms;
 	int nid = pgdat->node_id;

 	if (pgdat_start_pfn == start_pfn) {
@@ -478,10 +467,8 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
 	 * has only hole or not.
 	 */
 	pfn = pgdat_start_pfn;
-	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) {
-		ms = __pfn_to_section(pfn);
-
-		if (unlikely(!valid_section(ms)))
+	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUBSECTION) {
+		if (unlikely(!pfn_valid(pfn)))
 			continue;

 		if (pfn_to_nid(pfn) != nid)
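
The change to the scan loops can be pictured with a toy pfn walker.
In the sketch below, pfn_valid() is a local stub and the constants are
the same illustrative values as in the earlier sketches; the point is
that a 2MB hole is now visible to the walk, where a full-section step
would have skipped over it:

/* span-walk.c: standalone illustration (not kernel code) */
#include <stdio.h>

#define PAGES_PER_SUBSECTION	512UL	/* assumed: 2MB of 4KB pages */
#define PAGES_PER_SECTION	32768UL	/* assumed: 128MB section */

/* Local stub standing in for the kernel's pfn_valid(); pretend the
 * second subsection of the range is a hole. */
static int pfn_valid(unsigned long pfn)
{
	return !(pfn >= 512 && pfn < 1024);
}

int main(void)
{
	unsigned long pfn;

	/* Walk a section's worth of pfns at subsection granularity,
	 * as the shrink helpers now do. */
	for (pfn = 0; pfn < PAGES_PER_SECTION; pfn += PAGES_PER_SUBSECTION)
		if (!pfn_valid(pfn))
			printf("hole at pfn %lu\n", pfn); /* pfn 512 */
	return 0;
}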

From patchwork Wed Jun 5 21:58:21 2019
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Michal Hocko, David Hildenbrand, Logan Gunthorpe, Oscar Salvador,
 Pavel Tatashin, linux-mm@kvack.org, linux-nvdimm@lists.01.org,
 linux-kernel@vger.kernel.org, osalvador@suse.de, mhocko@suse.com
Subject: [PATCH v9 04/12] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
Date: Wed, 05 Jun 2019 14:58:21 -0700
Message-ID: <155977189139.2443951.460884430946346998.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>

Allow sub-section sized ranges to be added to the memmap.
populate_section_memmap() takes an explicit pfn range rather than
assuming a full section, and those parameters are plumbed all the way
through to vmemmap_populate(). There should be no sub-section usage in
current deployments. New warnings are added to clarify which memmap
allocation paths are sub-section capable.

Cc: Michal Hocko
Cc: David Hildenbrand
Cc: Logan Gunthorpe
Cc: Oscar Salvador
Reviewed-by: Pavel Tatashin
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Oscar Salvador
---
 arch/x86/mm/init_64.c |    4 ++-
 include/linux/mm.h    |    4 ++-
 mm/sparse-vmemmap.c   |   21 +++++++++++------
 mm/sparse.c           |   61 +++++++++++++++++++++++++++++++------------------
 4 files changed, 57 insertions(+), 33 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 8335ac6e1112..688fb0687e55 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1520,7 +1520,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 {
 	int err;

-	if (boot_cpu_has(X86_FEATURE_PSE))
+	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
+		err = vmemmap_populate_basepages(start, end, node);
+	else if (boot_cpu_has(X86_FEATURE_PSE))
 		err = vmemmap_populate_hugepages(start, end, node, altmap);
 	else if (altmap) {
 		pr_err_once("%s: no cpu support for altmap allocations\n",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index acc578407e9e..c502f3ce8661 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2734,8 +2734,8 @@ const char * arch_vma_name(struct vm_area_struct *vma);
 void print_vma_addr(char *prefix, unsigned long rip);

 void *sparse_buffer_alloc(unsigned long size);
-struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
-		struct vmem_altmap *altmap);
+struct page * __populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
 p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 7fec05796796..200aef686722 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -245,19 +245,26 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
 	return 0;
 }

-struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid,
-		struct vmem_altmap *altmap)
+struct page * __meminit __populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
 	unsigned long start;
 	unsigned long end;
-	struct page *map;

-	map = pfn_to_page(pnum * PAGES_PER_SECTION);
-	start = (unsigned long)map;
-	end = (unsigned long)(map + PAGES_PER_SECTION);
+	/*
+	 * The minimum granularity of memmap extensions is
+	 * PAGES_PER_SUBSECTION as allocations are tracked in the
+	 * 'subsection_map' bitmap of the section.
+	 */
+	end = ALIGN(pfn + nr_pages, PAGES_PER_SUBSECTION);
+	pfn &= PAGE_SUBSECTION_MASK;
+	nr_pages = end - pfn;
+
+	start = (unsigned long) pfn_to_page(pfn);
+	end = start + nr_pages * sizeof(struct page);
 	if (vmemmap_populate(start, end, nid, altmap))
 		return NULL;

-	return map;
+	return pfn_to_page(pfn);
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index 0baa2e55cfdd..2093c662a5f7 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -439,8 +439,8 @@ static unsigned long __init section_map_size(void)
 	return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
 }

-struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
-		struct vmem_altmap *altmap)
+struct page __init *__populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
 	unsigned long size = section_map_size();
 	struct page *map = sparse_buffer_alloc(size);
@@ -521,10 +521,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 	}
 	sparse_buffer_init(map_count * section_map_size(), nid);
 	for_each_present_section_nr(pnum_begin, pnum) {
+		unsigned long pfn = section_nr_to_pfn(pnum);
+
 		if (pnum >= pnum_end)
 			break;

-		map = sparse_mem_map_populate(pnum, nid, NULL);
+		map = __populate_section_memmap(pfn, PAGES_PER_SECTION,
+				nid, NULL);
 		if (!map) {
 			pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.",
 			       __func__, nid);
@@ -624,17 +627,17 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif

 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
-		struct vmem_altmap *altmap)
+static struct page *populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
-	/* This will make the necessary allocations eventually. */
-	return sparse_mem_map_populate(pnum, nid, altmap);
+	return __populate_section_memmap(pfn, nr_pages, nid, altmap);
 }
-static void __kfree_section_memmap(struct page *memmap,
+
+static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap)
 {
-	unsigned long start = (unsigned long)memmap;
-	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+	unsigned long start = (unsigned long) pfn_to_page(pfn);
+	unsigned long end = start + nr_pages * sizeof(struct page);

 	vmemmap_free(start, end, altmap);
 }
@@ -646,11 +649,18 @@ static void free_map_bootmem(struct page *memmap)
 	vmemmap_free(start, end, NULL);
 }
 #else
-static struct page *__kmalloc_section_memmap(void)
+struct page *populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
 	struct page *page, *ret;
 	unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;

+	if ((pfn & ~PAGE_SECTION_MASK) || nr_pages != PAGES_PER_SECTION) {
+		WARN(1, "%s: called with section unaligned parameters\n",
+			__func__);
+		return NULL;
+	}
+
 	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
 	if (page)
 		goto got_map_page;
@@ -667,15 +677,17 @@ static struct page *__kmalloc_section_memmap(void)
 	return ret;
 }

-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap)
 {
-	return __kmalloc_section_memmap();
-}
+	struct page *memmap = pfn_to_page(pfn);
+
+	if ((pfn & ~PAGE_SECTION_MASK) || nr_pages != PAGES_PER_SECTION) {
+		WARN(1, "%s: called with section unaligned parameters\n",
+			__func__);
+		return;
+	}

-static void __kfree_section_memmap(struct page *memmap,
-		struct vmem_altmap *altmap)
-{
 	if (is_vmalloc_addr(memmap))
 		vfree(memmap);
 	else
@@ -744,12 +756,13 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 	if (ret < 0 && ret != -EEXIST)
 		return ret;
 	ret = 0;
-	memmap = kmalloc_section_memmap(section_nr, nid, altmap);
+	memmap = populate_section_memmap(start_pfn, PAGES_PER_SECTION, nid,
+			altmap);
 	if (!memmap)
 		return -ENOMEM;
 	usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
 	if (!usage) {
-		__kfree_section_memmap(memmap, altmap);
+		depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
 		return -ENOMEM;
 	}

@@ -771,7 +784,7 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 out:
 	if (ret < 0) {
 		kfree(usage);
-		__kfree_section_memmap(memmap, altmap);
+		depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
 	}
 	return ret;
 }
@@ -807,7 +820,8 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 #endif

 static void free_section_usage(struct page *memmap,
-		struct mem_section_usage *usage, struct vmem_altmap *altmap)
+		struct mem_section_usage *usage, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	struct page *usage_page;

@@ -821,7 +835,7 @@ static void free_section_usage(struct page *memmap,
 	if (PageSlab(usage_page) || PageCompound(usage_page)) {
 		kfree(usage);
 		if (memmap)
-			__kfree_section_memmap(memmap, altmap);
+			depopulate_section_memmap(pfn, nr_pages, altmap);
 		return;
 	}

@@ -850,6 +864,7 @@ void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,

 	clear_hwpoisoned_pages(memmap + map_offset,
 			PAGES_PER_SECTION - map_offset);
-	free_section_usage(memmap, usage, altmap);
+	free_section_usage(memmap, usage, section_nr_to_pfn(__section_nr(ms)),
+			PAGES_PER_SECTION, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
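
The clamping done at the top of __populate_section_memmap() can be
checked in isolation. A sketch with the same assumed 512-page
subsections as before (illustration only, not kernel code): a request
that does not start or end on a subsection boundary is widened so the
populated memmap always covers whole subsections:

/* memmap-clamp.c: standalone illustration (not kernel code) */
#include <stdio.h>

#define PAGES_PER_SUBSECTION	512UL	/* assumed: 2MB of 4KB pages */
#define PAGE_SUBSECTION_MASK	(~(PAGES_PER_SUBSECTION - 1))
/* Same rounding as the kernel's ALIGN() macro */
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long pfn = 700, nr_pages = 100;
	unsigned long end = ALIGN(pfn + nr_pages, PAGES_PER_SUBSECTION);

	pfn &= PAGE_SUBSECTION_MASK;
	/* prints [512, 1024): the 100-page request is widened to one
	 * whole subsection's worth of memmap */
	printf("populate pfns [%lu, %lu)\n", pfn, end);
	return 0;
}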

From patchwork Wed Jun 5 21:58:27 2019
From patchwork Wed Jun 5 21:58:27 2019
Subject: [PATCH v9 05/12] mm/hotplug: Kill is_dev_zone() usage in __remove_pages()
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Logan Gunthorpe, Pavel Tatashin, David Hildenbrand,
    Oscar Salvador, linux-mm@kvack.org, linux-nvdimm@lists.01.org,
    linux-kernel@vger.kernel.org, osalvador@suse.de, mhocko@suse.com
Date: Wed, 05 Jun 2019 14:58:27 -0700
Message-ID: <155977190749.2443951.1028412998584791672.stgit@dwillia2-desk3.amr.corp.intel.com>

The zone type check was a leftover from the cleanup that plumbed altmap
through the memory hotplug path, i.e. commit da024512a1fa "mm: pass the
vmem_altmap to arch_remove_memory and __remove_pages".
Cc: Michal Hocko
Cc: Logan Gunthorpe
Cc: Pavel Tatashin
Reviewed-by: David Hildenbrand
Reviewed-by: Oscar Salvador
Signed-off-by: Dan Williams
---
 mm/memory_hotplug.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 647859a1d119..4b882c57781a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -535,11 +535,8 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	unsigned long map_offset = 0;
 	int sections_to_remove;
 
-	/* In the ZONE_DEVICE case device driver owns the memory region */
-	if (is_dev_zone(zone)) {
-		if (altmap)
-			map_offset = vmem_altmap_offset(altmap);
-	}
+	if (altmap)
+		map_offset = vmem_altmap_offset(altmap);
 
 	clear_zone_contiguous(zone);
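[Editor's note: with the zone check gone, map_offset depends only on
whether an altmap is present. A userspace sketch of that logic follows;
struct altmap_model and its reserve field are illustrative stand-ins for
struct vmem_altmap and what vmem_altmap_offset() reports, not the kernel's
actual definitions.]

#include <stdio.h>
#include <stddef.h>

struct altmap_model {
	unsigned long base_pfn; /* first pfn of the device range */
	unsigned long reserve;  /* pfns holding the memmap itself */
};

static unsigned long model_altmap_offset(const struct altmap_model *alt)
{
	return alt ? alt->reserve : 0; /* no altmap: offset stays 0 */
}

int main(void)
{
	struct altmap_model alt = { .base_pfn = 0x100000, .reserve = 512 };
	unsigned long map_offset;

	/* Mirrors "if (altmap) map_offset = vmem_altmap_offset(altmap);" */
	map_offset = model_altmap_offset(&alt);
	printf("skip first %lu pfns: data starts at pfn %#lx\n",
	       map_offset, alt.base_pfn + map_offset);

	map_offset = model_altmap_offset(NULL);
	printf("no altmap: map_offset = %lu\n", map_offset);
	return 0;
}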
From patchwork Wed Jun 5 21:58:32 2019
Subject: [PATCH v9 06/12] mm: Kill is_dev_zone() helper
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Logan Gunthorpe, David Hildenbrand, Oscar Salvador,
    Pavel Tatashin, linux-mm@kvack.org, linux-nvdimm@lists.01.org,
    linux-kernel@vger.kernel.org, osalvador@suse.de, mhocko@suse.com
Date: Wed, 05 Jun 2019 14:58:32 -0700
Message-ID: <155977191260.2443951.15908146523735681570.stgit@dwillia2-desk3.amr.corp.intel.com>

Given there are no more usages of is_dev_zone() outside of 'ifdef
CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.

Cc: Michal Hocko
Cc: Logan Gunthorpe
Acked-by: David Hildenbrand
Reviewed-by: Oscar Salvador
Reviewed-by: Pavel Tatashin
Signed-off-by: Dan Williams
Reviewed-by: Wei Yang
---
 include/linux/mmzone.h |   12 ------------
 mm/page_alloc.c        |    2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6dd52d544857..49e7fb452dfd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -855,18 +855,6 @@ static inline int local_memory_node(int node_id) { return node_id; };
  */
 #define zone_idx(zone)		((zone) - (zone)->zone_pgdat->node_zones)
 
-#ifdef CONFIG_ZONE_DEVICE
-static inline bool is_dev_zone(const struct zone *zone)
-{
-	return zone_idx(zone) == ZONE_DEVICE;
-}
-#else
-static inline bool is_dev_zone(const struct zone *zone)
-{
-	return false;
-}
-#endif
-
 /*
  * Returns true if a zone has pages managed by the buddy allocator.
  * All the reclaim decisions have to use this function rather than
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bd773efe5b82..5dff3f49a372 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5865,7 +5865,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	unsigned long start = jiffies;
 	int nid = pgdat->node_id;
 
-	if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
+	if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
 		return;
 
 	/*
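[Editor's note: the open-coded replacement works because zone_idx() is
just the offset of a zone within its node's zone array. A self-contained
sketch of that pointer arithmetic; the enum and structs below are trimmed
illustrative stand-ins for the kernel's zone list, and the node pointer is
passed explicitly rather than via zone->zone_pgdat as the kernel does.]

#include <stdio.h>

enum zone_type { ZONE_NORMAL, ZONE_MOVABLE, ZONE_DEVICE, MAX_NR_ZONES };

struct zone_model { const char *name; };

struct node_model { struct zone_model node_zones[MAX_NR_ZONES]; };

/* Index of a zone = its offset within the per-node zone array. */
#define model_zone_idx(node, zone) ((zone) - (node)->node_zones)

int main(void)
{
	struct node_model node = {
		.node_zones = {
			[ZONE_NORMAL]  = { "Normal"  },
			[ZONE_MOVABLE] = { "Movable" },
			[ZONE_DEVICE]  = { "Device"  },
		},
	};
	struct zone_model *zone = &node.node_zones[ZONE_DEVICE];

	/* Same shape as the WARN_ON_ONCE() condition in memmap_init_zone_device() */
	if (model_zone_idx(&node, zone) != ZONE_DEVICE)
		printf("%s: not a device zone\n", zone->name);
	else
		printf("%s: zone_idx == ZONE_DEVICE\n", zone->name);
	return 0;
}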
From patchwork Wed Jun 5 21:58:37 2019
Subject: [PATCH v9 07/12] mm/sparsemem: Prepare for sub-section ranges
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Vlastimil Babka, Logan Gunthorpe, Oscar Salvador,
    Pavel Tatashin, linux-mm@kvack.org, linux-nvdimm@lists.01.org,
    linux-kernel@vger.kernel.org, osalvador@suse.de, mhocko@suse.com
Date: Wed, 05 Jun 2019 14:58:37 -0700
Message-ID: <155977191770.2443951.1506588644989416699.stgit@dwillia2-desk3.amr.corp.intel.com>

Prepare the memory hot-{add,remove} paths for handling
sub-section ranges by plumbing the starting page frame and number of
pages being handled through arch_{add,remove}_memory() to
sparse_{add,remove}_one_section(). This is simply plumbing, small
cleanups, and some identifier renames. No intended functional changes.

Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Logan Gunthorpe
Cc: Oscar Salvador
Reviewed-by: Pavel Tatashin
Signed-off-by: Dan Williams
Reviewed-by: Oscar Salvador
---
 include/linux/memory_hotplug.h |    5 +-
 mm/memory_hotplug.c            |  114 +++++++++++++++++++++++++---------------
 mm/sparse.c                    |   15 ++---
 3 files changed, 81 insertions(+), 53 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 79e0add6a597..3ab0282b4fe5 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -348,9 +348,10 @@ extern int add_memory_resource(int nid, struct resource *resource);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
-extern int sparse_add_one_section(int nid, unsigned long start_pfn,
-		struct vmem_altmap *altmap);
+extern int sparse_add_section(int nid, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern void sparse_remove_one_section(struct mem_section *ms,
+		unsigned long pfn, unsigned long nr_pages,
 		unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4b882c57781a..399bf78bccc5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -252,51 +252,84 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 }
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
-static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		struct vmem_altmap *altmap)
+static int __meminit __add_section(int nid, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	int ret;
 
-	if (pfn_valid(phys_start_pfn))
+	if (pfn_valid(pfn))
 		return -EEXIST;
 
-	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
+	ret = sparse_add_section(nid, pfn, nr_pages, altmap);
 	return ret < 0 ? ret : 0;
 }
 
+static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
+		const char *reason)
+{
+	/*
+	 * Disallow all operations smaller than a sub-section and only
+	 * allow operations smaller than a section for
+	 * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
+	 * enforces a larger memory_block_size_bytes() granularity for
+	 * memory that will be marked online, so this check should only
+	 * fire for direct arch_{add,remove}_memory() users outside of
+	 * add_memory_resource().
+	 */
+	unsigned long min_align;
+
+	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+		min_align = PAGES_PER_SUBSECTION;
+	else
+		min_align = PAGES_PER_SECTION;
+	if (!IS_ALIGNED(pfn, min_align)
+			|| !IS_ALIGNED(nr_pages, min_align)) {
+		WARN(1, "Misaligned __%s_pages start: %#lx end: %#lx\n",
+				reason, pfn, pfn + nr_pages - 1);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 /*
  * Reasonably generic function for adding memory. It is
  * expected that archs that support memory hotplug will
  * call this function after deciding the zone to which to
  * add the new pages.
 */
-int __ref __add_pages(int nid, unsigned long phys_start_pfn,
-		unsigned long nr_pages, struct mhp_restrictions *restrictions)
+int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long i;
-	int err = 0;
-	int start_sec, end_sec;
+	int start_sec, end_sec, err;
 	struct vmem_altmap *altmap = restrictions->altmap;
 
-	/* during initialize mem_map, align hot-added range to section */
-	start_sec = pfn_to_section_nr(phys_start_pfn);
-	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
-
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
 		 */
-		if (altmap->base_pfn != phys_start_pfn
+		if (altmap->base_pfn != pfn
 				|| vmem_altmap_offset(altmap) > nr_pages) {
 			pr_warn_once("memory add fail, invalid altmap\n");
-			err = -EINVAL;
-			goto out;
+			return -EINVAL;
 		}
 		altmap->alloc = 0;
 	}
 
+	err = check_pfn_span(pfn, nr_pages, "add");
+	if (err)
+		return err;
+
+	start_sec = pfn_to_section_nr(pfn);
+	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
 	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), altmap);
+		unsigned long pfns;
+
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		err = __add_section(nid, pfn, pfns, altmap);
+		pfn += pfns;
+		nr_pages -= pfns;
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -309,7 +342,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 		cond_resched();
 	}
 	vmemmap_populate_print_last();
-out:
 	return err;
 }
 
@@ -487,10 +519,10 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
 	pgdat->node_spanned_pages = 0;
 }
 
-static void __remove_zone(struct zone *zone, unsigned long start_pfn)
+static void __remove_zone(struct zone *zone, unsigned long start_pfn,
+		unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
-	int nr_pages = PAGES_PER_SECTION;
 	unsigned long flags;
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
@@ -499,27 +531,23 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn)
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 }
 
-static void __remove_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset,
-		struct vmem_altmap *altmap)
+static void __remove_section(struct zone *zone, unsigned long pfn,
+		unsigned long nr_pages, unsigned long map_offset,
+		struct vmem_altmap *altmap)
 {
-	unsigned long start_pfn;
-	int scn_nr;
+	struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn));
 
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	scn_nr = __section_nr(ms);
-	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
-	__remove_zone(zone, start_pfn);
-
-	sparse_remove_one_section(ms, map_offset, altmap);
+	__remove_zone(zone, pfn, nr_pages);
+	sparse_remove_one_section(ms, pfn, nr_pages, map_offset, altmap);
 }
 
 /**
 * __remove_pages() - remove sections of pages from a zone
 * @zone: zone from which pages need to be removed
- * @phys_start_pfn: starting pageframe (must be aligned to start of a section)
+ * @pfn: starting pageframe (must be aligned to start of a section)
 * @nr_pages: number of pages to remove (must be multiple of section size)
 * @altmap: alternative device page map or %NULL if default memmap is used
 *
@@ -528,31 +556,31 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
 * sure that pages are marked reserved and zones are adjust properly by
 * calling offline_pages().
 */
-void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+void __remove_pages(struct zone *zone, unsigned long pfn,
 		 unsigned long nr_pages, struct vmem_altmap *altmap)
 {
-	unsigned long i;
 	unsigned long map_offset = 0;
-	int sections_to_remove;
+	int i, start_sec, end_sec;
 
 	if (altmap)
 		map_offset = vmem_altmap_offset(altmap);
 
 	clear_zone_contiguous(zone);
 
-	/*
-	 * We can only remove entire sections
-	 */
-	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
-	BUG_ON(nr_pages % PAGES_PER_SECTION);
+	if (check_pfn_span(pfn, nr_pages, "remove"))
+		return;
 
-	sections_to_remove = nr_pages / PAGES_PER_SECTION;
-	for (i = 0; i < sections_to_remove; i++) {
-		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+	start_sec = pfn_to_section_nr(pfn);
+	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
+	for (i = start_sec; i <= end_sec; i++) {
+		unsigned long pfns;
 
 		cond_resched();
-		__remove_section(zone, __pfn_to_section(pfn), map_offset,
-				altmap);
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		__remove_section(zone, pfn, pfns, map_offset, altmap);
+		pfn += pfns;
+		nr_pages -= pfns;
 		map_offset = 0;
 	}
diff --git a/mm/sparse.c b/mm/sparse.c
index 2093c662a5f7..f65206deaf49 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -739,8 +739,8 @@ static void free_map_bootmem(struct page *memmap)
 *  * -EEXIST	- Section has been present.
 *  * -ENOMEM	- Out of memory.
 */
-int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
-		struct vmem_altmap *altmap)
+int __meminit sparse_add_section(int nid, unsigned long start_pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
 	struct mem_section_usage *usage;
@@ -848,8 +848,9 @@ static void free_section_usage(struct page *memmap,
 		free_map_bootmem(memmap);
 }
 
-void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,
-		struct vmem_altmap *altmap)
+void sparse_remove_one_section(struct mem_section *ms, unsigned long pfn,
+		unsigned long nr_pages, unsigned long map_offset,
+		struct vmem_altmap *altmap)
 {
 	struct page *memmap = NULL;
 	struct mem_section_usage *usage = NULL;
@@ -862,9 +863,7 @@ void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,
 		ms->usage = NULL;
 	}
 
-	clear_hwpoisoned_pages(memmap + map_offset,
-			PAGES_PER_SECTION - map_offset);
-	free_section_usage(memmap, usage, section_nr_to_pfn(__section_nr(ms)),
-			PAGES_PER_SECTION, altmap);
+	clear_hwpoisoned_pages(memmap + map_offset, nr_pages - map_offset);
+	free_section_usage(memmap, usage, pfn, nr_pages, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
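[Editor's note: the new loops in __add_pages()/__remove_pages() clamp each
step so no call into __add_section()/__remove_section() crosses a section
boundary. A runnable userspace walk-through of that clamping; the
constants model x86-64 (128MB sections of 4K pages) and are illustrative,
not taken from kernel headers.]

#include <stdio.h>

#define PAGES_PER_SECTION   32768UL /* 128MB / 4KB */
#define PAGE_SECTION_MASK   (~(PAGES_PER_SECTION - 1))

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

int main(void)
{
	/* Start 8192 pages into a section, spanning into the next one. */
	unsigned long pfn = PAGES_PER_SECTION + 8192;
	unsigned long nr_pages = PAGES_PER_SECTION;

	while (nr_pages) {
		/* Clamp to the end of the current section, as __add_pages() does. */
		unsigned long pfns = min_ul(nr_pages,
				PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK));

		printf("section %lu: pfn %#lx, %lu pages\n",
		       pfn / PAGES_PER_SECTION, pfn, pfns);
		pfn += pfns;
		nr_pages -= pfns;
	}
	return 0;
}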
From patchwork Wed Jun 5 21:58:42 2019
Subject: [PATCH v9 08/12] mm/sparsemem: Support sub-section hotplug
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Vlastimil Babka, Logan Gunthorpe, Oscar Salvador,
    Pavel Tatashin, linux-mm@kvack.org, linux-nvdimm@lists.01.org,
    linux-kernel@vger.kernel.org, osalvador@suse.de, mhocko@suse.com
Date: Wed, 05 Jun 2019 14:58:42 -0700
Message-ID: <155977192280.2443951.13941265207662462739.stgit@dwillia2-desk3.amr.corp.intel.com>

The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity. For example the following backtrace
is emitted when attempting arch_add_memory() with physical address ranges
that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM) within
a given section:

 WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0
 devm_memremap_pages attempted on mixed region [mem 0x200000000-0x2fbffffff flags 0x200]
 [..]
 Call Trace:
  dump_stack+0x86/0xc3
  __warn+0xcb/0xf0
  warn_slowpath_fmt+0x5f/0x80
  devm_memremap_pages+0x3b5/0x4c0
  __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
  pmem_attach_disk+0x19a/0x440 [nd_pmem]

Recently it was discovered that the problem goes beyond RAM vs PMEM
collisions as some platforms produce PMEM vs PMEM collisions within a
given section. The libnvdimm workaround for that case revealed that the
libnvdimm section-alignment-padding implementation has been broken for a
long while. A fix for that long-standing breakage introduces as many
problems as it solves as it would require a backward-incompatible change
to the namespace metadata interpretation. Instead of that dubious route
[1], address the root problem in the memory-hotplug implementation.
[1]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com

Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Logan Gunthorpe
Cc: Oscar Salvador
Cc: Pavel Tatashin
Signed-off-by: Dan Williams
Reviewed-by: Oscar Salvador
---
 include/linux/memory_hotplug.h |    2 
 mm/memory_hotplug.c            |    7 -
 mm/page_alloc.c                |    2 
 mm/sparse.c                    |  225 +++++++++++++++++++++++++++-------------
 4 files changed, 155 insertions(+), 81 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 3ab0282b4fe5..0b8a5e5ef2da 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -350,7 +350,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_section(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
-extern void sparse_remove_one_section(struct mem_section *ms,
+extern void sparse_remove_section(struct mem_section *ms,
 		unsigned long pfn, unsigned long nr_pages,
 		unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 399bf78bccc5..8188be7a9edb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -255,13 +255,10 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 static int __meminit __add_section(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
-	int ret;
-
 	if (pfn_valid(pfn))
 		return -EEXIST;
 
-	ret = sparse_add_section(nid, pfn, nr_pages, altmap);
-	return ret < 0 ? ret : 0;
+	return sparse_add_section(nid, pfn, nr_pages, altmap);
 }
 
 static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
@@ -541,7 +538,7 @@ static void __remove_section(struct zone *zone, unsigned long pfn,
 		return;
 
 	__remove_zone(zone, pfn, nr_pages);
-	sparse_remove_one_section(ms, pfn, nr_pages, map_offset, altmap);
+	sparse_remove_section(ms, pfn, nr_pages, map_offset, altmap);
 }
 
 /**
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dff3f49a372..af260cc469cd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5915,7 +5915,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	 * pfn out of zone.
 	 *
 	 * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
-	 * because this is done early in sparse_add_one_section
+	 * because this is done early in section_activate()
	 */
	if (!(pfn & (pageblock_nr_pages - 1))) {
		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
diff --git a/mm/sparse.c b/mm/sparse.c
index f65206deaf49..d83bac5d1324 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -83,8 +83,15 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
 	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
 	struct mem_section *section;
 
+	/*
+	 * An existing section is possible in the sub-section hotplug
+	 * case. First hot-add instantiates, follow-on hot-add reuses
+	 * the existing section.
+	 *
+	 * The mem_hotplug_lock resolves the apparent race below.
+	 */
 	if (mem_section[root])
-		return -EEXIST;
+		return 0;
 
 	section = sparse_index_alloc(nid);
 	if (!section)
@@ -325,6 +332,15 @@ static void __meminit sparse_init_one_section(struct mem_section *ms,
 		unsigned long pnum, struct page *mem_map,
 		struct mem_section_usage *usage)
 {
+	/*
+	 * Given that SPARSEMEM_VMEMMAP=y supports sub-section hotplug,
+	 * ->section_mem_map can not be guaranteed to point to a full
+	 * section's worth of memory.
The field is only valid / used
+	 * in the SPARSEMEM_VMEMMAP=n case.
+	 */
+	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+		mem_map = NULL;
+
 	ms->section_mem_map &= ~SECTION_MAP_MASK;
 	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum)
 							| SECTION_HAS_MEM_MAP;
@@ -726,10 +742,131 @@ static void free_map_bootmem(struct page *memmap)
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
+static bool is_early_section(struct mem_section *ms)
+{
+	struct page *usage_page;
+
+	usage_page = virt_to_page(ms->usage);
+	if (PageSlab(usage_page) || PageCompound(usage_page))
+		return false;
+	else
+		return true;
+}
+
+static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap)
+{
+	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
+	struct mem_section *ms = __pfn_to_section(pfn);
+	bool early_section = is_early_section(ms);
+	struct page *memmap = NULL;
+	unsigned long *subsection_map = ms->usage
+		? &ms->usage->subsection_map[0] : NULL;
+
+	subsection_mask_set(map, pfn, nr_pages);
+	if (subsection_map)
+		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
+
+	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
+			"section already deactivated (%#lx + %ld)\n",
+			pfn, nr_pages))
+		return;
+
+	/*
+	 * There are 3 cases to handle across two configurations
+	 * (SPARSEMEM_VMEMMAP={y,n}):
+	 *
+	 * 1/ deactivation of a partial hot-added section (only possible
+	 * in the SPARSEMEM_VMEMMAP=y case).
+	 *    a/ section was present at memory init
+	 *    b/ section was hot-added post memory init
+	 * 2/ deactivation of a complete hot-added section
+	 * 3/ deactivation of a complete section from memory init
+	 *
+	 * For 1/, when the subsection_map is not empty we will not be
+	 * freeing the usage map, but still need to free the vmemmap
+	 * range.
+	 *
+	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
+	 */
+	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
+	if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+
+		if (!early_section) {
+			kfree(ms->usage);
+			ms->usage = NULL;
+		}
+		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+		ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr);
+	}
+
+	if (early_section && memmap)
+		free_map_bootmem(memmap);
+	else
+		depopulate_section_memmap(pfn, nr_pages, altmap);
+}
+
+static struct page * __meminit section_activate(int nid, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
+{
+	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+	struct mem_section *ms = __pfn_to_section(pfn);
+	struct mem_section_usage *usage = NULL;
+	unsigned long *subsection_map;
+	struct page *memmap;
+	int rc = 0;
+
+	subsection_mask_set(map, pfn, nr_pages);
+
+	if (!ms->usage) {
+		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
+		if (!usage)
+			return ERR_PTR(-ENOMEM);
+		ms->usage = usage;
+	}
+	subsection_map = &ms->usage->subsection_map[0];
+
+	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
+		rc = -EINVAL;
+	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
+		rc = -EEXIST;
+	else
+		bitmap_or(subsection_map, map, subsection_map,
+				SUBSECTIONS_PER_SECTION);
+
+	if (rc) {
+		if (usage)
+			ms->usage = NULL;
+		kfree(usage);
+		return ERR_PTR(rc);
+	}
+
+	/*
+	 * The early init code does not consider partially populated
+	 * initial sections, it simply assumes that memory will never be
+	 * referenced.
If we hot-add memory into such a section then we
+	 * do not need to populate the memmap and can simply reuse what
+	 * is already there.
+	 */
+	if (nr_pages < PAGES_PER_SECTION && is_early_section(ms))
+		return pfn_to_page(pfn);
+
+	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap);
+	if (!memmap) {
+		section_deactivate(pfn, nr_pages, altmap);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return memmap;
+}
+
 /**
- * sparse_add_one_section - add a memory section
+ * sparse_add_section - add a memory section, or populate an existing one
 * @nid: The node to add section on
 * @start_pfn: start pfn of the memory range
+ * @nr_pages: number of pfns to add in the section
 * @altmap: device page map
 *
 * This is only intended for hotplug.
@@ -743,50 +880,29 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
-	struct mem_section_usage *usage;
 	struct mem_section *ms;
 	struct page *memmap;
 	int ret;
 
-	/*
-	 * no locking for this, because it does its own
-	 * plus, it does a kmalloc
-	 */
 	ret = sparse_index_init(section_nr, nid);
-	if (ret < 0 && ret != -EEXIST)
+	if (ret < 0)
 		return ret;
-	ret = 0;
-	memmap = populate_section_memmap(start_pfn, PAGES_PER_SECTION, nid,
-			altmap);
-	if (!memmap)
-		return -ENOMEM;
-	usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
-	if (!usage) {
-		depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
-		return -ENOMEM;
-	}
 
-	ms = __pfn_to_section(start_pfn);
-	if (ms->section_mem_map & SECTION_MARKED_PRESENT) {
-		ret = -EEXIST;
-		goto out;
-	}
+	memmap = section_activate(nid, start_pfn, nr_pages, altmap);
+	if (IS_ERR(memmap))
+		return PTR_ERR(memmap);
 
 	/*
	 * Poison uninitialized struct pages in order to catch invalid flags
	 * combinations.
	 */
-	page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION);
+	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
 
+	ms = __pfn_to_section(start_pfn);
 	section_mark_present(ms);
-	sparse_init_one_section(ms, section_nr, memmap, usage);
+	sparse_init_one_section(ms, section_nr, memmap, ms->usage);
 
-out:
-	if (ret < 0) {
-		kfree(usage);
-		depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
-	}
-	return ret;
+	return 0;
 }
 
 #ifdef CONFIG_MEMORY_FAILURE
@@ -819,51 +935,12 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 }
 #endif
 
-static void free_section_usage(struct page *memmap,
-		struct mem_section_usage *usage, unsigned long pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap)
-{
-	struct page *usage_page;
-
-	if (!usage)
-		return;
-
-	usage_page = virt_to_page(usage);
-	/*
-	 * Check to see if allocation came from hot-plug-add
-	 */
-	if (PageSlab(usage_page) || PageCompound(usage_page)) {
-		kfree(usage);
-		if (memmap)
-			depopulate_section_memmap(pfn, nr_pages, altmap);
-		return;
-	}
-
-	/*
-	 * The usemap came from bootmem. This is packed with other usemaps
-	 * on the section which has pgdat at boot time. Just keep it as is now.
-	 */
-
-	if (memmap)
-		free_map_bootmem(memmap);
-}
-
-void sparse_remove_one_section(struct mem_section *ms, unsigned long pfn,
+void sparse_remove_section(struct mem_section *ms, unsigned long pfn,
 		unsigned long nr_pages, unsigned long map_offset,
 		struct vmem_altmap *altmap)
 {
-	struct page *memmap = NULL;
-	struct mem_section_usage *usage = NULL;
-
-	if (ms->section_mem_map) {
-		usage = ms->usage;
-		memmap = sparse_decode_mem_map(ms->section_mem_map,
-						__section_nr(ms));
-		ms->section_mem_map = 0;
-		ms->usage = NULL;
-	}
-
-	clear_hwpoisoned_pages(memmap + map_offset, nr_pages - map_offset);
-	free_section_usage(memmap, usage, pfn, nr_pages, altmap);
+	clear_hwpoisoned_pages(pfn_to_page(pfn) + map_offset,
+			nr_pages - map_offset);
+	section_deactivate(pfn, nr_pages, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
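[Editor's note: the heart of sub-section hotplug is the subsection_map
bookkeeping in section_activate()/section_deactivate() above: activation
ORs sub-section bits in after rejecting overlaps, deactivation XORs them
out and only tears the section down once the map is empty. A userspace
model of that bitmap protocol follows; a single 64-bit word stands in for
DECLARE_BITMAP(), and the sizes are illustrative.]

#include <stdio.h>
#include <stdint.h>

#define SUBSECTIONS_PER_SECTION 64 /* e.g. 128MB section / 2MB sub-section */
#define PAGES_PER_SUBSECTION    512UL

/* Bits covering [pfn, pfn + nr_pages) within one section. */
static uint64_t subsection_mask(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long first = (pfn / PAGES_PER_SUBSECTION) % SUBSECTIONS_PER_SECTION;
	unsigned long count = nr_pages / PAGES_PER_SUBSECTION;
	uint64_t span = (count >= 64) ? ~0ULL : ((1ULL << count) - 1);

	return span << first;
}

int main(void)
{
	uint64_t subsection_map = 0;
	uint64_t map = subsection_mask(0, 2 * PAGES_PER_SUBSECTION);

	/* Activate: fail with -EEXIST on overlap, else OR the bits in. */
	if (map & subsection_map)
		printf("activate: -EEXIST\n");
	else
		subsection_map |= map;
	printf("after activate:   %#llx\n", (unsigned long long)subsection_map);

	/* Deactivate: XOR the bits out; empty map means free the section. */
	subsection_map ^= map;
	printf("after deactivate: %#llx (%s)\n",
	       (unsigned long long)subsection_map,
	       subsection_map ? "partially populated" : "fully deactivated");
	return 0;
}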
From patchwork Wed Jun 5 21:58:47 2019
Subject: [PATCH v9 09/12] mm: Document ZONE_DEVICE memory-model implications
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Jonathan Corbet, Mike Rapoport, linux-mm@kvack.org,
    linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
    osalvador@suse.de, mhocko@suse.com
Date: Wed, 05 Jun 2019 14:58:47 -0700
Message-ID: <155977192794.2443951.16177998596403034849.stgit@dwillia2-desk3.amr.corp.intel.com>

Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
of 'devm_memremap_pages()'.

Cc: Jonathan Corbet
Reported-by: Mike Rapoport
Signed-off-by: Dan Williams
---
 Documentation/vm/memory-model.rst |   39 +++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
index 382f72ace1fc..e0af47e02e78 100644
--- a/Documentation/vm/memory-model.rst
+++ b/Documentation/vm/memory-model.rst
@@ -181,3 +181,42 @@ that is eventually passed to vmemmap_populate() through a long chain
 of function calls. The vmemmap_populate() implementation may use the
 `vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to
 allocate memory map on the persistent memory device.
+
+ZONE_DEVICE
+===========
+The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
+`struct page` `mem_map` services for device driver identified physical
+address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact
+that the page objects for these address ranges are never marked online,
+and that a reference must be taken against the device, not just the
+page, to keep the memory pinned for active use. `ZONE_DEVICE`, via
+:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
+turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
+:c:func:`get_user_pages` service for the given range of pfns. Since the
+page reference count never drops below 1 the page is never tracked as
+free memory and the page's `struct list_head lru` space is repurposed
+for back referencing to the host device / driver that mapped the memory.
+
+While `SPARSEMEM` presents memory as a collection of sections,
+optionally collected into memory blocks, `ZONE_DEVICE` users have a need
+for smaller granularity of populating the `mem_map`. Given that
+`ZONE_DEVICE` memory is never marked online it is subsequently never
+subject to its memory ranges being exposed through the sysfs memory
+hotplug api on memory block boundaries. The implementation relies on
+this lack of user-api constraint to allow sub-section sized memory
+ranges to be specified to :c:func:`arch_add_memory`, the top-half of
+memory hotplug. Sub-section support allows for `PMD_SIZE` as the minimum
+alignment granularity for :c:func:`devm_memremap_pages`.
+
+The users of `ZONE_DEVICE` are:
+* pmem: Map platform persistent memory to be used as a direct-I/O target
+  via DAX mappings.
+
+* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
+  event callbacks to allow a device-driver to coordinate memory management
+  events related to device-memory, typically GPU memory. See
+  Documentation/vm/hmm.rst.
+
+* p2pdma: Create `struct page` objects to allow peer devices in a
+  PCI/-E topology to coordinate direct-DMA operations between themselves,
+  i.e. bypass host memory.
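[Editor's note: a small sketch of the alignment relaxation the document
describes, as explained further after this note: with sub-section hotplug
a range only needs PMD_SIZE alignment, where previously the full section
granularity had to hold. The sizes are illustrative x86-64 values.]

#include <stdio.h>

#define PMD_SIZE     (2UL << 20)   /* 2MB sub-section granularity */
#define SECTION_SIZE (128UL << 20) /* 128MB pre-sub-section granularity */

static int aligned(unsigned long addr, unsigned long align)
{
	return (addr & (align - 1)) == 0;
}

int main(void)
{
	/* A 2MB-aligned PMEM base that is *not* 128MB-aligned. */
	unsigned long base = 0x200000000UL + 6 * PMD_SIZE;

	printf("base %#lx: PMD-aligned=%d section-aligned=%d\n",
	       base, aligned(base, PMD_SIZE), aligned(base, SECTION_SIZE));
	/* Before sub-section support the second check had to pass, too. */
	return 0;
}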
From patchwork Wed Jun 5 21:58:53 2019
Subject: [PATCH v9 10/12] mm/devm_memremap_pages: Enable sub-section remap
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Toshi Kani, Jérôme Glisse, Logan Gunthorpe,
 Oscar Salvador, Pavel Tatashin, linux-mm@kvack.org,
 linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
 osalvador@suse.de, mhocko@suse.com
Date: Wed, 05 Jun 2019 14:58:53 -0700
Message-ID: <155977193326.2443951.14201009973429527491.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>

Teach devm_memremap_pages() about the new sub-section capabilities of
arch_{add,remove}_memory(). Effectively, just replace all usage of
align_start, align_end, and align_size with res->start, res->end, and
resource_size(res). The existing sanity check will still make sure that
the two separate remap attempts do not collide within a sub-section
(2MB on x86).
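[ Editorial illustration -- not from the patch. The helper below condenses
  the rounding that this patch deletes, assuming PA_SECTION_SIZE is the
  128MB section span used on x86_64; after the patch, res->start and
  resource_size(res) are passed through unmodified. ]

#include <linux/kernel.h>
#include <linux/mmzone.h>
#include <linux/types.h>

/* What the removed align_start/align_size computation did: inflate a
 * remap request outward to whole-section bounds. */
static void old_section_align_bounds(resource_size_t start,
		resource_size_t size, resource_size_t *align_start,
		resource_size_t *align_size)
{
	*align_start = start & ~(PA_SECTION_SIZE - 1);
	*align_size = ALIGN(start + size, PA_SECTION_SIZE) - *align_start;
}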
Cc: Michal Hocko
Cc: Toshi Kani
Cc: Jérôme Glisse
Cc: Logan Gunthorpe
Cc: Oscar Salvador
Cc: Pavel Tatashin
Signed-off-by: Dan Williams
Reviewed-by: Oscar Salvador
---
 kernel/memremap.c |   61 +++++++++++++++++++++--------------------------------
 1 file changed, 24 insertions(+), 37 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 57980ed4e571..a0e5f6b91b04 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -58,7 +58,7 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap)
 	struct vmem_altmap *altmap = &pgmap->altmap;
 	unsigned long pfn;
 
-	pfn = res->start >> PAGE_SHIFT;
+	pfn = PHYS_PFN(res->start);
 	if (pgmap->altmap_valid)
 		pfn += vmem_altmap_offset(altmap);
 	return pfn;
@@ -86,7 +86,6 @@ static void devm_memremap_pages_release(void *data)
 	struct dev_pagemap *pgmap = data;
 	struct device *dev = pgmap->dev;
 	struct resource *res = &pgmap->res;
-	resource_size_t align_start, align_size;
 	unsigned long pfn;
 	int nid;
 
@@ -96,25 +95,21 @@ static void devm_memremap_pages_release(void *data)
 	pgmap->cleanup(pgmap->ref);
 
 	/* pages are dead and unused, undo the arch mapping */
-	align_start = res->start & ~(PA_SECTION_SIZE - 1);
-	align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
-		- align_start;
-
-	nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
+	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
 
 	mem_hotplug_begin();
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		pfn = align_start >> PAGE_SHIFT;
+		pfn = PHYS_PFN(res->start);
 		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
-				align_size >> PAGE_SHIFT, NULL);
+				PHYS_PFN(resource_size(res)), NULL);
 	} else {
-		arch_remove_memory(nid, align_start, align_size,
+		arch_remove_memory(nid, res->start, resource_size(res),
 				pgmap->altmap_valid ? &pgmap->altmap : NULL);
-		kasan_remove_zero_shadow(__va(align_start), align_size);
+		kasan_remove_zero_shadow(__va(res->start), resource_size(res));
 	}
 	mem_hotplug_done();
 
-	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
+	untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
 	pgmap_array_delete(res);
 	dev_WARN_ONCE(dev, pgmap->altmap.alloc,
 		      "%s: failed to free all reserved pages\n", __func__);
@@ -141,16 +136,13 @@ static void devm_memremap_pages_release(void *data)
  */
 void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
-	resource_size_t align_start, align_size, align_end;
-	struct vmem_altmap *altmap = pgmap->altmap_valid ?
-			&pgmap->altmap : NULL;
 	struct resource *res = &pgmap->res;
 	struct dev_pagemap *conflict_pgmap;
 	struct mhp_restrictions restrictions = {
 		/*
 		 * We do not want any optional features only our own memmap
 		 */
-		.altmap = altmap,
+		.altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL,
 	};
 	pgprot_t pgprot = PAGE_KERNEL;
 	int error, nid, is_ram;
@@ -160,12 +152,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		return ERR_PTR(-EINVAL);
 	}
 
-	align_start = res->start & ~(PA_SECTION_SIZE - 1);
-	align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
-		- align_start;
-	align_end = align_start + align_size - 1;
-
-	conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_start), NULL);
+	conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->start), NULL);
 	if (conflict_pgmap) {
 		dev_WARN(dev, "Conflicting mapping in same section\n");
 		put_dev_pagemap(conflict_pgmap);
@@ -173,7 +160,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		goto err_array;
 	}
 
-	conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_end), NULL);
+	conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->end), NULL);
 	if (conflict_pgmap) {
 		dev_WARN(dev, "Conflicting mapping in same section\n");
 		put_dev_pagemap(conflict_pgmap);
@@ -181,7 +168,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		goto err_array;
 	}
 
-	is_ram = region_intersects(align_start, align_size,
+	is_ram = region_intersects(res->start, resource_size(res),
 			IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);
 
 	if (is_ram != REGION_DISJOINT) {
@@ -202,8 +189,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	if (nid < 0)
 		nid = numa_mem_id();
 
-	error = track_pfn_remap(NULL, &pgprot, PHYS_PFN(align_start), 0,
-			align_size);
+	error = track_pfn_remap(NULL, &pgprot, PHYS_PFN(res->start), 0,
+			resource_size(res));
 	if (error)
 		goto err_pfn_remap;
 
@@ -221,25 +208,25 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	 * arch_add_memory().
 	 */
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		error = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, &restrictions);
+		error = add_pages(nid, PHYS_PFN(res->start),
+				PHYS_PFN(resource_size(res)), &restrictions);
 	} else {
-		error = kasan_add_zero_shadow(__va(align_start), align_size);
+		error = kasan_add_zero_shadow(__va(res->start), resource_size(res));
 		if (error) {
 			mem_hotplug_done();
 			goto err_kasan;
 		}
 
-		error = arch_add_memory(nid, align_start, align_size,
-				&restrictions);
+		error = arch_add_memory(nid, res->start, resource_size(res),
+				&restrictions);
 	}
 
 	if (!error) {
 		struct zone *zone;
 
 		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
-		move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, altmap);
+		move_pfn_range_to_zone(zone, PHYS_PFN(res->start),
+				PHYS_PFN(resource_size(res)), restrictions.altmap);
 	}
 	mem_hotplug_done();
 
@@ -251,8 +238,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	 * to allow us to do the work while not holding the hotplug lock.
 	 */
 	memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-				align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, pgmap);
+				PHYS_PFN(res->start),
+				PHYS_PFN(resource_size(res)), pgmap);
 	percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap));
 
 	error = devm_add_action_or_reset(dev, devm_memremap_pages_release,
@@ -263,9 +250,9 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 
 	return __va(res->start);
 
 err_add_memory:
-	kasan_remove_zero_shadow(__va(align_start), align_size);
+	kasan_remove_zero_shadow(__va(res->start), resource_size(res));
 err_kasan:
-	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
+	untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
 err_pfn_remap:
 	pgmap_array_delete(res);
 err_array:
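[ Editorial illustration -- not from the patch. The check the commit
  message refers to is the pair of get_dev_pagemap() probes at res->start
  and res->end above; the stand-alone helper below restates the invariant
  they enforce, assuming the 2MB sub-section size (SUBSECTION_SHIFT == 21)
  used on x86_64. ]

#include <linux/types.h>

#define EXAMPLE_SUBSECTION_SHIFT 21	/* 2MB sub-sections, x86_64 assumption */

/* True when two physical ranges touch the same sub-section, i.e. the
 * overlap that the conflict_pgmap lookups reject. */
static bool example_subsection_collision(u64 a_start, u64 a_end,
		u64 b_start, u64 b_end)
{
	u64 a_first = a_start >> EXAMPLE_SUBSECTION_SHIFT;
	u64 a_last = a_end >> EXAMPLE_SUBSECTION_SHIFT;
	u64 b_first = b_start >> EXAMPLE_SUBSECTION_SHIFT;
	u64 b_last = b_end >> EXAMPLE_SUBSECTION_SHIFT;

	return a_first <= b_last && b_first <= a_last;
}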
From patchwork Wed Jun 5 21:58:58 2019
Subject: [PATCH v9 11/12] libnvdimm/pfn: Fix fsdax-mode namespace info-block
 zero-fields
From: Dan Williams
To: akpm@linux-foundation.org
Cc: stable@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org,
 linux-kernel@vger.kernel.org, osalvador@suse.de, mhocko@suse.com
Date: Wed, 05 Jun 2019 14:58:58 -0700
Message-ID: <155977193862.2443951.10284714500308539570.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>

At namespace creation time there is the potential for the "expected to
be zero" fields of a 'pfn' info-block to be filled with indeterminate
data. While the kernel buffer is zeroed on allocation, it is immediately
overwritten by nd_pfn_validate(), which fills it with the current
contents of the on-media info-block location. For fields like 'flags'
and the 'padding' this potentially means that future implementations
cannot rely on those fields being zero.

In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly
initialized to be guaranteed zero. Bump the minor version to indicate it
is safe to assume the 'padding' and 'flags' are zero. Otherwise, this
corruption is expected to be benign since all other critical fields are
explicitly initialized.

Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Cc: stable@vger.kernel.org
Signed-off-by: Dan Williams
---
 drivers/nvdimm/dax_devs.c |    2 +-
 drivers/nvdimm/pfn.h      |    1 +
 drivers/nvdimm/pfn_devs.c |   18 +++++++++++++++---
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
index 0453f49dc708..326f02ffca81 100644
--- a/drivers/nvdimm/dax_devs.c
+++ b/drivers/nvdimm/dax_devs.c
@@ -126,7 +126,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
 	nvdimm_bus_unlock(&ndns->dev);
 	if (!dax_dev)
 		return -ENOMEM;
-	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
 	nd_pfn->pfn_sb = pfn_sb;
 	rc = nd_pfn_validate(nd_pfn, DAX_SIG);
 	dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "");
diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index dde9853453d3..e901e3a3b04c 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -36,6 +36,7 @@ struct nd_pfn_sb {
 	__le32 end_trunc;
 	/* minor-version-2 record the base alignment of the mapping */
 	__le32 align;
+	/* minor-version-3 guarantee the padding and flags are zero */
 	u8 padding[4000];
 	__le64 checksum;
 };
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 01f40672507f..a2406253eb70 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -420,6 +420,15 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
 	return 0;
 }
 
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
 int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 {
 	u64 checksum, offset;
@@ -565,7 +574,7 @@ int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns)
 	nvdimm_bus_unlock(&ndns->dev);
 	if (!pfn_dev)
 		return -ENOMEM;
-	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
 	nd_pfn = to_nd_pfn(pfn_dev);
 	nd_pfn->pfn_sb = pfn_sb;
 	rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -702,7 +711,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	u64 checksum;
 	int rc;
 
-	pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
 	if (!pfn_sb)
 		return -ENOMEM;
 
@@ -711,11 +720,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		sig = DAX_SIG;
 	else
 		sig = PFN_SIG;
+
 	rc = nd_pfn_validate(nd_pfn, sig);
 	if (rc != -ENODEV)
 		return rc;
 
 	/* no info block, do init */;
+	memset(pfn_sb, 0, sizeof(*pfn_sb));
+
 	nd_region = to_nd_region(nd_pfn->dev.parent);
 	if (nd_region->ro) {
 		dev_info(&nd_pfn->dev,
@@ -768,7 +780,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
 	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
 	pfn_sb->version_major = cpu_to_le16(1);
-	pfn_sb->version_minor = cpu_to_le16(2);
+	pfn_sb->version_minor = cpu_to_le16(3);
 	pfn_sb->start_pad = cpu_to_le32(start_pad);
 	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
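[ Editorial note -- the initialization ordering the patch establishes,
  condensed for readability; identifiers are as in the diff above. The
  buffer may hold indeterminate or stale on-media bytes until the
  explicit memset(). ]

	/* devm_kmalloc(): contents indeterminate, but nd_pfn_validate()
	 * immediately overwrites them with the on-media info-block */
	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
	rc = nd_pfn_validate(nd_pfn, sig);
	if (rc != -ENODEV)
		return rc;	/* a valid (or errored) on-media info-block */

	/* no info-block found: zero-fill before building a fresh one, so
	 * every field not explicitly initialized below is guaranteed zero */
	memset(pfn_sb, 0, sizeof(*pfn_sb));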
From patchwork Wed Jun 5 21:59:04 2019
Subject: [PATCH v9 12/12] libnvdimm/pfn: Stop padding pmem namespaces to
 section alignment
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Jeff Moyer, linux-mm@kvack.org, linux-nvdimm@lists.01.org,
 linux-kernel@vger.kernel.org, osalvador@suse.de, mhocko@suse.com
Date: Wed, 05 Jun 2019 14:59:04 -0700
Message-ID: <155977194407.2443951.14077225226024648760.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>

Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
memory, we no longer need to add padding at pfn/dax device creation
time. The kernel will still honor padding established by older kernels.

Reported-by: Jeff Moyer
Signed-off-by: Dan Williams
---
 drivers/nvdimm/pfn.h      |   14 --------
 drivers/nvdimm/pfn_devs.c |   77 ++++++-------------------------------------
 include/linux/mmzone.h    |    3 ++
 3 files changed, 16 insertions(+), 78 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index e901e3a3b04c..cc042a98758f 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -41,18 +41,4 @@ struct nd_pfn_sb {
 	__le64 checksum;
 };
 
-#ifdef CONFIG_SPARSEMEM
-#define PFN_SECTION_ALIGN_DOWN(x) SECTION_ALIGN_DOWN(x)
-#define PFN_SECTION_ALIGN_UP(x) SECTION_ALIGN_UP(x)
-#else
-/*
- * In this case ZONE_DEVICE=n and we will disable 'pfn' device support,
- * but we still want pmem to compile.
- */
-#define PFN_SECTION_ALIGN_DOWN(x) (x)
-#define PFN_SECTION_ALIGN_UP(x) (x)
-#endif
-
-#define PHYS_SECTION_ALIGN_DOWN(x) PFN_PHYS(PFN_SECTION_ALIGN_DOWN(PHYS_PFN(x)))
-#define PHYS_SECTION_ALIGN_UP(x) PFN_PHYS(PFN_SECTION_ALIGN_UP(PHYS_PFN(x)))
 #endif /* __NVDIMM_PFN_H */
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index a2406253eb70..7f54374b082f 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -595,14 +595,14 @@ static u32 info_block_reserve(void)
 }
 
 /*
- * We hotplug memory at section granularity, pad the reserved area from
- * the previous section base to the namespace base address.
+ * We hotplug memory at sub-section granularity, pad the reserved area
+ * from the previous section base to the namespace base address.
  */
 static unsigned long init_altmap_base(resource_size_t base)
 {
 	unsigned long base_pfn = PHYS_PFN(base);
 
-	return PFN_SECTION_ALIGN_DOWN(base_pfn);
+	return SUBSECTION_ALIGN_DOWN(base_pfn);
 }
 
 static unsigned long init_altmap_reserve(resource_size_t base)
@@ -610,7 +610,7 @@ static unsigned long init_altmap_reserve(resource_size_t base)
 	unsigned long reserve = info_block_reserve() >> PAGE_SHIFT;
 	unsigned long base_pfn = PHYS_PFN(base);
 
-	reserve += base_pfn - PFN_SECTION_ALIGN_DOWN(base_pfn);
+	reserve += base_pfn - SUBSECTION_ALIGN_DOWN(base_pfn);
 	return reserve;
 }
 
@@ -641,8 +641,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 		nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns);
 		pgmap->altmap_valid = false;
 	} else if (nd_pfn->mode == PFN_MODE_PMEM) {
-		nd_pfn->npfns = PFN_SECTION_ALIGN_UP((resource_size(res)
-					- offset) / PAGE_SIZE);
+		nd_pfn->npfns = PHYS_PFN((resource_size(res) - offset));
 		if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns)
 			dev_info(&nd_pfn->dev,
 					"number of pfns truncated from %lld to %ld\n",
@@ -658,54 +657,14 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 	return 0;
 }
 
-static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
-{
-	return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
-			ALIGN_DOWN(phys, nd_pfn->align));
-}
-
-/*
- * Check if pmem collides with 'System RAM', or other regions when
- * section aligned. Trim it accordingly.
- */
-static void trim_pfn_device(struct nd_pfn *nd_pfn, u32 *start_pad, u32 *end_trunc)
-{
-	struct nd_namespace_common *ndns = nd_pfn->ndns;
-	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
-	struct nd_region *nd_region = to_nd_region(nd_pfn->dev.parent);
-	const resource_size_t start = nsio->res.start;
-	const resource_size_t end = start + resource_size(&nsio->res);
-	resource_size_t adjust, size;
-
-	*start_pad = 0;
-	*end_trunc = 0;
-
-	adjust = start - PHYS_SECTION_ALIGN_DOWN(start);
-	size = resource_size(&nsio->res) + adjust;
-	if (region_intersects(start - adjust, size, IORESOURCE_SYSTEM_RAM,
-				IORES_DESC_NONE) == REGION_MIXED
-			|| nd_region_conflict(nd_region, start - adjust, size))
-		*start_pad = PHYS_SECTION_ALIGN_UP(start) - start;
-
-	/* Now check that end of the range does not collide. */
-	adjust = PHYS_SECTION_ALIGN_UP(end) - end;
-	size = resource_size(&nsio->res) + adjust;
-	if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
-				IORES_DESC_NONE) == REGION_MIXED
-			|| !IS_ALIGNED(end, nd_pfn->align)
-			|| nd_region_conflict(nd_region, start, size))
-		*end_trunc = end - phys_pmem_align_down(nd_pfn, end);
-}
-
 static int nd_pfn_init(struct nd_pfn *nd_pfn)
 {
 	struct nd_namespace_common *ndns = nd_pfn->ndns;
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
-	u32 start_pad, end_trunc, reserve = info_block_reserve();
 	resource_size_t start, size;
 	struct nd_region *nd_region;
+	unsigned long npfns, align;
 	struct nd_pfn_sb *pfn_sb;
-	unsigned long npfns;
 	phys_addr_t offset;
 	const char *sig;
 	u64 checksum;
@@ -736,43 +695,35 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		return -ENXIO;
 	}
 
-	memset(pfn_sb, 0, sizeof(*pfn_sb));
-
-	trim_pfn_device(nd_pfn, &start_pad, &end_trunc);
-	if (start_pad + end_trunc)
-		dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n",
-				dev_name(&ndns->dev), start_pad + end_trunc);
-
 	/*
 	 * Note, we use 64 here for the standard size of struct page,
 	 * debugging options may cause it to be larger in which case the
 	 * implementation will limit the pfns advertised through
 	 * ->direct_access() to those that are included in the memmap.
 	 */
-	start = nsio->res.start + start_pad;
+	start = nsio->res.start;
 	size = resource_size(&nsio->res);
-	npfns = PFN_SECTION_ALIGN_UP((size - start_pad - end_trunc - reserve)
-			/ PAGE_SIZE);
+	npfns = PHYS_PFN(size - SZ_8K);
+	align = max(nd_pfn->align, (1UL << SUBSECTION_SHIFT));
 	if (nd_pfn->mode == PFN_MODE_PMEM) {
 		/*
 		 * The altmap should be padded out to the block size used
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
 		 */
-		offset = ALIGN(start + reserve + 64 * npfns,
-				max(nd_pfn->align, PMD_SIZE)) - start;
+		offset = ALIGN(start + SZ_8K + 64 * npfns, align) - start;
 	} else if (nd_pfn->mode == PFN_MODE_RAM)
-		offset = ALIGN(start + reserve, nd_pfn->align) - start;
+		offset = ALIGN(start + SZ_8K, align) - start;
 	else
 		return -ENXIO;
 
-	if (offset + start_pad + end_trunc >= size) {
+	if (offset >= size) {
 		dev_err(&nd_pfn->dev, "%s unable to satisfy requested alignment\n",
 				dev_name(&ndns->dev));
 		return -ENXIO;
 	}
 
-	npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
+	npfns = PHYS_PFN(size - offset);
 	pfn_sb->mode = cpu_to_le32(nd_pfn->mode);
 	pfn_sb->dataoff = cpu_to_le64(offset);
 	pfn_sb->npfns = cpu_to_le64(npfns);
@@ -781,8 +732,6 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
 	pfn_sb->version_major = cpu_to_le16(1);
 	pfn_sb->version_minor = cpu_to_le16(3);
-	pfn_sb->start_pad = cpu_to_le32(start_pad);
-	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
 	checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
 	pfn_sb->checksum = cpu_to_le64(checksum);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 49e7fb452dfd..15e07f007ba2 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1181,6 +1181,9 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 #define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))
 #endif
 
+#define SUBSECTION_ALIGN_UP(pfn) ALIGN((pfn), PAGES_PER_SUBSECTION)
+#define SUBSECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUBSECTION_MASK)
+
 struct mem_section_usage {
 	DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
 	/* See declaration of similar field in struct zone */
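[ Editorial illustration -- concrete numbers for the macros added above,
  assuming the x86_64 defaults implied by this series: 128MB sections
  (SECTION_SIZE_BITS == 27), 2MB sub-sections (SUBSECTION_SHIFT == 21),
  and 4KB pages, so PAGES_PER_SUBSECTION = 1 << (21 - 12) = 512 pfns and
  SUBSECTIONS_PER_SECTION = 1 << (27 - 21) = 64 bits per subsection_map. ]

	unsigned long pfn = 0x12345;
	unsigned long down = SUBSECTION_ALIGN_DOWN(pfn);	/* 0x12200 */
	unsigned long up = SUBSECTION_ALIGN_UP(pfn);		/* 0x12400 */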