From patchwork Tue Dec 8 17:28:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joao Martins
X-Patchwork-Id: 11959125
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org, Matthew Wilcox,
 Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton,
 Joao Martins
Subject: [PATCH RFC 3/9] sparse-vmemmap: Reuse vmemmap areas for a given
 mhp_params::align
Date: Tue, 8 Dec 2020 17:28:54 +0000
Message-Id: <20201208172901.17384-4-joao.m.martins@oracle.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>

Introduce a new flag, MEMHP_REUSE_VMEMMAP, which signals that struct
pages are onlined with a given alignment and should reuse the tail
pages' vmemmap areas. In that circumstance we reuse the PFN backing
only the tail pages' subsections, while letting the head page PFN
remain different.
This presumes that the backing page structs are compound pages, as is
the case for compound pagemaps (i.e. ZONE_DEVICE with PGMAP_COMPOUND
set).

On 2M compound pagemaps, it lets us save 6 of the 8 pages needed to
back the 32K of struct pages describing the subsection we are onlining.
On a 1G compound pagemap, by the same accounting, it lets us save 4094
of the 4096 pages.

Sections are 128M (or bigger/smaller), so when initializing a compound
memory map, where we are initializing compound struct pages, we need to
preserve the tail page so that it can be reused across the rest of the
areas for page sizes which are bigger than a section.

Signed-off-by: Joao Martins
---
I wonder whether, rather than separating vmem_context and mhp_params,
one should just pick the latter. Semantically, though, the ctx fields
aren't necessarily parameters, but context passed across the onlining
of multiple sections (i.e. multiple calls to
populate_section_memmap). Also, this is internal state which isn't
passed to external modules, except for @align and @flags, which carry
the page size and request whether to reuse tail page areas.
---
 include/linux/memory_hotplug.h | 10 ++++
 include/linux/mm.h             |  2 +-
 mm/memory_hotplug.c            | 12 ++++-
 mm/memremap.c                  |  3 ++
 mm/sparse-vmemmap.c            | 93 ++++++++++++++++++++++++++++------
 5 files changed, 103 insertions(+), 17 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 73f8bcbb58a4..e15bb82805a3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -70,6 +70,10 @@ typedef int __bitwise mhp_t;
  */
 #define MEMHP_MERGE_RESOURCE	((__force mhp_t)BIT(0))
 
+/*
+ */
+#define MEMHP_REUSE_VMEMMAP	((__force mhp_t)BIT(1))
+
 /*
  * Extended parameters for memory hotplug:
  * altmap: alternative allocator for memmap array (optional)
@@ -79,10 +83,16 @@ typedef int __bitwise mhp_t;
 struct mhp_params {
 	struct vmem_altmap *altmap;
 	pgprot_t pgprot;
+	unsigned int align;
+	mhp_t flags;
 };
 
 struct vmem_context {
 	struct vmem_altmap *altmap;
+	mhp_t flags;
+	unsigned int align;
+	void *block;
+	unsigned long block_page;
 };
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2eb44318bb2d..8b0155441835 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3006,7 +3006,7 @@ p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
 pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
 pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
-			    struct vmem_altmap *altmap);
+			    struct vmem_altmap *altmap, void *block);
 void *vmemmap_alloc_block(unsigned long size, int node);
 struct vmem_altmap;
 void *vmemmap_alloc_block_buf(unsigned long size, int node,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f8870c53fe5e..56121dfcc44b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -300,6 +300,14 @@ static int check_hotplug_memory_addressable(unsigned long pfn,
 	return 0;
 }
 
+static void vmem_context_init(struct vmem_context *ctx, struct mhp_params *params)
+{
+	memset(ctx, 0, sizeof(*ctx));
+	ctx->align = params->align;
+	ctx->altmap = params->altmap;
+	ctx->flags = params->flags;
+}
+
 /*
  * Reasonably generic function for adding memory. It is
  * expected that archs that support memory hotplug will
@@ -313,7 +321,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 	unsigned long cur_nr_pages;
 	int err;
 	struct vmem_altmap *altmap = params->altmap;
-	struct vmem_context ctx = { .altmap = params->altmap };
+	struct vmem_context ctx;
 
 	if (WARN_ON_ONCE(!params->pgprot.pgprot))
 		return -EINVAL;
@@ -338,6 +346,8 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 	if (err)
 		return err;
 
+	vmem_context_init(&ctx, params);
+
 	for (; pfn < end_pfn; pfn += cur_nr_pages) {
 		/* Select all remaining pages up to the next section boundary */
 		cur_nr_pages = min(end_pfn - pfn,
diff --git a/mm/memremap.c b/mm/memremap.c
index 287a24b7a65a..ecfa74848ac6 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -253,6 +253,9 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
 			goto err_kasan;
 		}
 
+		if (pgmap->flags & PGMAP_COMPOUND)
+			params->align = pgmap->align;
+
 		error = arch_add_memory(nid, range->start, range_len(range),
 					params);
 	}
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index bcda68ba1381..1679f36473ac 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -141,16 +141,20 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 }
 
 pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
-				       struct vmem_altmap *altmap)
+				       struct vmem_altmap *altmap, void *block)
 {
 	pte_t *pte = pte_offset_kernel(pmd, addr);
 	if (pte_none(*pte)) {
 		pte_t entry;
-		void *p;
-
-		p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap);
-		if (!p)
-			return NULL;
+		void *p = block;
+
+		if (!block) {
+			p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap);
+			if (!p)
+				return NULL;
+		} else {
+			get_page(virt_to_page(block));
+		}
 		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
 		set_pte_at(&init_mm, addr, pte, entry);
 	}
@@ -216,8 +220,10 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
 	return pgd;
 }
 
-int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
-					 int node, struct vmem_altmap *altmap)
+static void *__meminit __vmemmap_populate_basepages(unsigned long start,
+					   unsigned long end, int node,
+					   struct vmem_altmap *altmap,
+					   void *block)
 {
 	unsigned long addr = start;
 	pgd_t *pgd;
@@ -229,38 +235,95 @@ int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
 	for (; addr < end; addr += PAGE_SIZE) {
 		pgd = vmemmap_pgd_populate(addr, node);
 		if (!pgd)
-			return -ENOMEM;
+			return NULL;
 		p4d = vmemmap_p4d_populate(pgd, addr, node);
 		if (!p4d)
-			return -ENOMEM;
+			return NULL;
 		pud = vmemmap_pud_populate(p4d, addr, node);
 		if (!pud)
-			return -ENOMEM;
+			return NULL;
 		pmd = vmemmap_pmd_populate(pud, addr, node);
 		if (!pmd)
-			return -ENOMEM;
-		pte = vmemmap_pte_populate(pmd, addr, node, altmap);
+			return NULL;
+		pte = vmemmap_pte_populate(pmd, addr, node, altmap, block);
 		if (!pte)
-			return -ENOMEM;
+			return NULL;
 		vmemmap_verify(pte, node, addr, addr + PAGE_SIZE);
 	}
 
+	return __va(__pfn_to_phys(pte_pfn(*pte)));
+}
+
+int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
+					 int node, struct vmem_altmap *altmap)
+{
+	if (!__vmemmap_populate_basepages(start, end, node, altmap, NULL))
+		return -ENOMEM;
 	return 0;
 }
 
+static struct page * __meminit vmemmap_populate_reuse(unsigned long start,
+					unsigned long end, int node,
+					struct vmem_context *ctx)
+{
+	unsigned long size, addr = start;
+	unsigned long psize = PHYS_PFN(ctx->align) * sizeof(struct page);
+
+	size = min(psize, end - start);
+
+	for (; addr < end; addr += size) {
+		unsigned long head = addr + PAGE_SIZE;
+		unsigned long tail = addr;
+		unsigned long last = addr + size;
+		void *area;
+
+		if (ctx->block_page &&
+		    IS_ALIGNED((addr - ctx->block_page), psize))
+			ctx->block = NULL;
+
+		area = ctx->block;
+		if (!area) {
+			if (!__vmemmap_populate_basepages(addr, head, node,
+							  ctx->altmap, NULL))
+				return NULL;
+
+			tail = head + PAGE_SIZE;
+			area = __vmemmap_populate_basepages(head, tail, node,
+							    ctx->altmap, NULL);
+			if (!area)
+				return NULL;
+
+			ctx->block = area;
+			ctx->block_page = addr;
+		}
+
+		if (!__vmemmap_populate_basepages(tail, last, node,
+						  ctx->altmap, area))
+			return NULL;
+	}
+
+	return (struct page*) start;
+}
+
 struct page * __meminit __populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_context *ctx)
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
 	struct vmem_altmap *altmap = NULL;
+	int flags = 0;
 
 	if (WARN_ON_ONCE(!IS_ALIGNED(pfn, PAGES_PER_SUBSECTION) ||
 		!IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION)))
 		return NULL;
 
-	if (ctx)
+	if (ctx) {
 		altmap = ctx->altmap;
+		flags = ctx->flags;
+	}
+
+	if (flags & MEMHP_REUSE_VMEMMAP)
+		return vmemmap_populate_reuse(start, end, nid, ctx);
 
 	if (vmemmap_populate(start, end, nid, altmap))
 		return NULL;