From patchwork Tue Dec 8 17:28:52 2020
X-Patchwork-Submitter: Joao Martins
X-Patchwork-Id: 11959157
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton, Joao Martins
Subject: [PATCH RFC 1/9] memremap: add ZONE_DEVICE support for compound pages
Date: Tue, 8 Dec 2020 17:28:52 +0000
Message-Id: <20201208172901.17384-2-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>

Add a new flag to struct dev_pagemap which designates that a pagemap is
described as a set of compound pages; in other words, how pages are grouped
together in the page tables is reflected in how we describe their struct
pages. This means that in addition to initializing the individual struct
pages, we also initialize them as compound pages (on x86: 2M or 1G compound
pages).

For certain ZONE_DEVICE users, like device-dax, which have a fixed page
size, this creates an opportunity to optimize GUP and GUP-fast walkers,
playing the same tricks as hugetlb pages.
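As an illustration (modeled on the device-dax patch later in this series), a
minimal sketch of how a ZONE_DEVICE driver would opt in to compound pages;
the 2M alignment and the probe wrapper are assumptions, not part of this
patch:

	/*
	 * Sketch only: a ZONE_DEVICE user opting in to compound pages.
	 * PGMAP_COMPOUND and pgmap->align are introduced by this patch;
	 * the chosen 2M alignment and the surrounding function are assumed.
	 */
	static int example_probe(struct device *dev, struct dev_pagemap *pgmap)
	{
		void *addr;

		pgmap->type = MEMORY_DEVICE_GENERIC;
		pgmap->flags |= PGMAP_COMPOUND;	/* describe the range with compound pages */
		pgmap->align = SZ_2M;		/* one head plus 511 tail struct pages per 2M */

		addr = devm_memremap_pages(dev, pgmap);
		if (IS_ERR(addr))
			return PTR_ERR(addr);

		return 0;
	}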
Signed-off-by: Joao Martins --- include/linux/memremap.h | 2 ++ mm/memremap.c | 8 ++++++-- mm/page_alloc.c | 7 +++++++ 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 79c49e7f5c30..f8f26b2cc3da 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -90,6 +90,7 @@ struct dev_pagemap_ops { }; #define PGMAP_ALTMAP_VALID (1 << 0) +#define PGMAP_COMPOUND (1 << 1) /** * struct dev_pagemap - metadata for ZONE_DEVICE mappings @@ -114,6 +115,7 @@ struct dev_pagemap { struct completion done; enum memory_type type; unsigned int flags; + unsigned int align; const struct dev_pagemap_ops *ops; void *owner; int nr_range; diff --git a/mm/memremap.c b/mm/memremap.c index 16b2fb482da1..287a24b7a65a 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -277,8 +277,12 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params, memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE], PHYS_PFN(range->start), PHYS_PFN(range_len(range)), pgmap); - percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id) - - pfn_first(pgmap, range_id)); + if (pgmap->flags & PGMAP_COMPOUND) + percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id) + - pfn_first(pgmap, range_id)) / PHYS_PFN(pgmap->align)); + else + percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id) + - pfn_first(pgmap, range_id)); return 0; err_add_memory: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index eaa227a479e4..9716ecd58e29 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6116,6 +6116,8 @@ void __ref memmap_init_zone_device(struct zone *zone, unsigned long pfn, end_pfn = start_pfn + nr_pages; struct pglist_data *pgdat = zone->zone_pgdat; struct vmem_altmap *altmap = pgmap_altmap(pgmap); + bool compound = pgmap->flags & PGMAP_COMPOUND; + unsigned int align = PHYS_PFN(pgmap->align); unsigned long zone_idx = zone_idx(zone); unsigned long start = jiffies; int nid = pgdat->node_id; @@ -6171,6 +6173,11 @@ void __ref memmap_init_zone_device(struct zone *zone, } } + if (compound) { + for (pfn = start_pfn; pfn < end_pfn; pfn += align) + prep_compound_page(pfn_to_page(pfn), order_base_2(align)); + } + pr_info("%s initialised %lu pages in %ums\n", __func__, nr_pages, jiffies_to_msecs(jiffies - start)); } From patchwork Tue Dec 8 17:28:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 11959133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C795BC433FE for ; Tue, 8 Dec 2020 17:30:51 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 73CA423B05 for ; Tue, 8 Dec 2020 17:30:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 73CA423B05 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton, Joao Martins
Subject: [PATCH RFC 2/9] sparse-vmemmap: Consolidate arguments in vmemmap section populate
Date: Tue, 8 Dec 2020 17:28:53 +0000
Message-Id: <20201208172901.17384-3-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>
spamscore=0 malwarescore=0 adultscore=0 bulkscore=0 phishscore=0 suspectscore=1 mlxscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012080107 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9829 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 mlxlogscore=999 clxscore=1011 malwarescore=0 priorityscore=1501 adultscore=0 lowpriorityscore=0 phishscore=0 spamscore=0 impostorscore=0 mlxscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012080107 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Replace vmem_altmap with an vmem_context argument. That let us express how the vmemmap is gonna be initialized e.g. passing flags and a page size for reusing pages upon initializing the vmemmap. Signed-off-by: Joao Martins --- include/linux/memory_hotplug.h | 6 +++++- include/linux/mm.h | 2 +- mm/memory_hotplug.c | 3 ++- mm/sparse-vmemmap.c | 6 +++++- mm/sparse.c | 16 ++++++++-------- 5 files changed, 21 insertions(+), 12 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 551093b74596..73f8bcbb58a4 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -81,6 +81,10 @@ struct mhp_params { pgprot_t pgprot; }; +struct vmem_context { + struct vmem_altmap *altmap; +}; + /* * Zone resizing functions * @@ -353,7 +357,7 @@ extern void remove_pfn_range_from_zone(struct zone *zone, unsigned long nr_pages); extern bool is_memblock_offlined(struct memory_block *mem); extern int sparse_add_section(int nid, unsigned long pfn, - unsigned long nr_pages, struct vmem_altmap *altmap); + unsigned long nr_pages, struct vmem_context *ctx); extern void sparse_remove_section(struct mem_section *ms, unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap); diff --git a/include/linux/mm.h b/include/linux/mm.h index db6ae4d3fb4e..2eb44318bb2d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3000,7 +3000,7 @@ static inline void print_vma_addr(char *prefix, unsigned long rip) void *sparse_buffer_alloc(unsigned long size); struct page * __populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap); + unsigned long nr_pages, int nid, struct vmem_context *ctx); pgd_t *vmemmap_pgd_populate(unsigned long addr, int node); p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node); pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 63b2e46b6555..f8870c53fe5e 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -313,6 +313,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, unsigned long cur_nr_pages; int err; struct vmem_altmap *altmap = params->altmap; + struct vmem_context ctx = { .altmap = params->altmap }; if (WARN_ON_ONCE(!params->pgprot.pgprot)) return -EINVAL; @@ -341,7 +342,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, /* Select all remaining pages up to the next section boundary */ cur_nr_pages = min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn); - err = sparse_add_section(nid, pfn, cur_nr_pages, altmap); + err = sparse_add_section(nid, pfn, cur_nr_pages, &ctx); if (err) break; cond_resched(); diff --git 
a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 16183d85a7d5..bcda68ba1381 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -249,15 +249,19 @@ int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end, } struct page * __meminit __populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap) + unsigned long nr_pages, int nid, struct vmem_context *ctx) { unsigned long start = (unsigned long) pfn_to_page(pfn); unsigned long end = start + nr_pages * sizeof(struct page); + struct vmem_altmap *altmap = NULL; if (WARN_ON_ONCE(!IS_ALIGNED(pfn, PAGES_PER_SUBSECTION) || !IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION))) return NULL; + if (ctx) + altmap = ctx->altmap; + if (vmemmap_populate(start, end, nid, altmap)) return NULL; diff --git a/mm/sparse.c b/mm/sparse.c index 7bd23f9d6cef..47ca494398a7 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -443,7 +443,7 @@ static unsigned long __init section_map_size(void) } struct page __init *__populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap) + unsigned long nr_pages, int nid, struct vmem_context *ctx) { unsigned long size = section_map_size(); struct page *map = sparse_buffer_alloc(size); @@ -648,9 +648,9 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn) #ifdef CONFIG_SPARSEMEM_VMEMMAP static struct page * __meminit populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap) + unsigned long nr_pages, int nid, struct vmem_context *ctx) { - return __populate_section_memmap(pfn, nr_pages, nid, altmap); + return __populate_section_memmap(pfn, nr_pages, nid, ctx); } static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages, @@ -842,7 +842,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages, } static struct page * __meminit section_activate(int nid, unsigned long pfn, - unsigned long nr_pages, struct vmem_altmap *altmap) + unsigned long nr_pages, struct vmem_context *ctx) { struct mem_section *ms = __pfn_to_section(pfn); struct mem_section_usage *usage = NULL; @@ -874,9 +874,9 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn, if (nr_pages < PAGES_PER_SECTION && early_section(ms)) return pfn_to_page(pfn); - memmap = populate_section_memmap(pfn, nr_pages, nid, altmap); + memmap = populate_section_memmap(pfn, nr_pages, nid, ctx); if (!memmap) { - section_deactivate(pfn, nr_pages, altmap); + section_deactivate(pfn, nr_pages, ctx->altmap); return ERR_PTR(-ENOMEM); } @@ -902,7 +902,7 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn, * * -ENOMEM - Out of memory. 
*/ int __meminit sparse_add_section(int nid, unsigned long start_pfn, - unsigned long nr_pages, struct vmem_altmap *altmap) + unsigned long nr_pages, struct vmem_context *ctx) { unsigned long section_nr = pfn_to_section_nr(start_pfn); struct mem_section *ms; @@ -913,7 +913,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn, if (ret < 0) return ret; - memmap = section_activate(nid, start_pfn, nr_pages, altmap); + memmap = section_activate(nid, start_pfn, nr_pages, ctx); if (IS_ERR(memmap)) return PTR_ERR(memmap); From patchwork Tue Dec 8 17:28:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 11959125 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D116C433FE for ; Tue, 8 Dec 2020 17:30:42 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 911C823B02 for ; Tue, 8 Dec 2020 17:30:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 911C823B02 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 38E866B005C; Tue, 8 Dec 2020 12:30:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 33F896B005D; Tue, 8 Dec 2020 12:30:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2303C6B006C; Tue, 8 Dec 2020 12:30:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0231.hostedemail.com [216.40.44.231]) by kanga.kvack.org (Postfix) with ESMTP id 0D0B66B005C for ; Tue, 8 Dec 2020 12:30:40 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id C08368249980 for ; Tue, 8 Dec 2020 17:30:39 +0000 (UTC) X-FDA: 77570804598.25.doll28_5411c8d273e8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id 2D4611804E3BF for ; Tue, 8 Dec 2020 17:30:37 +0000 (UTC) X-HE-Tag: doll28_5411c8d273e8 X-Filterd-Recvd-Size: 12733 Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Dec 2020 17:30:36 +0000 (UTC) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0B8HPJGv071146; Tue, 8 Dec 2020 17:30:27 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=Bf0H8+1g681aJKDQDVv1pLRfSYGx+OJmLGnIVAZhDVs=; b=qIJEWKbVS9VEgPtIFLmUjJARzDDpPJLwN2LmUNdaUUX0/F0WZgJ/CmLlhEJN2bORg/1k lxx91sUyyBk1EP6HZ3qrQ/Ga9QBtgVGVbTnYNF7w0i/OXzmYDRxs3cru+/k2Tt5M8Dji IhV7Zs6uZl01cvvhGv66oFf30EFoPKHT+sG8WLoIGk/n9hO43dQjjw1W5iiAI8CzK8v7 
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton, Joao Martins
Subject: [PATCH RFC 3/9] sparse-vmemmap: Reuse vmemmap areas for a given mhp_params::align
Date: Tue, 8 Dec 2020 17:28:54 +0000
Message-Id: <20201208172901.17384-4-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>

Introduce a new flag, MEMHP_REUSE_VMEMMAP, which signals that struct pages
are onlined with a given alignment and should reuse the vmemmap areas of the
tail pages. In that case we reuse the PFNs backing only the tail-page
subsections, while the head page PFN remains distinct. This presumes that
the backing page structs are compound pages, as is the case for compound
pagemaps (i.e. ZONE_DEVICE with PGMAP_COMPOUND set).

On 2M compound pagemaps, this saves 6 of the 8 PFNs needed to describe the
subsection's 32K of struct pages we are onlining. On a 1G compound pagemap
it saves 4096 pages.

Sections are 128M (or bigger/smaller), so when initializing a compound
memory map with page sizes bigger than a section, we need to preserve the
tail page so it can be reused across the remaining areas.
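To make the arithmetic above explicit (assuming 4K base pages and the usual
64-byte struct page): a 2M compound page spans 512 base pages, whose struct
pages occupy 512 * 64B = 32K, i.e. 8 vmemmap pages. Keeping the head vmemmap
page plus one tail vmemmap page unique, and pointing the remaining entries at
that tail page, leaves 6 of those 8 pages to be saved, which matches the
figure quoted above.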
Signed-off-by: Joao Martins --- I wonder, rather than separating vmem_context and mhp_params, that one would just pick the latter. Albeit semantically the ctx aren't necessarily paramters, context passed from multiple sections onlining (i.e. multiple calls to populate_section_memmap). Also provided that this is internal state, which isn't passed to external modules, except @align and @flags for page size and requesting whether to reuse tail page areas. --- include/linux/memory_hotplug.h | 10 ++++ include/linux/mm.h | 2 +- mm/memory_hotplug.c | 12 ++++- mm/memremap.c | 3 ++ mm/sparse-vmemmap.c | 93 ++++++++++++++++++++++++++++------ 5 files changed, 103 insertions(+), 17 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 73f8bcbb58a4..e15bb82805a3 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -70,6 +70,10 @@ typedef int __bitwise mhp_t; */ #define MEMHP_MERGE_RESOURCE ((__force mhp_t)BIT(0)) +/* + */ +#define MEMHP_REUSE_VMEMMAP ((__force mhp_t)BIT(1)) + /* * Extended parameters for memory hotplug: * altmap: alternative allocator for memmap array (optional) @@ -79,10 +83,16 @@ typedef int __bitwise mhp_t; struct mhp_params { struct vmem_altmap *altmap; pgprot_t pgprot; + unsigned int align; + mhp_t flags; }; struct vmem_context { struct vmem_altmap *altmap; + mhp_t flags; + unsigned int align; + void *block; + unsigned long block_page; }; /* diff --git a/include/linux/mm.h b/include/linux/mm.h index 2eb44318bb2d..8b0155441835 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3006,7 +3006,7 @@ p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node); pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node); pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node); pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node, - struct vmem_altmap *altmap); + struct vmem_altmap *altmap, void *block); void *vmemmap_alloc_block(unsigned long size, int node); struct vmem_altmap; void *vmemmap_alloc_block_buf(unsigned long size, int node, diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index f8870c53fe5e..56121dfcc44b 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -300,6 +300,14 @@ static int check_hotplug_memory_addressable(unsigned long pfn, return 0; } +static void vmem_context_init(struct vmem_context *ctx, struct mhp_params *params) +{ + memset(ctx, 0, sizeof(*ctx)); + ctx->align = params->align; + ctx->altmap = params->altmap; + ctx->flags = params->flags; +} + /* * Reasonably generic function for adding memory. 
It is * expected that archs that support memory hotplug will @@ -313,7 +321,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, unsigned long cur_nr_pages; int err; struct vmem_altmap *altmap = params->altmap; - struct vmem_context ctx = { .altmap = params->altmap }; + struct vmem_context ctx; if (WARN_ON_ONCE(!params->pgprot.pgprot)) return -EINVAL; @@ -338,6 +346,8 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, if (err) return err; + vmem_context_init(&ctx, params); + for (; pfn < end_pfn; pfn += cur_nr_pages) { /* Select all remaining pages up to the next section boundary */ cur_nr_pages = min(end_pfn - pfn, diff --git a/mm/memremap.c b/mm/memremap.c index 287a24b7a65a..ecfa74848ac6 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -253,6 +253,9 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params, goto err_kasan; } + if (pgmap->flags & PGMAP_COMPOUND) + params->align = pgmap->align; + error = arch_add_memory(nid, range->start, range_len(range), params); } diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index bcda68ba1381..1679f36473ac 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -141,16 +141,20 @@ void __meminit vmemmap_verify(pte_t *pte, int node, } pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node, - struct vmem_altmap *altmap) + struct vmem_altmap *altmap, void *block) { pte_t *pte = pte_offset_kernel(pmd, addr); if (pte_none(*pte)) { pte_t entry; - void *p; - - p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap); - if (!p) - return NULL; + void *p = block; + + if (!block) { + p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap); + if (!p) + return NULL; + } else { + get_page(virt_to_page(block)); + } entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL); set_pte_at(&init_mm, addr, pte, entry); } @@ -216,8 +220,10 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node) return pgd; } -int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end, - int node, struct vmem_altmap *altmap) +static void *__meminit __vmemmap_populate_basepages(unsigned long start, + unsigned long end, int node, + struct vmem_altmap *altmap, + void *block) { unsigned long addr = start; pgd_t *pgd; @@ -229,38 +235,95 @@ int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end, for (; addr < end; addr += PAGE_SIZE) { pgd = vmemmap_pgd_populate(addr, node); if (!pgd) - return -ENOMEM; + return NULL; p4d = vmemmap_p4d_populate(pgd, addr, node); if (!p4d) - return -ENOMEM; + return NULL; pud = vmemmap_pud_populate(p4d, addr, node); if (!pud) - return -ENOMEM; + return NULL; pmd = vmemmap_pmd_populate(pud, addr, node); if (!pmd) - return -ENOMEM; - pte = vmemmap_pte_populate(pmd, addr, node, altmap); + return NULL; + pte = vmemmap_pte_populate(pmd, addr, node, altmap, block); if (!pte) - return -ENOMEM; + return NULL; vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); } + return __va(__pfn_to_phys(pte_pfn(*pte))); +} + +int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end, + int node, struct vmem_altmap *altmap) +{ + if (!__vmemmap_populate_basepages(start, end, node, altmap, NULL)) + return -ENOMEM; return 0; } +static struct page * __meminit vmemmap_populate_reuse(unsigned long start, + unsigned long end, int node, + struct vmem_context *ctx) +{ + unsigned long size, addr = start; + unsigned long psize = PHYS_PFN(ctx->align) * sizeof(struct page); + + size = min(psize, end - start); + + for (; 
addr < end; addr += size) { + unsigned long head = addr + PAGE_SIZE; + unsigned long tail = addr; + unsigned long last = addr + size; + void *area; + + if (ctx->block_page && + IS_ALIGNED((addr - ctx->block_page), psize)) + ctx->block = NULL; + + area = ctx->block; + if (!area) { + if (!__vmemmap_populate_basepages(addr, head, node, + ctx->altmap, NULL)) + return NULL; + + tail = head + PAGE_SIZE; + area = __vmemmap_populate_basepages(head, tail, node, + ctx->altmap, NULL); + if (!area) + return NULL; + + ctx->block = area; + ctx->block_page = addr; + } + + if (!__vmemmap_populate_basepages(tail, last, node, + ctx->altmap, area)) + return NULL; + } + + return (struct page*) start; +} + struct page * __meminit __populate_section_memmap(unsigned long pfn, unsigned long nr_pages, int nid, struct vmem_context *ctx) { unsigned long start = (unsigned long) pfn_to_page(pfn); unsigned long end = start + nr_pages * sizeof(struct page); struct vmem_altmap *altmap = NULL; + int flags = 0; if (WARN_ON_ONCE(!IS_ALIGNED(pfn, PAGES_PER_SUBSECTION) || !IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION))) return NULL; - if (ctx) + if (ctx) { altmap = ctx->altmap; + flags = ctx->flags; + } + + if (flags & MEMHP_REUSE_VMEMMAP) + return vmemmap_populate_reuse(start, end, nid, ctx); if (vmemmap_populate(start, end, nid, altmap)) return NULL; From patchwork Tue Dec 8 17:28:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 11959127 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A8B7C1B0D8 for ; Tue, 8 Dec 2020 17:30:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 278E923AFD for ; Tue, 8 Dec 2020 17:30:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 278E923AFD Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B581B6B005D; Tue, 8 Dec 2020 12:30:42 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B07526B006C; Tue, 8 Dec 2020 12:30:42 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 986D36B0070; Tue, 8 Dec 2020 12:30:42 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0023.hostedemail.com [216.40.44.23]) by kanga.kvack.org (Postfix) with ESMTP id 831366B005D for ; Tue, 8 Dec 2020 12:30:42 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 4D191181AEF1E for ; Tue, 8 Dec 2020 17:30:42 +0000 (UTC) X-FDA: 77570804724.12.knee16_160f05d273e8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin12.hostedemail.com (Postfix) with ESMTP id EA36918015E02 for ; Tue, 8 Dec 2020 17:30:40 +0000 (UTC) X-HE-Tag: 
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton, Joao Martins
Subject: [PATCH RFC 4/9] mm/page_alloc: Reuse tail struct pages for compound pagemaps
Date: Tue, 8 Dec 2020 17:28:56 +0000
Message-Id: <20201208172901.17384-6-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>

When PGMAP_COMPOUND is set, all pages are onlined at a given huge page
alignment and described with compound pages, as opposed to one struct page
per 4K page.
To minimize struct page overhead and given the usage of compound pages we utilize the fact that most tail pages look the same, we online the subsection while pointing to the same pages. Thus request VMEMMAP_REUSE in add_pages. With VMEMMAP_REUSE, provided we reuse most tail pages the amount of struct pages we need to initialize is a lot smaller that the total amount of structs we would normnally online. Thus allow an @init_order to be passed to specify how much pages we want to prep upon creating a compound page. Finally when onlining all struct pages in memmap_init_zone_device, make sure that we only initialize the unique struct pages i.e. the first 2 4K pages from @align which means 128 struct pages out of 32768 for 2M @align or 262144 for a 1G @align. Signed-off-by: Joao Martins --- mm/memremap.c | 4 +++- mm/page_alloc.c | 23 ++++++++++++++++++++--- 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/mm/memremap.c b/mm/memremap.c index ecfa74848ac6..3eca07916b9d 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -253,8 +253,10 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params, goto err_kasan; } - if (pgmap->flags & PGMAP_COMPOUND) + if (pgmap->flags & PGMAP_COMPOUND) { params->align = pgmap->align; + params->flags = MEMHP_REUSE_VMEMMAP; + } error = arch_add_memory(nid, range->start, range_len(range), params); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9716ecd58e29..180a7d4e9285 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -691,10 +691,11 @@ void free_compound_page(struct page *page) __free_pages_ok(page, compound_order(page), FPI_NONE); } -void prep_compound_page(struct page *page, unsigned int order) +static void __prep_compound_page(struct page *page, unsigned int order, + unsigned int init_order) { int i; - int nr_pages = 1 << order; + int nr_pages = 1 << init_order; __SetPageHead(page); for (i = 1; i < nr_pages; i++) { @@ -711,6 +712,11 @@ void prep_compound_page(struct page *page, unsigned int order) atomic_set(compound_pincount_ptr(page), 0); } +void prep_compound_page(struct page *page, unsigned int order) +{ + __prep_compound_page(page, order, order); +} + #ifdef CONFIG_DEBUG_PAGEALLOC unsigned int _debug_guardpage_minorder; @@ -6108,6 +6114,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, } #ifdef CONFIG_ZONE_DEVICE + +#define MEMMAP_COMPOUND_SIZE (2 * (PAGE_SIZE/sizeof(struct page))) + void __ref memmap_init_zone_device(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages, @@ -6138,6 +6147,12 @@ void __ref memmap_init_zone_device(struct zone *zone, for (pfn = start_pfn; pfn < end_pfn; pfn++) { struct page *page = pfn_to_page(pfn); + /* Skip already initialized pages. 
*/ + if (compound && (pfn % align >= MEMMAP_COMPOUND_SIZE)) { + pfn = ALIGN(pfn, align) - 1; + continue; + } + __init_single_page(page, pfn, zone_idx, nid); /* @@ -6175,7 +6190,9 @@ void __ref memmap_init_zone_device(struct zone *zone, if (compound) { for (pfn = start_pfn; pfn < end_pfn; pfn += align) - prep_compound_page(pfn_to_page(pfn), order_base_2(align)); + __prep_compound_page(pfn_to_page(pfn), + order_base_2(align), + order_base_2(MEMMAP_COMPOUND_SIZE)); } pr_info("%s initialised %lu pages in %ums\n", __func__, From patchwork Tue Dec 8 17:28:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 11959129 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E903EC4361B for ; Tue, 8 Dec 2020 17:30:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7324623AFD for ; Tue, 8 Dec 2020 17:30:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7324623AFD Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0F74F6B006C; Tue, 8 Dec 2020 12:30:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0D1056B0070; Tue, 8 Dec 2020 12:30:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EFFE06B0071; Tue, 8 Dec 2020 12:30:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0198.hostedemail.com [216.40.44.198]) by kanga.kvack.org (Postfix) with ESMTP id D7DC16B006C for ; Tue, 8 Dec 2020 12:30:45 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 9EAF98249980 for ; Tue, 8 Dec 2020 17:30:45 +0000 (UTC) X-FDA: 77570804850.30.scene99_410b44f273e8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id 69369180B3C8B for ; Tue, 8 Dec 2020 17:30:45 +0000 (UTC) X-HE-Tag: scene99_410b44f273e8 X-Filterd-Recvd-Size: 7164 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf28.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Dec 2020 17:30:44 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0B8HNtKb191639; Tue, 8 Dec 2020 17:30:36 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=g6swsjDZvzJC7QAazYa+f++QasgrbynEkAanWwTCzP4=; b=G8frdM5XIvg9XNJYzwKy9DT8AVDtQLnRWPKHtTS1ysuB8J6wjUoZ48J6Ah6k2K08fCYN gayysIbwvSAGqQ+2rsmuUyvbcfD65SUtXoOveitYVRNSHxkzsQi2Kido21OcUpobyErC BkFrxaUds/KwXQUfH435n3AxCugDcn6NAO7bAynllz+8CIDrY8zg5rYN/cfhCw5paFhV 
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton, Joao Martins
Subject: [PATCH RFC 5/9] device-dax: Compound pagemap support
Date: Tue, 8 Dec 2020 17:28:57 +0000
Message-Id: <20201208172901.17384-7-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>

dax devices are created with a fixed @align (huge page size) which is also
enforced at mmap() of the device. Faults consequently happen at the @align
specified at creation time and do not change throughout the dax device's
lifetime. MCEs poison a whole dax huge page, and splits likewise occur at
the configured page size.

As such, use the newly added compound pagemap facility, which onlines the
assigned dax ranges as compound pages. Currently this means that
region/namespace bootstrap takes considerably less time, given that
considerably fewer pages are initialized. On emulated NVDIMM guests this is
easy to see: on a setup with a 128G emulated NVDIMM, initialization drops
from ~750ms to ~190ms with 2M pages, and to less than 1ms with 1G pages.

Signed-off-by: Joao Martins
---
Probably deserves its own sysfs attribute for enabling PGMAP_COMPOUND?
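As a usage-level illustration (not part of this patch), the fixed @align is
visible to userspace as an alignment requirement at mmap() time; the device
path and mapping size below are assumptions:

	/*
	 * Hypothetical userspace sketch: a device-dax instance created with a
	 * 2M @align only accepts 2M-aligned, 2M-multiple mappings, and each
	 * fault then populates a whole 2M (compound) page.
	 */
	#include <fcntl.h>
	#include <stddef.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 2UL << 20;			/* one 2M huge page */
		int fd = open("/dev/dax0.0", O_RDWR);	/* assumed device path */
		char *p;

		if (fd < 0)
			return 1;

		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		p[0] = 1;	/* the first touch faults in the whole 2M page */
		return 0;
	}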
--- drivers/dax/device.c | 54 +++++++++++++++++++++++++++++++++----------- 1 file changed, 41 insertions(+), 13 deletions(-) diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 25e0b84a4296..9daec6e08efe 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -192,6 +192,39 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax, } #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn, + unsigned int fault_size, + struct address_space *f_mapping) +{ + unsigned long i; + pgoff_t pgoff; + + pgoff = linear_page_index(vmf->vma, vmf->address + & ~(fault_size - 1)); + + for (i = 0; i < fault_size / PAGE_SIZE; i++) { + struct page *page; + + page = pfn_to_page(pfn_t_to_pfn(pfn) + i); + if (page->mapping) + continue; + page->mapping = f_mapping; + page->index = pgoff + i; + } +} + +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn, + unsigned int fault_size, + struct address_space *f_mapping) +{ + struct page *head; + + head = pfn_to_page(pfn_t_to_pfn(pfn)); + head->mapping = f_mapping; + head->index = linear_page_index(vmf->vma, vmf->address + & ~(fault_size - 1)); +} + static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf, enum page_entry_size pe_size) { @@ -225,8 +258,7 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf, } if (rc == VM_FAULT_NOPAGE) { - unsigned long i; - pgoff_t pgoff; + struct dev_pagemap *pgmap = pfn_t_to_page(pfn)->pgmap; /* * In the device-dax case the only possibility for a @@ -234,17 +266,10 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf, * mapped. No need to consider the zero page, or racing * conflicting mappings. */ - pgoff = linear_page_index(vmf->vma, vmf->address - & ~(fault_size - 1)); - for (i = 0; i < fault_size / PAGE_SIZE; i++) { - struct page *page; - - page = pfn_to_page(pfn_t_to_pfn(pfn) + i); - if (page->mapping) - continue; - page->mapping = filp->f_mapping; - page->index = pgoff + i; - } + if (pgmap->flags & PGMAP_COMPOUND) + set_compound_mapping(vmf, pfn, fault_size, filp->f_mapping); + else + set_page_mapping(vmf, pfn, fault_size, filp->f_mapping); } dax_read_unlock(id); @@ -426,6 +451,9 @@ int dev_dax_probe(struct dev_dax *dev_dax) } pgmap->type = MEMORY_DEVICE_GENERIC; + pgmap->flags = PGMAP_COMPOUND; + pgmap->align = dev_dax->align; + addr = devm_memremap_pages(dev, pgmap); if (IS_ERR(addr)) return PTR_ERR(addr); From patchwork Tue Dec 8 17:28:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 11959131 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1E9CCC4167B for ; Tue, 8 Dec 2020 17:30:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CD6B523AFD for ; Tue, 8 Dec 2020 17:30:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CD6B523AFD Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com 
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton, Joao Martins
Subject: [PATCH RFC 6/9] mm/gup: Grab head page refcount once for group of subpages
Date: Tue, 8 Dec 2020 17:28:58 +0000
Message-Id: <20201208172901.17384-8-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>
vendor=nai engine=6000 definitions=9829 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 spamscore=0 suspectscore=1 bulkscore=0 malwarescore=0 phishscore=0 adultscore=0 mlxlogscore=904 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012080107 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9829 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 mlxlogscore=930 clxscore=1015 malwarescore=0 bulkscore=0 phishscore=0 adultscore=0 spamscore=0 priorityscore=1501 mlxscore=0 lowpriorityscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012080107 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Much like hugetlbfs or THPs, we treat device pagemaps with compound pages like the rest of GUP handling of compound pages. Rather than incrementing the refcount every 4K, we record all sub pages and increment by @refs amount *once*. Performance measured by gup_benchmark improves considerably get_user_pages_fast() and pin_user_pages_fast(): $ gup_benchmark -f /dev/dax0.2 -m 16384 -r 10 -S [-u,-a] -n 512 -w (get_user_pages_fast 2M pages) ~75k us -> ~3.6k us (pin_user_pages_fast 2M pages) ~125k us -> ~3.8k us Signed-off-by: Joao Martins --- mm/gup.c | 67 ++++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 51 insertions(+), 16 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 98eb8e6d2609..194e6981eb03 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2250,22 +2250,68 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, } #endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */ + +static int record_subpages(struct page *page, unsigned long addr, + unsigned long end, struct page **pages) +{ + int nr; + + for (nr = 0; addr != end; addr += PAGE_SIZE) + pages[nr++] = page++; + + return nr; +} + #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE) -static int __gup_device_huge(unsigned long pfn, unsigned long addr, - unsigned long end, unsigned int flags, - struct page **pages, int *nr) +static int __gup_device_compound_huge(struct dev_pagemap *pgmap, + struct page *head, unsigned long sz, + unsigned long addr, unsigned long end, + unsigned int flags, struct page **pages) +{ + struct page *page; + int refs; + + if (!(pgmap->flags & PGMAP_COMPOUND)) + return -1; + + page = head + ((addr & (sz-1)) >> PAGE_SHIFT); + refs = record_subpages(page, addr, end, pages); + + SetPageReferenced(page); + head = try_grab_compound_head(head, refs, flags); + if (!head) { + ClearPageReferenced(page); + return 0; + } + + return refs; +} + +static int __gup_device_huge(unsigned long pfn, unsigned long sz, + unsigned long addr, unsigned long end, + unsigned int flags, struct page **pages, int *nr) { int nr_start = *nr; struct dev_pagemap *pgmap = NULL; do { struct page *page = pfn_to_page(pfn); + int refs; pgmap = get_dev_pagemap(pfn, pgmap); if (unlikely(!pgmap)) { undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } + + refs = __gup_device_compound_huge(pgmap, page, sz, addr, end, + flags, pages + *nr); + if (refs >= 0) { + *nr += refs; + put_dev_pagemap(pgmap); + return refs ? 
1 : 0; + } + SetPageReferenced(page); pages[*nr] = page; if (unlikely(!try_grab_page(page, flags))) { @@ -2289,7 +2335,7 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, int nr_start = *nr; fault_pfn = pmd_pfn(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) + if (!__gup_device_huge(fault_pfn, PMD_SHIFT, addr, end, flags, pages, nr)) return 0; if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { @@ -2307,7 +2353,7 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, int nr_start = *nr; fault_pfn = pud_pfn(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) + if (!__gup_device_huge(fault_pfn, PUD_SHIFT, addr, end, flags, pages, nr)) return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { @@ -2334,17 +2380,6 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr, } #endif -static int record_subpages(struct page *page, unsigned long addr, - unsigned long end, struct page **pages) -{ - int nr; - - for (nr = 0; addr != end; addr += PAGE_SIZE) - pages[nr++] = page++; - - return nr; -} - #ifdef CONFIG_ARCH_HAS_HUGEPD static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end, unsigned long sz)
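As a side note for readers, the following is a minimal, userspace-only sketch of the subpage arithmetic that __gup_device_compound_huge() above relies on: locate the first subpage inside the compound page and count how many references record_subpages() would collect, so that a single try_grab_compound_head() call can take them all. The 4K/2M constants and the sample addresses are illustrative assumptions; this is not kernel code and not part of the patch.

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define HPAGE_SIZE	(1UL << 21)	/* 2M, stand-in for the PMD-sized case */

int main(void)
{
	unsigned long sz   = HPAGE_SIZE;			/* huge page size in bytes */
	unsigned long addr = 0x40200000UL + 16 * PAGE_SIZE;	/* arbitrary, inside one 2M page */
	unsigned long end  = addr + 64 * PAGE_SIZE;		/* GUP range covering 64 subpages */

	/* first subpage, as in: page = head + ((addr & (sz-1)) >> PAGE_SHIFT) */
	unsigned long first_subpage = (addr & (sz - 1)) >> PAGE_SHIFT;

	/* refs counted by record_subpages(): one per PAGE_SIZE step in [addr, end) */
	unsigned long refs = (end - addr) >> PAGE_SHIFT;

	printf("first subpage index %lu, %lu refs taken in one compound head update\n",
	       first_subpage, refs);
	return 0;
}

Built with any C compiler this prints a first subpage index of 16 and 64 refs, i.e. one refcount update where the old path would have done 64.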
X-Filterd-Recvd-Size: 7359 Received: from aserp2130.oracle.com (aserp2130.oracle.com [141.146.126.79]) by imf05.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Dec 2020 17:30:47 +0000 (UTC) Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0B8HPa5v083230; Tue, 8 Dec 2020 17:30:42 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=LBzwQ398iHST3UKT538bsLuEEbD/c3Ows2/jGL1jAIY=; b=oXpLirFM6Qlbdf+Mtkrd46GYq4i7ZSwkEHsawSs3Mzpn+CvtLEqAs7ysR1mnC4EgUlb3 q4FSO5XVJJp24q8JVnR0RllKAc77C7t8gXPg4cDOhwMrWOc9RtCRgN4hrJ7HapqBKy6U 7iVEnu0d1MHDfPWsSsK5a4d1aL6YnAnVNjZwPHFLxgg0bxHEFdvBys6Xs7cQAwe6/Dc1 nlRBu2lGlWvjw+yj1NNcCeDwL9sMZZHjmWldpWNoVK1iJvtns0bX7EeBz8iP0vkjLIkH IopzYpqxZGFPYWYjuLhh1MNkUG0lM24ygQv/2Dbt+ayjCQPZM4Nlc0ACYTe18caiABJs ww== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by aserp2130.oracle.com with ESMTP id 357yqbv4ej-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 08 Dec 2020 17:30:42 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0B8HOVs7195377; Tue, 8 Dec 2020 17:30:41 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserp3020.oracle.com with ESMTP id 358m3y2fgw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 08 Dec 2020 17:30:41 +0000 Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0B8HUfXr012449; Tue, 8 Dec 2020 17:30:41 GMT Received: from paddy.uk.oracle.com (/10.175.194.215) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 08 Dec 2020 09:30:40 -0800 From: Joao Martins To: linux-mm@kvack.org Cc: Dan Williams , Ira Weiny , linux-nvdimm@lists.01.org, Matthew Wilcox , Jason Gunthorpe , Jane Chu , Muchun Song , Mike Kravetz , Andrew Morton , Joao Martins Subject: [PATCH RFC 7/9] mm/gup: Decrement head page once for group of subpages Date: Tue, 8 Dec 2020 17:28:59 +0000 Message-Id: <20201208172901.17384-9-joao.m.martins@oracle.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com> References: <20201208172901.17384-1-joao.m.martins@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9829 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 malwarescore=0 adultscore=0 bulkscore=0 phishscore=0 suspectscore=1 mlxscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012080107 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9829 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 mlxlogscore=999 clxscore=1015 malwarescore=0 bulkscore=0 phishscore=0 adultscore=0 spamscore=0 priorityscore=1501 mlxscore=0 lowpriorityscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012080107 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Rather than decrementing the ref count one by one, we walk the page array and checking which belong to the same compound_head. 
Later on we decrement the calculated amount of references in a single write to the head page. Signed-off-by: Joao Martins --- mm/gup.c | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 194e6981eb03..3a9a7229f418 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -212,6 +212,18 @@ static bool __unpin_devmap_managed_user_page(struct page *page) } #endif /* CONFIG_DEV_PAGEMAP_OPS */ +static int record_refs(struct page **pages, int npages) +{ + struct page *head = compound_head(pages[0]); + int refs = 1, index; + + for (index = 1; index < npages; index++, refs++) + if (compound_head(pages[index]) != head) + break; + + return refs; +} + /** * unpin_user_page() - release a dma-pinned page * @page: pointer to page to be released @@ -221,9 +233,9 @@ static bool __unpin_devmap_managed_user_page(struct page *page) * that such pages can be separately tracked and uniquely handled. In * particular, interactions with RDMA and filesystems need special handling. */ -void unpin_user_page(struct page *page) +static void __unpin_user_page(struct page *page, int refs) { - int refs = 1; + int orig_refs = refs; page = compound_head(page); @@ -237,14 +249,19 @@ void unpin_user_page(struct page *page) return; if (hpage_pincount_available(page)) - hpage_pincount_sub(page, 1); + hpage_pincount_sub(page, refs); else - refs = GUP_PIN_COUNTING_BIAS; + refs *= GUP_PIN_COUNTING_BIAS; if (page_ref_sub_and_test(page, refs)) __put_page(page); - mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1); + mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, orig_refs); +} + +void unpin_user_page(struct page *page) +{ + __unpin_user_page(page, 1); } EXPORT_SYMBOL(unpin_user_page); @@ -274,6 +291,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, bool make_dirty) { unsigned long index; + int refs = 1; /* * TODO: this can be optimized for huge pages: if a series of pages is @@ -286,8 +304,9 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, return; } - for (index = 0; index < npages; index++) { + for (index = 0; index < npages; index += refs) { struct page *page = compound_head(pages[index]); + /* * Checking PageDirty at this point may race with * clear_page_dirty_for_io(), but that's OK. Two key @@ -310,7 +329,8 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, */ if (!PageDirty(page)) set_page_dirty_lock(page); - unpin_user_page(page); + refs = record_refs(pages + index, npages - index); + __unpin_user_page(page, refs); } } EXPORT_SYMBOL(unpin_user_pages_dirty_lock); @@ -327,6 +347,7 @@ EXPORT_SYMBOL(unpin_user_pages_dirty_lock); void unpin_user_pages(struct page **pages, unsigned long npages) { unsigned long index; + int refs = 1; /* * If this WARN_ON() fires, then the system *might* be leaking pages (by @@ -340,8 +361,10 @@ void unpin_user_pages(struct page **pages, unsigned long npages) * physically contiguous and part of the same compound page, then a * single operation to the head page should suffice. 
*/ - for (index = 0; index < npages; index++) - unpin_user_page(pages[index]); + for (index = 0; index < npages; index += refs) { + refs = record_refs(pages + index, npages - index); + __unpin_user_page(pages[index], refs); + } } EXPORT_SYMBOL(unpin_user_pages);
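For illustration, here is a minimal, userspace-only sketch of the grouping strategy that record_refs() implements above: scan the page array, count how many consecutive entries share a compound head, and issue one release per group instead of one per entry. A page is modelled as a bare pfn and its head as the pfn rounded down to a 2M boundary; these simplifications and the sample values are assumptions, not kernel code.

#include <stdio.h>

#define SUBPAGES_PER_HUGE	512UL	/* 2M huge page / 4K base page, an assumption */

static unsigned long head_of(unsigned long pfn)
{
	return pfn & ~(SUBPAGES_PER_HUGE - 1);
}

/* Count how many consecutive entries share pages[0]'s head (cf. record_refs()). */
static int count_group(const unsigned long *pages, int npages)
{
	unsigned long head = head_of(pages[0]);
	int refs = 1;

	while (refs < npages && head_of(pages[refs]) == head)
		refs++;
	return refs;
}

int main(void)
{
	unsigned long pages[1024];
	int npages = 1024, index, refs;

	/* Model two physically contiguous 2M huge pages as 1024 consecutive pfns. */
	for (index = 0; index < npages; index++)
		pages[index] = 0x100000UL + index;

	for (index = 0; index < npages; index += refs) {
		refs = count_group(pages + index, npages - index);
		printf("one release of %d refs on head pfn %#lx\n",
		       refs, head_of(pages[index]));
	}
	return 0;
}

For two physically contiguous huge pages this prints two lines, each releasing 512 references in a single write, which is the effect __unpin_user_page() has on the head page above.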
From patchwork Tue Dec 8 17:29:00 2020
X-Patchwork-Id: 11959137
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton, Joao Martins
Subject: [PATCH RFC 8/9] RDMA/umem: batch page unpin in __ib_umem_release()
Date: Tue, 8 Dec 2020 17:29:00 +0000
Message-Id: <20201208172901.17384-10-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>

Take advantage of the newly added batched refcount update in unpin_user_pages(), by building a page array from the SGL (the same size as the one used in ib_umem_get()) and calling unpin_user_pages() with that. unpin_user_pages() will check for consecutive pages that belong to the same compound page and batch the refcount update into a single write.

Running a test program which calls MR reg/unreg on a 1G region and measures the cost of both operations together (in a guest using rxe) with device-dax and hugetlbfs:

Before:
159 rounds in 5.027 sec: 31617.923 usec / round (device-dax)
466 rounds in 5.009 sec: 10748.456 usec / round (hugetlbfs)

After:
305 rounds in 5.010 sec: 16426.047 usec / round (device-dax)
1073 rounds in 5.004 sec: 4663.622 usec / round (hugetlbfs)

We also see similar improvements on a setup with pmem and RDMA hardware.
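A back-of-the-envelope view of the scratch buffer sizing used below: __ib_umem_release() borrows one page worth of struct page pointers, the same PAGES_PER_LIST bound that ib_umem_get() uses when pinning. The sketch assumes a typical 64-bit configuration with 4K pages and 8-byte pointers; the numbers are illustrative, not taken from a running kernel.

#include <stdio.h>

int main(void)
{
	unsigned long page_size = 4096;			/* assumed PAGE_SIZE */
	unsigned long ptr_size  = sizeof(void *);	/* 8 on typical 64-bit */
	unsigned long pages_per_list = page_size / ptr_size;

	printf("PAGES_PER_LIST = %lu entries (%lu KiB of user memory per batch)\n",
	       pages_per_list, pages_per_list * page_size / 1024);
	return 0;
}

So each unpin_user_pages_dirty_lock() call covers up to 512 entries, i.e. 2M of user memory, which gives the compound-head batching from the previous patch room to work.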
Signed-off-by: Joao Martins --- drivers/infiniband/core/umem.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index e9fecbdf391b..493cfdcf7381 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -44,20 +44,40 @@ #include "uverbs.h" +#define PAGES_PER_LIST (PAGE_SIZE / sizeof(struct page *)) + static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty) { + bool make_dirty = umem->writable && dirty; + struct page **page_list = NULL; struct sg_page_iter sg_iter; + unsigned long nr = 0; struct page *page; + page_list = (struct page **) __get_free_page(GFP_KERNEL); + if (umem->nmap > 0) ib_dma_unmap_sg(dev, umem->sg_head.sgl, umem->sg_nents, DMA_BIDIRECTIONAL); for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) { page = sg_page_iter_page(&sg_iter); - unpin_user_pages_dirty_lock(&page, 1, umem->writable && dirty); + if (page_list) + page_list[nr++] = page; + + if (!page_list) { + unpin_user_pages_dirty_lock(&page, 1, make_dirty); + } else if (nr == PAGES_PER_LIST) { + unpin_user_pages_dirty_lock(page_list, nr, make_dirty); + nr = 0; + } } + if (nr) + unpin_user_pages_dirty_lock(page_list, nr, make_dirty); + + if (page_list) + free_page((unsigned long) page_list); sg_free_table(&umem->sg_head); } @@ -212,8 +232,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr, cond_resched(); ret = pin_user_pages_fast(cur_base, min_t(unsigned long, npages, - PAGE_SIZE / - sizeof(struct page *)), + PAGES_PER_LIST), gup_flags | FOLL_LONGTERM, page_list); if (ret < 0) goto umem_release;
From patchwork Tue Dec 8 17:29:01 2020
X-Patchwork-Id: 11959139
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org, Matthew Wilcox, Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton, Joao Martins
Subject: [PATCH RFC 9/9] mm: Add follow_devmap_page() for devdax vmas
Date: Tue, 8 Dec 2020 17:29:01 +0000
Message-Id: <20201208172901.17384-11-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>
Similar to follow_hugetlb_page(), add a follow_devmap_page() which, rather than calling follow_page() for every 4K page within a PMD/PUD, handles the entire PMD/PUD at once: lock the pmd/pud, collect all the pages, unlock. While doing so, change the refcount only once when PGMAP_COMPOUND is passed in. This lets us improve {pin,get}_user_pages{,_longterm}() considerably:

$ gup_benchmark -f /dev/dax0.2 -m 16384 -r 10 -S [-U,-b,-L] -n 512 -w

() [before] -> [after]
(get_user_pages 2M pages) ~150k us -> ~8.9k us
(pin_user_pages 2M pages) ~192k us -> ~9k us
(pin_user_pages_longterm 2M pages) ~200k us -> ~19k us

Signed-off-by: Joao Martins
---
I've special-cased this to device-dax vmas, given their page size guarantees are similar to hugetlbfs, but I feel this is a bit wrong. I am replicating follow_hugetlb_page(), and this RFC seeks feedback on whether it should be generalized, if no fundamental issues exist. In that case, should I change follow_page_mask() to take either an array of pages or a function pointer plus opaque arguments that would let the caller pick its own structure?
---
include/linux/huge_mm.h | 4 + include/linux/mm.h | 2 + mm/gup.c | 22 ++++- mm/huge_memory.c | 202 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 227 insertions(+), 3 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 0365aa97f8e7..da87ecea19e6 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -293,6 +293,10 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap); struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, pud_t *pud, int flags, struct dev_pagemap **pgmap); +long follow_devmap_page(struct mm_struct *mm, struct vm_area_struct *vma, + struct page **pages, struct vm_area_struct **vmas, + unsigned long *position, unsigned long *nr_pages, + long i, unsigned int flags, int *locked); extern vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd); diff --git a/include/linux/mm.h b/include/linux/mm.h index 8b0155441835..466c88679628 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1164,6 +1164,8 @@ static inline void get_page(struct page *page) page_ref_inc(page); } +__maybe_unused struct page *try_grab_compound_head(struct page *page, int refs, + unsigned int flags); bool __must_check try_grab_page(struct page *page, unsigned int flags); static inline __must_check bool try_get_page(struct page *page) diff --git a/mm/gup.c b/mm/gup.c index 3a9a7229f418..50effb9cc349 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -78,7 +78,7 @@ static inline struct page *try_get_compound_head(struct page *page, int refs) * considered failure, and furthermore, a likely bug in the caller, so a warning * is also emitted. */ -static __maybe_unused struct page *try_grab_compound_head(struct page *page, +__maybe_unused struct page *try_grab_compound_head(struct page *page, int refs, unsigned int flags) { @@ -880,8 +880,8 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, * does not include FOLL_NOWAIT, the mmap_lock may be released. If it * is, *@locked will be set to 0 and -EBUSY returned.
*/ -static int faultin_page(struct vm_area_struct *vma, - unsigned long address, unsigned int *flags, int *locked) +int faultin_page(struct vm_area_struct *vma, + unsigned long address, unsigned int *flags, int *locked) { unsigned int fault_flags = 0; vm_fault_t ret; @@ -1103,6 +1103,22 @@ static long __get_user_pages(struct mm_struct *mm, } continue; } + if (vma_is_dax(vma)) { + i = follow_devmap_page(mm, vma, pages, vmas, + &start, &nr_pages, i, + gup_flags, locked); + if (locked && *locked == 0) { + /* + * We've got a VM_FAULT_RETRY + * and we've lost mmap_lock. + * We must stop here. + */ + BUG_ON(gup_flags & FOLL_NOWAIT); + BUG_ON(ret != 0); + goto out; + } + continue; + } } retry: /* diff --git a/mm/huge_memory.c b/mm/huge_memory.c index ec2bb93f7431..20bfbf211dc3 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1168,6 +1168,208 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, return page; } +long follow_devmap_page(struct mm_struct *mm, struct vm_area_struct *vma, + struct page **pages, struct vm_area_struct **vmas, + unsigned long *position, unsigned long *nr_pages, + long i, unsigned int flags, int *locked) +{ + unsigned long pfn_offset; + unsigned long vaddr = *position; + unsigned long remainder = *nr_pages; + unsigned long align = vma_kernel_pagesize(vma); + unsigned long align_nr_pages = align >> PAGE_SHIFT; + unsigned long mask = ~(align-1); + unsigned long nr_pages_hpage = 0; + struct dev_pagemap *pgmap = NULL; + int err = -EFAULT; + + if (align == PAGE_SIZE) + return i; + + while (vaddr < vma->vm_end && remainder) { + pte_t *pte; + spinlock_t *ptl = NULL; + int absent; + struct page *page; + + /* + * If we have a pending SIGKILL, don't keep faulting pages and + * potentially allocating memory. + */ + if (fatal_signal_pending(current)) { + remainder = 0; + break; + } + + /* + * Some archs (sparc64, sh*) have multiple pte_ts to + * each hugepage. We have to make sure we get the + * first, for the page indexing below to work. + * + * Note that page table lock is not held when pte is null. + */ + pte = huge_pte_offset(mm, vaddr & mask, align); + if (pte) { + if (align == PMD_SIZE) + ptl = pmd_lockptr(mm, (pmd_t *) pte); + else if (align == PUD_SIZE) + ptl = pud_lockptr(mm, (pud_t *) pte); + spin_lock(ptl); + } + absent = !pte || pte_none(ptep_get(pte)); + + if (absent && (flags & FOLL_DUMP)) { + if (pte) + spin_unlock(ptl); + remainder = 0; + break; + } + + if (absent || + ((flags & FOLL_WRITE) && + !pte_write(ptep_get(pte)))) { + vm_fault_t ret; + unsigned int fault_flags = 0; + + if (pte) + spin_unlock(ptl); + if (flags & FOLL_WRITE) + fault_flags |= FAULT_FLAG_WRITE; + if (locked) + fault_flags |= FAULT_FLAG_ALLOW_RETRY | + FAULT_FLAG_KILLABLE; + if (flags & FOLL_NOWAIT) + fault_flags |= FAULT_FLAG_ALLOW_RETRY | + FAULT_FLAG_RETRY_NOWAIT; + if (flags & FOLL_TRIED) { + /* + * Note: FAULT_FLAG_ALLOW_RETRY and + * FAULT_FLAG_TRIED can co-exist + */ + fault_flags |= FAULT_FLAG_TRIED; + } + ret = handle_mm_fault(vma, vaddr, flags, NULL); + if (ret & VM_FAULT_ERROR) { + err = vm_fault_to_errno(ret, flags); + remainder = 0; + break; + } + if (ret & VM_FAULT_RETRY) { + if (locked && + !(fault_flags & FAULT_FLAG_RETRY_NOWAIT)) + *locked = 0; + *nr_pages = 0; + /* + * VM_FAULT_RETRY must not return an + * error, it will return zero + * instead. + * + * No need to update "position" as the + * caller will not check it after + * *nr_pages is set to 0. 
+ */ + return i; + } + continue; + } + + pfn_offset = (vaddr & ~mask) >> PAGE_SHIFT; + page = pte_page(ptep_get(pte)); + + pgmap = get_dev_pagemap(page_to_pfn(page), pgmap); + if (!pgmap) { + spin_unlock(ptl); + remainder = 0; + err = -EFAULT; + break; + } + + /* + * If subpage information not requested, update counters + * and skip the same_page loop below. + */ + if (!pages && !vmas && !pfn_offset && + (vaddr + align < vma->vm_end) && + (remainder >= (align_nr_pages))) { + vaddr += align; + remainder -= align_nr_pages; + i += align_nr_pages; + spin_unlock(ptl); + continue; + } + + nr_pages_hpage = 0; + +same_page: + if (pages) { + pages[i] = mem_map_offset(page, pfn_offset); + + /* + * try_grab_page() should always succeed here, because: + * a) we hold the ptl lock, and b) we've just checked + * that the huge page is present in the page tables. + */ + if (!(pgmap->flags & PGMAP_COMPOUND) && + WARN_ON_ONCE(!try_grab_page(pages[i], flags))) { + spin_unlock(ptl); + remainder = 0; + err = -ENOMEM; + break; + } + + } + + if (vmas) + vmas[i] = vma; + + vaddr += PAGE_SIZE; + ++pfn_offset; + --remainder; + ++i; + nr_pages_hpage++; + if (vaddr < vma->vm_end && remainder && + pfn_offset < align_nr_pages) { + /* + * We use pfn_offset to avoid touching the pageframes + * of this compound page. + */ + goto same_page; + } else { + /* + * try_grab_compound_head() should always succeed here, + * because: a) we hold the ptl lock, and b) we've just + * checked that the huge page is present in the page + * tables. If the huge page is present, then the tail + * pages must also be present. The ptl prevents the + * head page and tail pages from being rearranged in + * any way. So this page must be available at this + * point, unless the page refcount overflowed: + */ + if ((pgmap->flags & PGMAP_COMPOUND) && + WARN_ON_ONCE(!try_grab_compound_head(pages[i-1], + nr_pages_hpage, + flags))) { + put_dev_pagemap(pgmap); + spin_unlock(ptl); + remainder = 0; + err = -ENOMEM; + break; + } + put_dev_pagemap(pgmap); + } + spin_unlock(ptl); + } + *nr_pages = remainder; + /* + * setting position is actually required only if remainder is + * not zero but it's faster not to add a "if (remainder)" + * branch. + */ + *position = vaddr; + + return i ? i : err; +} + int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, pud_t *dst_pud, pud_t *src_pud, unsigned long addr, struct vm_area_struct *vma)