From patchwork Wed Apr 20 15:53:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 12820520 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 400E3185B for ; Wed, 20 Apr 2022 15:54:23 +0000 (UTC) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 23KDh2Dn019815; Wed, 20 Apr 2022 15:54:09 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-type : mime-version; s=corp-2021-07-09; bh=AwJmhFsM52ANORHnTZ5JeadwvF5WNYlOh8ukDsPyOGo=; b=tjiReahPAHAyUH3gm6jiko3u664kn7TINBYUw/FBceX+ziMDDEfdtzYW8ZY7nykMjED/ h0dYPrRhm2waoTwWAmdz1AwwcDx+Kibg+Se278SYLg2jL7jDROhdUONIvCtjXzNLWxtv sr0FdwurjjHUP/kX+aILAef+zIUzD7YKJYtDXYBy5Mr+HrvFsppRrinz4DVouPO2+qNb tSE6DjEA8JqAcP51u2xgwxIAmLAhaMuz3kdHYukAFW007bWHjzLiD+0lgx5xVdS5A0pt qAmtN+M6dxn/ijAE5pNIMYOxnspHHjUknWTccgb4Z/RQZCaCKmSOeqqPlU1pcGtxR8yQ Vw== Received: from phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta02.appoci.oracle.com [147.154.114.232]) by mx0b-00069f02.pphosted.com with ESMTP id 3ffmd19ka0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 20 Apr 2022 15:54:09 +0000 Received: from pps.filterd (phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (8.16.1.2/8.16.1.2) with SMTP id 23KFokjG037654; Wed, 20 Apr 2022 15:54:00 GMT Received: from nam11-dm6-obe.outbound.protection.outlook.com (mail-dm6nam11lp2174.outbound.protection.outlook.com [104.47.57.174]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com with ESMTP id 3ffm87c890-1 (version=TLSv1.2 
cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 20 Apr 2022 15:54:00 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=QDXAZ63Q3q1Z/WtzhwY7HHWkNYca5cwyynzuxm96sCXPGlauvOeYk0FaTNr1vpMOxuMSYLAAiETd6EH7gWY5rGYrkwONKSCA55dNUhEPM1Dd0RM6FZsAMfXX3d5ZLfQwl0hhGGnugYUJk/ktAFXaEy+k/+kcgQak4BnNVUWgGYFvdE3HYif/69n/mFXZfjJAwEQFcT204Op1gDRUBkwXfaIHTjQ31cKFGmPYWtoaPgXB7fdYhtPlTfRhXn4xMYFqwhYJhE8JdCRbLsdwiWRSg/6TFDbAEKIz0bBUhmBjWomKVZOynxSQbYRu5V3997//YZJzKun2zde5H8bBJkSaCw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=AwJmhFsM52ANORHnTZ5JeadwvF5WNYlOh8ukDsPyOGo=; b=aFTJFvUthPz6YBM2IstKffvy/p8xcph4ftzmPQI7aUv+F1ash/FznLmo7G17MIf9ysah4624S8eBhoPZhMcj/Wy7VdDkfZZ7hBEO3zAD+qLSlXuVfI9n/OZiohQcq+umlp4N/ORmFPnxNpJ5YhCEX9LGeKw7knzTj4V+Vl+EYOWqE7pHco+CZhL0/5jSeHyifBH3RSeWCf5mlyiNWEEbOO0FA0H1YYoUC93arDf1Sj7KZrCyQwgs7HdjkeYtbguYGXqPM0tx14Xs8R1mT6UdXxNzGLOQkHfUiUWUpObd53byuE6rzRsEkgpcYjnxcsFoWTyVkCTgfsbhm7DZYQoJZA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=AwJmhFsM52ANORHnTZ5JeadwvF5WNYlOh8ukDsPyOGo=; b=azb4dg16di8vEa7QVF9n7+uKFKAnzqNHRJCufLCqGwfrxRt6OA4xl4KhD0bMyIP1yRoMgZ8ftSHyjRexyodMYsAaL0VsvrFmOvB7BZy7hqZjbRmQd2BfbAWaga9M3okYuoCaX0sdXpL7eQjXyL0UirVb+0c0uov5uRYjD19iVUU= Received: from BLAPR10MB4835.namprd10.prod.outlook.com (2603:10b6:208:331::11) by BYAPR10MB3046.namprd10.prod.outlook.com (2603:10b6:a03:8e::17) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5164.20; Wed, 20 Apr 2022 15:53:58 +0000 Received: from BLAPR10MB4835.namprd10.prod.outlook.com ([fe80::d17f:a2a4:ca0c:cb49]) by BLAPR10MB4835.namprd10.prod.outlook.com ([fe80::d17f:a2a4:ca0c:cb49%4]) with mapi id 15.20.5186.013; Wed, 20 Apr 2022 15:53:58 +0000 From: Joao Martins To: linux-mm@kvack.org Cc: Dan Williams , Vishal Verma , Matthew Wilcox , Jason Gunthorpe , Jane Chu , Muchun Song , Mike Kravetz , Andrew Morton , Jonathan Corbet , Christoph Hellwig , nvdimm@lists.linux.dev, linux-doc@vger.kernel.org Subject: [PATCH v9 1/5] mm/sparse-vmemmap: add a pgmap argument to section activation Date: Wed, 20 Apr 2022 16:53:06 +0100 Message-Id: <20220420155310.9712-2-joao.m.martins@oracle.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20220420155310.9712-1-joao.m.martins@oracle.com> References: <20220420155310.9712-1-joao.m.martins@oracle.com> X-ClientProxiedBy: SGAP274CA0011.SGPP274.PROD.OUTLOOK.COM (2603:1096:4:b6::23) To BLAPR10MB4835.namprd10.prod.outlook.com (2603:10b6:208:331::11) Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 0a62b5d0-e1c7-4e8b-25b7-08da22e5faa0 X-MS-TrafficTypeDiagnostic: BYAPR10MB3046:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
z9UymbEjPHGlA6C1UjlaORXV08PtSVm/N3rVpt6lc8R9fkyEhaNv+tIbxRvbPn9nmcwoiEjci0fARpPzUytp513y69P6LO/Q1cuvFY8ImTZ8KcE5rLpeS+TJa+TpsPBX03qX4NR25R2YntGHe5IS1GIUG5gXnjuAMZAzqm9HEZnsJ38UYECqgpQTHcEIdCWD/eB3w6rX2Guf33BaX1TrlrV3s7gtExSrF/a6n/zrHXq5xNm2GoKEOZ/a46UNYDgEzcVUASNt1QTKuzCXyZNYuYht9hEY+51FCDsl+aBBMQoqFj2t3Hf3l62ly+nklWybtNxIkYm5qfNCgW4hZfLQBh6fRkchRBRd4EC/Ks1EL6I/pu4FwN6XaY6gpTbhUQkrMttpSb/OrbTm/GgTsTscpB3rIUCxg3KF0Z5K3jybQNrjrPZlEqNIhIZ/n95edCWjabdKv0xrdTa6/y6+9TUb7uDdOJ20nUtrWbG5Y3xk+mwzvGYyeyzLBAh4mkFWEUGAqJdu3iJManOn7VLlr4z7Kf5N0UrR0KDkM7mIkgTTeQKtLuvNYLsOS5xxXpcygp+RH0jv837zEAqdQaKHHH44jxnpY4MtQmHRTDd+QfnXKBLfV3gtDayFuFl58b4kMQ4Y1BxSTAYtjekyrADiXogfCixEaJKUqaskp02rCqVUIrWomySXCGNnYuHfTdju40q4d67VAj+Jy1nqlwcXWyD4dizpZ1TJawID/wnl+jsIKIDbfuS5/kn7AIUXkGAtrmnPS9ZxbXXA5m0NDt5kd6sXu6qhHXUfrxosaVg6xMpKRRQ= X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BLAPR10MB4835.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230001)(366004)(103116003)(36756003)(66946007)(66556008)(186003)(966005)(2906002)(1076003)(66476007)(4326008)(8676002)(508600001)(38100700002)(6486002)(38350700002)(2616005)(5660300002)(52116002)(6666004)(86362001)(316002)(7416002)(6512007)(26005)(83380400001)(54906003)(6916009)(6506007)(8936002);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
GGYK/8T0Wz3g4iRIFG50pZ2WO7ms7k+J+SBbMss3KhBqyEsyTOlBNsMysezq2oueklJnccgbFLBq0K8gYzwtkGEr3EibzVdhT55AYkRAthkd0rVcVbk+Yz8DPXvf8ONhUehanocWLvRCVWewyLbDYb3Bww8oSq7BsTBGiDvFqxSD7ahiuxFP/P/hYljDuXl0YRd+u11GomJlETrmZkHdR4Y6Er3x9B4/w9NiaoQAvDU9cyIjoFwidOik9QoLzGn99hBgXRmF3prvUX1IllWvN5KnQLUZWZZTqpWeMacrz7hBDsA8VTdZ9ly6964PeVAvJY97hKTnqniVpdvlBMlPVISTBnx18RviasNC27BWx4m2nLmfXuzrwKs7M414Gf4NzbRqwhfZM/Mm/Zn2qdRMgyhN+aUOB6ZJc1anEHqsuKz2RbK19eepVe2H+1hP0TB1+S3JU0JXOf9v67NkL1i2jeVBor8N5+OB+87zOIyIanWj/GBOwXIqSwhcYQ1/b7ypee0AE6Wlrh1gSXgDp1vf3OGtZZjsD5ZGIs+/8B8bTclTUChJ+kcA21ztfg0sX8U2rGtowt8ul/C3ZC6iw1F264aBb+MiKUw5K9ZFXMrhFx0+tMUqkcxC+wMFyPj7zqmpFY9xIEo+9se2eou+aOKOcFaGR5+a27TckkPOBZ2Xxu3NYeLLN0wBaDkWfSQ9oJWVDr9iwCUPrgIhxDNj5ZSK/YIqH3qgxbD5pKbpHvHI2gGBFdMi/h8UDkcezbFPBstnZjw6aJ0OCJWYQKsVCQ00S+6G+MCRbo8Rxgoi6afViMNDZ+OtUL0l9YTDHXkAtlYoFjYR0fOXTaNJ6u6tuEEtE4SdX9Hggf80/X2/hLc90WLr5f2wznbd5hqFU15q/USjeXHet40SXA5UKBMwtkpWy7NMtP03UVbkf+xQ/LDPG1o8rEtfGUF1uzDLJEXRYuApsC1VMuyhoiCzpF3RlUxLZJcQFShsJCA2Ba32nWrXUXnEB1A6vfOG5DSE8Y4OOOeK7YedSiRFILQGFWq7ncorZDamMVSft3TMQ4rdD2f124WygBKfJ9EqiUJqQPiEWjPKD5mTKXjuHTc0V0ch8/dD3Pd5LJWpFnFQcBazhhBNNNn+XuppGQfGHUfOaGff9VCZ3ggyCPju0gimy2aeqqLHpixJTtL11MC0ub3UmDUsdJWJnBabQkTGLsesvA4ecLmyh2LNcfRehBP0yhWGKk+LoGNoJzrQ5zj1OzoNEAfxGEBku4XWRGG+bUez3kQo6eUEwiSj5ZOInPhUp93kftOEipOlkiegWAVsHmQuVmUzNJOVzDLietiDkT/3FjE78J/1strhvGEmj1p0fXMrmYUqTy1XK7tly2EUPqEJv94tPSxV5HssTtMmzLYyeXZ1GWDUZL4Twx/8vGYzW7aGI1zTJgzXc98kIwDcpnQignydy7d9ql3Mp8iQnJlGSgkEVv3gWmT9GRIlTr0qgJgm1H7Vq+3kcD/o6LMiWQ5POq7jFK+kJy72CxgkZ6qqzMlYo41L81W7Q2Dp5WW9Rqm8uwTOeR/+A2HpyBZuGpywY1hb3FOYcCMjB8/bvopRFiG3KO7Rnt65miTbgw8mEUNulRTnhfzEef9voBtcEZZNYStxZbF7ueWyq4zy8VEKa/orvn+x17gOLbebEEww9kAIfTmk88anhargL+oGLK+HHCfViT/KoRH9yrIY43RSzFcp7xHHLAf+lhDVQKdX4YCxvFZ1ywygjIJulUed2OtTWsNmJvU= X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 0a62b5d0-e1c7-4e8b-25b7-08da22e5faa0 X-MS-Exchange-CrossTenant-AuthSource: BLAPR10MB4835.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: 
Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 20 Apr 2022 15:53:58.0732 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: ay69S1yjLOzp0WjdVJer8EA3s7ZgfttZJJO6PoXKT4gFgS83kfAMSJoIVRRW9h6Wtw5sqb1mNwPrneBd6veSDrsbqDD+fEdgOgwfszP9aZI= X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR10MB3046 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.486,18.0.858 definitions=2022-04-20_04:2022-04-20,2022-04-20 signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 bulkscore=0 suspectscore=0 malwarescore=0 mlxlogscore=999 adultscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2204200094 X-Proofpoint-ORIG-GUID: YFJLfdUTG3Tf5Ey-DBNPrby1bc-tZIYx X-Proofpoint-GUID: YFJLfdUTG3Tf5Ey-DBNPrby1bc-tZIYx

In support of using compound pages for devmap mappings, plumb the pgmap
down to the vmemmap_populate implementation. Note that while altmap is
retrievable from pgmap, the memory hotplug code passes altmap without
pgmap[*], so both need to be plumbed independently.

So in addition to @altmap, pass @pgmap to the sparse section populate
functions, namely:

	sparse_add_section
	  section_activate
	    populate_section_memmap
	      __populate_section_memmap

Passing @pgmap allows __populate_section_memmap() to fetch the
vmemmap_shift with which the memmap metadata is created, and lets
sparse-vmemmap fetch the pgmap ranges to correlate with a given section
and decide whether to just reuse tail pages from previously onlined
sections.

While at it, fix the kdoc for @altmap for sparse_add_section().

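For readers skimming the series, the plumbing described above can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the kernel code: every sketch_* name and both struct
layouts are hypothetical stand-ins, and the innermost function simply
reports how many base pages one compound page would span, so the effect
of passing (or not passing) a pgmap is visible to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct sketch_altmap {
	unsigned long base_pfn;
};

struct sketch_pagemap {
	unsigned long vmemmap_shift;	/* log2 of base pages per compound page */
};

/*
 * Innermost populate step. @altmap and @pgmap arrive independently:
 * a caller may supply an altmap with no pgmap, or a pgmap with no altmap.
 * Returns the number of base pages covered by one compound page.
 */
static unsigned long sketch_populate(unsigned long pfn, unsigned long nr_pages,
				     struct sketch_altmap *altmap,
				     struct sketch_pagemap *pgmap)
{
	assert(nr_pages > 0);
	(void)pfn;
	(void)altmap;

	/* Without a pgmap, fall back to base-page geometry. */
	return pgmap ? 1UL << pgmap->vmemmap_shift : 1UL;
}

/* Middle layer: forwards both arguments unchanged. */
static unsigned long sketch_activate(unsigned long pfn, unsigned long nr_pages,
				     struct sketch_altmap *altmap,
				     struct sketch_pagemap *pgmap)
{
	return sketch_populate(pfn, nr_pages, altmap, pgmap);
}

/* Entry point: the hotplug-style caller decides what it has to pass. */
static unsigned long sketch_add_section(unsigned long pfn,
					unsigned long nr_pages,
					struct sketch_altmap *altmap,
					struct sketch_pagemap *pgmap)
{
	return sketch_activate(pfn, nr_pages, altmap, pgmap);
}
```

A caller holding only an altmap passes NULL for the pgmap and gets
base-page geometry; a caller holding a pgmap gets the compound geometry
without any layer in between having to derive one argument from the other.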
[*] https://lore.kernel.org/linux-mm/20210319092635.6214-1-osalvador@suse.de/ Signed-off-by: Joao Martins Reviewed-by: Dan Williams Reviewed-by: Muchun Song --- include/linux/memory_hotplug.h | 5 ++++- include/linux/mm.h | 3 ++- mm/memory_hotplug.c | 3 ++- mm/sparse-vmemmap.c | 3 ++- mm/sparse.c | 26 ++++++++++++++++---------- 5 files changed, 26 insertions(+), 14 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 7ab15d6fb227..029fb7e26504 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -15,6 +15,7 @@ struct memory_block; struct memory_group; struct resource; struct vmem_altmap; +struct dev_pagemap; #ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION /* @@ -122,6 +123,7 @@ typedef int __bitwise mhp_t; struct mhp_params { struct vmem_altmap *altmap; pgprot_t pgprot; + struct dev_pagemap *pgmap; }; bool mhp_range_allowed(u64 start, u64 size, bool need_mapping); @@ -333,7 +335,8 @@ extern void remove_pfn_range_from_zone(struct zone *zone, unsigned long nr_pages); extern bool is_memblock_offlined(struct memory_block *mem); extern int sparse_add_section(int nid, unsigned long pfn, - unsigned long nr_pages, struct vmem_altmap *altmap); + unsigned long nr_pages, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap); extern void sparse_remove_section(struct mem_section *ms, unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap); diff --git a/include/linux/mm.h b/include/linux/mm.h index ad4b6c15c814..62564d81d8cb 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3202,7 +3202,8 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end, void *sparse_buffer_alloc(unsigned long size); struct page * __populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap); + unsigned long nr_pages, int nid, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap); pgd_t *vmemmap_pgd_populate(unsigned long 
addr, int node); p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node); pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 74430f88853d..8257e2e619c2 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -328,7 +328,8 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, /* Select all remaining pages up to the next section boundary */ cur_nr_pages = min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn); - err = sparse_add_section(nid, pfn, cur_nr_pages, altmap); + err = sparse_add_section(nid, pfn, cur_nr_pages, altmap, + params->pgmap); if (err) break; cond_resched(); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 52f36527bab3..fb68e7764ba2 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -641,7 +641,8 @@ int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end, } struct page * __meminit __populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap) + unsigned long nr_pages, int nid, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap) { unsigned long start = (unsigned long) pfn_to_page(pfn); unsigned long end = start + nr_pages * sizeof(struct page); diff --git a/mm/sparse.c b/mm/sparse.c index 952f06d8f373..d2d76d158b39 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -427,7 +427,8 @@ static unsigned long __init section_map_size(void) } struct page __init *__populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap) + unsigned long nr_pages, int nid, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap) { unsigned long size = section_map_size(); struct page *map = sparse_buffer_alloc(size); @@ -524,7 +525,7 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, break; map = __populate_section_memmap(pfn, PAGES_PER_SECTION, - nid, NULL); + nid, NULL, NULL); if (!map) { 
pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.", __func__, nid); @@ -629,9 +630,10 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn) #ifdef CONFIG_SPARSEMEM_VMEMMAP static struct page * __meminit populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap) + unsigned long nr_pages, int nid, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap) { - return __populate_section_memmap(pfn, nr_pages, nid, altmap); + return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap); } static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages, @@ -700,7 +702,8 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages) } #else struct page * __meminit populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap) + unsigned long nr_pages, int nid, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap) { return kvmalloc_node(array_size(sizeof(struct page), PAGES_PER_SECTION), GFP_KERNEL, nid); @@ -823,7 +826,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages, } static struct page * __meminit section_activate(int nid, unsigned long pfn, - unsigned long nr_pages, struct vmem_altmap *altmap) + unsigned long nr_pages, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap) { struct mem_section *ms = __pfn_to_section(pfn); struct mem_section_usage *usage = NULL; @@ -855,7 +859,7 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn, if (nr_pages < PAGES_PER_SECTION && early_section(ms)) return pfn_to_page(pfn); - memmap = populate_section_memmap(pfn, nr_pages, nid, altmap); + memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap); if (!memmap) { section_deactivate(pfn, nr_pages, altmap); return ERR_PTR(-ENOMEM); @@ -869,7 +873,8 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn, * @nid: The node 
to add section on * @start_pfn: start pfn of the memory range * @nr_pages: number of pfns to add in the section - * @altmap: device page map + * @altmap: alternate pfns to allocate the memmap backing store + * @pgmap: alternate compound page geometry for devmap mappings * * This is only intended for hotplug. * @@ -883,7 +888,8 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn, * * -ENOMEM - Out of memory. */ int __meminit sparse_add_section(int nid, unsigned long start_pfn, - unsigned long nr_pages, struct vmem_altmap *altmap) + unsigned long nr_pages, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap) { unsigned long section_nr = pfn_to_section_nr(start_pfn); struct mem_section *ms; @@ -894,7 +900,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn, if (ret < 0) return ret; - memmap = section_activate(nid, start_pfn, nr_pages, altmap); + memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap); if (IS_ERR(memmap)) return PTR_ERR(memmap); From patchwork Wed Apr 20 15:53:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 12820522 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 38DD71862 for ; Wed, 20 Apr 2022 15:54:26 +0000 (UTC) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 23KFlNAY020195; Wed, 20 Apr 2022 15:54:11 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-type : mime-version; s=corp-2021-07-09; bh=oBlHkE/27aRnCcLZgw1SLiFlxrFQbSNKMwuZonexp8Y=; b=ebAbCiJgLN4tEu88xadbOhWvSevXvEtpvE9cpRUyxCE4tLTZc1QPp8GxTRop3vvCBspV 
agxtVeUF+eGVMY2sXjbTdFVXQU6nuYBFVV7v4EaojbY8uQ3uBT3T9SB0l/C/KlSSDs2j bGJ18Iz3JBM82Ln+VoGoF3v4MXmTROcdpVFgQMZHVJZQfQnuqLmMyRCLxSzpLBV63FwB PVc3S1XAnY3/IWctYYHCj87GDPK0k552ZgjbRIn+W86NJL76XSOF/OOLoEFERuARpQ+/ p/eIAGJuoI93QWS0D8lc/wMnWwXntrivu0bN6kKGIfqFID0oi99aqlpNHtHJ6kksQwpQ kA== Received: from phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta02.appoci.oracle.com [147.154.114.232]) by mx0b-00069f02.pphosted.com with ESMTP id 3ffmd19kaj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 20 Apr 2022 15:54:10 +0000 Received: from pps.filterd (phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (8.16.1.2/8.16.1.2) with SMTP id 23KFojbT037583; Wed, 20 Apr 2022 15:54:09 GMT Received: from nam12-mw2-obe.outbound.protection.outlook.com (mail-mw2nam12lp2042.outbound.protection.outlook.com [104.47.66.42]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com with ESMTP id 3ffm87c8n4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 20 Apr 2022 15:54:09 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=lrOrdRM45iOrpQD9YbHxyW2SBtWM/ba7mN/Ib/dIEwPWPIoZHCN9FoVC5yUvFBTefo0worbvqalWCN1kSHCoFS9DpcAbkPjpCXPxgSlDgcDjTy7un0e07DdKceiCmzBPQyvtaobLfOyS78zDWJBrOQtvIdHguy/tR8RPO7DPhTQY+fkYtI4HnviUKFKyVqFxQbjgNOeTHP6qcA9F42wEiOGR50OAqUshUwmsQnbt9/5XAk141pwMeMloSR8y2EgCknzgJiehrpBteQ7uutx8xdPceBT2cmCVqI9A5rW90DWKrP3lPYNnxtQw/Nz5uSOHLgA1xzi40dh3FJGVeER+kw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=oBlHkE/27aRnCcLZgw1SLiFlxrFQbSNKMwuZonexp8Y=; 
b=Ns5PWk4cglJChbeaqi3qDEvnLZLaB2RbwXOKyo1I/0fb3yMxbFMIuqhfeUy/DcR9P6taORRtnLa406FR3OBDUcoUH/CAgXpZj/gIR4W4OgDCiMde/WbR2vDVBfugqmXpZ5TloCRXRz9RELCfik3RIrDfvM1c2tU9fBRoO102Y3GmrkO182N5rgzRyGpr/u8iMIxtpQ/v/rXQhsGlM7PHir2o3z3MGKqNkzH2yl6IHXKFoLPvWfYOnLmqRtvmexJ09JUR7S45tc6XfVtY4zz+cpysHX1TC7oCX6qy/neSfz/yXLSFUOMVT93NpHeMSw6WlcxXR1mmw/NvrCioi026+Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=oBlHkE/27aRnCcLZgw1SLiFlxrFQbSNKMwuZonexp8Y=; b=Q+CINwEt2dhPa1poAW/G2fQTvTc+VlNRDrLgte8sKYw0oFXHY5NZIeChK0ST7JNfsXW7qIPKs9Xgp+dk8J/eWpW5JCbuY1czOvJLadEfWM2Ffa4UKBTL8V4H+9XN/GIlC9qcEHDNNOrr3KUTJ8bmLwEcZw0Onys45YOClk2IRgw= Received: from BLAPR10MB4835.namprd10.prod.outlook.com (2603:10b6:208:331::11) by MWHPR1001MB2208.namprd10.prod.outlook.com (2603:10b6:301:2c::37) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5186.13; Wed, 20 Apr 2022 15:54:06 +0000 Received: from BLAPR10MB4835.namprd10.prod.outlook.com ([fe80::d17f:a2a4:ca0c:cb49]) by BLAPR10MB4835.namprd10.prod.outlook.com ([fe80::d17f:a2a4:ca0c:cb49%4]) with mapi id 15.20.5186.013; Wed, 20 Apr 2022 15:54:06 +0000 From: Joao Martins To: linux-mm@kvack.org Cc: Dan Williams , Vishal Verma , Matthew Wilcox , Jason Gunthorpe , Jane Chu , Muchun Song , Mike Kravetz , Andrew Morton , Jonathan Corbet , Christoph Hellwig , nvdimm@lists.linux.dev, linux-doc@vger.kernel.org Subject: [PATCH v9 2/5] mm/sparse-vmemmap: refactor core of vmemmap_populate_basepages() to helper Date: Wed, 20 Apr 2022 16:53:07 +0100 Message-Id: <20220420155310.9712-3-joao.m.martins@oracle.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: 
<20220420155310.9712-1-joao.m.martins@oracle.com> References: <20220420155310.9712-1-joao.m.martins@oracle.com> X-ClientProxiedBy: SGAP274CA0011.SGPP274.PROD.OUTLOOK.COM (2603:1096:4:b6::23) To BLAPR10MB4835.namprd10.prod.outlook.com (2603:10b6:208:331::11) Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: db7dd6e8-138d-4aed-07c7-08da22e5fff0 X-MS-TrafficTypeDiagnostic: MWHPR1001MB2208:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Dy7N9SBbS9GEjScFpCtVRJO9HngJ2kVrevC7lZAF0cplYyFHTw3kou/6JB5frIzk/7JvC3+3f60yyCcisENfiO5TJijQRaOvAhXg+J+unb7QJXj9uCjZ2VY8+6XtmK6ubHVcTSrnAtskcgbt0vL/WvIseG3ZbkjEW510DLXA2sxTWEgHUih1pzLMfQ6hL1CLdvV5QAU7Byv8t8B9eo1wmu3MgQLUY4aysjbxLrgEvVrO/Zdgbc8THanQuTRMPo2sbBlVyiyxreQarZcCT9Z9Es3ImZwb0V3w5bu412E512/xybQzJa57Kw+lq/eKUfJ3Q/BVJjvAqtuR1NDE7b153m124HlqjL8wXbcjX5y41seMlQ6likjNURUlXAFSG4VITmnI3bfPME7Cm1xzD8bKC5uoXqkhrKomiQ9xsg/c8KVawca0wJAFs2xatmCbGNnGcECfA2vTk1VuXcJuv6YxjRTT1fYqDRccOQs4JLXnaxEQJ1TVxobAPJSwmU1r576IcZBURqNPH0TzDwz0pn1wHrK8BEtCO3D8lVTEZkLT5qNXFJu42gFqkh9/g6/cCMAMhHUh/5e0Jf7o4cmqVFT7ZznTHWGjohsBwElTp2K0eCvsmq3SUWjdMKkr+vb8Gc+3McWAO5EWJM0I5/va2cHMN+iHIAbNMzZ43A1UkCDQuruiBxDGWCNnwKGj9G9FOr9qGxrBUqJA1fyV7aybOvXcnA== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BLAPR10MB4835.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230001)(366004)(6512007)(2906002)(6506007)(508600001)(316002)(6916009)(6486002)(66556008)(4326008)(36756003)(86362001)(66476007)(103116003)(52116002)(66946007)(54906003)(8676002)(8936002)(83380400001)(2616005)(5660300002)(38350700002)(38100700002)(7416002)(1076003)(186003)(26005);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
hxjvr8MqoNVjE4oS1QMWfA4/adQ5phOi9DWNy8qE2H8uRK6I79ivyS4NdEwGTdZ+nbq29q9Gk3YteKvpeKlcGSSU6YUcYa9f2RC6HUnWqV5XhI8tllUYnzs1g+ZseTWzgJl/VxG+cHybA1/qfeMwMzk//BX8XmTTX0It3wjohHSkUDZ0WuPypLmra/qpoyZjTgbP+sMpTU5jmyXTbLm1MyMNaGRyF5UuGoGbmC3lyJOlQIZLOXhr3Klno5htGtGgNtKmMENt9FX0P4o1dqJ4BvVhgvwu2bzO1q7DCj+w1Zo3FcdS0x8toDoO1lvWI6lYLkBlMGW1fwQtv9emiB8AH3ONvP7v+TotibohvVqqgjoqYX0pTHLVp6hfrVnGYqNcORux+/rcj2w5oXAN8PTs6JLzKerVuFEiaN2HBs/wD1pLL8g8Yb5lHocss+8Z7pPhYVZpn29uxx1fCcYVKABPJk/UZQyt1SkyKr/r/fxUd2yeAFY531b8l0HDONVq4ZuG71XYFlu+xaa6RdDWiQt4DUzfudFg6Isx3KPeZO+6yxwj4a452Mx/TT6TTFWwIuvBIoblFtYiJyX1lXKvX2ODiSvBMBYq5v8J8L9KoXvb+u8j5rShB9d/cTSqlsD1WTY7nd65xuvKIIaYT2ZUWyLASj38gU9seWll2cDUmqrkwkGdhUVeN8qzvfmqvh2mEmXYsA9zIOdPVgrS/PgAXcukC/gfgzPrZJJ8Zx0gWAUB1d08N/jBy960geSkRColSC7nPTsfBKyc7x4QMKBu5nzeam3oVZ671rpCNqBNWVJhaBN2rsZftVrgnE5iG6xQfG9H6+OvYYE+hUze0VH6Be0mUynzblyrOoDb9p7sI8T1dOMNBIZnWqNPDZD7EsIR+ngyVzeoCGgp0AasBxSz8m4GAYYfKSPkEfT2ozrc6yphmZFdR4OnxT7BaV8g8JXS65WyPcSsjIB53gO8JW8FoTdLYVp+pJd/DYRMvfumSshcPWQZh4IHihfJ46n5P3w69smuxQh/3FEOSHsDHhMyJnVyleVm6k5fgEnGJAEOnfnhE/DajTwIN5k3xtVxlBH6ksQuf0nVJoT9yp7bWzEVZlDImjEVhP1nzEwPczpMKNAAAOvLAhQ2a2ilLqe7mokRvf5SV7uiBmqybXH11G4nFrelNtsKJrKHGPLvOZJx2m/K62znHafRAUv6DkbJqyWiqLepyYNAy8939H3AfV3J9u9AMO0tDaVZ58nvrBQ5IGVHMyYed4fLlnVb/NHa4wNLM2UxmtQ4uK2852+WnmdLCGSu/CGs2Yhva9EoMFnk4Sex8qzl3PL/RfLSZVo1ler1AqI8okGYHflFpc/inGW2oeiiXn7fu6zkf2RtpHk6+Hc8vn8CPtyWRxH5IvbBbD9DFCKFgpCFXZUC+e7hyJdijoNU9bE7fZlh//1VoZmLBKkS0+TGSZRT948t/arNCWTd/piGDdqqPxYB4ZMrHVi2z55OYOv5L8hNgTnUOHLrnmg6RFiwYjXGEB+kwfUrd7HaPkcXR+1q2aQR/Vnb9akln6EhCOchRpplYlLUMoFSJwSTwA9vpQocfhXNYWe/g61x1claiJaWp2cwi5cG27/ROu+7xCQvBTivgVl8IAafcjnAUoVor5D+8s/xmFk8ZhCwbbj+GjYWUXjNBtWmc1S3uSfCpisXjCwx5iHItYGWwJsvqC+gWeY7d2DJgPxZWcEakmEVdt82AEWGYe4FeHZ3aMh+YBQ2vtMd3aNjg0c/UqMG7CY= X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: db7dd6e8-138d-4aed-07c7-08da22e5fff0 X-MS-Exchange-CrossTenant-AuthSource: BLAPR10MB4835.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: 
Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 20 Apr 2022 15:54:06.6403 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: 4W6HmCTT7QJS+mf6nyzA2QmT7S+0vdRu12RppPUG0X5NwzoH3dk6wIss9+RSdc50ws7Z+lM6EQRw7ZoqSWVQBFZEawdGLTXA7D2ipqSiRFg= X-MS-Exchange-Transport-CrossTenantHeadersStamped: MWHPR1001MB2208 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.486,18.0.858 definitions=2022-04-20_04:2022-04-20,2022-04-20 signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 bulkscore=0 suspectscore=0 malwarescore=0 mlxlogscore=798 adultscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2204200094 X-Proofpoint-ORIG-GUID: kg3Wgt10qzEFN6empL15fA24-bxvf6uG X-Proofpoint-GUID: kg3Wgt10qzEFN6empL15fA24-bxvf6uG

In preparation for describing a memmap with compound pages, move the
actual pte population logic into a separate function,
vmemmap_populate_address(), and have a new helper,
vmemmap_populate_range(), walk through all the base pages it needs to
populate. While doing that, change vmemmap_populate_address() to return
a pte_t * rather than a hardcoded errno of 0 or -ENOMEM; the range
helper translates a NULL return into -ENOMEM.

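The shape of this refactor can be sketched outside the kernel. The
following is a minimal userspace illustration, assuming a fixed-size
table in place of real page tables: sketch_pte_t, sketch_table and both
sketch_* functions are hypothetical names, not kernel API. The
per-address helper returns a pointer (NULL on failure) and the range
walker is the only place that turns that into an errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MAX_PAGES 8UL

/* Hypothetical stand-in for a page table entry; not the kernel's pte_t. */
typedef struct {
	unsigned long addr;
	int present;
} sketch_pte_t;

static sketch_pte_t sketch_table[SKETCH_MAX_PAGES];

/*
 * Populate the entry backing one page-sized address.
 * Returns the entry, or NULL when the address is outside the table
 * (standing in for an allocation failure at any page table level).
 */
static sketch_pte_t *sketch_populate_address(unsigned long addr)
{
	unsigned long idx = addr / SKETCH_PAGE_SIZE;

	if (idx >= SKETCH_MAX_PAGES)
		return NULL;

	sketch_table[idx].addr = addr;
	sketch_table[idx].present = 1;
	return &sketch_table[idx];
}

/*
 * Walk [start, end) one base page at a time.
 * Returns 0 on success or -ENOMEM as soon as one address fails.
 */
static int sketch_populate_range(unsigned long start, unsigned long end)
{
	unsigned long addr;

	assert(start <= end);

	for (addr = start; addr < end; addr += SKETCH_PAGE_SIZE) {
		if (!sketch_populate_address(addr))
			return -ENOMEM;
	}

	return 0;
}
```

Returning the entry rather than an errno leaves room for a later caller
to inspect or reuse what was just populated, while existing callers of
the range walker keep seeing the same 0 / -ENOMEM contract.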
Signed-off-by: Joao Martins Reviewed-by: Muchun Song --- mm/sparse-vmemmap.c | 53 ++++++++++++++++++++++++++++++--------------- 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index fb68e7764ba2..ef15664c6b6c 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -608,38 +608,57 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node) return pgd; } -int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end, - int node, struct vmem_altmap *altmap) +static pte_t * __meminit vmemmap_populate_address(unsigned long addr, int node, + struct vmem_altmap *altmap) { - unsigned long addr = start; pgd_t *pgd; p4d_t *p4d; pud_t *pud; pmd_t *pmd; pte_t *pte; + pgd = vmemmap_pgd_populate(addr, node); + if (!pgd) + return NULL; + p4d = vmemmap_p4d_populate(pgd, addr, node); + if (!p4d) + return NULL; + pud = vmemmap_pud_populate(p4d, addr, node); + if (!pud) + return NULL; + pmd = vmemmap_pmd_populate(pud, addr, node); + if (!pmd) + return NULL; + pte = vmemmap_pte_populate(pmd, addr, node, altmap); + if (!pte) + return NULL; + vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); + + return pte; +} + +static int __meminit vmemmap_populate_range(unsigned long start, + unsigned long end, int node, + struct vmem_altmap *altmap) +{ + unsigned long addr = start; + pte_t *pte; + for (; addr < end; addr += PAGE_SIZE) { - pgd = vmemmap_pgd_populate(addr, node); - if (!pgd) - return -ENOMEM; - p4d = vmemmap_p4d_populate(pgd, addr, node); - if (!p4d) - return -ENOMEM; - pud = vmemmap_pud_populate(p4d, addr, node); - if (!pud) - return -ENOMEM; - pmd = vmemmap_pmd_populate(pud, addr, node); - if (!pmd) - return -ENOMEM; - pte = vmemmap_pte_populate(pmd, addr, node, altmap); + pte = vmemmap_populate_address(addr, node, altmap); if (!pte) return -ENOMEM; - vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); } return 0; } +int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long 
end, + int node, struct vmem_altmap *altmap) +{ + return vmemmap_populate_range(start, end, node, altmap); +} + struct page * __meminit __populate_section_memmap(unsigned long pfn, unsigned long nr_pages, int nid, struct vmem_altmap *altmap, struct dev_pagemap *pgmap) From patchwork Wed Apr 20 15:53:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 12820523 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BC1FD1862 for ; Wed, 20 Apr 2022 15:54:34 +0000 (UTC) Received: from pps.filterd (m0246632.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 23KElUXw009567; Wed, 20 Apr 2022 15:54:20 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-type : mime-version; s=corp-2021-07-09; bh=cN2dJDQ5NpX137w33ZeWe6BTTCYZFR6xf3MP4K9nh1Q=; b=IDQeoVehnZG07MhdLwSpPYomh3kMpd4Iz1kZuKhAxE0nXXt9GWX4QsWKYaXLmgpJX4Xt gb6YRUiYJyGxQakQCNb6jiy26KtBy72aeO3iqmPmIn9O50weR73hhzWSmai4upNw9rV8 QBoARn5sI3JEZj6Fcyp0oFdCVixsYuWOOO1chQlqhC42PrCfFZ1Uyl20jehyVKKnzGXG c1GxGfT3YCGSBQY9i/FXWj0+EMeab0dexYchp/b/kDJRQpclpAOCKiVsWAtNm8NxW67N HKMT3eZc/ZSkuPkgCZMBt373VnV8rQk+ItKXPDNP4xerpYvHtBn0roN6xPS/ixvcttXr Mw== Received: from phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta02.appoci.oracle.com [147.154.114.232]) by mx0b-00069f02.pphosted.com with ESMTP id 3ffndthjd2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 20 Apr 2022 15:54:20 +0000 Received: from pps.filterd (phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (8.16.1.2/8.16.1.2) with SMTP id 
23KFokjd037654; Wed, 20 Apr 2022 15:54:18 GMT Received: from nam11-co1-obe.outbound.protection.outlook.com (mail-co1nam11lp2173.outbound.protection.outlook.com [104.47.56.173]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com with ESMTP id 3ffm87c8w2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 20 Apr 2022 15:54:18 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=HPa1uq5vJDCyU603LNkOBzQJQi3tnTnI2RiJ7nfhUvqroJmyR52viPmMDOIJyLCcgSI6fVokaNrIZg/fE76XmOObALhVC6/qrzQIWodJvaesasuN6if9kNqNuTMMmOfYFUQibXY4ohM+Zweq1ZDC8sW1IIOQ44UHtp6V3v4SRJCr8cS6QnL6iZPDzKw9b5fVYDsApWQ1/TbV5sgBLyI42eTKQkRedRkz2cner86BF06Yppm6lqvm2RiND9ssHNgAI5G+EJEh6PDsldhEW9mbB595mkBl2UdK7sVWjHpYhwU6CGfUuLGgTI876zlvwIpiVb69NlE3VTum8P6Rqir91Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=cN2dJDQ5NpX137w33ZeWe6BTTCYZFR6xf3MP4K9nh1Q=; b=RhRrCETi/thZY4yZdXwbe6uDHSIUJ/YICj5cD4dzNJr2j568OXO9XMJFP/J3d2BsNQy/ANFYDtKEylVKTJ3wbzgbG3RzlXNZp174ZO0klSiYILPJKAEcHkWK6JZNod6SY87pcHxNyQupXxMYdhSKW8zJPq8Xgikxo4B11zwP4QjpLPaLFvl4grBtBte5PXcFXMTus3XZKlq0JGdMjcX1jsDosPqbOLo4oUOXS5lNjl9sKqo2hkUYHAn4JiviZq+RUGY3LIYTrq97+0VMkwKczaKJdAIw02iVUVXYEZasTg2S2NFhtV5nZJGjU05/EKE/1CPW2WGnIKBon+A6ply+SQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=cN2dJDQ5NpX137w33ZeWe6BTTCYZFR6xf3MP4K9nh1Q=; 
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Vishal Verma, Matthew Wilcox, Jason Gunthorpe, Jane Chu,
 Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet, Christoph Hellwig,
 nvdimm@lists.linux.dev, linux-doc@vger.kernel.org
Subject: [PATCH v9 3/5] mm/hugetlb_vmemmap: move comment block to Documentation/vm
Date: Wed, 20 Apr 2022 16:53:08 +0100
Message-Id: <20220420155310.9712-4-joao.m.martins@oracle.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20220420155310.9712-1-joao.m.martins@oracle.com>
References: <20220420155310.9712-1-joao.m.martins@oracle.com>
X-Mailing-List: nvdimm@lists.linux.dev
In preparation for device-dax using the hugetlbfs compound page tail
deduplication technique, move the comment block explanation into a
common place in Documentation/vm.

Cc: Muchun Song
Cc: Mike Kravetz
Suggested-by: Dan Williams
Signed-off-by: Joao Martins
Reviewed-by: Muchun Song
Reviewed-by: Dan Williams
---
 Documentation/vm/index.rst         |   1 +
 Documentation/vm/vmemmap_dedup.rst | 173 +++++++++++++++++++++++++++++
 mm/hugetlb_vmemmap.c               | 168 +---------------------------
 3 files changed, 175 insertions(+), 167 deletions(-)
 create mode 100644 Documentation/vm/vmemmap_dedup.rst

diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
index b48434300226..e0dc1ddc2265 100644
--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -38,5 +38,6 @@ algorithms.
If you are looking for advice on simply allocating memory, see the
    transhuge
    unevictable-lru
    vmalloced-kernel-stacks
+   vmemmap_dedup
    z3fold
    zsmalloc
diff --git a/Documentation/vm/vmemmap_dedup.rst b/Documentation/vm/vmemmap_dedup.rst
new file mode 100644
index 000000000000..485ccf4f7b10
--- /dev/null
+++ b/Documentation/vm/vmemmap_dedup.rst
@@ -0,0 +1,173 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Free some vmemmap pages of HugeTLB
+==================================
+
+The struct page structures (page structs) are used to describe a physical
+page frame. By default, there is a one-to-one mapping from a page frame to
+its corresponding page struct.
+
+HugeTLB pages consist of multiple base page size pages and are supported by many
+architectures. See Documentation/admin-guide/mm/hugetlbpage.rst for more
+details. On the x86-64 architecture, HugeTLB pages of size 2MB and 1GB are
+currently supported. Since the base page size on x86 is 4KB, a 2MB HugeTLB page
+consists of 512 base pages and a 1GB HugeTLB page consists of 262144 base pages.
+For each base page, there is a corresponding page struct.
+
+Within the HugeTLB subsystem, only the first 4 page structs are used to
+contain unique information about a HugeTLB page. __NR_USED_SUBPAGE provides
+this upper limit. The only 'useful' information in the remaining page structs
+is the compound_head field, and this field is the same for all tail pages.
+
+By removing redundant page structs for HugeTLB pages, memory can be returned
+to the buddy allocator for other uses.
+
+Different architectures support different HugeTLB pages. For example, the
+following table lists the HugeTLB page sizes supported by the x86 and arm64
+architectures. Because arm64 supports 4k, 16k, and 64k base pages as well as
+contiguous entries, it supports many different sizes of HugeTLB
+page.
+
++--------------+-----------+-----------------------------------------------+
+| Architecture | Page Size |               HugeTLB Page Size               |
++--------------+-----------+-----------+-----------+-----------+-----------+
+|    x86-64    |    4KB    |    2MB    |    1GB    |           |           |
++--------------+-----------+-----------+-----------+-----------+-----------+
+|              |    4KB    |    64KB   |    2MB    |    32MB   |    1GB    |
+|              +-----------+-----------+-----------+-----------+-----------+
+|    arm64     |    16KB   |    2MB    |    32MB   |    1GB    |           |
+|              +-----------+-----------+-----------+-----------+-----------+
+|              |    64KB   |    2MB    |   512MB   |    16GB   |           |
++--------------+-----------+-----------+-----------+-----------+-----------+
+
+When the system boots up, every HugeTLB page has more than one page
+struct, whose total size is (unit: pages)::
+
+    struct_size = HugeTLB_Size / PAGE_SIZE * sizeof(struct page) / PAGE_SIZE
+
+Where HugeTLB_Size is the size of the HugeTLB page. We know that the size
+of the HugeTLB page is always n times PAGE_SIZE. So we can get the following
+relationship::
+
+    HugeTLB_Size = n * PAGE_SIZE
+
+Then::
+
+    struct_size = n * PAGE_SIZE / PAGE_SIZE * sizeof(struct page) / PAGE_SIZE
+                = n * sizeof(struct page) / PAGE_SIZE
+
+We can use huge mapping at the pud/pmd level for the HugeTLB page.
+
+For the HugeTLB page of the pmd level mapping, then::
+
+    struct_size = n * sizeof(struct page) / PAGE_SIZE
+                = PAGE_SIZE / sizeof(pte_t) * sizeof(struct page) / PAGE_SIZE
+                = sizeof(struct page) / sizeof(pte_t)
+                = 64 / 8
+                = 8 (pages)
+
+Where n is the number of pte entries that one page can contain. So the value of
+n is (PAGE_SIZE / sizeof(pte_t)).
+
+This optimization only supports 64-bit systems, so the value of sizeof(pte_t)
+is 8. This optimization is also applicable only when the size of struct page
+is a power of two. In most cases, the size of struct page is 64 bytes (e.g.
+x86-64 and arm64).
So if we use pmd level mapping for a HugeTLB page, the
+size of its page structs is 8 page frames, whose size depends on the
+size of the base page.
+
+For the HugeTLB page of the pud level mapping, then::
+
+    struct_size = PAGE_SIZE / sizeof(pmd_t) * struct_size(pmd)
+                = PAGE_SIZE / 8 * 8 (pages)
+                = PAGE_SIZE (pages)
+
+Where the struct_size(pmd) is the size of the struct page structs of a
+HugeTLB page of the pmd level mapping.
+
+E.g.: On x86_64, the page structs of a 2MB HugeTLB page occupy 8 page frames,
+while those of a 1GB HugeTLB page occupy 4096.
+
+Next, we take the pmd level mapping of the HugeTLB page as an example to
+show the internal implementation of this optimization. There are 8 pages
+of page structs associated with a HugeTLB page which is pmd mapped.
+
+Here is how things look before optimization::
+
+    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
+ +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
+ |           |                     |     0     | -------------> |     0     |
+ |           |                     +-----------+                +-----------+
+ |           |                     |     1     | -------------> |     1     |
+ |           |                     +-----------+                +-----------+
+ |           |                     |     2     | -------------> |     2     |
+ |           |                     +-----------+                +-----------+
+ |           |                     |     3     | -------------> |     3     |
+ |           |                     +-----------+                +-----------+
+ |           |                     |     4     | -------------> |     4     |
+ |    PMD    |                     +-----------+                +-----------+
+ |   level   |                     |     5     | -------------> |     5     |
+ |  mapping  |                     +-----------+                +-----------+
+ |           |                     |     6     | -------------> |     6     |
+ |           |                     +-----------+                +-----------+
+ |           |                     |     7     | -------------> |     7     |
+ |           |                     +-----------+                +-----------+
+ |           |
+ |           |
+ |           |
+ +-----------+
+
+The value of page->compound_head is the same for all tail pages. The first
+page of page structs (page 0) associated with the HugeTLB page contains the 4
+page structs necessary to describe the HugeTLB. The only use of the remaining
+pages of page structs (page 1 to page 7) is to point to page->compound_head.
+Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs
+will be used for each HugeTLB page.
This will allow us to free the remaining
+7 pages to the buddy allocator.
+
+Here is how things look after remapping::
+
+    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
+ +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
+ |           |                     |     0     | -------------> |     0     |
+ |           |                     +-----------+                +-----------+
+ |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
+ |           |                     +-----------+                  | | | | | |
+ |           |                     |     2     | -----------------+ | | | | |
+ |           |                     +-----------+                    | | | | |
+ |           |                     |     3     | -------------------+ | | | |
+ |           |                     +-----------+                      | | | |
+ |           |                     |     4     | ---------------------+ | | |
+ |    PMD    |                     +-----------+                        | | |
+ |   level   |                     |     5     | -----------------------+ | |
+ |  mapping  |                     +-----------+                          | |
+ |           |                     |     6     | -------------------------+ |
+ |           |                     +-----------+                            |
+ |           |                     |     7     | ---------------------------+
+ |           |                     +-----------+
+ |           |
+ |           |
+ |           |
+ +-----------+
+
+When a HugeTLB page is freed to the buddy system, we should allocate 7 pages for
+vmemmap pages and restore the previous mapping relationship.
+
+The HugeTLB page of the pud level mapping is handled similarly.
+We can also use this approach to free (PAGE_SIZE - 1) vmemmap pages.
+
+Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
+(e.g. aarch64) provide a contiguous bit in the translation table entries
+that hints to the MMU that the entry is one of a contiguous set of
+entries that can be cached in a single TLB entry.
+
+The contiguous bit is used to increase the mapping size at the pmd and pte
+(last) level. So this type of HugeTLB page can be optimized only when the
+size of its page structs is greater than 1 page.
+
+Notice: The head vmemmap page is not freed to the buddy allocator and all
+tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
+more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
+associated with each HugeTLB page. The compound_head() can handle this
+correctly (more details refer to the comment above compound_head()).
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 2655434a946b..29554c6ef2ae 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -6,173 +6,7 @@
  *
  * Author: Muchun Song
  *
- * The struct page structures (page structs) are used to describe a physical
- * page frame. By default, there is a one-to-one mapping from a page frame to
- * it's corresponding page struct.
- *
- * HugeTLB pages consist of multiple base page size pages and is supported by
- * many architectures. See hugetlbpage.rst in the Documentation directory for
- * more details. On the x86-64 architecture, HugeTLB pages of size 2MB and 1GB
- * are currently supported. Since the base page size on x86 is 4KB, a 2MB
- * HugeTLB page consists of 512 base pages and a 1GB HugeTLB page consists of
- * 4096 base pages. For each base page, there is a corresponding page struct.
- *
- * Within the HugeTLB subsystem, only the first 4 page structs are used to
- * contain unique information about a HugeTLB page. __NR_USED_SUBPAGE provides
- * this upper limit. The only 'useful' information in the remaining page structs
- * is the compound_head field, and this field is the same for all tail pages.
- *
- * By removing redundant page structs for HugeTLB pages, memory can be returned
- * to the buddy allocator for other uses.
- *
- * Different architectures support different HugeTLB pages. For example, the
- * following table is the HugeTLB page size supported by x86 and arm64
- * architectures. Because arm64 supports 4k, 16k, and 64k base pages and
- * supports contiguous entries, so it supports many kinds of sizes of HugeTLB
- * page.
- *
- * +--------------+-----------+-----------------------------------------------+
- * | Architecture | Page Size |               HugeTLB Page Size               |
- * +--------------+-----------+-----------+-----------+-----------+-----------+
- * |    x86-64    |    4KB    |    2MB    |    1GB    |           |           |
- * +--------------+-----------+-----------+-----------+-----------+-----------+
- * |              |    4KB    |    64KB   |    2MB    |    32MB   |    1GB    |
- * |              +-----------+-----------+-----------+-----------+-----------+
- * |    arm64     |    16KB   |    2MB    |    32MB   |    1GB    |           |
- * |              +-----------+-----------+-----------+-----------+-----------+
- * |              |    64KB   |    2MB    |   512MB   |    16GB   |           |
- * +--------------+-----------+-----------+-----------+-----------+-----------+
- *
- * When the system boot up, every HugeTLB page has more than one struct page
- * structs which size is (unit: pages):
- *
- *    struct_size = HugeTLB_Size / PAGE_SIZE * sizeof(struct page) / PAGE_SIZE
- *
- * Where HugeTLB_Size is the size of the HugeTLB page. We know that the size
- * of the HugeTLB page is always n times PAGE_SIZE. So we can get the following
- * relationship.
- *
- *    HugeTLB_Size = n * PAGE_SIZE
- *
- * Then,
- *
- *    struct_size = n * PAGE_SIZE / PAGE_SIZE * sizeof(struct page) / PAGE_SIZE
- *                = n * sizeof(struct page) / PAGE_SIZE
- *
- * We can use huge mapping at the pud/pmd level for the HugeTLB page.
- *
- * For the HugeTLB page of the pmd level mapping, then
- *
- *    struct_size = n * sizeof(struct page) / PAGE_SIZE
- *                = PAGE_SIZE / sizeof(pte_t) * sizeof(struct page) / PAGE_SIZE
- *                = sizeof(struct page) / sizeof(pte_t)
- *                = 64 / 8
- *                = 8 (pages)
- *
- * Where n is how many pte entries which one page can contains. So the value of
- * n is (PAGE_SIZE / sizeof(pte_t)).
- *
- * This optimization only supports 64-bit system, so the value of sizeof(pte_t)
- * is 8. And this optimization also applicable only when the size of struct page
- * is a power of two. In most cases, the size of struct page is 64 bytes (e.g.
- * x86-64 and arm64).
So if we use pmd level mapping for a HugeTLB page, the
- * size of struct page structs of it is 8 page frames which size depends on the
- * size of the base page.
- *
- * For the HugeTLB page of the pud level mapping, then
- *
- *    struct_size = PAGE_SIZE / sizeof(pmd_t) * struct_size(pmd)
- *                = PAGE_SIZE / 8 * 8 (pages)
- *                = PAGE_SIZE (pages)
- *
- * Where the struct_size(pmd) is the size of the struct page structs of a
- * HugeTLB page of the pmd level mapping.
- *
- * E.g.: A 2MB HugeTLB page on x86_64 consists in 8 page frames while 1GB
- * HugeTLB page consists in 4096.
- *
- * Next, we take the pmd level mapping of the HugeTLB page as an example to
- * show the internal implementation of this optimization. There are 8 pages
- * struct page structs associated with a HugeTLB page which is pmd mapped.
- *
- * Here is how things look before optimization.
- *
- *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
- * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
- * |           |                     |     0     | -------------> |     0     |
- * |           |                     +-----------+                +-----------+
- * |           |                     |     1     | -------------> |     1     |
- * |           |                     +-----------+                +-----------+
- * |           |                     |     2     | -------------> |     2     |
- * |           |                     +-----------+                +-----------+
- * |           |                     |     3     | -------------> |     3     |
- * |           |                     +-----------+                +-----------+
- * |           |                     |     4     | -------------> |     4     |
- * |    PMD    |                     +-----------+                +-----------+
- * |   level   |                     |     5     | -------------> |     5     |
- * |  mapping  |                     +-----------+                +-----------+
- * |           |                     |     6     | -------------> |     6     |
- * |           |                     +-----------+                +-----------+
- * |           |                     |     7     | -------------> |     7     |
- * |           |                     +-----------+                +-----------+
- * |           |
- * |           |
- * |           |
- * +-----------+
- *
- * The value of page->compound_head is the same for all tail pages. The first
- * page of page structs (page 0) associated with the HugeTLB page contains the 4
- * page structs necessary to describe the HugeTLB. The only use of the remaining
- * pages of page structs (page 1 to page 7) is to point to page->compound_head.
- * Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs
- * will be used for each HugeTLB page. This will allow us to free the remaining
- * 7 pages to the buddy allocator.
- *
- * Here is how things look after remapping.
- *
- *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
- * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
- * |           |                     |     0     | -------------> |     0     |
- * |           |                     +-----------+                +-----------+
- * |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
- * |           |                     +-----------+                  | | | | | |
- * |           |                     |     2     | -----------------+ | | | | |
- * |           |                     +-----------+                    | | | | |
- * |           |                     |     3     | -------------------+ | | | |
- * |           |                     +-----------+                      | | | |
- * |           |                     |     4     | ---------------------+ | | |
- * |    PMD    |                     +-----------+                        | | |
- * |   level   |                     |     5     | -----------------------+ | |
- * |  mapping  |                     +-----------+                          | |
- * |           |                     |     6     | -------------------------+ |
- * |           |                     +-----------+                            |
- * |           |                     |     7     | ---------------------------+
- * |           |                     +-----------+
- * |           |
- * |           |
- * |           |
- * +-----------+
- *
- * When a HugeTLB is freed to the buddy system, we should allocate 7 pages for
- * vmemmap pages and restore the previous mapping relationship.
- *
- * For the HugeTLB page of the pud level mapping. It is similar to the former.
- * We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages.
- *
- * Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
- * (e.g. aarch64) provides a contiguous bit in the translation table entries
- * that hints to the MMU to indicate that it is one of a contiguous set of
- * entries that can be cached in a single TLB entry.
- *
- * The contiguous bit is used to increase the mapping size at the pmd and pte
- * (last) level. So this type of HugeTLB page can be optimized only when its
- * size of the struct page structs is greater than 1 page.
- *
- * Notice: The head vmemmap page is not freed to the buddy allocator and all
- * tail vmemmap pages are mapped to the head vmemmap page frame.
So we can see
- * more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
- * associated with each HugeTLB page. The compound_head() can handle this
- * correctly (more details refer to the comment above compound_head()).
+ * See Documentation/vm/vmemmap_dedup.rst
  */
 #define pr_fmt(fmt)	"HugeTLB: " fmt

From patchwork Wed Apr 20 15:53:09 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joao Martins
X-Patchwork-Id: 12820524
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Vishal Verma, Matthew Wilcox, Jason Gunthorpe, Jane Chu,
 Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet, Christoph Hellwig,
 nvdimm@lists.linux.dev, linux-doc@vger.kernel.org
Subject: [PATCH v9 4/5] mm/sparse-vmemmap: improve memory savings for compound devmaps
Date: Wed, 20 Apr 2022 16:53:09 +0100
Message-Id: <20220420155310.9712-5-joao.m.martins@oracle.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20220420155310.9712-1-joao.m.martins@oracle.com>
References: <20220420155310.9712-1-joao.m.martins@oracle.com>
X-Mailing-List: nvdimm@lists.linux.dev
A compound devmap is a dev_pagemap with @vmemmap_shift > 0 and it
means that pages are mapped at a given huge page alignment and use
compound pages as opposed to order-0 pages.

Take advantage of the fact that most tail pages look the same (except
the first two) to minimize struct page overhead. Allocate a separate
page for the vmemmap area which contains the head page and a separate
one for the next 64 pages. The rest of the subsections then reuse this
tail vmemmap page to initialize the rest of the tail pages.

Sections are arch-dependent (e.g. on x86 it's 64M, 128M or 512M) and
when initializing a compound devmap with a big enough @vmemmap_shift
(e.g. 1G PUD) it may cross multiple sections. The vmemmap code needs to
consult @pgmap so that multiple sections that all map the same tail
data can refer back to the first copy of that data for a given
gigantic page.

On compound devmaps with 2M alignment, this mechanism saves 6 of the 8
pages needed to hold the 512 struct pages of the subsection being
mapped. On a 1G compound devmap it saves 4094 pages.

Altmap isn't supported yet, given various restrictions in the altmap pfn
allocator, so fall back to the already in use vmemmap_populate(). It
is worth noting that altmap for devmap mappings was there to relieve the
pressure of inordinate amounts of memmap space to map terabytes of pmem.
With compound pages the motivation for altmaps for pmem gets reduced.

Signed-off-by: Joao Martins
Reviewed-by: Muchun Song
---
 Documentation/vm/vmemmap_dedup.rst |  56 +++++++++++-
 include/linux/mm.h                 |   2 +-
 mm/memremap.c                      |   1 +
 mm/sparse-vmemmap.c                | 132 ++++++++++++++++++++++++++---
 4 files changed, 177 insertions(+), 14 deletions(-)

diff --git a/Documentation/vm/vmemmap_dedup.rst b/Documentation/vm/vmemmap_dedup.rst
index 485ccf4f7b10..c9c495f62d12 100644
--- a/Documentation/vm/vmemmap_dedup.rst
+++ b/Documentation/vm/vmemmap_dedup.rst
@@ -1,8 +1,11 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-==================================
-Free some vmemmap pages of HugeTLB
-==================================
+=========================================
+A vmemmap diet for HugeTLB and Device DAX
+=========================================
+
+HugeTLB
+=======
 
 The struct page structures (page structs) are used to describe a physical page
 frame. By default, there is a one-to-one mapping from a page frame to
@@ -171,3 +174,50 @@ tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
 more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
 associated with each HugeTLB page. The compound_head() can handle this
 correctly (more details refer to the comment above compound_head()).
+
+Device DAX
+==========
+
+The device-dax interface uses the same tail deduplication technique explained
+in the previous chapter, except when used with the vmemmap in
+the device (altmap).
+
+The following page sizes are supported in DAX: PAGE_SIZE (4K on x86_64),
+PMD_SIZE (2M on x86_64) and PUD_SIZE (1G on x86_64).
+
+The differences with HugeTLB are relatively minor.
+
+It only uses 3 page structs for storing all information as opposed
+to 4 on HugeTLB pages.
+
+There's no remapping of vmemmap given that device-dax memory is not part of
+System RAM ranges initialized at boot. Thus the tail page deduplication
+happens at a later stage when we populate the sections. HugeTLB reuses the
+head vmemmap page, whereas device-dax reuses the tail
+vmemmap page. This results in only half of the savings compared to HugeTLB.
+
+Deduplicated tail pages are not mapped read-only.
+
+Here's how things look on device-dax after the sections are populated::
+
+ +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
+ |           |                     |     0     | -------------> |     0     |
+ |           |                     +-----------+                +-----------+
+ |           |                     |     1     | -------------> |     1     |
+ |           |                     +-----------+                +-----------+
+ |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
+ |           |                     +-----------+                   | | | | |
+ |           |                     |     3     | ------------------+ | | | |
+ |           |                     +-----------+                     | | | |
+ |           |                     |     4     | --------------------+ | | |
+ |    PMD    |                     +-----------+                       | | |
+ |   level   |                     |     5     | ----------------------+ | |
+ |  mapping  |                     +-----------+                         | |
+ |           |                     |     6     | ------------------------+ |
+ |           |                     +-----------+                           |
+ |           |                     |     7     | --------------------------+
+ |           |                     +-----------+
+ |           |
+ |           |
+ |           |
+ +-----------+
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 62564d81d8cb..a097323778c4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3209,7 +3209,7 @@ p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
 pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
 pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
-			    struct vmem_altmap *altmap);
+			    struct vmem_altmap *altmap, struct page *reuse);
 void *vmemmap_alloc_block(unsigned long size, int node);
 struct vmem_altmap;
 void *vmemmap_alloc_block_buf(unsigned long size, int node,
diff --git a/mm/memremap.c b/mm/memremap.c
index a7b6abf6ca1b..223ada81fe43 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -307,6 +307,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 {
 	struct mhp_params params = {
 		.altmap = pgmap_altmap(pgmap),
+		.pgmap = pgmap,
 		.pgprot = PAGE_KERNEL,
 	};
 	const int nr_range = pgmap->nr_range;
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index ef15664c6b6c..f4fa61dbbee3 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -533,16 +533,31 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 }
 
 pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
-				       struct vmem_altmap *altmap)
+				       struct vmem_altmap *altmap,
+				       struct page *reuse)
 {
 	pte_t *pte = pte_offset_kernel(pmd, addr);
 	if (pte_none(*pte)) {
 		pte_t entry;
 		void *p;
 
-		p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap);
-		if (!p)
-			return NULL;
+		if (!reuse) {
+			p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap);
+			if (!p)
+				return NULL;
+		} else {
+			/*
+			 * When a PTE/PMD entry is freed from the init_mm
+			 * there's a free_pages() call to this page allocated
+			 * above. Thus this get_page() is paired with the
+			 * put_page_testzero() on the freeing path.
+			 * This can only be called by certain ZONE_DEVICE paths,
+			 * and through vmemmap_populate_compound_pages() when
+			 * slab is available.
+			 */
+			get_page(reuse);
+			p = page_to_virt(reuse);
+		}
 		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
 		set_pte_at(&init_mm, addr, pte, entry);
 	}
@@ -609,7 +624,8 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
 }
 
 static pte_t * __meminit vmemmap_populate_address(unsigned long addr, int node,
-					      struct vmem_altmap *altmap)
+					      struct vmem_altmap *altmap,
+					      struct page *reuse)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
@@ -629,7 +645,7 @@ static pte_t * __meminit vmemmap_populate_address(unsigned long addr, int node,
 	pmd = vmemmap_pmd_populate(pud, addr, node);
 	if (!pmd)
 		return NULL;
-	pte = vmemmap_pte_populate(pmd, addr, node, altmap);
+	pte = vmemmap_pte_populate(pmd, addr, node, altmap, reuse);
 	if (!pte)
 		return NULL;
 	vmemmap_verify(pte, node, addr, addr + PAGE_SIZE);
@@ -639,13 +655,14 @@ static pte_t * __meminit vmemmap_populate_address(unsigned long addr, int node,
 
 static int __meminit vmemmap_populate_range(unsigned long start,
 					    unsigned long end, int node,
-					    struct vmem_altmap *altmap)
+					    struct vmem_altmap *altmap,
+					    struct page *reuse)
 {
 	unsigned long addr = start;
 	pte_t *pte;
 
 	for (; addr < end; addr += PAGE_SIZE) {
-		pte = vmemmap_populate_address(addr, node, altmap);
+		pte = vmemmap_populate_address(addr, node, altmap, reuse);
 		if (!pte)
 			return -ENOMEM;
 	}
@@ -656,7 +673,95 @@ static int __meminit vmemmap_populate_range(unsigned long start,
 int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
 					 int node, struct vmem_altmap *altmap)
 {
-	return vmemmap_populate_range(start, end, node, altmap);
+	return vmemmap_populate_range(start, end, node, altmap, NULL);
+}
+
+/*
+ * For compound pages bigger than section size (e.g. x86 1G compound
+ * pages with 2M subsection size) fill the rest of sections as tail
+ * pages.
+ *
+ * Note that memremap_pages() resets @nr_range value and will increment
+ * it after each range successful onlining. Thus the value of @nr_range
+ * at section memmap populate corresponds to the in-progress range
+ * being onlined here.
+ */
+static bool __meminit reuse_compound_section(unsigned long start_pfn,
+					     struct dev_pagemap *pgmap)
+{
+	unsigned long nr_pages = pgmap_vmemmap_nr(pgmap);
+	unsigned long offset = start_pfn -
+		PHYS_PFN(pgmap->ranges[pgmap->nr_range].start);
+
+	return !IS_ALIGNED(offset, nr_pages) && nr_pages > PAGES_PER_SUBSECTION;
+}
+
+static pte_t * __meminit compound_section_tail_page(unsigned long addr)
+{
+	pte_t *pte;
+
+	addr -= PAGE_SIZE;
+
+	/*
+	 * Assuming sections are populated sequentially, the previous section's
+	 * page data can be reused.
+	 */
+	pte = pte_offset_kernel(pmd_off_k(addr), addr);
+	if (!pte)
+		return NULL;
+
+	return pte;
+}
+
+static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
+						     unsigned long start,
+						     unsigned long end, int node,
+						     struct dev_pagemap *pgmap)
+{
+	unsigned long size, addr;
+	pte_t *pte;
+	int rc;
+
+	if (reuse_compound_section(start_pfn, pgmap)) {
+		pte = compound_section_tail_page(start);
+		if (!pte)
+			return -ENOMEM;
+
+		/*
+		 * Reuse the page that was populated in the prior iteration
+		 * with just tail struct pages.
+		 */
+		return vmemmap_populate_range(start, end, node, NULL,
+					      pte_page(*pte));
+	}
+
+	size = min(end - start, pgmap_vmemmap_nr(pgmap) * sizeof(struct page));
+	for (addr = start; addr < end; addr += size) {
+		unsigned long next = addr, last = addr + size;
+
+		/* Populate the head page vmemmap page */
+		pte = vmemmap_populate_address(addr, node, NULL, NULL);
+		if (!pte)
+			return -ENOMEM;
+
+		/* Populate the tail pages vmemmap page */
+		next = addr + PAGE_SIZE;
+		pte = vmemmap_populate_address(next, node, NULL, NULL);
+		if (!pte)
+			return -ENOMEM;
+
+		/*
+		 * Reuse the previous page for the rest of tail pages
+		 * See layout diagram in Documentation/vm/vmemmap_dedup.rst
+		 */
+		next += PAGE_SIZE;
+		rc = vmemmap_populate_range(next, last, node, NULL,
+					    pte_page(*pte));
+		if (rc)
+			return -ENOMEM;
+	}
+
+	return 0;
 }
 
 struct page * __meminit __populate_section_memmap(unsigned long pfn,
@@ -665,12 +770,19 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
+	int r;
 
 	if (WARN_ON_ONCE(!IS_ALIGNED(pfn, PAGES_PER_SUBSECTION) ||
 		!IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION)))
 		return NULL;
 
-	if (vmemmap_populate(start, end, nid, altmap))
+	if (is_power_of_2(sizeof(struct page)) &&
+	    pgmap && pgmap_vmemmap_nr(pgmap) > 1 && !altmap)
+		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
+	else
+		r = vmemmap_populate(start, end, nid, altmap);
+
+	if (r < 0)
 		return NULL;
 
 	return pfn_to_page(pfn);

From patchwork Wed Apr 20 15:53:10 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joao Martins
X-Patchwork-Id: 12820525
From: Joao Martins
To: linux-mm@kvack.org
Cc: Dan Williams, Vishal Verma, Matthew Wilcox, Jason Gunthorpe, Jane Chu,
 Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
 Christoph Hellwig, nvdimm@lists.linux.dev, linux-doc@vger.kernel.org
Subject: [PATCH v9 5/5] mm/page_alloc: reuse tail struct pages for compound devmaps
Date: Wed, 20 Apr 2022 16:53:10 +0100
Message-Id: <20220420155310.9712-6-joao.m.martins@oracle.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20220420155310.9712-1-joao.m.martins@oracle.com>
References: <20220420155310.9712-1-joao.m.martins@oracle.com>
Precedence: bulk
X-Mailing-List: nvdimm@lists.linux.dev

Currently memmap_init_zone_device() ends up initializing 32768 pages
when it only needs to initialize 128 given tail page reuse. That
number is worse with 1GB compound pages, 262144 instead of 128. Update
memmap_init_zone_device() to skip redundant initialization, detailed
below.

When a pgmap @vmemmap_shift is set, all pages are mapped at a given
huge page alignment and use compound pages to describe them as opposed
to a struct per 4K.

With @vmemmap_shift > 0 and when struct pages are stored in RAM
(!altmap) most tail pages are reused. Consequently, the number of
unique struct pages is a lot smaller than the total number of struct
pages being mapped.

The altmap path is left alone since it does not support memory savings
based on compound devmaps.
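
The trimming described above can be modelled in user space. This is a
minimal sketch under stated assumptions, not the kernel implementation:
struct ex_page is a hypothetical 64-byte stand-in for struct page, the
base page size is taken to be 4K, and the altmap is reduced to a
boolean. All ex_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_PAGE_SIZE 4096UL	/* assumed base page size */

/* Stand-in for struct page; the 64-byte size is an assumption. */
struct ex_page {
	unsigned char pad[64];
};

static bool ex_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Number of struct pages worth initialising for one compound page.
 * Without an altmap only two vmemmap pages are unique (head page plus
 * one shared tail page), so only the struct pages that fit in those
 * two pages are counted. With an altmap nothing is deduplicated and
 * every struct page is counted.
 */
static unsigned long ex_compound_nr_pages(bool has_altmap,
					  unsigned long nr_pages)
{
	assert(nr_pages > 0);

	if (ex_is_power_of_2(sizeof(struct ex_page)) && !has_altmap)
		return 2 * (EX_PAGE_SIZE / sizeof(struct ex_page));

	return nr_pages;
}
```

With these assumptions the helper yields 128 for both a 512-page (2M)
and a 262144-page (1G) compound page when no altmap is present, and
returns the full count otherwise.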

Signed-off-by: Joao Martins
Reviewed-by: Muchun Song
---
 mm/page_alloc.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8d4c6a74fc85..8c34d43a4914 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6606,6 +6606,21 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	}
 }
 
+/*
+ * With compound page geometry and when struct pages are stored in ram most
+ * tail pages are reused. Consequently, the amount of unique struct pages to
+ * initialize is a lot smaller than the total amount of struct pages being
+ * mapped. This is a paired / mild layering violation with explicit knowledge
+ * of how the sparse_vmemmap internals handle compound pages in the lack
+ * of an altmap. See vmemmap_populate_compound_pages().
+ */
+static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
+					      unsigned long nr_pages)
+{
+	return is_power_of_2(sizeof(struct page)) &&
+		!altmap ? 2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages;
+}
+
 static void __ref memmap_init_compound(struct page *head,
 				       unsigned long head_pfn,
 				       unsigned long zone_idx, int nid,
@@ -6670,7 +6685,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     pfns_per_compound);
+				     compound_nr_pages(altmap, pfns_per_compound));
 	}
 
 	pr_info("%s initialised %lu pages in %ums\n", __func__,