From patchwork Mon Jan 6 16:55:03 2025
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 13927678
From: Zi Yan <ziy@nvidia.com>
To: linux-mm@kvack.org, "Kirill A. Shutemov", "Matthew Wilcox (Oracle)"
Cc: Ryan Roberts, Hugh Dickins, David Hildenbrand, Yang Shi, Miaohe Lin,
 Kefeng Wang, Yu Zhao, John Hubbard, linux-kernel@vger.kernel.org, Zi Yan
Subject: [PATCH v4 00/10] Buddy allocator like folio split
Date: Mon, 6 Jan 2025 11:55:03 -0500
Message-ID: <20250106165513.104899-1-ziy@nvidia.com>
X-Mailer: git-send-email 2.45.2
Hi all,

This patchset adds a new buddy allocator like large folio split, which
reduces the total number of resulting folios and the amount of memory
needed for a multi-index xarray split, and keeps more large folios after
a split. It is on top of mm-everything-2025-01-04-04-41 and is just a
resend of v3.

Instead of duplicating existing split_huge_page*() code, __folio_split()
is introduced as the shared backend code for both
split_huge_page_to_list_to_order() and folio_split(). __folio_split()
can support both uniform split and buddy allocator like split. All
existing split_huge_page*() users can be gradually converted to use
folio_split() if possible. In this patchset, I converted
truncate_inode_partial_folio() to use folio_split().

THP tests in selftests passed for split_huge_page*() runs, and I also
tested folio_split() for anon large folios, pagecache folios, and
truncate. I also ran truncate related tests from the xfstests quick test
group and saw no issues.

Changelog
===

From V3[5]:
1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra
   operations inside xas_split_alloc() are needed for correctness.
2. Enabled folio_split() for shmem and no issue was found with the
   xfstests quick test group.
3. Split both ends of a truncate range in truncate_inode_partial_folio()
   to avoid wasting memory in shmem truncate (per David Hildenbrand).
4. Removed page_in_folio_offset() since page_folio() does the same
   thing.
5. Finished truncate related tests from the xfstests quick test group on
   XFS and tmpfs without issues.
6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS
   and FSes without large folio split. This check was missed in the
   prior versions.

From V2[3]:
1. Incorporated all the feedback from Kirill[4].
2. Used GFP_NOWAIT for xas_nomem().
3. Tested the code path when xas_nomem() fails.
4. Added selftests for folio_split().
5. Fixed the no-THP-config build error.

From V1[2]:
1. Split the original patch 1 into multiple ones for easier review (per
   Kirill).
2. Added xas_destroy() to avoid a memory leak.
3. Fixed the nr_dropped-not-used error (per the kernel test robot).
4. Added proper error handling when xas_nomem() fails to allocate memory
   for xas_split() during the buddy allocator like split.

From RFC[1]:
1. Merged the backend code of split_huge_page_to_list_to_order() and
   folio_split(). The same code is used for both uniform split and buddy
   allocator like split.
2. Used xas_nomem() instead of xas_split_alloc() for folio_split().
3. folio_split() now leaves the first after-split folio unlocked,
   instead of the one containing the given page, since the caller of
   truncate_inode_partial_folio() locks and unlocks the first folio.
4. Extended the split_huge_page debugfs to use folio_split().
5. Added truncate_inode_partial_folio() as the first user of
   folio_split().

Design
===

folio_split() splits a large folio in the same way as the buddy
allocator splits a large free page for allocation. The purpose is to
minimize the number of folios after the split. For example, if a user
wants to free the 3rd subpage in an order-9 folio, folio_split() will
split the order-9 folio as:

O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon,
O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is pagecache,

since anon folios do not support order-1 yet.

The split process is similar to the existing approach:
1. Unmap all page mappings (split PMD mappings if they exist);
2. Split metadata like memcg, page owner, page alloc tag;
3. Copy metadata in struct folio to subpages; but instead of splitting
   the whole folio into multiple smaller ones with the same order in one
   shot, this approach splits the folio iteratively. Taking the example
   above, this approach first splits the original order-9 into two
   order-8, then splits the left order-8 into two order-7, and so on;
4. Post-process split folios: e.g., write mapping->i_pages for
   pagecache, adjust folio refcounts, add split folios to the
   corresponding list;
5. Remap split folios;
6. Unlock split folios.

__folio_split_without_mapping() and __split_folio_to_order() replace
__split_huge_page() and __split_huge_page_tail() respectively.
__folio_split_without_mapping() uses different approaches to perform
uniform split and buddy allocator like split:

1. Uniform split: a single call to __split_folio_to_order() uniformly
   splits the given folio. All resulting folios are put back on the list
   after the split. The folio containing the given page is left to the
   caller to unlock; the others are unlocked.
2. Buddy allocator like split: old_order - new_order calls to
   __split_folio_to_order() split the given folio from order N to order
   N-1. After each call, the target folio is changed to the one
   containing the page, which is given via folio_split() parameters, and
   the folios not containing the page are put back on the list. The
   folio containing the page is put back on the list when its order
   reaches new_order. All folios are unlocked except the first one,
   which is left to the caller to unlock.

Patch Overview
===

1. Patch 1 fixed a selftest counting bug in split_huge_page_test, and
   patch 2 added tests for splitting a PMD huge page to any lower order.
   They can be picked independently of this patchset.
2. Patch 3 enabled splitting shmem to any lower order, since large folio
   support for shmem was upstreamed.
3.
Patch 4 added __folio_split_without_mapping() and
   __split_folio_to_order() to prepare for moving to the new backend
   split code.
4. Patch 5 replaced __split_huge_page() with
   __folio_split_without_mapping() in split_huge_page_to_list_to_order().
5. Patch 6 added the new folio_split().
6. Patch 7 removed __split_huge_page() and __split_huge_page_tail().
7. Patch 8 added a new in_folio_offset to the split_huge_page debugfs
   for folio_split() tests.
8. Patch 9 used folio_split() for the truncate operation.
9. Patch 10 added folio_split() tests.

Any comments and/or suggestions are welcome. Thanks.

[1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/
[3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/
[4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43ncly4l4ogr@7ruqsay6i2h2/
[5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/

Zi Yan (10):
  selftests/mm: use selftests framework to print test result.
  selftests/mm: add tests for splitting pmd THPs to all lower orders.
  mm/huge_memory: allow split shmem large folio to any order
  mm/huge_memory: add two new (not yet used) functions for folio_split()
  mm/huge_memory: move folio split common code to __folio_split()
  mm/huge_memory: add buddy allocator like folio_split()
  mm/huge_memory: remove the old, unused __split_huge_page()
  mm/huge_memory: add folio_split() to debugfs testing interface.
  mm/truncate: use folio_split() for truncate operation.
  selftests/mm: add tests for folio_split(), buddy allocator like split.

 include/linux/huge_mm.h                         |  17 +
 mm/huge_memory.c                                | 689 +++++++++++-------
 mm/truncate.c                                   |  31 +-
 .../selftests/mm/split_huge_page_test.c         |  70 +-
 4 files changed, 522 insertions(+), 285 deletions(-)