From patchwork Fri Mar 7 17:39:57 2025
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 14006818
From: Zi Yan <ziy@nvidia.com>
To: linux-mm@kvack.org, Andrew Morton, Hugh Dickins,
	"Matthew Wilcox (Oracle)"
Cc: Ryan Roberts, "Kirill A. Shutemov", David Hildenbrand, Yang Shi,
	Miaohe Lin, Kefeng Wang, Yu Zhao, John Hubbard, Baolin Wang,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	Zi Yan, Kairui Song
Subject: [PATCH v10 4/8] mm/huge_memory: add buddy allocator like
	(non-uniform) folio_split()
Date: Fri, 7 Mar 2025 12:39:57 -0500
Message-ID: <20250307174001.242794-5-ziy@nvidia.com>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250307174001.242794-1-ziy@nvidia.com>
References: <20250307174001.242794-1-ziy@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
folio_split() splits a large folio in the same way as the buddy allocator
splits a large free page for allocation. The purpose is to minimize the
number of folios after the split. For example, if a user wants to free the
3rd subpage in an order-9 folio, folio_split() will split the order-9 folio
as:

O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon,
since anon folio does not support order-1 yet.
-----------------------------------------------------------------
|   |   |   |   |      |   |       |                             |
|O-0|O-0|O-0|O-0| O-2  |...|  O-7  |             O-8             |
|   |   |   |   |      |   |       |                             |
-----------------------------------------------------------------

O-1,      O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is pagecache
---------------------------------------------------------------
|     |   |   |      |   |       |                             |
| O-1 |O-0|O-0| O-2  |...|  O-7  |             O-8             |
|     |   |   |      |   |       |                             |
---------------------------------------------------------------

It generates fewer folios (i.e., 11 or 10) than the existing page split
approach, which splits the order-9 folio into 512 order-0 folios.
It also reduces the number of new xa_nodes needed during a pagecache folio
split from 8 to 1, potentially decreasing the folio split failure rate due
to memory constraints.

folio_split() and the existing split_huge_page_to_list_to_order() share the
folio unmapping and remapping code in __folio_split() and the common backend
split code in __split_unmapped_folio(), using the uniform_split variable to
distinguish their operations.

uniform_split_supported() and non_uniform_split_supported() are added to
factor out check code and will be used outside __folio_split() in the
following commit.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang
Cc: David Hildenbrand
Cc: Hugh Dickins
Cc: John Hubbard
Cc: Kefeng Wang
Cc: Kirill A. Shutemov
Cc: Matthew Wilcox
Cc: Miaohe Lin
Cc: Ryan Roberts
Cc: Yang Shi
Cc: Yu Zhao
Cc: Kairui Song
---
 mm/huge_memory.c | 170 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 128 insertions(+), 42 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 871c260163f1..3e05e62fdccb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3863,12 +3863,85 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	return ret;
 }
 
+static bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
+		bool warns)
+{
+	if (folio_test_anon(folio)) {
+		/* order-1 is not supported for anonymous THP. */
+		VM_WARN_ONCE(warns && new_order == 1,
+			"Cannot split to order-1 folio");
+		return new_order != 1;
+	} else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+		   !mapping_large_folio_support(folio->mapping)) {
+		/*
+		 * No split if the file system does not support large folio.
+		 * Note that we might still have THPs in such mappings due to
+		 * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
+		 * does not actually support large folios properly.
+		 */
+		VM_WARN_ONCE(warns,
+			"Cannot split file folio to non-0 order");
+		return false;
+	}
+
+	/* Only swapping a whole PMD-mapped folio is supported */
+	if (folio_test_swapcache(folio)) {
+		VM_WARN_ONCE(warns,
+			"Cannot split swapcache folio to non-0 order");
+		return false;
+	}
+
+	return true;
+}
+
+/* See comments in non_uniform_split_supported() */
+static bool uniform_split_supported(struct folio *folio, unsigned int new_order,
+		bool warns)
+{
+	if (folio_test_anon(folio)) {
+		VM_WARN_ONCE(warns && new_order == 1,
+			"Cannot split to order-1 folio");
+		return new_order != 1;
+	} else if (new_order) {
+		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+		    !mapping_large_folio_support(folio->mapping)) {
+			VM_WARN_ONCE(warns,
+				"Cannot split file folio to non-0 order");
+			return false;
+		}
+	}
+
+	if (new_order && folio_test_swapcache(folio)) {
+		VM_WARN_ONCE(warns,
+			"Cannot split swapcache folio to non-0 order");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * __folio_split: split a folio at @split_at to a @new_order folio
+ * @folio: folio to split
+ * @new_order: the order of the new folio
+ * @split_at: a page within the new folio
+ * @lock_at: a page within @folio to be left locked to caller
+ * @list: after-split folios will be put on it if non NULL
+ * @uniform_split: perform uniform split or not (non-uniform split)
+ *
+ * It calls __split_unmapped_folio() to perform uniform and non-uniform split.
+ * It is in charge of checking whether the split is supported or not and
+ * preparing @folio for __split_unmapped_folio().
+ *
+ * return: 0: successful, <0 failed (if -ENOMEM is returned, @folio might be
+ * split but not to @new_order, the caller needs to check)
+ */
 static int __folio_split(struct folio *folio, unsigned int new_order,
-		struct page *page, struct list_head *list)
+		struct page *split_at, struct page *lock_at,
+		struct list_head *list, bool uniform_split)
 {
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
-	/* reset xarray order to new order after split */
-	XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
+	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
 	bool is_anon = folio_test_anon(folio);
 	struct address_space *mapping = NULL;
 	struct anon_vma *anon_vma = NULL;
@@ -3880,32 +3953,17 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
 
+	if (folio != page_folio(split_at) || folio != page_folio(lock_at))
+		return -EINVAL;
+
 	if (new_order >= folio_order(folio))
 		return -EINVAL;
 
-	if (is_anon) {
-		/* order-1 is not supported for anonymous THP. */
-		if (new_order == 1) {
-			VM_WARN_ONCE(1, "Cannot split to order-1 folio");
-			return -EINVAL;
-		}
-	} else if (new_order) {
-		/*
-		 * No split if the file system does not support large folio.
-		 * Note that we might still have THPs in such mappings due to
-		 * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
-		 * does not actually support large folios properly.
-		 */
-		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-		    !mapping_large_folio_support(folio->mapping)) {
-			VM_WARN_ONCE(1,
-				"Cannot split file folio to non-0 order");
-			return -EINVAL;
-		}
-	}
+	if (uniform_split && !uniform_split_supported(folio, new_order, true))
+		return -EINVAL;
 
-	/* Only swapping a whole PMD-mapped folio is supported */
-	if (folio_test_swapcache(folio) && new_order)
+	if (!uniform_split &&
+	    !non_uniform_split_supported(folio, new_order, true))
 		return -EINVAL;
 
 	is_hzp = is_huge_zero_folio(folio);
@@ -3967,21 +4025,24 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 			goto out;
 		}
 
-		xas_split_alloc(&xas, folio, folio_order(folio), gfp);
-		if (xas_error(&xas)) {
-			ret = xas_error(&xas);
-			goto out;
+		if (uniform_split) {
+			xas_set_order(&xas, folio->index, new_order);
+			xas_split_alloc(&xas, folio, folio_order(folio), gfp);
+			if (xas_error(&xas)) {
+				ret = xas_error(&xas);
+				goto out;
+			}
 		}
 
 		anon_vma = NULL;
 		i_mmap_lock_read(mapping);
 
 		/*
-		 *__split_huge_page() may need to trim off pages beyond EOF:
-		 * but on 32-bit, i_size_read() takes an irq-unsafe seqlock,
-		 * which cannot be nested inside the page tree lock. So note
-		 * end now: i_size itself may be changed at any moment, but
-		 * folio lock is good enough to serialize the trimming.
+		 *__split_unmapped_folio() may need to trim off pages beyond
+		 * EOF: but on 32-bit, i_size_read() takes an irq-unsafe
+		 * seqlock, which cannot be nested inside the page tree lock.
+		 * So note end now: i_size itself may be changed at any moment,
+		 * but folio lock is good enough to serialize the trimming.
 		 */
 		end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
 		if (shmem_mapping(mapping))
@@ -4035,7 +4096,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	if (mapping) {
 		int nr = folio_nr_pages(folio);
 
-		xas_split(&xas, folio, folio_order(folio));
 		if (folio_test_pmd_mappable(folio) &&
 		    new_order < HPAGE_PMD_ORDER) {
 			if (folio_test_swapbacked(folio)) {
@@ -4049,12 +4109,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 			}
 		}
 
-		if (is_anon) {
-			mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
-			mod_mthp_stat(new_order, MTHP_STAT_NR_ANON, 1 << (order - new_order));
-		}
-		__split_huge_page(page, list, end, new_order);
-		ret = 0;
+		ret = __split_unmapped_folio(folio, new_order,
+				split_at, lock_at, list, end, &xas, mapping,
+				uniform_split);
 	} else {
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:
@@ -4132,7 +4189,36 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 {
 	struct folio *folio = page_folio(page);
 
-	return __folio_split(folio, new_order, page, list);
+	return __folio_split(folio, new_order, &folio->page, page, list, true);
+}
+
+/*
+ * folio_split: split a folio at @split_at to a @new_order folio
+ * @folio: folio to split
+ * @new_order: the order of the new folio
+ * @split_at: a page within the new folio
+ *
+ * return: 0: successful, <0 failed (if -ENOMEM is returned, @folio might be
+ * split but not to @new_order, the caller needs to check)
+ *
+ * It has the same prerequisites and returns as
+ * split_huge_page_to_list_to_order().
+ *
+ * Split a folio at @split_at to a new_order folio, leaving the
+ * remaining subpages of the original folio as large as possible. For example,
+ * in the case of splitting an order-9 folio at its third order-3 subpage to
+ * an order-3 folio, there are 2^(9-3)=64 order-3 subpages in the order-9 folio.
+ * After the split, there will be a group of folios with different orders and
+ * the new folio containing @split_at is marked in brackets:
+ * [order-4, {order-3}, order-3, order-5, order-6, order-7, order-8].
+ *
+ * After split, folio is left locked for caller.
+ */
+static int folio_split(struct folio *folio, unsigned int new_order,
+		struct page *split_at, struct list_head *list)
+{
+	return __folio_split(folio, new_order, split_at, &folio->page, list,
+			false);
+}
 
 int min_order_for_split(struct folio *folio)