From patchwork Fri Feb 14 17:57:03 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
 "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Kalesh Singh,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 1/7] mm, swap: avoid reclaiming irrelevant swap cache
Date: Sat, 15 Feb 2025 01:57:03 +0800
Message-ID: <20250214175709.76029-2-ryncsn@gmail.com>
In-Reply-To: <20250214175709.76029-1-ryncsn@gmail.com>
References: <20250214175709.76029-1-ryncsn@gmail.com>

From: Kairui Song

The swap allocator reclaims swap cache to recycle HAS_CACHE slots for
allocation. It initiates the reclaim from the offset to be reclaimed
and looks up the corresponding folio. The lookup process is lockless,
so it's possible that the folio is removed from the swap cache and
given a different swap entry before the reclaim locks the folio. If
that happens, the reclaim ends up reclaiming an irrelevant folio and
returning a wrong return value. This shouldn't cause any problem with
correctness or stability, but it is confusing and unexpected, and it
increases fragmentation and decreases performance.

Fix this by checking whether the folio still points to the offset the
allocator wants to reclaim before reclaiming it.
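
As a condensed illustration (not part of the patch itself; names follow
the hunks below), the race and the revalidation after locking look
roughly like this:

	/*
	 * The race, roughly:
	 *
	 *   CPU0 (reclaim)                    CPU1
	 *   folio = filemap_get_folio(...)    (lockless lookup of offset)
	 *                                     folio removed from swap cache,
	 *                                     reused for a different entry
	 *   folio_trylock(folio)              (locks an unrelated folio)
	 *
	 * So after locking, recheck that the folio still covers the
	 * offset being reclaimed before touching it:
	 */
	entry = folio->swap;
	if (offset < swp_offset(entry) ||
	    offset >= swp_offset(entry) + nr_pages) {
		folio_unlock(folio);	/* folio was reassigned, retry lookup */
		folio_put(folio);
		goto again;
	}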
Signed-off-by: Kairui Song
Reviewed-by: Baoquan He
---
 mm/swapfile.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 34baefb000b5..c77ffee4af86 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -210,6 +210,7 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	int ret, nr_pages;
 	bool need_reclaim;
 
+again:
 	folio = filemap_get_folio(address_space, swap_cache_index(entry));
 	if (IS_ERR(folio))
 		return 0;
@@ -227,8 +228,16 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	if (!folio_trylock(folio))
 		goto out;
 
-	/* offset could point to the middle of a large folio */
+	/*
+	 * Offset could point to the middle of a large folio, or folio
+	 * may no longer point to the expected offset before it's locked.
+	 */
 	entry = folio->swap;
+	if (offset < swp_offset(entry) || offset >= swp_offset(entry) + nr_pages) {
+		folio_unlock(folio);
+		folio_put(folio);
+		goto again;
+	}
 	offset = swp_offset(entry);
 
 	need_reclaim = ((flags & TTRS_ANYWAY) ||

From patchwork Fri Feb 14 17:57:04 2025
From: Kairui Song
Subject: [PATCH 2/7] mm, swap: drop the flag TTRS_DIRECT
Date: Sat, 15 Feb 2025 01:57:04 +0800
Message-ID: <20250214175709.76029-3-ryncsn@gmail.com>
In-Reply-To: <20250214175709.76029-1-ryncsn@gmail.com>
References: <20250214175709.76029-1-ryncsn@gmail.com>

From: Kairui Song

This flag existed temporarily to allow the allocator to bypass the slot
cache during freeing, so that reclaiming one slot would free the slot
immediately. But slot cache usage on freeing has already been removed,
so this flag no longer has any effect.
Signed-off-by: Kairui Song
Reviewed-by: Baoquan He
---
 mm/swapfile.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index c77ffee4af86..449e388a6fec 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -158,8 +158,6 @@ static long swap_usage_in_pages(struct swap_info_struct *si)
 #define TTRS_UNMAPPED		0x2
 /* Reclaim the swap entry if swap is getting full */
 #define TTRS_FULL		0x4
-/* Reclaim directly, bypass the slot cache and don't touch device lock */
-#define TTRS_DIRECT		0x8
 
 static bool swap_only_has_cache(struct swap_info_struct *si,
 				unsigned long offset, int nr_pages)
@@ -257,23 +255,8 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	if (!need_reclaim)
 		goto out_unlock;
 
-	if (!(flags & TTRS_DIRECT)) {
-		/* Free through slot cache */
-		delete_from_swap_cache(folio);
-		folio_set_dirty(folio);
-		ret = nr_pages;
-		goto out_unlock;
-	}
-
-	xa_lock_irq(&address_space->i_pages);
-	__delete_from_swap_cache(folio, entry, NULL);
-	xa_unlock_irq(&address_space->i_pages);
-	folio_ref_sub(folio, nr_pages);
+	delete_from_swap_cache(folio);
 	folio_set_dirty(folio);
-
-	ci = lock_cluster(si, offset);
-	swap_entry_range_free(si, ci, entry, nr_pages);
-	unlock_cluster(ci);
 	ret = nr_pages;
 out_unlock:
 	folio_unlock(folio);
@@ -707,7 +690,7 @@ static bool cluster_reclaim_range(struct swap_info_struct *si,
 			offset++;
 			break;
 		case SWAP_HAS_CACHE:
-			nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT);
+			nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
 			if (nr_reclaim > 0)
 				offset += nr_reclaim;
 			else
@@ -860,7 +843,7 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
 		if (READ_ONCE(map[offset]) == SWAP_HAS_CACHE) {
 			spin_unlock(&ci->lock);
 			nr_reclaim = __try_to_reclaim_swap(si, offset,
-							   TTRS_ANYWAY | TTRS_DIRECT);
+							   TTRS_ANYWAY);
 			spin_lock(&ci->lock);
 			if (nr_reclaim) {
 				offset += abs(nr_reclaim);

From patchwork Fri Feb 14 17:57:05 2025
From: Kairui Song
Subject: [PATCH 3/7] mm, swap: avoid redundant swap device pinning
Date: Sat, 15 Feb 2025 01:57:05 +0800
Message-ID: <20250214175709.76029-4-ryncsn@gmail.com>
In-Reply-To: <20250214175709.76029-1-ryncsn@gmail.com>
References: <20250214175709.76029-1-ryncsn@gmail.com>

From: Kairui Song

There are only two callers of __read_swap_cache_async that do not hold
a swap device reference, so make them hold a reference instead, and
drop the get/put_swap_device calls from __read_swap_cache_async. This
should slightly reduce the overhead of swap-in during page faults.
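
As a condensed sketch of the caller-side pattern after this change
(mirroring the read_swap_cache_async() hunk below, with error handling
and unrelated details elided):

	struct swap_info_struct *si;

	si = get_swap_device(entry);	/* pin the device; fails if racing with swapoff */
	if (!si)
		return NULL;
	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
					&page_allocated, false);
	/* ... use the folio ... */
	put_swap_device(si);		/* unpin once done with the entry */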
Signed-off-by: Kairui Song
Reviewed-by: Baoquan He
---
 mm/swap_state.c | 14 ++++++++------
 mm/zswap.c      |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index a54b035d6a6c..50840a2887a5 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -426,17 +426,13 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
 		bool skip_if_exists)
 {
-	struct swap_info_struct *si;
+	struct swap_info_struct *si = swp_swap_info(entry);
 	struct folio *folio;
 	struct folio *new_folio = NULL;
 	struct folio *result = NULL;
 	void *shadow = NULL;
 
 	*new_page_allocated = false;
-	si = get_swap_device(entry);
-	if (!si)
-		return NULL;
-
 	for (;;) {
 		int err;
 		/*
@@ -532,7 +528,6 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	put_swap_folio(new_folio, entry);
 	folio_unlock(new_folio);
 put_and_return:
-	put_swap_device(si);
 	if (!(*new_page_allocated) && new_folio)
 		folio_put(new_folio);
 	return result;
@@ -552,11 +547,16 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		struct vm_area_struct *vma, unsigned long addr,
 		struct swap_iocb **plug)
 {
+	struct swap_info_struct *si;
 	bool page_allocated;
 	struct mempolicy *mpol;
 	pgoff_t ilx;
 	struct folio *folio;
 
+	si = get_swap_device(entry);
+	if (!si)
+		return NULL;
+
 	mpol = get_vma_policy(vma, addr, 0, &ilx);
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
@@ -564,6 +564,8 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 
 	if (page_allocated)
 		swap_read_folio(folio, plug);
+
+	put_swap_device(si);
 	return folio;
 }
 
diff --git a/mm/zswap.c b/mm/zswap.c
index ac9d299e7d0c..83dfa1f9e689 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1051,14 +1051,20 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	struct folio *folio;
 	struct mempolicy *mpol;
 	bool folio_was_allocated;
+	struct swap_info_struct *si;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
 	};
 
 	/* try to allocate swap cache folio */
+	si = get_swap_device(swpentry);
+	if (!si)
+		return -EEXIST;
+
 	mpol = get_task_policy(current);
 	folio = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
 				NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
+	put_swap_device(si);
 	if (!folio)
 		return -ENOMEM;

From patchwork Fri Feb 14 17:57:06 2025
From: Kairui Song
Subject: [PATCH 4/7] mm, swap: don't update the counter up-front
Date: Sat, 15 Feb 2025 01:57:06 +0800
Message-ID: <20250214175709.76029-5-ryncsn@gmail.com>
In-Reply-To: <20250214175709.76029-1-ryncsn@gmail.com>
References: <20250214175709.76029-1-ryncsn@gmail.com>

From: Kairui Song

The counter update before allocation was useful for avoiding
unnecessary scans when the device is full: the allocator could abort
early if the counter indicated the device was full. But that is an
uncommon case, and scanning a full device is now very fast, so the
up-front update no longer helps. Remove it and simplify the slot
allocation logic.
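
In essence the accounting moves from pessimistic pre-reservation to
charging only what was actually allocated. A sketch of the two schemes
(alloc_slots() is a hypothetical stand-in for the scan loop in
get_swap_pages(), not a real function):

	/* Before: reserve up-front, then return the unused part */
	atomic_long_sub(n_goal * size, &nr_swap_pages);
	n_ret = alloc_slots(n_goal);
	if (n_ret < n_goal)
		atomic_long_add((long)(n_goal - n_ret) * size, &nr_swap_pages);

	/* After: allocate first, then subtract exactly what was handed out */
	n_ret = alloc_slots(n_goal);
	atomic_long_sub(n_ret * size, &nr_swap_pages);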
Signed-off-by: Kairui Song
---
 mm/swapfile.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 449e388a6fec..ae3bd0a862fc 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1208,22 +1208,10 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 	int order = swap_entry_order(entry_order);
 	unsigned long size = 1 << order;
 	struct swap_info_struct *si, *next;
-	long avail_pgs;
 	int n_ret = 0;
 	int node;
 
 	spin_lock(&swap_avail_lock);
-
-	avail_pgs = atomic_long_read(&nr_swap_pages) / size;
-	if (avail_pgs <= 0) {
-		spin_unlock(&swap_avail_lock);
-		goto noswap;
-	}
-
-	n_goal = min3((long)n_goal, (long)SWAP_BATCH, avail_pgs);
-
-	atomic_long_sub(n_goal * size, &nr_swap_pages);
-
 start_over:
 	node = numa_node_id();
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
@@ -1257,10 +1245,8 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 	spin_unlock(&swap_avail_lock);
 
 check_out:
-	if (n_ret < n_goal)
-		atomic_long_add((long)(n_goal - n_ret) * size,
-				&nr_swap_pages);
-noswap:
+	atomic_long_sub(n_ret * size, &nr_swap_pages);
+
 	return n_ret;
 }

From patchwork Fri Feb 14 17:57:07 2025
From: Kairui Song
Subject: [PATCH 5/7] mm, swap: use percpu cluster as allocation fast path
Date: Sat, 15 Feb 2025 01:57:07 +0800
Message-ID: <20250214175709.76029-6-ryncsn@gmail.com>
In-Reply-To: <20250214175709.76029-1-ryncsn@gmail.com>
References: <20250214175709.76029-1-ryncsn@gmail.com>

From: Kairui Song

The current allocation workflow first traverses the plist with a global
lock held, and after choosing a device, it uses the percpu cluster on
that swap device. This commit moves the percpu cluster variable out of
being tied to individual swap devices, making it a global percpu
variable that is used directly for allocation as a fast path.

The global percpu cluster variable will never point to an HDD device,
and allocation on HDD devices remains globally serialized. This
improves the allocator performance and prepares for the removal of the
slot cache in later commits.

There shouldn't be much observable behavior change, except one thing:
this changes how swap device allocation rotation works. Currently, each
allocation rotates the plist, and because of the slot cache (64
entries), swap devices of the same priority are rotated for every 64
entries consumed. High-order allocations are different: they bypass the
slot cache, so the swap device is rotated for every 16K, 32K, or up to
2M allocation. The rotation rule was never clearly defined or
documented, and it was changed several times without being mentioned.
After this commit, once the slot cache is gone in later commits, swap
device rotation will happen for every consumed cluster. Ideally,
non-HDD devices will be rotated once 2M of space has been consumed for
each order, which seems reasonable. HDD devices are rotated for every
allocation regardless of the allocation order, which should be OK and
is trivial.
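
A condensed sketch of the resulting allocation order (names follow the
diff below; the control flow is simplified and the plist slow path is
abbreviated to a comment):

	local_lock(&percpu_swap_cluster.lock);
	/* Fast path: reuse the (si, offset) pair this CPU cached on its
	 * last successful allocation of the same order. */
	n_ret = swap_alloc_fast(swp_entries, SWAP_HAS_CACHE, order, n_goal);
	if (n_ret < n_goal) {
		/* Slow path: rotate the plist, pick a device, and call
		 * scan_swap_map_slots() on it; a successful cluster scan
		 * re-caches (si, offset) for the next fast-path hit. */
	}
	local_unlock(&percpu_swap_cluster.lock);

Since a cluster covers 2M of swap space, caching one cluster per order
per CPU is what yields the "rotate per 2M consumed" behavior described
above.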
Signed-off-by: Kairui Song
---
 include/linux/swap.h |  11 ++--
 mm/swapfile.c        | 120 +++++++++++++++++++++++++++----------------
 2 files changed, 79 insertions(+), 52 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2fe91c293636..a8d84f22357e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -284,12 +284,10 @@ enum swap_cluster_flags {
 #endif
 
 /*
- * We assign a cluster to each CPU, so each CPU can allocate swap entry from
- * its own cluster and swapout sequentially. The purpose is to optimize swapout
- * throughput.
+ * We keep using same cluster for rotating device so swapout will be sequential.
+ * The purpose is to optimize swapout throughput on rotating device.
  */
-struct percpu_cluster {
-	local_lock_t lock; /* Protect the percpu_cluster above */
+struct swap_sequential_cluster {
 	unsigned int next[SWAP_NR_ORDERS]; /* Likely next allocation offset */
 };
 
@@ -315,8 +313,7 @@ struct swap_info_struct {
 	atomic_long_t frag_cluster_nr[SWAP_NR_ORDERS];
 	unsigned int pages;		/* total of usable pages of swap */
 	atomic_long_t inuse_pages;	/* number of those currently in use */
-	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
-	struct percpu_cluster *global_cluster; /* Use one global cluster for rotating device */
+	struct swap_sequential_cluster *global_cluster; /* Use one global cluster for rotating device */
 	spinlock_t global_cluster_lock;	/* Serialize usage of global cluster */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ae3bd0a862fc..791cd7ed5bdf 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -116,6 +116,18 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
 
 atomic_t nr_rotate_swap = ATOMIC_INIT(0);
 
+struct percpu_swap_cluster {
+	struct swap_info_struct *si;
+	unsigned long offset[SWAP_NR_ORDERS];
+	local_lock_t lock;
+};
+
+static DEFINE_PER_CPU(struct percpu_swap_cluster, percpu_swap_cluster) = {
+	.si = NULL,
+	.offset = { SWAP_ENTRY_INVALID },
+	.lock = INIT_LOCAL_LOCK(),
+};
+
 static struct swap_info_struct *swap_type_to_swap_info(int type)
 {
 	if (type >= MAX_SWAPFILES)
@@ -548,7 +560,7 @@ static bool swap_do_scheduled_discard(struct swap_info_struct *si)
 		ci = list_first_entry(&si->discard_clusters, struct swap_cluster_info, list);
 		/*
 		 * Delete the cluster from list to prepare for discard, but keep
-		 * the CLUSTER_FLAG_DISCARD flag, there could be percpu_cluster
+		 * the CLUSTER_FLAG_DISCARD flag, percpu_swap_cluster could be
 		 * pointing to it, or ran into by relocate_cluster.
 		 */
 		list_del(&ci->list);
@@ -815,10 +827,12 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 out:
 	relocate_cluster(si, ci);
 	unlock_cluster(ci);
-	if (si->flags & SWP_SOLIDSTATE)
-		__this_cpu_write(si->percpu_cluster->next[order], next);
-	else
+	if (si->flags & SWP_SOLIDSTATE) {
+		__this_cpu_write(percpu_swap_cluster.si, si);
+		__this_cpu_write(percpu_swap_cluster.offset[order], next);
+	} else {
 		si->global_cluster->next[order] = next;
+	}
 	return found;
 }
 
@@ -869,9 +883,8 @@ static void swap_reclaim_work(struct work_struct *work)
 }
 
 /*
- * Try to get swap entries with specified order from current cpu's swap entry
- * pool (a cluster). This might involve allocating a new cluster for current CPU
- * too.
+ * Try to allocate swap entries with specified order and try set a new
+ * cluster for current CPU too.
  */
 static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int order,
 					      unsigned char usage)
@@ -879,18 +892,12 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 	struct swap_cluster_info *ci;
 	unsigned int offset, found = 0;
 
-	if (si->flags & SWP_SOLIDSTATE) {
-		/* Fast path using per CPU cluster */
-		local_lock(&si->percpu_cluster->lock);
-		offset = __this_cpu_read(si->percpu_cluster->next[order]);
-	} else {
+	if (!(si->flags & SWP_SOLIDSTATE)) {
 		/* Serialize HDD SWAP allocation for each device. */
 		spin_lock(&si->global_cluster_lock);
 		offset = si->global_cluster->next[order];
-	}
-
-	if (offset) {
 		ci = lock_cluster(si, offset);
+
 		/* Cluster could have been used by another order */
 		if (cluster_is_usable(ci, order)) {
 			if (cluster_is_empty(ci))
@@ -980,9 +987,7 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		}
 	}
 done:
-	if (si->flags & SWP_SOLIDSTATE)
-		local_unlock(&si->percpu_cluster->lock);
-	else
+	if (!(si->flags & SWP_SOLIDSTATE))
 		spin_unlock(&si->global_cluster_lock);
 	return found;
 }
@@ -1203,6 +1208,41 @@ static bool get_swap_device_info(struct swap_info_struct *si)
 	return true;
 }
 
+/*
+ * Fast path try to get swap entries with specified order from current
+ * CPU's swap entry pool (a cluster).
+ */
+static int swap_alloc_fast(swp_entry_t entries[],
+			   unsigned char usage,
+			   int order, int n_goal)
+{
+	struct swap_cluster_info *ci;
+	struct swap_info_struct *si;
+	unsigned int offset, found;
+	int n_ret = 0;
+
+	n_goal = min(n_goal, SWAP_BATCH);
+
+	si = __this_cpu_read(percpu_swap_cluster.si);
+	offset = __this_cpu_read(percpu_swap_cluster.offset[order]);
+	if (!si || !offset || !get_swap_device_info(si))
+		return 0;
+
+	while (offset) {
+		ci = lock_cluster(si, offset);
+		found = alloc_swap_scan_cluster(si, ci, offset, order, usage);
+		if (!found)
+			break;
+		entries[n_ret++] = swp_entry(si->type, found);
+		if (n_ret == n_goal)
+			break;
+		offset = __this_cpu_read(percpu_swap_cluster.offset[order]);
+	}
+
+	put_swap_device(si);
+	return n_ret;
+}
+
 int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 {
 	int order = swap_entry_order(entry_order);
@@ -1211,19 +1251,28 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 	int n_ret = 0;
 	int node;
 
+	/* Fast path using percpu cluster */
+	local_lock(&percpu_swap_cluster.lock);
+	n_ret = swap_alloc_fast(swp_entries,
+				SWAP_HAS_CACHE,
+				order, n_goal);
+	if (n_ret == n_goal)
+		goto out;
+
+	n_goal = min_t(int, n_goal - n_ret, SWAP_BATCH);
+	/* Rotate the device and switch to a new cluster */
 	spin_lock(&swap_avail_lock);
 start_over:
 	node = numa_node_id();
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
-		/* requeue si to after same-priority siblings */
 		plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
 		spin_unlock(&swap_avail_lock);
 		if (get_swap_device_info(si)) {
-			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
-						    n_goal, swp_entries, order);
+			n_ret += scan_swap_map_slots(si, SWAP_HAS_CACHE, n_goal,
+						     swp_entries + n_ret, order);
 			put_swap_device(si);
 			if (n_ret || size > 1)
-				goto check_out;
+				goto out;
 		}
 
 		spin_lock(&swap_avail_lock);
@@ -1241,12 +1290,10 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 		if (plist_node_empty(&next->avail_lists[node]))
 			goto start_over;
 	}
-
 	spin_unlock(&swap_avail_lock);
-
-check_out:
+out:
+	local_unlock(&percpu_swap_cluster.lock);
 	atomic_long_sub(n_ret * size, &nr_swap_pages);
-
 	return n_ret;
 }
 
@@ -2733,8 +2780,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	arch_swap_invalidate_area(p->type);
 	zswap_swapoff(p->type);
 	mutex_unlock(&swapon_mutex);
-	free_percpu(p->percpu_cluster);
-	p->percpu_cluster = NULL;
 	kfree(p->global_cluster);
 	p->global_cluster = NULL;
 	vfree(swap_map);
@@ -3133,7 +3178,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	unsigned long nr_clusters = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
 	struct swap_cluster_info *cluster_info;
 	unsigned long i, j, idx;
-	int cpu, err = -ENOMEM;
+	int err = -ENOMEM;
 
 	cluster_info = kvcalloc(nr_clusters, sizeof(*cluster_info), GFP_KERNEL);
 	if (!cluster_info)
@@ -3142,20 +3187,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	for (i = 0; i < nr_clusters; i++)
 		spin_lock_init(&cluster_info[i].lock);
 
-	if (si->flags & SWP_SOLIDSTATE) {
-		si->percpu_cluster = alloc_percpu(struct percpu_cluster);
-		if (!si->percpu_cluster)
-			goto err_free;
-
-		for_each_possible_cpu(cpu) {
-			struct percpu_cluster *cluster;
-
-			cluster = per_cpu_ptr(si->percpu_cluster, cpu);
-			for (i = 0; i < SWAP_NR_ORDERS; i++)
-				cluster->next[i] = SWAP_ENTRY_INVALID;
-			local_lock_init(&cluster->lock);
-		}
-	} else {
+	if (!(si->flags & SWP_SOLIDSTATE)) {
 		si->global_cluster = kmalloc(sizeof(*si->global_cluster),
 				     GFP_KERNEL);
 		if (!si->global_cluster)
@@ -3432,8 +3464,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 bad_swap_unlock_inode:
 	inode_unlock(inode);
 bad_swap:
-	free_percpu(si->percpu_cluster);
-	si->percpu_cluster = NULL;
 	kfree(si->global_cluster);
 	si->global_cluster = NULL;
 	inode = NULL;
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
 "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Kalesh Singh,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 6/7] mm, swap: remove swap slot cache
Date: Sat, 15 Feb 2025 01:57:08 +0800
Message-ID: <20250214175709.76029-7-ryncsn@gmail.com>
In-Reply-To: <20250214175709.76029-1-ryncsn@gmail.com>
References: <20250214175709.76029-1-ryncsn@gmail.com>
MIME-Version: 1.0

From: Kairui Song

The swap slot cache is no longer needed; remove it and all related code.

- vm-scalability with `usemem --init-time -O -y -x -R -31 1G`, in a 12G
  memory cgroup using simulated pmem as SWAP (32G pmem, 32 CPUs), 16
  test runs for each case, measuring the total throughput:

                         Before (KB/s) (stdev)    After (KB/s) (stdev)
  Random (4K):           424907.60 (24410.78)     414745.92 (34554.78)
  Random (64K):          163308.82 (11635.72)     167314.50 (18434.99)
  Sequential (4K, !-R):  6150056.79 (103205.90)   6321469.06 (115878.16)

  The performance changes are below the noise level.

- Kernel build with `make -j96`, using 4K folios with a 1.5G memory
  cgroup limit and 64K folios with a 2G limit, on top of tmpfs, 12 test
  runs, measuring the system time:

                   Before (s) (stdev)    After (s) (stdev)
  make -j96 (4K):  6445.69 (61.95)       6408.80 (69.46)
  make -j96 (64K): 6841.71 (409.04)      6437.99 (435.55)

  Similar to the above, the 64K mTHP case shows a slight improvement.
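To make the "below the noise level" claim concrete: a change can be treated as
noise when the shift of the mean is small compared with the spread of the
runs. A crude two-sigma overlap check over the numbers in the table above
(throwaway C written for this note, not part of the patch):

	#include <math.h>
	#include <stdio.h>

	/* Flag a result only when the delta of the means exceeds twice the
	 * combined standard deviation of the two measurement series. */
	static void check(const char *name, double m0, double s0,
			  double m1, double s1)
	{
		double delta = fabs(m1 - m0);
		double noise = sqrt(s0 * s0 + s1 * s1);

		printf("%-16s delta=%10.2f 2*noise=%10.2f -> %s\n", name,
		       delta, 2 * noise,
		       delta < 2 * noise ? "within noise" : "significant");
	}

	int main(void)
	{
		check("Random (4K)", 424907.60, 24410.78, 414745.92, 34554.78);
		check("Random (64K)", 163308.82, 11635.72, 167314.50, 18434.99);
		check("Sequential", 6150056.79, 103205.90, 6321469.06, 115878.16);
		return 0;	/* build with: cc noise.c -lm */
	}

All three rows fall within the two-sigma band, consistent with the
conclusion stated above.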
Signed-off-by: Kairui Song --- include/linux/swap.h | 2 - include/linux/swap_slots.h | 28 ---- mm/Makefile | 2 +- mm/swap_slots.c | 295 ------------------------------------- mm/swap_state.c | 8 +- mm/swapfile.c | 173 ++++++++-------------- 6 files changed, 64 insertions(+), 444 deletions(-) delete mode 100644 include/linux/swap_slots.h delete mode 100644 mm/swap_slots.c diff --git a/include/linux/swap.h b/include/linux/swap.h index a8d84f22357e..456833705ea0 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -465,7 +465,6 @@ void free_pages_and_swap_cache(struct encoded_page **, int); extern atomic_long_t nr_swap_pages; extern long total_swap_pages; extern atomic_t nr_rotate_swap; -extern bool has_usable_swap(void); /* Swap 50% full? Release swapcache more aggressively.. */ static inline bool vm_swap_full(void) @@ -489,7 +488,6 @@ extern void swap_shmem_alloc(swp_entry_t, int); extern int swap_duplicate(swp_entry_t); extern int swapcache_prepare(swp_entry_t entry, int nr); extern void swap_free_nr(swp_entry_t entry, int nr_pages); -extern void swapcache_free_entries(swp_entry_t *entries, int n); extern void free_swap_and_cache_nr(swp_entry_t entry, int nr); int swap_type_of(dev_t device, sector_t offset); int find_first_swap(dev_t *device); diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h deleted file mode 100644 index 840aec3523b2..000000000000 --- a/include/linux/swap_slots.h +++ /dev/null @@ -1,28 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _LINUX_SWAP_SLOTS_H -#define _LINUX_SWAP_SLOTS_H - -#include -#include -#include - -#define SWAP_SLOTS_CACHE_SIZE SWAP_BATCH -#define THRESHOLD_ACTIVATE_SWAP_SLOTS_CACHE (5*SWAP_SLOTS_CACHE_SIZE) -#define THRESHOLD_DEACTIVATE_SWAP_SLOTS_CACHE (2*SWAP_SLOTS_CACHE_SIZE) - -struct swap_slots_cache { - bool lock_initialized; - struct mutex alloc_lock; /* protects slots, nr, cur */ - swp_entry_t *slots; - int nr; - int cur; - int n_ret; -}; - -void disable_swap_slots_cache_lock(void); -void reenable_swap_slots_cache_unlock(void); -void enable_swap_slots_cache(void); - -extern bool swap_slot_cache_enabled; - -#endif /* _LINUX_SWAP_SLOTS_H */ diff --git a/mm/Makefile b/mm/Makefile index 53392d2af3a5..ea16e472b294 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -75,7 +75,7 @@ ifdef CONFIG_MMU obj-$(CONFIG_ADVISE_SYSCALLS) += madvise.o endif -obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o swap_slots.o +obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o obj-$(CONFIG_ZSWAP) += zswap.o obj-$(CONFIG_HAS_DMA) += dmapool.o obj-$(CONFIG_HUGETLBFS) += hugetlb.o diff --git a/mm/swap_slots.c b/mm/swap_slots.c deleted file mode 100644 index 9c7c171df7ba..000000000000 --- a/mm/swap_slots.c +++ /dev/null @@ -1,295 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Manage cache of swap slots to be used for and returned from - * swap. - * - * Copyright(c) 2016 Intel Corporation. - * - * Author: Tim Chen - * - * We allocate the swap slots from the global pool and put - * it into local per cpu caches. This has the advantage - * of no needing to acquire the swap_info lock every time - * we need a new slot. - * - * There is also opportunity to simply return the slot - * to local caches without needing to acquire swap_info - * lock. We do not reuse the returned slots directly but - * move them back to the global pool in a batch. This - * allows the slots to coalesce and reduce fragmentation. 
- * - * The swap entry allocated is marked with SWAP_HAS_CACHE - * flag in map_count that prevents it from being allocated - * again from the global pool. - * - * The swap slots cache is protected by a mutex instead of - * a spin lock as when we search for slots with scan_swap_map, - * we can possibly sleep. - */ - -#include -#include -#include -#include -#include -#include -#include - -static DEFINE_PER_CPU(struct swap_slots_cache, swp_slots); -static bool swap_slot_cache_active; -bool swap_slot_cache_enabled; -static bool swap_slot_cache_initialized; -static DEFINE_MUTEX(swap_slots_cache_mutex); -/* Serialize swap slots cache enable/disable operations */ -static DEFINE_MUTEX(swap_slots_cache_enable_mutex); - -static void __drain_swap_slots_cache(void); - -#define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled) - -static void deactivate_swap_slots_cache(void) -{ - mutex_lock(&swap_slots_cache_mutex); - swap_slot_cache_active = false; - __drain_swap_slots_cache(); - mutex_unlock(&swap_slots_cache_mutex); -} - -static void reactivate_swap_slots_cache(void) -{ - mutex_lock(&swap_slots_cache_mutex); - swap_slot_cache_active = true; - mutex_unlock(&swap_slots_cache_mutex); -} - -/* Must not be called with cpu hot plug lock */ -void disable_swap_slots_cache_lock(void) -{ - mutex_lock(&swap_slots_cache_enable_mutex); - swap_slot_cache_enabled = false; - if (swap_slot_cache_initialized) { - /* serialize with cpu hotplug operations */ - cpus_read_lock(); - __drain_swap_slots_cache(); - cpus_read_unlock(); - } -} - -static void __reenable_swap_slots_cache(void) -{ - swap_slot_cache_enabled = has_usable_swap(); -} - -void reenable_swap_slots_cache_unlock(void) -{ - __reenable_swap_slots_cache(); - mutex_unlock(&swap_slots_cache_enable_mutex); -} - -static bool check_cache_active(void) -{ - long pages; - - if (!swap_slot_cache_enabled) - return false; - - pages = get_nr_swap_pages(); - if (!swap_slot_cache_active) { - if (pages > num_online_cpus() * - THRESHOLD_ACTIVATE_SWAP_SLOTS_CACHE) - reactivate_swap_slots_cache(); - goto out; - } - - /* if global pool of slot caches too low, deactivate cache */ - if (pages < num_online_cpus() * THRESHOLD_DEACTIVATE_SWAP_SLOTS_CACHE) - deactivate_swap_slots_cache(); -out: - return swap_slot_cache_active; -} - -static int alloc_swap_slot_cache(unsigned int cpu) -{ - struct swap_slots_cache *cache; - swp_entry_t *slots; - - /* - * Do allocation outside swap_slots_cache_mutex - * as kvzalloc could trigger reclaim and folio_alloc_swap, - * which can lock swap_slots_cache_mutex. - */ - slots = kvcalloc(SWAP_SLOTS_CACHE_SIZE, sizeof(swp_entry_t), - GFP_KERNEL); - if (!slots) - return -ENOMEM; - - mutex_lock(&swap_slots_cache_mutex); - cache = &per_cpu(swp_slots, cpu); - if (cache->slots) { - /* cache already allocated */ - mutex_unlock(&swap_slots_cache_mutex); - - kvfree(slots); - - return 0; - } - - if (!cache->lock_initialized) { - mutex_init(&cache->alloc_lock); - cache->lock_initialized = true; - } - cache->nr = 0; - cache->cur = 0; - cache->n_ret = 0; - /* - * We initialized alloc_lock and free_lock earlier. We use - * !cache->slots or !cache->slots_ret to know if it is safe to acquire - * the corresponding lock and use the cache. Memory barrier below - * ensures the assumption. 
- */ - mb(); - cache->slots = slots; - mutex_unlock(&swap_slots_cache_mutex); - return 0; -} - -static void drain_slots_cache_cpu(unsigned int cpu, bool free_slots) -{ - struct swap_slots_cache *cache; - - cache = &per_cpu(swp_slots, cpu); - if (cache->slots) { - mutex_lock(&cache->alloc_lock); - swapcache_free_entries(cache->slots + cache->cur, cache->nr); - cache->cur = 0; - cache->nr = 0; - if (free_slots && cache->slots) { - kvfree(cache->slots); - cache->slots = NULL; - } - mutex_unlock(&cache->alloc_lock); - } -} - -static void __drain_swap_slots_cache(void) -{ - unsigned int cpu; - - /* - * This function is called during - * 1) swapoff, when we have to make sure no - * left over slots are in cache when we remove - * a swap device; - * 2) disabling of swap slot cache, when we run low - * on swap slots when allocating memory and need - * to return swap slots to global pool. - * - * We cannot acquire cpu hot plug lock here as - * this function can be invoked in the cpu - * hot plug path: - * cpu_up -> lock cpu_hotplug -> cpu hotplug state callback - * -> memory allocation -> direct reclaim -> folio_alloc_swap - * -> drain_swap_slots_cache - * - * Hence the loop over current online cpu below could miss cpu that - * is being brought online but not yet marked as online. - * That is okay as we do not schedule and run anything on a - * cpu before it has been marked online. Hence, we will not - * fill any swap slots in slots cache of such cpu. - * There are no slots on such cpu that need to be drained. - */ - for_each_online_cpu(cpu) - drain_slots_cache_cpu(cpu, false); -} - -static int free_slot_cache(unsigned int cpu) -{ - mutex_lock(&swap_slots_cache_mutex); - drain_slots_cache_cpu(cpu, true); - mutex_unlock(&swap_slots_cache_mutex); - return 0; -} - -void enable_swap_slots_cache(void) -{ - mutex_lock(&swap_slots_cache_enable_mutex); - if (!swap_slot_cache_initialized) { - int ret; - - ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "swap_slots_cache", - alloc_swap_slot_cache, free_slot_cache); - if (WARN_ONCE(ret < 0, "Cache allocation failed (%s), operating " - "without swap slots cache.\n", __func__)) - goto out_unlock; - - swap_slot_cache_initialized = true; - } - - __reenable_swap_slots_cache(); -out_unlock: - mutex_unlock(&swap_slots_cache_enable_mutex); -} - -/* called with swap slot cache's alloc lock held */ -static int refill_swap_slots_cache(struct swap_slots_cache *cache) -{ - if (!use_swap_slot_cache) - return 0; - - cache->cur = 0; - if (swap_slot_cache_active) - cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE, - cache->slots, 0); - - return cache->nr; -} - -swp_entry_t folio_alloc_swap(struct folio *folio) -{ - swp_entry_t entry; - struct swap_slots_cache *cache; - - entry.val = 0; - - if (folio_test_large(folio)) { - if (IS_ENABLED(CONFIG_THP_SWAP)) - get_swap_pages(1, &entry, folio_order(folio)); - goto out; - } - - /* - * Preemption is allowed here, because we may sleep - * in refill_swap_slots_cache(). But it is safe, because - * accesses to the per-CPU data structure are protected by the - * mutex cache->alloc_lock. - * - * The alloc path here does not touch cache->slots_ret - * so cache->free_lock is not taken. 
- */ - cache = raw_cpu_ptr(&swp_slots); - - if (likely(check_cache_active() && cache->slots)) { - mutex_lock(&cache->alloc_lock); - if (cache->slots) { -repeat: - if (cache->nr) { - entry = cache->slots[cache->cur]; - cache->slots[cache->cur++].val = 0; - cache->nr--; - } else if (refill_swap_slots_cache(cache)) { - goto repeat; - } - } - mutex_unlock(&cache->alloc_lock); - if (entry.val) - goto out; - } - - get_swap_pages(1, &entry, 0); -out: - if (mem_cgroup_try_charge_swap(folio, entry)) { - put_swap_folio(folio, entry); - entry.val = 0; - } - return entry; -} diff --git a/mm/swap_state.c b/mm/swap_state.c index 50840a2887a5..2b5744e211cd 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -20,7 +20,6 @@ #include #include #include -#include #include #include #include "internal.h" @@ -447,13 +446,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Just skip read ahead for unused swap slot. - * During swap_off when swap_slot_cache is disabled, - * we have to handle the race between putting - * swap entry in swap cache and marking swap slot - * as SWAP_HAS_CACHE. That's done in later part of code or - * else swap_off will be aborted if we return NULL. */ - if (!swap_entry_swapped(si, entry) && swap_slot_cache_enabled) + if (!swap_entry_swapped(si, entry)) goto put_and_return; /* diff --git a/mm/swapfile.c b/mm/swapfile.c index 791cd7ed5bdf..66c8869ef346 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include @@ -892,6 +891,13 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o struct swap_cluster_info *ci; unsigned int offset, found = 0; + /* + * Swapfile is not block device so unable + * to allocate large entries. + */ + if (order && !(si->flags & SWP_BLKDEV)) + return 0; + if (!(si->flags & SWP_SOLIDSTATE)) { /* Serialize HDD SWAP allocation for each device. */ spin_lock(&si->global_cluster_lock); @@ -1155,43 +1161,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, swap_usage_sub(si, nr_entries); } -static int scan_swap_map_slots(struct swap_info_struct *si, - unsigned char usage, int nr, - swp_entry_t slots[], int order) -{ - unsigned int nr_pages = 1 << order; - int n_ret = 0; - - if (order > 0) { - /* - * Should not even be attempting large allocations when huge - * page swap is disabled. Warn and fail the allocation. - */ - if (!IS_ENABLED(CONFIG_THP_SWAP) || - nr_pages > SWAPFILE_CLUSTER) { - VM_WARN_ON_ONCE(1); - return 0; - } - - /* - * Swapfile is not block device so unable - * to allocate large entries. - */ - if (!(si->flags & SWP_BLKDEV)) - return 0; - } - - while (n_ret < nr) { - unsigned long offset = cluster_alloc_swap_entry(si, order, usage); - - if (!offset) - break; - slots[n_ret++] = swp_entry(si->type, offset); - } - - return n_ret; -} - static bool get_swap_device_info(struct swap_info_struct *si) { if (!percpu_ref_tryget_live(&si->users)) @@ -1212,54 +1181,53 @@ static bool get_swap_device_info(struct swap_info_struct *si) * Fast path try to get swap entries with specified order from current * CPU's swap entry pool (a cluster). 
*/ -static int swap_alloc_fast(swp_entry_t entries[], +static int swap_alloc_fast(swp_entry_t *entry, unsigned char usage, - int order, int n_goal) + int order) { struct swap_cluster_info *ci; struct swap_info_struct *si; - unsigned int offset, found; - int n_ret = 0; - - n_goal = min(n_goal, SWAP_BATCH); + unsigned int offset, found = SWAP_ENTRY_INVALID; si = __this_cpu_read(percpu_swap_cluster.si); offset = __this_cpu_read(percpu_swap_cluster.offset[order]); if (!si || !offset || !get_swap_device_info(si)) - return 0; + return false; - while (offset) { - ci = lock_cluster(si, offset); - found = alloc_swap_scan_cluster(si, ci, offset, order, usage); - if (!found) - break; - entries[n_ret++] = swp_entry(si->type, found); - if (n_ret == n_goal) - break; - offset = __this_cpu_read(percpu_swap_cluster.offset[order]); - } + ci = lock_cluster(si, offset); + found = alloc_swap_scan_cluster(si, ci, offset, order, usage); + if (found) + *entry = swp_entry(si->type, found); put_swap_device(si); - return n_ret; + return !!found; } -int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) +swp_entry_t folio_alloc_swap(struct folio *folio) { - int order = swap_entry_order(entry_order); - unsigned long size = 1 << order; + unsigned int order = folio_order(folio); + unsigned int size = 1 << order; struct swap_info_struct *si, *next; - int n_ret = 0; + swp_entry_t entry = {}; + unsigned long offset; int node; + if (order) { + /* + * Should not even be attempting large allocations when huge + * page swap is disabled. Warn and fail the allocation. + */ + if (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER) { + VM_WARN_ON_ONCE(1); + return entry; + } + } + /* Fast path using percpu cluster */ local_lock(&percpu_swap_cluster.lock); - n_ret = swap_alloc_fast(swp_entries, - SWAP_HAS_CACHE, - order, n_goal); - if (n_ret == n_goal) - goto out; + if (swap_alloc_fast(&entry, SWAP_HAS_CACHE, order)) + goto out_alloced; - n_goal = min_t(int, n_goal - n_ret, SWAP_BATCH); /* Rotate the device and switch to a new cluster */ spin_lock(&swap_avail_lock); start_over: @@ -1268,11 +1236,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]); spin_unlock(&swap_avail_lock); if (get_swap_device_info(si)) { - n_ret += scan_swap_map_slots(si, SWAP_HAS_CACHE, n_goal, - swp_entries + n_ret, order); + offset = cluster_alloc_swap_entry(si, order, SWAP_HAS_CACHE); put_swap_device(si); - if (n_ret || size > 1) - goto out; + if (offset) { + entry = swp_entry(si->type, offset); + goto out_alloced; + } + if (order) + goto out_failed; } spin_lock(&swap_avail_lock); @@ -1291,10 +1262,20 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) goto start_over; } spin_unlock(&swap_avail_lock); -out: +out_failed: + local_unlock(&percpu_swap_cluster.lock); + return entry; + +out_alloced: local_unlock(&percpu_swap_cluster.lock); - atomic_long_sub(n_ret * size, &nr_swap_pages); - return n_ret; + if (mem_cgroup_try_charge_swap(folio, entry)) { + put_swap_folio(folio, entry); + entry.val = 0; + } else { + atomic_long_sub(size, &nr_swap_pages); + } + + return entry; } static struct swap_info_struct *_swap_info_get(swp_entry_t entry) @@ -1590,25 +1571,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) unlock_cluster(ci); } -void swapcache_free_entries(swp_entry_t *entries, int n) -{ - int i; - struct swap_cluster_info *ci; - struct swap_info_struct *si = NULL; - - if (n <= 0) - return; - - for (i = 0; 
i < n; ++i) { - si = _swap_info_get(entries[i]); - if (si) { - ci = lock_cluster(si, swp_offset(entries[i])); - swap_entry_range_free(si, ci, entries[i], 1); - unlock_cluster(ci); - } - } -} - int __swap_count(swp_entry_t entry) { struct swap_info_struct *si = swp_swap_info(entry); @@ -1849,6 +1811,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr) swp_entry_t get_swap_page_of_type(int type) { struct swap_info_struct *si = swap_type_to_swap_info(type); + unsigned long offset; swp_entry_t entry = {0}; if (!si) @@ -1856,8 +1819,13 @@ swp_entry_t get_swap_page_of_type(int type) /* This is called for allocating swap entry, not cache */ if (get_swap_device_info(si)) { - if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) - atomic_long_dec(&nr_swap_pages); + if (si->flags & SWP_WRITEOK) { + offset = cluster_alloc_swap_entry(si, 0, 1); + if (offset) { + entry = swp_entry(si->type, offset); + atomic_long_dec(&nr_swap_pages); + } + } put_swap_device(si); } fail: @@ -2623,16 +2591,6 @@ static bool __has_usable_swap(void) return !plist_head_empty(&swap_active_head); } -bool has_usable_swap(void) -{ - bool ret; - - spin_lock(&swap_lock); - ret = __has_usable_swap(); - spin_unlock(&swap_lock); - return ret; -} - /* * Called after clearing SWP_WRITEOK, ensures cluster_alloc_range * see the updated flags, so there will be no more allocations. @@ -2724,8 +2682,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) wait_for_allocation(p); - disable_swap_slots_cache_lock(); - set_current_oom_origin(); err = try_to_unuse(p->type); clear_current_oom_origin(); @@ -2733,12 +2689,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) if (err) { /* re-insert swap space back into swap_list */ reinsert_swap_info(p); - reenable_swap_slots_cache_unlock(); goto out_dput; } - reenable_swap_slots_cache_unlock(); - /* * Wait for swap operations protected by get/put_swap_device() * to complete. 
 * Because of synchronize_rcu() here, all swap
@@ -3487,8 +3440,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	putname(name);
 	if (inode)
 		inode_unlock(inode);
-	if (!error)
-		enable_swap_slots_cache();
 	return error;
 }
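With the slot cache gone, a miss on the per-CPU cluster falls straight
through to rotating the priority list of swap devices and remembering
whichever device satisfied the request. A compact userspace model of that
rotate-and-remember fallback (device, try_alloc and cached_dev are invented
stand-ins for this sketch, not kernel symbols):

	#include <stdbool.h>
	#include <stdio.h>

	struct device { const char *name; int free; };

	static struct device devices[] = {
		{ "pmem0", 0 },		/* exhausted: forces rotation */
		{ "sda2",  4 },
	};

	/* Like percpu_swap_cluster.si: the device this CPU used last. */
	static struct device *cached_dev = &devices[0];

	static bool try_alloc(struct device *dev)
	{
		if (dev->free <= 0)
			return false;
		dev->free--;
		printf("allocated from %s\n", dev->name);
		return true;
	}

	static bool alloc_entry(void)
	{
		/* Fast path: the cached device. */
		if (cached_dev && try_alloc(cached_dev))
			return true;

		/* Slow path: rotate through all devices by priority. */
		for (unsigned int i = 0; i < sizeof(devices) / sizeof(devices[0]); i++) {
			if (try_alloc(&devices[i])) {
				cached_dev = &devices[i];	/* remember for next time */
				return true;
			}
		}
		return false;
	}

	int main(void)
	{
		while (alloc_entry())
			;
		puts("swap full");
		return 0;
	}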
From patchwork Fri Feb 14 17:57:09 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13975441

From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
 "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Kalesh Singh,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 7/7] mm, swap: simplify folio swap allocation
Date: Sat, 15 Feb 2025 01:57:09 +0800
Message-ID: <20250214175709.76029-8-ryncsn@gmail.com>
In-Reply-To: <20250214175709.76029-1-ryncsn@gmail.com>
References: <20250214175709.76029-1-ryncsn@gmail.com>
From: Kairui Song

With the slot cache gone, the allocation helpers can be cleaned up even
further. folio_alloc_swap() is now the only entry point for allocating swap
space and adding the folio to the swap cache (the suspend path being the
exception), making it the opposite of folio_free_swap().
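The ordering that folio_alloc_swap() centralizes -- reserve an entry, charge
it, insert into the swap cache, and unwind through a single failure path --
can be sketched in userspace as follows (reserve_entry, charge, cache_insert
and release_entry are hypothetical helpers, not kernel APIs):

	#include <stdbool.h>
	#include <stdio.h>

	static bool reserve_entry(int *entry) { *entry = 42; return true; }
	static bool charge(int entry)         { (void)entry; return true; }
	/* Simulate an insertion failure to exercise the cleanup path. */
	static bool cache_insert(int entry)   { (void)entry; return false; }
	static void release_entry(int entry)  { printf("released %d\n", entry); }

	static bool alloc_swap(void)
	{
		int entry;

		if (!reserve_entry(&entry))
			return false;
		if (!charge(entry))
			goto out_free;
		if (!cache_insert(entry))
			goto out_free;
		return true;		/* only now is the allocation accounted */

	out_free:
		release_entry(entry);	/* single cleanup path, as in the patch */
		return false;
	}

	int main(void)
	{
		printf("alloc_swap: %s\n", alloc_swap() ? "ok" : "failed");
		return 0;
	}

Keeping one failure label means a caller can treat the whole operation as
atomic: either the folio ends up in the swap cache, or nothing is leaked.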
Signed-off-by: Kairui Song
---
 include/linux/swap.h |   8 ++--
 mm/shmem.c           |  21 +++------
 mm/swap.h            |   6 ---
 mm/swap_state.c      |  57 ----------------------
 mm/swapfile.c        | 110 ++++++++++++++++++++++++++++---------------
 mm/vmscan.c          |  16 ++++++-
 6 files changed, 94 insertions(+), 124 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 456833705ea0..e799e965dac8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -478,7 +478,7 @@ static inline long get_nr_swap_pages(void)
 }

 extern void si_swapinfo(struct sysinfo *);
-swp_entry_t folio_alloc_swap(struct folio *folio);
+bool folio_alloc_swap(struct folio *folio, gfp_t gfp_mask);
 bool folio_free_swap(struct folio *folio);
 void put_swap_folio(struct folio *folio, swp_entry_t entry);
 extern swp_entry_t get_swap_page_of_type(int);
@@ -587,11 +587,9 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }

-static inline swp_entry_t folio_alloc_swap(struct folio *folio)
+static inline bool folio_alloc_swap(struct folio *folio, gfp_t gfp_mask)
 {
-	swp_entry_t entry;
-	entry.val = 0;
-	return entry;
+	return false;
 }

 static inline bool folio_free_swap(struct folio *folio)
diff --git a/mm/shmem.c b/mm/shmem.c
index b35ba250c53d..2aa206b52ff2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1546,7 +1546,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct inode *inode = mapping->host;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
-	swp_entry_t swap;
 	pgoff_t index;
 	int nr_pages;
 	bool split = false;
@@ -1628,14 +1627,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 			folio_mark_uptodate(folio);
 	}

-	swap = folio_alloc_swap(folio);
-	if (!swap.val) {
-		if (nr_pages > 1)
-			goto try_split;
-
-		goto redirty;
-	}
-
 	/*
 	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
 	 * if it's not already there.
Do it now before the folio is @@ -1648,20 +1639,20 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) if (list_empty(&info->swaplist)) list_add(&info->swaplist, &shmem_swaplist); - if (add_to_swap_cache(folio, swap, - __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN, - NULL) == 0) { + if (folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN)) { shmem_recalc_inode(inode, 0, nr_pages); - swap_shmem_alloc(swap, nr_pages); - shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap)); + swap_shmem_alloc(folio->swap, nr_pages); + shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap)); mutex_unlock(&shmem_swaplist_mutex); BUG_ON(folio_mapped(folio)); return swap_writepage(&folio->page, wbc); } + list_del_init(&info->swaplist); mutex_unlock(&shmem_swaplist_mutex); - put_swap_folio(folio, swap); + if (nr_pages > 1) + goto try_split; redirty: folio_mark_dirty(folio); if (wbc->for_reclaim) diff --git a/mm/swap.h b/mm/swap.h index ad2f121de970..0abb68091b4f 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -50,7 +50,6 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry) } void show_swap_cache_info(void); -bool add_to_swap(struct folio *folio); void *get_shadow_from_swap_cache(swp_entry_t entry); int add_to_swap_cache(struct folio *folio, swp_entry_t entry, gfp_t gfp, void **shadowp); @@ -163,11 +162,6 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping, return filemap_get_folio(mapping, index); } -static inline bool add_to_swap(struct folio *folio) -{ - return false; -} - static inline void *get_shadow_from_swap_cache(swp_entry_t entry) { return NULL; diff --git a/mm/swap_state.c b/mm/swap_state.c index 2b5744e211cd..68fd981b514f 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -166,63 +166,6 @@ void __delete_from_swap_cache(struct folio *folio, __lruvec_stat_mod_folio(folio, NR_SWAPCACHE, -nr); } -/** - * add_to_swap - allocate swap space for a folio - * @folio: folio we want to move to swap - * - * Allocate swap space for the folio and add the folio to the - * swap cache. - * - * Context: Caller needs to hold the folio lock. - * Return: Whether the folio was added to the swap cache. - */ -bool add_to_swap(struct folio *folio) -{ - swp_entry_t entry; - int err; - - VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); - VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio); - - entry = folio_alloc_swap(folio); - if (!entry.val) - return false; - - /* - * XArray node allocations from PF_MEMALLOC contexts could - * completely exhaust the page allocator. __GFP_NOMEMALLOC - * stops emergency reserves from being allocated. - * - * TODO: this could cause a theoretical memory reclaim - * deadlock in the swap out path. - */ - /* - * Add it to the swap cache. - */ - err = add_to_swap_cache(folio, entry, - __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, NULL); - if (err) - goto fail; - /* - * Normally the folio will be dirtied in unmap because its - * pte should be dirty. A special case is MADV_FREE page. The - * page's pte could have dirty bit cleared but the folio's - * SwapBacked flag is still set because clearing the dirty bit - * and SwapBacked flag has no lock protected. For such folio, - * unmap will not set dirty bit for it, so folio reclaim will - * not write the folio out. This can cause data corruption when - * the folio is swapped in later. Always setting the dirty flag - * for the folio solves the problem. 
- */ - folio_mark_dirty(folio); - - return true; - -fail: - put_swap_folio(folio, entry); - return false; -} - /* * This must be called only on folios that have * been verified to be in the swap cache and locked. diff --git a/mm/swapfile.c b/mm/swapfile.c index 66c8869ef346..8449bd703bd8 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1181,9 +1181,9 @@ static bool get_swap_device_info(struct swap_info_struct *si) * Fast path try to get swap entries with specified order from current * CPU's swap entry pool (a cluster). */ -static int swap_alloc_fast(swp_entry_t *entry, - unsigned char usage, - int order) +static bool swap_alloc_fast(swp_entry_t *entry, + unsigned char usage, + int order) { struct swap_cluster_info *ci; struct swap_info_struct *si; @@ -1203,47 +1203,31 @@ static int swap_alloc_fast(swp_entry_t *entry, return !!found; } -swp_entry_t folio_alloc_swap(struct folio *folio) +/* Rotate the device and switch to a new cluster */ +static bool swap_alloc_rotate(swp_entry_t *entry, + unsigned char usage, + int order) { - unsigned int order = folio_order(folio); - unsigned int size = 1 << order; - struct swap_info_struct *si, *next; - swp_entry_t entry = {}; - unsigned long offset; int node; + unsigned long offset; + struct swap_info_struct *si, *next; - if (order) { - /* - * Should not even be attempting large allocations when huge - * page swap is disabled. Warn and fail the allocation. - */ - if (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER) { - VM_WARN_ON_ONCE(1); - return entry; - } - } - - /* Fast path using percpu cluster */ - local_lock(&percpu_swap_cluster.lock); - if (swap_alloc_fast(&entry, SWAP_HAS_CACHE, order)) - goto out_alloced; - - /* Rotate the device and switch to a new cluster */ + node = numa_node_id(); spin_lock(&swap_avail_lock); start_over: - node = numa_node_id(); plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) { + /* Rotate the device and switch to a new cluster */ plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]); spin_unlock(&swap_avail_lock); if (get_swap_device_info(si)) { offset = cluster_alloc_swap_entry(si, order, SWAP_HAS_CACHE); put_swap_device(si); if (offset) { - entry = swp_entry(si->type, offset); - goto out_alloced; + *entry = swp_entry(si->type, offset); + return true; } if (order) - goto out_failed; + return false; } spin_lock(&swap_avail_lock); @@ -1262,20 +1246,68 @@ swp_entry_t folio_alloc_swap(struct folio *folio) goto start_over; } spin_unlock(&swap_avail_lock); -out_failed: + return false; +} + +/** + * folio_alloc_swap - allocate swap space for a folio + * @folio: folio we want to move to swap + * @gfp: gfp mask for shadow nodes + * + * Allocate swap space for the folio and add the folio to the + * swap cache. + * + * Context: Caller needs to hold the folio lock. + * Return: Whether the folio was added to the swap cache. + */ +bool folio_alloc_swap(struct folio *folio, gfp_t gfp) +{ + unsigned int order = folio_order(folio); + unsigned int size = 1 << order; + swp_entry_t entry = {}; + + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio); + + /* + * Should not even be attempting large allocations when huge + * page swap is disabled. Warn and fail the allocation. 
+ */ + if (order && (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER)) { + VM_WARN_ON_ONCE(1); + return false; + } + + local_lock(&percpu_swap_cluster.lock); + if (swap_alloc_fast(&entry, SWAP_HAS_CACHE, order)) + goto out_alloced; + if (swap_alloc_rotate(&entry, SWAP_HAS_CACHE, order)) + goto out_alloced; local_unlock(&percpu_swap_cluster.lock); - return entry; + return false; out_alloced: local_unlock(&percpu_swap_cluster.lock); - if (mem_cgroup_try_charge_swap(folio, entry)) { - put_swap_folio(folio, entry); - entry.val = 0; - } else { - atomic_long_sub(size, &nr_swap_pages); - } + if (mem_cgroup_try_charge_swap(folio, entry)) + goto out_free; - return entry; + /* + * XArray node allocations from PF_MEMALLOC contexts could + * completely exhaust the page allocator. __GFP_NOMEMALLOC + * stops emergency reserves from being allocated. + * + * TODO: this could cause a theoretical memory reclaim + * deadlock in the swap out path. + */ + if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL)) + goto out_free; + + atomic_long_sub(size, &nr_swap_pages); + return true; + +out_free: + put_swap_folio(folio, entry); + return false; } static struct swap_info_struct *_swap_info_get(swp_entry_t entry) diff --git a/mm/vmscan.c b/mm/vmscan.c index fcca38bc640f..71a6b597e469 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1289,7 +1289,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, split_folio_to_list(folio, folio_list)) goto activate_locked; } - if (!add_to_swap(folio)) { + if (!folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN)) { int __maybe_unused order = folio_order(folio); if (!folio_test_large(folio)) @@ -1305,9 +1305,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, } #endif count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK); - if (!add_to_swap(folio)) + if (!folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN)) goto activate_locked_split; } + /* + * Normally the folio will be dirtied in unmap because its + * pte should be dirty. A special case is MADV_FREE page. The + * page's pte could have dirty bit cleared but the folio's + * SwapBacked flag is still set because clearing the dirty bit + * and SwapBacked flag has no lock protected. For such folio, + * unmap will not set dirty bit for it, so folio reclaim will + * not write the folio out. This can cause data corruption when + * the folio is swapped in later. Always setting the dirty flag + * for the folio solves the problem. + */ + folio_mark_dirty(folio); } }
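The vmscan.c hunk above keeps the large-folio fallback: if swap space cannot
be found for the whole folio, it is split and the allocation is retried at
order 0 before reclaim gives up. Roughly, as a standalone model
(swap_space_available and swap_out are invented names for this sketch):

	#include <stdbool.h>
	#include <stdio.h>

	static bool swap_space_available(int order)
	{
		return order == 0;	/* pretend only order-0 entries remain */
	}

	static bool swap_out(int order)
	{
		if (swap_space_available(order)) {
			printf("swapped out at order %d\n", order);
			return true;
		}
		if (order == 0)
			return false;
		/* Corresponds to the MTHP_STAT_SWPOUT_FALLBACK path above. */
		printf("order %d failed, splitting and falling back\n", order);
		return swap_out(0);
	}

	int main(void)
	{
		return swap_out(4) ? 0 : 1;
	}

The unconditional folio_mark_dirty() after a successful allocation is the
MADV_FREE safeguard described in the moved comment: without it, a clean-pte
folio would never be written out and could be corrupted on later swapin.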