From patchwork Thu Mar 13 16:59:29 2025
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 14015500
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
    "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Baolin Wang,
    Kalesh Singh, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 1/7] mm, swap: avoid reclaiming irrelevant swap cache
Date: Fri, 14 Mar 2025 00:59:29 +0800
Message-ID: <20250313165935.63303-2-ryncsn@gmail.com>
In-Reply-To: <20250313165935.63303-1-ryncsn@gmail.com>
References: <20250313165935.63303-1-ryncsn@gmail.com>
From: Kairui Song

The swap allocator does swap cache reclaim to recycle HAS_CACHE slots for
allocation. It initiates the reclaim from the offset to be reclaimed and
looks up the corresponding folio. The lookup process is lockless, so it's
possible that the folio is removed from the swap cache and given a
different swap entry before the reclaim locks the folio. If that happens,
the reclaim ends up reclaiming an irrelevant folio and returning a wrong
return value. This shouldn't cause any problem with correctness or
stability, but it is confusing and unexpected, and it increases
fragmentation and decreases performance.

Fix this by checking whether the folio still points to the offset the
allocator wants to reclaim before reclaiming it.

Signed-off-by: Kairui Song
Reviewed-by: Baoquan He
---
 mm/swapfile.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index a7f60006c52c..5618cd1c4b03 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -210,6 +210,7 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
         int ret, nr_pages;
         bool need_reclaim;
 
+again:
         folio = filemap_get_folio(address_space, swap_cache_index(entry));
         if (IS_ERR(folio))
                 return 0;
@@ -227,8 +228,16 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
         if (!folio_trylock(folio))
                 goto out;
 
-        /* offset could point to the middle of a large folio */
+        /*
+         * Offset could point to the middle of a large folio, or folio
+         * may no longer point to the expected offset before it's locked.
+         */
         entry = folio->swap;
+        if (offset < swp_offset(entry) || offset >= swp_offset(entry) + nr_pages) {
+                folio_unlock(folio);
+                folio_put(folio);
+                goto again;
+        }
         offset = swp_offset(entry);
         need_reclaim = ((flags & TTRS_ANYWAY) ||

From patchwork Thu Mar 13 16:59:30 2025
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 14015501
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
    "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Baolin Wang,
    Kalesh Singh, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 2/7] mm, swap: drop the flag TTRS_DIRECT
Date: Fri, 14 Mar 2025 00:59:30 +0800
Message-ID: <20250313165935.63303-3-ryncsn@gmail.com>
In-Reply-To: <20250313165935.63303-1-ryncsn@gmail.com>
References: <20250313165935.63303-1-ryncsn@gmail.com>
From: Kairui Song

This flag existed temporarily to allow the allocator to bypass the slot
cache during freeing, so reclaiming one slot would free that slot
immediately. But slot cache usage on the freeing path has already been
removed, so this flag no longer has any effect.
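As a rough illustration only (a simplified sketch condensed from the diff
below, not code added by this patch), reclaim now always frees through the
single swap cache path:

        /* Simplified view of the tail of __try_to_reclaim_swap() after this change */
        if (!need_reclaim)
                goto out_unlock;

        /* One path for all callers: drop the folio from the swap cache and re-dirty it */
        delete_from_swap_cache(folio);
        folio_set_dirty(folio);
        ret = nr_pages;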
Signed-off-by: Kairui Song
Reviewed-by: Baoquan He
---
 mm/swapfile.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 5618cd1c4b03..6f2de59c6355 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -158,8 +158,6 @@ static long swap_usage_in_pages(struct swap_info_struct *si)
 #define TTRS_UNMAPPED		0x2
 /* Reclaim the swap entry if swap is getting full */
 #define TTRS_FULL		0x4
-/* Reclaim directly, bypass the slot cache and don't touch device lock */
-#define TTRS_DIRECT		0x8
 
 static bool swap_only_has_cache(struct swap_info_struct *si,
 			      unsigned long offset, int nr_pages)
@@ -257,23 +255,8 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	if (!need_reclaim)
 		goto out_unlock;
 
-	if (!(flags & TTRS_DIRECT)) {
-		/* Free through slot cache */
-		delete_from_swap_cache(folio);
-		folio_set_dirty(folio);
-		ret = nr_pages;
-		goto out_unlock;
-	}
-
-	xa_lock_irq(&address_space->i_pages);
-	__delete_from_swap_cache(folio, entry, NULL);
-	xa_unlock_irq(&address_space->i_pages);
-	folio_ref_sub(folio, nr_pages);
+	delete_from_swap_cache(folio);
 	folio_set_dirty(folio);
-
-	ci = lock_cluster(si, offset);
-	swap_entry_range_free(si, ci, entry, nr_pages);
-	unlock_cluster(ci);
 	ret = nr_pages;
 out_unlock:
 	folio_unlock(folio);
@@ -697,7 +680,7 @@ static bool cluster_reclaim_range(struct swap_info_struct *si,
 			offset++;
 			break;
 		case SWAP_HAS_CACHE:
-			nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT);
+			nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
 			if (nr_reclaim > 0)
 				offset += nr_reclaim;
 			else
@@ -849,7 +832,7 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
 		if (READ_ONCE(map[offset]) == SWAP_HAS_CACHE) {
 			spin_unlock(&ci->lock);
 			nr_reclaim = __try_to_reclaim_swap(si, offset,
-							   TTRS_ANYWAY | TTRS_DIRECT);
+							   TTRS_ANYWAY);
 			spin_lock(&ci->lock);
 			if (nr_reclaim) {
 				offset += abs(nr_reclaim);

From patchwork Thu Mar 13 16:59:31 2025
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 14015502
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
    "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Baolin Wang,
    Kalesh Singh, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 3/7] mm, swap: avoid redundant swap device pinning
Date: Fri, 14 Mar 2025 00:59:31 +0800
Message-ID: <20250313165935.63303-4-ryncsn@gmail.com>
In-Reply-To: <20250313165935.63303-1-ryncsn@gmail.com>
References: <20250313165935.63303-1-ryncsn@gmail.com>

From: Kairui Song

Currently __read_swap_cache_async() calls get/put_swap_device() to take
and drop a swap device reference in order to prevent a concurrent swapoff.
Some of its callers already hold a swap device reference, e.g.
do_swap_page() and shmem_swapin_folio(), which eventually call
__read_swap_cache_async(). Only two callers do not hold a swap device
reference, so make them take one instead, and drop the
get/put_swap_device() calls from __read_swap_cache_async(). This should
slightly reduce the overhead of swap-in during page faults.
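For illustration only, a simplified sketch of the caller-side pinning
pattern this relies on (it mirrors the read_swap_cache_async() hunk in the
diff below; it is not a new interface and omits error handling details):

        /* Sketch: the caller pins the swap device around the swap cache lookup */
        si = get_swap_device(entry);    /* take a reference; NULL if swapoff won the race */
        if (!si)
                return NULL;

        folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
                                        &page_allocated, false);
        if (page_allocated)
                swap_read_folio(folio, plug);

        put_swap_device(si);            /* drop the reference once the read is issued */
        return folio;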
Signed-off-by: Kairui Song
Reviewed-by: Baoquan He
---
 mm/swap_state.c | 14 ++++++++------
 mm/zswap.c      |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index a54b035d6a6c..50840a2887a5 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -426,17 +426,13 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
 		bool skip_if_exists)
 {
-	struct swap_info_struct *si;
+	struct swap_info_struct *si = swp_swap_info(entry);
 	struct folio *folio;
 	struct folio *new_folio = NULL;
 	struct folio *result = NULL;
 	void *shadow = NULL;
 
 	*new_page_allocated = false;
-	si = get_swap_device(entry);
-	if (!si)
-		return NULL;
-
 	for (;;) {
 		int err;
 		/*
@@ -532,7 +528,6 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	put_swap_folio(new_folio, entry);
 	folio_unlock(new_folio);
 put_and_return:
-	put_swap_device(si);
 	if (!(*new_page_allocated) && new_folio)
 		folio_put(new_folio);
 	return result;
@@ -552,11 +547,16 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		struct vm_area_struct *vma, unsigned long addr,
 		struct swap_iocb **plug)
 {
+	struct swap_info_struct *si;
 	bool page_allocated;
 	struct mempolicy *mpol;
 	pgoff_t ilx;
 	struct folio *folio;
 
+	si = get_swap_device(entry);
+	if (!si)
+		return NULL;
+
 	mpol = get_vma_policy(vma, addr, 0, &ilx);
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
@@ -564,6 +564,8 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 
 	if (page_allocated)
 		swap_read_folio(folio, plug);
+
+	put_swap_device(si);
 	return folio;
 }
 
diff --git a/mm/zswap.c b/mm/zswap.c
index 7d8d684e54d4..c470073c17cc 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1055,15 +1055,21 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	struct folio *folio;
 	struct mempolicy *mpol;
 	bool folio_was_allocated;
+	struct swap_info_struct *si;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
 	};
 	int ret = 0;
 
 	/* try to allocate swap cache folio */
+	si = get_swap_device(swpentry);
+	if (!si)
+		return -EEXIST;
+
 	mpol = get_task_policy(current);
 	folio = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
 				NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
+	put_swap_device(si);
 	if (!folio)
 		return -ENOMEM;

From patchwork Thu Mar 13 16:59:32 2025
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 14015503
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
    "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Baolin Wang,
    Kalesh Singh, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 4/7] mm, swap: don't update the counter up-front
Date: Fri, 14 Mar 2025 00:59:32 +0800
Message-ID: <20250313165935.63303-5-ryncsn@gmail.com>
In-Reply-To: <20250313165935.63303-1-ryncsn@gmail.com>
References: <20250313165935.63303-1-ryncsn@gmail.com>

From: Kairui Song

Updating the counter before the allocation was useful for avoiding an
unnecessary scan when the device is full: the allocation could abort early
if the counter indicated a full device. But that is an uncommon case, and
scanning a full device is now very fast, so the up-front update is not
helpful any more. Remove it and simplify the slot allocation logic.
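For illustration only, a simplified before/after sketch of the accounting
in get_swap_pages() (condensed from the diff below; casts and locking are
omitted):

        /* Before: reserve the goal up-front, then refund whatever wasn't allocated */
        atomic_long_sub(n_goal * size, &nr_swap_pages);
        /* ... scan swap devices, filling swp_entries ... */
        if (n_ret < n_goal)
                atomic_long_add((n_goal - n_ret) * size, &nr_swap_pages);

        /* After: account once, for exactly what was allocated */
        /* ... scan swap devices, filling swp_entries ... */
        atomic_long_sub(n_ret * size, &nr_swap_pages);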
Signed-off-by: Kairui Song
Reviewed-by: Baoquan He
---
 mm/swapfile.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6f2de59c6355..db836670c334 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1201,22 +1201,10 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 	int order = swap_entry_order(entry_order);
 	unsigned long size = 1 << order;
 	struct swap_info_struct *si, *next;
-	long avail_pgs;
 	int n_ret = 0;
 	int node;
 
 	spin_lock(&swap_avail_lock);
-
-	avail_pgs = atomic_long_read(&nr_swap_pages) / size;
-	if (avail_pgs <= 0) {
-		spin_unlock(&swap_avail_lock);
-		goto noswap;
-	}
-
-	n_goal = min3((long)n_goal, (long)SWAP_BATCH, avail_pgs);
-
-	atomic_long_sub(n_goal * size, &nr_swap_pages);
-
 start_over:
 	node = numa_node_id();
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node],
 				  avail_lists[node]) {
@@ -1250,10 +1238,8 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 	spin_unlock(&swap_avail_lock);
 
 check_out:
-	if (n_ret < n_goal)
-		atomic_long_add((long)(n_goal - n_ret) * size,
-				&nr_swap_pages);
-noswap:
+	atomic_long_sub(n_ret * size, &nr_swap_pages);
+
 	return n_ret;
 }

From patchwork Thu Mar 13 16:59:33 2025
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 14015504
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
    "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Baolin Wang,
    Kalesh Singh, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 5/7] mm, swap: use percpu cluster as allocation fast path
Date: Fri, 14 Mar 2025 00:59:33 +0800
Message-ID: <20250313165935.63303-6-ryncsn@gmail.com>
In-Reply-To: <20250313165935.63303-1-ryncsn@gmail.com>
References: <20250313165935.63303-1-ryncsn@gmail.com>

From: Kairui Song

The current allocation workflow first traverses the plist with a global
lock held, and after choosing a device it uses the percpu cluster on that
swap device. This commit moves the percpu cluster variable out of being
tied to individual swap devices, making it a global percpu variable that
is used directly for allocation as a fast path.

The global percpu cluster variable will never point to an HDD device, and
allocations on an HDD device are still globally serialized. This improves
allocator performance and prepares for the removal of the slot cache in
later commits.

There shouldn't be much observable behavior change, except for one thing:
this changes how swap device allocation rotation works. Currently, each
allocation rotates the plist, and because of the slot cache (one order 0
allocation usually returns 64 entries), swap devices of the same priority
are rotated for every 64 order 0 entries consumed. High order allocations
are different: they bypass the slot cache, so the swap device is rotated
for every 16K, 32K, or up to 2M allocated. The rotation rule was never
clearly defined or documented, and it has been changed several times
without being mentioned. After this commit, and once the slot cache is
gone in later commits, swap device rotation will happen for every consumed
cluster.
Ideally, non-HDD devices will be rotated once 2M of space has been
consumed for each order. Fragmented clusters will rotate the device
faster, which seems fine. HDD devices are rotated for every allocation
regardless of the allocation order, which should also be fine and is
trivial.

This commit also slightly changes allocation behaviour for the slot cache:
the newly added cluster allocation fast path may allocate entries from a
different device into the slot cache. This is not observable from user
space, has only a very slight performance impact, and the slot cache is
removed in the next commit anyway, so it can be ignored.

Signed-off-by: Kairui Song
---
 include/linux/swap.h |  11 ++-
 mm/swapfile.c        | 158 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 121 insertions(+), 48 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2fe91c293636..374bffc87427 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -284,12 +284,10 @@ enum swap_cluster_flags {
 #endif
 
 /*
- * We assign a cluster to each CPU, so each CPU can allocate swap entry from
- * its own cluster and swapout sequentially. The purpose is to optimize swapout
- * throughput.
+ * We keep using same cluster for rotational device so IO will be sequential.
+ * The purpose is to optimize SWAP throughput on these device.
 */
-struct percpu_cluster {
-	local_lock_t lock; /* Protect the percpu_cluster above */
+struct swap_sequential_cluster {
 	unsigned int next[SWAP_NR_ORDERS]; /* Likely next allocation offset */
 };
 
@@ -315,8 +313,7 @@ struct swap_info_struct {
 	atomic_long_t frag_cluster_nr[SWAP_NR_ORDERS];
 	unsigned int pages;		/* total of usable pages of swap */
 	atomic_long_t inuse_pages;	/* number of those currently in use */
-	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
-	struct percpu_cluster *global_cluster; /* Use one global cluster for rotating device */
+	struct swap_sequential_cluster *global_cluster; /* Use one global cluster for rotating device */
 	spinlock_t global_cluster_lock;	/* Serialize usage of global cluster */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index db836670c334..8b296c4c636b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -116,6 +116,18 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
 
 atomic_t nr_rotate_swap = ATOMIC_INIT(0);
 
+struct percpu_swap_cluster {
+	struct swap_info_struct *si[SWAP_NR_ORDERS];
+	unsigned long offset[SWAP_NR_ORDERS];
+	local_lock_t lock;
+};
+
+static DEFINE_PER_CPU(struct percpu_swap_cluster, percpu_swap_cluster) = {
+	.si = { NULL },
+	.offset = { SWAP_ENTRY_INVALID },
+	.lock = INIT_LOCAL_LOCK(),
+};
+
 static struct swap_info_struct *swap_type_to_swap_info(int type)
 {
 	if (type >= MAX_SWAPFILES)
@@ -539,7 +551,7 @@ static bool swap_do_scheduled_discard(struct swap_info_struct *si)
 		ci = list_first_entry(&si->discard_clusters, struct swap_cluster_info, list);
 		/*
 		 * Delete the cluster from list to prepare for discard, but keep
-		 * the CLUSTER_FLAG_DISCARD flag, there could be percpu_cluster
+		 * the CLUSTER_FLAG_DISCARD flag, percpu_swap_cluster could be
 		 * pointing to it, or ran into by relocate_cluster.
 		 */
 		list_del(&ci->list);
@@ -805,10 +817,12 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 out:
 	relocate_cluster(si, ci);
 	unlock_cluster(ci);
-	if (si->flags & SWP_SOLIDSTATE)
-		__this_cpu_write(si->percpu_cluster->next[order], next);
-	else
+	if (si->flags & SWP_SOLIDSTATE) {
+		this_cpu_write(percpu_swap_cluster.offset[order], next);
+		this_cpu_write(percpu_swap_cluster.si[order], si);
+	} else {
 		si->global_cluster->next[order] = next;
+	}
 	return found;
 }
 
@@ -862,20 +876,18 @@ static void swap_reclaim_work(struct work_struct *work)
 }
 
 /*
- * Try to get swap entries with specified order from current cpu's swap entry
- * pool (a cluster). This might involve allocating a new cluster for current CPU
- * too.
+ * Try to allocate swap entries with specified order and try set a new
+ * cluster for current CPU too.
 */
 static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int order,
 					      unsigned char usage)
 {
 	struct swap_cluster_info *ci;
-	unsigned int offset, found = 0;
+	unsigned int offset = SWAP_ENTRY_INVALID, found = SWAP_ENTRY_INVALID;
 
 	if (si->flags & SWP_SOLIDSTATE) {
-		/* Fast path using per CPU cluster */
-		local_lock(&si->percpu_cluster->lock);
-		offset = __this_cpu_read(si->percpu_cluster->next[order]);
+		if (si == this_cpu_read(percpu_swap_cluster.si[order]))
+			offset = this_cpu_read(percpu_swap_cluster.offset[order]);
 	} else {
 		/* Serialize HDD SWAP allocation for each device. */
 		spin_lock(&si->global_cluster_lock);
@@ -973,9 +985,7 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int order,
 		}
 	}
 done:
-	if (si->flags & SWP_SOLIDSTATE)
-		local_unlock(&si->percpu_cluster->lock);
-	else
+	if (!(si->flags & SWP_SOLIDSTATE))
 		spin_unlock(&si->global_cluster_lock);
 	return found;
 }
@@ -1196,6 +1206,51 @@ static bool get_swap_device_info(struct swap_info_struct *si)
 	return true;
 }
 
+/*
+ * Fast path try to get swap entries with specified order from current
+ * CPU's swap entry pool (a cluster).
+ */
+static int swap_alloc_fast(swp_entry_t entries[],
+			   unsigned char usage,
+			   int order, int n_goal)
+{
+	struct swap_cluster_info *ci;
+	struct swap_info_struct *si;
+	unsigned int offset, found;
+	int n_ret = 0;
+
+	n_goal = min(n_goal, SWAP_BATCH);
+
+	/*
+	 * Once allocated, swap_info_struct will never be completely freed,
+	 * so checking it's liveness by get_swap_device_info is enough.
+	 */
+	si = this_cpu_read(percpu_swap_cluster.si[order]);
+	offset = this_cpu_read(percpu_swap_cluster.offset[order]);
+	if (!si || !offset || !get_swap_device_info(si))
+		return 0;
+
+	while (offset) {
+		ci = lock_cluster(si, offset);
+		if (!cluster_is_usable(ci, order)) {
+			unlock_cluster(ci);
+			break;
+		}
+		if (cluster_is_empty(ci))
+			offset = cluster_offset(si, ci);
+		found = alloc_swap_scan_cluster(si, ci, offset, order, usage);
+		if (!found)
+			break;
+		entries[n_ret++] = swp_entry(si->type, found);
+		if (n_ret == n_goal)
+			break;
+		offset = this_cpu_read(percpu_swap_cluster.offset[order]);
+	}
+
+	put_swap_device(si);
+	return n_ret;
+}
+
 int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 {
 	int order = swap_entry_order(entry_order);
@@ -1204,19 +1259,36 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 	int n_ret = 0;
 	int node;
 
+	/* Fast path using percpu cluster */
+	local_lock(&percpu_swap_cluster.lock);
+	n_ret = swap_alloc_fast(swp_entries,
+				SWAP_HAS_CACHE,
+				order, n_goal);
+	if (n_ret == n_goal)
+		goto out;
+
+	n_goal = min_t(int, n_goal - n_ret, SWAP_BATCH);
+	/* Rotate the device and switch to a new cluster */
 	spin_lock(&swap_avail_lock);
 start_over:
 	node = numa_node_id();
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node],
 				  avail_lists[node]) {
-		/* requeue si to after same-priority siblings */
 		plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
 		spin_unlock(&swap_avail_lock);
 		if (get_swap_device_info(si)) {
-			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
-					n_goal, swp_entries, order);
+			/*
+			 * For order 0 allocation, try best to fill the request
+			 * as it's used by slot cache.
+			 *
+			 * For mTHP allocation, it always have n_goal == 1,
+			 * and falling a mTHP swapin will just make the caller
+			 * fallback to order 0 allocation, so just bail out.
+			 */
+			n_ret += scan_swap_map_slots(si, SWAP_HAS_CACHE, n_goal,
+					swp_entries + n_ret, order);
 			put_swap_device(si);
 			if (n_ret || size > 1)
-				goto check_out;
+				goto out;
 		}
 
 		spin_lock(&swap_avail_lock);
@@ -1234,12 +1306,10 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 		if (plist_node_empty(&next->avail_lists[node]))
 			goto start_over;
 	}
-
 	spin_unlock(&swap_avail_lock);
-
-check_out:
+out:
+	local_unlock(&percpu_swap_cluster.lock);
 	atomic_long_sub(n_ret * size, &nr_swap_pages);
-
 	return n_ret;
 }
 
@@ -2597,6 +2667,28 @@ static void wait_for_allocation(struct swap_info_struct *si)
 	}
 }
 
+/*
+ * Called after swap device's reference count is dead, so
+ * neither scan nor allocation will use it.
+ */
+static void flush_percpu_swap_cluster(struct swap_info_struct *si)
+{
+	int cpu, i;
+	struct swap_info_struct **pcp_si;
+
+	for_each_possible_cpu(cpu) {
+		pcp_si = per_cpu_ptr(percpu_swap_cluster.si, cpu);
+		/*
+		 * Invalidate the percpu swap cluster cache, si->users
+		 * is dead, so no new user will point to it, just flush
+		 * any existing user.
+ */
+		for (i = 0; i < SWAP_NR_ORDERS; i++)
+			cmpxchg(&pcp_si[i], si, NULL);
+	}
+}
+
+
 SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 {
 	struct swap_info_struct *p = NULL;
@@ -2698,6 +2790,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	flush_work(&p->discard_work);
 	flush_work(&p->reclaim_work);
+	flush_percpu_swap_cluster(p);
 	destroy_swap_extents(p);
 	if (p->flags & SWP_CONTINUED)
@@ -2725,8 +2818,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	arch_swap_invalidate_area(p->type);
 	zswap_swapoff(p->type);
 	mutex_unlock(&swapon_mutex);
-	free_percpu(p->percpu_cluster);
-	p->percpu_cluster = NULL;
 	kfree(p->global_cluster);
 	p->global_cluster = NULL;
 	vfree(swap_map);
@@ -3125,7 +3216,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	unsigned long nr_clusters = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
 	struct swap_cluster_info *cluster_info;
 	unsigned long i, j, idx;
-	int cpu, err = -ENOMEM;
+	int err = -ENOMEM;
 	cluster_info = kvcalloc(nr_clusters, sizeof(*cluster_info), GFP_KERNEL);
 	if (!cluster_info)
@@ -3134,20 +3225,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	for (i = 0; i < nr_clusters; i++)
 		spin_lock_init(&cluster_info[i].lock);
-	if (si->flags & SWP_SOLIDSTATE) {
-		si->percpu_cluster = alloc_percpu(struct percpu_cluster);
-		if (!si->percpu_cluster)
-			goto err_free;
-
-		for_each_possible_cpu(cpu) {
-			struct percpu_cluster *cluster;
-
-			cluster = per_cpu_ptr(si->percpu_cluster, cpu);
-			for (i = 0; i < SWAP_NR_ORDERS; i++)
-				cluster->next[i] = SWAP_ENTRY_INVALID;
-			local_lock_init(&cluster->lock);
-		}
-	} else {
+	if (!(si->flags & SWP_SOLIDSTATE)) {
 		si->global_cluster = kmalloc(sizeof(*si->global_cluster), GFP_KERNEL);
 		if (!si->global_cluster)
@@ -3424,8 +3502,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 bad_swap_unlock_inode:
 	inode_unlock(inode);
 bad_swap:
-	free_percpu(si->percpu_cluster);
-	si->percpu_cluster = NULL;
 	kfree(si->global_cluster);
 	si->global_cluster = NULL;
 	inode = NULL;
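The fast-path cache and its swapoff-time invalidation are spread over several hunks above, so here is a small user-space model of the idea. This is an illustration only, not kernel code: struct swap_device, pcp_cache, alloc_fast() and flush_pcp_cache() are made-up names, and the fixed-size arrays plus plain C11 atomics stand in for the kernel's per-CPU variables, local_lock and the percpu_ref liveness check done by get_swap_device_info().

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NR_CPUS   4
#define NR_ORDERS 2

/* Toy stand-in for struct swap_info_struct. */
struct swap_device {
	int  id;
	bool alive;
};

/* Toy stand-in for the global percpu_swap_cluster: one cached device
 * pointer and next offset per CPU and per allocation order. */
struct pcp_cache {
	_Atomic(struct swap_device *) si[NR_ORDERS];
	unsigned int offset[NR_ORDERS];
};

static struct pcp_cache pcp[NR_CPUS];

/* Fast path: only trust the cache when it still names the expected,
 * still-alive device; otherwise the caller takes the slow path. */
static bool alloc_fast(int cpu, int order, struct swap_device *expected,
		       unsigned int *out)
{
	struct swap_device *si = atomic_load(&pcp[cpu].si[order]);

	if (si != expected || !si || !si->alive)
		return false;
	*out = pcp[cpu].offset[order]++;
	return true;
}

/* Swapoff-side flush: compare-and-swap so only slots that still point
 * at the dying device are cleared, mirroring the cmpxchg() above. */
static void flush_pcp_cache(struct swap_device *si)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		for (int order = 0; order < NR_ORDERS; order++) {
			struct swap_device *old = si;

			atomic_compare_exchange_strong(&pcp[cpu].si[order],
						       &old, NULL);
		}
	}
}

int main(void)
{
	struct swap_device dev = { .id = 0, .alive = true };
	unsigned int off;

	atomic_store(&pcp[0].si[0], &dev);
	pcp[0].offset[0] = 100;

	if (alloc_fast(0, 0, &dev, &off))
		printf("fast path hit, offset %u\n", off);

	dev.alive = false;
	flush_pcp_cache(&dev);

	if (!alloc_fast(0, 0, &dev, &off))
		printf("cache flushed, slow path needed\n");

	return 0;
}

The property being mirrored is that swapoff only clears cache slots that still point at the dying device, so a concurrent CPU either reads NULL and takes the slow path, or reads a stale pointer whose liveness check then fails.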
From patchwork Thu Mar 13 16:59:34 2025
From: Kairui Song 
To: linux-mm@kvack.org
Cc: Andrew Morton , Chris Li , Barry Song , Hugh Dickins , Yosry Ahmed , "Huang, Ying" , Baoquan He , Nhat Pham , Johannes Weiner , Baolin Wang , Kalesh Singh , Matthew Wilcox , linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 6/7] mm, swap: remove swap slot cache
Date: Fri, 14 Mar 2025 00:59:34 +0800
Message-ID: <20250313165935.63303-7-ryncsn@gmail.com>
In-Reply-To: <20250313165935.63303-1-ryncsn@gmail.com>
References: <20250313165935.63303-1-ryncsn@gmail.com>

From: Kairui Song 

Slot cache is no longer needed now; remove it and all related code.

- vm-scalability with: `usemem --init-time -O -y -x -R -31 1G`,
  12G memory cgroup using simulated pmem as SWAP (32G pmem, 32 CPUs),
  16 test runs for each case, measuring the total throughput:

                        Before (KB/s) (stdev)     After (KB/s) (stdev)
  Random (4K):          424907.60 (24410.78)      414745.92 (34554.78)
  Random (64K):         163308.82 (11635.72)      167314.50 (18434.99)
  Sequential (4K, !-R): 6150056.79 (103205.90)    6321469.06 (115878.16)

  The performance changes are below noise level.
- Build linux kernel with make -j96, using 4K folio with 1.5G memory cgroup limit and 64K folio with 2G memory cgroup limit, on top of tmpfs, 12 test runs, measuring the system time: Before (s) (stdev) After (s) (stdev) make -j96 (4K): 6445.69 (61.95) 6408.80 (69.46) make -j96 (64K): 6841.71 (409.04) 6437.99 (435.55) Similar to above, 64k mTHP case showed a slight improvement. Signed-off-by: Kairui Song Reviewed-by: Baoquan He --- include/linux/swap.h | 3 - include/linux/swap_slots.h | 28 ---- mm/Makefile | 2 +- mm/swap_slots.c | 295 ------------------------------------- mm/swap_state.c | 8 +- mm/swapfile.c | 194 ++++++++---------------- 6 files changed, 67 insertions(+), 463 deletions(-) delete mode 100644 include/linux/swap_slots.h delete mode 100644 mm/swap_slots.c diff --git a/include/linux/swap.h b/include/linux/swap.h index 374bffc87427..c5856dcc263a 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -465,7 +465,6 @@ void free_pages_and_swap_cache(struct encoded_page **, int); extern atomic_long_t nr_swap_pages; extern long total_swap_pages; extern atomic_t nr_rotate_swap; -extern bool has_usable_swap(void); /* Swap 50% full? Release swapcache more aggressively.. */ static inline bool vm_swap_full(void) @@ -483,13 +482,11 @@ swp_entry_t folio_alloc_swap(struct folio *folio); bool folio_free_swap(struct folio *folio); void put_swap_folio(struct folio *folio, swp_entry_t entry); extern swp_entry_t get_swap_page_of_type(int); -extern int get_swap_pages(int n, swp_entry_t swp_entries[], int order); extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t, int); extern int swap_duplicate(swp_entry_t); extern int swapcache_prepare(swp_entry_t entry, int nr); extern void swap_free_nr(swp_entry_t entry, int nr_pages); -extern void swapcache_free_entries(swp_entry_t *entries, int n); extern void free_swap_and_cache_nr(swp_entry_t entry, int nr); int swap_type_of(dev_t device, sector_t offset); int find_first_swap(dev_t *device); diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h deleted file mode 100644 index 840aec3523b2..000000000000 --- a/include/linux/swap_slots.h +++ /dev/null @@ -1,28 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _LINUX_SWAP_SLOTS_H -#define _LINUX_SWAP_SLOTS_H - -#include -#include -#include - -#define SWAP_SLOTS_CACHE_SIZE SWAP_BATCH -#define THRESHOLD_ACTIVATE_SWAP_SLOTS_CACHE (5*SWAP_SLOTS_CACHE_SIZE) -#define THRESHOLD_DEACTIVATE_SWAP_SLOTS_CACHE (2*SWAP_SLOTS_CACHE_SIZE) - -struct swap_slots_cache { - bool lock_initialized; - struct mutex alloc_lock; /* protects slots, nr, cur */ - swp_entry_t *slots; - int nr; - int cur; - int n_ret; -}; - -void disable_swap_slots_cache_lock(void); -void reenable_swap_slots_cache_unlock(void); -void enable_swap_slots_cache(void); - -extern bool swap_slot_cache_enabled; - -#endif /* _LINUX_SWAP_SLOTS_H */ diff --git a/mm/Makefile b/mm/Makefile index 4510a9869e77..e7f6bbf8ae5f 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -75,7 +75,7 @@ ifdef CONFIG_MMU obj-$(CONFIG_ADVISE_SYSCALLS) += madvise.o endif -obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o swap_slots.o +obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o obj-$(CONFIG_ZSWAP) += zswap.o obj-$(CONFIG_HAS_DMA) += dmapool.o obj-$(CONFIG_HUGETLBFS) += hugetlb.o diff --git a/mm/swap_slots.c b/mm/swap_slots.c deleted file mode 100644 index 9c7c171df7ba..000000000000 --- a/mm/swap_slots.c +++ /dev/null @@ -1,295 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Manage cache of swap 
slots to be used for and returned from - * swap. - * - * Copyright(c) 2016 Intel Corporation. - * - * Author: Tim Chen - * - * We allocate the swap slots from the global pool and put - * it into local per cpu caches. This has the advantage - * of no needing to acquire the swap_info lock every time - * we need a new slot. - * - * There is also opportunity to simply return the slot - * to local caches without needing to acquire swap_info - * lock. We do not reuse the returned slots directly but - * move them back to the global pool in a batch. This - * allows the slots to coalesce and reduce fragmentation. - * - * The swap entry allocated is marked with SWAP_HAS_CACHE - * flag in map_count that prevents it from being allocated - * again from the global pool. - * - * The swap slots cache is protected by a mutex instead of - * a spin lock as when we search for slots with scan_swap_map, - * we can possibly sleep. - */ - -#include -#include -#include -#include -#include -#include -#include - -static DEFINE_PER_CPU(struct swap_slots_cache, swp_slots); -static bool swap_slot_cache_active; -bool swap_slot_cache_enabled; -static bool swap_slot_cache_initialized; -static DEFINE_MUTEX(swap_slots_cache_mutex); -/* Serialize swap slots cache enable/disable operations */ -static DEFINE_MUTEX(swap_slots_cache_enable_mutex); - -static void __drain_swap_slots_cache(void); - -#define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled) - -static void deactivate_swap_slots_cache(void) -{ - mutex_lock(&swap_slots_cache_mutex); - swap_slot_cache_active = false; - __drain_swap_slots_cache(); - mutex_unlock(&swap_slots_cache_mutex); -} - -static void reactivate_swap_slots_cache(void) -{ - mutex_lock(&swap_slots_cache_mutex); - swap_slot_cache_active = true; - mutex_unlock(&swap_slots_cache_mutex); -} - -/* Must not be called with cpu hot plug lock */ -void disable_swap_slots_cache_lock(void) -{ - mutex_lock(&swap_slots_cache_enable_mutex); - swap_slot_cache_enabled = false; - if (swap_slot_cache_initialized) { - /* serialize with cpu hotplug operations */ - cpus_read_lock(); - __drain_swap_slots_cache(); - cpus_read_unlock(); - } -} - -static void __reenable_swap_slots_cache(void) -{ - swap_slot_cache_enabled = has_usable_swap(); -} - -void reenable_swap_slots_cache_unlock(void) -{ - __reenable_swap_slots_cache(); - mutex_unlock(&swap_slots_cache_enable_mutex); -} - -static bool check_cache_active(void) -{ - long pages; - - if (!swap_slot_cache_enabled) - return false; - - pages = get_nr_swap_pages(); - if (!swap_slot_cache_active) { - if (pages > num_online_cpus() * - THRESHOLD_ACTIVATE_SWAP_SLOTS_CACHE) - reactivate_swap_slots_cache(); - goto out; - } - - /* if global pool of slot caches too low, deactivate cache */ - if (pages < num_online_cpus() * THRESHOLD_DEACTIVATE_SWAP_SLOTS_CACHE) - deactivate_swap_slots_cache(); -out: - return swap_slot_cache_active; -} - -static int alloc_swap_slot_cache(unsigned int cpu) -{ - struct swap_slots_cache *cache; - swp_entry_t *slots; - - /* - * Do allocation outside swap_slots_cache_mutex - * as kvzalloc could trigger reclaim and folio_alloc_swap, - * which can lock swap_slots_cache_mutex. 
- */ - slots = kvcalloc(SWAP_SLOTS_CACHE_SIZE, sizeof(swp_entry_t), - GFP_KERNEL); - if (!slots) - return -ENOMEM; - - mutex_lock(&swap_slots_cache_mutex); - cache = &per_cpu(swp_slots, cpu); - if (cache->slots) { - /* cache already allocated */ - mutex_unlock(&swap_slots_cache_mutex); - - kvfree(slots); - - return 0; - } - - if (!cache->lock_initialized) { - mutex_init(&cache->alloc_lock); - cache->lock_initialized = true; - } - cache->nr = 0; - cache->cur = 0; - cache->n_ret = 0; - /* - * We initialized alloc_lock and free_lock earlier. We use - * !cache->slots or !cache->slots_ret to know if it is safe to acquire - * the corresponding lock and use the cache. Memory barrier below - * ensures the assumption. - */ - mb(); - cache->slots = slots; - mutex_unlock(&swap_slots_cache_mutex); - return 0; -} - -static void drain_slots_cache_cpu(unsigned int cpu, bool free_slots) -{ - struct swap_slots_cache *cache; - - cache = &per_cpu(swp_slots, cpu); - if (cache->slots) { - mutex_lock(&cache->alloc_lock); - swapcache_free_entries(cache->slots + cache->cur, cache->nr); - cache->cur = 0; - cache->nr = 0; - if (free_slots && cache->slots) { - kvfree(cache->slots); - cache->slots = NULL; - } - mutex_unlock(&cache->alloc_lock); - } -} - -static void __drain_swap_slots_cache(void) -{ - unsigned int cpu; - - /* - * This function is called during - * 1) swapoff, when we have to make sure no - * left over slots are in cache when we remove - * a swap device; - * 2) disabling of swap slot cache, when we run low - * on swap slots when allocating memory and need - * to return swap slots to global pool. - * - * We cannot acquire cpu hot plug lock here as - * this function can be invoked in the cpu - * hot plug path: - * cpu_up -> lock cpu_hotplug -> cpu hotplug state callback - * -> memory allocation -> direct reclaim -> folio_alloc_swap - * -> drain_swap_slots_cache - * - * Hence the loop over current online cpu below could miss cpu that - * is being brought online but not yet marked as online. - * That is okay as we do not schedule and run anything on a - * cpu before it has been marked online. Hence, we will not - * fill any swap slots in slots cache of such cpu. - * There are no slots on such cpu that need to be drained. 
- */ - for_each_online_cpu(cpu) - drain_slots_cache_cpu(cpu, false); -} - -static int free_slot_cache(unsigned int cpu) -{ - mutex_lock(&swap_slots_cache_mutex); - drain_slots_cache_cpu(cpu, true); - mutex_unlock(&swap_slots_cache_mutex); - return 0; -} - -void enable_swap_slots_cache(void) -{ - mutex_lock(&swap_slots_cache_enable_mutex); - if (!swap_slot_cache_initialized) { - int ret; - - ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "swap_slots_cache", - alloc_swap_slot_cache, free_slot_cache); - if (WARN_ONCE(ret < 0, "Cache allocation failed (%s), operating " - "without swap slots cache.\n", __func__)) - goto out_unlock; - - swap_slot_cache_initialized = true; - } - - __reenable_swap_slots_cache(); -out_unlock: - mutex_unlock(&swap_slots_cache_enable_mutex); -} - -/* called with swap slot cache's alloc lock held */ -static int refill_swap_slots_cache(struct swap_slots_cache *cache) -{ - if (!use_swap_slot_cache) - return 0; - - cache->cur = 0; - if (swap_slot_cache_active) - cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE, - cache->slots, 0); - - return cache->nr; -} - -swp_entry_t folio_alloc_swap(struct folio *folio) -{ - swp_entry_t entry; - struct swap_slots_cache *cache; - - entry.val = 0; - - if (folio_test_large(folio)) { - if (IS_ENABLED(CONFIG_THP_SWAP)) - get_swap_pages(1, &entry, folio_order(folio)); - goto out; - } - - /* - * Preemption is allowed here, because we may sleep - * in refill_swap_slots_cache(). But it is safe, because - * accesses to the per-CPU data structure are protected by the - * mutex cache->alloc_lock. - * - * The alloc path here does not touch cache->slots_ret - * so cache->free_lock is not taken. - */ - cache = raw_cpu_ptr(&swp_slots); - - if (likely(check_cache_active() && cache->slots)) { - mutex_lock(&cache->alloc_lock); - if (cache->slots) { -repeat: - if (cache->nr) { - entry = cache->slots[cache->cur]; - cache->slots[cache->cur++].val = 0; - cache->nr--; - } else if (refill_swap_slots_cache(cache)) { - goto repeat; - } - } - mutex_unlock(&cache->alloc_lock); - if (entry.val) - goto out; - } - - get_swap_pages(1, &entry, 0); -out: - if (mem_cgroup_try_charge_swap(folio, entry)) { - put_swap_folio(folio, entry); - entry.val = 0; - } - return entry; -} diff --git a/mm/swap_state.c b/mm/swap_state.c index 50840a2887a5..2b5744e211cd 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -20,7 +20,6 @@ #include #include #include -#include #include #include #include "internal.h" @@ -447,13 +446,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Just skip read ahead for unused swap slot. - * During swap_off when swap_slot_cache is disabled, - * we have to handle the race between putting - * swap entry in swap cache and marking swap slot - * as SWAP_HAS_CACHE. That's done in later part of code or - * else swap_off will be aborted if we return NULL. 
*/ - if (!swap_entry_swapped(si, entry) && swap_slot_cache_enabled) + if (!swap_entry_swapped(si, entry)) goto put_and_return; /* diff --git a/mm/swapfile.c b/mm/swapfile.c index 8b296c4c636b..9bd95173865d 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include @@ -885,16 +884,20 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o struct swap_cluster_info *ci; unsigned int offset = SWAP_ENTRY_INVALID, found = SWAP_ENTRY_INVALID; - if (si->flags & SWP_SOLIDSTATE) { - if (si == this_cpu_read(percpu_swap_cluster.si[order])) - offset = this_cpu_read(percpu_swap_cluster.offset[order]); - } else { + /* + * Swapfile is not block device so unable + * to allocate large entries. + */ + if (order && !(si->flags & SWP_BLKDEV)) + return 0; + + if (!(si->flags & SWP_SOLIDSTATE)) { /* Serialize HDD SWAP allocation for each device. */ spin_lock(&si->global_cluster_lock); offset = si->global_cluster->next[order]; - } + if (offset == SWAP_ENTRY_INVALID) + goto new_cluster; - if (offset) { ci = lock_cluster(si, offset); /* Cluster could have been used by another order */ if (cluster_is_usable(ci, order)) { @@ -1153,43 +1156,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, swap_usage_sub(si, nr_entries); } -static int scan_swap_map_slots(struct swap_info_struct *si, - unsigned char usage, int nr, - swp_entry_t slots[], int order) -{ - unsigned int nr_pages = 1 << order; - int n_ret = 0; - - if (order > 0) { - /* - * Should not even be attempting large allocations when huge - * page swap is disabled. Warn and fail the allocation. - */ - if (!IS_ENABLED(CONFIG_THP_SWAP) || - nr_pages > SWAPFILE_CLUSTER) { - VM_WARN_ON_ONCE(1); - return 0; - } - - /* - * Swapfile is not block device so unable - * to allocate large entries. - */ - if (!(si->flags & SWP_BLKDEV)) - return 0; - } - - while (n_ret < nr) { - unsigned long offset = cluster_alloc_swap_entry(si, order, usage); - - if (!offset) - break; - slots[n_ret++] = swp_entry(si->type, offset); - } - - return n_ret; -} - static bool get_swap_device_info(struct swap_info_struct *si) { if (!percpu_ref_tryget_live(&si->users)) @@ -1210,16 +1176,13 @@ static bool get_swap_device_info(struct swap_info_struct *si) * Fast path try to get swap entries with specified order from current * CPU's swap entry pool (a cluster). 
*/ -static int swap_alloc_fast(swp_entry_t entries[], +static int swap_alloc_fast(swp_entry_t *entry, unsigned char usage, - int order, int n_goal) + int order) { struct swap_cluster_info *ci; struct swap_info_struct *si; - unsigned int offset, found; - int n_ret = 0; - - n_goal = min(n_goal, SWAP_BATCH); + unsigned int offset, found = SWAP_ENTRY_INVALID; /* * Once allocated, swap_info_struct will never be completely freed, @@ -1228,46 +1191,48 @@ static int swap_alloc_fast(swp_entry_t entries[], si = this_cpu_read(percpu_swap_cluster.si[order]); offset = this_cpu_read(percpu_swap_cluster.offset[order]); if (!si || !offset || !get_swap_device_info(si)) - return 0; + return false; - while (offset) { - ci = lock_cluster(si, offset); - if (!cluster_is_usable(ci, order)) { - unlock_cluster(ci); - break; - } + ci = lock_cluster(si, offset); + if (cluster_is_usable(ci, order)) { if (cluster_is_empty(ci)) offset = cluster_offset(si, ci); found = alloc_swap_scan_cluster(si, ci, offset, order, usage); - if (!found) - break; - entries[n_ret++] = swp_entry(si->type, found); - if (n_ret == n_goal) - break; - offset = this_cpu_read(percpu_swap_cluster.offset[order]); + if (found) + *entry = swp_entry(si->type, found); + } else { + unlock_cluster(ci); } put_swap_device(si); - return n_ret; + return !!found; } -int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) +swp_entry_t folio_alloc_swap(struct folio *folio) { - int order = swap_entry_order(entry_order); - unsigned long size = 1 << order; + unsigned int order = folio_order(folio); + unsigned int size = 1 << order; struct swap_info_struct *si, *next; - int n_ret = 0; + swp_entry_t entry = {}; + unsigned long offset; int node; + if (order) { + /* + * Should not even be attempting large allocations when huge + * page swap is disabled. Warn and fail the allocation. + */ + if (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER) { + VM_WARN_ON_ONCE(1); + return entry; + } + } + /* Fast path using percpu cluster */ local_lock(&percpu_swap_cluster.lock); - n_ret = swap_alloc_fast(swp_entries, - SWAP_HAS_CACHE, - order, n_goal); - if (n_ret == n_goal) + if (swap_alloc_fast(&entry, SWAP_HAS_CACHE, order)) goto out; - n_goal = min_t(int, n_goal - n_ret, SWAP_BATCH); /* Rotate the device and switch to a new cluster */ spin_lock(&swap_avail_lock); start_over: @@ -1276,18 +1241,13 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]); spin_unlock(&swap_avail_lock); if (get_swap_device_info(si)) { - /* - * For order 0 allocation, try best to fill the request - * as it's used by slot cache. - * - * For mTHP allocation, it always have n_goal == 1, - * and falling a mTHP swapin will just make the caller - * fallback to order 0 allocation, so just bail out. - */ - n_ret += scan_swap_map_slots(si, SWAP_HAS_CACHE, n_goal, - swp_entries + n_ret, order); + offset = cluster_alloc_swap_entry(si, order, SWAP_HAS_CACHE); put_swap_device(si); - if (n_ret || size > 1) + if (offset) { + entry = swp_entry(si->type, offset); + goto out; + } + if (order) goto out; } @@ -1309,8 +1269,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) spin_unlock(&swap_avail_lock); out: local_unlock(&percpu_swap_cluster.lock); - atomic_long_sub(n_ret * size, &nr_swap_pages); - return n_ret; + /* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. 
*/ + if (mem_cgroup_try_charge_swap(folio, entry)) { + put_swap_folio(folio, entry); + entry.val = 0; + } + if (entry.val) + atomic_long_sub(size, &nr_swap_pages); + return entry; } static struct swap_info_struct *_swap_info_get(swp_entry_t entry) @@ -1606,25 +1572,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) unlock_cluster(ci); } -void swapcache_free_entries(swp_entry_t *entries, int n) -{ - int i; - struct swap_cluster_info *ci; - struct swap_info_struct *si = NULL; - - if (n <= 0) - return; - - for (i = 0; i < n; ++i) { - si = _swap_info_get(entries[i]); - if (si) { - ci = lock_cluster(si, swp_offset(entries[i])); - swap_entry_range_free(si, ci, entries[i], 1); - unlock_cluster(ci); - } - } -} - int __swap_count(swp_entry_t entry) { struct swap_info_struct *si = swp_swap_info(entry); @@ -1865,6 +1812,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr) swp_entry_t get_swap_page_of_type(int type) { struct swap_info_struct *si = swap_type_to_swap_info(type); + unsigned long offset; swp_entry_t entry = {0}; if (!si) @@ -1872,8 +1820,13 @@ swp_entry_t get_swap_page_of_type(int type) /* This is called for allocating swap entry, not cache */ if (get_swap_device_info(si)) { - if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) - atomic_long_dec(&nr_swap_pages); + if (si->flags & SWP_WRITEOK) { + offset = cluster_alloc_swap_entry(si, 0, 1); + if (offset) { + entry = swp_entry(si->type, offset); + atomic_long_dec(&nr_swap_pages); + } + } put_swap_device(si); } fail: @@ -2634,21 +2587,6 @@ static void reinsert_swap_info(struct swap_info_struct *si) spin_unlock(&swap_lock); } -static bool __has_usable_swap(void) -{ - return !plist_head_empty(&swap_active_head); -} - -bool has_usable_swap(void) -{ - bool ret; - - spin_lock(&swap_lock); - ret = __has_usable_swap(); - spin_unlock(&swap_lock); - return ret; -} - /* * Called after clearing SWP_WRITEOK, ensures cluster_alloc_range * see the updated flags, so there will be no more allocations. @@ -2761,8 +2699,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) wait_for_allocation(p); - disable_swap_slots_cache_lock(); - set_current_oom_origin(); err = try_to_unuse(p->type); clear_current_oom_origin(); @@ -2770,12 +2706,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) if (err) { /* re-insert swap space back into swap_list */ reinsert_swap_info(p); - reenable_swap_slots_cache_unlock(); goto out_dput; } - reenable_swap_slots_cache_unlock(); - /* * Wait for swap operations protected by get/put_swap_device() * to complete. 
Because of synchronize_rcu() here, all swap
@@ -3525,8 +3458,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	putname(name);
 	if (inode)
 		inode_unlock(inode);
-	if (!error)
-		enable_swap_slots_cache();
 	return error;
 }
@@ -3922,6 +3853,11 @@ static void free_swap_count_continuations(struct swap_info_struct *si)
 }
 
 #if defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
+static bool __has_usable_swap(void)
+{
+	return !plist_head_empty(&swap_active_head);
+}
+
 void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
 {
 	struct swap_info_struct *si, *next;
From patchwork Thu Mar 13 16:59:35 2025
From: Kairui Song 
To: linux-mm@kvack.org
Cc: Andrew Morton , Chris Li , Barry Song , Hugh Dickins , Yosry Ahmed , "Huang, Ying" , Baoquan He , Nhat Pham , Johannes Weiner , Baolin Wang , Kalesh Singh , Matthew Wilcox , linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 7/7] mm, swap: simplify folio swap allocation
Date: Fri, 14 Mar 2025 00:59:35 +0800
Message-ID: <20250313165935.63303-8-ryncsn@gmail.com>
In-Reply-To: <20250313165935.63303-1-ryncsn@gmail.com>
References: <20250313165935.63303-1-ryncsn@gmail.com>
From: Kairui Song 

With the slot cache gone, clean up the allocation helpers even more.
folio_alloc_swap() will be the only entry point for allocating a swap
entry and adding the folio to the swap cache (except for suspend),
making it the opposite of folio_free_swap().
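To make the caller-side effect concrete, a condensed sketch of the reclaim-path change follows (mirroring the mm/vmscan.c hunk below; kernel context assumed, not standalone-compilable, and the "fallback" label is only a stand-in for the existing error handling):

	/* Before: reclaim called add_to_swap(), which allocated the entry
	 * and inserted the folio into the swap cache, returning a bool.
	 */
	if (!add_to_swap(folio))
		goto fallback;	/* split large folio, retry order 0, or activate */

	/* After: folio_alloc_swap() is the single entry point. A return of 0
	 * means the folio now owns a swap entry and sits in the swap cache;
	 * non-zero means fall back exactly as before.
	 */
	if (folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN))
		goto fallback;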
Signed-off-by: Kairui Song --- include/linux/swap.h | 8 ++-- mm/shmem.c | 21 +++----- mm/swap.h | 6 --- mm/swap_state.c | 57 ---------------------- mm/swapfile.c | 111 ++++++++++++++++++++++++++++--------------- mm/vmscan.c | 16 ++++++- 6 files changed, 95 insertions(+), 124 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index c5856dcc263a..9c99eee160f9 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -478,7 +478,7 @@ static inline long get_nr_swap_pages(void) } extern void si_swapinfo(struct sysinfo *); -swp_entry_t folio_alloc_swap(struct folio *folio); +int folio_alloc_swap(struct folio *folio, gfp_t gfp_mask); bool folio_free_swap(struct folio *folio); void put_swap_folio(struct folio *folio, swp_entry_t entry); extern swp_entry_t get_swap_page_of_type(int); @@ -586,11 +586,9 @@ static inline int swp_swapcount(swp_entry_t entry) return 0; } -static inline swp_entry_t folio_alloc_swap(struct folio *folio) +static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp_mask) { - swp_entry_t entry; - entry.val = 0; - return entry; + return -EINVAL; } static inline bool folio_free_swap(struct folio *folio) diff --git a/mm/shmem.c b/mm/shmem.c index 1eed26bf8ae5..7b738d8d6581 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1546,7 +1546,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) struct inode *inode = mapping->host; struct shmem_inode_info *info = SHMEM_I(inode); struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb); - swp_entry_t swap; pgoff_t index; int nr_pages; bool split = false; @@ -1628,14 +1627,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) folio_mark_uptodate(folio); } - swap = folio_alloc_swap(folio); - if (!swap.val) { - if (nr_pages > 1) - goto try_split; - - goto redirty; - } - /* * Add inode to shmem_unuse()'s list of swapped-out inodes, * if it's not already there. 
Do it now before the folio is @@ -1648,20 +1639,20 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) if (list_empty(&info->swaplist)) list_add(&info->swaplist, &shmem_swaplist); - if (add_to_swap_cache(folio, swap, - __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN, - NULL) == 0) { + if (!folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN)) { shmem_recalc_inode(inode, 0, nr_pages); - swap_shmem_alloc(swap, nr_pages); - shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap)); + swap_shmem_alloc(folio->swap, nr_pages); + shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap)); mutex_unlock(&shmem_swaplist_mutex); BUG_ON(folio_mapped(folio)); return swap_writepage(&folio->page, wbc); } + list_del_init(&info->swaplist); mutex_unlock(&shmem_swaplist_mutex); - put_swap_folio(folio, swap); + if (nr_pages > 1) + goto try_split; redirty: folio_mark_dirty(folio); if (wbc->for_reclaim) diff --git a/mm/swap.h b/mm/swap.h index ad2f121de970..0abb68091b4f 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -50,7 +50,6 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry) } void show_swap_cache_info(void); -bool add_to_swap(struct folio *folio); void *get_shadow_from_swap_cache(swp_entry_t entry); int add_to_swap_cache(struct folio *folio, swp_entry_t entry, gfp_t gfp, void **shadowp); @@ -163,11 +162,6 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping, return filemap_get_folio(mapping, index); } -static inline bool add_to_swap(struct folio *folio) -{ - return false; -} - static inline void *get_shadow_from_swap_cache(swp_entry_t entry) { return NULL; diff --git a/mm/swap_state.c b/mm/swap_state.c index 2b5744e211cd..68fd981b514f 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -166,63 +166,6 @@ void __delete_from_swap_cache(struct folio *folio, __lruvec_stat_mod_folio(folio, NR_SWAPCACHE, -nr); } -/** - * add_to_swap - allocate swap space for a folio - * @folio: folio we want to move to swap - * - * Allocate swap space for the folio and add the folio to the - * swap cache. - * - * Context: Caller needs to hold the folio lock. - * Return: Whether the folio was added to the swap cache. - */ -bool add_to_swap(struct folio *folio) -{ - swp_entry_t entry; - int err; - - VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); - VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio); - - entry = folio_alloc_swap(folio); - if (!entry.val) - return false; - - /* - * XArray node allocations from PF_MEMALLOC contexts could - * completely exhaust the page allocator. __GFP_NOMEMALLOC - * stops emergency reserves from being allocated. - * - * TODO: this could cause a theoretical memory reclaim - * deadlock in the swap out path. - */ - /* - * Add it to the swap cache. - */ - err = add_to_swap_cache(folio, entry, - __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, NULL); - if (err) - goto fail; - /* - * Normally the folio will be dirtied in unmap because its - * pte should be dirty. A special case is MADV_FREE page. The - * page's pte could have dirty bit cleared but the folio's - * SwapBacked flag is still set because clearing the dirty bit - * and SwapBacked flag has no lock protected. For such folio, - * unmap will not set dirty bit for it, so folio reclaim will - * not write the folio out. This can cause data corruption when - * the folio is swapped in later. Always setting the dirty flag - * for the folio solves the problem. 
- */ - folio_mark_dirty(folio); - - return true; - -fail: - put_swap_folio(folio, entry); - return false; -} - /* * This must be called only on folios that have * been verified to be in the swap cache and locked. diff --git a/mm/swapfile.c b/mm/swapfile.c index 9bd95173865d..2eff8b51a945 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1176,9 +1176,8 @@ static bool get_swap_device_info(struct swap_info_struct *si) * Fast path try to get swap entries with specified order from current * CPU's swap entry pool (a cluster). */ -static int swap_alloc_fast(swp_entry_t *entry, - unsigned char usage, - int order) +static bool swap_alloc_fast(swp_entry_t *entry, + int order) { struct swap_cluster_info *ci; struct swap_info_struct *si; @@ -1197,7 +1196,7 @@ static int swap_alloc_fast(swp_entry_t *entry, if (cluster_is_usable(ci, order)) { if (cluster_is_empty(ci)) offset = cluster_offset(si, ci); - found = alloc_swap_scan_cluster(si, ci, offset, order, usage); + found = alloc_swap_scan_cluster(si, ci, offset, order, SWAP_HAS_CACHE); if (found) *entry = swp_entry(si->type, found); } else { @@ -1208,47 +1207,30 @@ static int swap_alloc_fast(swp_entry_t *entry, return !!found; } -swp_entry_t folio_alloc_swap(struct folio *folio) +/* Rotate the device and switch to a new cluster */ +static bool swap_alloc_slow(swp_entry_t *entry, + int order) { - unsigned int order = folio_order(folio); - unsigned int size = 1 << order; - struct swap_info_struct *si, *next; - swp_entry_t entry = {}; - unsigned long offset; int node; + unsigned long offset; + struct swap_info_struct *si, *next; - if (order) { - /* - * Should not even be attempting large allocations when huge - * page swap is disabled. Warn and fail the allocation. - */ - if (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER) { - VM_WARN_ON_ONCE(1); - return entry; - } - } - - /* Fast path using percpu cluster */ - local_lock(&percpu_swap_cluster.lock); - if (swap_alloc_fast(&entry, SWAP_HAS_CACHE, order)) - goto out; - - /* Rotate the device and switch to a new cluster */ + node = numa_node_id(); spin_lock(&swap_avail_lock); start_over: - node = numa_node_id(); plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) { + /* Rotate the device and switch to a new cluster */ plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]); spin_unlock(&swap_avail_lock); if (get_swap_device_info(si)) { offset = cluster_alloc_swap_entry(si, order, SWAP_HAS_CACHE); put_swap_device(si); if (offset) { - entry = swp_entry(si->type, offset); - goto out; + *entry = swp_entry(si->type, offset); + return true; } if (order) - goto out; + return false; } spin_lock(&swap_avail_lock); @@ -1267,16 +1249,67 @@ swp_entry_t folio_alloc_swap(struct folio *folio) goto start_over; } spin_unlock(&swap_avail_lock); -out: + return false; +} + +/** + * folio_alloc_swap - allocate swap space for a folio + * @folio: folio we want to move to swap + * @gfp: gfp mask for shadow nodes + * + * Allocate swap space for the folio and add the folio to the + * swap cache. + * + * Context: Caller needs to hold the folio lock. + * Return: Whether the folio was added to the swap cache. + */ +int folio_alloc_swap(struct folio *folio, gfp_t gfp) +{ + unsigned int order = folio_order(folio); + unsigned int size = 1 << order; + swp_entry_t entry = {}; + + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio); + + /* + * Should not even be attempting large allocations when huge + * page swap is disabled. 
Warn and fail the allocation. + */ + if (order && (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER)) { + VM_WARN_ON_ONCE(1); + return -EINVAL; + } + + local_lock(&percpu_swap_cluster.lock); + if (!swap_alloc_fast(&entry, order)) + swap_alloc_slow(&entry, order); local_unlock(&percpu_swap_cluster.lock); + /* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */ - if (mem_cgroup_try_charge_swap(folio, entry)) { - put_swap_folio(folio, entry); - entry.val = 0; - } - if (entry.val) - atomic_long_sub(size, &nr_swap_pages); - return entry; + if (mem_cgroup_try_charge_swap(folio, entry)) + goto out_free; + + if (!entry.val) + return -ENOMEM; + + /* + * XArray node allocations from PF_MEMALLOC contexts could + * completely exhaust the page allocator. __GFP_NOMEMALLOC + * stops emergency reserves from being allocated. + * + * TODO: this could cause a theoretical memory reclaim + * deadlock in the swap out path. + */ + if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL)) + goto out_free; + + atomic_long_sub(size, &nr_swap_pages); + return 0; + +out_free: + put_swap_folio(folio, entry); + return -ENOMEM; } static struct swap_info_struct *_swap_info_get(swp_entry_t entry) diff --git a/mm/vmscan.c b/mm/vmscan.c index 84ec20f12200..2bc740637a6c 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1289,7 +1289,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, split_folio_to_list(folio, folio_list)) goto activate_locked; } - if (!add_to_swap(folio)) { + if (folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN)) { int __maybe_unused order = folio_order(folio); if (!folio_test_large(folio)) @@ -1305,9 +1305,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, } #endif count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK); - if (!add_to_swap(folio)) + if (folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN)) goto activate_locked_split; } + /* + * Normally the folio will be dirtied in unmap because its + * pte should be dirty. A special case is MADV_FREE page. The + * page's pte could have dirty bit cleared but the folio's + * SwapBacked flag is still set because clearing the dirty bit + * and SwapBacked flag has no lock protected. For such folio, + * unmap will not set dirty bit for it, so folio reclaim will + * not write the folio out. This can cause data corruption when + * the folio is swapped in later. Always setting the dirty flag + * for the folio solves the problem. + */ + folio_mark_dirty(folio); } }
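As a quick reference for readers of the archive, the allocation-facing entry points after the whole series look as follows (collected from the header hunks above; declarations only, not a standalone header):

/*
 * After this series (from the diffs above):
 *  - mm/swap_slots.c, add_to_swap(), get_swap_pages() and
 *    swapcache_free_entries() are gone.
 *  - folio_alloc_swap() allocates the entry and adds the folio to the
 *    swap cache in one step, returning 0 on success or -EINVAL/-ENOMEM.
 */
int folio_alloc_swap(struct folio *folio, gfp_t gfp_mask);
bool folio_free_swap(struct folio *folio);
void put_swap_folio(struct folio *folio, swp_entry_t entry);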