From patchwork Tue Oct 22 19:24:39 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13846075
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 01/13] mm, swap: minor clean up for swap entry allocation
Date: Wed, 23 Oct 2024 03:24:39 +0800
Message-ID: <20241022192451.38138-2-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>
From: Kairui Song

Direct reclaim can skip the whole folio after reclaiming a set of
folio-based slots. Also simplify the allocation code and reduce
indentation.

Signed-off-by: Kairui Song
---
 mm/swapfile.c | 59 +++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 46bd4b1a3c07..1128cea95c47 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -604,23 +604,28 @@ static bool cluster_reclaim_range(struct swap_info_struct *si, unsigned long start, unsigned long end) { unsigned char *map = si->swap_map; - unsigned long offset; + unsigned long offset = start; + int nr_reclaim; spin_unlock(&ci->lock); spin_unlock(&si->lock); - for (offset = start; offset < end; offset++) { + do { switch (READ_ONCE(map[offset])) { case 0: - continue; + offset++; + break; case SWAP_HAS_CACHE: - if (__try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT) > 0) - continue; - goto out; + nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT); + if (nr_reclaim > 0) + offset += nr_reclaim; + else + goto out; + break; default: goto out; } - } + } while (offset < end); out: spin_lock(&si->lock); spin_lock(&ci->lock);
@@ -826,35 +831,30 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o &found, order, usage); frags++; if (found) - break; + goto done; } - if (!found) { + /* + * Nonfull clusters are moved to frag tail if we reached + * here, count them too, don't over scan the frag list. + */ + while (frags < si->frag_cluster_nr[order]) { + ci = list_first_entry(&si->frag_clusters[order], + struct swap_cluster_info, list); /* - * Nonfull clusters are moved to frag tail if we reached - * here, count them too, don't over scan the frag list. + * Rotate the frag list to iterate, they were all failing + * high order allocation or moved here due to per-CPU usage, + * this help keeping usable cluster ahead.
- */ - list_move_tail(&ci->list, &si->frag_clusters[order]); - offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, order, usage); - frags++; - if (found) - break; - } + list_move_tail(&ci->list, &si->frag_clusters[order]); + offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), + &found, order, usage); + frags++; + if (found) + goto done; } } - if (found) - goto done; - if (!list_empty(&si->discard_clusters)) { /* * we don't have free cluster but have some clusters in
@@ -892,7 +892,6 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o goto done; } } - done: cluster->next[order] = offset; return found;

From patchwork Tue Oct 22 19:24:40 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13846076
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 02/13] mm, swap: fold swap_info_get_cont in the only caller
Date: Wed, 23 Oct 2024 03:24:40 +0800
Message-ID: <20241022192451.38138-3-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>
From: Kairui Song

The name of the function is confusing, and the code is much easier to
follow after folding it into its only caller. Also rename the
confusing variable name "p" to the more meaningful "si".

Signed-off-by: Kairui Song
---
 mm/swapfile.c | 39 +++++++++++++++------------------------
 1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1128cea95c47..e1e4a1ba4fc5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1359,22 +1359,6 @@ static struct swap_info_struct *_swap_info_get(swp_entry_t entry) return NULL; } -static struct swap_info_struct *swap_info_get_cont(swp_entry_t entry, - struct swap_info_struct *q) -{ - struct swap_info_struct *p; - - p = _swap_info_get(entry); - - if (p != q) { - if (q != NULL) - spin_unlock(&q->lock); - if (p != NULL) - spin_lock(&p->lock); - } - return p; -} - static unsigned char __swap_entry_free_locked(struct swap_info_struct *si, unsigned long offset, unsigned char usage)
@@ -1671,14 +1655,14 @@ static int swp_entry_cmp(const void *ent1, const void *ent2) void swapcache_free_entries(swp_entry_t *entries, int n) { - struct swap_info_struct *p, *prev; + struct swap_info_struct *si, *prev; int i; if (n <= 0) return; prev = NULL; - p = NULL; + si = NULL; /* * Sort swap entries by swap device, so each lock is only taken once.
@@ -1688,13 +1672,20 @@ void swapcache_free_entries(swp_entry_t *entries, int n) if (nr_swapfiles > 1) sort(entries, n, sizeof(entries[0]), swp_entry_cmp, NULL); for (i = 0; i < n; ++i) { - p = swap_info_get_cont(entries[i], prev); - if (p) - swap_entry_range_free(p, entries[i], 1); - prev = p; + si = _swap_info_get(entries[i]); + + if (si != prev) { + if (prev != NULL) + spin_unlock(&prev->lock); + if (si != NULL) + spin_lock(&si->lock); + } + if (si) + swap_entry_range_free(si, entries[i], 1); + prev = si; } - if (p) - spin_unlock(&p->lock); + if (si) + spin_unlock(&si->lock); }

From patchwork Tue Oct 22 19:24:41 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13846077
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 03/13] mm, swap: remove old allocation path for HDD
Date: Wed, 23 Oct 2024 03:24:41 +0800
Message-ID: <20241022192451.38138-4-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>
From: Kairui Song

We are currently using different swap allocation algorithms for HDD
and non-HDD devices. This leads to two different sets of locking, and
the code path is heavily bloated, causing trouble for further
optimization and maintenance.

This commit removes the HDD swap allocation path and related dead
code, and uses the cluster allocation algorithm instead.

Performance may drop a little bit temporarily, and the impact should
be negligible: the main advantage of the legacy HDD allocation
algorithm is that it tends to use contiguous slots, but a swap device
gets fragmented quickly anyway, so the attempt to use contiguous slots
fails easily.

This commit also enables mTHP swap on HDD, which should be beneficial,
and following commits will adapt and optimize the cluster allocator
for HDD.
Suggested-by: Chris Li Suggested-by: "Huang, Ying" Signed-off-by: Kairui Song --- include/linux/swap.h | 3 - mm/swapfile.c | 235 ++----------------------------------------- 2 files changed, 9 insertions(+), 229 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index f3e0ac20c2e8..3a71198a6957 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -309,9 +309,6 @@ struct swap_info_struct { unsigned int highest_bit; /* index of last free in swap_map */ unsigned int pages; /* total of usable pages of swap */ unsigned int inuse_pages; /* number of those currently in use */ - unsigned int cluster_next; /* likely index for next allocation */ - unsigned int cluster_nr; /* countdown to next cluster search */ - unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */ struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */ struct rb_root swap_extent_root;/* root of the swap extent rbtree */ struct block_device *bdev; /* swap device or bdev of swap file */ diff --git a/mm/swapfile.c b/mm/swapfile.c index e1e4a1ba4fc5..ffdf7eedecb5 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -989,49 +989,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); } -static void set_cluster_next(struct swap_info_struct *si, unsigned long next) -{ - unsigned long prev; - - if (!(si->flags & SWP_SOLIDSTATE)) { - si->cluster_next = next; - return; - } - - prev = this_cpu_read(*si->cluster_next_cpu); - /* - * Cross the swap address space size aligned trunk, choose - * another trunk randomly to avoid lock contention on swap - * address space if possible. - */ - if ((prev >> SWAP_ADDRESS_SPACE_SHIFT) != - (next >> SWAP_ADDRESS_SPACE_SHIFT)) { - /* No free swap slots available */ - if (si->highest_bit <= si->lowest_bit) - return; - next = get_random_u32_inclusive(si->lowest_bit, si->highest_bit); - next = ALIGN_DOWN(next, SWAP_ADDRESS_SPACE_PAGES); - next = max_t(unsigned int, next, si->lowest_bit); - } - this_cpu_write(*si->cluster_next_cpu, next); -} - -static bool swap_offset_available_and_locked(struct swap_info_struct *si, - unsigned long offset) -{ - if (data_race(!si->swap_map[offset])) { - spin_lock(&si->lock); - return true; - } - - if (vm_swap_full() && READ_ONCE(si->swap_map[offset]) == SWAP_HAS_CACHE) { - spin_lock(&si->lock); - return true; - } - - return false; -} - static int cluster_alloc_swap(struct swap_info_struct *si, unsigned char usage, int nr, swp_entry_t slots[], int order) @@ -1055,13 +1012,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si, unsigned char usage, int nr, swp_entry_t slots[], int order) { - unsigned long offset; - unsigned long scan_base; - unsigned long last_in_cluster = 0; - int latency_ration = LATENCY_LIMIT; unsigned int nr_pages = 1 << order; - int n_ret = 0; - bool scanned_many = false; /* * We try to cluster swap pages by allocating them sequentially @@ -1073,7 +1024,6 @@ static int scan_swap_map_slots(struct swap_info_struct *si, * But we do now try to find an empty cluster. -Andrea * And we let swap pages go all over an SSD partition. Hugh */ - if (order > 0) { /* * Should not even be attempting large allocations when huge @@ -1093,158 +1043,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si, return 0; } - if (si->cluster_info) - return cluster_alloc_swap(si, usage, nr, slots, order); - - si->flags += SWP_SCANNING; - - /* For HDD, sequential access is more important. 
*/ - scan_base = si->cluster_next; - offset = scan_base; - - if (unlikely(!si->cluster_nr--)) { - if (si->pages - si->inuse_pages < SWAPFILE_CLUSTER) { - si->cluster_nr = SWAPFILE_CLUSTER - 1; - goto checks; - } - - spin_unlock(&si->lock); - - /* - * If seek is expensive, start searching for new cluster from - * start of partition, to minimize the span of allocated swap. - */ - scan_base = offset = si->lowest_bit; - last_in_cluster = offset + SWAPFILE_CLUSTER - 1; - - /* Locate the first empty (unaligned) cluster */ - for (; last_in_cluster <= READ_ONCE(si->highest_bit); offset++) { - if (si->swap_map[offset]) - last_in_cluster = offset + SWAPFILE_CLUSTER; - else if (offset == last_in_cluster) { - spin_lock(&si->lock); - offset -= SWAPFILE_CLUSTER - 1; - si->cluster_next = offset; - si->cluster_nr = SWAPFILE_CLUSTER - 1; - goto checks; - } - if (unlikely(--latency_ration < 0)) { - cond_resched(); - latency_ration = LATENCY_LIMIT; - } - } - - offset = scan_base; - spin_lock(&si->lock); - si->cluster_nr = SWAPFILE_CLUSTER - 1; - } - -checks: - if (!(si->flags & SWP_WRITEOK)) - goto no_page; - if (!si->highest_bit) - goto no_page; - if (offset > si->highest_bit) - scan_base = offset = si->lowest_bit; - - /* reuse swap entry of cache-only swap if not busy. */ - if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) { - int swap_was_freed; - spin_unlock(&si->lock); - swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT); - spin_lock(&si->lock); - /* entry was freed successfully, try to use this again */ - if (swap_was_freed > 0) - goto checks; - goto scan; /* check next one */ - } - - if (si->swap_map[offset]) { - if (!n_ret) - goto scan; - else - goto done; - } - memset(si->swap_map + offset, usage, nr_pages); - - swap_range_alloc(si, offset, nr_pages); - slots[n_ret++] = swp_entry(si->type, offset); - - /* got enough slots or reach max slots? */ - if ((n_ret == nr) || (offset >= si->highest_bit)) - goto done; - - /* search for next available slot */ - - /* time to take a break? */ - if (unlikely(--latency_ration < 0)) { - if (n_ret) - goto done; - spin_unlock(&si->lock); - cond_resched(); - spin_lock(&si->lock); - latency_ration = LATENCY_LIMIT; - } - - if (si->cluster_nr && !si->swap_map[++offset]) { - /* non-ssd case, still more slots in cluster? */ - --si->cluster_nr; - goto checks; - } - - /* - * Even if there's no free clusters available (fragmented), - * try to scan a little more quickly with lock held unless we - * have scanned too many slots already. 
- */ - if (!scanned_many) { - unsigned long scan_limit; - - if (offset < scan_base) - scan_limit = scan_base; - else - scan_limit = si->highest_bit; - for (; offset <= scan_limit && --latency_ration > 0; - offset++) { - if (!si->swap_map[offset]) - goto checks; - } - } - -done: - if (order == 0) - set_cluster_next(si, offset + 1); - si->flags -= SWP_SCANNING; - return n_ret; - -scan: - VM_WARN_ON(order > 0); - spin_unlock(&si->lock); - while (++offset <= READ_ONCE(si->highest_bit)) { - if (unlikely(--latency_ration < 0)) { - cond_resched(); - latency_ration = LATENCY_LIMIT; - scanned_many = true; - } - if (swap_offset_available_and_locked(si, offset)) - goto checks; - } - offset = si->lowest_bit; - while (offset < scan_base) { - if (unlikely(--latency_ration < 0)) { - cond_resched(); - latency_ration = LATENCY_LIMIT; - scanned_many = true; - } - if (swap_offset_available_and_locked(si, offset)) - goto checks; - offset++; - } - spin_lock(&si->lock); - -no_page: - si->flags -= SWP_SCANNING; - return n_ret; + return cluster_alloc_swap(si, usage, nr, slots, order); } int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) @@ -2855,8 +2654,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) mutex_unlock(&swapon_mutex); free_percpu(p->percpu_cluster); p->percpu_cluster = NULL; - free_percpu(p->cluster_next_cpu); - p->cluster_next_cpu = NULL; vfree(swap_map); kvfree(zeromap); kvfree(cluster_info); @@ -3168,8 +2965,6 @@ static unsigned long read_swap_header(struct swap_info_struct *si, } si->lowest_bit = 1; - si->cluster_next = 1; - si->cluster_nr = 0; maxpages = swapfile_maximum_size; last_page = swap_header->info.last_page; @@ -3255,7 +3050,6 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, unsigned long maxpages) { unsigned long nr_clusters = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER); - unsigned long col = si->cluster_next / SWAPFILE_CLUSTER % SWAP_CLUSTER_COLS; struct swap_cluster_info *cluster_info; unsigned long i, j, k, idx; int cpu, err = -ENOMEM; @@ -3267,15 +3061,6 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, for (i = 0; i < nr_clusters; i++) spin_lock_init(&cluster_info[i].lock); - si->cluster_next_cpu = alloc_percpu(unsigned int); - if (!si->cluster_next_cpu) - goto err_free; - - /* Random start position to help with wear leveling */ - for_each_possible_cpu(cpu) - per_cpu(*si->cluster_next_cpu, cpu) = - get_random_u32_inclusive(1, si->highest_bit); - si->percpu_cluster = alloc_percpu(struct percpu_cluster); if (!si->percpu_cluster) goto err_free; @@ -3317,7 +3102,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, * sharing same address space. 
*/ for (k = 0; k < SWAP_CLUSTER_COLS; k++) { - j = (k + col) % SWAP_CLUSTER_COLS; + j = k % SWAP_CLUSTER_COLS; for (i = 0; i < DIV_ROUND_UP(nr_clusters, SWAP_CLUSTER_COLS); i++) { struct swap_cluster_info *ci; idx = i * SWAP_CLUSTER_COLS + j;
@@ -3467,18 +3252,18 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) if (si->bdev && bdev_nonrot(si->bdev)) { si->flags |= SWP_SOLIDSTATE; - - cluster_info = setup_clusters(si, swap_header, maxpages); - if (IS_ERR(cluster_info)) { - error = PTR_ERR(cluster_info); - cluster_info = NULL; - goto bad_swap_unlock_inode; - } } else { atomic_inc(&nr_rotate_swap); inced_nr_rotate_swap = true; } + cluster_info = setup_clusters(si, swap_header, maxpages); + if (IS_ERR(cluster_info)) { + error = PTR_ERR(cluster_info); + cluster_info = NULL; + goto bad_swap_unlock_inode; + } + if ((swap_flags & SWAP_FLAG_DISCARD) && si->bdev && bdev_max_discard_sectors(si->bdev)) { /*
@@ -3559,8 +3344,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) bad_swap: free_percpu(si->percpu_cluster); si->percpu_cluster = NULL; - free_percpu(si->cluster_next_cpu); - si->cluster_next_cpu = NULL; inode = NULL; destroy_swap_extents(si); swap_cgroup_swapoff(si->type);

From patchwork Tue Oct 22 19:24:42 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13846078
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 04/13] mm, swap: use cluster lock for HDD
Date: Wed, 23 Oct 2024 03:24:42 +0800
Message-ID: <20241022192451.38138-5-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>
From: Kairui Song

The cluster lock (ci->lock) was introduced to reduce contention for
certain operations. Using the cluster lock for HDD is not helpful as
HDD performance is poor, so locking isn't the bottleneck there. But
having different sets of locks for HDD / non-HDD prevents further
rework of the device lock (si->lock).

This commit changes all lock_cluster_or_swap_info calls to
lock_cluster, which is a safe and straightforward conversion since
cluster info is always allocated now, and removes all cluster_info
related checks.

Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 mm/swapfile.c | 107 ++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 73 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index ffdf7eedecb5..f8e70bb5f1d7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -58,10 +58,9 @@ static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset, unsigned int nr_entries); static bool folio_swapcache_freeable(struct folio *folio); -static struct swap_cluster_info *lock_cluster_or_swap_info( - struct swap_info_struct *si, unsigned long offset); -static void unlock_cluster_or_swap_info(struct swap_info_struct *si, - struct swap_cluster_info *ci); +static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si, + unsigned long offset); +static void unlock_cluster(struct swap_cluster_info *ci); static DEFINE_SPINLOCK(swap_lock); static unsigned int nr_swapfiles;
@@ -222,9 +221,9 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si, * swap_map is HAS_CACHE only, which means the slots have no page table * reference or pending writeback, and can't be allocated to others.
*/ - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); need_reclaim = swap_is_has_cache(si, offset, nr_pages); - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); if (!need_reclaim) goto out_unlock; @@ -404,45 +403,15 @@ static inline struct swap_cluster_info *lock_cluster(struct swap_info_struct *si { struct swap_cluster_info *ci; - ci = si->cluster_info; - if (ci) { - ci += offset / SWAPFILE_CLUSTER; - spin_lock(&ci->lock); - } - return ci; -} - -static inline void unlock_cluster(struct swap_cluster_info *ci) -{ - if (ci) - spin_unlock(&ci->lock); -} - -/* - * Determine the locking method in use for this device. Return - * swap_cluster_info if SSD-style cluster-based locking is in place. - */ -static inline struct swap_cluster_info *lock_cluster_or_swap_info( - struct swap_info_struct *si, unsigned long offset) -{ - struct swap_cluster_info *ci; - - /* Try to use fine-grained SSD-style locking if available: */ - ci = lock_cluster(si, offset); - /* Otherwise, fall back to traditional, coarse locking: */ - if (!ci) - spin_lock(&si->lock); + ci = &si->cluster_info[offset / SWAPFILE_CLUSTER]; + spin_lock(&ci->lock); return ci; } -static inline void unlock_cluster_or_swap_info(struct swap_info_struct *si, - struct swap_cluster_info *ci) +static inline void unlock_cluster(struct swap_cluster_info *ci) { - if (ci) - unlock_cluster(ci); - else - spin_unlock(&si->lock); + spin_unlock(&ci->lock); } /* Add a cluster to discard list and schedule it to do discard */ @@ -558,9 +527,6 @@ static void inc_cluster_info_page(struct swap_info_struct *si, unsigned long idx = page_nr / SWAPFILE_CLUSTER; struct swap_cluster_info *ci; - if (!cluster_info) - return; - ci = cluster_info + idx; ci->count++; @@ -576,9 +542,6 @@ static void inc_cluster_info_page(struct swap_info_struct *si, static void dec_cluster_info_page(struct swap_info_struct *si, struct swap_cluster_info *ci, int nr_pages) { - if (!si->cluster_info) - return; - VM_BUG_ON(ci->count < nr_pages); VM_BUG_ON(cluster_is_free(ci)); lockdep_assert_held(&si->lock); @@ -995,8 +958,6 @@ static int cluster_alloc_swap(struct swap_info_struct *si, { int n_ret = 0; - VM_BUG_ON(!si->cluster_info); - while (n_ret < nr) { unsigned long offset = cluster_alloc_swap_entry(si, order, usage); @@ -1036,10 +997,10 @@ static int scan_swap_map_slots(struct swap_info_struct *si, } /* - * Swapfile is not block device or not using clusters so unable + * Swapfile is not block device so unable * to allocate large entries. 
*/ - if (!(si->flags & SWP_BLKDEV) || !si->cluster_info) + if (!(si->flags & SWP_BLKDEV)) return 0; } @@ -1279,9 +1240,9 @@ static unsigned char __swap_entry_free(struct swap_info_struct *si, unsigned long offset = swp_offset(entry); unsigned char usage; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); usage = __swap_entry_free_locked(si, offset, 1); - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); if (!usage) free_swap_slot(entry); @@ -1304,14 +1265,14 @@ static bool __swap_entries_free(struct swap_info_struct *si, if (nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER) goto fallback; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); if (!swap_is_last_map(si, offset, nr, &has_cache)) { - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); goto fallback; } for (i = 0; i < nr; i++) WRITE_ONCE(si->swap_map[offset + i], SWAP_HAS_CACHE); - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); if (!has_cache) { for (i = 0; i < nr; i++) @@ -1367,7 +1328,7 @@ static void cluster_swap_free_nr(struct swap_info_struct *si, DECLARE_BITMAP(to_free, BITS_PER_LONG) = { 0 }; int i, nr; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); while (nr_pages) { nr = min(BITS_PER_LONG, nr_pages); for (i = 0; i < nr; i++) { @@ -1375,18 +1336,18 @@ static void cluster_swap_free_nr(struct swap_info_struct *si, bitmap_set(to_free, i, 1); } if (!bitmap_empty(to_free, BITS_PER_LONG)) { - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); for_each_set_bit(i, to_free, BITS_PER_LONG) free_swap_slot(swp_entry(si->type, offset + i)); if (nr == nr_pages) return; bitmap_clear(to_free, 0, BITS_PER_LONG); - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); } offset += nr; nr_pages -= nr; } - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); } /* @@ -1425,9 +1386,9 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) if (!si) return; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); if (size > 1 && swap_is_has_cache(si, offset, size)) { - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); spin_lock(&si->lock); swap_entry_range_free(si, entry, size); spin_unlock(&si->lock); @@ -1435,14 +1396,14 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) } for (int i = 0; i < size; i++, entry.val++) { if (!__swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE)) { - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); free_swap_slot(entry); if (i == size - 1) return; - lock_cluster_or_swap_info(si, offset); + lock_cluster(si, offset); } } - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); } static int swp_entry_cmp(const void *ent1, const void *ent2) @@ -1506,9 +1467,9 @@ int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) struct swap_cluster_info *ci; int count; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); count = swap_count(si->swap_map[offset]); - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); return count; } @@ -1531,7 +1492,7 @@ int swp_swapcount(swp_entry_t entry) offset = swp_offset(entry); - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); count = swap_count(si->swap_map[offset]); if (!(count & COUNT_CONTINUED)) @@ -1554,7 +1515,7 @@ int swp_swapcount(swp_entry_t entry) n *= (SWAP_CONT_MAX + 1); } while (tmp_count & COUNT_CONTINUED); out: - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); return count; } @@ 
-1569,8 +1530,8 @@ static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, int i; bool ret = false; - ci = lock_cluster_or_swap_info(si, offset); - if (!ci || nr_pages == 1) { + ci = lock_cluster(si, offset); + if (nr_pages == 1) { if (swap_count(map[roffset])) ret = true; goto unlock_out; @@ -1582,7 +1543,7 @@ static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, } } unlock_out: - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); return ret; } @@ -3412,7 +3373,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr) offset = swp_offset(entry); VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER); VM_WARN_ON(usage == 1 && nr > 1); - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); err = 0; for (i = 0; i < nr; i++) { @@ -3467,7 +3428,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr) } unlock_out: - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); return err; } From patchwork Tue Oct 22 19:24:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kairui Song X-Patchwork-Id: 13846079 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A54EBCDD0D7 for ; Tue, 22 Oct 2024 19:30:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 322A46B0095; Tue, 22 Oct 2024 15:30:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2D1C16B0096; Tue, 22 Oct 2024 15:30:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 124376B0098; Tue, 22 Oct 2024 15:30:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id DEABB6B0095 for ; Tue, 22 Oct 2024 15:30:05 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id A57FEAA386 for ; Tue, 22 Oct 2024 19:29:32 +0000 (UTC) X-FDA: 82702228602.28.7F97EF4 Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) by imf25.hostedemail.com (Postfix) with ESMTP id 27629A001A for ; Tue, 22 Oct 2024 19:29:51 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="Rt/XPyMU"; spf=pass (imf25.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.214.169 as permitted sender) smtp.mailfrom=ryncsn@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1729625251; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=XKHLsBzhzCnbiPIqKsShhAk0UHdn03icCPe/yHcoGTQ=; b=mEE4fohCrV2XOI3QeTb7kxJcbBhnAw6ZrimIFnMr9R4ZiIMzGVRgxxA1YVIjZbhzFB5qLK hDZDsIvbkitCGYx9cBmFfEeNbP6yDSL6d52fu0tW4eKZvQNuJaLNnGnnwOZWKAMlUcvJ4C 21DikQvHOeO9H2bSATlHK8KabSR7Ad0= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1729625251; a=rsa-sha256; cv=none; b=XmPwG6t7iwUGCTmK3KgFtdfpp85aJdeyKhOjCiyPOM8siHea3RC9t37shK9ZeF9sZZb6Xw 
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 05/13] mm, swap: clean up device availability check
Date: Wed, 23 Oct 2024 03:24:43 +0800
Message-ID: <20241022192451.38138-6-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>
From: Kairui Song

Remove highest_bit and lowest_bit. Now that the HDD allocation path has been removed, the only remaining purpose of these two fields is to judge whether the device is full, which can be done by checking inuse_pages instead.
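[Illustrative sketch, not part of the patch: once lowest_bit/highest_bit are gone, the "device is full" test used throughout the series reduces to a single counter comparison. The helper name below is made up for illustration.]

	/* Sketch only: "full" is derived from the usage counter alone. */
	static inline bool swap_device_is_full(struct swap_info_struct *si)
	{
		return READ_ONCE(si->inuse_pages) == si->pages;
	}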
Signed-off-by: Kairui Song --- fs/btrfs/inode.c | 1 - fs/iomap/swapfile.c | 1 - include/linux/swap.h | 2 -- mm/page_io.c | 1 - mm/swapfile.c | 38 ++++++++------------------------------ 5 files changed, 8 insertions(+), 35 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5618ca02934a..aba9c0d58998 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10023,7 +10023,6 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file, *span = bsi.highest_ppage - bsi.lowest_ppage + 1; sis->max = bsi.nr_pages; sis->pages = bsi.nr_pages - 1; - sis->highest_bit = bsi.nr_pages - 1; return bsi.nr_extents; } #else diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c index 5fc0ac36dee3..b90d0eda9e51 100644 --- a/fs/iomap/swapfile.c +++ b/fs/iomap/swapfile.c @@ -189,7 +189,6 @@ int iomap_swapfile_activate(struct swap_info_struct *sis, *pagespan = 1 + isi.highest_ppage - isi.lowest_ppage; sis->max = isi.nr_pages; sis->pages = isi.nr_pages - 1; - sis->highest_bit = isi.nr_pages - 1; return isi.nr_extents; } EXPORT_SYMBOL_GPL(iomap_swapfile_activate); diff --git a/include/linux/swap.h b/include/linux/swap.h index 3a71198a6957..c0d49dad7a4b 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -305,8 +305,6 @@ struct swap_info_struct { struct list_head frag_clusters[SWAP_NR_ORDERS]; /* list of cluster that are fragmented or contented */ unsigned int frag_cluster_nr[SWAP_NR_ORDERS]; - unsigned int lowest_bit; /* index of first free in swap_map */ - unsigned int highest_bit; /* index of last free in swap_map */ unsigned int pages; /* total of usable pages of swap */ unsigned int inuse_pages; /* number of those currently in use */ struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */ diff --git a/mm/page_io.c b/mm/page_io.c index a28d28b6b3ce..c8a25203bcf4 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -163,7 +163,6 @@ int generic_swapfile_activate(struct swap_info_struct *sis, page_no = 1; /* force Empty message */ sis->max = page_no; sis->pages = page_no - 1; - sis->highest_bit = page_no - 1; out: return ret; bad_bmap: diff --git a/mm/swapfile.c b/mm/swapfile.c index f8e70bb5f1d7..e620b41c3120 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -55,7 +55,7 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t, static void free_swap_count_continuations(struct swap_info_struct *); static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry, unsigned int nr_pages); -static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset, +static void swap_range_alloc(struct swap_info_struct *si, unsigned int nr_entries); static bool folio_swapcache_freeable(struct folio *folio); static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si, @@ -647,7 +647,7 @@ static void cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster } memset(si->swap_map + start, usage, nr_pages); - swap_range_alloc(si, start, nr_pages); + swap_range_alloc(si, nr_pages); ci->count += nr_pages; if (ci->count == SWAPFILE_CLUSTER) { @@ -876,19 +876,11 @@ static void del_from_avail_list(struct swap_info_struct *si) spin_unlock(&swap_avail_lock); } -static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset, +static void swap_range_alloc(struct swap_info_struct *si, unsigned int nr_entries) { - unsigned int end = offset + nr_entries - 1; - - if (offset == si->lowest_bit) - si->lowest_bit += nr_entries; - if (end == si->highest_bit) - WRITE_ONCE(si->highest_bit, si->highest_bit 
- nr_entries); WRITE_ONCE(si->inuse_pages, si->inuse_pages + nr_entries); if (si->inuse_pages == si->pages) { - si->lowest_bit = si->max; - si->highest_bit = 0; del_from_avail_list(si); if (vm_swap_full()) @@ -921,15 +913,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, for (i = 0; i < nr_entries; i++) clear_bit(offset + i, si->zeromap); - if (offset < si->lowest_bit) - si->lowest_bit = offset; - if (end > si->highest_bit) { - bool was_full = !si->highest_bit; - - WRITE_ONCE(si->highest_bit, end); - if (was_full && (si->flags & SWP_WRITEOK)) - add_to_avail_list(si); - } + if (si->inuse_pages == si->pages) + add_to_avail_list(si); if (si->flags & SWP_BLKDEV) swap_slot_free_notify = si->bdev->bd_disk->fops->swap_slot_free_notify; @@ -1035,15 +1020,12 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]); spin_unlock(&swap_avail_lock); spin_lock(&si->lock); - if (!si->highest_bit || !(si->flags & SWP_WRITEOK)) { + if ((si->inuse_pages == si->pages) || !(si->flags & SWP_WRITEOK)) { spin_lock(&swap_avail_lock); if (plist_node_empty(&si->avail_lists[node])) { spin_unlock(&si->lock); goto nextsi; } - WARN(!si->highest_bit, - "swap_info %d in list but !highest_bit\n", - si->type); WARN(!(si->flags & SWP_WRITEOK), "swap_info %d in list but !SWP_WRITEOK\n", si->type); @@ -2425,8 +2407,8 @@ static void _enable_swap_info(struct swap_info_struct *si) */ plist_add(&si->list, &swap_active_head); - /* add to available list iff swap device is not full */ - if (si->highest_bit) + /* add to available list if swap device is not full */ + if (si->inuse_pages < si->pages) add_to_avail_list(si); } @@ -2590,7 +2572,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) drain_mmlist(); /* wait for anyone still in scan_swap_map_slots */ - p->highest_bit = 0; /* cuts scans short */ while (p->flags >= SWP_SCANNING) { spin_unlock(&p->lock); spin_unlock(&swap_lock); @@ -2925,8 +2906,6 @@ static unsigned long read_swap_header(struct swap_info_struct *si, return 0; } - si->lowest_bit = 1; - maxpages = swapfile_maximum_size; last_page = swap_header->info.last_page; if (!last_page) { @@ -2943,7 +2922,6 @@ static unsigned long read_swap_header(struct swap_info_struct *si, if ((unsigned int)maxpages == 0) maxpages = UINT_MAX; } - si->highest_bit = maxpages - 1; if (!maxpages) return 0; From patchwork Tue Oct 22 19:24:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kairui Song X-Patchwork-Id: 13846080 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C025CDD0D7 for ; Tue, 22 Oct 2024 19:30:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9DA346B0098; Tue, 22 Oct 2024 15:30:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9611B6B0099; Tue, 22 Oct 2024 15:30:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 78D1D6B009A; Tue, 22 Oct 2024 15:30:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 541B86B0098 for ; Tue, 22 Oct 2024 15:30:10 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by 
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 06/13] mm, swap: clean up plist removal and adding
Date: Wed, 23 Oct 2024 03:24:44 +0800
Message-ID: <20241022192451.38138-7-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>

From: Kairui Song

When the swap device is full (inuse_pages == pages), it should be removed from the plist, and when any slot is freed, it should be added back to the plist. On swapoff / swapon, the swap device is also force removed / added. This is currently serialized by si->lock, and some historical sanity-check code is still there. This commit decouples the plist maintenance from the protection of si->lock and cleans it up, preparing for the si->lock rework. Notice that the inuse_pages counter is the only thing that decides whether a device should be removed from or added to the plist (except swapon / swapoff as a special case), and inuse_pages is a very hot counter.
So to avoid extra overhead on the counter update hot path, and make it possible to check and update the plist when the counter value changes, embed the plist state into the inuse_pages counter, and turn the counter into an atomic. This way we can check and update the counter with one CAS and avoid any extra synchronization. If the counter is full (inuse_pages == pages) with the off-list bit unset, try to remove it from the plist. If the counter is not full (inuse_pages != pages) with the off-list bit set, try to add it to the plist. Removing and adding is serialized with lock as well as the bit setting. Ordinary counter updates will be lockless. Signed-off-by: Kairui Song --- include/linux/swap.h | 2 +- mm/swapfile.c | 182 +++++++++++++++++++++++++++++++------------ 2 files changed, 132 insertions(+), 52 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index c0d49dad7a4b..16dcf8bd1a4e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -306,7 +306,7 @@ struct swap_info_struct { /* list of cluster that are fragmented or contented */ unsigned int frag_cluster_nr[SWAP_NR_ORDERS]; unsigned int pages; /* total of usable pages of swap */ - unsigned int inuse_pages; /* number of those currently in use */ + atomic_long_t inuse_pages; /* number of those currently in use */ struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */ struct rb_root swap_extent_root;/* root of the swap extent rbtree */ struct block_device *bdev; /* swap device or bdev of swap file */ diff --git a/mm/swapfile.c b/mm/swapfile.c index e620b41c3120..4e629536a07c 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -128,6 +128,25 @@ static inline unsigned char swap_count(unsigned char ent) return ent & ~SWAP_HAS_CACHE; /* may include COUNT_CONTINUED flag */ } +/* + * Use the second highest bit of inuse_pages as the indicator + * of if one swap device is on the allocation plist. + * + * inuse_pages is the only thing decides of a device should be on + * list or not (except swapoff as a special case). By embedding the + * on-list bit into it, updaters don't need any lock to check the + * device list status. + * + * This bit will be set to 1 if the device is not on the plist and not + * usable, will be cleared if the device is on the plist. + */ +#define SWAP_USAGE_OFFLIST_BIT (1UL << (BITS_PER_TYPE(atomic_t) - 2)) +#define SWAP_USAGE_COUNTER_MASK (~SWAP_USAGE_OFFLIST_BIT) +static long swap_usage_in_pages(struct swap_info_struct *si) +{ + return atomic_long_read(&si->inuse_pages) & SWAP_USAGE_COUNTER_MASK; +} + /* Reclaim the swap entry anyway if possible */ #define TTRS_ANYWAY 0x1 /* @@ -709,7 +728,7 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force) int nr_reclaim; if (force) - to_scan = si->inuse_pages / SWAPFILE_CLUSTER; + to_scan = swap_usage_in_pages(si) / SWAPFILE_CLUSTER; while (!list_empty(&si->full_clusters)) { ci = list_first_entry(&si->full_clusters, struct swap_cluster_info, list); @@ -860,42 +879,121 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o return found; } -static void __del_from_avail_list(struct swap_info_struct *si) +/* + * SWAP_USAGE_OFFLIST_BIT can only be cleared by this helper and synced with + * counter updaters with atomic. 
+ */ +static void del_from_avail_list(struct swap_info_struct *si, bool swapoff) { int nid; - assert_spin_locked(&si->lock); + spin_lock(&swap_avail_lock); + + if (swapoff) { + /* Clear SWP_WRITEOK so add_to_avail_list won't add it back */ + si->flags &= ~SWP_WRITEOK; + + /* Force take it off. */ + atomic_long_or(SWAP_USAGE_OFFLIST_BIT, &si->inuse_pages); + } else { + /* + * If not swapoff, take it off-list only when it's full and + * SWAP_USAGE_OFFLIST_BIT is not set (inuse_pages == pages). + * The cmpxchg below will fail and skip the removal if there + * are slots freed or device is off-listed by someone else. + */ + if (atomic_long_cmpxchg(&si->inuse_pages, si->pages, + si->pages | SWAP_USAGE_OFFLIST_BIT) != si->pages) + goto skip; + } + for_each_node(nid) plist_del(&si->avail_lists[nid], &swap_avail_heads[nid]); + +skip: + spin_unlock(&swap_avail_lock); } -static void del_from_avail_list(struct swap_info_struct *si) +/* + * SWAP_USAGE_OFFLIST_BIT can only be set by this helper and synced with + * counter updaters with atomic. + */ +static void add_to_avail_list(struct swap_info_struct *si, bool swapon) { + int nid; + long val; + bool swapoff; + spin_lock(&swap_avail_lock); - __del_from_avail_list(si); + + /* Special handling for swapon / swapoff */ + if (swapon) { + si->flags |= SWP_WRITEOK; + swapoff = false; + } else { + swapoff = !(READ_ONCE(si->flags) & SWP_WRITEOK); + } + + if (swapoff) + goto skip; + + if (!(atomic_long_read(&si->inuse_pages) & SWAP_USAGE_OFFLIST_BIT)) + goto skip; + + val = atomic_long_fetch_and_relaxed(~SWAP_USAGE_OFFLIST_BIT, &si->inuse_pages); + + /* + * When device is full and device is on the plist, only one updater will + * see (inuse_pages == si->pages) and will call del_from_avail_list. If + * that updater happen to be here, just skip adding. + */ + if (val == si->pages) { + /* Just like the cmpxchg in del_from_avail_list */ + if (atomic_long_cmpxchg(&si->inuse_pages, si->pages, + si->pages | SWAP_USAGE_OFFLIST_BIT) == si->pages) + goto skip; + } + + for_each_node(nid) + plist_add(&si->avail_lists[nid], &swap_avail_heads[nid]); + +skip: spin_unlock(&swap_avail_lock); } -static void swap_range_alloc(struct swap_info_struct *si, - unsigned int nr_entries) +/* + * swap_usage_add / swap_usage_sub are serialized by ci->lock in each cluster + * so the total contribution to the global counter should always be positive. 
+ */ +static bool swap_usage_add(struct swap_info_struct *si, unsigned int nr_entries) { - WRITE_ONCE(si->inuse_pages, si->inuse_pages + nr_entries); - if (si->inuse_pages == si->pages) { - del_from_avail_list(si); + long val = atomic_long_add_return_relaxed(nr_entries, &si->inuse_pages); - if (vm_swap_full()) - schedule_work(&si->reclaim_work); + /* If device is full, SWAP_USAGE_OFFLIST_BIT not set, try off list it */ + if (val == si->pages) { + del_from_avail_list(si, false); + return true; } + + return false; } -static void add_to_avail_list(struct swap_info_struct *si) +static void swap_usage_sub(struct swap_info_struct *si, unsigned int nr_entries) { - int nid; + long val = atomic_long_sub_return_relaxed(nr_entries, &si->inuse_pages); - spin_lock(&swap_avail_lock); - for_each_node(nid) - plist_add(&si->avail_lists[nid], &swap_avail_heads[nid]); - spin_unlock(&swap_avail_lock); + /* If device is off list, try add it back */ + if (val & SWAP_USAGE_OFFLIST_BIT) + add_to_avail_list(si, false); +} + +static void swap_range_alloc(struct swap_info_struct *si, + unsigned int nr_entries) +{ + if (swap_usage_add(si, nr_entries)) { + if (vm_swap_full()) + schedule_work(&si->reclaim_work); + } } static void swap_range_free(struct swap_info_struct *si, unsigned long offset, @@ -913,8 +1011,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, for (i = 0; i < nr_entries; i++) clear_bit(offset + i, si->zeromap); - if (si->inuse_pages == si->pages) - add_to_avail_list(si); if (si->flags & SWP_BLKDEV) swap_slot_free_notify = si->bdev->bd_disk->fops->swap_slot_free_notify; @@ -928,13 +1024,13 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, } clear_shadow_from_swap_cache(si->type, begin, end); + atomic_long_add(nr_entries, &nr_swap_pages); /* * Make sure that try_to_unuse() observes si->inuse_pages reaching 0 * only after the above cleanups are done. 
*/ smp_wmb(); - atomic_long_add(nr_entries, &nr_swap_pages); - WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); + swap_usage_sub(si, nr_entries); } static int cluster_alloc_swap(struct swap_info_struct *si, @@ -1020,19 +1116,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]); spin_unlock(&swap_avail_lock); spin_lock(&si->lock); - if ((si->inuse_pages == si->pages) || !(si->flags & SWP_WRITEOK)) { - spin_lock(&swap_avail_lock); - if (plist_node_empty(&si->avail_lists[node])) { - spin_unlock(&si->lock); - goto nextsi; - } - WARN(!(si->flags & SWP_WRITEOK), - "swap_info %d in list but !SWP_WRITEOK\n", - si->type); - __del_from_avail_list(si); - spin_unlock(&si->lock); - goto nextsi; - } n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, n_goal, swp_entries, order); spin_unlock(&si->lock); @@ -1041,7 +1124,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) cond_resched(); spin_lock(&swap_avail_lock); -nextsi: /* * if we got here, it's likely that si was almost full before, * and since scan_swap_map_slots() can drop the si->lock, @@ -1773,7 +1855,7 @@ unsigned int count_swap_pages(int type, int free) if (sis->flags & SWP_WRITEOK) { n = sis->pages; if (free) - n -= sis->inuse_pages; + n -= swap_usage_in_pages(sis); } spin_unlock(&sis->lock); } @@ -2108,7 +2190,7 @@ static int try_to_unuse(unsigned int type) swp_entry_t entry; unsigned int i; - if (!READ_ONCE(si->inuse_pages)) + if (!swap_usage_in_pages(si)) goto success; retry: @@ -2121,7 +2203,7 @@ static int try_to_unuse(unsigned int type) spin_lock(&mmlist_lock); p = &init_mm.mmlist; - while (READ_ONCE(si->inuse_pages) && + while (swap_usage_in_pages(si) && !signal_pending(current) && (p = p->next) != &init_mm.mmlist) { @@ -2149,7 +2231,7 @@ static int try_to_unuse(unsigned int type) mmput(prev_mm); i = 0; - while (READ_ONCE(si->inuse_pages) && + while (swap_usage_in_pages(si) && !signal_pending(current) && (i = find_next_to_unuse(si, i)) != 0) { @@ -2184,7 +2266,7 @@ static int try_to_unuse(unsigned int type) * folio_alloc_swap(), temporarily hiding that swap. It's easy * and robust (though cpu-intensive) just to keep retrying. */ - if (READ_ONCE(si->inuse_pages)) { + if (swap_usage_in_pages(si)) { if (!signal_pending(current)) goto retry; return -EINTR; @@ -2193,7 +2275,7 @@ static int try_to_unuse(unsigned int type) success: /* * Make sure that further cleanups after try_to_unuse() returns happen - * after swap_range_free() reduces si->inuse_pages to 0. + * after swap_range_free() reduces inuse_pages to 0. 
*/ smp_mb(); return 0; @@ -2211,7 +2293,7 @@ static void drain_mmlist(void) unsigned int type; for (type = 0; type < nr_swapfiles; type++) - if (swap_info[type]->inuse_pages) + if (swap_usage_in_pages(swap_info[type])) return; spin_lock(&mmlist_lock); list_for_each_safe(p, next, &init_mm.mmlist) @@ -2390,7 +2472,6 @@ static void setup_swap_info(struct swap_info_struct *si, int prio, static void _enable_swap_info(struct swap_info_struct *si) { - si->flags |= SWP_WRITEOK; atomic_long_add(si->pages, &nr_swap_pages); total_swap_pages += si->pages; @@ -2407,9 +2488,8 @@ static void _enable_swap_info(struct swap_info_struct *si) */ plist_add(&si->list, &swap_active_head); - /* add to available list if swap device is not full */ - if (si->inuse_pages < si->pages) - add_to_avail_list(si); + /* Add back to available list */ + add_to_avail_list(si, true); } static void enable_swap_info(struct swap_info_struct *si, int prio, @@ -2507,7 +2587,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) goto out_dput; } spin_lock(&p->lock); - del_from_avail_list(p); + del_from_avail_list(p, true); if (p->prio < 0) { struct swap_info_struct *si = p; int nid; @@ -2525,7 +2605,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) plist_del(&p->list, &swap_active_head); atomic_long_sub(p->pages, &nr_swap_pages); total_swap_pages -= p->pages; - p->flags &= ~SWP_WRITEOK; spin_unlock(&p->lock); spin_unlock(&swap_lock); @@ -2705,7 +2784,7 @@ static int swap_show(struct seq_file *swap, void *v) } bytes = K(si->pages); - inuse = K(READ_ONCE(si->inuse_pages)); + inuse = K(swap_usage_in_pages(si)); file = si->swap_file; len = seq_file_path(swap, file, " \t\n\\"); @@ -2822,6 +2901,7 @@ static struct swap_info_struct *alloc_swap_info(void) } spin_lock_init(&p->lock); spin_lock_init(&p->cont_lock); + atomic_long_set(&p->inuse_pages, SWAP_USAGE_OFFLIST_BIT); init_completion(&p->comp); return p; @@ -3319,7 +3399,7 @@ void si_swapinfo(struct sysinfo *val) struct swap_info_struct *si = swap_info[type]; if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK)) - nr_to_be_unused += READ_ONCE(si->inuse_pages); + nr_to_be_unused += swap_usage_in_pages(si); } val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused; val->totalswap = total_swap_pages + nr_to_be_unused; From patchwork Tue Oct 22 19:24:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kairui Song X-Patchwork-Id: 13846081 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E49ECDD0D8 for ; Tue, 22 Oct 2024 19:30:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B5FB76B009A; Tue, 22 Oct 2024 15:30:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B0ED06B009B; Tue, 22 Oct 2024 15:30:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 963B06B009C; Tue, 22 Oct 2024 15:30:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 71B206B009A for ; Tue, 22 Oct 2024 15:30:13 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 1C5FD120458 for ; Tue, 22 Oct 2024 19:29:58 +0000 (UTC) X-FDA: 
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 07/13] mm, swap: hold a reference of si during scan and clean up flags
Date: Wed, 23 Oct 2024 03:24:45 +0800
Message-ID: <20241022192451.38138-8-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>

From: Kairui Song

The flag SWP_SCANNING was used as an indicator of whether a device is being scanned, and it prevents swapoff, but it is already no longer used. The only thing that protects the scanning now is the si lock. However, the allocation path may drop the si lock, and in theory this could lead to a use-after-free. So clean this up: just hold a reference for the whole allocation path, so that the per-CPU counter killing will wait for any existing scan and other users. The flag SWP_SCANNING can also be dropped.
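[Illustrative sketch of the resulting pattern, simplified from the diff below; get_swap_device_info() is the helper this patch introduces, and the rest mirrors the allocation loop in get_swap_pages():]

	/* Pin the device with its percpu_ref before scanning, instead of
	 * relying on a flag, and drop the reference once the scan is done. */
	if (get_swap_device_info(si)) {
		spin_lock(&si->lock);
		n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
					    n_goal, swp_entries, order);
		spin_unlock(&si->lock);
		put_swap_device(si);
	}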
Signed-off-by: Kairui Song --- include/linux/swap.h | 1 - mm/swapfile.c | 62 +++++++++++++++++++++++--------------------- 2 files changed, 33 insertions(+), 30 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 16dcf8bd1a4e..1651174959c8 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -219,7 +219,6 @@ enum { SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */ SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */ /* add others here before... */ - SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */ }; #define SWAP_CLUSTER_MAX 32UL diff --git a/mm/swapfile.c b/mm/swapfile.c index 4e629536a07c..d6b6e71ccc19 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1088,6 +1088,21 @@ static int scan_swap_map_slots(struct swap_info_struct *si, return cluster_alloc_swap(si, usage, nr, slots, order); } +static bool get_swap_device_info(struct swap_info_struct *si) +{ + if (!percpu_ref_tryget_live(&si->users)) + return false; + /* + * Guarantee the si->users are checked before accessing other + * fields of swap_info_struct. + * + * Paired with the spin_unlock() after setup_swap_info() in + * enable_swap_info(). + */ + smp_rmb(); + return true; +} + int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) { int order = swap_entry_order(entry_order); @@ -1115,13 +1130,16 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) /* requeue si to after same-priority siblings */ plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]); spin_unlock(&swap_avail_lock); - spin_lock(&si->lock); - n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, - n_goal, swp_entries, order); - spin_unlock(&si->lock); - if (n_ret || size > 1) - goto check_out; - cond_resched(); + if (get_swap_device_info(si)) { + spin_lock(&si->lock); + n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, + n_goal, swp_entries, order); + spin_unlock(&si->lock); + put_swap_device(si); + if (n_ret || size > 1) + goto check_out; + cond_resched(); + } spin_lock(&swap_avail_lock); /* @@ -1272,16 +1290,8 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) si = swp_swap_info(entry); if (!si) goto bad_nofile; - if (!percpu_ref_tryget_live(&si->users)) + if (!get_swap_device_info(si)) goto out; - /* - * Guarantee the si->users are checked before accessing other - * fields of swap_info_struct. - * - * Paired with the spin_unlock() after setup_swap_info() in - * enable_swap_info(). 
- */ - smp_rmb(); offset = swp_offset(entry); if (offset >= si->max) goto put_out; @@ -1761,10 +1771,13 @@ swp_entry_t get_swap_page_of_type(int type) goto fail; /* This is called for allocating swap entry, not cache */ - spin_lock(&si->lock); - if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) - atomic_long_dec(&nr_swap_pages); - spin_unlock(&si->lock); + if (get_swap_device_info(si)) { + spin_lock(&si->lock); + if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) + atomic_long_dec(&nr_swap_pages); + spin_unlock(&si->lock); + put_swap_device(si); + } fail: return entry; } @@ -2650,15 +2663,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) spin_lock(&p->lock); drain_mmlist(); - /* wait for anyone still in scan_swap_map_slots */ - while (p->flags >= SWP_SCANNING) { - spin_unlock(&p->lock); - spin_unlock(&swap_lock); - schedule_timeout_uninterruptible(1); - spin_lock(&swap_lock); - spin_lock(&p->lock); - } - swap_file = p->swap_file; p->swap_file = NULL; p->max = 0; From patchwork Tue Oct 22 19:24:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kairui Song X-Patchwork-Id: 13846082 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3DCBDCDD0D7 for ; Tue, 22 Oct 2024 19:30:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BFE5A6B009C; Tue, 22 Oct 2024 15:30:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BAAB86B009D; Tue, 22 Oct 2024 15:30:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A244F6B009E; Tue, 22 Oct 2024 15:30:17 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 78C3F6B009C for ; Tue, 22 Oct 2024 15:30:17 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id DBE6DA0469 for ; Tue, 22 Oct 2024 19:29:46 +0000 (UTC) X-FDA: 82702228770.30.2F572FD Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) by imf02.hostedemail.com (Postfix) with ESMTP id 0974E8000C for ; Tue, 22 Oct 2024 19:29:43 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=fcJVvfoa; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf02.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.214.179 as permitted sender) smtp.mailfrom=ryncsn@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1729625364; a=rsa-sha256; cv=none; b=DYbxw4VvcJk2bbobEPXYIKi0ktSRufRlaVBX0v9px79Gkgcj1C3KlhbPxeW3vI2xUz254/ Wh8IoqYy0cY7VLP6six1ni5igIGUklsFcZz+qAKJlwjEOhKnyBo1KEn+dP/hMBN2t2I7hb nTwxcagyAMYfyo56h3FkexGx6vIf6jM= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=fcJVvfoa; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf02.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.214.179 as permitted sender) smtp.mailfrom=ryncsn@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1729625364; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: 
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 08/13] mm, swap: use an enum to define all cluster flags and wrap flags changes
Date: Wed, 23 Oct 2024 03:24:46 +0800
Message-ID: <20241022192451.38138-9-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>
From: Kairui Song

Currently we only use flags to indicate which list a cluster is on. Using one bit for each list type may be a waste; as the number of list types grows, we will consume too many bits. The current mixed usage of "&" and "==" is also a bit confusing. Make it clean by using an enum to define all possible cluster statuses; only an off-list cluster will have the NONE (0) flag. And use a wrapper to annotate and sanitize all flag setting and list movement.

Suggested-by: Chris Li
Signed-off-by: Kairui Song
--- include/linux/swap.h | 17 +++++++--- mm/swapfile.c | 76 +++++++++++++++++++++++--------------------- 2 files changed, 53 insertions(+), 40 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 1651174959c8..75fc2da1767d 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -256,10 +256,19 @@ struct swap_cluster_info { u8 order; struct list_head list; }; -#define CLUSTER_FLAG_FREE 1 /* This cluster is free */ -#define CLUSTER_FLAG_NONFULL 2 /* This cluster is on nonfull list */ -#define CLUSTER_FLAG_FRAG 4 /* This cluster is on nonfull list */ -#define CLUSTER_FLAG_FULL 8 /* This cluster is on full list */ + +/* + * All on-list cluster must have a non-zero flag.
+ */ +enum swap_cluster_flags { + CLUSTER_FLAG_NONE = 0, /* For temporary off-list cluster */ + CLUSTER_FLAG_FREE, + CLUSTER_FLAG_NONFULL, + CLUSTER_FLAG_FRAG, + CLUSTER_FLAG_FULL, + CLUSTER_FLAG_DISCARD, + CLUSTER_FLAG_MAX, +}; /* * The first page in the swap file is the swap header, which is always marked diff --git a/mm/swapfile.c b/mm/swapfile.c index d6b6e71ccc19..96d8012b003c 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -402,7 +402,7 @@ static void discard_swap_cluster(struct swap_info_struct *si, static inline bool cluster_is_free(struct swap_cluster_info *info) { - return info->flags & CLUSTER_FLAG_FREE; + return info->flags == CLUSTER_FLAG_FREE; } static inline unsigned int cluster_index(struct swap_info_struct *si, @@ -433,6 +433,27 @@ static inline void unlock_cluster(struct swap_cluster_info *ci) spin_unlock(&ci->lock); } +static void cluster_move(struct swap_info_struct *si, + struct swap_cluster_info *ci, struct list_head *list, + enum swap_cluster_flags new_flags) +{ + VM_WARN_ON(ci->flags == new_flags); + BUILD_BUG_ON(1 << sizeof(ci->flags) * BITS_PER_BYTE < CLUSTER_FLAG_MAX); + + if (ci->flags == CLUSTER_FLAG_NONE) { + list_add_tail(&ci->list, list); + } else { + if (ci->flags == CLUSTER_FLAG_FRAG) { + VM_WARN_ON(!si->frag_cluster_nr[ci->order]); + si->frag_cluster_nr[ci->order]--; + } + list_move_tail(&ci->list, list); + } + ci->flags = new_flags; + if (new_flags == CLUSTER_FLAG_FRAG) + si->frag_cluster_nr[ci->order]++; +} + /* Add a cluster to discard list and schedule it to do discard */ static void swap_cluster_schedule_discard(struct swap_info_struct *si, struct swap_cluster_info *ci) @@ -446,10 +467,8 @@ static void swap_cluster_schedule_discard(struct swap_info_struct *si, */ memset(si->swap_map + idx * SWAPFILE_CLUSTER, SWAP_MAP_BAD, SWAPFILE_CLUSTER); - - VM_BUG_ON(ci->flags & CLUSTER_FLAG_FREE); - list_move_tail(&ci->list, &si->discard_clusters); - ci->flags = 0; + VM_BUG_ON(ci->flags == CLUSTER_FLAG_FREE); + cluster_move(si, ci, &si->discard_clusters, CLUSTER_FLAG_DISCARD); schedule_work(&si->discard_work); } @@ -457,12 +476,7 @@ static void __free_cluster(struct swap_info_struct *si, struct swap_cluster_info { lockdep_assert_held(&si->lock); lockdep_assert_held(&ci->lock); - - if (ci->flags) - list_move_tail(&ci->list, &si->free_clusters); - else - list_add_tail(&ci->list, &si->free_clusters); - ci->flags = CLUSTER_FLAG_FREE; + cluster_move(si, ci, &si->free_clusters, CLUSTER_FLAG_FREE); ci->order = 0; } @@ -478,6 +492,8 @@ static void swap_do_scheduled_discard(struct swap_info_struct *si) while (!list_empty(&si->discard_clusters)) { ci = list_first_entry(&si->discard_clusters, struct swap_cluster_info, list); list_del(&ci->list); + /* Must clear flag when taking a cluster off-list */ + ci->flags = CLUSTER_FLAG_NONE; idx = cluster_index(si, ci); spin_unlock(&si->lock); @@ -518,9 +534,6 @@ static void free_cluster(struct swap_info_struct *si, struct swap_cluster_info * lockdep_assert_held(&si->lock); lockdep_assert_held(&ci->lock); - if (ci->flags & CLUSTER_FLAG_FRAG) - si->frag_cluster_nr[ci->order]--; - /* * If the swap is discardable, prepare discard the cluster * instead of free it immediately. 
The cluster will be freed @@ -572,13 +585,9 @@ static void dec_cluster_info_page(struct swap_info_struct *si, return; } - if (!(ci->flags & CLUSTER_FLAG_NONFULL)) { - VM_BUG_ON(ci->flags & CLUSTER_FLAG_FREE); - if (ci->flags & CLUSTER_FLAG_FRAG) - si->frag_cluster_nr[ci->order]--; - list_move_tail(&ci->list, &si->nonfull_clusters[ci->order]); - ci->flags = CLUSTER_FLAG_NONFULL; - } + if (ci->flags != CLUSTER_FLAG_NONFULL) + cluster_move(si, ci, &si->nonfull_clusters[ci->order], + CLUSTER_FLAG_NONFULL); } static bool cluster_reclaim_range(struct swap_info_struct *si, @@ -657,11 +666,14 @@ static void cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster { unsigned int nr_pages = 1 << order; + VM_BUG_ON(ci->flags != CLUSTER_FLAG_FREE && + ci->flags != CLUSTER_FLAG_NONFULL && + ci->flags != CLUSTER_FLAG_FRAG); + if (cluster_is_free(ci)) { - if (nr_pages < SWAPFILE_CLUSTER) { - list_move_tail(&ci->list, &si->nonfull_clusters[order]); - ci->flags = CLUSTER_FLAG_NONFULL; - } + if (nr_pages < SWAPFILE_CLUSTER) + cluster_move(si, ci, &si->nonfull_clusters[order], + CLUSTER_FLAG_NONFULL); ci->order = order; } @@ -669,14 +681,8 @@ static void cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster swap_range_alloc(si, nr_pages); ci->count += nr_pages; - if (ci->count == SWAPFILE_CLUSTER) { - VM_BUG_ON(!(ci->flags & - (CLUSTER_FLAG_FREE | CLUSTER_FLAG_NONFULL | CLUSTER_FLAG_FRAG))); - if (ci->flags & CLUSTER_FLAG_FRAG) - si->frag_cluster_nr[ci->order]--; - list_move_tail(&ci->list, &si->full_clusters); - ci->flags = CLUSTER_FLAG_FULL; - } + if (ci->count == SWAPFILE_CLUSTER) + cluster_move(si, ci, &si->full_clusters, CLUSTER_FLAG_FULL); } static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigned long offset, @@ -806,9 +812,7 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o while (!list_empty(&si->nonfull_clusters[order])) { ci = list_first_entry(&si->nonfull_clusters[order], struct swap_cluster_info, list); - list_move_tail(&ci->list, &si->frag_clusters[order]); - ci->flags = CLUSTER_FLAG_FRAG; - si->frag_cluster_nr[order]++; + cluster_move(si, ci, &si->frag_clusters[order], CLUSTER_FLAG_FRAG); offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage); frags++; From patchwork Tue Oct 22 19:24:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kairui Song X-Patchwork-Id: 13846083 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBC9CCDD0D7 for ; Tue, 22 Oct 2024 19:30:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5946A6B009E; Tue, 22 Oct 2024 15:30:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 544656B009F; Tue, 22 Oct 2024 15:30:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 395936B00A0; Tue, 22 Oct 2024 15:30:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 12E056B009E for ; Tue, 22 Oct 2024 15:30:22 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 8109D40460 for ; Tue, 22 Oct 2024 19:30:12 +0000 (UTC) X-FDA: 
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 09/13] mm, swap: reduce contention on device lock
Date: Wed, 23 Oct 2024 03:24:47 +0800
Message-ID: <20241022192451.38138-10-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>

From: Kairui Song

Currently, swap locking is mainly composed of two locks: the cluster lock (ci->lock) and the device lock (si->lock). The cluster lock is much more fine-grained, so it is best to use ci->lock instead of si->lock wherever possible. Following the new cluster allocator design, many operations no longer need to touch si->lock at all. In practice, si->lock only needs to be taken when moving clusters between lists.

To achieve this, this commit reworks the locking pattern of all si->lock and ci->lock users, eliminates all usage of ci->lock inside si->lock, and introduces a new design that avoids touching si->lock as much as possible.
To minimize contention during allocation and make the design easier to follow, two ideas are introduced, with corresponding helpers: `isolation` and `relocation`:

- Clusters are `isolated` from their list when they are scanned for allocation, so scanning an on-list cluster no longer needs to hold si->lock except for that brief moment, and the ci->lock usage inside si->lock is removed. In the new allocator design a cluster is always moved after scanning (free -> nonfull, nonfull -> frag, frag -> frag tail), so this introduces no extra overhead. It also greatly reduces contention on both si->lock and ci->lock, as other CPUs will not walk onto the same cluster by iterating the list. The off-list time window of a cluster is minimal: one CPU holds at most one cluster while scanning its 512 entries, where previously we would busy wait on a spin lock. This is done with `cluster_isolate_lock` when scanning a new cluster. Note: scanning of the per-CPU cluster is a special case; it does not isolate the cluster, because it does not need to hold si->lock at all, it simply acquires the ci->lock of the previously used cluster and uses it.

- A cluster is `relocated` after allocation or freeing, according to its count and status. Allocations no longer hold si->lock, and may drop ci->lock for reclaim, so the cluster could end up anywhere in the meantime. Besides, `isolation` clears all flags when it takes a cluster off a list (the flags must stay in sync with the list status, so cluster users do not need to touch si->lock to check the list status; this is important for reducing contention on si->lock). So after allocation the cluster has to be `relocated` to the right list according to its usage. This is done with `relocate_cluster` after allocation, or `[partial_]free_cluster` after freeing (a short sketch of this flow follows the benchmark numbers below).

Now, except for swapon / swapoff and discard, `isolation` and `relocation` are the only two places that need to take si->lock. And since each CPU keeps using its per-CPU cluster as much as possible, and a cluster has 512 entries to be consumed, si->lock is rarely touched. The lock contention on si->lock is now barely observable.

Building the Linux kernel with defconfig showed a huge performance improvement:

time make -j96 / 768M memcg, 4K pages, 10G ZRAM, on Intel 8255C:
Before: Sys time: 73578.30, Real time: 864.05
After (-50.7% sys time, -44.8% real time):
        Sys time: 36227.49, Real time: 476.66

time make -j96 / 1152M memcg, 64K mTHP, 10G ZRAM, on Intel 8255C (avg of 4 test runs):
Before: Sys time: 74044.85, Real time: 846.51
        hugepages-64kB/stats/swpout: 1735216
        hugepages-64kB/stats/swpout_fallback: 430333
After (-40.4% sys time, -37.1% real time):
        Sys time: 44160.56, Real time: 532.07
        hugepages-64kB/stats/swpout: 1786288
        hugepages-64kB/stats/swpout_fallback: 243384

time make -j32 / 512M memcg, 4K pages, 5G ZRAM, on AMD 7K62:
Before: Sys time: 8098.21, Real time: 401.3
After (-22.6% sys time, -12.8% real time):
        Sys time: 6265.02, Real time: 349.83

The allocation success rate also slightly improved, as the usage of clusters is now sanitized with the newly defined helpers and locks, so a temporarily dropped si->lock or ci->lock won't cause cluster order shuffle.
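To make the isolate / relocate flow above more concrete, here is a minimal sketch. It only stitches together helpers introduced by this patch and is not code from the patch itself; the per-CPU fast path, fragment accounting and retry logic of the real cluster_alloc_swap_entry() are omitted:

/*
 * Illustrative sketch only: si->lock is taken briefly inside
 * cluster_isolate_lock() (to unlink one cluster) and again inside
 * relocate_cluster()/cluster_move() (to put it back on the right
 * list); the scan itself runs under ci->lock alone.
 */
static unsigned long alloc_one_entry_sketch(struct swap_info_struct *si,
					    int order, unsigned char usage)
{
	struct swap_cluster_info *ci;
	unsigned int found = 0;

	/* Isolation: pop one cluster off the list and lock it. */
	ci = cluster_isolate_lock(si, &si->nonfull_clusters[order]);
	if (!ci)
		return 0;

	/*
	 * Scan and allocate under ci->lock only. The cluster is off-list
	 * with its flags cleared, so no other CPU can pick it up from a
	 * list while we work on it.
	 */
	alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage);

	/*
	 * Relocation happens inside alloc_swap_scan_cluster() via
	 * relocate_cluster(): the cluster is re-queued on the free /
	 * nonfull / frag / full list matching its new state before
	 * ci->lock is dropped.
	 */
	return found;
}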
Suggested-by: Chris Li Signed-off-by: Kairui Song --- include/linux/swap.h | 5 +- mm/swapfile.c | 418 ++++++++++++++++++++++++------------------- 2 files changed, 239 insertions(+), 184 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 75fc2da1767d..a3b5d74b095a 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -265,6 +265,8 @@ enum swap_cluster_flags { CLUSTER_FLAG_FREE, CLUSTER_FLAG_NONFULL, CLUSTER_FLAG_FRAG, + /* Clusters with flags above are allocatable */ + CLUSTER_FLAG_USABLE = CLUSTER_FLAG_FRAG, CLUSTER_FLAG_FULL, CLUSTER_FLAG_DISCARD, CLUSTER_FLAG_MAX, @@ -290,6 +292,7 @@ enum swap_cluster_flags { * throughput. */ struct percpu_cluster { + local_lock_t lock; /* Protect the percpu_cluster above */ unsigned int next[SWAP_NR_ORDERS]; /* Likely next allocation offset */ }; @@ -312,7 +315,7 @@ struct swap_info_struct { /* list of cluster that contains at least one free slot */ struct list_head frag_clusters[SWAP_NR_ORDERS]; /* list of cluster that are fragmented or contented */ - unsigned int frag_cluster_nr[SWAP_NR_ORDERS]; + atomic_long_t frag_cluster_nr[SWAP_NR_ORDERS]; unsigned int pages; /* total of usable pages of swap */ atomic_long_t inuse_pages; /* number of those currently in use */ struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */ diff --git a/mm/swapfile.c b/mm/swapfile.c index 96d8012b003c..a19ee8d5ffd0 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -260,12 +260,10 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si, folio_ref_sub(folio, nr_pages); folio_set_dirty(folio); - spin_lock(&si->lock); /* Only sinple page folio can be backed by zswap */ if (nr_pages == 1) zswap_invalidate(entry); swap_entry_range_free(si, entry, nr_pages); - spin_unlock(&si->lock); ret = nr_pages; out_unlock: folio_unlock(folio); @@ -402,7 +400,21 @@ static void discard_swap_cluster(struct swap_info_struct *si, static inline bool cluster_is_free(struct swap_cluster_info *info) { - return info->flags == CLUSTER_FLAG_FREE; + return info->count == 0; +} + +static inline bool cluster_is_discard(struct swap_cluster_info *info) +{ + return info->flags == CLUSTER_FLAG_DISCARD; +} + +static inline bool cluster_is_usable(struct swap_cluster_info *ci, int order) +{ + if (unlikely(ci->flags > CLUSTER_FLAG_USABLE)) + return false; + if (!order) + return true; + return cluster_is_free(ci) || order == ci->order; } static inline unsigned int cluster_index(struct swap_info_struct *si, @@ -439,19 +451,20 @@ static void cluster_move(struct swap_info_struct *si, { VM_WARN_ON(ci->flags == new_flags); BUILD_BUG_ON(1 << sizeof(ci->flags) * BITS_PER_BYTE < CLUSTER_FLAG_MAX); + lockdep_assert_held(&ci->lock); - if (ci->flags == CLUSTER_FLAG_NONE) { + spin_lock(&si->lock); + if (ci->flags == CLUSTER_FLAG_NONE) list_add_tail(&ci->list, list); - } else { - if (ci->flags == CLUSTER_FLAG_FRAG) { - VM_WARN_ON(!si->frag_cluster_nr[ci->order]); - si->frag_cluster_nr[ci->order]--; - } + else list_move_tail(&ci->list, list); - } + spin_unlock(&si->lock); + + if (ci->flags == CLUSTER_FLAG_FRAG) + atomic_long_dec(&si->frag_cluster_nr[ci->order]); + else if (new_flags == CLUSTER_FLAG_FRAG) + atomic_long_inc(&si->frag_cluster_nr[ci->order]); ci->flags = new_flags; - if (new_flags == CLUSTER_FLAG_FRAG) - si->frag_cluster_nr[ci->order]++; } /* Add a cluster to discard list and schedule it to do discard */ @@ -474,39 +487,82 @@ static void swap_cluster_schedule_discard(struct swap_info_struct *si, static void __free_cluster(struct swap_info_struct *si, 
struct swap_cluster_info *ci) { - lockdep_assert_held(&si->lock); lockdep_assert_held(&ci->lock); cluster_move(si, ci, &si->free_clusters, CLUSTER_FLAG_FREE); ci->order = 0; } +/* + * Isolate and lock the first cluster that is not contented on a list, + * clean its flag before taken off-list. Cluster flag must be in sync + * with list status, so cluster updaters can always know the cluster + * list status without touching si lock. + * + * Note it's possible that all clusters on a list are contented so + * this returns NULL for an non-empty list. + */ +static struct swap_cluster_info *cluster_isolate_lock( + struct swap_info_struct *si, struct list_head *list) +{ + struct swap_cluster_info *ci, *ret = NULL; + + spin_lock(&si->lock); + list_for_each_entry(ci, list, list) { + if (!spin_trylock(&ci->lock)) + continue; + + /* We may only isolate and clear flags of following lists */ + VM_BUG_ON(!ci->flags); + VM_BUG_ON(ci->flags > CLUSTER_FLAG_USABLE && + ci->flags != CLUSTER_FLAG_FULL); + + list_del(&ci->list); + ci->flags = CLUSTER_FLAG_NONE; + ret = ci; + break; + } + spin_unlock(&si->lock); + + return ret; +} + /* * Doing discard actually. After a cluster discard is finished, the cluster - * will be added to free cluster list. caller should hold si->lock. -*/ -static void swap_do_scheduled_discard(struct swap_info_struct *si) + * will be added to free cluster list. Discard cluster is a bit special as + * they don't participate in allocation or reclaim, so clusters marked as + * CLUSTER_FLAG_DISCARD must remain off-list or on discard list. + */ +static bool swap_do_scheduled_discard(struct swap_info_struct *si) { struct swap_cluster_info *ci; + bool ret = false; unsigned int idx; + spin_lock(&si->lock); while (!list_empty(&si->discard_clusters)) { ci = list_first_entry(&si->discard_clusters, struct swap_cluster_info, list); + /* + * Delete the cluster from list but don't clear the flag until + * discard is done, so isolation and relocation will skip it. + */ list_del(&ci->list); - /* Must clear flag when taking a cluster off-list */ - ci->flags = CLUSTER_FLAG_NONE; idx = cluster_index(si, ci); spin_unlock(&si->lock); - discard_swap_cluster(si, idx * SWAPFILE_CLUSTER, SWAPFILE_CLUSTER); - spin_lock(&si->lock); spin_lock(&ci->lock); - __free_cluster(si, ci); + /* Discard is done, return to list and clear the flag */ + ci->flags = CLUSTER_FLAG_NONE; memset(si->swap_map + idx * SWAPFILE_CLUSTER, 0, SWAPFILE_CLUSTER); + __free_cluster(si, ci); spin_unlock(&ci->lock); + ret = true; + spin_lock(&si->lock); } + spin_unlock(&si->lock); + return ret; } static void swap_discard_work(struct work_struct *work) @@ -515,9 +571,7 @@ static void swap_discard_work(struct work_struct *work) si = container_of(work, struct swap_info_struct, discard_work); - spin_lock(&si->lock); swap_do_scheduled_discard(si); - spin_unlock(&si->lock); } static void swap_users_ref_free(struct percpu_ref *ref) @@ -528,10 +582,14 @@ static void swap_users_ref_free(struct percpu_ref *ref) complete(&si->comp); } +/* + * Must be called after freeing if ci->count == 0, puts the cluster to free + * or discard list. 
+ */ static void free_cluster(struct swap_info_struct *si, struct swap_cluster_info *ci) { VM_BUG_ON(ci->count != 0); - lockdep_assert_held(&si->lock); + VM_BUG_ON(ci->flags == CLUSTER_FLAG_FREE); lockdep_assert_held(&ci->lock); /* @@ -548,6 +606,48 @@ static void free_cluster(struct swap_info_struct *si, struct swap_cluster_info * __free_cluster(si, ci); } +/* + * Must be called after freeing if ci->count != 0, puts the cluster to free + * or nonfull list. + */ +static void partial_free_cluster(struct swap_info_struct *si, + struct swap_cluster_info *ci) +{ + VM_BUG_ON(!ci->count || ci->count == SWAPFILE_CLUSTER); + lockdep_assert_held(&ci->lock); + + if (ci->flags != CLUSTER_FLAG_NONFULL) + cluster_move(si, ci, &si->nonfull_clusters[ci->order], + CLUSTER_FLAG_NONFULL); +} + +/* + * Must be called after allocation, put the cluster to full or frag list. + * Note: allocation don't need si lock, and may drop the ci lock for reclaim, + * so the cluster could end up any where before re-acquiring ci lock. + */ +static void relocate_cluster(struct swap_info_struct *si, + struct swap_cluster_info *ci) +{ + lockdep_assert_held(&ci->lock); + + /* Discard cluster must remain off-list or on discard list */ + if (cluster_is_discard(ci)) + return; + + if (!ci->count) { + free_cluster(si, ci); + } else if (ci->count != SWAPFILE_CLUSTER) { + if (ci->flags != CLUSTER_FLAG_FRAG) + cluster_move(si, ci, &si->frag_clusters[ci->order], + CLUSTER_FLAG_FRAG); + } else { + if (ci->flags != CLUSTER_FLAG_FULL) + cluster_move(si, ci, &si->full_clusters, + CLUSTER_FLAG_FULL); + } +} + /* * The cluster corresponding to page_nr will be used. The cluster will not be * added to free cluster list and its usage counter will be increased by 1. @@ -566,30 +666,6 @@ static void inc_cluster_info_page(struct swap_info_struct *si, VM_BUG_ON(ci->flags); } -/* - * The cluster ci decreases @nr_pages usage. If the usage counter becomes 0, - * which means no page in the cluster is in use, we can optionally discard - * the cluster and add it to free cluster list. - */ -static void dec_cluster_info_page(struct swap_info_struct *si, - struct swap_cluster_info *ci, int nr_pages) -{ - VM_BUG_ON(ci->count < nr_pages); - VM_BUG_ON(cluster_is_free(ci)); - lockdep_assert_held(&si->lock); - lockdep_assert_held(&ci->lock); - ci->count -= nr_pages; - - if (!ci->count) { - free_cluster(si, ci); - return; - } - - if (ci->flags != CLUSTER_FLAG_NONFULL) - cluster_move(si, ci, &si->nonfull_clusters[ci->order], - CLUSTER_FLAG_NONFULL); -} - static bool cluster_reclaim_range(struct swap_info_struct *si, struct swap_cluster_info *ci, unsigned long start, unsigned long end) @@ -599,8 +675,6 @@ static bool cluster_reclaim_range(struct swap_info_struct *si, int nr_reclaim; spin_unlock(&ci->lock); - spin_unlock(&si->lock); - do { switch (READ_ONCE(map[offset])) { case 0: @@ -618,9 +692,7 @@ static bool cluster_reclaim_range(struct swap_info_struct *si, } } while (offset < end); out: - spin_lock(&si->lock); spin_lock(&ci->lock); - /* * Recheck the range no matter reclaim succeeded or not, the slot * could have been be freed while we are not holding the lock. 
@@ -634,11 +706,11 @@ static bool cluster_reclaim_range(struct swap_info_struct *si, static bool cluster_scan_range(struct swap_info_struct *si, struct swap_cluster_info *ci, - unsigned long start, unsigned int nr_pages) + unsigned long start, unsigned int nr_pages, + bool *need_reclaim) { unsigned long offset, end = start + nr_pages; unsigned char *map = si->swap_map; - bool need_reclaim = false; for (offset = start; offset < end; offset++) { switch (READ_ONCE(map[offset])) { @@ -647,16 +719,13 @@ static bool cluster_scan_range(struct swap_info_struct *si, case SWAP_HAS_CACHE: if (!vm_swap_full()) return false; - need_reclaim = true; + *need_reclaim = true; continue; default: return false; } } - if (need_reclaim) - return cluster_reclaim_range(si, ci, start, end); - return true; } @@ -666,23 +735,12 @@ static void cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster { unsigned int nr_pages = 1 << order; - VM_BUG_ON(ci->flags != CLUSTER_FLAG_FREE && - ci->flags != CLUSTER_FLAG_NONFULL && - ci->flags != CLUSTER_FLAG_FRAG); - - if (cluster_is_free(ci)) { - if (nr_pages < SWAPFILE_CLUSTER) - cluster_move(si, ci, &si->nonfull_clusters[order], - CLUSTER_FLAG_NONFULL); + if (cluster_is_free(ci)) ci->order = order; - } memset(si->swap_map + start, usage, nr_pages); swap_range_alloc(si, nr_pages); ci->count += nr_pages; - - if (ci->count == SWAPFILE_CLUSTER) - cluster_move(si, ci, &si->full_clusters, CLUSTER_FLAG_FULL); } static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigned long offset, @@ -692,34 +750,52 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne unsigned long start = offset & ~(SWAPFILE_CLUSTER - 1); unsigned long end = min(start + SWAPFILE_CLUSTER, si->max); unsigned int nr_pages = 1 << order; + bool need_reclaim, ret; struct swap_cluster_info *ci; - if (end < nr_pages) - return SWAP_NEXT_INVALID; - end -= nr_pages; + ci = &si->cluster_info[offset / SWAPFILE_CLUSTER]; + lockdep_assert_held(&ci->lock); - ci = lock_cluster(si, offset); - if (ci->count + nr_pages > SWAPFILE_CLUSTER) { + if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER) { offset = SWAP_NEXT_INVALID; - goto done; + goto out; } - while (offset <= end) { - if (cluster_scan_range(si, ci, offset, nr_pages)) { - cluster_alloc_range(si, ci, offset, usage, order); - *foundp = offset; - if (ci->count == SWAPFILE_CLUSTER) { + for (end -= nr_pages; offset <= end; offset += nr_pages) { + need_reclaim = false; + if (!cluster_scan_range(si, ci, offset, nr_pages, &need_reclaim)) + continue; + if (need_reclaim) { + ret = cluster_reclaim_range(si, ci, start, end); + /* + * Reclaim drops ci->lock and cluster could be used + * by another order. Not checking flag as off-list + * cluster has no flag set, and change of list + * won't cause fragmentation. 
+ */ + if (!cluster_is_usable(ci, order)) { offset = SWAP_NEXT_INVALID; - goto done; + goto out; } - offset += nr_pages; - break; + if (cluster_is_free(ci)) + offset = start; + /* Reclaim failed but cluster is usable, try next */ + if (!ret) + continue; + } + cluster_alloc_range(si, ci, offset, usage, order); + *foundp = offset; + if (ci->count == SWAPFILE_CLUSTER) { + offset = SWAP_NEXT_INVALID; + goto out; } offset += nr_pages; + break; } if (offset > end) offset = SWAP_NEXT_INVALID; -done: +out: + relocate_cluster(si, ci); unlock_cluster(ci); return offset; } @@ -736,18 +812,17 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force) if (force) to_scan = swap_usage_in_pages(si) / SWAPFILE_CLUSTER; - while (!list_empty(&si->full_clusters)) { - ci = list_first_entry(&si->full_clusters, struct swap_cluster_info, list); - list_move_tail(&ci->list, &si->full_clusters); + while ((ci = cluster_isolate_lock(si, &si->full_clusters))) { offset = cluster_offset(si, ci); end = min(si->max, offset + SWAPFILE_CLUSTER); to_scan--; - spin_unlock(&si->lock); while (offset < end) { if (READ_ONCE(map[offset]) == SWAP_HAS_CACHE) { + spin_unlock(&ci->lock); nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT); + spin_lock(&ci->lock); if (nr_reclaim) { offset += abs(nr_reclaim); continue; @@ -755,8 +830,8 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force) } offset++; } - spin_lock(&si->lock); + unlock_cluster(ci); if (to_scan <= 0) break; } @@ -768,9 +843,7 @@ static void swap_reclaim_work(struct work_struct *work) si = container_of(work, struct swap_info_struct, reclaim_work); - spin_lock(&si->lock); swap_reclaim_full_clusters(si, true); - spin_unlock(&si->lock); } /* @@ -781,23 +854,36 @@ static void swap_reclaim_work(struct work_struct *work) static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int order, unsigned char usage) { - struct percpu_cluster *cluster; struct swap_cluster_info *ci; unsigned int offset, found = 0; -new_cluster: - lockdep_assert_held(&si->lock); - cluster = this_cpu_ptr(si->percpu_cluster); - offset = cluster->next[order]; + /* Fast path using per CPU cluster */ + local_lock(&si->percpu_cluster->lock); + offset = __this_cpu_read(si->percpu_cluster->next[order]); if (offset) { - offset = alloc_swap_scan_cluster(si, offset, &found, order, usage); + ci = lock_cluster(si, offset); + /* Cluster could have been used by another order */ + if (cluster_is_usable(ci, order)) { + if (cluster_is_free(ci)) + offset = cluster_offset(si, ci); + offset = alloc_swap_scan_cluster(si, offset, &found, + order, usage); + } else { + unlock_cluster(ci); + } if (found) goto done; } - if (!list_empty(&si->free_clusters)) { - ci = list_first_entry(&si->free_clusters, struct swap_cluster_info, list); - offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage); +new_cluster: + ci = cluster_isolate_lock(si, &si->free_clusters); + if (ci) { + offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), + &found, order, usage); + /* + * Allocation from free cluster must never fail and + * cluster lock must remain untouched. 
+ */ VM_BUG_ON(!found); goto done; } @@ -807,49 +893,45 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o swap_reclaim_full_clusters(si, false); if (order < PMD_ORDER) { - unsigned int frags = 0; + unsigned int frags = 0, frags_existing; - while (!list_empty(&si->nonfull_clusters[order])) { - ci = list_first_entry(&si->nonfull_clusters[order], - struct swap_cluster_info, list); - cluster_move(si, ci, &si->frag_clusters[order], CLUSTER_FLAG_FRAG); + while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[order]))) { offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage); - frags++; + /* + * With `fragmenting` set to true, it will surely take + * the cluster off nonfull list + */ if (found) goto done; + frags++; } - /* - * Nonfull clusters are moved to frag tail if we reached - * here, count them too, don't over scan the frag list. - */ - while (frags < si->frag_cluster_nr[order]) { - ci = list_first_entry(&si->frag_clusters[order], - struct swap_cluster_info, list); + frags_existing = atomic_long_read(&si->frag_cluster_nr[order]); + while (frags < frags_existing && + (ci = cluster_isolate_lock(si, &si->frag_clusters[order]))) { + atomic_long_dec(&si->frag_cluster_nr[order]); /* - * Rotate the frag list to iterate, they were all failing - * high order allocation or moved here due to per-CPU usage, - * this help keeping usable cluster ahead. + * Rotate the frag list to iterate, they were all + * failing high order allocation or moved here due to + * per-CPU usage, but either way they could contain + * usable (eg. lazy-freed swap cache) slots. */ - list_move_tail(&ci->list, &si->frag_clusters[order]); offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage); - frags++; if (found) goto done; + frags++; } } - if (!list_empty(&si->discard_clusters)) { - /* - * we don't have free cluster but have some clusters in - * discarding, do discard now and reclaim them, then - * reread cluster_next_cpu since we dropped si->lock - */ - swap_do_scheduled_discard(si); + /* + * We don't have free cluster but have some clusters in + * discarding, do discard now and reclaim them, then + * reread cluster_next_cpu since we dropped si->lock + */ + if ((si->flags & SWP_PAGE_DISCARD) && swap_do_scheduled_discard(si)) goto new_cluster; - } if (order) goto done; @@ -860,26 +942,25 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o * Clusters here have at least one usable slots and can't fail order 0 * allocation, but reclaim may drop si->lock and race with another user. 
*/ - while (!list_empty(&si->frag_clusters[o])) { - ci = list_first_entry(&si->frag_clusters[o], - struct swap_cluster_info, list); + while ((ci = cluster_isolate_lock(si, &si->frag_clusters[o]))) { + atomic_long_dec(&si->frag_cluster_nr[o]); offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, 0, usage); + &found, order, usage); if (found) goto done; } - while (!list_empty(&si->nonfull_clusters[o])) { - ci = list_first_entry(&si->nonfull_clusters[o], - struct swap_cluster_info, list); + while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[o]))) { offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, 0, usage); + &found, order, usage); if (found) goto done; } } done: - cluster->next[order] = offset; + __this_cpu_write(si->percpu_cluster->next[order], offset); + local_unlock(&si->percpu_cluster->lock); + return found; } @@ -1135,14 +1216,11 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]); spin_unlock(&swap_avail_lock); if (get_swap_device_info(si)) { - spin_lock(&si->lock); n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, n_goal, swp_entries, order); - spin_unlock(&si->lock); put_swap_device(si); if (n_ret || size > 1) goto check_out; - cond_resched(); } spin_lock(&swap_avail_lock); @@ -1355,9 +1433,7 @@ static bool __swap_entries_free(struct swap_info_struct *si, if (!has_cache) { for (i = 0; i < nr; i++) zswap_invalidate(swp_entry(si->type, offset + i)); - spin_lock(&si->lock); swap_entry_range_free(si, entry, nr); - spin_unlock(&si->lock); } return has_cache; @@ -1386,16 +1462,27 @@ static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry unsigned char *map_end = map + nr_pages; struct swap_cluster_info *ci; + /* It should never free entries across different clusters */ + VM_BUG_ON((offset / SWAPFILE_CLUSTER) != ((offset + nr_pages - 1) / SWAPFILE_CLUSTER)); + ci = lock_cluster(si, offset); + VM_BUG_ON(cluster_is_free(ci)); + VM_BUG_ON(ci->count < nr_pages); + + ci->count -= nr_pages; do { VM_BUG_ON(*map != SWAP_HAS_CACHE); *map = 0; } while (++map < map_end); - dec_cluster_info_page(si, ci, nr_pages); - unlock_cluster(ci); mem_cgroup_uncharge_swap(entry, nr_pages); swap_range_free(si, offset, nr_pages); + + if (!ci->count) + free_cluster(si, ci); + else + partial_free_cluster(si, ci); + unlock_cluster(ci); } static void cluster_swap_free_nr(struct swap_info_struct *si, @@ -1467,9 +1554,7 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) ci = lock_cluster(si, offset); if (size > 1 && swap_is_has_cache(si, offset, size)) { unlock_cluster(ci); - spin_lock(&si->lock); swap_entry_range_free(si, entry, size); - spin_unlock(&si->lock); return; } for (int i = 0; i < size; i++, entry.val++) { @@ -1484,46 +1569,19 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) unlock_cluster(ci); } -static int swp_entry_cmp(const void *ent1, const void *ent2) -{ - const swp_entry_t *e1 = ent1, *e2 = ent2; - - return (int)swp_type(*e1) - (int)swp_type(*e2); -} - void swapcache_free_entries(swp_entry_t *entries, int n) { - struct swap_info_struct *si, *prev; int i; + struct swap_info_struct *si = NULL; if (n <= 0) return; - prev = NULL; - si = NULL; - - /* - * Sort swap entries by swap device, so each lock is only taken once. - * nr_swapfiles isn't absolutely correct, but the overhead of sort() is - * so low that it isn't necessary to optimize further. 
- */ - if (nr_swapfiles > 1) - sort(entries, n, sizeof(entries[0]), swp_entry_cmp, NULL); for (i = 0; i < n; ++i) { si = _swap_info_get(entries[i]); - - if (si != prev) { - if (prev != NULL) - spin_unlock(&prev->lock); - if (si != NULL) - spin_lock(&si->lock); - } if (si) swap_entry_range_free(si, entries[i], 1); - prev = si; } - if (si) - spin_unlock(&si->lock); } int __swap_count(swp_entry_t entry) @@ -1775,13 +1833,8 @@ swp_entry_t get_swap_page_of_type(int type) goto fail; /* This is called for allocating swap entry, not cache */ - if (get_swap_device_info(si)) { - spin_lock(&si->lock); - if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) - atomic_long_dec(&nr_swap_pages); - spin_unlock(&si->lock); - put_swap_device(si); - } + if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) + atomic_long_dec(&nr_swap_pages); fail: return entry; } @@ -3098,6 +3151,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, cluster = per_cpu_ptr(si->percpu_cluster, cpu); for (i = 0; i < SWAP_NR_ORDERS; i++) cluster->next[i] = SWAP_NEXT_INVALID; + local_lock_init(&cluster->lock); } /* @@ -3121,7 +3175,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, for (i = 0; i < SWAP_NR_ORDERS; i++) { INIT_LIST_HEAD(&si->nonfull_clusters[i]); INIT_LIST_HEAD(&si->frag_clusters[i]); - si->frag_cluster_nr[i] = 0; + atomic_long_set(&si->frag_cluster_nr[i], 0); } /* @@ -3603,7 +3657,6 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask) */ goto outer; } - spin_lock(&si->lock); offset = swp_offset(entry); @@ -3668,7 +3721,6 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask) spin_unlock(&si->cont_lock); out: unlock_cluster(ci); - spin_unlock(&si->lock); put_swap_device(si); outer: if (page) From patchwork Tue Oct 22 19:24:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kairui Song X-Patchwork-Id: 13846084 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE02ECDD0CB for ; Tue, 22 Oct 2024 19:30:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 321A76B009F; Tue, 22 Oct 2024 15:30:25 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 25DDF6B00A1; Tue, 22 Oct 2024 15:30:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0B1A06B00A2; Tue, 22 Oct 2024 15:30:25 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id DBAEE6B009F for ; Tue, 22 Oct 2024 15:30:24 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id AE012160426 for ; Tue, 22 Oct 2024 19:30:05 +0000 (UTC) X-FDA: 82702229400.29.A7BE50D Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) by imf03.hostedemail.com (Postfix) with ESMTP id 03B2320019 for ; Tue, 22 Oct 2024 19:30:14 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=BFDl+239; spf=pass (imf03.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.214.173 as permitted sender) smtp.mailfrom=ryncsn@gmail.com; dmarc=pass 
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 10/13] mm, swap: simplify percpu cluster updating
Date: Wed, 23 Oct 2024 03:24:48 +0800
Message-ID: <20241022192451.38138-11-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>

From: Kairui Song

Instead of using a return argument, we can simply store the next cluster offset in its fixed percpu location, which reduces the stack usage and simplifies the function:

Object size:
./scripts/bloat-o-meter mm/swapfile.o mm/swapfile.o.new
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-271 (-271)
Function                    old     new   delta
get_swap_pages             2847    2733    -114
alloc_swap_scan_cluster     894     737    -157
Total: Before=30833, After=30562, chg -0.88%

Stack usage:
Before: swapfile.c:1190:5:get_swap_pages    240     static
After:  swapfile.c:1185:5:get_swap_pages    216     static

Signed-off-by: Kairui Song --- include/linux/swap.h | 4 ++-- mm/swapfile.c | 57 ++++++++++++++++++++------------------ 2 files changed, 28 insertions(+), 33 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index a3b5d74b095a..0e6c6bb385f0 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -276,9 +276,9 @@ enum swap_cluster_flags { * The first page in the swap file is the swap header, which is always marked * bad to prevent it from being allocated as an entry. This also prevents the * cluster to which it belongs being marked free. Therefore 0 is safe to use as - * a sentinel to indicate next is not valid in percpu_cluster. + * a sentinel to indicate an entry is not valid.
*/ -#define SWAP_NEXT_INVALID 0 +#define SWAP_ENTRY_INVALID 0 #ifdef CONFIG_THP_SWAP #define SWAP_NR_ORDERS (PMD_ORDER + 1) diff --git a/mm/swapfile.c b/mm/swapfile.c index a19ee8d5ffd0..f529e2ce2019 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -743,11 +743,14 @@ static void cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster ci->count += nr_pages; } -static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigned long offset, - unsigned int *foundp, unsigned int order, +/* Try use a new cluster for current CPU and allocate from it. */ +static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, + unsigned long offset, + unsigned int order, unsigned char usage) { - unsigned long start = offset & ~(SWAPFILE_CLUSTER - 1); + unsigned int next = SWAP_ENTRY_INVALID, found = SWAP_ENTRY_INVALID; + unsigned long start = ALIGN_DOWN(offset, SWAPFILE_CLUSTER); unsigned long end = min(start + SWAPFILE_CLUSTER, si->max); unsigned int nr_pages = 1 << order; bool need_reclaim, ret; @@ -756,10 +759,8 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne ci = &si->cluster_info[offset / SWAPFILE_CLUSTER]; lockdep_assert_held(&ci->lock); - if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER) { - offset = SWAP_NEXT_INVALID; + if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER) goto out; - } for (end -= nr_pages; offset <= end; offset += nr_pages) { need_reclaim = false; @@ -773,10 +774,8 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne * cluster has no flag set, and change of list * won't cause fragmentation. */ - if (!cluster_is_usable(ci, order)) { - offset = SWAP_NEXT_INVALID; + if (!cluster_is_usable(ci, order)) goto out; - } if (cluster_is_free(ci)) offset = start; /* Reclaim failed but cluster is usable, try next */ @@ -784,20 +783,17 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne continue; } cluster_alloc_range(si, ci, offset, usage, order); - *foundp = offset; - if (ci->count == SWAPFILE_CLUSTER) { - offset = SWAP_NEXT_INVALID; - goto out; - } + found = offset; offset += nr_pages; + if (ci->count < SWAPFILE_CLUSTER && offset <= end) + next = offset; break; } - if (offset > end) - offset = SWAP_NEXT_INVALID; out: relocate_cluster(si, ci); unlock_cluster(ci); - return offset; + __this_cpu_write(si->percpu_cluster->next[order], next); + return found; } /* Return true if reclaimed a whole cluster */ @@ -866,8 +862,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o if (cluster_is_usable(ci, order)) { if (cluster_is_free(ci)) offset = cluster_offset(si, ci); - offset = alloc_swap_scan_cluster(si, offset, &found, - order, usage); + found = alloc_swap_scan_cluster(si, offset, + order, usage); } else { unlock_cluster(ci); } @@ -878,8 +874,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o new_cluster: ci = cluster_isolate_lock(si, &si->free_clusters); if (ci) { - offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, order, usage); + found = alloc_swap_scan_cluster(si, cluster_offset(si, ci), + order, usage); /* * Allocation from free cluster must never fail and * cluster lock must remain untouched. 
@@ -896,8 +892,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o unsigned int frags = 0, frags_existing; while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[order]))) { - offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, order, usage); + found = alloc_swap_scan_cluster(si, cluster_offset(si, ci), + order, usage); /* * With `fragmenting` set to true, it will surely take * the cluster off nonfull list @@ -917,8 +913,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o * per-CPU usage, but either way they could contain * usable (eg. lazy-freed swap cache) slots. */ - offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, order, usage); + found = alloc_swap_scan_cluster(si, cluster_offset(si, ci), + order, usage); if (found) goto done; frags++; @@ -944,21 +940,20 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o */ while ((ci = cluster_isolate_lock(si, &si->frag_clusters[o]))) { atomic_long_dec(&si->frag_cluster_nr[o]); - offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, order, usage); + found = alloc_swap_scan_cluster(si, cluster_offset(si, ci), + 0, usage); if (found) goto done; } while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[o]))) { - offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, order, usage); + found = alloc_swap_scan_cluster(si, cluster_offset(si, ci), + 0, usage); if (found) goto done; } } done: - __this_cpu_write(si->percpu_cluster->next[order], offset); local_unlock(&si->percpu_cluster->lock); return found; @@ -3150,7 +3145,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, cluster = per_cpu_ptr(si->percpu_cluster, cpu); for (i = 0; i < SWAP_NR_ORDERS; i++) - cluster->next[i] = SWAP_NEXT_INVALID; + cluster->next[i] = SWAP_ENTRY_INVALID; local_lock_init(&cluster->lock); } From patchwork Tue Oct 22 19:24:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kairui Song X-Patchwork-Id: 13846085 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F7EBCDD0CB for ; Tue, 22 Oct 2024 19:30:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EE4636B00A2; Tue, 22 Oct 2024 15:30:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E91E86B00A3; Tue, 22 Oct 2024 15:30:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CBDAA6B00A4; Tue, 22 Oct 2024 15:30:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id A70F96B00A2 for ; Tue, 22 Oct 2024 15:30:28 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 90DDBC047F for ; Tue, 22 Oct 2024 19:30:10 +0000 (UTC) X-FDA: 82702228686.20.926740B Received: from mail-pj1-f49.google.com (mail-pj1-f49.google.com [209.85.216.49]) by imf19.hostedemail.com (Postfix) with ESMTP id 325671A0024 for ; Tue, 22 Oct 2024 19:30:05 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=LOw8tnSy; dmarc=pass (policy=none) 
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 11/13] mm, swap: introduce a helper for retrieving cluster from offset
Date: Wed, 23 Oct 2024 03:24:49 +0800
Message-ID: <20241022192451.38138-12-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>

From: Kairui Song

Retrieving the cluster info for an offset is a common operation, so introduce a helper for it.
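For illustration, a hedged sketch of the kind of call site this helper cleans up; lock_cluster_sketch below is a hypothetical stand-in for the real lock_cluster(), and the assertion mirrors the one added to swap_entry_range_free() in the diff that follows:

/* Sketch only: typical uses of offset_to_cluster(), simplified from the diff below. */
static struct swap_cluster_info *lock_cluster_sketch(struct swap_info_struct *si,
						     unsigned long offset)
{
	/* Map a swap entry offset to the cluster_info that covers it. */
	struct swap_cluster_info *ci = offset_to_cluster(si, offset);

	spin_lock(&ci->lock);
	return ci;
}

/*
 * A freed range must never span two clusters; with the helper the
 * assertion compares cluster pointers instead of repeating the
 * division by SWAPFILE_CLUSTER by hand:
 *
 *	VM_BUG_ON(ci != offset_to_cluster(si, offset + nr_pages - 1));
 */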
Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 mm/swapfile.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index f529e2ce2019..f25d697f6736 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -423,6 +423,12 @@ static inline unsigned int cluster_index(struct swap_info_struct *si,
         return ci - si->cluster_info;
 }
 
+static inline struct swap_cluster_info *offset_to_cluster(struct swap_info_struct *si,
+                                                          unsigned long offset)
+{
+        return &si->cluster_info[offset / SWAPFILE_CLUSTER];
+}
+
 static inline unsigned int cluster_offset(struct swap_info_struct *si,
                                           struct swap_cluster_info *ci)
 {
@@ -434,7 +440,7 @@ static inline struct swap_cluster_info *lock_cluster(struct swap_info_struct *si
 {
         struct swap_cluster_info *ci;
 
-        ci = &si->cluster_info[offset / SWAPFILE_CLUSTER];
+        ci = offset_to_cluster(si, offset);
 
         spin_lock(&ci->lock);
         return ci;
@@ -756,7 +762,7 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
         bool need_reclaim, ret;
         struct swap_cluster_info *ci;
 
-        ci = &si->cluster_info[offset / SWAPFILE_CLUSTER];
+        ci = offset_to_cluster(si, offset);
         lockdep_assert_held(&ci->lock);
 
         if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER)
@@ -1457,10 +1463,10 @@ static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry
         unsigned char *map_end = map + nr_pages;
         struct swap_cluster_info *ci;
 
-        /* It should never free entries across different clusters */
-        VM_BUG_ON((offset / SWAPFILE_CLUSTER) != ((offset + nr_pages - 1) / SWAPFILE_CLUSTER));
-
         ci = lock_cluster(si, offset);
+
+        /* It should never free entries across different clusters */
+        VM_BUG_ON(ci != offset_to_cluster(si, offset + nr_pages - 1));
         VM_BUG_ON(cluster_is_free(ci));
         VM_BUG_ON(ci->count < nr_pages);

From patchwork Tue Oct 22 19:24:50 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13846086
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins,
 Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 12/13] mm, swap: use a global swap cluster for non-rotation device
Date: Wed, 23 Oct 2024 03:24:50 +0800
Message-ID: <20241022192451.38138-13-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>

From: Kairui Song

Non-rotational (SSD / ZRAM) devices can tolerate fragmentation, so the goal of
the swap allocator there is to avoid contention on clusters.  It therefore uses
a per-CPU cluster design, and each CPU works on a different cluster as much as
possible.

But HDD is very sensitive to fragmentation; contention is trivial in
comparison.  So just use one global cluster instead.  This ensures that each
order is written to the same cluster as much as possible, which helps make the
IO more continuous.  This keeps the performance of the cluster allocator as
good as the old allocator.
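To make the trade-off concrete, here is a toy userspace model of the two
allocation strategies (not the kernel code: all names, locks and sizes below
are invented for illustration).  On a non-rotational device every CPU advances
its own cursor with no cross-CPU contention; on an HDD all CPUs funnel through
one locked cursor so that consecutive allocations stay adjacent on disk.

/* Toy model only.  Build with: cc -pthread model.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define PER_THREAD_ALLOCS 5

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long global_next;               /* shared "global cluster" cursor */
static __thread unsigned long percpu_next;      /* per-thread "per-CPU" cursor */

static unsigned long alloc_slot(int rotational)
{
        unsigned long slot;

        if (rotational) {
                /* HDD-like: everyone advances one cursor, slots stay contiguous. */
                pthread_mutex_lock(&global_lock);
                slot = global_next++;
                pthread_mutex_unlock(&global_lock);
        } else {
                /* SSD-like: each thread has its own cursor, no cross-CPU contention. */
                slot = percpu_next++;
        }
        return slot;
}

static void *worker(void *arg)
{
        int rotational = *(int *)arg;

        for (int i = 0; i < PER_THREAD_ALLOCS; i++)
                alloc_slot(rotational);
        return NULL;
}

int main(void)
{
        pthread_t threads[NTHREADS];
        int rotational = 1;

        for (int i = 0; i < NTHREADS; i++)
                pthread_create(&threads[i], NULL, worker, &rotational);
        for (int i = 0; i < NTHREADS; i++)
                pthread_join(threads[i], NULL);

        printf("HDD-style global cursor handed out %lu contiguous slots\n",
               global_next);
        return 0;
}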
Test after this commit compared to before this series:

make -j32 with tinyconfig, using 1G memcg limit and HDD swap:

Before this series:
114.44user 29.11system 39:42.90elapsed 6%CPU (0avgtext+0avgdata 157284maxresident)k
2901232inputs+0outputs (238877major+4227640minor)pagefaults

After this commit:
113.90user 23.81system 38:11.77elapsed 6%CPU (0avgtext+0avgdata 157260maxresident)k
2548728inputs+0outputs (235471major+4238110minor)pagefaults

Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 include/linux/swap.h |  2 ++
 mm/swapfile.c        | 48 ++++++++++++++++++++++++++++++------------
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0e6c6bb385f0..9898b1881d4d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -319,6 +319,8 @@ struct swap_info_struct {
         unsigned int pages;             /* total of usable pages of swap */
         atomic_long_t inuse_pages;      /* number of those currently in use */
         struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
+        struct percpu_cluster *global_cluster; /* Use one global cluster for rotating device */
+        spinlock_t global_cluster_lock; /* Serialize usage of global cluster */
         struct rb_root swap_extent_root;/* root of the swap extent rbtree */
         struct block_device *bdev;      /* swap device or bdev of swap file */
         struct file *swap_file;         /* seldom referenced */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f25d697f6736..6eb298a222c0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -798,7 +798,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 out:
         relocate_cluster(si, ci);
         unlock_cluster(ci);
-        __this_cpu_write(si->percpu_cluster->next[order], next);
+        if (si->flags & SWP_SOLIDSTATE)
+                __this_cpu_write(si->percpu_cluster->next[order], next);
+        else
+                si->global_cluster->next[order] = next;
         return found;
 }
 
@@ -860,8 +863,14 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
         unsigned int offset, found = 0;
 
         /* Fast path using per CPU cluster */
-        local_lock(&si->percpu_cluster->lock);
-        offset = __this_cpu_read(si->percpu_cluster->next[order]);
+        if (si->flags & SWP_SOLIDSTATE) {
+                local_lock(&si->percpu_cluster->lock);
+                offset = __this_cpu_read(si->percpu_cluster->next[order]);
+        } else {
+                spin_lock(&si->global_cluster_lock);
+                offset = si->global_cluster->next[order];
+        }
+
         if (offset) {
                 ci = lock_cluster(si, offset);
                 /* Cluster could have been used by another order */
@@ -960,8 +969,10 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
                 }
         }
 done:
-        local_unlock(&si->percpu_cluster->lock);
-
+        if (si->flags & SWP_SOLIDSTATE)
+                local_unlock(&si->percpu_cluster->lock);
+        else
+                spin_unlock(&si->global_cluster_lock);
         return found;
 }
 
@@ -2737,6 +2748,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
         mutex_unlock(&swapon_mutex);
         free_percpu(p->percpu_cluster);
         p->percpu_cluster = NULL;
+        kfree(p->global_cluster);
+        p->global_cluster = NULL;
         vfree(swap_map);
         kvfree(zeromap);
         kvfree(cluster_info);
@@ -3142,17 +3155,24 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
         for (i = 0; i < nr_clusters; i++)
                 spin_lock_init(&cluster_info[i].lock);
 
-        si->percpu_cluster = alloc_percpu(struct percpu_cluster);
-        if (!si->percpu_cluster)
-                goto err_free;
+        if (si->flags & SWP_SOLIDSTATE) {
+                si->percpu_cluster = alloc_percpu(struct percpu_cluster);
+                if (!si->percpu_cluster)
+                        goto err_free;
 
-        for_each_possible_cpu(cpu) {
-                struct percpu_cluster *cluster;
+                for_each_possible_cpu(cpu) {
+                        struct percpu_cluster *cluster;
 
-                cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+                        cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+                        for (i = 0; i < SWAP_NR_ORDERS; i++)
+                                cluster->next[i] = SWAP_ENTRY_INVALID;
+                        local_lock_init(&cluster->lock);
+                }
+        } else {
+                si->global_cluster = kmalloc(sizeof(*si->global_cluster), GFP_KERNEL);
                 for (i = 0; i < SWAP_NR_ORDERS; i++)
-                        cluster->next[i] = SWAP_ENTRY_INVALID;
-                local_lock_init(&cluster->lock);
+                        si->global_cluster->next[i] = SWAP_ENTRY_INVALID;
+                spin_lock_init(&si->global_cluster_lock);
         }
 
         /*
@@ -3426,6 +3446,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 bad_swap:
         free_percpu(si->percpu_cluster);
         si->percpu_cluster = NULL;
+        kfree(si->global_cluster);
+        si->global_cluster = NULL;
         inode = NULL;
         destroy_swap_extents(si);
         swap_cgroup_swapoff(si->type);

From patchwork Tue Oct 22 19:37:42 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13846097
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins,
 Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 13/13] mm, swap_slots: remove slot cache for freeing path
Date: Wed, 23 Oct 2024 03:37:42 +0800
Message-ID: <20241022193742.43903-1-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>

The slot cache for the freeing path is mostly for reducing the overhead of
si->lock.  As we have basically eliminated the si->lock usage for the freeing
path, it can simply be removed.

This helps simplify the code, and avoids swap entries from being held in the
cache upon freeing.  The delayed freeing of entries has been causing trouble
for further optimizations for zswap [1] and in theory will also cause more
fragmentation and extra overhead.

Testing with a Linux kernel build showed both performance and fragmentation
are better without the cache:

time make -j96 / 768M memcg, 4K pages, 10G ZRAM, avg of 4 test runs:
Before:
  Sys time: 36047.78, Real time: 472.43
After: (-7.6% sys time, -7.3% real time)
  Sys time: 33314.76, Real time: 437.67

time make -j96 / 1152M memcg, 64K mTHP, 10G ZRAM, avg of 4 test runs:
Before:
  Sys time: 46859.04, Real time: 562.63
  hugepages-64kB/stats/swpout: 1783392
  hugepages-64kB/stats/swpout_fallback: 240875
After: (-23.3% sys time, -21.3% real time)
  Sys time: 35958.87, Real time: 442.69
  hugepages-64kB/stats/swpout: 1866267
  hugepages-64kB/stats/swpout_fallback: 158330

Sequential SWAP should also be slightly faster; tests didn't show a measurable
difference though, at least no regression:

Swapin 4G zero page on ZRAM (time in us):
Before (avg. 1923756): 1912391 1927023 1927957 1916527 1918263 1914284 1934753 1940813 1921791
After (avg. 1922290): 1919101 1925743 1916810 1917007 1923930 1935152 1917403 1923549 1921913

Link: https://lore.kernel.org/all/CAMgjq7ACohT_uerSz8E_994ZZCv709Zor+43hdmesW_59W1BWw@mail.gmail.com/ [1]
Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 include/linux/swap_slots.h |  3 --
 mm/swap_slots.c            | 78 +++++----------------------
 mm/swapfile.c              | 89 +++++++++++++++-----------------
 3 files changed, 44 insertions(+), 126 deletions(-)

diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h
index 15adfb8c813a..840aec3523b2 100644
--- a/include/linux/swap_slots.h
+++ b/include/linux/swap_slots.h
@@ -16,15 +16,12 @@ struct swap_slots_cache {
         swp_entry_t     *slots;
         int             nr;
         int             cur;
-        spinlock_t      free_lock; /* protects slots_ret, n_ret */
-        swp_entry_t     *slots_ret;
         int             n_ret;
 };
 
 void disable_swap_slots_cache_lock(void);
 void reenable_swap_slots_cache_unlock(void);
 void enable_swap_slots_cache(void);
-void free_swap_slot(swp_entry_t entry);
 
 extern bool swap_slot_cache_enabled;
 
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 13ab3b771409..9c7c171df7ba 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -43,17 +43,15 @@ static DEFINE_MUTEX(swap_slots_cache_mutex);
 /* Serialize swap slots cache enable/disable operations */
 static DEFINE_MUTEX(swap_slots_cache_enable_mutex);
 
-static void __drain_swap_slots_cache(unsigned int type);
+static void __drain_swap_slots_cache(void);
 
 #define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled)
-#define SLOTS_CACHE 0x1
-#define SLOTS_CACHE_RET 0x2
 
 static void deactivate_swap_slots_cache(void)
 {
         mutex_lock(&swap_slots_cache_mutex);
         swap_slot_cache_active = false;
-        __drain_swap_slots_cache(SLOTS_CACHE|SLOTS_CACHE_RET);
+        __drain_swap_slots_cache();
         mutex_unlock(&swap_slots_cache_mutex);
 }
 
@@ -72,7 +70,7 @@ void disable_swap_slots_cache_lock(void)
         if (swap_slot_cache_initialized) {
                 /* serialize with cpu hotplug operations */
                 cpus_read_lock();
-                __drain_swap_slots_cache(SLOTS_CACHE|SLOTS_CACHE_RET);
+                __drain_swap_slots_cache();
                 cpus_read_unlock();
         }
 }
@@ -113,7 +111,7 @@ static bool check_cache_active(void)
 static int alloc_swap_slot_cache(unsigned int cpu)
 {
         struct swap_slots_cache *cache;
-        swp_entry_t *slots, *slots_ret;
+        swp_entry_t *slots;
 
         /*
          * Do allocation outside swap_slots_cache_mutex
@@ -125,28 +123,19 @@ static int alloc_swap_slot_cache(unsigned int cpu)
         if (!slots)
                 return -ENOMEM;
 
-        slots_ret = kvcalloc(SWAP_SLOTS_CACHE_SIZE, sizeof(swp_entry_t),
-                             GFP_KERNEL);
-        if (!slots_ret) {
-                kvfree(slots);
-                return -ENOMEM;
-        }
-
         mutex_lock(&swap_slots_cache_mutex);
         cache = &per_cpu(swp_slots, cpu);
-        if (cache->slots || cache->slots_ret) {
+        if (cache->slots) {
                 /* cache already allocated */
                 mutex_unlock(&swap_slots_cache_mutex);
 
                 kvfree(slots);
-                kvfree(slots_ret);
 
                 return 0;
         }
 
         if (!cache->lock_initialized) {
                 mutex_init(&cache->alloc_lock);
-                spin_lock_init(&cache->free_lock);
                 cache->lock_initialized = true;
         }
         cache->nr = 0;
@@ -160,19 +149,16 @@ static int alloc_swap_slot_cache(unsigned int cpu)
          */
         mb();
         cache->slots = slots;
-        cache->slots_ret = slots_ret;
         mutex_unlock(&swap_slots_cache_mutex);
         return 0;
 }
 
-static void drain_slots_cache_cpu(unsigned int cpu, unsigned int type,
-                                  bool free_slots)
+static void drain_slots_cache_cpu(unsigned int cpu, bool free_slots)
 {
         struct swap_slots_cache *cache;
-        swp_entry_t *slots = NULL;
 
         cache = &per_cpu(swp_slots, cpu);
-        if ((type & SLOTS_CACHE) && cache->slots) {
+        if (cache->slots) {
                 mutex_lock(&cache->alloc_lock);
                 swapcache_free_entries(cache->slots + cache->cur, cache->nr);
                 cache->cur = 0;
@@ -183,20 +169,9 @@ static void drain_slots_cache_cpu(unsigned int cpu, unsigned int type,
                 }
                 mutex_unlock(&cache->alloc_lock);
         }
-        if ((type & SLOTS_CACHE_RET) && cache->slots_ret) {
-                spin_lock_irq(&cache->free_lock);
-                swapcache_free_entries(cache->slots_ret, cache->n_ret);
-                cache->n_ret = 0;
-                if (free_slots && cache->slots_ret) {
-                        slots = cache->slots_ret;
-                        cache->slots_ret = NULL;
-                }
-                spin_unlock_irq(&cache->free_lock);
-                kvfree(slots);
-        }
 }
 
-static void __drain_swap_slots_cache(unsigned int type)
+static void __drain_swap_slots_cache(void)
 {
         unsigned int cpu;
 
@@ -224,13 +199,13 @@ static void __drain_swap_slots_cache(unsigned int type)
          * There are no slots on such cpu that need to be drained.
          */
         for_each_online_cpu(cpu)
-                drain_slots_cache_cpu(cpu, type, false);
+                drain_slots_cache_cpu(cpu, false);
 }
 
 static int free_slot_cache(unsigned int cpu)
 {
         mutex_lock(&swap_slots_cache_mutex);
-        drain_slots_cache_cpu(cpu, SLOTS_CACHE | SLOTS_CACHE_RET, true);
+        drain_slots_cache_cpu(cpu, true);
         mutex_unlock(&swap_slots_cache_mutex);
         return 0;
 }
@@ -269,39 +244,6 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
         return cache->nr;
 }
 
-void free_swap_slot(swp_entry_t entry)
-{
-        struct swap_slots_cache *cache;
-
-        /* Large folio swap slot is not covered. */
-        zswap_invalidate(entry);
-
-        cache = raw_cpu_ptr(&swp_slots);
-        if (likely(use_swap_slot_cache && cache->slots_ret)) {
-                spin_lock_irq(&cache->free_lock);
-                /* Swap slots cache may be deactivated before acquiring lock */
-                if (!use_swap_slot_cache || !cache->slots_ret) {
-                        spin_unlock_irq(&cache->free_lock);
-                        goto direct_free;
-                }
-                if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE) {
-                        /*
-                         * Return slots to global pool.
-                         * The current swap_map value is SWAP_HAS_CACHE.
-                         * Set it to 0 to indicate it is available for
-                         * allocation in global pool
-                         */
-                        swapcache_free_entries(cache->slots_ret, cache->n_ret);
-                        cache->n_ret = 0;
-                }
-                cache->slots_ret[cache->n_ret++] = entry;
-                spin_unlock_irq(&cache->free_lock);
-        } else {
-direct_free:
-                swapcache_free_entries(&entry, 1);
-        }
-}
-
 swp_entry_t folio_alloc_swap(struct folio *folio)
 {
         swp_entry_t entry;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6eb298a222c0..c77b6ec3c83b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -53,14 +53,15 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
                                  unsigned char);
 static void free_swap_count_continuations(struct swap_info_struct *);
-static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry,
-                                  unsigned int nr_pages);
+static void swap_entry_range_free(struct swap_info_struct *si,
+                                  struct swap_cluster_info *ci,
+                                  swp_entry_t entry, unsigned int nr_pages);
 static void swap_range_alloc(struct swap_info_struct *si,
                              unsigned int nr_entries);
 static bool folio_swapcache_freeable(struct folio *folio);
 static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
                                               unsigned long offset);
-static void unlock_cluster(struct swap_cluster_info *ci);
+static inline void unlock_cluster(struct swap_cluster_info *ci);
 
 static DEFINE_SPINLOCK(swap_lock);
 static unsigned int nr_swapfiles;
@@ -260,10 +261,9 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
         folio_ref_sub(folio, nr_pages);
         folio_set_dirty(folio);
 
-        /* Only sinple page folio can be backed by zswap */
-        if (nr_pages == 1)
-                zswap_invalidate(entry);
-        swap_entry_range_free(si, entry, nr_pages);
+        ci = lock_cluster(si, offset);
+        swap_entry_range_free(si, ci, entry, nr_pages);
+        unlock_cluster(ci);
         ret = nr_pages;
 out_unlock:
         folio_unlock(folio);
@@ -1105,8 +1105,10 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
          * Use atomic clear_bit operations only on zeromap instead of non-atomic
          * bitmap_clear to prevent adjacent bits corruption due to simultaneous writes.
          */
-        for (i = 0; i < nr_entries; i++)
+        for (i = 0; i < nr_entries; i++) {
                 clear_bit(offset + i, si->zeromap);
+                zswap_invalidate(swp_entry(si->type, offset + i));
+        }
 
         if (si->flags & SWP_BLKDEV)
                 swap_slot_free_notify =
@@ -1410,9 +1412,9 @@ static unsigned char __swap_entry_free(struct swap_info_struct *si,
 
         ci = lock_cluster(si, offset);
         usage = __swap_entry_free_locked(si, offset, 1);
-        unlock_cluster(ci);
         if (!usage)
-                free_swap_slot(entry);
+                swap_entry_range_free(si, ci, swp_entry(si->type, offset), 1);
+        unlock_cluster(ci);
 
         return usage;
 }
@@ -1440,13 +1442,10 @@ static bool __swap_entries_free(struct swap_info_struct *si,
         }
         for (i = 0; i < nr; i++)
                 WRITE_ONCE(si->swap_map[offset + i], SWAP_HAS_CACHE);
+        if (!has_cache)
+                swap_entry_range_free(si, ci, entry, nr);
         unlock_cluster(ci);
 
-        if (!has_cache) {
-                for (i = 0; i < nr; i++)
-                        zswap_invalidate(swp_entry(si->type, offset + i));
-                swap_entry_range_free(si, entry, nr);
-        }
         return has_cache;
 
 fallback:
@@ -1466,15 +1465,13 @@ static bool __swap_entries_free(struct swap_info_struct *si,
  * Drop the last HAS_CACHE flag of swap entries, caller have to
  * ensure all entries belong to the same cgroup.
  */
-static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry,
-                                  unsigned int nr_pages)
+static void swap_entry_range_free(struct swap_info_struct *si,
+                                  struct swap_cluster_info *ci,
+                                  swp_entry_t entry, unsigned int nr_pages)
 {
         unsigned long offset = swp_offset(entry);
         unsigned char *map = si->swap_map + offset;
         unsigned char *map_end = map + nr_pages;
-        struct swap_cluster_info *ci;
-
-        ci = lock_cluster(si, offset);
 
         /* It should never free entries across different clusters */
         VM_BUG_ON(ci != offset_to_cluster(si, offset + nr_pages - 1));
@@ -1494,7 +1491,6 @@ static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry
                 free_cluster(si, ci);
         else
                 partial_free_cluster(si, ci);
-        unlock_cluster(ci);
 }
 
 static void cluster_swap_free_nr(struct swap_info_struct *si,
@@ -1502,28 +1498,13 @@ static void cluster_swap_free_nr(struct swap_info_struct *si,
                                  unsigned char usage)
 {
         struct swap_cluster_info *ci;
-        DECLARE_BITMAP(to_free, BITS_PER_LONG) = { 0 };
-        int i, nr;
+        unsigned long end = offset + nr_pages;
 
         ci = lock_cluster(si, offset);
-        while (nr_pages) {
-                nr = min(BITS_PER_LONG, nr_pages);
-                for (i = 0; i < nr; i++) {
-                        if (!__swap_entry_free_locked(si, offset + i, usage))
-                                bitmap_set(to_free, i, 1);
-                }
-                if (!bitmap_empty(to_free, BITS_PER_LONG)) {
-                        unlock_cluster(ci);
-                        for_each_set_bit(i, to_free, BITS_PER_LONG)
-                                free_swap_slot(swp_entry(si->type, offset + i));
-                        if (nr == nr_pages)
-                                return;
-                        bitmap_clear(to_free, 0, BITS_PER_LONG);
-                        ci = lock_cluster(si, offset);
-                }
-                offset += nr;
-                nr_pages -= nr;
-        }
+        do {
+                if (!__swap_entry_free_locked(si, offset, usage))
+                        swap_entry_range_free(si, ci, swp_entry(si->type, offset), 1);
+        } while (++offset < end);
         unlock_cluster(ci);
 }
 
@@ -1564,18 +1545,12 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
                 return;
 
         ci = lock_cluster(si, offset);
-        if (size > 1 && swap_is_has_cache(si, offset, size)) {
-                unlock_cluster(ci);
-                swap_entry_range_free(si, entry, size);
-                return;
-        }
-        for (int i = 0; i < size; i++, entry.val++) {
-                if (!__swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE)) {
-                        unlock_cluster(ci);
-                        free_swap_slot(entry);
-                        if (i == size - 1)
-                                return;
-                        lock_cluster(si, offset);
+        if (swap_is_has_cache(si, offset, size))
+                swap_entry_range_free(si, ci, entry, size);
+        else {
+                for (int i = 0; i < size; i++, entry.val++) {
+                        if (!__swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE))
+                                swap_entry_range_free(si, ci, entry, 1);
                 }
         }
         unlock_cluster(ci);
@@ -1584,6 +1559,7 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
 void swapcache_free_entries(swp_entry_t *entries, int n)
 {
         int i;
+        struct swap_cluster_info *ci;
         struct swap_info_struct *si = NULL;
 
         if (n <= 0)
@@ -1591,8 +1567,11 @@ void swapcache_free_entries(swp_entry_t *entries, int n)
 
         for (i = 0; i < n; ++i) {
                 si = _swap_info_get(entries[i]);
-                if (si)
-                        swap_entry_range_free(si, entries[i], 1);
+                if (si) {
+                        ci = lock_cluster(si, swp_offset(entries[i]));
+                        swap_entry_range_free(si, ci, entries[i], 1);
+                        unlock_cluster(ci);
+                }
         }
 }
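To see why holding freed entries back matters, here is a toy userspace model
of the two freeing schemes contrasted by this patch. It is illustration only:
the constants and helper names are invented, and the real freeing path goes
through swap_entry_range_free() under the cluster lock, as in the diff above.

/* Toy model only; not kernel code. */
#include <stdio.h>

#define CACHE_SIZE 64

static int cached[CACHE_SIZE];
static int n_cached;
static int nr_released;

static void release_slot(int slot)      /* stands in for the real free routine */
{
        (void)slot;
        nr_released++;
}

/* Old scheme: defer frees; entries stay unusable until the batch flushes. */
static void free_slot_cached(int slot)
{
        if (n_cached == CACHE_SIZE) {
                for (int i = 0; i < n_cached; i++)
                        release_slot(cached[i]);
                n_cached = 0;
        }
        cached[n_cached++] = slot;
}

/* New scheme: free immediately; nothing is ever held back. */
static void free_slot_direct(int slot)
{
        release_slot(slot);
}

int main(void)
{
        for (int slot = 0; slot < 100; slot++)
                free_slot_cached(slot);
        printf("cached scheme: %d of 100 slots actually released\n", nr_released);

        nr_released = 0;
        for (int slot = 0; slot < 100; slot++)
                free_slot_direct(slot);
        printf("direct scheme: %d of 100 slots actually released\n", nr_released);
        return 0;
}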