From patchwork Tue Dec 24 14:37:59 2024
X-Patchwork-Id: 13920189
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 01/13] mm, swap: minor clean up for swap entry allocation
Date: Tue, 24 Dec 2024 22:37:59 +0800
Message-ID: <20241224143811.33462-2-ryncsn@gmail.com>
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>
From: Kairui Song

Direct reclaim can skip the whole folio after reclaiming a set of
folio-based slots. Also simplify the allocation code and reduce
indentation.

Signed-off-by: Kairui Song
---
 mm/swapfile.c | 59 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index b0a9071cfe1d..f8002f110104 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -604,23 +604,28 @@ static bool cluster_reclaim_range(struct swap_info_struct *si,
 				  unsigned long start, unsigned long end)
 {
 	unsigned char *map = si->swap_map;
-	unsigned long offset;
+	unsigned long offset = start;
+	int nr_reclaim;
 
 	spin_unlock(&ci->lock);
 	spin_unlock(&si->lock);
 
-	for (offset = start; offset < end; offset++) {
+	do {
 		switch (READ_ONCE(map[offset])) {
 		case 0:
-			continue;
+			offset++;
+			break;
 		case SWAP_HAS_CACHE:
-			if (__try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT) > 0)
-				continue;
-			goto out;
+			nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT);
+			if (nr_reclaim > 0)
+				offset += nr_reclaim;
+			else
+				goto out;
+			break;
 		default:
 			goto out;
 		}
-	}
+	} while (offset < end);
 out:
 	spin_lock(&si->lock);
 	spin_lock(&ci->lock);
@@ -838,35 +843,30 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 							 &found, order, usage);
 			frags++;
 			if (found)
-				break;
+				goto done;
 		}
 
-		if (!found) {
+		/*
+		 * Nonfull clusters are moved to frag tail if we reached
+		 * here, count them too, don't over scan the frag list.
+		 */
+		while (frags < si->frag_cluster_nr[order]) {
+			ci = list_first_entry(&si->frag_clusters[order],
+					      struct swap_cluster_info, list);
 			/*
-			 * Nonfull clusters are moved to frag tail if we reached
-			 * here, count them too, don't over scan the frag list.
+			 * Rotate the frag list to iterate, they were all failing
+			 * high order allocation or moved here due to per-CPU usage,
+			 * this help keeping usable cluster ahead.
 			 */
-			while (frags < si->frag_cluster_nr[order]) {
-				ci = list_first_entry(&si->frag_clusters[order],
-						      struct swap_cluster_info, list);
-				/*
-				 * Rotate the frag list to iterate, they were all failing
-				 * high order allocation or moved here due to per-CPU usage,
-				 * this help keeping usable cluster ahead.
-				 */
-				list_move_tail(&ci->list, &si->frag_clusters[order]);
-				offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
-								 &found, order, usage);
-				frags++;
-				if (found)
-					break;
-			}
+			list_move_tail(&ci->list, &si->frag_clusters[order]);
+			offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
+							 &found, order, usage);
+			frags++;
+			if (found)
+				goto done;
 		}
 	}
 
-	if (found)
-		goto done;
-
 	if (!list_empty(&si->discard_clusters)) {
 		/*
 		 * we don't have free cluster but have some clusters in
@@ -904,7 +904,6 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 				goto done;
 		}
 	}
-
 done:
 	cluster->next[order] = offset;
 	return found;

From patchwork Tue Dec 24 14:38:00 2024
X-Patchwork-Id: 13920190
From: Kairui Song
Subject: [PATCH v2 02/13] mm, swap: fold swap_info_get_cont in the only caller
Date: Tue, 24 Dec 2024 22:38:00 +0800
Message-ID: <20241224143811.33462-3-ryncsn@gmail.com>

From: Kairui Song

The name of the function is confusing, and the code is much easier to
follow after folding it into its only caller. Also rename the confusing
variable name "p" to the more meaningful "si".
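After the fold, swapcache_free_entries() follows a common pattern. It sorts a batch by owner, then walks the batch and switches locks only when the owner changes, so each lock is taken once per run of entries. The following is a minimal userspace sketch of that pattern, not kernel code. struct dev, struct entry, lookup_dev(), dev_lock()/dev_unlock() and free_entries() are hypothetical stand-ins for swap_info_struct, swp_entry_t, _swap_info_get(), si->lock and swapcache_free_entries(). The "lock" is a plain flag checked with assert(), so any double lock or stray unlock fails immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct swap_info_struct. */
struct dev {
	int locked;      /* models si->lock: 1 while held */
	int lock_count;  /* how many times the lock was acquired */
	int freed;       /* entries freed on this device */
};

/* Hypothetical stand-in for swp_entry_t: device index plus offset. */
struct entry {
	int type;
	unsigned long offset;
};

static void dev_lock(struct dev *d)
{
	assert(!d->locked);  /* never take a lock we already hold */
	d->locked = 1;
	d->lock_count++;
}

static void dev_unlock(struct dev *d)
{
	assert(d->locked);   /* never release a lock we do not hold */
	d->locked = 0;
}

/* Like _swap_info_get(): NULL for an entry that maps to no device. */
static struct dev *lookup_dev(struct dev *devs, int ndevs, struct entry e)
{
	return (e.type >= 0 && e.type < ndevs) ? &devs[e.type] : NULL;
}

static int entry_cmp(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;

	return (x->type > y->type) - (x->type < y->type);
}

/* Free a batch of entries, switching locks only when the device changes. */
static void free_entries(struct dev *devs, int ndevs, struct entry *entries, int n)
{
	struct dev *si = NULL, *prev = NULL;

	if (n <= 0)
		return;

	/* Group entries by device so each lock is taken once per run. */
	qsort(entries, (size_t)n, sizeof(entries[0]), entry_cmp);

	for (int i = 0; i < n; i++) {
		si = lookup_dev(devs, ndevs, entries[i]);
		if (si != prev) {
			if (prev != NULL)
				dev_unlock(prev);
			if (si != NULL)
				dev_lock(si);
		}
		if (si)
			si->freed++;  /* stands in for swap_entry_range_free() */
		prev = si;
	}
	if (si)
		dev_unlock(si);
}
```

Sorting first is what limits lock switching to once per device. An entry that maps to no device only drops the previous lock, mirroring the NULL checks in the folded loop.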
Signed-off-by: Kairui Song
---
 mm/swapfile.c | 39 +++++++++++++++------------------------
 1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index f8002f110104..574059158627 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1375,22 +1375,6 @@ static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
 	return NULL;
 }
 
-static struct swap_info_struct *swap_info_get_cont(swp_entry_t entry,
-					struct swap_info_struct *q)
-{
-	struct swap_info_struct *p;
-
-	p = _swap_info_get(entry);
-
-	if (p != q) {
-		if (q != NULL)
-			spin_unlock(&q->lock);
-		if (p != NULL)
-			spin_lock(&p->lock);
-	}
-	return p;
-}
-
 static unsigned char __swap_entry_free_locked(struct swap_info_struct *si,
 					      unsigned long offset,
 					      unsigned char usage)
@@ -1687,14 +1671,14 @@ static int swp_entry_cmp(const void *ent1, const void *ent2)
 
 void swapcache_free_entries(swp_entry_t *entries, int n)
 {
-	struct swap_info_struct *p, *prev;
+	struct swap_info_struct *si, *prev;
 	int i;
 
 	if (n <= 0)
 		return;
 
 	prev = NULL;
-	p = NULL;
+	si = NULL;
 
 	/*
 	 * Sort swap entries by swap device, so each lock is only taken once.
@@ -1704,13 +1688,20 @@ void swapcache_free_entries(swp_entry_t *entries, int n)
 	if (nr_swapfiles > 1)
 		sort(entries, n, sizeof(entries[0]), swp_entry_cmp, NULL);
 	for (i = 0; i < n; ++i) {
-		p = swap_info_get_cont(entries[i], prev);
-		if (p)
-			swap_entry_range_free(p, entries[i], 1);
-		prev = p;
+		si = _swap_info_get(entries[i]);
+
+		if (si != prev) {
+			if (prev != NULL)
+				spin_unlock(&prev->lock);
+			if (si != NULL)
+				spin_lock(&si->lock);
+		}
+		if (si)
+			swap_entry_range_free(si, entries[i], 1);
+		prev = si;
 	}
-	if (p)
-		spin_unlock(&p->lock);
+	if (si)
+		spin_unlock(&si->lock);
 }
 
 int __swap_count(swp_entry_t entry)

From patchwork Tue Dec 24 14:38:01 2024
X-Patchwork-Id: 13920191
From: Kairui Song
Subject: [PATCH v2 03/13] mm, swap: remove old allocation path for HDD
Date: Tue, 24 Dec 2024 22:38:01 +0800
Message-ID: <20241224143811.33462-4-ryncsn@gmail.com>

From: Kairui Song

We are currently using different swap allocation algorithms for HDD and
non-HDD devices. This leads to a separate set of locks, and the code
path is heavily bloated, making further optimization and maintenance
difficult.

This commit removes all HDD swap allocation and related dead code, and
uses the cluster allocation algorithm instead.

Performance may drop temporarily, but the impact should be negligible:
the main advantage of the legacy HDD allocation algorithm is that it
tends to use contiguous slots, but the swap device gets fragmented
quickly anyway, so attempts to use contiguous slots fail easily.

This commit also enables mTHP swap on HDD, which is expected to be
beneficial, and the following commits will adapt and optimize the
cluster allocator for HDD.
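The fragmentation argument above can be illustrated with a toy model of cluster-based allocation. The model assumes that a request for 1 << order slots is served as a naturally aligned run that never crosses a cluster boundary. It is a simplified first-fit sketch, not the kernel's cluster allocator, which also keeps free, nonfull and fragmented cluster lists and per-CPU state. NR_SLOTS, CLUSTER_SIZE, the global swap_map[], range_free() and cluster_alloc() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_SLOTS     64  /* hypothetical device size, in slots */
#define CLUSTER_SIZE 16  /* hypothetical cluster size, a power of two */

/* 0 = free, nonzero = in use; a toy version of si->swap_map. */
static unsigned char swap_map[NR_SLOTS];

/* True if every slot in [start, start + nr) is free. */
static bool range_free(unsigned int start, unsigned int nr)
{
	for (unsigned int i = start; i < start + nr; i++)
		if (swap_map[i])
			return false;
	return true;
}

/*
 * First-fit allocation of 1 << order slots (small orders only).
 * Candidate offsets step by the run size from each cluster start, so
 * every run is naturally aligned and never crosses a cluster boundary.
 * Returns the first slot of the run, or -1 if no aligned free run exists.
 */
static int cluster_alloc(unsigned int order)
{
	unsigned int nr = 1u << order;

	if (nr > CLUSTER_SIZE)
		return -1;
	for (unsigned int base = 0; base < NR_SLOTS; base += CLUSTER_SIZE) {
		for (unsigned int off = base; off + nr <= base + CLUSTER_SIZE; off += nr) {
			if (range_free(off, nr)) {
				memset(&swap_map[off], 1, nr);
				return (int)off;
			}
		}
	}
	return -1;
}
```

For example, if every slot is first allocated with order 0 and every odd slot is then freed, half the device is free. Even so, cluster_alloc(1) returns -1 because no aligned pair is free, while cluster_alloc(0) still succeeds at slot 1. This is a small-scale version of the fragmentation effect described in the commit message.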
Suggested-by: Chris Li
Suggested-by: "Huang, Ying"
Signed-off-by: Kairui Song
---
 include/linux/swap.h |   3 -
 mm/swapfile.c        | 235 ++-----------------------------------------------
 2 files changed, 9 insertions(+), 229 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 187715eec3cb..0c681aa5cb98 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -310,9 +310,6 @@ struct swap_info_struct {
 	unsigned int highest_bit;	/* index of last free in swap_map */
 	unsigned int pages;		/* total of usable pages of swap */
 	unsigned int inuse_pages;	/* number of those currently in use */
-	unsigned int cluster_next;	/* likely index for next allocation */
-	unsigned int cluster_nr;	/* countdown to next cluster search */
-	unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 574059158627..fca58d43b836 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1001,49 +1001,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
 }
 
-static void set_cluster_next(struct swap_info_struct *si, unsigned long next)
-{
-	unsigned long prev;
-
-	if (!(si->flags & SWP_SOLIDSTATE)) {
-		si->cluster_next = next;
-		return;
-	}
-
-	prev = this_cpu_read(*si->cluster_next_cpu);
-	/*
-	 * Cross the swap address space size aligned trunk, choose
-	 * another trunk randomly to avoid lock contention on swap
-	 * address space if possible.
-	 */
-	if ((prev >> SWAP_ADDRESS_SPACE_SHIFT) !=
-	    (next >> SWAP_ADDRESS_SPACE_SHIFT)) {
-		/* No free swap slots available */
-		if (si->highest_bit <= si->lowest_bit)
-			return;
-		next = get_random_u32_inclusive(si->lowest_bit, si->highest_bit);
-		next = ALIGN_DOWN(next, SWAP_ADDRESS_SPACE_PAGES);
-		next = max_t(unsigned int, next, si->lowest_bit);
-	}
-	this_cpu_write(*si->cluster_next_cpu, next);
-}
-
-static bool swap_offset_available_and_locked(struct swap_info_struct *si,
-					     unsigned long offset)
-{
-	if (data_race(!si->swap_map[offset])) {
-		spin_lock(&si->lock);
-		return true;
-	}
-
-	if (vm_swap_full() && READ_ONCE(si->swap_map[offset]) == SWAP_HAS_CACHE) {
-		spin_lock(&si->lock);
-		return true;
-	}
-
-	return false;
-}
-
 static int cluster_alloc_swap(struct swap_info_struct *si,
 			      unsigned char usage, int nr, swp_entry_t slots[],
 			      int order)
@@ -1071,13 +1028,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
 			       unsigned char usage, int nr, swp_entry_t slots[],
 			       int order)
 {
-	unsigned long offset;
-	unsigned long scan_base;
-	unsigned long last_in_cluster = 0;
-	int latency_ration = LATENCY_LIMIT;
 	unsigned int nr_pages = 1 << order;
-	int n_ret = 0;
-	bool scanned_many = false;
 
 	/*
 	 * We try to cluster swap pages by allocating them sequentially
@@ -1089,7 +1040,6 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
 	 * But we do now try to find an empty cluster. -Andrea
 	 * And we let swap pages go all over an SSD partition. Hugh
 	 */
-
 	if (order > 0) {
 		/*
 		 * Should not even be attempting large allocations when huge
@@ -1109,158 +1059,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
 		return 0;
 	}
 
-	if (si->cluster_info)
-		return cluster_alloc_swap(si, usage, nr, slots, order);
-
-	si->flags += SWP_SCANNING;
-
-	/* For HDD, sequential access is more important. */
-	scan_base = si->cluster_next;
-	offset = scan_base;
-
-	if (unlikely(!si->cluster_nr--)) {
-		if (si->pages - si->inuse_pages < SWAPFILE_CLUSTER) {
-			si->cluster_nr = SWAPFILE_CLUSTER - 1;
-			goto checks;
-		}
-
-		spin_unlock(&si->lock);
-
-		/*
-		 * If seek is expensive, start searching for new cluster from
-		 * start of partition, to minimize the span of allocated swap.
-		 */
-		scan_base = offset = si->lowest_bit;
-		last_in_cluster = offset + SWAPFILE_CLUSTER - 1;
-
-		/* Locate the first empty (unaligned) cluster */
-		for (; last_in_cluster <= READ_ONCE(si->highest_bit); offset++) {
-			if (si->swap_map[offset])
-				last_in_cluster = offset + SWAPFILE_CLUSTER;
-			else if (offset == last_in_cluster) {
-				spin_lock(&si->lock);
-				offset -= SWAPFILE_CLUSTER - 1;
-				si->cluster_next = offset;
-				si->cluster_nr = SWAPFILE_CLUSTER - 1;
-				goto checks;
-			}
-			if (unlikely(--latency_ration < 0)) {
-				cond_resched();
-				latency_ration = LATENCY_LIMIT;
-			}
-		}
-
-		offset = scan_base;
-		spin_lock(&si->lock);
-		si->cluster_nr = SWAPFILE_CLUSTER - 1;
-	}
-
-checks:
-	if (!(si->flags & SWP_WRITEOK))
-		goto no_page;
-	if (!si->highest_bit)
-		goto no_page;
-	if (offset > si->highest_bit)
-		scan_base = offset = si->lowest_bit;
-
-	/* reuse swap entry of cache-only swap if not busy. */
-	if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
-		int swap_was_freed;
-		spin_unlock(&si->lock);
-		swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT);
-		spin_lock(&si->lock);
-		/* entry was freed successfully, try to use this again */
-		if (swap_was_freed > 0)
-			goto checks;
-		goto scan; /* check next one */
-	}
-
-	if (si->swap_map[offset]) {
-		if (!n_ret)
-			goto scan;
-		else
-			goto done;
-	}
-	memset(si->swap_map + offset, usage, nr_pages);
-
-	swap_range_alloc(si, offset, nr_pages);
-	slots[n_ret++] = swp_entry(si->type, offset);
-
-	/* got enough slots or reach max slots? */
-	if ((n_ret == nr) || (offset >= si->highest_bit))
-		goto done;
-
-	/* search for next available slot */
-
-	/* time to take a break? */
-	if (unlikely(--latency_ration < 0)) {
-		if (n_ret)
-			goto done;
-		spin_unlock(&si->lock);
-		cond_resched();
-		spin_lock(&si->lock);
-		latency_ration = LATENCY_LIMIT;
-	}
-
-	if (si->cluster_nr && !si->swap_map[++offset]) {
-		/* non-ssd case, still more slots in cluster? */
-		--si->cluster_nr;
-		goto checks;
-	}
-
-	/*
-	 * Even if there's no free clusters available (fragmented),
-	 * try to scan a little more quickly with lock held unless we
-	 * have scanned too many slots already.
-	 */
-	if (!scanned_many) {
-		unsigned long scan_limit;
-
-		if (offset < scan_base)
-			scan_limit = scan_base;
-		else
-			scan_limit = si->highest_bit;
-		for (; offset <= scan_limit && --latency_ration > 0;
-		     offset++) {
-			if (!si->swap_map[offset])
-				goto checks;
-		}
-	}
-
-done:
-	if (order == 0)
-		set_cluster_next(si, offset + 1);
-	si->flags -= SWP_SCANNING;
-	return n_ret;
-
-scan:
-	VM_WARN_ON(order > 0);
-	spin_unlock(&si->lock);
-	while (++offset <= READ_ONCE(si->highest_bit)) {
-		if (unlikely(--latency_ration < 0)) {
-			cond_resched();
-			latency_ration = LATENCY_LIMIT;
-			scanned_many = true;
-		}
-		if (swap_offset_available_and_locked(si, offset))
-			goto checks;
-	}
-	offset = si->lowest_bit;
-	while (offset < scan_base) {
-		if (unlikely(--latency_ration < 0)) {
-			cond_resched();
-			latency_ration = LATENCY_LIMIT;
-			scanned_many = true;
-		}
-		if (swap_offset_available_and_locked(si, offset))
-			goto checks;
-		offset++;
-	}
-	spin_lock(&si->lock);
-
-no_page:
-	si->flags -= SWP_SCANNING;
-	return n_ret;
+	return cluster_alloc_swap(si, usage, nr, slots, order);
 }
 
 int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
@@ -2871,8 +2670,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	mutex_unlock(&swapon_mutex);
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
-	free_percpu(p->cluster_next_cpu);
-
p->cluster_next_cpu = NULL; vfree(swap_map); kvfree(zeromap); kvfree(cluster_info); @@ -3184,8 +2981,6 @@ static unsigned long read_swap_header(struct swap_info_struct *si, } si->lowest_bit = 1; - si->cluster_next = 1; - si->cluster_nr = 0; maxpages = swapfile_maximum_size; last_page = swap_header->info.last_page; @@ -3271,7 +3066,6 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, unsigned long maxpages) { unsigned long nr_clusters = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER); - unsigned long col = si->cluster_next / SWAPFILE_CLUSTER % SWAP_CLUSTER_COLS; struct swap_cluster_info *cluster_info; unsigned long i, j, k, idx; int cpu, err = -ENOMEM; @@ -3283,15 +3077,6 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, for (i = 0; i < nr_clusters; i++) spin_lock_init(&cluster_info[i].lock); - si->cluster_next_cpu = alloc_percpu(unsigned int); - if (!si->cluster_next_cpu) - goto err_free; - - /* Random start position to help with wear leveling */ - for_each_possible_cpu(cpu) - per_cpu(*si->cluster_next_cpu, cpu) = - get_random_u32_inclusive(1, si->highest_bit); - si->percpu_cluster = alloc_percpu(struct percpu_cluster); if (!si->percpu_cluster) goto err_free; @@ -3333,7 +3118,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, * sharing same address space. 
*/ for (k = 0; k < SWAP_CLUSTER_COLS; k++) { - j = (k + col) % SWAP_CLUSTER_COLS; + j = k % SWAP_CLUSTER_COLS; for (i = 0; i < DIV_ROUND_UP(nr_clusters, SWAP_CLUSTER_COLS); i++) { struct swap_cluster_info *ci; idx = i * SWAP_CLUSTER_COLS + j; @@ -3483,18 +3268,18 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) if (si->bdev && bdev_nonrot(si->bdev)) { si->flags |= SWP_SOLIDSTATE; - - cluster_info = setup_clusters(si, swap_header, maxpages); - if (IS_ERR(cluster_info)) { - error = PTR_ERR(cluster_info); - cluster_info = NULL; - goto bad_swap_unlock_inode; - } } else { atomic_inc(&nr_rotate_swap); inced_nr_rotate_swap = true; } + cluster_info = setup_clusters(si, swap_header, maxpages); + if (IS_ERR(cluster_info)) { + error = PTR_ERR(cluster_info); + cluster_info = NULL; + goto bad_swap_unlock_inode; + } + if ((swap_flags & SWAP_FLAG_DISCARD) && si->bdev && bdev_max_discard_sectors(si->bdev)) { /* @@ -3575,8 +3360,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) bad_swap: free_percpu(si->percpu_cluster); si->percpu_cluster = NULL; - free_percpu(si->cluster_next_cpu); - si->cluster_next_cpu = NULL; inode = NULL; destroy_swap_extents(si); swap_cgroup_swapoff(si->type);
From patchwork Tue Dec 24 14:38:02 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13920192
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton , Chris Li , Barry Song , Ryan Roberts , Hugh Dickins , Yosry Ahmed , "Huang, Ying" , Nhat Pham , Johannes Weiner , Kalesh Singh , linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 04/13] mm, swap: use cluster lock for HDD
Date: Tue, 24 Dec 2024 22:38:02 +0800
Message-ID: <20241224143811.33462-5-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
From: Kairui Song

Cluster lock (ci->lock) was introduced to reduce contention for certain operations. Using the cluster lock for HDDs is not helpful: HDDs have poor performance, so locking is not the bottleneck.
But having different sets of locks for HDD and non-HDD devices prevents further rework of the device lock (si->lock).

This commit changes all lock_cluster_or_swap_info() calls to lock_cluster(), which is a safe and straightforward conversion since cluster info is now always allocated. It also removes all cluster_info-related checks.

Suggested-by: Chris Li Signed-off-by: Kairui Song --- mm/swapfile.c | 107 ++++++++++++++++---------------------------------- 1 file changed, 34 insertions(+), 73 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index fca58d43b836..d0e5b9fa0c48 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -58,10 +58,9 @@ static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset, unsigned int nr_entries); static bool folio_swapcache_freeable(struct folio *folio); -static struct swap_cluster_info *lock_cluster_or_swap_info( - struct swap_info_struct *si, unsigned long offset); -static void unlock_cluster_or_swap_info(struct swap_info_struct *si, - struct swap_cluster_info *ci); +static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si, + unsigned long offset); +static void unlock_cluster(struct swap_cluster_info *ci); static DEFINE_SPINLOCK(swap_lock); static unsigned int nr_swapfiles; @@ -222,9 +221,9 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si, * swap_map is HAS_CACHE only, which means the slots have no page table * reference or pending writeback, and can't be allocated to others.
*/ - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); need_reclaim = swap_is_has_cache(si, offset, nr_pages); - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); if (!need_reclaim) goto out_unlock; @@ -404,45 +403,15 @@ static inline struct swap_cluster_info *lock_cluster(struct swap_info_struct *si { struct swap_cluster_info *ci; - ci = si->cluster_info; - if (ci) { - ci += offset / SWAPFILE_CLUSTER; - spin_lock(&ci->lock); - } - return ci; -} - -static inline void unlock_cluster(struct swap_cluster_info *ci) -{ - if (ci) - spin_unlock(&ci->lock); -} - -/* - * Determine the locking method in use for this device. Return - * swap_cluster_info if SSD-style cluster-based locking is in place. - */ -static inline struct swap_cluster_info *lock_cluster_or_swap_info( - struct swap_info_struct *si, unsigned long offset) -{ - struct swap_cluster_info *ci; - - /* Try to use fine-grained SSD-style locking if available: */ - ci = lock_cluster(si, offset); - /* Otherwise, fall back to traditional, coarse locking: */ - if (!ci) - spin_lock(&si->lock); + ci = &si->cluster_info[offset / SWAPFILE_CLUSTER]; + spin_lock(&ci->lock); return ci; } -static inline void unlock_cluster_or_swap_info(struct swap_info_struct *si, - struct swap_cluster_info *ci) +static inline void unlock_cluster(struct swap_cluster_info *ci) { - if (ci) - unlock_cluster(ci); - else - spin_unlock(&si->lock); + spin_unlock(&ci->lock); } /* Add a cluster to discard list and schedule it to do discard */ @@ -558,9 +527,6 @@ static void inc_cluster_info_page(struct swap_info_struct *si, unsigned long idx = page_nr / SWAPFILE_CLUSTER; struct swap_cluster_info *ci; - if (!cluster_info) - return; - ci = cluster_info + idx; ci->count++; @@ -576,9 +542,6 @@ static void inc_cluster_info_page(struct swap_info_struct *si, static void dec_cluster_info_page(struct swap_info_struct *si, struct swap_cluster_info *ci, int nr_pages) { - if (!si->cluster_info) - return; - VM_BUG_ON(ci->count < 
nr_pages); VM_BUG_ON(cluster_is_free(ci)); lockdep_assert_held(&si->lock); @@ -1007,8 +970,6 @@ static int cluster_alloc_swap(struct swap_info_struct *si, { int n_ret = 0; - VM_BUG_ON(!si->cluster_info); - si->flags += SWP_SCANNING; while (n_ret < nr) { @@ -1052,10 +1013,10 @@ static int scan_swap_map_slots(struct swap_info_struct *si, } /* - * Swapfile is not block device or not using clusters so unable + * Swapfile is not block device so unable * to allocate large entries. */ - if (!(si->flags & SWP_BLKDEV) || !si->cluster_info) + if (!(si->flags & SWP_BLKDEV)) return 0; } @@ -1295,9 +1256,9 @@ static unsigned char __swap_entry_free(struct swap_info_struct *si, unsigned long offset = swp_offset(entry); unsigned char usage; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); usage = __swap_entry_free_locked(si, offset, 1); - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); if (!usage) free_swap_slot(entry); @@ -1320,14 +1281,14 @@ static bool __swap_entries_free(struct swap_info_struct *si, if (nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER) goto fallback; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); if (!swap_is_last_map(si, offset, nr, &has_cache)) { - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); goto fallback; } for (i = 0; i < nr; i++) WRITE_ONCE(si->swap_map[offset + i], SWAP_HAS_CACHE); - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); if (!has_cache) { for (i = 0; i < nr; i++) @@ -1383,7 +1344,7 @@ static void cluster_swap_free_nr(struct swap_info_struct *si, DECLARE_BITMAP(to_free, BITS_PER_LONG) = { 0 }; int i, nr; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); while (nr_pages) { nr = min(BITS_PER_LONG, nr_pages); for (i = 0; i < nr; i++) { @@ -1391,18 +1352,18 @@ static void cluster_swap_free_nr(struct swap_info_struct *si, bitmap_set(to_free, i, 1); } if (!bitmap_empty(to_free, BITS_PER_LONG)) { - 
unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); for_each_set_bit(i, to_free, BITS_PER_LONG) free_swap_slot(swp_entry(si->type, offset + i)); if (nr == nr_pages) return; bitmap_clear(to_free, 0, BITS_PER_LONG); - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); } offset += nr; nr_pages -= nr; } - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); } /* @@ -1441,9 +1402,9 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) if (!si) return; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); if (size > 1 && swap_is_has_cache(si, offset, size)) { - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); spin_lock(&si->lock); swap_entry_range_free(si, entry, size); spin_unlock(&si->lock); @@ -1451,14 +1412,14 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) } for (int i = 0; i < size; i++, entry.val++) { if (!__swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE)) { - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); free_swap_slot(entry); if (i == size - 1) return; - lock_cluster_or_swap_info(si, offset); + lock_cluster(si, offset); } } - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); } static int swp_entry_cmp(const void *ent1, const void *ent2) @@ -1522,9 +1483,9 @@ int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) struct swap_cluster_info *ci; int count; - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); count = swap_count(si->swap_map[offset]); - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); return count; } @@ -1547,7 +1508,7 @@ int swp_swapcount(swp_entry_t entry) offset = swp_offset(entry); - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); count = swap_count(si->swap_map[offset]); if (!(count & COUNT_CONTINUED)) @@ -1570,7 +1531,7 @@ int swp_swapcount(swp_entry_t entry) n *= (SWAP_CONT_MAX + 1); } while (tmp_count & COUNT_CONTINUED); out: - 
unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); return count; } @@ -1585,8 +1546,8 @@ static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, int i; bool ret = false; - ci = lock_cluster_or_swap_info(si, offset); - if (!ci || nr_pages == 1) { + ci = lock_cluster(si, offset); + if (nr_pages == 1) { if (swap_count(map[roffset])) ret = true; goto unlock_out; @@ -1598,7 +1559,7 @@ static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, } } unlock_out: - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); return ret; } @@ -3428,7 +3389,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr) offset = swp_offset(entry); VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER); VM_WARN_ON(usage == 1 && nr > 1); - ci = lock_cluster_or_swap_info(si, offset); + ci = lock_cluster(si, offset); err = 0; for (i = 0; i < nr; i++) { @@ -3483,7 +3444,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr) } unlock_out: - unlock_cluster_or_swap_info(si, ci); + unlock_cluster(ci); return err; }
From patchwork Tue Dec 24 14:38:03 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13920193
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton , Chris Li , Barry Song , Ryan Roberts , Hugh Dickins , Yosry Ahmed , "Huang, Ying" , Nhat Pham , Johannes Weiner , Kalesh Singh , linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 05/13] mm, swap: clean up device availability check
Date: Tue, 24 Dec 2024 22:38:03 +0800
Message-ID: <20241224143811.33462-6-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
From: Kairui Song

Remove highest_bit and lowest_bit. Now that the HDD allocation path has been removed, the only remaining purpose of these two fields is to determine whether the device is full, which can instead be determined by checking inuse_pages.
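The idea in this commit message, that "full" reduces to a single counter comparison, can be illustrated with a minimal user-space sketch. `struct toy_swap_info`, `toy_swap_full()`, `toy_range_alloc()` and `toy_range_free()` are hypothetical names invented for illustration; they only mirror the general shape of swap_range_alloc() and swap_range_free() after this patch. Locking, the swap_avail_lock-protected plist, discard, zeromap clearing and slot-free notification are all deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified stand-in for struct swap_info_struct:
 * only the two counters the fullness check needs, plus a flag modelling
 * whether the device sits on the available list.
 */
struct toy_swap_info {
	unsigned int pages;       /* total usable slots */
	unsigned int inuse_pages; /* slots currently allocated */
	bool on_avail_list;       /* stands in for avail_lists membership */
};

/* The device is full exactly when every usable slot is in use. */
static bool toy_swap_full(const struct toy_swap_info *si)
{
	return si->inuse_pages == si->pages;
}

/*
 * Same general shape as swap_range_alloc() after the patch: account the
 * new entries, then leave the available list once the device is full.
 */
static void toy_range_alloc(struct toy_swap_info *si, unsigned int nr)
{
	assert(si->inuse_pages + nr <= si->pages);
	si->inuse_pages += nr;
	if (toy_swap_full(si))
		si->on_avail_list = false;
}

/*
 * Same general shape as swap_range_free(): the fullness check runs before
 * the counter is decremented, so "full right now" means "was full before
 * this free", and the device rejoins the available list.
 */
static void toy_range_free(struct toy_swap_info *si, unsigned int nr)
{
	assert(nr <= si->inuse_pages);
	if (toy_swap_full(si))
		si->on_avail_list = true;
	si->inuse_pages -= nr;
}
```

For example, with `pages = 4`, allocating 3 and then 1 entry makes the device full and drops it from the available list; freeing 2 entries afterwards puts it back with `inuse_pages == 2`. The ordering in the free path matters in this sketch: if the check ran after the decrement, a device that had been full would never be re-added.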
Signed-off-by: Kairui Song
---
 fs/btrfs/inode.c     |  1 -
 fs/iomap/swapfile.c  |  1 -
 include/linux/swap.h |  2 --
 mm/page_io.c         |  1 -
 mm/swapfile.c        | 38 ++++++++------------------------------
 5 files changed, 8 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 488edca8333a..a1ba78afab2c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10044,7 +10044,6 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	*span = bsi.highest_ppage - bsi.lowest_ppage + 1;
 	sis->max = bsi.nr_pages;
 	sis->pages = bsi.nr_pages - 1;
-	sis->highest_bit = bsi.nr_pages - 1;
 	return bsi.nr_extents;
 }
 #else
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index 5fc0ac36dee3..b90d0eda9e51 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -189,7 +189,6 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
 	*pagespan = 1 + isi.highest_ppage - isi.lowest_ppage;
 	sis->max = isi.nr_pages;
 	sis->pages = isi.nr_pages - 1;
-	sis->highest_bit = isi.nr_pages - 1;
 	return isi.nr_extents;
 }
 EXPORT_SYMBOL_GPL(iomap_swapfile_activate);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0c681aa5cb98..0c222017b5c6 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -306,8 +306,6 @@ struct swap_info_struct {
 	struct list_head frag_clusters[SWAP_NR_ORDERS];
 					/* list of cluster that are fragmented or contented */
 	unsigned int frag_cluster_nr[SWAP_NR_ORDERS];
-	unsigned int lowest_bit;	/* index of first free in swap_map */
-	unsigned int highest_bit;	/* index of last free in swap_map */
 	unsigned int pages;		/* total of usable pages of swap */
 	unsigned int inuse_pages;	/* number of those currently in use */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
diff --git a/mm/page_io.c b/mm/page_io.c
index 4b4ea8e49cf6..9b983de351f9 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -163,7 +163,6 @@ int generic_swapfile_activate(struct swap_info_struct *sis,
 		page_no = 1;	/* force Empty message */
 	sis->max = page_no;
 	sis->pages = page_no - 1;
-	sis->highest_bit = page_no - 1;
 out:
 	return ret;
 bad_bmap:
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d0e5b9fa0c48..7963a0c646a4 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -55,7 +55,7 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
 static void free_swap_count_continuations(struct swap_info_struct *);
 static void swap_entry_range_free(struct swap_info_struct *si,
 				  swp_entry_t entry, unsigned int nr_pages);
-static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset,
+static void swap_range_alloc(struct swap_info_struct *si,
 			     unsigned int nr_entries);
 static bool folio_swapcache_freeable(struct folio *folio);
 static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
@@ -650,7 +650,7 @@ static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster
 	}
 
 	memset(si->swap_map + start, usage, nr_pages);
-	swap_range_alloc(si, start, nr_pages);
+	swap_range_alloc(si, nr_pages);
 	ci->count += nr_pages;
 
 	if (ci->count == SWAPFILE_CLUSTER) {
@@ -888,19 +888,11 @@ static void del_from_avail_list(struct swap_info_struct *si)
 	spin_unlock(&swap_avail_lock);
 }
 
-static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset,
+static void swap_range_alloc(struct swap_info_struct *si,
 			     unsigned int nr_entries)
 {
-	unsigned int end = offset + nr_entries - 1;
-
-	if (offset == si->lowest_bit)
-		si->lowest_bit += nr_entries;
-	if (end == si->highest_bit)
-		WRITE_ONCE(si->highest_bit, si->highest_bit - nr_entries);
 	WRITE_ONCE(si->inuse_pages, si->inuse_pages + nr_entries);
 	if (si->inuse_pages == si->pages) {
-		si->lowest_bit = si->max;
-		si->highest_bit = 0;
 		del_from_avail_list(si);
 
 		if (si->cluster_info && vm_swap_full())
@@ -933,15 +925,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	for (i = 0; i < nr_entries; i++)
 		clear_bit(offset + i, si->zeromap);
 
-	if (offset < si->lowest_bit)
-		si->lowest_bit = offset;
-	if (end > si->highest_bit) {
-		bool was_full = !si->highest_bit;
-
-		WRITE_ONCE(si->highest_bit, end);
-		if (was_full && (si->flags & SWP_WRITEOK))
-			add_to_avail_list(si);
-	}
+	if (si->inuse_pages == si->pages)
+		add_to_avail_list(si);
 	if (si->flags & SWP_BLKDEV)
 		swap_slot_free_notify =
 			si->bdev->bd_disk->fops->swap_slot_free_notify;
@@ -1051,15 +1036,12 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 			plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
 		spin_unlock(&swap_avail_lock);
 		spin_lock(&si->lock);
-		if (!si->highest_bit || !(si->flags & SWP_WRITEOK)) {
+		if ((si->inuse_pages == si->pages) || !(si->flags & SWP_WRITEOK)) {
 			spin_lock(&swap_avail_lock);
 			if (plist_node_empty(&si->avail_lists[node])) {
 				spin_unlock(&si->lock);
 				goto nextsi;
 			}
-			WARN(!si->highest_bit,
-			     "swap_info %d in list but !highest_bit\n",
-			     si->type);
 			WARN(!(si->flags & SWP_WRITEOK),
 			     "swap_info %d in list but !SWP_WRITEOK\n",
 			     si->type);
@@ -2441,8 +2423,8 @@ static void _enable_swap_info(struct swap_info_struct *si)
 	 */
 	plist_add(&si->list, &swap_active_head);
 
-	/* add to available list iff swap device is not full */
-	if (si->highest_bit)
+	/* add to available list if swap device is not full */
+	if (si->inuse_pages < si->pages)
 		add_to_avail_list(si);
 }
 
@@ -2606,7 +2588,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	drain_mmlist();
 
 	/* wait for anyone still in scan_swap_map_slots */
-	p->highest_bit = 0;		/* cuts scans short */
 	while (p->flags >= SWP_SCANNING) {
 		spin_unlock(&p->lock);
 		spin_unlock(&swap_lock);
@@ -2941,8 +2922,6 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
 		return 0;
 	}
 
-	si->lowest_bit = 1;
-
 	maxpages = swapfile_maximum_size;
 	last_page = swap_header->info.last_page;
 	if (!last_page) {
@@ -2959,7 +2938,6 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
 		if ((unsigned int)maxpages == 0)
 			maxpages = UINT_MAX;
 	}
-	si->highest_bit = maxpages - 1;
 
 	if (!maxpages)
 		return 0;

From patchwork Tue Dec 24 14:38:04 2024
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 06/13] mm, swap: clean up plist removal and adding
Date: Tue, 24 Dec 2024 22:38:04 +0800
Message-ID: <20241224143811.33462-7-ryncsn@gmail.com>
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

When the swap device is full (inuse_pages == pages), it should be removed from the plist of devices available for allocation. If any slot is freed, the device should be added back to the plist. Additionally, during swapon or swapoff, the device is forcibly added or removed.

Currently, the condition (inuse_pages == pages) is checked after every counter update, and the device is removed or added accordingly. This is serialized by si->lock.

This commit decouples the check from the protection of si->lock and reworks plist removal and adding, making it possible to get rid of the hard dependency on si->lock in the allocation path in later commits.

Simply using another lock is not an optimal approach: the overhead is observable for a hot counter, and it may cause complex locking issues. Instead, this commit makes the operation lock-free and atomic by embedding the plist state into the second highest bit of the atomic counter.

Simply making the counter atomic will not work: if the update and the plist status check are not performed atomically, an addition or removal may be missed. With the embedded information, the counter can be updated and the plist status checked with a single atomic operation, avoiding any extra overhead:

If the counter is full (inuse_pages == pages) and the off-list bit is unset, attempt to remove the device from the plist.

If the counter is not full (inuse_pages != pages) and the off-list bit is set, attempt to add the device to the plist.

Removal, addition, and the bit update are serialized by a lock, which is a cold path. Ordinary counter updates remain lock-free.
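The encoding described above can be sketched in userspace C11 under stated assumptions: `struct toy_dev`, the `toy_*` helpers, and the spinlock standing in for `swap_avail_lock` are all hypothetical, and the kernel's per-node plists are reduced to a single `on_list` flag. The hot path does one atomic add or subtract and inspects the returned value; only a transition into or out of the full state takes the lock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit 30, like 1UL << (BITS_PER_TYPE(atomic_t) - 2) with a 32-bit atomic_t. */
#define OFFLIST_BIT  (1L << 30)
#define COUNTER_MASK (~OFFLIST_BIT)

struct toy_dev {
	long pages;        /* total usable slots, must stay below OFFLIST_BIT */
	atomic_long inuse; /* usage count with OFFLIST_BIT embedded */
	bool on_list;      /* stands in for membership of the avail plist */
};

/* Cold-path lock serializing list changes (plays the role of swap_avail_lock). */
static atomic_flag toy_avail_lock = ATOMIC_FLAG_INIT;

static void toy_lock(void)
{
	while (atomic_flag_test_and_set(&toy_avail_lock))
		; /* spin */
}

static void toy_unlock(void)
{
	atomic_flag_clear(&toy_avail_lock);
}

static void toy_dev_init(struct toy_dev *d, long pages)
{
	assert(pages > 0 && pages < OFFLIST_BIT);
	d->pages = pages;
	atomic_init(&d->inuse, OFFLIST_BIT); /* starts off-list until "swapon" */
	d->on_list = false;
}

static long toy_usage(struct toy_dev *d)
{
	return atomic_load(&d->inuse) & COUNTER_MASK;
}

static void toy_del(struct toy_dev *d)
{
	long expected = d->pages;

	toy_lock();
	/* Succeeds only if exactly full AND still on-list (bit clear). */
	if (atomic_compare_exchange_strong(&d->inuse, &expected,
					   d->pages | OFFLIST_BIT))
		d->on_list = false;
	toy_unlock();
}

static void toy_add(struct toy_dev *d)
{
	toy_lock();
	if (atomic_load(&d->inuse) & OFFLIST_BIT) {
		long old = atomic_fetch_and(&d->inuse, COUNTER_MASK);
		long expected = d->pages;

		/* Refilled in the meantime: exactly full again, so restore the bit. */
		if ((old & COUNTER_MASK) == d->pages &&
		    atomic_compare_exchange_strong(&d->inuse, &expected,
						   d->pages | OFFLIST_BIT)) {
			toy_unlock();
			return;
		}
		d->on_list = true;
	}
	toy_unlock();
}

/* Hot path: a single atomic op; the lock is taken only on a transition. */
static void toy_usage_add(struct toy_dev *d, long nr)
{
	if (atomic_fetch_add(&d->inuse, nr) + nr == d->pages)
		toy_del(d);
}

static void toy_usage_sub(struct toy_dev *d, long nr)
{
	if ((atomic_fetch_sub(&d->inuse, nr) - nr) & OFFLIST_BIT)
		toy_add(d);
}
```

The removal path uses compare-and-exchange from exactly `pages` to `pages | OFFLIST_BIT`, so a concurrent free (the count drops) or an earlier removal (the bit is already set) turns it into a no-op. In this sketch the add path compares the count with the bit masked off before deciding whether to roll back.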
Signed-off-by: Kairui Song
---
 include/linux/swap.h |   2 +-
 mm/swapfile.c        | 184 +++++++++++++++++++++++++++++++------------
 2 files changed, 135 insertions(+), 51 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0c222017b5c6..e1eeea6307cd 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -307,7 +307,7 @@ struct swap_info_struct {
 					/* list of cluster that are fragmented or contented */
 	unsigned int frag_cluster_nr[SWAP_NR_ORDERS];
 	unsigned int pages;		/* total of usable pages of swap */
-	unsigned int inuse_pages;	/* number of those currently in use */
+	atomic_long_t inuse_pages;	/* number of those currently in use */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7963a0c646a4..ae0f7df06474 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -128,6 +128,26 @@ static inline unsigned char swap_count(unsigned char ent)
 	return ent & ~SWAP_HAS_CACHE;	/* may include COUNT_CONTINUED flag */
 }
 
+/*
+ * Use the second highest bit of inuse_pages counter as the indicator
+ * of if one swap device is on the available plist, so the atomic can
+ * still be updated arithmetic while having special data embedded.
+ *
+ * inuse_pages counter is the only thing indicating if a device should
+ * be on avail_lists or not (except swapon / swapoff). By embedding the
+ * on-list bit in the atomic counter, updates no longer need any lock
+ * to check the list status.
+ *
+ * This bit will be set if the device is not on the plist and not
+ * usable, will be cleared if the device is on the plist.
+ */
+#define SWAP_USAGE_OFFLIST_BIT (1UL << (BITS_PER_TYPE(atomic_t) - 2))
+#define SWAP_USAGE_COUNTER_MASK (~SWAP_USAGE_OFFLIST_BIT)
+static long swap_usage_in_pages(struct swap_info_struct *si)
+{
+	return atomic_long_read(&si->inuse_pages) & SWAP_USAGE_COUNTER_MASK;
+}
+
 /* Reclaim the swap entry anyway if possible */
 #define TTRS_ANYWAY		0x1
 /*
@@ -717,7 +737,7 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
 	int nr_reclaim;
 
 	if (force)
-		to_scan = si->inuse_pages / SWAPFILE_CLUSTER;
+		to_scan = swap_usage_in_pages(si) / SWAPFILE_CLUSTER;
 
 	while (!list_empty(&si->full_clusters)) {
 		ci = list_first_entry(&si->full_clusters, struct swap_cluster_info, list);
@@ -872,42 +892,124 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 	return found;
 }
 
-static void __del_from_avail_list(struct swap_info_struct *si)
+/* SWAP_USAGE_OFFLIST_BIT can only be set by this helper. */
+static void del_from_avail_list(struct swap_info_struct *si, bool swapoff)
 {
 	int nid;
 
-	assert_spin_locked(&si->lock);
+	spin_lock(&swap_avail_lock);
+
+	if (swapoff) {
+		/*
+		 * Forcefully remove it. Clear the SWP_WRITEOK flags for
+		 * swapoff here so it's synchronized by both si->lock and
+		 * swap_avail_lock, to ensure the result can be seen by
+		 * add_to_avail_list.
+		 */
+		lockdep_assert_held(&si->lock);
+		si->flags &= ~SWP_WRITEOK;
+		atomic_long_or(SWAP_USAGE_OFFLIST_BIT, &si->inuse_pages);
+	} else {
+		/*
+		 * If not called by swapoff, take it off-list only if it's
+		 * full and SWAP_USAGE_OFFLIST_BIT is not set (strictly
+		 * si->inuse_pages == pages), any concurrent slot freeing,
+		 * or device already removed from plist by someone else
+		 * will make this return false.
+		 */
+		if (atomic_long_cmpxchg(&si->inuse_pages, si->pages,
+					si->pages | SWAP_USAGE_OFFLIST_BIT) != si->pages)
+			goto skip;
+	}
+
 	for_each_node(nid)
 		plist_del(&si->avail_lists[nid], &swap_avail_heads[nid]);
+
+skip:
+	spin_unlock(&swap_avail_lock);
 }
 
-static void del_from_avail_list(struct swap_info_struct *si)
+/* SWAP_USAGE_OFFLIST_BIT can only be cleared by this helper. */
+static void add_to_avail_list(struct swap_info_struct *si, bool swapon)
 {
+	int nid;
+	long val;
+
 	spin_lock(&swap_avail_lock);
-	__del_from_avail_list(si);
+
+	/* Corresponding to SWP_WRITEOK clearing in del_from_avail_list */
+	if (swapon) {
+		lockdep_assert_held(&si->lock);
+		si->flags |= SWP_WRITEOK;
+	} else {
+		if (!(READ_ONCE(si->flags) & SWP_WRITEOK))
+			goto skip;
+	}
+
+	if (!(atomic_long_read(&si->inuse_pages) & SWAP_USAGE_OFFLIST_BIT))
+		goto skip;
+
+	val = atomic_long_fetch_and_relaxed(~SWAP_USAGE_OFFLIST_BIT, &si->inuse_pages);
+
+	/*
+	 * When device is full and device is on the plist, only one updater will
+	 * see (inuse_pages == si->pages) and will call del_from_avail_list. If
+	 * that updater happens to be here, just skip adding.
+	 */
+	if (val == si->pages) {
+		/* Just like the cmpxchg in del_from_avail_list */
+		if (atomic_long_cmpxchg(&si->inuse_pages, si->pages,
+					si->pages | SWAP_USAGE_OFFLIST_BIT) == si->pages)
+			goto skip;
+	}
+
+	for_each_node(nid)
+		plist_add(&si->avail_lists[nid], &swap_avail_heads[nid]);
+
+skip:
 	spin_unlock(&swap_avail_lock);
 }
 
-static void swap_range_alloc(struct swap_info_struct *si,
-			     unsigned int nr_entries)
+/*
+ * swap_usage_add / swap_usage_sub of each slot are serialized by ci->lock
+ * within each cluster, so the total contribution to the global counter should
+ * always be positive and cannot exceed the total number of usable slots.
+ */
+static bool swap_usage_add(struct swap_info_struct *si, unsigned int nr_entries)
 {
-	WRITE_ONCE(si->inuse_pages, si->inuse_pages + nr_entries);
-	if (si->inuse_pages == si->pages) {
-		del_from_avail_list(si);
+	long val = atomic_long_add_return_relaxed(nr_entries, &si->inuse_pages);
 
-		if (si->cluster_info && vm_swap_full())
-			schedule_work(&si->reclaim_work);
+	/*
+	 * If device is full, and SWAP_USAGE_OFFLIST_BIT is not set,
+	 * remove it from the plist.
+	 */
+	if (unlikely(val == si->pages)) {
+		del_from_avail_list(si, false);
+		return true;
 	}
+
+	return false;
 }
 
-static void add_to_avail_list(struct swap_info_struct *si)
+static void swap_usage_sub(struct swap_info_struct *si, unsigned int nr_entries)
 {
-	int nid;
+	long val = atomic_long_sub_return_relaxed(nr_entries, &si->inuse_pages);
 
-	spin_lock(&swap_avail_lock);
-	for_each_node(nid)
-		plist_add(&si->avail_lists[nid], &swap_avail_heads[nid]);
-	spin_unlock(&swap_avail_lock);
+	/*
+	 * If device is not full, and SWAP_USAGE_OFFLIST_BIT is set,
+	 * add it back to the plist.
+	 */
+	if (unlikely(val & SWAP_USAGE_OFFLIST_BIT))
+		add_to_avail_list(si, false);
+}
+
+static void swap_range_alloc(struct swap_info_struct *si,
+			     unsigned int nr_entries)
+{
+	if (swap_usage_add(si, nr_entries)) {
+		if (si->cluster_info && vm_swap_full())
+			schedule_work(&si->reclaim_work);
+	}
 }
 
 static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
@@ -925,8 +1027,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	for (i = 0; i < nr_entries; i++)
 		clear_bit(offset + i, si->zeromap);
 
-	if (si->inuse_pages == si->pages)
-		add_to_avail_list(si);
 	if (si->flags & SWP_BLKDEV)
 		swap_slot_free_notify =
 			si->bdev->bd_disk->fops->swap_slot_free_notify;
@@ -946,7 +1046,7 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	 */
 	smp_wmb();
 	atomic_long_add(nr_entries, &nr_swap_pages);
-	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
+	swap_usage_sub(si, nr_entries);
 }
 
 static int cluster_alloc_swap(struct swap_info_struct *si,
@@ -1036,19 +1136,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 			plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
 		spin_unlock(&swap_avail_lock);
 		spin_lock(&si->lock);
-		if ((si->inuse_pages == si->pages) || !(si->flags & SWP_WRITEOK)) {
-			spin_lock(&swap_avail_lock);
-			if (plist_node_empty(&si->avail_lists[node])) {
-				spin_unlock(&si->lock);
-				goto nextsi;
-			}
-			WARN(!(si->flags & SWP_WRITEOK),
-			     "swap_info %d in list but !SWP_WRITEOK\n",
-			     si->type);
-			__del_from_avail_list(si);
-			spin_unlock(&si->lock);
-			goto nextsi;
-		}
 		n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
 					    n_goal, swp_entries, order);
 		spin_unlock(&si->lock);
@@ -1057,7 +1144,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 		cond_resched();
 
 		spin_lock(&swap_avail_lock);
-nextsi:
 		/*
 		 * if we got here, it's likely that si was almost full before,
 		 * and since scan_swap_map_slots() can drop the si->lock,
@@ -1789,7 +1875,7 @@ unsigned int count_swap_pages(int type, int free)
 		if (sis->flags & SWP_WRITEOK) {
 			n = sis->pages;
 			if (free)
-				n -= sis->inuse_pages;
+				n -= swap_usage_in_pages(sis);
 		}
 		spin_unlock(&sis->lock);
 	}
@@ -2124,7 +2210,7 @@ static int try_to_unuse(unsigned int type)
 	swp_entry_t entry;
 	unsigned int i;
 
-	if (!READ_ONCE(si->inuse_pages))
+	if (!swap_usage_in_pages(si))
 		goto success;
 
 retry:
@@ -2137,7 +2223,7 @@ static int try_to_unuse(unsigned int type)
 
 	spin_lock(&mmlist_lock);
 	p = &init_mm.mmlist;
-	while (READ_ONCE(si->inuse_pages) &&
+	while (swap_usage_in_pages(si) &&
 	       !signal_pending(current) &&
 	       (p = p->next) != &init_mm.mmlist) {
 
@@ -2165,7 +2251,7 @@ static int try_to_unuse(unsigned int type)
 	mmput(prev_mm);
 
 	i = 0;
-	while (READ_ONCE(si->inuse_pages) &&
+	while (swap_usage_in_pages(si) &&
 	       !signal_pending(current) &&
 	       (i = find_next_to_unuse(si, i)) != 0) {
 
@@ -2200,7 +2286,7 @@ static int try_to_unuse(unsigned int type)
 	 * folio_alloc_swap(), temporarily hiding that swap. It's easy
 	 * and robust (though cpu-intensive) just to keep retrying.
 	 */
-	if (READ_ONCE(si->inuse_pages)) {
+	if (swap_usage_in_pages(si)) {
 		if (!signal_pending(current))
 			goto retry;
 		return -EINTR;
@@ -2209,7 +2295,7 @@ static int try_to_unuse(unsigned int type)
 success:
 	/*
 	 * Make sure that further cleanups after try_to_unuse() returns happen
-	 * after swap_range_free() reduces si->inuse_pages to 0.
+	 * after swap_range_free() reduces inuse_pages to 0.
 	 */
 	smp_mb();
 	return 0;
@@ -2227,7 +2313,7 @@ static void drain_mmlist(void)
 	unsigned int type;
 
 	for (type = 0; type < nr_swapfiles; type++)
-		if (swap_info[type]->inuse_pages)
+		if (swap_usage_in_pages(swap_info[type]))
 			return;
 	spin_lock(&mmlist_lock);
 	list_for_each_safe(p, next, &init_mm.mmlist)
@@ -2406,7 +2492,6 @@ static void setup_swap_info(struct swap_info_struct *si, int prio,
 
 static void _enable_swap_info(struct swap_info_struct *si)
 {
-	si->flags |= SWP_WRITEOK;
 	atomic_long_add(si->pages, &nr_swap_pages);
 	total_swap_pages += si->pages;
 
@@ -2423,9 +2508,8 @@ static void _enable_swap_info(struct swap_info_struct *si)
 	 */
 	plist_add(&si->list, &swap_active_head);
 
-	/* add to available list if swap device is not full */
-	if (si->inuse_pages < si->pages)
-		add_to_avail_list(si);
+	/* Add back to available list */
+	add_to_avail_list(si, true);
 }
 
 static void enable_swap_info(struct swap_info_struct *si, int prio,
@@ -2523,7 +2607,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		goto out_dput;
 	}
 	spin_lock(&p->lock);
-	del_from_avail_list(p);
+	del_from_avail_list(p, true);
 	if (p->prio < 0) {
 		struct swap_info_struct *si = p;
 		int nid;
@@ -2541,7 +2625,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	plist_del(&p->list, &swap_active_head);
 	atomic_long_sub(p->pages, &nr_swap_pages);
 	total_swap_pages -= p->pages;
-	p->flags &= ~SWP_WRITEOK;
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);
 
@@ -2721,7 +2804,7 @@ static int swap_show(struct seq_file *swap, void *v)
 	}
 
 	bytes = K(si->pages);
-	inuse = K(READ_ONCE(si->inuse_pages));
+	inuse = K(swap_usage_in_pages(si));
 
 	file = si->swap_file;
 	len = seq_file_path(swap, file, " \t\n\\");
@@ -2838,6 +2921,7 @@ static struct swap_info_struct *alloc_swap_info(void)
 	}
 	spin_lock_init(&p->lock);
 	spin_lock_init(&p->cont_lock);
+	atomic_long_set(&p->inuse_pages, SWAP_USAGE_OFFLIST_BIT);
 	init_completion(&p->comp);
 
 	return p;
@@ -3335,7 +3419,7 @@ void si_swapinfo(struct sysinfo *val)
 		struct swap_info_struct *si = swap_info[type];
 
 		if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK))
-			nr_to_be_unused += READ_ONCE(si->inuse_pages);
+			nr_to_be_unused += swap_usage_in_pages(si);
 	}
 	val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused;
 	val->totalswap = total_swap_pages + nr_to_be_unused;

From patchwork Tue Dec 24 14:38:05 2024
ZQVbq/TKKOCDV0yXuzClBkDEIA910BGe9HOATUfI5TvtNb0p8ihqu9j+Co0GRmdmCBOS uVDg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1735051198; x=1735655998; h=content-transfer-encoding:mime-version:reply-to:references :in-reply-to:message-id:date:subject:cc:to:from:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=9HF1qACSod1kWAoA/COSAk5nfGEmiGZxS1K9zisyvtY=; b=S9kF/ujSlJAI4bsTOBLivfGxfwC3g+tTsWrtNXq551I1q+e9bGAgFGlphQD5PyTRX2 JiyaXnIlW85q7uT0nq/oMedEjsunYT0Y8riL1epmBlkx+d+A1+kvRyIq1jRU6GNnOXdv HN/mCs08MbhV7P3uFDQ5gDDE2cacSGYc6Mg9gdS8YsuU+5Gp+ztG+tcw8vwUypzmHNDP v7OpCfG240Ukb4DMTIOleGtSgrJv3ljqTKFl8B689KcKsdqd/L5NfquvT4hO8QO9ostA 4F3rHfrh/aKUsEfmm0JqorlHi9CuvoE4b9U6tJ0AxoLcpHZpg754NvnBSkT5KSwhD0t1 BTzw== X-Gm-Message-State: AOJu0YysWhR+QqToEEuD5OZ1LLhqPOMEkChxj7F7M1/Ul0Quc6eWvRmt ske44nBhsyRkPg+LtvT6P9DH2oqQynU3CVi/YV0CEbTYbVDJn3xMRSL/3oxTWAQ= X-Gm-Gg: ASbGncue4I8onqTCbp1X8327drYtLoJYAPXz1YYRthDDwoZLWIvVC6vlYVMBjo3eV8r dnJ53RObx7plKw1IZ163TMYCopkMOtfyHqchwbITtd4cz6KNo5dXq6Bb0cDZAMOf+UiQjMklDUq /i2HzbA9JcwLXuxjm3hgUnCf/UEipNwcMkEEvRDDpADJnS1prhf2JGCPdilP3/lL4OI9PdDGa3n s02IHJkZZO4+Y+Tu0z7JuqXJrRHbbcqiXuxQIBktFKKdxQVaeIzpdf5pxF98lJlNE3aTHQPhZqm Ng== X-Google-Smtp-Source: AGHT+IG6ErdK7jWxaeL/lueO5nmWZSFVm6gM6DX2BhsbGjPUTsi72O0feo489S4NIqCP9JZBLNzB4w== X-Received: by 2002:a17:903:182:b0:216:4943:e575 with SMTP id d9443c01a7336-219e6f28547mr173619465ad.57.1735051198014; Tue, 24 Dec 2024 06:39:58 -0800 (PST) Received: from KASONG-MC4.tencent.com ([115.171.41.189]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-219dc9cdec1sm90598735ad.136.2024.12.24.06.39.54 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Tue, 24 Dec 2024 06:39:57 -0800 (PST) From: Kairui Song To: linux-mm@kvack.org Cc: Andrew Morton , Chris Li , Barry Song , Ryan Roberts , Hugh Dickins , Yosry Ahmed , "Huang, Ying" , Nhat Pham , Johannes Weiner , Kalesh Singh , linux-kernel@vger.kernel.org, Kairui Song Subject: [PATCH v2 07/13] mm, 
swap: hold a reference during scan and cleanup flag usage Date: Tue, 24 Dec 2024 22:38:05 +0800 Message-ID: <20241224143811.33462-8-ryncsn@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com> References: <20241224143811.33462-1-ryncsn@gmail.com> Reply-To: Kairui Song MIME-Version: 1.0 X-Rspamd-Queue-Id: 8F4FE16000B X-Rspam-User: X-Rspamd-Server: rspam07 X-Stat-Signature: oshgyg5piueizsfz8rwtqwg6etfo7t4p X-HE-Tag: 1735051173-688127 X-HE-Meta: U2FsdGVkX18LtCNs6opa5XuqNNLCjcD+JXeDCOmXAcGf1fX20+lqkq11bfx/8uzYAAbhnDWYCO91BGNgue44d/QLfbmDl8ujosWpE6auchmYUt6fs1g1M6QJaZzZYaraj4tFBRhHKUH4Pi7LAC3lFQmQErCoZ2dwgIBanHhwWcW5NwUfgaNGAyQWuragcXsMu5xY2Np7nDALD8b4V0BDcu0EItCR7uR027lNXC7OwrA/aZcnGifrQcoGyGcT6aETbyP0g10HWiUJLgkiZpUO7jYy8ZiVSuH5fUWtUgrdCTfw3Vj3KYs5+X0hyvtl5jL5HlIE5futlp0mT8G51qU78lY/amDCbX/CujmUqIDlBQ2BMSKo4dq/6mxezqMFOGa3h8SGK9l6MLJdsM+N5/TitDfMpVSFnnCSgJcbNWzbGcuNQ+2Tt8ouSPCHz+Ytds2JhaD/03ic+kBdCR1FMtvYjvRQkRjz9sCV2A7SJknn1o1gspWfuwS5YsaxyvupZeR6/haxKilN5yMAnhymudqSID83vL05l2L9Zdb1et9YXKGcGE8Xu/ZxCVpQdrl//3OJv1QrT6jLCdVimc3NMNcYASxJfxfAJA0VKtvTau/tMyILRy0EDZfAIPwMCD6vyvK5Sohw+NsUSj0XkjVRCAig50tx/ERR54L3W/U2lCl9kwau0/JrQhHyvx6Ffe4wWHeSyLoerDi661EyJaV1ai/oKtuafaWedTMlR2X8BGf6p4HPpzT5H5BRTQ6W4OPTgmR6OwNrttIRnzv+TtIVCLDRRQgppjLDRmYgxkL9ie4IBpc8UNCOmsutizd8vQbJXF+ie3EN+Ahau0hHDag23xyXzHTyQrXQtanTF0syqmk+ysL+sd8ZKcW25f/faFdlWglv+Mu7jP8Wr+FeQqunWTZXT6eP+pnBDnvDgvkV9oMZk8e5bNOcvWUXIFTfD3isCWQ7mnyGeNmdnPE0jiSPiTp IHcrKd6j 
From: Kairui Song

The flag SWP_SCANNING was used as an indicator of whether a device is being scanned for allocation, and it prevents swapoff. Combined with SWP_WRITEOK, the two flags work as a set of barriers for a clean swapoff:

1. Swapoff clears SWP_WRITEOK; allocation requests see ~SWP_WRITEOK and abort, as this is serialized by si->lock.
2. Swapoff unuses all allocated entries.
3. Swapoff waits for the SWP_SCANNING flag to be cleared, so ongoing allocations stop, preventing UAF.
4. Now swapoff can free everything safely.

This makes the allocation path have a hard dependency on si->lock: allocation always has to acquire si->lock first, both for setting SWP_SCANNING and for checking SWP_WRITEOK.

This commit removes the flag and instead uses the existing per-CPU refcount to prevent UAF in step 3. The refcount serves this purpose well without any dependency on si->lock, and it scales very well too: a reference is held during the whole scan and allocation process, and swapoff kills the counter and waits for it.

To prevent any allocation from happening after step 1, so that the unuse in step 2 can ensure all slots are free, swapoff acquires the ci->lock of each cluster one by one, which ensures all allocations see ~SWP_WRITEOK and abort.
This way, these dependencies on si->lock are gone. It is also worth noting that we can't kill the refcount as the first step of swapoff, because the unuse process has to acquire the refcount.

Signed-off-by: Kairui Song
---
 include/linux/swap.h |  1 -
 mm/swapfile.c        | 90 ++++++++++++++++++++++++++++----------------
 2 files changed, 57 insertions(+), 34 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h index e1eeea6307cd..02120f1005d5 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -219,7 +219,6 @@ enum { SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */ SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */ /* add others here before... */ - SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */ }; #define SWAP_CLUSTER_MAX 32UL diff --git a/mm/swapfile.c b/mm/swapfile.c index ae0f7df06474..0abff343f5f0 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -658,6 +658,8 @@ static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster { unsigned int nr_pages = 1 << order; + lockdep_assert_held(&ci->lock); + if (!(si->flags & SWP_WRITEOK)) return false; @@ -1055,8 +1057,6 @@ static int cluster_alloc_swap(struct swap_info_struct *si, { int n_ret = 0; - si->flags += SWP_SCANNING; - while (n_ret < nr) { unsigned long offset = cluster_alloc_swap_entry(si, order, usage); @@ -1065,8 +1065,6 @@ static int cluster_alloc_swap(struct swap_info_struct *si, slots[n_ret++] = swp_entry(si->type, offset); } - si->flags -= SWP_SCANNING; - return n_ret; } @@ -1108,6 +1106,22 @@ static int scan_swap_map_slots(struct swap_info_struct *si, return cluster_alloc_swap(si, usage, nr, slots, order); } +static bool get_swap_device_info(struct swap_info_struct *si) +{ + if (!percpu_ref_tryget_live(&si->users)) + return false; + /* + * Guarantee the si->users are checked before accessing other + * fields of swap_info_struct, and si->flags (SWP_WRITEOK) is + * up to date.
+ * + * Paired with the spin_unlock() after setup_swap_info() in + * enable_swap_info(), and smp_wmb() in swapoff. + */ + smp_rmb(); + return true; +} + int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) { int order = swap_entry_order(entry_order); @@ -1135,13 +1149,16 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) /* requeue si to after same-priority siblings */ plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]); spin_unlock(&swap_avail_lock); - spin_lock(&si->lock); - n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, - n_goal, swp_entries, order); - spin_unlock(&si->lock); - if (n_ret || size > 1) - goto check_out; - cond_resched(); + if (get_swap_device_info(si)) { + spin_lock(&si->lock); + n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, + n_goal, swp_entries, order); + spin_unlock(&si->lock); + put_swap_device(si); + if (n_ret || size > 1) + goto check_out; + cond_resched(); + } spin_lock(&swap_avail_lock); /* @@ -1292,16 +1309,8 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) si = swp_swap_info(entry); if (!si) goto bad_nofile; - if (!percpu_ref_tryget_live(&si->users)) + if (!get_swap_device_info(si)) goto out; - /* - * Guarantee the si->users are checked before accessing other - * fields of swap_info_struct. - * - * Paired with the spin_unlock() after setup_swap_info() in - * enable_swap_info(). 
- */ - smp_rmb(); offset = swp_offset(entry); if (offset >= si->max) goto put_out; @@ -1781,10 +1790,13 @@ swp_entry_t get_swap_page_of_type(int type) goto fail; /* This is called for allocating swap entry, not cache */ - spin_lock(&si->lock); - if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) - atomic_long_dec(&nr_swap_pages); - spin_unlock(&si->lock); + if (get_swap_device_info(si)) { + spin_lock(&si->lock); + if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) + atomic_long_dec(&nr_swap_pages); + spin_unlock(&si->lock); + put_swap_device(si); + } fail: return entry; } @@ -2558,6 +2570,25 @@ bool has_usable_swap(void) return ret; } +/* + * Called after clearing SWP_WRITEOK, ensures cluster_alloc_range + * sees the updated flags, so there will be no more allocations. + */ +static void wait_for_allocation(struct swap_info_struct *si) +{ + unsigned long offset; + unsigned long end = ALIGN(si->max, SWAPFILE_CLUSTER); + struct swap_cluster_info *ci; + + BUG_ON(si->flags & SWP_WRITEOK); + + for (offset = 0; offset < end; offset += SWAPFILE_CLUSTER) { + ci = lock_cluster(si, offset); + /* Only lock and unlock: this waits out in-flight allocations */ + unlock_cluster(ci); + } +} + SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) { struct swap_info_struct *p = NULL; @@ -2628,6 +2659,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) spin_unlock(&p->lock); spin_unlock(&swap_lock); + wait_for_allocation(p); + disable_swap_slots_cache_lock(); set_current_oom_origin(); @@ -2670,15 +2703,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) spin_lock(&p->lock); drain_mmlist(); - /* wait for anyone still in scan_swap_map_slots */ - while (p->flags >= SWP_SCANNING) { - spin_unlock(&p->lock); - spin_unlock(&swap_lock); - schedule_timeout_uninterruptible(1); - spin_lock(&swap_lock); - spin_lock(&p->lock); - } - swap_file = p->swap_file; p->swap_file = NULL; p->max = 0;
From patchwork Tue Dec 24 14:38:06 2024 Content-Type: text/plain;
charset="utf-8"
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13920196
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 08/13] mm, swap: use an enum to define all cluster flags and wrap flags changes
Date: Tue, 24 Dec 2024 22:38:06 +0800
Message-ID: <20241224143811.33462-9-ryncsn@gmail.com>
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>

From: Kairui Song

Currently, we are only using flags to indicate which list the cluster is on.
Using one bit for each list type may be wasteful: as the number of list types grows, we will consume too many bits. Additionally, the current mixed usage of '&' and '==' is a bit confusing.

Make it clean by using an enum to define all possible cluster statuses. Only an off-list cluster will have the NONE (0) flag. Also use a wrapper to annotate and sanitize all flag settings and list movements.

Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 include/linux/swap.h | 17 +++++++---
 mm/swapfile.c        | 75 +++++++++++++++++++++++---------------------
 2 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h index 02120f1005d5..339d7f0192ff 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -257,10 +257,19 @@ struct swap_cluster_info { u8 order; struct list_head list; }; -#define CLUSTER_FLAG_FREE 1 /* This cluster is free */ -#define CLUSTER_FLAG_NONFULL 2 /* This cluster is on nonfull list */ -#define CLUSTER_FLAG_FRAG 4 /* This cluster is on nonfull list */ -#define CLUSTER_FLAG_FULL 8 /* This cluster is on full list */ + +/* All on-list cluster must have a non-zero flag.
*/ +enum swap_cluster_flags { + CLUSTER_FLAG_NONE = 0, /* For temporary off-list cluster */ + CLUSTER_FLAG_FREE, + CLUSTER_FLAG_NONFULL, + CLUSTER_FLAG_FRAG, + /* Clusters with flags above are allocatable */ + CLUSTER_FLAG_USABLE = CLUSTER_FLAG_FRAG, + CLUSTER_FLAG_FULL, + CLUSTER_FLAG_DISCARD, + CLUSTER_FLAG_MAX, +}; /* * The first page in the swap file is the swap header, which is always marked diff --git a/mm/swapfile.c b/mm/swapfile.c index 0abff343f5f0..be2c719a51bb 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -403,7 +403,7 @@ static void discard_swap_cluster(struct swap_info_struct *si, static inline bool cluster_is_free(struct swap_cluster_info *info) { - return info->flags & CLUSTER_FLAG_FREE; + return info->flags == CLUSTER_FLAG_FREE; } static inline unsigned int cluster_index(struct swap_info_struct *si, @@ -434,6 +434,27 @@ static inline void unlock_cluster(struct swap_cluster_info *ci) spin_unlock(&ci->lock); } +static void cluster_move(struct swap_info_struct *si, + struct swap_cluster_info *ci, struct list_head *list, + enum swap_cluster_flags new_flags) +{ + VM_WARN_ON(ci->flags == new_flags); + BUILD_BUG_ON(1 << sizeof(ci->flags) * BITS_PER_BYTE < CLUSTER_FLAG_MAX); + + if (ci->flags == CLUSTER_FLAG_NONE) { + list_add_tail(&ci->list, list); + } else { + if (ci->flags == CLUSTER_FLAG_FRAG) { + VM_WARN_ON(!si->frag_cluster_nr[ci->order]); + si->frag_cluster_nr[ci->order]--; + } + list_move_tail(&ci->list, list); + } + ci->flags = new_flags; + if (new_flags == CLUSTER_FLAG_FRAG) + si->frag_cluster_nr[ci->order]++; +} + /* Add a cluster to discard list and schedule it to do discard */ static void swap_cluster_schedule_discard(struct swap_info_struct *si, struct swap_cluster_info *ci) @@ -447,10 +468,8 @@ static void swap_cluster_schedule_discard(struct swap_info_struct *si, */ memset(si->swap_map + idx * SWAPFILE_CLUSTER, SWAP_MAP_BAD, SWAPFILE_CLUSTER); - - VM_BUG_ON(ci->flags & CLUSTER_FLAG_FREE); - list_move_tail(&ci->list, 
&si->discard_clusters); - ci->flags = 0; + VM_BUG_ON(ci->flags == CLUSTER_FLAG_FREE); + cluster_move(si, ci, &si->discard_clusters, CLUSTER_FLAG_DISCARD); schedule_work(&si->discard_work); } @@ -458,12 +477,7 @@ static void __free_cluster(struct swap_info_struct *si, struct swap_cluster_info { lockdep_assert_held(&si->lock); lockdep_assert_held(&ci->lock); - - if (ci->flags) - list_move_tail(&ci->list, &si->free_clusters); - else - list_add_tail(&ci->list, &si->free_clusters); - ci->flags = CLUSTER_FLAG_FREE; + cluster_move(si, ci, &si->free_clusters, CLUSTER_FLAG_FREE); ci->order = 0; } @@ -479,6 +493,8 @@ static void swap_do_scheduled_discard(struct swap_info_struct *si) while (!list_empty(&si->discard_clusters)) { ci = list_first_entry(&si->discard_clusters, struct swap_cluster_info, list); list_del(&ci->list); + /* Must clear flag when taking a cluster off-list */ + ci->flags = CLUSTER_FLAG_NONE; idx = cluster_index(si, ci); spin_unlock(&si->lock); @@ -519,9 +535,6 @@ static void free_cluster(struct swap_info_struct *si, struct swap_cluster_info * lockdep_assert_held(&si->lock); lockdep_assert_held(&ci->lock); - if (ci->flags & CLUSTER_FLAG_FRAG) - si->frag_cluster_nr[ci->order]--; - /* * If the swap is discardable, prepare discard the cluster * instead of free it immediately. 
The cluster will be freed @@ -573,13 +586,9 @@ static void dec_cluster_info_page(struct swap_info_struct *si, return; } - if (!(ci->flags & CLUSTER_FLAG_NONFULL)) { - VM_BUG_ON(ci->flags & CLUSTER_FLAG_FREE); - if (ci->flags & CLUSTER_FLAG_FRAG) - si->frag_cluster_nr[ci->order]--; - list_move_tail(&ci->list, &si->nonfull_clusters[ci->order]); - ci->flags = CLUSTER_FLAG_NONFULL; - } + if (ci->flags != CLUSTER_FLAG_NONFULL) + cluster_move(si, ci, &si->nonfull_clusters[ci->order], + CLUSTER_FLAG_NONFULL); } static bool cluster_reclaim_range(struct swap_info_struct *si, @@ -663,11 +672,13 @@ static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster if (!(si->flags & SWP_WRITEOK)) return false; + VM_BUG_ON(ci->flags == CLUSTER_FLAG_NONE); + VM_BUG_ON(ci->flags > CLUSTER_FLAG_USABLE); + if (cluster_is_free(ci)) { - if (nr_pages < SWAPFILE_CLUSTER) { - list_move_tail(&ci->list, &si->nonfull_clusters[order]); - ci->flags = CLUSTER_FLAG_NONFULL; - } + if (nr_pages < SWAPFILE_CLUSTER) + cluster_move(si, ci, &si->nonfull_clusters[order], + CLUSTER_FLAG_NONFULL); ci->order = order; } @@ -675,14 +686,8 @@ static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster swap_range_alloc(si, nr_pages); ci->count += nr_pages; - if (ci->count == SWAPFILE_CLUSTER) { - VM_BUG_ON(!(ci->flags & - (CLUSTER_FLAG_FREE | CLUSTER_FLAG_NONFULL | CLUSTER_FLAG_FRAG))); - if (ci->flags & CLUSTER_FLAG_FRAG) - si->frag_cluster_nr[ci->order]--; - list_move_tail(&ci->list, &si->full_clusters); - ci->flags = CLUSTER_FLAG_FULL; - } + if (ci->count == SWAPFILE_CLUSTER) + cluster_move(si, ci, &si->full_clusters, CLUSTER_FLAG_FULL); return true; } @@ -821,9 +826,7 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o while (!list_empty(&si->nonfull_clusters[order])) { ci = list_first_entry(&si->nonfull_clusters[order], struct swap_cluster_info, list); - list_move_tail(&ci->list, &si->frag_clusters[order]); - ci->flags = 
CLUSTER_FLAG_FRAG; - si->frag_cluster_nr[order]++; + cluster_move(si, ci, &si->frag_clusters[order], CLUSTER_FLAG_FRAG); offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage); frags++;
From patchwork Tue Dec 24 14:38:07 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13920197
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 09/13] mm, swap: reduce contention on device lock
Date: Tue, 24 Dec 2024 22:38:07 +0800
Message-ID: <20241224143811.33462-10-ryncsn@gmail.com>
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>
From: Kairui Song

Currently, swap locking is mainly composed of two locks: the cluster lock (ci->lock) and the device lock (si->lock). The cluster lock is much more fine-grained, so it is best to use ci->lock instead of si->lock as much as possible.

We have cleaned up other hard dependencies on si->lock. Following the new cluster allocator design, most operations don't need to touch si->lock at all. In practice, we only need to take si->lock when moving clusters between lists.

To achieve this, this commit reworks the locking pattern of all si->lock and ci->lock users, eliminates all usage of ci->lock inside si->lock, and introduces a new design to avoid touching si->lock unless needed.

For minimal contention and easier understanding of the system, two ideas are introduced with the corresponding helpers: isolation and relocation.

- Clusters will be `isolated` from the list when iterating the list to search for an allocatable cluster. This ensures other CPUs won't easily walk into the same cluster. The helper releases si->lock after acquiring ci->lock, which makes it the only place that handles the inversion of the two locks, and avoids contention.

  Iterating the cluster list almost always moves the cluster (free -> nonfull, nonfull -> frag, frag -> frag tail), but it doesn't know where the cluster should be moved to until scanning is done. So keeping the cluster off-list is a good option with low overhead. The off-list time window of a cluster is also minimal: in the worst case, one CPU will return the cluster after scanning the 512 entries on it, which we previously busy-waited on with a spin lock.

  This is done with the new helper `cluster_isolate_lock`.

- Clusters will be `relocated` after allocation or freeing, according to their usage count and status.

  Allocations no longer hold si->lock, and may drop ci->lock for reclaim, so the cluster could be moved to any location while no lock is held. Besides, isolation clears all flags when it takes the cluster off the list (the flags must be in sync with the list status, so cluster users don't need to touch si->lock to check its list status). So the cluster has to be relocated to the right list according to its usage after allocation or freeing.

  Relocation is optional: if the cluster flags indicate it's already on the right list, it will skip touching the list or si->lock.

  This is done with relocate_cluster after allocation, or with [partial_]free_cluster after freeing.

This handles the usage of all kinds of clusters in a clean way. Scanning and allocation by iterating the cluster list is handled by "isolate - <ci locked> - relocate". Scanning and allocation of per-CPU clusters will only involve "<ci locked> - relocate", as it knows which cluster to lock and use. Freeing will only involve "relocate".

Each CPU will keep using its per-CPU cluster until the 512 entries are all consumed. Freeing also has to free 512 entries to trigger cluster movement in the best case, so si->lock is rarely touched.
Building the Linux kernel with defconfig shows a large improvement:

time make -j96 / 768M memcg, 4K pages, 10G ZRAM, on Intel 8255C:
Before:
Sys time: 73578.30, Real time: 864.05
After: (-50.7% sys time, -44.8% real time)
Sys time: 36227.49, Real time: 476.66

time make -j96 / 1152M memcg, 64K mTHP, 10G ZRAM, on Intel 8255C:
(avg of 4 test runs)
Before:
Sys time: 74044.85, Real time: 846.51
hugepages-64kB/stats/swpout: 1735216
hugepages-64kB/stats/swpout_fallback: 430333
After: (-40.4% sys time, -37.1% real time)
Sys time: 44160.56, Real time: 532.07
hugepages-64kB/stats/swpout: 1786288
hugepages-64kB/stats/swpout_fallback: 243384

time make -j32 / 512M memcg, 4K pages, 5G ZRAM, on AMD 7K62:
Before:
Sys time: 8098.21, Real time: 401.3
After: (-22.6% sys time, -12.8% real time)
Sys time: 6265.02, Real time: 349.83

The allocation success rate also improved slightly, because cluster usage is now sanitized with the newly defined helpers. Previously, dropping si->lock or ci->lock during a scan would shuffle the cluster order.

Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 include/linux/swap.h |   3 +-
 mm/swapfile.c        | 435 ++++++++++++++++++++++++-------------------
 2 files changed, 246 insertions(+), 192 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 339d7f0192ff..c4ff31cb6bde 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -291,6 +291,7 @@ enum swap_cluster_flags {
  * throughput.
*/ struct percpu_cluster { + local_lock_t lock; /* Protect the percpu_cluster above */ unsigned int next[SWAP_NR_ORDERS]; /* Likely next allocation offset */ }; @@ -313,7 +314,7 @@ struct swap_info_struct { /* list of cluster that contains at least one free slot */ struct list_head frag_clusters[SWAP_NR_ORDERS]; /* list of cluster that are fragmented or contented */ - unsigned int frag_cluster_nr[SWAP_NR_ORDERS]; + atomic_long_t frag_cluster_nr[SWAP_NR_ORDERS]; unsigned int pages; /* total of usable pages of swap */ atomic_long_t inuse_pages; /* number of those currently in use */ struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */ diff --git a/mm/swapfile.c b/mm/swapfile.c index be2c719a51bb..bb1ef9192d99 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -261,12 +261,10 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si, folio_ref_sub(folio, nr_pages); folio_set_dirty(folio); - spin_lock(&si->lock); /* Only sinple page folio can be backed by zswap */ if (nr_pages == 1) zswap_invalidate(entry); swap_entry_range_free(si, entry, nr_pages); - spin_unlock(&si->lock); ret = nr_pages; out_unlock: folio_unlock(folio); @@ -403,7 +401,21 @@ static void discard_swap_cluster(struct swap_info_struct *si, static inline bool cluster_is_free(struct swap_cluster_info *info) { - return info->flags == CLUSTER_FLAG_FREE; + return info->count == 0; +} + +static inline bool cluster_is_discard(struct swap_cluster_info *info) +{ + return info->flags == CLUSTER_FLAG_DISCARD; +} + +static inline bool cluster_is_usable(struct swap_cluster_info *ci, int order) +{ + if (unlikely(ci->flags > CLUSTER_FLAG_USABLE)) + return false; + if (!order) + return true; + return cluster_is_free(ci) || order == ci->order; } static inline unsigned int cluster_index(struct swap_info_struct *si, @@ -440,19 +452,20 @@ static void cluster_move(struct swap_info_struct *si, { VM_WARN_ON(ci->flags == new_flags); BUILD_BUG_ON(1 << sizeof(ci->flags) * BITS_PER_BYTE < 
CLUSTER_FLAG_MAX); + lockdep_assert_held(&ci->lock); - if (ci->flags == CLUSTER_FLAG_NONE) { + spin_lock(&si->lock); + if (ci->flags == CLUSTER_FLAG_NONE) list_add_tail(&ci->list, list); - } else { - if (ci->flags == CLUSTER_FLAG_FRAG) { - VM_WARN_ON(!si->frag_cluster_nr[ci->order]); - si->frag_cluster_nr[ci->order]--; - } + else list_move_tail(&ci->list, list); - } + spin_unlock(&si->lock); + + if (ci->flags == CLUSTER_FLAG_FRAG) + atomic_long_dec(&si->frag_cluster_nr[ci->order]); + else if (new_flags == CLUSTER_FLAG_FRAG) + atomic_long_inc(&si->frag_cluster_nr[ci->order]); ci->flags = new_flags; - if (new_flags == CLUSTER_FLAG_FRAG) - si->frag_cluster_nr[ci->order]++; } /* Add a cluster to discard list and schedule it to do discard */ @@ -475,39 +488,90 @@ static void swap_cluster_schedule_discard(struct swap_info_struct *si, static void __free_cluster(struct swap_info_struct *si, struct swap_cluster_info *ci) { - lockdep_assert_held(&si->lock); lockdep_assert_held(&ci->lock); cluster_move(si, ci, &si->free_clusters, CLUSTER_FLAG_FREE); ci->order = 0; } +/* + * Isolate and lock the first cluster that is not contented on a list, + * clean its flag before taken off-list. Cluster flag must be in sync + * with list status, so cluster updaters can always know the cluster + * list status without touching si lock. + * + * Note it's possible that all clusters on a list are contented so + * this returns NULL for an non-empty list. 
+ */ +static struct swap_cluster_info *cluster_isolate_lock( + struct swap_info_struct *si, struct list_head *list) +{ + struct swap_cluster_info *ci, *ret = NULL; + + spin_lock(&si->lock); + + if (unlikely(!(si->flags & SWP_WRITEOK))) + goto out; + + list_for_each_entry(ci, list, list) { + if (!spin_trylock(&ci->lock)) + continue; + + /* We may only isolate and clear flags of following lists */ + VM_BUG_ON(!ci->flags); + VM_BUG_ON(ci->flags > CLUSTER_FLAG_USABLE && + ci->flags != CLUSTER_FLAG_FULL); + + list_del(&ci->list); + ci->flags = CLUSTER_FLAG_NONE; + ret = ci; + break; + } +out: + spin_unlock(&si->lock); + + return ret; +} + /* * Doing discard actually. After a cluster discard is finished, the cluster - * will be added to free cluster list. caller should hold si->lock. -*/ -static void swap_do_scheduled_discard(struct swap_info_struct *si) + * will be added to free cluster list. Discard cluster is a bit special as + * they don't participate in allocation or reclaim, so clusters marked as + * CLUSTER_FLAG_DISCARD must remain off-list or on discard list. + */ +static bool swap_do_scheduled_discard(struct swap_info_struct *si) { struct swap_cluster_info *ci; + bool ret = false; unsigned int idx; + spin_lock(&si->lock); while (!list_empty(&si->discard_clusters)) { ci = list_first_entry(&si->discard_clusters, struct swap_cluster_info, list); + /* + * Delete the cluster from list but don't clear its flags until + * discard is done, so isolation and relocation will skip it. + */ list_del(&ci->list); - /* Must clear flag when taking a cluster off-list */ - ci->flags = CLUSTER_FLAG_NONE; idx = cluster_index(si, ci); spin_unlock(&si->lock); - discard_swap_cluster(si, idx * SWAPFILE_CLUSTER, SWAPFILE_CLUSTER); - spin_lock(&si->lock); spin_lock(&ci->lock); - __free_cluster(si, ci); + /* + * Discard is done, clear its flags as it's now off-list, + * then return the cluster to allocation list. 
+ */ + ci->flags = CLUSTER_FLAG_NONE; memset(si->swap_map + idx * SWAPFILE_CLUSTER, 0, SWAPFILE_CLUSTER); + __free_cluster(si, ci); spin_unlock(&ci->lock); + ret = true; + spin_lock(&si->lock); } + spin_unlock(&si->lock); + return ret; } static void swap_discard_work(struct work_struct *work) @@ -516,9 +580,7 @@ static void swap_discard_work(struct work_struct *work) si = container_of(work, struct swap_info_struct, discard_work); - spin_lock(&si->lock); swap_do_scheduled_discard(si); - spin_unlock(&si->lock); } static void swap_users_ref_free(struct percpu_ref *ref) @@ -529,10 +591,14 @@ static void swap_users_ref_free(struct percpu_ref *ref) complete(&si->comp); } +/* + * Must be called after freeing if ci->count == 0, moves the cluster to free + * or discard list. + */ static void free_cluster(struct swap_info_struct *si, struct swap_cluster_info *ci) { VM_BUG_ON(ci->count != 0); - lockdep_assert_held(&si->lock); + VM_BUG_ON(ci->flags == CLUSTER_FLAG_FREE); lockdep_assert_held(&ci->lock); /* @@ -549,6 +615,48 @@ static void free_cluster(struct swap_info_struct *si, struct swap_cluster_info * __free_cluster(si, ci); } +/* + * Must be called after freeing if ci->count != 0, moves the cluster to + * nonfull list. + */ +static void partial_free_cluster(struct swap_info_struct *si, + struct swap_cluster_info *ci) +{ + VM_BUG_ON(!ci->count || ci->count == SWAPFILE_CLUSTER); + lockdep_assert_held(&ci->lock); + + if (ci->flags != CLUSTER_FLAG_NONFULL) + cluster_move(si, ci, &si->nonfull_clusters[ci->order], + CLUSTER_FLAG_NONFULL); +} + +/* + * Must be called after allocation, moves the cluster to full or frag list. + * Note: allocation doesn't acquire si lock, and may drop the ci lock for + * reclaim, so the cluster could be any where when called. 
+ */ +static void relocate_cluster(struct swap_info_struct *si, + struct swap_cluster_info *ci) +{ + lockdep_assert_held(&ci->lock); + + /* Discard cluster must remain off-list or on discard list */ + if (cluster_is_discard(ci)) + return; + + if (!ci->count) { + free_cluster(si, ci); + } else if (ci->count != SWAPFILE_CLUSTER) { + if (ci->flags != CLUSTER_FLAG_FRAG) + cluster_move(si, ci, &si->frag_clusters[ci->order], + CLUSTER_FLAG_FRAG); + } else { + if (ci->flags != CLUSTER_FLAG_FULL) + cluster_move(si, ci, &si->full_clusters, + CLUSTER_FLAG_FULL); + } +} + /* * The cluster corresponding to page_nr will be used. The cluster will not be * added to free cluster list and its usage counter will be increased by 1. @@ -567,30 +675,6 @@ static void inc_cluster_info_page(struct swap_info_struct *si, VM_BUG_ON(ci->flags); } -/* - * The cluster ci decreases @nr_pages usage. If the usage counter becomes 0, - * which means no page in the cluster is in use, we can optionally discard - * the cluster and add it to free cluster list. 
- */ -static void dec_cluster_info_page(struct swap_info_struct *si, - struct swap_cluster_info *ci, int nr_pages) -{ - VM_BUG_ON(ci->count < nr_pages); - VM_BUG_ON(cluster_is_free(ci)); - lockdep_assert_held(&si->lock); - lockdep_assert_held(&ci->lock); - ci->count -= nr_pages; - - if (!ci->count) { - free_cluster(si, ci); - return; - } - - if (ci->flags != CLUSTER_FLAG_NONFULL) - cluster_move(si, ci, &si->nonfull_clusters[ci->order], - CLUSTER_FLAG_NONFULL); -} - static bool cluster_reclaim_range(struct swap_info_struct *si, struct swap_cluster_info *ci, unsigned long start, unsigned long end) @@ -600,8 +684,6 @@ static bool cluster_reclaim_range(struct swap_info_struct *si, int nr_reclaim; spin_unlock(&ci->lock); - spin_unlock(&si->lock); - do { switch (READ_ONCE(map[offset])) { case 0: @@ -619,9 +701,7 @@ static bool cluster_reclaim_range(struct swap_info_struct *si, } } while (offset < end); out: - spin_lock(&si->lock); spin_lock(&ci->lock); - /* * Recheck the range no matter reclaim succeeded or not, the slot * could have been be freed while we are not holding the lock. 
@@ -635,11 +715,11 @@ static bool cluster_reclaim_range(struct swap_info_struct *si, static bool cluster_scan_range(struct swap_info_struct *si, struct swap_cluster_info *ci, - unsigned long start, unsigned int nr_pages) + unsigned long start, unsigned int nr_pages, + bool *need_reclaim) { unsigned long offset, end = start + nr_pages; unsigned char *map = si->swap_map; - bool need_reclaim = false; for (offset = start; offset < end; offset++) { switch (READ_ONCE(map[offset])) { @@ -648,16 +728,13 @@ static bool cluster_scan_range(struct swap_info_struct *si, case SWAP_HAS_CACHE: if (!vm_swap_full()) return false; - need_reclaim = true; + *need_reclaim = true; continue; default: return false; } } - if (need_reclaim) - return cluster_reclaim_range(si, ci, start, end); - return true; } @@ -672,23 +749,13 @@ static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster if (!(si->flags & SWP_WRITEOK)) return false; - VM_BUG_ON(ci->flags == CLUSTER_FLAG_NONE); - VM_BUG_ON(ci->flags > CLUSTER_FLAG_USABLE); - - if (cluster_is_free(ci)) { - if (nr_pages < SWAPFILE_CLUSTER) - cluster_move(si, ci, &si->nonfull_clusters[order], - CLUSTER_FLAG_NONFULL); + if (cluster_is_free(ci)) ci->order = order; - } memset(si->swap_map + start, usage, nr_pages); swap_range_alloc(si, nr_pages); ci->count += nr_pages; - if (ci->count == SWAPFILE_CLUSTER) - cluster_move(si, ci, &si->full_clusters, CLUSTER_FLAG_FULL); - return true; } @@ -699,37 +766,55 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne unsigned long start = offset & ~(SWAPFILE_CLUSTER - 1); unsigned long end = min(start + SWAPFILE_CLUSTER, si->max); unsigned int nr_pages = 1 << order; + bool need_reclaim, ret; struct swap_cluster_info *ci; - if (end < nr_pages) - return SWAP_NEXT_INVALID; - end -= nr_pages; + ci = &si->cluster_info[offset / SWAPFILE_CLUSTER]; + lockdep_assert_held(&ci->lock); - ci = lock_cluster(si, offset); - if (ci->count + nr_pages > SWAPFILE_CLUSTER) { + if 
(end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER) { offset = SWAP_NEXT_INVALID; - goto done; + goto out; } - while (offset <= end) { - if (cluster_scan_range(si, ci, offset, nr_pages)) { - if (!cluster_alloc_range(si, ci, offset, usage, order)) { - offset = SWAP_NEXT_INVALID; - goto done; - } - *foundp = offset; - if (ci->count == SWAPFILE_CLUSTER) { + for (end -= nr_pages; offset <= end; offset += nr_pages) { + need_reclaim = false; + if (!cluster_scan_range(si, ci, offset, nr_pages, &need_reclaim)) + continue; + if (need_reclaim) { + ret = cluster_reclaim_range(si, ci, start, end); + /* + * Reclaim drops ci->lock and cluster could be used + * by another order. Not checking flag as off-list + * cluster has no flag set, and change of list + * won't cause fragmentation. + */ + if (!cluster_is_usable(ci, order)) { offset = SWAP_NEXT_INVALID; - goto done; + goto out; } - offset += nr_pages; - break; + if (cluster_is_free(ci)) + offset = start; + /* Reclaim failed but cluster is usable, try next */ + if (!ret) + continue; + } + if (!cluster_alloc_range(si, ci, offset, usage, order)) { + offset = SWAP_NEXT_INVALID; + goto out; + } + *foundp = offset; + if (ci->count == SWAPFILE_CLUSTER) { + offset = SWAP_NEXT_INVALID; + goto out; } offset += nr_pages; + break; } if (offset > end) offset = SWAP_NEXT_INVALID; -done: +out: + relocate_cluster(si, ci); unlock_cluster(ci); return offset; } @@ -746,18 +831,17 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force) if (force) to_scan = swap_usage_in_pages(si) / SWAPFILE_CLUSTER; - while (!list_empty(&si->full_clusters)) { - ci = list_first_entry(&si->full_clusters, struct swap_cluster_info, list); - list_move_tail(&ci->list, &si->full_clusters); + while ((ci = cluster_isolate_lock(si, &si->full_clusters))) { offset = cluster_offset(si, ci); end = min(si->max, offset + SWAPFILE_CLUSTER); to_scan--; - spin_unlock(&si->lock); while (offset < end) { if (READ_ONCE(map[offset]) == SWAP_HAS_CACHE) 
{ + spin_unlock(&ci->lock); nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT); + spin_lock(&ci->lock); if (nr_reclaim) { offset += abs(nr_reclaim); continue; @@ -765,8 +849,8 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force) } offset++; } - spin_lock(&si->lock); + unlock_cluster(ci); if (to_scan <= 0) break; } @@ -778,9 +862,7 @@ static void swap_reclaim_work(struct work_struct *work) si = container_of(work, struct swap_info_struct, reclaim_work); - spin_lock(&si->lock); swap_reclaim_full_clusters(si, true); - spin_unlock(&si->lock); } /* @@ -791,29 +873,34 @@ static void swap_reclaim_work(struct work_struct *work) static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int order, unsigned char usage) { - struct percpu_cluster *cluster; struct swap_cluster_info *ci; unsigned int offset, found = 0; -new_cluster: - lockdep_assert_held(&si->lock); - cluster = this_cpu_ptr(si->percpu_cluster); - offset = cluster->next[order]; + /* Fast path using per CPU cluster */ + local_lock(&si->percpu_cluster->lock); + offset = __this_cpu_read(si->percpu_cluster->next[order]); if (offset) { - offset = alloc_swap_scan_cluster(si, offset, &found, order, usage); + ci = lock_cluster(si, offset); + /* Cluster could have been used by another order */ + if (cluster_is_usable(ci, order)) { + if (cluster_is_free(ci)) + offset = cluster_offset(si, ci); + offset = alloc_swap_scan_cluster(si, offset, &found, + order, usage); + } else { + unlock_cluster(ci); + } if (found) goto done; } - if (!list_empty(&si->free_clusters)) { - ci = list_first_entry(&si->free_clusters, struct swap_cluster_info, list); - offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage); - /* - * Either we didn't touch the cluster due to swapoff, - * or the allocation must success. 
- */ - VM_BUG_ON((si->flags & SWP_WRITEOK) && !found); - goto done; +new_cluster: + ci = cluster_isolate_lock(si, &si->free_clusters); + if (ci) { + offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), + &found, order, usage); + if (found) + goto done; } /* Try reclaim from full clusters if free clusters list is drained */ @@ -821,49 +908,45 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o swap_reclaim_full_clusters(si, false); if (order < PMD_ORDER) { - unsigned int frags = 0; + unsigned int frags = 0, frags_existing; - while (!list_empty(&si->nonfull_clusters[order])) { - ci = list_first_entry(&si->nonfull_clusters[order], - struct swap_cluster_info, list); - cluster_move(si, ci, &si->frag_clusters[order], CLUSTER_FLAG_FRAG); + while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[order]))) { offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage); - frags++; + /* + * With `fragmenting` set to true, it will surely take + * the cluster off nonfull list + */ if (found) goto done; + frags++; } - /* - * Nonfull clusters are moved to frag tail if we reached - * here, count them too, don't over scan the frag list. - */ - while (frags < si->frag_cluster_nr[order]) { - ci = list_first_entry(&si->frag_clusters[order], - struct swap_cluster_info, list); + frags_existing = atomic_long_read(&si->frag_cluster_nr[order]); + while (frags < frags_existing && + (ci = cluster_isolate_lock(si, &si->frag_clusters[order]))) { + atomic_long_dec(&si->frag_cluster_nr[order]); /* - * Rotate the frag list to iterate, they were all failing - * high order allocation or moved here due to per-CPU usage, - * this help keeping usable cluster ahead. + * Rotate the frag list to iterate, they were all + * failing high order allocation or moved here due to + * per-CPU usage, but they could contain newly released + * reclaimable (eg. lazy-freed swap cache) slots. 
*/ - list_move_tail(&ci->list, &si->frag_clusters[order]); offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage); - frags++; if (found) goto done; + frags++; } } - if (!list_empty(&si->discard_clusters)) { - /* - * we don't have free cluster but have some clusters in - * discarding, do discard now and reclaim them, then - * reread cluster_next_cpu since we dropped si->lock - */ - swap_do_scheduled_discard(si); + /* + * We don't have free cluster but have some clusters in + * discarding, do discard now and reclaim them, then + * reread cluster_next_cpu since we dropped si->lock + */ + if ((si->flags & SWP_PAGE_DISCARD) && swap_do_scheduled_discard(si)) goto new_cluster; - } if (order) goto done; @@ -874,26 +957,25 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o * Clusters here have at least one usable slots and can't fail order 0 * allocation, but reclaim may drop si->lock and race with another user. */ - while (!list_empty(&si->frag_clusters[o])) { - ci = list_first_entry(&si->frag_clusters[o], - struct swap_cluster_info, list); + while ((ci = cluster_isolate_lock(si, &si->frag_clusters[o]))) { + atomic_long_dec(&si->frag_cluster_nr[o]); offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, 0, usage); + &found, order, usage); if (found) goto done; } - while (!list_empty(&si->nonfull_clusters[o])) { - ci = list_first_entry(&si->nonfull_clusters[o], - struct swap_cluster_info, list); + while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[o]))) { offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), - &found, 0, usage); + &found, order, usage); if (found) goto done; } } done: - cluster->next[order] = offset; + __this_cpu_write(si->percpu_cluster->next[order], offset); + local_unlock(&si->percpu_cluster->lock); + return found; } @@ -1153,14 +1235,11 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) plist_requeue(&si->avail_lists[node], 
&swap_avail_heads[node]); spin_unlock(&swap_avail_lock); if (get_swap_device_info(si)) { - spin_lock(&si->lock); n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, n_goal, swp_entries, order); - spin_unlock(&si->lock); put_swap_device(si); if (n_ret || size > 1) goto check_out; - cond_resched(); } spin_lock(&swap_avail_lock); @@ -1373,9 +1452,7 @@ static bool __swap_entries_free(struct swap_info_struct *si, if (!has_cache) { for (i = 0; i < nr; i++) zswap_invalidate(swp_entry(si->type, offset + i)); - spin_lock(&si->lock); swap_entry_range_free(si, entry, nr); - spin_unlock(&si->lock); } return has_cache; @@ -1404,16 +1481,27 @@ static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry unsigned char *map_end = map + nr_pages; struct swap_cluster_info *ci; + /* It should never free entries across different clusters */ + VM_BUG_ON((offset / SWAPFILE_CLUSTER) != ((offset + nr_pages - 1) / SWAPFILE_CLUSTER)); + ci = lock_cluster(si, offset); + VM_BUG_ON(cluster_is_free(ci)); + VM_BUG_ON(ci->count < nr_pages); + + ci->count -= nr_pages; do { VM_BUG_ON(*map != SWAP_HAS_CACHE); *map = 0; } while (++map < map_end); - dec_cluster_info_page(si, ci, nr_pages); - unlock_cluster(ci); mem_cgroup_uncharge_swap(entry, nr_pages); swap_range_free(si, offset, nr_pages); + + if (!ci->count) + free_cluster(si, ci); + else + partial_free_cluster(si, ci); + unlock_cluster(ci); } static void cluster_swap_free_nr(struct swap_info_struct *si, @@ -1485,9 +1573,7 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) ci = lock_cluster(si, offset); if (size > 1 && swap_is_has_cache(si, offset, size)) { unlock_cluster(ci); - spin_lock(&si->lock); swap_entry_range_free(si, entry, size); - spin_unlock(&si->lock); return; } for (int i = 0; i < size; i++, entry.val++) { @@ -1502,46 +1588,19 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry) unlock_cluster(ci); } -static int swp_entry_cmp(const void *ent1, const void *ent2) -{ - const swp_entry_t *e1 = ent1, 
*e2 = ent2; - - return (int)swp_type(*e1) - (int)swp_type(*e2); -} - void swapcache_free_entries(swp_entry_t *entries, int n) { - struct swap_info_struct *si, *prev; int i; + struct swap_info_struct *si = NULL; if (n <= 0) return; - prev = NULL; - si = NULL; - - /* - * Sort swap entries by swap device, so each lock is only taken once. - * nr_swapfiles isn't absolutely correct, but the overhead of sort() is - * so low that it isn't necessary to optimize further. - */ - if (nr_swapfiles > 1) - sort(entries, n, sizeof(entries[0]), swp_entry_cmp, NULL); for (i = 0; i < n; ++i) { si = _swap_info_get(entries[i]); - - if (si != prev) { - if (prev != NULL) - spin_unlock(&prev->lock); - if (si != NULL) - spin_lock(&si->lock); - } if (si) swap_entry_range_free(si, entries[i], 1); - prev = si; } - if (si) - spin_unlock(&si->lock); } int __swap_count(swp_entry_t entry) @@ -1793,13 +1852,8 @@ swp_entry_t get_swap_page_of_type(int type) goto fail; /* This is called for allocating swap entry, not cache */ - if (get_swap_device_info(si)) { - spin_lock(&si->lock); - if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) - atomic_long_dec(&nr_swap_pages); - spin_unlock(&si->lock); - put_swap_device(si); - } + if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0)) + atomic_long_dec(&nr_swap_pages); fail: return entry; } @@ -3137,6 +3191,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, cluster = per_cpu_ptr(si->percpu_cluster, cpu); for (i = 0; i < SWAP_NR_ORDERS; i++) cluster->next[i] = SWAP_NEXT_INVALID; + local_lock_init(&cluster->lock); } /* @@ -3160,7 +3215,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, for (i = 0; i < SWAP_NR_ORDERS; i++) { INIT_LIST_HEAD(&si->nonfull_clusters[i]); INIT_LIST_HEAD(&si->frag_clusters[i]); - si->frag_cluster_nr[i] = 0; + atomic_long_set(&si->frag_cluster_nr[i], 0); } /* @@ -3642,7 +3697,6 @@ int add_swap_count_continuation(swp_entry_t entry, 
gfp_t gfp_mask) */ goto outer; } - spin_lock(&si->lock); offset = swp_offset(entry); @@ -3707,7 +3761,6 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask) spin_unlock(&si->cont_lock); out: unlock_cluster(ci); - spin_unlock(&si->lock); put_swap_device(si); outer: if (page)
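In this patch, cluster_move() adjusts `frag_cluster_nr` after dropping si->lock, and the allocator reads and decrements it without si->lock. That is why the field changes from `unsigned int` to `atomic_long_t`. The following is a rough userspace analogue, not kernel code. C11 `<stdatomic.h>` stands in for the kernel's atomic_long_* helpers, and the thread and iteration counts are arbitrary. It shows that updaters holding no common lock still leave the counter consistent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_THREADS 4       /* arbitrary: models CPUs holding different ci->locks */
#define NR_MOVES 100000    /* arbitrary number of frag-list moves per thread */

/* Models si->frag_cluster_nr[order]; no single lock serializes updates. */
static atomic_long frag_cluster_nr;

static void *mover(void *arg)
{
	(void)arg;
	for (int i = 0; i < NR_MOVES; i++) {
		/* A cluster moved onto the frag list, then isolated off it again. */
		atomic_fetch_add_explicit(&frag_cluster_nr, 1, memory_order_relaxed);
		atomic_fetch_sub_explicit(&frag_cluster_nr, 1, memory_order_relaxed);
	}
	/* Leave one cluster on the frag list per thread. */
	atomic_fetch_add_explicit(&frag_cluster_nr, 1, memory_order_relaxed);
	return NULL;
}

static long run_movers(void)
{
	pthread_t t[NR_THREADS];

	atomic_store(&frag_cluster_nr, 0);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_create(&t[i], NULL, mover, NULL);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(t[i], NULL);
	return atomic_load(&frag_cluster_nr);
}
```

With a plain `long` and `++`/`--`, the final value could drift under concurrency. With atomics, `run_movers()` always returns `NR_THREADS`.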
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 10/13] mm, swap: simplify percpu cluster updating
Date: Tue, 24 Dec 2024 22:38:08 +0800
Message-ID: <20241224143811.33462-11-ryncsn@gmail.com>
From: Kairui Song

Instead of passing a value back through an output argument, we can simply store the next cluster offset in the fixed per-CPU location. This reduces stack usage and simplifies the function:

Object size:
./scripts/bloat-o-meter mm/swapfile.o mm/swapfile.o.new
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-271 (-271)
Function                                     old     new   delta
get_swap_pages                              2847    2733    -114
alloc_swap_scan_cluster                      894     737    -157
Total: Before=30833, After=30562, chg -0.88%

Stack usage:
Before:
swapfile.c:1190:5:get_swap_pages 240 static

After:
swapfile.c:1185:5:get_swap_pages 216 static

Signed-off-by: Kairui Song
---
 include/linux/swap.h |  4 +--
 mm/swapfile.c        | 66 +++++++++++++++++++-------------------------
 2 files changed, 31 insertions(+), 39 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index c4ff31cb6bde..4c1d2e69689f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -275,9 +275,9 @@ enum swap_cluster_flags {
  * The first page in the swap file is the swap header, which is always marked
  * bad to prevent it from being allocated as an entry. This also prevents the
  * cluster to which it belongs being marked free. Therefore 0 is safe to use as
- * a sentinel to indicate next is not valid in percpu_cluster.
+ * a sentinel to indicate an entry is not valid.
  */
-#define SWAP_NEXT_INVALID 0
+#define SWAP_ENTRY_INVALID 0
 
 #ifdef CONFIG_THP_SWAP
 #define SWAP_NR_ORDERS (PMD_ORDER + 1)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index bb1ef9192d99..ac170acf55a7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -759,23 +759,23 @@ static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster
 	return true;
 }
 
-static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigned long offset,
-					    unsigned int *foundp, unsigned int order,
+/* Try use a new cluster for current CPU and allocate from it. */
+static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
+					    struct swap_cluster_info *ci,
+					    unsigned long offset,
+					    unsigned int order,
 					    unsigned char usage)
 {
-	unsigned long start = offset & ~(SWAPFILE_CLUSTER - 1);
+	unsigned int next = SWAP_ENTRY_INVALID, found = SWAP_ENTRY_INVALID;
+	unsigned long start = ALIGN_DOWN(offset, SWAPFILE_CLUSTER);
 	unsigned long end = min(start + SWAPFILE_CLUSTER, si->max);
 	unsigned int nr_pages = 1 << order;
 	bool need_reclaim, ret;
-	struct swap_cluster_info *ci;
 
-	ci = &si->cluster_info[offset / SWAPFILE_CLUSTER];
 	lockdep_assert_held(&ci->lock);
 
-	if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER) {
-		offset = SWAP_NEXT_INVALID;
+	if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER)
 		goto out;
-	}
 
 	for (end -= nr_pages; offset <= end; offset += nr_pages) {
 		need_reclaim = false;
@@ -789,34 +789,27 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne
 			 * cluster has no flag set, and change of list
 			 * won't cause fragmentation.
 			 */
-			if (!cluster_is_usable(ci, order)) {
-				offset = SWAP_NEXT_INVALID;
+			if (!cluster_is_usable(ci, order))
 				goto out;
-			}
 			if (cluster_is_free(ci))
 				offset = start;
 			/* Reclaim failed but cluster is usable, try next */
 			if (!ret)
 				continue;
 		}
-		if (!cluster_alloc_range(si, ci, offset, usage, order)) {
-			offset = SWAP_NEXT_INVALID;
-			goto out;
-		}
-		*foundp = offset;
-		if (ci->count == SWAPFILE_CLUSTER) {
-			offset = SWAP_NEXT_INVALID;
-			goto out;
-		}
+		if (!cluster_alloc_range(si, ci, offset, usage, order))
+			break;
+		found = offset;
 		offset += nr_pages;
+		if (ci->count < SWAPFILE_CLUSTER && offset <= end)
+			next = offset;
 		break;
 	}
-	if (offset > end)
-		offset = SWAP_NEXT_INVALID;
 out:
 	relocate_cluster(si, ci);
 	unlock_cluster(ci);
-	return offset;
+	__this_cpu_write(si->percpu_cluster->next[order], next);
+	return found;
 }
 
 /* Return true if reclaimed a whole cluster */
@@ -885,8 +878,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		if (cluster_is_usable(ci, order)) {
 			if (cluster_is_free(ci))
 				offset = cluster_offset(si, ci);
-			offset = alloc_swap_scan_cluster(si, offset, &found,
-							 order, usage);
+			found = alloc_swap_scan_cluster(si, ci, offset,
+							order, usage);
 		} else {
 			unlock_cluster(ci);
 		}
@@ -897,8 +890,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 new_cluster:
 	ci = cluster_isolate_lock(si, &si->free_clusters);
 	if (ci) {
-		offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
-						 &found, order, usage);
+		found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
+						order, usage);
 		if (found)
 			goto done;
 	}
@@ -911,8 +904,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		unsigned int frags = 0, frags_existing;
 
 		while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[order]))) {
-			offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
-							 &found, order, usage);
+			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
+							order, usage);
 			/*
 			 * With `fragmenting` set to true, it will surely take
 			 * the cluster off nonfull list
@@ -932,8 +925,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 			 * per-CPU usage, but they could contain newly released
 			 * reclaimable (eg. lazy-freed swap cache) slots.
 			 */
-			offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
-							 &found, order, usage);
+			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
+							order, usage);
 			if (found)
 				goto done;
 			frags++;
@@ -959,21 +952,20 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		 */
 		while ((ci = cluster_isolate_lock(si, &si->frag_clusters[o]))) {
 			atomic_long_dec(&si->frag_cluster_nr[o]);
-			offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
-							 &found, order, usage);
+			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
+							0, usage);
 			if (found)
 				goto done;
 		}
 
 		while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[o]))) {
-			offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
-							 &found, order, usage);
+			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
+							0, usage);
 			if (found)
 				goto done;
 		}
 	}
 done:
-	__this_cpu_write(si->percpu_cluster->next[order], offset);
 	local_unlock(&si->percpu_cluster->lock);
 
 	return found;
@@ -3190,7 +3182,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 
 		cluster = per_cpu_ptr(si->percpu_cluster, cpu);
 		for (i = 0; i < SWAP_NR_ORDERS; i++)
-			cluster->next[i] = SWAP_NEXT_INVALID;
+			cluster->next[i] = SWAP_ENTRY_INVALID;
 		local_lock_init(&cluster->lock);
 	}
 

From patchwork Tue Dec 24 14:38:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13920199
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 11/13] mm, swap: introduce a helper for retrieving cluster from offset
Date: Tue, 24 Dec 2024 22:38:09 +0800
Message-ID: <20241224143811.33462-12-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0

From: Kairui Song

Retrieving the cluster info for a given offset is a common operation, so introduce a helper for it.
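
The helper is a plain array lookup: offset / SWAPFILE_CLUSTER indexes si->cluster_info. Below is a minimal user-space sketch of that mapping and of the single-cluster invariant that swap_entry_range_free() asserts with it. The trimmed-down structures, the range_in_one_cluster() name and the cluster size of 512 slots are assumptions for illustration only; the kernel's SWAPFILE_CLUSTER depends on the configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed cluster size for this sketch; the kernel value is config-dependent. */
#define SWAPFILE_CLUSTER 512UL

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct swap_cluster_info {
	unsigned int count;	/* slots in use in this cluster */
};

struct swap_info_struct {
	struct swap_cluster_info *cluster_info;	/* one entry per cluster */
};

/* Map a swap slot offset to the cluster that contains it. */
static inline struct swap_cluster_info *
offset_to_cluster(struct swap_info_struct *si, unsigned long offset)
{
	return &si->cluster_info[offset / SWAPFILE_CLUSTER];
}

/*
 * Hypothetical helper: true if the nr_pages slots starting at offset share
 * one cluster, the invariant the patch asserts with VM_BUG_ON() on free.
 */
static bool range_in_one_cluster(struct swap_info_struct *si,
				 unsigned long offset, unsigned int nr_pages)
{
	assert(nr_pages > 0);
	return offset_to_cluster(si, offset) ==
	       offset_to_cluster(si, offset + nr_pages - 1);
}
```

Comparing the two cluster pointers is equivalent to comparing the two quotients used by the old VM_BUG_ON(), so the check keeps its meaning while reusing the helper. For example, a 16-slot range starting at offset 496 ends at 511 and stays in cluster 0, while one starting at 500 ends at 515 and crosses into cluster 1.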

Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 mm/swapfile.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index ac170acf55a7..0445a2db8492 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -424,6 +424,12 @@ static inline unsigned int cluster_index(struct swap_info_struct *si,
 	return ci - si->cluster_info;
 }
 
+static inline struct swap_cluster_info *offset_to_cluster(struct swap_info_struct *si,
+							   unsigned long offset)
+{
+	return &si->cluster_info[offset / SWAPFILE_CLUSTER];
+}
+
 static inline unsigned int cluster_offset(struct swap_info_struct *si,
 					  struct swap_cluster_info *ci)
 {
@@ -435,7 +441,7 @@ static inline struct swap_cluster_info *lock_cluster(struct swap_info_struct *si
 {
 	struct swap_cluster_info *ci;
 
-	ci = &si->cluster_info[offset / SWAPFILE_CLUSTER];
+	ci = offset_to_cluster(si, offset);
 	spin_lock(&ci->lock);
 
 	return ci;
@@ -1473,10 +1479,10 @@ static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry
 	unsigned char *map_end = map + nr_pages;
 	struct swap_cluster_info *ci;
 
-	/* It should never free entries across different clusters */
-	VM_BUG_ON((offset / SWAPFILE_CLUSTER) != ((offset + nr_pages - 1) / SWAPFILE_CLUSTER));
-
 	ci = lock_cluster(si, offset);
+
+	/* It should never free entries across different clusters */
+	VM_BUG_ON(ci != offset_to_cluster(si, offset + nr_pages - 1));
 	VM_BUG_ON(cluster_is_free(ci));
 	VM_BUG_ON(ci->count < nr_pages);
 

From patchwork Tue Dec 24 14:38:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13920200
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 12/13] mm, swap: use a global swap cluster for non-rotation devices
Date: Tue, 24 Dec 2024 22:38:10 +0800
Message-ID: <20241224143811.33462-13-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0

From: Kairui Song

Non-rotational devices (SSD / ZRAM) can tolerate fragmentation, so for them the goal of the swap allocator is to avoid contention on clusters. It uses a per-CPU cluster design, in which each CPU uses a different cluster whenever possible.
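
The split this patch introduces can be modelled as a choice between two caches of per-order "next offset" hints: one cache per CPU for solid-state devices, and a single shared cache for HDDs. The following is a simplified user-space sketch, not the kernel code: the CPU count, the number of orders, and the names struct swap_dev, init_dev(), pick_cache(), read_next() and write_next() are all hypothetical, and locking is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */
#define NR_ORDERS_SKETCH 10	/* hypothetical number of orders */
#define SWAP_ENTRY_INVALID 0	/* offset 0 is the swap header, never allocated */

/* Per-order cache of the next offset to try, like struct percpu_cluster. */
struct next_cache {
	unsigned int next[NR_ORDERS_SKETCH];
};

/* Hypothetical device model: one cache per CPU, plus one shared cache. */
struct swap_dev {
	bool solidstate;				/* SSD/ZRAM vs. HDD */
	struct next_cache percpu[NR_CPUS_SKETCH];	/* used if solidstate */
	struct next_cache global;			/* used otherwise */
};

static void init_dev(struct swap_dev *dev, bool solidstate)
{
	dev->solidstate = solidstate;
	for (int o = 0; o < NR_ORDERS_SKETCH; o++) {
		dev->global.next[o] = SWAP_ENTRY_INVALID;
		for (int c = 0; c < NR_CPUS_SKETCH; c++)
			dev->percpu[c].next[o] = SWAP_ENTRY_INVALID;
	}
}

/*
 * Select the cache for this CPU. The kernel takes local_lock() on the
 * per-CPU path and spin_lock(&si->global_cluster_lock) on the shared path;
 * locking is left out here because this sketch is single-threaded.
 */
static struct next_cache *pick_cache(struct swap_dev *dev, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return dev->solidstate ? &dev->percpu[cpu] : &dev->global;
}

static unsigned int read_next(struct swap_dev *dev, int cpu, int order)
{
	return pick_cache(dev, cpu)->next[order];
}

static void write_next(struct swap_dev *dev, int cpu, int order,
		       unsigned int next)
{
	pick_cache(dev, cpu)->next[order] = next;
}
```

With a solid-state device, a hint written by CPU 0 is invisible to CPU 1, so CPUs spread across clusters. With an HDD, every CPU reads and writes the same next[order] slot, so consecutive allocations of one order continue in the same cluster, at the cost of serializing them.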

However, HDDs are very sensitive to fragmentation, while contention is trivial in comparison. Therefore, we use one global cluster for them instead. This ensures that allocations of each order are written to the same cluster as much as possible, which keeps the I/O more sequential. As a result, the cluster allocator performs as well as the old allocator on HDDs.

Test results after this commit, compared to before this series, running 'make -j32' with tinyconfig under a 1G memcg limit with HDD swap:

Before this series:
114.44user 29.11system 39:42.90elapsed 6%CPU (0avgtext+0avgdata 157284maxresident)k
2901232inputs+0outputs (238877major+4227640minor)pagefaults

After this commit:
113.90user 23.81system 38:11.77elapsed 6%CPU (0avgtext+0avgdata 157260maxresident)k
2548728inputs+0outputs (235471major+4238110minor)pagefaults

Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 include/linux/swap.h |  2 ++
 mm/swapfile.c        | 51 ++++++++++++++++++++++++++++++++------------
 2 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4c1d2e69689f..b13b72645db3 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -318,6 +318,8 @@ struct swap_info_struct {
 	unsigned int pages;		/* total of usable pages of swap */
 	atomic_long_t inuse_pages;	/* number of those currently in use */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
+	struct percpu_cluster *global_cluster; /* Use one global cluster for rotating device */
+	spinlock_t global_cluster_lock;	/* Serialize usage of global cluster */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0445a2db8492..482c531bdd8b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -814,7 +814,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 out:
 	relocate_cluster(si, ci);
 	unlock_cluster(ci);
-	__this_cpu_write(si->percpu_cluster->next[order], next);
+	if (si->flags & SWP_SOLIDSTATE)
+		__this_cpu_write(si->percpu_cluster->next[order], next);
+	else
+		si->global_cluster->next[order] = next;
 	return found;
 }
 
@@ -875,9 +878,16 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 	struct swap_cluster_info *ci;
 	unsigned int offset, found = 0;
 
-	/* Fast path using per CPU cluster */
-	local_lock(&si->percpu_cluster->lock);
-	offset = __this_cpu_read(si->percpu_cluster->next[order]);
+	if (si->flags & SWP_SOLIDSTATE) {
+		/* Fast path using per CPU cluster */
+		local_lock(&si->percpu_cluster->lock);
+		offset = __this_cpu_read(si->percpu_cluster->next[order]);
+	} else {
+		/* Serialize HDD SWAP allocation for each device. */
+		spin_lock(&si->global_cluster_lock);
+		offset = si->global_cluster->next[order];
+	}
+
 	if (offset) {
 		ci = lock_cluster(si, offset);
 		/* Cluster could have been used by another order */
@@ -972,8 +982,10 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		}
 	}
 done:
-	local_unlock(&si->percpu_cluster->lock);
-
+	if (si->flags & SWP_SOLIDSTATE)
+		local_unlock(&si->percpu_cluster->lock);
+	else
+		spin_unlock(&si->global_cluster_lock);
 	return found;
 }
 
@@ -2774,6 +2786,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	mutex_unlock(&swapon_mutex);
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
+	kfree(p->global_cluster);
+	p->global_cluster = NULL;
 	vfree(swap_map);
 	kvfree(zeromap);
 	kvfree(cluster_info);
@@ -3179,17 +3193,24 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	for (i = 0; i < nr_clusters; i++)
 		spin_lock_init(&cluster_info[i].lock);
 
-	si->percpu_cluster = alloc_percpu(struct percpu_cluster);
-	if (!si->percpu_cluster)
-		goto err_free;
+	if (si->flags & SWP_SOLIDSTATE) {
+		si->percpu_cluster = alloc_percpu(struct percpu_cluster);
+		if (!si->percpu_cluster)
+			goto err_free;
 
-	for_each_possible_cpu(cpu) {
-		struct percpu_cluster *cluster;
+		for_each_possible_cpu(cpu) {
+			struct percpu_cluster *cluster;
 
-		cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+			cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+			for (i = 0; i < SWAP_NR_ORDERS; i++)
+				cluster->next[i] = SWAP_ENTRY_INVALID;
+			local_lock_init(&cluster->lock);
+		}
+	} else {
+		si->global_cluster = kmalloc(sizeof(*si->global_cluster), GFP_KERNEL);
 		for (i = 0; i < SWAP_NR_ORDERS; i++)
-			cluster->next[i] = SWAP_ENTRY_INVALID;
-		local_lock_init(&cluster->lock);
+			si->global_cluster->next[i] = SWAP_ENTRY_INVALID;
+		spin_lock_init(&si->global_cluster_lock);
 	}
 
 	/*
@@ -3463,6 +3484,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 bad_swap:
 	free_percpu(si->percpu_cluster);
 	si->percpu_cluster = NULL;
+	kfree(si->global_cluster);
+	si->global_cluster = NULL;
 	inode = NULL;
 	destroy_swap_extents(si);
 	swap_cgroup_swapoff(si->type);

From patchwork Tue Dec 24 14:38:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13920201
AGHT+IGAPgVHp5vUFJh+SSN/uYjKj3b0hRrsN2QqzC9V+t4vENg01vQU53MJCWTYsvcbffhDkXP5bA== X-Received: by 2002:a17:902:d4c9:b0:215:6f9b:e447 with SMTP id d9443c01a7336-219e6ebf228mr175354085ad.30.1735051222881; Tue, 24 Dec 2024 06:40:22 -0800 (PST) Received: from KASONG-MC4.tencent.com ([115.171.41.189]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-219dc9cdec1sm90598735ad.136.2024.12.24.06.40.19 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Tue, 24 Dec 2024 06:40:22 -0800 (PST) From: Kairui Song To: linux-mm@kvack.org Cc: Andrew Morton , Chris Li , Barry Song , Ryan Roberts , Hugh Dickins , Yosry Ahmed , "Huang, Ying" , Nhat Pham , Johannes Weiner , Kalesh Singh , linux-kernel@vger.kernel.org, Kairui Song Subject: [PATCH v2 13/13] mm, swap_slots: remove slot cache for freeing path Date: Tue, 24 Dec 2024 22:38:11 +0800 Message-ID: <20241224143811.33462-14-ryncsn@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com> References: <20241224143811.33462-1-ryncsn@gmail.com> Reply-To: Kairui Song MIME-Version: 1.0 X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 4920A4000F X-Stat-Signature: qnud3eaoistdp9g6xknthakjk3tnbjg3 X-Rspam-User: X-HE-Tag: 1735051182-60995 X-HE-Meta: 
U2FsdGVkX18AdyaHSfeE4YD02miu3WMe7mc+cpgxWhFMBgjYn1mgGMZhrBfFj3oZc9H7fC9SR9oIi2zT5/okK4oJsm328q4R4AIHJmC8MSsQ5IDAK3CicP1fF/CC8KCAxr394GtPao1nuS5rz1xWpxTIF1sBNjA1W+vwsZAwsSg1TTpJypeVBoeyh77qWekqabP1u+MBg/E4mKQpNDgDcDhn6QAjUcutG1B4rKr8rBrTKO+P4Ns3BpnTP5r5W1XqFv7O+5WAUlotvhTGKA4lZQ22LqL8+kp1NG6++5unvvhwQXWj89OVO7BAWOupS7vJ3x9A3BsSuKWFtej3VM/rwZTuXohlXvJI35dQV9ngXAA1ERmKzJ6CxmuJ5IrTOrCNfwoBR94oE3SnyNB2Q6Bs6Pnfoc60C5/cgneyLky53gnyQgeJOSls4IXLm0WmFpEu2mnGAHuzmsIeDnUGQ+Dn6oKU/EM/ybP43QZQt75F563ESyRDOn0axHdZhI/5p2OQjaU/5CcmqHhDYKyAbXdn92bRjNNjYS8uAVQAuTdecPPYY5vbz7WJlAZPDjem4KjFt3NlhMtO6As77rfCDoJ2KCz4ABSEh6qLZJcZKiRpU4AdGOWjJUTRjidHVWW1w3uUJSkKiEiMo0KNOolEgkJnR/20zS2xn3qeDCuHBMRap7ZORg74ajbGC2l+ru7RZHZNsHeuNtaWblWESRvUhNbfpI/HdkhG72Mg8vpt6jvS4ZtqY5rbA/SxVNFQI1q1UZm8eCPpKAVxhzXM6OQFC5SprdmFUugaQA1HOBD3YzdLJ/Bg9a8DMvGSpJao0lhqDEGwoamikPQm43jCArAqD7ZRMpLkPEgx4V2NQQ53ZbfJsgBSZSqtIUM7yIon+MC0e35znHDBixY5PnbIzwhm01kV6IVD8xkYfXzhDqrxdNgmLwqHZ5D9TXZdZS7qb28jtBQYZ4+BRfZqFwuNf7+J85K vgahoCkt Sl9ytdlY/KccJ2GZmWDmRElEt8vljx4Qj4S9Cefr47tA+jd7IblBwo3A9O1MMHE2kV4isp7yub1HHo9X8srVRz5K/8p/1Tfuilp6Tfthywlg2gmd3PzqSOQiVOo5a/4lkT0bZQlgRhJDYmvlPqaPuLph1F+s+Vj7Rf3JiW9iENPkHcD7p6PxwSfYZF9Hs0zExv05ms6MIGl5747rokCE0FqRRjgczWg8kMJTtGBf158HbGy2ZadqbOSQRCaeOTxFaMbN9mk0EAgRRRVY8IUyvwtuUB+MSxYaqdBSsY/g57qvMNw/dnprhJkwZTqIyCrjCGrqg/3Oex7ggsSHcZWPjc4oWMoofuSmgIcXhk6AdzZxOy/nVdwrYtkjblcO2u7bUBOvzI4Vrhl/hLBoV1pTz0+qt9X7F+mNi4mA03tN33KLQ6QOIkmEjP1lx3MT9WqARa5KBJTM6pk2G3FjSI0kK/06e+VF4Cb0XE4kRtC4jHyxtUu8cA4BurexgsQFc2rdyUZzYjL5gPzaXdTpP6Dc+LYcYAc8aV+4MpPiOXksmcnObEtjhuHeN57J6VkqpejlEabkt+JTCpwUj9UJnLPbSo+Xv4QWlwZPueoZuzlKrS5bIa60/p8CcOVoEAA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: From: Kairui Song The slot cache for freeing path is mostly for reducing the overhead of si->lock. 
Now that si->lock usage on the freeing path has been basically
eliminated, the cache can be removed. This simplifies the code and stops
freed swap entries from being held in the cache. The delayed freeing of
entries has been getting in the way of further zswap optimizations [1],
and in theory it also causes more fragmentation and extra overhead.

Tests building the Linux kernel showed that both performance and
fragmentation are better without the cache:

time make -j96 / 768M memcg, 4K pages, 10G ZRAM, avg of 4 test runs:
Before:
Sys time: 36047.78, Real time: 472.43
After: (-7.6% sys time, -7.3% real time)
Sys time: 33314.76, Real time: 437.67

time make -j96 / 1152M memcg, 64K mTHP, 10G ZRAM, avg of 4 test runs:
Before:
Sys time: 46859.04, Real time: 562.63
hugepages-64kB/stats/swpout: 1783392
hugepages-64kB/stats/swpout_fallback: 240875
After: (-23.3% sys time, -21.3% real time)
Sys time: 35958.87, Real time: 442.69
hugepages-64kB/stats/swpout: 1866267
hugepages-64kB/stats/swpout_fallback: 158330

Sequential swap should also be slightly faster, though tests did not
show a measurable difference; at least there is no regression:

Swapin 4G zero page on ZRAM (time in us):
Before (avg. 1923756):
1912391 1927023 1927957 1916527 1918263 1914284 1934753 1940813 1921791
After (avg. 1922290):
1919101 1925743 1916810 1917007 1923930 1935152 1917403 1923549 1921913

Link: https://lore.kernel.org/all/CAMgjq7ACohT_uerSz8E_994ZZCv709Zor+43hdmesW_59W1BWw@mail.gmail.com/ [1]
Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 include/linux/swap_slots.h |  3 --
 mm/swap_slots.c            | 78 +++++-----------------------------
 mm/swapfile.c              | 89 +++++++++++++++------------------------
 3 files changed, 44 insertions(+), 126 deletions(-)

diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h
index 15adfb8c813a..840aec3523b2 100644
--- a/include/linux/swap_slots.h
+++ b/include/linux/swap_slots.h
@@ -16,15 +16,12 @@ struct swap_slots_cache {
 	swp_entry_t	*slots;
 	int		nr;
 	int		cur;
-	spinlock_t	free_lock;  /* protects slots_ret, n_ret */
-	swp_entry_t	*slots_ret;
 	int		n_ret;
 };
 
 void disable_swap_slots_cache_lock(void);
 void reenable_swap_slots_cache_unlock(void);
 void enable_swap_slots_cache(void);
-void free_swap_slot(swp_entry_t entry);
 
 extern bool swap_slot_cache_enabled;
 
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 13ab3b771409..9c7c171df7ba 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -43,17 +43,15 @@ static DEFINE_MUTEX(swap_slots_cache_mutex);
 /* Serialize swap slots cache enable/disable operations */
 static DEFINE_MUTEX(swap_slots_cache_enable_mutex);
 
-static void __drain_swap_slots_cache(unsigned int type);
+static void __drain_swap_slots_cache(void);
 
 #define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled)
-#define SLOTS_CACHE 0x1
-#define SLOTS_CACHE_RET 0x2
 
 static void deactivate_swap_slots_cache(void)
 {
 	mutex_lock(&swap_slots_cache_mutex);
 	swap_slot_cache_active = false;
-	__drain_swap_slots_cache(SLOTS_CACHE|SLOTS_CACHE_RET);
+	__drain_swap_slots_cache();
 	mutex_unlock(&swap_slots_cache_mutex);
 }
 
@@ -72,7 +70,7 @@ void disable_swap_slots_cache_lock(void)
 	if (swap_slot_cache_initialized) {
 		/* serialize with cpu hotplug operations */
 		cpus_read_lock();
-		__drain_swap_slots_cache(SLOTS_CACHE|SLOTS_CACHE_RET);
+		__drain_swap_slots_cache();
 		cpus_read_unlock();
 	}
 }
@@ -113,7 +111,7 @@ static bool check_cache_active(void)
 static int alloc_swap_slot_cache(unsigned int cpu)
 {
 	struct swap_slots_cache *cache;
-	swp_entry_t *slots, *slots_ret;
+	swp_entry_t *slots;
 
 	/*
 	 * Do allocation outside swap_slots_cache_mutex
@@ -125,28 +123,19 @@ static int alloc_swap_slot_cache(unsigned int cpu)
 	if (!slots)
 		return -ENOMEM;
 
-	slots_ret = kvcalloc(SWAP_SLOTS_CACHE_SIZE, sizeof(swp_entry_t),
-			     GFP_KERNEL);
-	if (!slots_ret) {
-		kvfree(slots);
-		return -ENOMEM;
-	}
-
 	mutex_lock(&swap_slots_cache_mutex);
 	cache = &per_cpu(swp_slots, cpu);
-	if (cache->slots || cache->slots_ret) {
+	if (cache->slots) {
 		/* cache already allocated */
 		mutex_unlock(&swap_slots_cache_mutex);
 
 		kvfree(slots);
-		kvfree(slots_ret);
 
 		return 0;
 	}
 
 	if (!cache->lock_initialized) {
 		mutex_init(&cache->alloc_lock);
-		spin_lock_init(&cache->free_lock);
 		cache->lock_initialized = true;
 	}
 	cache->nr = 0;
@@ -160,19 +149,16 @@ static int alloc_swap_slot_cache(unsigned int cpu)
 	 */
 	mb();
 	cache->slots = slots;
-	cache->slots_ret = slots_ret;
 	mutex_unlock(&swap_slots_cache_mutex);
 	return 0;
 }
 
-static void drain_slots_cache_cpu(unsigned int cpu, unsigned int type,
-				  bool free_slots)
+static void drain_slots_cache_cpu(unsigned int cpu, bool free_slots)
 {
 	struct swap_slots_cache *cache;
-	swp_entry_t *slots = NULL;
 
 	cache = &per_cpu(swp_slots, cpu);
-	if ((type & SLOTS_CACHE) && cache->slots) {
+	if (cache->slots) {
 		mutex_lock(&cache->alloc_lock);
 		swapcache_free_entries(cache->slots + cache->cur, cache->nr);
 		cache->cur = 0;
@@ -183,20 +169,9 @@ static void drain_slots_cache_cpu(unsigned int cpu, unsigned int type,
 		}
 		mutex_unlock(&cache->alloc_lock);
 	}
-	if ((type & SLOTS_CACHE_RET) && cache->slots_ret) {
-		spin_lock_irq(&cache->free_lock);
-		swapcache_free_entries(cache->slots_ret, cache->n_ret);
-		cache->n_ret = 0;
-		if (free_slots && cache->slots_ret) {
-			slots = cache->slots_ret;
-			cache->slots_ret = NULL;
-		}
-		spin_unlock_irq(&cache->free_lock);
-		kvfree(slots);
-	}
 }
 
-static void __drain_swap_slots_cache(unsigned int type)
+static void __drain_swap_slots_cache(void)
 {
 	unsigned int cpu;
 
@@ -224,13 +199,13 @@ static void __drain_swap_slots_cache(unsigned int type)
 	 * There are no slots on such cpu that need to be drained.
 	 */
 	for_each_online_cpu(cpu)
-		drain_slots_cache_cpu(cpu, type, false);
+		drain_slots_cache_cpu(cpu, false);
 }
 
 static int free_slot_cache(unsigned int cpu)
 {
 	mutex_lock(&swap_slots_cache_mutex);
-	drain_slots_cache_cpu(cpu, SLOTS_CACHE | SLOTS_CACHE_RET, true);
+	drain_slots_cache_cpu(cpu, true);
 	mutex_unlock(&swap_slots_cache_mutex);
 	return 0;
 }
@@ -269,39 +244,6 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
 	return cache->nr;
 }
 
-void free_swap_slot(swp_entry_t entry)
-{
-	struct swap_slots_cache *cache;
-
-	/* Large folio swap slot is not covered. */
-	zswap_invalidate(entry);
-
-	cache = raw_cpu_ptr(&swp_slots);
-	if (likely(use_swap_slot_cache && cache->slots_ret)) {
-		spin_lock_irq(&cache->free_lock);
-		/* Swap slots cache may be deactivated before acquiring lock */
-		if (!use_swap_slot_cache || !cache->slots_ret) {
-			spin_unlock_irq(&cache->free_lock);
-			goto direct_free;
-		}
-		if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE) {
-			/*
-			 * Return slots to global pool.
-			 * The current swap_map value is SWAP_HAS_CACHE.
-			 * Set it to 0 to indicate it is available for
-			 * allocation in global pool
-			 */
-			swapcache_free_entries(cache->slots_ret, cache->n_ret);
-			cache->n_ret = 0;
-		}
-		cache->slots_ret[cache->n_ret++] = entry;
-		spin_unlock_irq(&cache->free_lock);
-	} else {
-direct_free:
-		swapcache_free_entries(&entry, 1);
-	}
-}
-
 swp_entry_t folio_alloc_swap(struct folio *folio)
 {
 	swp_entry_t entry;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 482c531bdd8b..c5b5a3cd2a92 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -53,14 +53,15 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
 				 unsigned char);
 static void free_swap_count_continuations(struct swap_info_struct *);
 
-static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry,
-				  unsigned int nr_pages);
+static void swap_entry_range_free(struct swap_info_struct *si,
+				  struct swap_cluster_info *ci,
+				  swp_entry_t entry, unsigned int nr_pages);
 static void swap_range_alloc(struct swap_info_struct *si,
 			     unsigned int nr_entries);
 static bool folio_swapcache_freeable(struct folio *folio);
 static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
 					      unsigned long offset);
-static void unlock_cluster(struct swap_cluster_info *ci);
+static inline void unlock_cluster(struct swap_cluster_info *ci);
 
 static DEFINE_SPINLOCK(swap_lock);
 static unsigned int nr_swapfiles;
@@ -261,10 +262,9 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	folio_ref_sub(folio, nr_pages);
 	folio_set_dirty(folio);
 
-	/* Only sinple page folio can be backed by zswap */
-	if (nr_pages == 1)
-		zswap_invalidate(entry);
-	swap_entry_range_free(si, entry, nr_pages);
+	ci = lock_cluster(si, offset);
+	swap_entry_range_free(si, ci, entry, nr_pages);
+	unlock_cluster(ci);
 	ret = nr_pages;
 out_unlock:
 	folio_unlock(folio);
@@ -1121,8 +1121,10 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	 * Use atomic clear_bit operations only on zeromap instead of non-atomic
 	 * bitmap_clear to prevent adjacent bits corruption due to simultaneous writes.
 	 */
-	for (i = 0; i < nr_entries; i++)
+	for (i = 0; i < nr_entries; i++) {
 		clear_bit(offset + i, si->zeromap);
+		zswap_invalidate(swp_entry(si->type, offset + i));
+	}
 
 	if (si->flags & SWP_BLKDEV)
 		swap_slot_free_notify =
@@ -1427,9 +1429,9 @@ static unsigned char __swap_entry_free(struct swap_info_struct *si,
 
 	ci = lock_cluster(si, offset);
 	usage = __swap_entry_free_locked(si, offset, 1);
-	unlock_cluster(ci);
 	if (!usage)
-		free_swap_slot(entry);
+		swap_entry_range_free(si, ci, swp_entry(si->type, offset), 1);
+	unlock_cluster(ci);
 
 	return usage;
 }
@@ -1457,13 +1459,10 @@ static bool __swap_entries_free(struct swap_info_struct *si,
 	}
 	for (i = 0; i < nr; i++)
 		WRITE_ONCE(si->swap_map[offset + i], SWAP_HAS_CACHE);
+	if (!has_cache)
+		swap_entry_range_free(si, ci, entry, nr);
 	unlock_cluster(ci);
 
-	if (!has_cache) {
-		for (i = 0; i < nr; i++)
-			zswap_invalidate(swp_entry(si->type, offset + i));
-		swap_entry_range_free(si, entry, nr);
-	}
 	return has_cache;
 
 fallback:
@@ -1483,15 +1482,13 @@ static bool __swap_entries_free(struct swap_info_struct *si,
  * Drop the last HAS_CACHE flag of swap entries, caller have to
  * ensure all entries belong to the same cgroup.
  */
-static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry,
-				  unsigned int nr_pages)
+static void swap_entry_range_free(struct swap_info_struct *si,
+				  struct swap_cluster_info *ci,
+				  swp_entry_t entry, unsigned int nr_pages)
 {
 	unsigned long offset = swp_offset(entry);
 	unsigned char *map = si->swap_map + offset;
 	unsigned char *map_end = map + nr_pages;
-	struct swap_cluster_info *ci;
-
-	ci = lock_cluster(si, offset);
 
 	/* It should never free entries across different clusters */
 	VM_BUG_ON(ci != offset_to_cluster(si, offset + nr_pages - 1));
@@ -1511,7 +1508,6 @@ static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry
 		free_cluster(si, ci);
 	else
 		partial_free_cluster(si, ci);
-	unlock_cluster(ci);
 }
 
 static void cluster_swap_free_nr(struct swap_info_struct *si,
@@ -1519,28 +1515,13 @@ static void cluster_swap_free_nr(struct swap_info_struct *si,
 		unsigned char usage)
 {
 	struct swap_cluster_info *ci;
-	DECLARE_BITMAP(to_free, BITS_PER_LONG) = { 0 };
-	int i, nr;
+	unsigned long end = offset + nr_pages;
 
 	ci = lock_cluster(si, offset);
-	while (nr_pages) {
-		nr = min(BITS_PER_LONG, nr_pages);
-		for (i = 0; i < nr; i++) {
-			if (!__swap_entry_free_locked(si, offset + i, usage))
-				bitmap_set(to_free, i, 1);
-		}
-		if (!bitmap_empty(to_free, BITS_PER_LONG)) {
-			unlock_cluster(ci);
-			for_each_set_bit(i, to_free, BITS_PER_LONG)
-				free_swap_slot(swp_entry(si->type, offset + i));
-			if (nr == nr_pages)
-				return;
-			bitmap_clear(to_free, 0, BITS_PER_LONG);
-			ci = lock_cluster(si, offset);
-		}
-		offset += nr;
-		nr_pages -= nr;
-	}
+	do {
+		if (!__swap_entry_free_locked(si, offset, usage))
+			swap_entry_range_free(si, ci, swp_entry(si->type, offset), 1);
+	} while (++offset < end);
 	unlock_cluster(ci);
 }
 
@@ -1581,18 +1562,12 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
 		return;
 
 	ci = lock_cluster(si, offset);
-	if (size > 1 && swap_is_has_cache(si, offset, size)) {
-		unlock_cluster(ci);
-		swap_entry_range_free(si, entry, size);
-		return;
-	}
-	for (int i = 0; i < size; i++, entry.val++) {
-		if (!__swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE)) {
-			unlock_cluster(ci);
-			free_swap_slot(entry);
-			if (i == size - 1)
-				return;
-			lock_cluster(si, offset);
+	if (swap_is_has_cache(si, offset, size))
+		swap_entry_range_free(si, ci, entry, size);
+	else {
+		for (int i = 0; i < size; i++, entry.val++) {
+			if (!__swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE))
+				swap_entry_range_free(si, ci, entry, 1);
 		}
 	}
 	unlock_cluster(ci);
@@ -1601,6 +1576,7 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
 void swapcache_free_entries(swp_entry_t *entries, int n)
 {
 	int i;
+	struct swap_cluster_info *ci;
 	struct swap_info_struct *si = NULL;
 
 	if (n <= 0)
@@ -1608,8 +1584,11 @@ void swapcache_free_entries(swp_entry_t *entries, int n)
 
 	for (i = 0; i < n; ++i) {
 		si = _swap_info_get(entries[i]);
-		if (si)
-			swap_entry_range_free(si, entries[i], 1);
+		if (si) {
+			ci = lock_cluster(si, swp_offset(entries[i]));
+			swap_entry_range_free(si, ci, entries[i], 1);
+			unlock_cluster(ci);
+		}
 	}
 }