From patchwork Tue Jun 18 23:26:45 2024
X-Patchwork-Submitter: Ryan Roberts <ryan.roberts@arm.com>
X-Patchwork-Id: 13703161
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton, Chris Li, Kairui Song, "Huang, Ying", Kalesh Singh,
	Barry Song, Hugh Dickins, David Hildenbrand
Cc: Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH v1 5/5] mm: swap: Optimize per-order cluster scanning
Date: Wed, 19 Jun 2024 00:26:45 +0100
Message-ID: <20240618232648.4090299-6-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240618232648.4090299-1-ryan.roberts@arm.com>
References: <20240618232648.4090299-1-ryan.roberts@arm.com>
MIME-Version: 1.0
Add a CLUSTER_FLAG_SKIP_SCAN cluster flag, which is applied to a cluster
under one of two conditions. When the flag is present, the cluster is
skipped during a scan:

 - When the number of free entries is less than the number of entries
   that would be required for a new allocation of the order that the
   cluster serves.

 - When scanning completes for the cluster, no further scanners are
   active for the cluster, and no swap entries were freed from the
   cluster since the last scan began. In this case, it has been proven
   that there are no contiguous free entries of sufficient size to
   allocate the order that the cluster serves.
In the latter case, the cluster is made eligible for scanning again when
the next entry is freed from it.

The latter condition is implemented to permit multiple CPUs to scan the
same cluster, which in turn guarantees that if there is a free block
available in a cluster allocated for the desired order, it will be
allocated on a first-come, first-served basis. As a result, the number
of active scanners per cluster must be tracked, costing 4 bytes per
cluster.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 include/linux/swap.h |  3 +++
 mm/swapfile.c        | 36 ++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

--
2.43.0

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 34ec4668a5c9..40c308749e79 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -257,9 +257,12 @@ struct swap_cluster_info {
 	unsigned int data:24;
 	unsigned int flags:4;
 	unsigned int order:4;
+	unsigned int nr_scanners;
 };
 #define CLUSTER_FLAG_FREE 1 /* This cluster is free */
 #define CLUSTER_FLAG_NEXT_NULL 2 /* This cluster has no next cluster */
+#define CLUSTER_FLAG_SKIP_SCAN 4 /* Skip cluster for per-order scan */
+#define CLUSTER_FLAG_DECREMENT 8 /* A swap entry was freed from cluster */
 
 /*
  * swap_info_struct::max is an unsigned int, so the maximum number of pages in
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 24db03db8830..caf382b4ecd3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -574,6 +574,9 @@ static void add_cluster_info_page(struct swap_info_struct *p,
 	VM_BUG_ON(cluster_count(&cluster_info[idx]) + count > SWAPFILE_CLUSTER);
 	cluster_set_count(&cluster_info[idx],
 			  cluster_count(&cluster_info[idx]) + count);
+
+	if (SWAPFILE_CLUSTER - cluster_count(&cluster_info[idx]) < count)
+		cluster_info[idx].flags |= CLUSTER_FLAG_SKIP_SCAN;
 }
 
 /*
@@ -595,6 +598,7 @@ static void dec_cluster_info_page(struct swap_info_struct *p,
 	struct swap_cluster_info *cluster_info, unsigned long page_nr)
 {
 	unsigned long idx = page_nr / SWAPFILE_CLUSTER;
+	unsigned long count = 1 << cluster_info[idx].order;
 
 	if (!cluster_info)
 		return;
@@ -603,6 +607,10 @@ static void dec_cluster_info_page(struct swap_info_struct *p,
 	cluster_set_count(&cluster_info[idx],
 		cluster_count(&cluster_info[idx]) - 1);
 
+	cluster_info[idx].flags |= CLUSTER_FLAG_DECREMENT;
+	if (SWAPFILE_CLUSTER - cluster_count(&cluster_info[idx]) >= count)
+		cluster_info[idx].flags &= ~CLUSTER_FLAG_SKIP_SCAN;
+
 	if (cluster_count(&cluster_info[idx]) == 0)
 		free_cluster(p, idx);
 }
@@ -708,7 +716,8 @@ static unsigned int next_cluster_for_scan(struct swap_info_struct *si,
 	end = offset_to_cluster(si, *stop);
 
 	while (ci != end) {
-		if ((ci->flags & CLUSTER_FLAG_FREE) == 0 && ci->order == order)
+		if ((ci->flags & (CLUSTER_FLAG_SKIP_SCAN | CLUSTER_FLAG_FREE)) == 0
+		    && ci->order == order)
 			break;
 		ci = next_cluster_circular(si, ci);
 	}
@@ -722,6 +731,21 @@ static unsigned int next_cluster_for_scan(struct swap_info_struct *si,
 	return cluster_to_offset(si, ci);
 }
 
+static inline void cluster_inc_scanners(struct swap_cluster_info *ci)
+{
+	/* Protected by si lock. */
+	ci->nr_scanners++;
+	ci->flags &= ~CLUSTER_FLAG_DECREMENT;
+}
+
+static inline void cluster_dec_scanners(struct swap_cluster_info *ci)
+{
+	/* Protected by si lock. */
+	ci->nr_scanners--;
+	if (ci->nr_scanners == 0 && (ci->flags & CLUSTER_FLAG_DECREMENT) == 0)
+		ci->flags |= CLUSTER_FLAG_SKIP_SCAN;
+}
+
 /*
  * Try to get swap entries with specified order from current cpu's swap entry
  * pool (a cluster). This might involve allocating a new cluster for current CPU
@@ -764,6 +788,8 @@ static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
 				return false;
 		} else
 			return false;
+
+		cluster_inc_scanners(offset_to_cluster(si, tmp));
 	}
 
 	/*
@@ -780,13 +806,19 @@ static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
 	}
 	unlock_cluster(ci);
 	if (tmp >= max) {
+		cluster_dec_scanners(ci);
 		cluster->next[order] = SWAP_NEXT_INVALID;
 		goto new_cluster;
 	}
 
 	*offset = tmp;
 	*scan_base = tmp;
 	tmp += nr_pages;
-	cluster->next[order] = tmp < max ? tmp : SWAP_NEXT_INVALID;
+	if (tmp >= max) {
+		cluster_dec_scanners(ci);
+		cluster->next[order] = SWAP_NEXT_INVALID;
+	} else {
+		cluster->next[order] = tmp;
+	}
 	return true;
 }