From patchwork Mon Jan 13 17:57:31 2025
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13937878
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins,
	Yosry Ahmed, "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner,
	Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 12/13] mm, swap: use a global swap cluster for
	non-rotation devices
Date: Tue, 14 Jan 2025 01:57:31 +0800
Message-ID: <20250113175732.48099-13-ryncsn@gmail.com>
In-Reply-To: <20250113175732.48099-1-ryncsn@gmail.com>
References: <20250113175732.48099-1-ryncsn@gmail.com>
MIME-Version: 1.0

From: Kairui Song

Non-rotational devices (SSD / ZRAM) can tolerate
fragmentation, so the goal of the SWAP allocator is to avoid contention
for clusters. It uses a per-CPU cluster design, and each CPU will use a
different cluster as much as possible.

However, HDDs are very sensitive to fragmentation, while contention is
trivial in comparison. Therefore, we use one global cluster instead.
This ensures that each order will be written to the same cluster as much
as possible, which helps make the I/O more continuous.

This ensures that the performance of the cluster allocator is as good as
that of the old allocator. Tests after this commit compared to those
before this series:

make -j32 with tinyconfig, using 1G memcg limit and HDD swap:

Before this series:
114.44user 29.11system 39:42.90elapsed 6%CPU (0avgtext+0avgdata 157284maxresident)k
2901232inputs+0outputs (238877major+4227640minor)pagefaults

After this commit:
113.90user 23.81system 38:11.77elapsed 6%CPU (0avgtext+0avgdata 157260maxresident)k
2548728inputs+0outputs (235471major+4238110minor)pagefaults

Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 include/linux/swap.h |  2 ++
 mm/swapfile.c        | 51 ++++++++++++++++++++++++++++++++------------
 2 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4c1d2e69689f..b13b72645db3 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -318,6 +318,8 @@ struct swap_info_struct {
 	unsigned int pages;		/* total of usable pages of swap */
 	atomic_long_t inuse_pages;	/* number of those currently in use */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
+	struct percpu_cluster *global_cluster; /* Use one global cluster for rotating device */
+	spinlock_t global_cluster_lock;	/* Serialize usage of global cluster */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 37d540fa0310..793b2fd1a2a8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -820,7 +820,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 out:
 	relocate_cluster(si, ci);
 	unlock_cluster(ci);
-	__this_cpu_write(si->percpu_cluster->next[order], next);
+	if (si->flags & SWP_SOLIDSTATE)
+		__this_cpu_write(si->percpu_cluster->next[order], next);
+	else
+		si->global_cluster->next[order] = next;
 	return found;
 }
 
@@ -881,9 +884,16 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 	struct swap_cluster_info *ci;
 	unsigned int offset, found = 0;
 
-	/* Fast path using per CPU cluster */
-	local_lock(&si->percpu_cluster->lock);
-	offset = __this_cpu_read(si->percpu_cluster->next[order]);
+	if (si->flags & SWP_SOLIDSTATE) {
+		/* Fast path using per CPU cluster */
+		local_lock(&si->percpu_cluster->lock);
+		offset = __this_cpu_read(si->percpu_cluster->next[order]);
+	} else {
+		/* Serialize HDD SWAP allocation for each device. */
+		spin_lock(&si->global_cluster_lock);
+		offset = si->global_cluster->next[order];
+	}
+
 	if (offset) {
 		ci = lock_cluster(si, offset);
 		/* Cluster could have been used by another order */
@@ -975,8 +985,10 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		}
 	}
 done:
-	local_unlock(&si->percpu_cluster->lock);
-
+	if (si->flags & SWP_SOLIDSTATE)
+		local_unlock(&si->percpu_cluster->lock);
+	else
+		spin_unlock(&si->global_cluster_lock);
 	return found;
 }
 
@@ -2784,6 +2796,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	mutex_unlock(&swapon_mutex);
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
+	kfree(p->global_cluster);
+	p->global_cluster = NULL;
 	vfree(swap_map);
 	kvfree(zeromap);
 	kvfree(cluster_info);
@@ -3189,17 +3203,24 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	for (i = 0; i < nr_clusters; i++)
 		spin_lock_init(&cluster_info[i].lock);
 
-	si->percpu_cluster = alloc_percpu(struct percpu_cluster);
-	if (!si->percpu_cluster)
-		goto err_free;
+	if (si->flags & SWP_SOLIDSTATE) {
+		si->percpu_cluster = alloc_percpu(struct percpu_cluster);
+		if (!si->percpu_cluster)
+			goto err_free;
 
-	for_each_possible_cpu(cpu) {
-		struct percpu_cluster *cluster;
+		for_each_possible_cpu(cpu) {
+			struct percpu_cluster *cluster;
 
-		cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+			cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+			for (i = 0; i < SWAP_NR_ORDERS; i++)
+				cluster->next[i] = SWAP_ENTRY_INVALID;
+			local_lock_init(&cluster->lock);
+		}
+	} else {
+		si->global_cluster = kmalloc(sizeof(*si->global_cluster), GFP_KERNEL);
 		for (i = 0; i < SWAP_NR_ORDERS; i++)
-			cluster->next[i] = SWAP_ENTRY_INVALID;
-		local_lock_init(&cluster->lock);
+			si->global_cluster->next[i] = SWAP_ENTRY_INVALID;
+		spin_lock_init(&si->global_cluster_lock);
 	}
 
 	/*
@@ -3473,6 +3494,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 bad_swap:
 	free_percpu(si->percpu_cluster);
 	si->percpu_cluster = NULL;
+	kfree(si->global_cluster);
+	si->global_cluster = NULL;
 	inode = NULL;
 	destroy_swap_extents(si);
 	swap_cgroup_swapoff(si->type);