From patchwork Tue Dec 24 14:38:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13920200
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 12/13] mm, swap: use a global swap cluster for non-rotation devices
Date: Tue, 24 Dec 2024 22:38:10 +0800
Message-ID: <20241224143811.33462-13-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

Non-rotational devices (SSD / ZRAM) can tolerate
fragmentation, so the goal of the SWAP allocator is to avoid contention
for clusters. It uses a per-CPU cluster design, and each CPU will use a
different cluster as much as possible.

However, HDDs are very sensitive to fragmentation, and contention is
trivial in comparison. Therefore, we use one global cluster instead.
This ensures that each order will be written to the same cluster as
much as possible, which helps make the I/O more continuous.

This also keeps the performance of the cluster allocator as good as
that of the old allocator. Results after this commit, compared to
before this series:

Tested using 'make -j32' with tinyconfig, a 1G memcg limit, and HDD swap:

Before this series:
114.44user 29.11system 39:42.90elapsed 6%CPU (0avgtext+0avgdata 157284maxresident)k
2901232inputs+0outputs (238877major+4227640minor)pagefaults

After this commit:
113.90user 23.81system 38:11.77elapsed 6%CPU (0avgtext+0avgdata 157260maxresident)k
2548728inputs+0outputs (235471major+4238110minor)pagefaults

Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 include/linux/swap.h |  2 ++
 mm/swapfile.c        | 51 ++++++++++++++++++++++++++++++++------------
 2 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4c1d2e69689f..b13b72645db3 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -318,6 +318,8 @@ struct swap_info_struct {
 	unsigned int pages;		/* total of usable pages of swap */
 	atomic_long_t inuse_pages;	/* number of those currently in use */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
+	struct percpu_cluster *global_cluster; /* Use one global cluster for rotating device */
+	spinlock_t global_cluster_lock;	/* Serialize usage of global cluster */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0445a2db8492..482c531bdd8b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -814,7 +814,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 out:
 	relocate_cluster(si, ci);
 	unlock_cluster(ci);
-	__this_cpu_write(si->percpu_cluster->next[order], next);
+	if (si->flags & SWP_SOLIDSTATE)
+		__this_cpu_write(si->percpu_cluster->next[order], next);
+	else
+		si->global_cluster->next[order] = next;
 	return found;
 }
 
@@ -875,9 +878,16 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 	struct swap_cluster_info *ci;
 	unsigned int offset, found = 0;
 
-	/* Fast path using per CPU cluster */
-	local_lock(&si->percpu_cluster->lock);
-	offset = __this_cpu_read(si->percpu_cluster->next[order]);
+	if (si->flags & SWP_SOLIDSTATE) {
+		/* Fast path using per CPU cluster */
+		local_lock(&si->percpu_cluster->lock);
+		offset = __this_cpu_read(si->percpu_cluster->next[order]);
+	} else {
+		/* Serialize HDD SWAP allocation for each device. */
+		spin_lock(&si->global_cluster_lock);
+		offset = si->global_cluster->next[order];
+	}
+
 	if (offset) {
 		ci = lock_cluster(si, offset);
 		/* Cluster could have been used by another order */
@@ -972,8 +982,10 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		}
 	}
 done:
-	local_unlock(&si->percpu_cluster->lock);
-
+	if (si->flags & SWP_SOLIDSTATE)
+		local_unlock(&si->percpu_cluster->lock);
+	else
+		spin_unlock(&si->global_cluster_lock);
 	return found;
 }
 
@@ -2774,6 +2786,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	mutex_unlock(&swapon_mutex);
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
+	kfree(p->global_cluster);
+	p->global_cluster = NULL;
 	vfree(swap_map);
 	kvfree(zeromap);
 	kvfree(cluster_info);
@@ -3179,17 +3193,24 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	for (i = 0; i < nr_clusters; i++)
 		spin_lock_init(&cluster_info[i].lock);
 
-	si->percpu_cluster = alloc_percpu(struct percpu_cluster);
-	if (!si->percpu_cluster)
-		goto err_free;
+	if (si->flags & SWP_SOLIDSTATE) {
+		si->percpu_cluster = alloc_percpu(struct percpu_cluster);
+		if (!si->percpu_cluster)
+			goto err_free;
 
-	for_each_possible_cpu(cpu) {
-		struct percpu_cluster *cluster;
+		for_each_possible_cpu(cpu) {
+			struct percpu_cluster *cluster;
 
-		cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+			cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+			for (i = 0; i < SWAP_NR_ORDERS; i++)
+				cluster->next[i] = SWAP_ENTRY_INVALID;
+			local_lock_init(&cluster->lock);
+		}
+	} else {
+		si->global_cluster = kmalloc(sizeof(*si->global_cluster), GFP_KERNEL);
 		for (i = 0; i < SWAP_NR_ORDERS; i++)
-			cluster->next[i] = SWAP_ENTRY_INVALID;
-		local_lock_init(&cluster->lock);
+			si->global_cluster->next[i] = SWAP_ENTRY_INVALID;
+		spin_lock_init(&si->global_cluster_lock);
 	}
 
 	/*
@@ -3463,6 +3484,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 bad_swap:
 	free_percpu(si->percpu_cluster);
 	si->percpu_cluster = NULL;
+	kfree(si->global_cluster);
+	si->global_cluster = NULL;
 	inode = NULL;
 	destroy_swap_extents(si);
 	swap_cgroup_swapoff(si->type);
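
For readers who want to try the idea outside the kernel, the following is a
minimal userspace sketch of the policy described in the commit message: a
solid-state device keeps one allocation hint per CPU, while a rotating device
keeps a single shared hint behind a lock so that successive allocations of the
same order stay adjacent. It is not the kernel code. Every toy_* name, the
TOY_* constants, and the "clusters never fill up" simplification are
assumptions made for illustration, and a C11 atomic_flag stands in for the
kernel spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_NR_ORDERS     4    /* hypothetical; stands in for SWAP_NR_ORDERS */
#define TOY_NR_CPUS       4    /* hypothetical fixed CPU count */
#define TOY_CLUSTER_SIZE  512  /* hypothetical number of slots per cluster */
#define TOY_ENTRY_INVALID 0    /* stands in for SWAP_ENTRY_INVALID */

/* Per-order "next offset to try" hint, similar in spirit to next[] in the patch. */
struct toy_cluster {
	unsigned int next[TOY_NR_ORDERS];
};

struct toy_swap_dev {
	bool solid_state;            /* models the SWP_SOLIDSTATE flag */
	struct toy_cluster *percpu;  /* TOY_NR_CPUS hints, solid-state only */
	struct toy_cluster *global;  /* one shared hint, rotating device only */
	atomic_flag global_lock;     /* toy spinlock guarding the shared hint */
};

/* Allocate only the hint storage this device type needs; -1 on failure. */
static int toy_dev_init(struct toy_swap_dev *d, bool solid_state)
{
	size_t n = solid_state ? TOY_NR_CPUS : 1;
	struct toy_cluster *c = malloc(n * sizeof(*c));

	if (!c)
		return -1;
	for (size_t i = 0; i < n; i++)
		for (int o = 0; o < TOY_NR_ORDERS; o++)
			c[i].next[o] = TOY_ENTRY_INVALID;

	d->solid_state = solid_state;
	d->percpu = solid_state ? c : NULL;
	d->global = solid_state ? NULL : c;
	atomic_flag_clear(&d->global_lock);
	return 0;
}

static void toy_dev_free(struct toy_swap_dev *d)
{
	free(d->percpu);
	free(d->global);
	d->percpu = d->global = NULL;
}

/*
 * Hand out 2^order consecutive slots and return the first offset.
 * Toy simplification: clusters never fill up, so a hint only advances,
 * and each (cpu, order) pair simply starts in its own cluster.
 */
static unsigned int toy_alloc(struct toy_swap_dev *d, int cpu, int order)
{
	struct toy_cluster *c;
	unsigned int offset, first;

	if (d->solid_state) {
		/* Each CPU has its own hint, so no shared lock is taken. */
		c = &d->percpu[cpu];
		first = (unsigned int)(cpu * TOY_NR_ORDERS + order + 1) *
			TOY_CLUSTER_SIZE;
	} else {
		/* All CPUs share one hint; serialize access to it. */
		while (atomic_flag_test_and_set_explicit(&d->global_lock,
							 memory_order_acquire))
			;
		c = d->global;
		first = (unsigned int)(order + 1) * TOY_CLUSTER_SIZE;
	}

	offset = c->next[order];
	if (offset == TOY_ENTRY_INVALID)
		offset = first;
	c->next[order] = offset + (1u << order);

	if (!d->solid_state)
		atomic_flag_clear_explicit(&d->global_lock,
					   memory_order_release);
	return offset;
}
```

With a device initialized as rotating (solid_state == false), order-0 calls
made on behalf of CPUs 0, 1 and 2 return consecutive offsets from the same
cluster, which is the "more continuous I/O" effect the commit message aims
for. With solid_state == true, CPU 0 and CPU 1 start in different clusters and
never touch the shared lock.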