From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 12/13] mm, swap: use a global swap cluster for non-rotation device
Date: Wed, 23 Oct 2024 03:24:50 +0800
Message-ID: <20241022192451.38138-13-ryncsn@gmail.com>
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>

Non-rotational devices (SSD / ZRAM) can tolerate fragmentation, so the
goal of the swap allocator for them is to avoid contention on clusters.
It therefore uses a per-CPU cluster design, in which each CPU uses a
different cluster as much as possible.

But HDDs are very sensitive to fragmentation, and contention is trivial
compared to that. So just use one global cluster instead. This ensures
that each order is written to the same cluster as much as possible,
which helps make the IO more contiguous. This keeps the performance of
the cluster allocator as good as that of the old allocator.

Test after this commit compared to before this series:

make -j32 with tinyconfig, using 1G memcg limit and HDD swap:

Before this series:
114.44user 29.11system 39:42.90elapsed 6%CPU (0avgtext+0avgdata 157284maxresident)k
2901232inputs+0outputs (238877major+4227640minor)pagefaults

After this commit:
113.90user 23.81system 38:11.77elapsed 6%CPU (0avgtext+0avgdata 157260maxresident)k
2548728inputs+0outputs (235471major+4238110minor)pagefaults

Suggested-by: Chris Li
Signed-off-by: Kairui Song
---
 include/linux/swap.h |  2 ++
 mm/swapfile.c        | 48 ++++++++++++++++++++++++++++++++------------
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0e6c6bb385f0..9898b1881d4d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -319,6 +319,8 @@ struct swap_info_struct {
 	unsigned int pages; /* total of usable pages of swap */
 	atomic_long_t inuse_pages; /* number of those currently in use */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
+	struct percpu_cluster *global_cluster; /* Use one global cluster for rotating device */
+	spinlock_t global_cluster_lock; /* Serialize usage of global cluster */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev; /* swap device or bdev of swap file */
 	struct file *swap_file; /* seldom referenced */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f25d697f6736..6eb298a222c0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -798,7 +798,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 out:
 	relocate_cluster(si, ci);
 	unlock_cluster(ci);
-	__this_cpu_write(si->percpu_cluster->next[order], next);
+	if (si->flags & SWP_SOLIDSTATE)
+		__this_cpu_write(si->percpu_cluster->next[order], next);
+	else
+		si->global_cluster->next[order] = next;
 	return found;
 }
 
@@ -860,8 +863,14 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 	unsigned int offset, found = 0;
 
 	/* Fast path using per CPU cluster */
-	local_lock(&si->percpu_cluster->lock);
-	offset = __this_cpu_read(si->percpu_cluster->next[order]);
+	if (si->flags & SWP_SOLIDSTATE) {
+		local_lock(&si->percpu_cluster->lock);
+		offset = __this_cpu_read(si->percpu_cluster->next[order]);
+	} else {
+		spin_lock(&si->global_cluster_lock);
+		offset = si->global_cluster->next[order];
+	}
+
 	if (offset) {
 		ci = lock_cluster(si, offset);
 		/* Cluster could have been used by another order */
@@ -960,8 +969,10 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		}
 	}
 done:
-	local_unlock(&si->percpu_cluster->lock);
-
+	if (si->flags & SWP_SOLIDSTATE)
+		local_unlock(&si->percpu_cluster->lock);
+	else
+		spin_unlock(&si->global_cluster_lock);
 	return found;
 }
 
@@ -2737,6 +2748,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	mutex_unlock(&swapon_mutex);
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
+	kfree(p->global_cluster);
+	p->global_cluster = NULL;
 	vfree(swap_map);
 	kvfree(zeromap);
 	kvfree(cluster_info);
@@ -3142,17 +3155,24 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	for (i = 0; i < nr_clusters; i++)
 		spin_lock_init(&cluster_info[i].lock);
 
-	si->percpu_cluster = alloc_percpu(struct percpu_cluster);
-	if (!si->percpu_cluster)
-		goto err_free;
+	if (si->flags & SWP_SOLIDSTATE) {
+		si->percpu_cluster = alloc_percpu(struct percpu_cluster);
+		if (!si->percpu_cluster)
+			goto err_free;
 
-	for_each_possible_cpu(cpu) {
-		struct percpu_cluster *cluster;
+		for_each_possible_cpu(cpu) {
+			struct percpu_cluster *cluster;
 
-		cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+			cluster = per_cpu_ptr(si->percpu_cluster, cpu);
+			for (i = 0; i < SWAP_NR_ORDERS; i++)
+				cluster->next[i] = SWAP_ENTRY_INVALID;
+			local_lock_init(&cluster->lock);
+		}
+	} else {
+		si->global_cluster = kmalloc(sizeof(*si->global_cluster), GFP_KERNEL);
 		for (i = 0; i < SWAP_NR_ORDERS; i++)
-			cluster->next[i] = SWAP_ENTRY_INVALID;
-		local_lock_init(&cluster->lock);
+			si->global_cluster->next[i] = SWAP_ENTRY_INVALID;
+		spin_lock_init(&si->global_cluster_lock);
 	}
 
 	/*
@@ -3426,6 +3446,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 bad_swap:
 	free_percpu(si->percpu_cluster);
 	si->percpu_cluster = NULL;
+	kfree(si->global_cluster);
+	si->global_cluster = NULL;
 	inode = NULL;
 	destroy_swap_extents(si);
 	swap_cgroup_swapoff(si->type);
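The commit message and the diff together describe one selection rule: SSD / ZRAM devices keep one allocation cursor per CPU behind a cheap per-CPU lock, while HDDs share a single cursor behind one lock, so consecutive allocations of an order land next to each other. The following is a minimal userspace sketch of that rule, assuming POSIX threads and C11 atomics. `struct cursor`, `struct swap_dev`, `swap_dev_init()`, `alloc_entry()`, the explicit `cpu` argument and all constants are hypothetical stand-ins rather than kernel APIs, and a `pthread_mutex_t` models both `local_lock_t` and `spinlock_t`. The sketch checks both the per-CPU and the single-cursor allocation for failure.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for SWAP_NR_ORDERS, the cluster size and
 * SWAP_ENTRY_INVALID; values chosen only for illustration. */
#define NR_ORDERS     4
#define NR_CPUS       8
#define CLUSTER_SIZE  512u
#define ENTRY_INVALID 0u

/* One allocation cursor: the next offset to try for each order. */
struct cursor {
	unsigned int next[NR_ORDERS];
	pthread_mutex_t lock; /* models local_lock_t (SSD) or spinlock_t (HDD) */
};

struct swap_dev {
	bool solidstate;           /* models SWP_SOLIDSTATE */
	struct cursor *percpu;     /* NR_CPUS cursors, used only when solidstate */
	struct cursor *global;     /* single shared cursor, used otherwise */
	atomic_uint clusters_used; /* hands out fresh cluster base offsets */
};

static void cursor_init(struct cursor *c)
{
	for (int i = 0; i < NR_ORDERS; i++)
		c->next[i] = ENTRY_INVALID;
	pthread_mutex_init(&c->lock, NULL);
}

/* Allocate only the cursor layout this device type uses. */
static int swap_dev_init(struct swap_dev *dev, bool solidstate)
{
	dev->solidstate = solidstate;
	dev->percpu = NULL;
	dev->global = NULL;
	atomic_init(&dev->clusters_used, 0);
	if (solidstate) {
		dev->percpu = calloc(NR_CPUS, sizeof(*dev->percpu));
		if (!dev->percpu)
			return -1;
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			cursor_init(&dev->percpu[cpu]);
	} else {
		dev->global = malloc(sizeof(*dev->global));
		if (!dev->global)
			return -1;
		cursor_init(dev->global);
	}
	return 0;
}

static void swap_dev_destroy(struct swap_dev *dev)
{
	if (dev->percpu) {
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			pthread_mutex_destroy(&dev->percpu[cpu].lock);
		free(dev->percpu);
	}
	if (dev->global) {
		pthread_mutex_destroy(&dev->global->lock);
		free(dev->global);
	}
	dev->percpu = NULL;
	dev->global = NULL;
}

/* Take 1 << order slots from the selected cursor (0 <= cpu < NR_CPUS).
 * An unset cursor starts a fresh cluster; an exhausted one is reset. */
static unsigned int alloc_entry(struct swap_dev *dev, int cpu, int order)
{
	struct cursor *c = dev->solidstate ? &dev->percpu[cpu] : dev->global;
	unsigned int offset, next;

	pthread_mutex_lock(&c->lock);
	offset = c->next[order];
	if (offset == ENTRY_INVALID) /* skip cluster 0 so offset 0 stays invalid */
		offset = (atomic_fetch_add(&dev->clusters_used, 1) + 1) * CLUSTER_SIZE;
	next = offset + (1u << order);
	c->next[order] = (next % CLUSTER_SIZE == 0) ? ENTRY_INVALID : next;
	pthread_mutex_unlock(&c->lock);
	return offset;
}
```

With `solidstate` false, order-0 calls from CPUs 0, 1 and 2 return 512, 513 and 514, consecutive slots in one cluster. With `solidstate` true, the first calls from CPUs 0 and 1 return 512 and 1024, i.e. separate clusters. That mirrors the trade-off in the commit message: sequential placement for seek-bound disks, and no shared cursor where contention matters more than locality.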
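To put the HDD benchmark from the commit message in relative terms, the change in each field of the GNU time output can be computed directly. A small sketch; the helper names `elapsed_seconds()`, `pct_change()` and `report()` are illustrative only, and the inputs are the before/after figures quoted above.

```c
#include <stdio.h>

/* GNU time prints elapsed wall-clock time as M:SS.ss for runs under an hour. */
static double elapsed_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Relative change from before to after in percent; negative means a reduction. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print the relative change of the fields reported in the commit message. */
static void report(void)
{
	double before_elapsed = elapsed_seconds(39, 42.90);
	double after_elapsed = elapsed_seconds(38, 11.77);

	printf("elapsed: %+.1f%%\n", pct_change(before_elapsed, after_elapsed));
	printf("system:  %+.1f%%\n", pct_change(29.11, 23.81));
	printf("user:    %+.1f%%\n", pct_change(114.44, 113.90));
	printf("inputs:  %+.1f%%\n", pct_change(2901232.0, 2548728.0));
}
```

Applied to those numbers, elapsed time drops by roughly 3.8% (2382.90 s to 2291.77 s), system time by roughly 18.2%, and file system inputs by roughly 12%, while user time is essentially unchanged.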