From patchwork Wed Oct 25 14:45:44 2023
X-Patchwork-Submitter: Ryan Roberts <ryan.roberts@arm.com>
X-Patchwork-Id: 13436307
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton, David Hildenbrand, Matthew Wilcox, Huang Ying,
 Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko, Kefeng Wang
Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH v3 2/4] mm: swap: Remove struct percpu_cluster
Date: Wed, 25 Oct 2023 15:45:44 +0100
Message-Id: <20231025144546.577640-3-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20231025144546.577640-1-ryan.roberts@arm.com>
References: <20231025144546.577640-1-ryan.roberts@arm.com>
MIME-Version: 1.0

struct percpu_cluster stores the index of a cpu's current cluster and the
offset of the next entry that will be allocated for that cpu. These two
pieces of information are redundant because the cluster index is just
(offset / SWAPFILE_CLUSTER). The only reason for explicitly keeping the
cluster index is that the structure used for it also has a flag to indicate
"no cluster". However, this data structure also contains a spinlock, which is
never used in this context; as a side effect the code copies the spinlock_t
structure, which is questionable coding practice in my view.

So let's clean this up and store only the next offset, and use a sentinel
value (SWAP_NEXT_NULL) to indicate "no cluster". SWAP_NEXT_NULL is chosen to
be 0, because 0 will never be seen legitimately; the first page in the swap
file is the swap header, which is always marked bad to prevent it from being
allocated as an entry. This also prevents the cluster to which it belongs
from being marked free, so it will never appear on the free list.

This change saves 16 bytes per cpu. And given we are shortly going to extend
this mechanism to be per-cpu-AND-per-order, we will end up saving 16 * 9 =
144 bytes per cpu, which adds up if you have 256 cpus in the system.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
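For illustration only, a minimal userspace sketch of the encoding this relies
on; this is not kernel code. SWAPFILE_CLUSTER mirrors the kernel's value for
4K pages, but the example offsets are made up:

/* Sketch: why (index, next) was redundant and why 0 is a safe sentinel. */
#include <assert.h>
#include <stdio.h>

#define SWAPFILE_CLUSTER 256u	/* entries per cluster; kernel value for 4K pages */
#define SWAP_NEXT_NULL	 0u	/* offset 0 is the swap header page, never allocated */

int main(void)
{
	/* Hypothetical per-cpu next-allocation offset inside cluster 3. */
	unsigned int cpu_next = 3 * SWAPFILE_CLUSTER + 17;

	/*
	 * The old struct percpu_cluster stored the cluster index
	 * separately, but it is always derivable from the offset:
	 */
	assert(cpu_next / SWAPFILE_CLUSTER == 3);

	/*
	 * Offset 0 can never be handed out (the kernel marks the header
	 * page bad at swapon time: swap_map[0] = SWAP_MAP_BAD, and its
	 * cluster never reaches the free list), so 0 unambiguously
	 * means "no current cluster":
	 */
	cpu_next = SWAP_NEXT_NULL;
	if (cpu_next == SWAP_NEXT_NULL)
		printf("no cluster assigned; allocate from free_clusters\n");

	return 0;
}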
 include/linux/swap.h | 21 +++++++++++++--------
 mm/swapfile.c        | 43 +++++++++++++++++++------------------------
 2 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a073366a227c..0ca8aaa098ba 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -261,14 +261,12 @@ struct swap_cluster_info {
 #define CLUSTER_FLAG_NEXT_NULL 2 /* This cluster has no next cluster */
 
 /*
- * We assign a cluster to each CPU, so each CPU can allocate swap entry from
- * its own cluster and swapout sequentially. The purpose is to optimize swapout
- * throughput.
+ * The first page in the swap file is the swap header, which is always marked
+ * bad to prevent it from being allocated as an entry. This also prevents the
+ * cluster to which it belongs being marked free. Therefore 0 is safe to use as
+ * a sentinel to indicate cpu_next is not valid in swap_info_struct.
  */
-struct percpu_cluster {
-	struct swap_cluster_info index; /* Current cluster index */
-	unsigned int next; /* Likely next allocation offset */
-};
+#define SWAP_NEXT_NULL	0
 
 struct swap_cluster_list {
 	struct swap_cluster_info head;
@@ -295,7 +293,14 @@ struct swap_info_struct {
 	unsigned int cluster_next;	/* likely index for next allocation */
 	unsigned int cluster_nr;	/* countdown to next cluster search */
 	unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */
-	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
+	unsigned int __percpu *cpu_next;/*
+					 * Likely next allocation offset. We
+					 * assign a cluster to each CPU, so each
+					 * CPU can allocate swap entry from its
+					 * own cluster and swapout sequentially.
+					 * The purpose is to optimize swapout
+					 * throughput.
+					 */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
 	struct file *swap_file;		/* seldom referenced */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index b83ad77e04c0..617e34b8cdbe 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -591,7 +591,6 @@ static bool
 scan_swap_map_ssd_cluster_conflict(struct swap_info_struct *si,
 	unsigned long offset)
 {
-	struct percpu_cluster *percpu_cluster;
 	bool conflict;
 
 	offset /= SWAPFILE_CLUSTER;
@@ -602,8 +601,7 @@ scan_swap_map_ssd_cluster_conflict(struct swap_info_struct *si,
 	if (!conflict)
 		return false;
 
-	percpu_cluster = this_cpu_ptr(si->percpu_cluster);
-	cluster_set_null(&percpu_cluster->index);
+	*this_cpu_ptr(si->cpu_next) = SWAP_NEXT_NULL;
 	return true;
 }
 
@@ -614,16 +612,16 @@ scan_swap_map_ssd_cluster_conflict(struct swap_info_struct *si,
 static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
 	unsigned long *offset, unsigned long *scan_base)
 {
-	struct percpu_cluster *cluster;
 	struct swap_cluster_info *ci;
-	unsigned long tmp, max;
+	unsigned int tmp, max;
+	unsigned int *cpu_next;
 
 new_cluster:
-	cluster = this_cpu_ptr(si->percpu_cluster);
-	if (cluster_is_null(&cluster->index)) {
+	cpu_next = this_cpu_ptr(si->cpu_next);
+	tmp = *cpu_next;
+	if (tmp == SWAP_NEXT_NULL) {
 		if (!cluster_list_empty(&si->free_clusters)) {
-			cluster->index = si->free_clusters.head;
-			cluster->next = cluster_next(&cluster->index) *
+			tmp = cluster_next(&si->free_clusters.head) *
 					SWAPFILE_CLUSTER;
 		} else if (!cluster_list_empty(&si->discard_clusters)) {
 			/*
@@ -643,9 +641,8 @@ static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
 	 * Other CPUs can use our cluster if they can't find a free cluster,
 	 * check if there is still free entry in the cluster
 	 */
-	tmp = cluster->next;
 	max = min_t(unsigned long, si->max,
-		    (cluster_next(&cluster->index) + 1) * SWAPFILE_CLUSTER);
+		    ALIGN_DOWN(tmp, SWAPFILE_CLUSTER) + SWAPFILE_CLUSTER);
 	if (tmp < max) {
 		ci = lock_cluster(si, tmp);
 		while (tmp < max) {
@@ -656,12 +653,13 @@ static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
 		unlock_cluster(ci);
 	}
 	if (tmp >= max) {
-		cluster_set_null(&cluster->index);
+		*cpu_next = SWAP_NEXT_NULL;
 		goto new_cluster;
 	}
-	cluster->next = tmp + 1;
 	*offset = tmp;
 	*scan_base = tmp;
+	tmp += 1;
+	*cpu_next = tmp < max ? tmp : SWAP_NEXT_NULL;
 	return true;
 }
 
@@ -2488,8 +2486,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	arch_swap_invalidate_area(p->type);
 	zswap_swapoff(p->type);
 	mutex_unlock(&swapon_mutex);
-	free_percpu(p->percpu_cluster);
-	p->percpu_cluster = NULL;
+	free_percpu(p->cpu_next);
+	p->cpu_next = NULL;
 	free_percpu(p->cluster_next_cpu);
 	p->cluster_next_cpu = NULL;
 	vfree(swap_map);
@@ -3073,16 +3071,13 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		for (ci = 0; ci < nr_cluster; ci++)
 			spin_lock_init(&((cluster_info + ci)->lock));
 
-		p->percpu_cluster = alloc_percpu(struct percpu_cluster);
-		if (!p->percpu_cluster) {
+		p->cpu_next = alloc_percpu(unsigned int);
+		if (!p->cpu_next) {
 			error = -ENOMEM;
 			goto bad_swap_unlock_inode;
 		}
-		for_each_possible_cpu(cpu) {
-			struct percpu_cluster *cluster;
-			cluster = per_cpu_ptr(p->percpu_cluster, cpu);
-			cluster_set_null(&cluster->index);
-		}
+		for_each_possible_cpu(cpu)
+			per_cpu(*p->cpu_next, cpu) = SWAP_NEXT_NULL;
 	} else {
 		atomic_inc(&nr_rotate_swap);
 		inced_nr_rotate_swap = true;
@@ -3171,8 +3166,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 bad_swap_unlock_inode:
 	inode_unlock(inode);
 bad_swap:
-	free_percpu(p->percpu_cluster);
-	p->percpu_cluster = NULL;
+	free_percpu(p->cpu_next);
+	p->cpu_next = NULL;
 	free_percpu(p->cluster_next_cpu);
 	p->cluster_next_cpu = NULL;
 	if (inode && S_ISBLK(inode->i_mode) && p->bdev) {
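
A note on the new bound in scan_swap_map_try_ssd_cluster(): since tmp always
lies within the cpu's current cluster, ALIGN_DOWN(tmp, SWAPFILE_CLUSTER) +
SWAPFILE_CLUSTER is the same end-of-cluster bound the old code derived from
the stored cluster index. A minimal userspace check of that equivalence
(ALIGN_DOWN is modelled with plain division, which gives the same result as
the kernel macro for these power-of-2 operands; the cluster index is an
arbitrary example value):

#include <assert.h>
#include <stdio.h>

#define SWAPFILE_CLUSTER 256u
/* Same result as the kernel's ALIGN_DOWN() for these operands. */
#define ALIGN_DOWN(x, a) ((x) / (a) * (a))

int main(void)
{
	unsigned int idx = 3;	/* arbitrary example cluster index */
	unsigned int old_max = (idx + 1) * SWAPFILE_CLUSTER;
	unsigned int tmp;

	/*
	 * For every offset inside the cluster, the new expression gives
	 * the same end-of-cluster bound as the old one.
	 */
	for (tmp = idx * SWAPFILE_CLUSTER; tmp < old_max; tmp++)
		assert(ALIGN_DOWN(tmp, SWAPFILE_CLUSTER) + SWAPFILE_CLUSTER
		       == old_max);

	printf("bounds agree for all %u offsets in cluster %u\n",
	       SWAPFILE_CLUSTER, idx);
	return 0;
}

The sketch only checks the cluster arithmetic; in the real code the min_t()
against si->max still caps the bound for a final, partial cluster.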