[v4,3/6] mm: swap: Simplify struct percpu_cluster

Message ID 20240311150058.1122862-4-ryan.roberts@arm.com (mailing list archive)
State New
Series: Swap-out mTHP without splitting

Commit Message

Ryan Roberts March 11, 2024, 3 p.m. UTC
struct percpu_cluster stores the index of the cpu's current cluster and
the offset of the next entry that will be allocated for the cpu. These
two pieces of information are redundant because the cluster index is just
(offset / SWAPFILE_CLUSTER). The only reason for explicitly keeping the
cluster index is that the structure used for it also has a flag to
indicate "no cluster". However, this data structure also contains a
spinlock, which is never used in this context; as a side effect, the code
copies the spinlock_t structure, which is questionable coding practice in
my view.
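
For illustration, here is a minimal userspace sketch of the redundancy
(stub types standing in for the kernel's; SWAPFILE_CLUSTER is 256 here,
its value when CONFIG_THP_SWAP is not set):

#include <assert.h>

#define SWAPFILE_CLUSTER 256	/* kernel value when CONFIG_THP_SWAP=n */

struct swap_cluster_info_stub {	/* stand-in for struct swap_cluster_info */
	int lock;		/* spinlock_t in the kernel; unused here, yet copied */
	unsigned int data;	/* cluster index or free-entry count */
	unsigned int flags;	/* CLUSTER_FLAG_NEXT_NULL encodes "no cluster" */
};

struct percpu_cluster_old {	/* before: index duplicates next */
	struct swap_cluster_info_stub index;
	unsigned int next;
};

struct percpu_cluster_new {	/* after: next alone is enough */
	unsigned int next;
};

int main(void)
{
	unsigned int next = 300;		/* an offset inside cluster 1 */
	assert(next / SWAPFILE_CLUSTER == 1);	/* index is derivable from next */
	return 0;
}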

So let's clean this up and store only the next offset, using a sentinel
value (SWAP_NEXT_INVALID) to indicate "no cluster". SWAP_NEXT_INVALID is
chosen to be 0 because 0 will never be seen legitimately: the first page
in the swap file is the swap header, which is always marked bad to
prevent it from being allocated as an entry. This also prevents the
cluster to which it belongs from being marked free, so it will never
appear on the free list.
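
The effect on the "no cluster" test itself is a one-line simplification
(condensed from the mm/swapfile.c hunks below):

/* Before: "no cluster" is a flag buried inside swap_cluster_info */
if (cluster_is_null(&cluster->index)) {
	/* ... pick a cluster from si->free_clusters ... */
}

/* After: a plain compare against the sentinel offset */
if (cluster->next == SWAP_NEXT_INVALID) {
	/* ... pick a cluster from si->free_clusters ... */
}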

This change saves 16 bytes per cpu. And given we are shortly going to
extend this mechanism to be per-cpu-AND-per-order, we will end up saving
16 * 9 = 144 bytes per cpu, which adds up if you have 256 cpus in the
system.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 include/linux/swap.h |  9 ++++++++-
 mm/swapfile.c        | 22 +++++++++++-----------
 2 files changed, 19 insertions(+), 12 deletions(-)

Comments

Huang, Ying March 12, 2024, 7:52 a.m. UTC | #1
Ryan Roberts <ryan.roberts@arm.com> writes:

> [...]

LGTM, Thanks!

--
Best Regards,
Huang, Ying
Ryan Roberts March 12, 2024, 8:51 a.m. UTC | #2
On 12/03/2024 07:52, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
> 
>> [...]
> 
> LGTM, Thanks!

Thanks! What's a guy got to do to get Rb or Ack? :)

> 
> --
> Best Regards,
> Huang, Ying
>
Huang, Ying March 13, 2024, 1:34 a.m. UTC | #3
Ryan Roberts <ryan.roberts@arm.com> writes:

> On 12/03/2024 07:52, Huang, Ying wrote:
>> Ryan Roberts <ryan.roberts@arm.com> writes:
>> 
>>> [...]
>> 
>> LGTM, Thanks!
>
> Thanks! What's a guy got to do to get Rb or Ack? :)

Feel free to add

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

in the next version.

--
Best Regards,
Huang, Ying

Patch

diff --git a/include/linux/swap.h b/include/linux/swap.h
index f2b7f204b968..0cb082bee717 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -260,13 +260,20 @@  struct swap_cluster_info {
 #define CLUSTER_FLAG_FREE 1 /* This cluster is free */
 #define CLUSTER_FLAG_NEXT_NULL 2 /* This cluster has no next cluster */
 
+/*
+ * The first page in the swap file is the swap header, which is always marked
+ * bad to prevent it from being allocated as an entry. This also prevents the
+ * cluster to which it belongs being marked free. Therefore 0 is safe to use as
+ * a sentinel to indicate next is not valid in percpu_cluster.
+ */
+#define SWAP_NEXT_INVALID	0
+
 /*
  * We assign a cluster to each CPU, so each CPU can allocate swap entry from
  * its own cluster and swapout sequentially. The purpose is to optimize swapout
  * throughput.
  */
 struct percpu_cluster {
-	struct swap_cluster_info index; /* Current cluster index */
 	unsigned int next; /* Likely next allocation offset */
 };
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ee7e44cb40c5..3828d81aa6b8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -609,7 +609,7 @@  scan_swap_map_ssd_cluster_conflict(struct swap_info_struct *si,
 		return false;
 
 	percpu_cluster = this_cpu_ptr(si->percpu_cluster);
-	cluster_set_null(&percpu_cluster->index);
+	percpu_cluster->next = SWAP_NEXT_INVALID;
 	return true;
 }
 
@@ -622,14 +622,14 @@  static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
 {
 	struct percpu_cluster *cluster;
 	struct swap_cluster_info *ci;
-	unsigned long tmp, max;
+	unsigned int tmp, max;
 
 new_cluster:
 	cluster = this_cpu_ptr(si->percpu_cluster);
-	if (cluster_is_null(&cluster->index)) {
+	tmp = cluster->next;
+	if (tmp == SWAP_NEXT_INVALID) {
 		if (!cluster_list_empty(&si->free_clusters)) {
-			cluster->index = si->free_clusters.head;
-			cluster->next = cluster_next(&cluster->index) *
+			tmp = cluster_next(&si->free_clusters.head) *
 					SWAPFILE_CLUSTER;
 		} else if (!cluster_list_empty(&si->discard_clusters)) {
 			/*
@@ -649,9 +649,7 @@  static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
 	 * Other CPUs can use our cluster if they can't find a free cluster,
 	 * check if there is still free entry in the cluster
 	 */
-	tmp = cluster->next;
-	max = min_t(unsigned long, si->max,
-		    (cluster_next(&cluster->index) + 1) * SWAPFILE_CLUSTER);
+	max = min_t(unsigned long, si->max, ALIGN(tmp + 1, SWAPFILE_CLUSTER));
 	if (tmp < max) {
 		ci = lock_cluster(si, tmp);
 		while (tmp < max) {
@@ -662,12 +660,13 @@  static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
 		unlock_cluster(ci);
 	}
 	if (tmp >= max) {
-		cluster_set_null(&cluster->index);
+		cluster->next = SWAP_NEXT_INVALID;
 		goto new_cluster;
 	}
-	cluster->next = tmp + 1;
 	*offset = tmp;
 	*scan_base = tmp;
+	tmp += 1;
+	cluster->next = tmp < max ? tmp : SWAP_NEXT_INVALID;
 	return true;
 }
 
@@ -3138,8 +3137,9 @@  SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		}
 		for_each_possible_cpu(cpu) {
 			struct percpu_cluster *cluster;
+
 			cluster = per_cpu_ptr(p->percpu_cluster, cpu);
-			cluster_set_null(&cluster->index);
+			cluster->next = SWAP_NEXT_INVALID;
 		}
 	} else {
 		atomic_inc(&nr_rotate_swap);
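
One subtlety in the scan_swap_map_try_ssd_cluster() hunks above is worth
spelling out: with the stored cluster index gone, the end of the current
cluster is recomputed from the offset alone. A worked check of the
equivalence (taking SWAPFILE_CLUSTER = 256 for illustration):

/*
 * Old bound: (cluster_next(&cluster->index) + 1) * SWAPFILE_CLUSTER
 * New bound: ALIGN(tmp + 1, SWAPFILE_CLUSTER)
 *
 * If tmp lies in cluster c, i.e. c*S <= tmp < (c+1)*S with
 * S = SWAPFILE_CLUSTER, then tmp + 1 lies in (c*S, (c+1)*S], and
 * rounding up to a multiple of S yields exactly (c+1)*S -- the same
 * bound the old code derived from the stored cluster index.
 *
 * Example: S = 256, tmp = 300 (cluster 1):
 *   old: (300/256 + 1) * 256 = 2 * 256 = 512
 *   new: ALIGN(300 + 1, 256) = ALIGN(301, 256) = 512
 */

Note also that the final assignment (cluster->next = tmp < max ? tmp :
SWAP_NEXT_INVALID) now invalidates the per-cpu state eagerly when the
incremented offset would step past the cluster, rather than leaving a
stale next to be rejected on the following call.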