From patchwork Mon Jan 13 17:57:22 2025
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13937869
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins,
	Yosry Ahmed, "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner,
	Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 03/13] mm, swap: remove old allocation path for HDD
Date: Tue, 14 Jan 2025 01:57:22 +0800
Message-ID: <20250113175732.48099-4-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250113175732.48099-1-ryncsn@gmail.com>
References: <20250113175732.48099-1-ryncsn@gmail.com>
From: Kairui Song

We are currently using two different swap allocation algorithms, one for
HDD and one for non-HDD devices. This leads to the existence of a
separate set of locks, and the code path is heavily bloated, causing
difficulties for further optimization and maintenance.

This commit removes all HDD swap allocation and the related dead code,
and uses the cluster allocation algorithm instead. Performance may drop
temporarily, but this should be negligible: the main advantage of the
legacy HDD allocation algorithm is that it tends to use contiguous
slots, but a swap device gets fragmented quickly anyway, so attempts to
use contiguous slots fail easily.

This commit also enables mTHP swap on HDD, which is expected to be
beneficial, and following commits will adapt and optimize the cluster
allocator for HDD.
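
To illustrate the result (a simplified sketch only, not the literal
post-patch code; the order-related sanity checks stay as they are and
are elided here), the allocation entry point now hands every request,
HDD included, straight to the cluster allocator:

	static int scan_swap_map_slots(struct swap_info_struct *si,
				       unsigned char usage, int nr,
				       swp_entry_t slots[], int order)
	{
		/* ... unchanged order > 0 sanity checks ... */

		/* Every device type now goes through the cluster allocator. */
		return cluster_alloc_swap(si, usage, nr, slots, order);
	}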
Suggested-by: Chris Li
Suggested-by: "Huang, Ying"
Signed-off-by: Kairui Song
Reviewed-by: Baoquan He
---
 include/linux/swap.h |   3 -
 mm/swapfile.c        | 235 ++-----------------------------------------
 2 files changed, 9 insertions(+), 229 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 187715eec3cb..0c681aa5cb98 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -310,9 +310,6 @@ struct swap_info_struct {
 	unsigned int highest_bit;	/* index of last free in swap_map */
 	unsigned int pages;		/* total of usable pages of swap */
 	unsigned int inuse_pages;	/* number of those currently in use */
-	unsigned int cluster_next;	/* likely index for next allocation */
-	unsigned int cluster_nr;	/* countdown to next cluster search */
-	unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 574059158627..fca58d43b836 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1001,49 +1001,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
 }
 
-static void set_cluster_next(struct swap_info_struct *si, unsigned long next)
-{
-	unsigned long prev;
-
-	if (!(si->flags & SWP_SOLIDSTATE)) {
-		si->cluster_next = next;
-		return;
-	}
-
-	prev = this_cpu_read(*si->cluster_next_cpu);
-	/*
-	 * Cross the swap address space size aligned trunk, choose
-	 * another trunk randomly to avoid lock contention on swap
-	 * address space if possible.
-	 */
-	if ((prev >> SWAP_ADDRESS_SPACE_SHIFT) !=
-	    (next >> SWAP_ADDRESS_SPACE_SHIFT)) {
-		/* No free swap slots available */
-		if (si->highest_bit <= si->lowest_bit)
-			return;
-		next = get_random_u32_inclusive(si->lowest_bit, si->highest_bit);
-		next = ALIGN_DOWN(next, SWAP_ADDRESS_SPACE_PAGES);
-		next = max_t(unsigned int, next, si->lowest_bit);
-	}
-	this_cpu_write(*si->cluster_next_cpu, next);
-}
-
-static bool swap_offset_available_and_locked(struct swap_info_struct *si,
-					     unsigned long offset)
-{
-	if (data_race(!si->swap_map[offset])) {
-		spin_lock(&si->lock);
-		return true;
-	}
-
-	if (vm_swap_full() && READ_ONCE(si->swap_map[offset]) == SWAP_HAS_CACHE) {
-		spin_lock(&si->lock);
-		return true;
-	}
-
-	return false;
-}
-
 static int cluster_alloc_swap(struct swap_info_struct *si,
 			     unsigned char usage, int nr,
 			     swp_entry_t slots[], int order)
@@ -1071,13 +1028,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
 			       unsigned char usage, int nr,
 			       swp_entry_t slots[], int order)
 {
-	unsigned long offset;
-	unsigned long scan_base;
-	unsigned long last_in_cluster = 0;
-	int latency_ration = LATENCY_LIMIT;
 	unsigned int nr_pages = 1 << order;
-	int n_ret = 0;
-	bool scanned_many = false;
 
 	/*
 	 * We try to cluster swap pages by allocating them sequentially
@@ -1089,7 +1040,6 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
 	 * But we do now try to find an empty cluster. -Andrea
 	 * And we let swap pages go all over an SSD partition. Hugh
 	 */
-
 	if (order > 0) {
 		/*
 		 * Should not even be attempting large allocations when huge
@@ -1109,158 +1059,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
 		return 0;
 	}
 
-	if (si->cluster_info)
-		return cluster_alloc_swap(si, usage, nr, slots, order);
-
-	si->flags += SWP_SCANNING;
-
-	/* For HDD, sequential access is more important.
-	 */
-	scan_base = si->cluster_next;
-	offset = scan_base;
-
-	if (unlikely(!si->cluster_nr--)) {
-		if (si->pages - si->inuse_pages < SWAPFILE_CLUSTER) {
-			si->cluster_nr = SWAPFILE_CLUSTER - 1;
-			goto checks;
-		}
-
-		spin_unlock(&si->lock);
-
-		/*
-		 * If seek is expensive, start searching for new cluster from
-		 * start of partition, to minimize the span of allocated swap.
-		 */
-		scan_base = offset = si->lowest_bit;
-		last_in_cluster = offset + SWAPFILE_CLUSTER - 1;
-
-		/* Locate the first empty (unaligned) cluster */
-		for (; last_in_cluster <= READ_ONCE(si->highest_bit); offset++) {
-			if (si->swap_map[offset])
-				last_in_cluster = offset + SWAPFILE_CLUSTER;
-			else if (offset == last_in_cluster) {
-				spin_lock(&si->lock);
-				offset -= SWAPFILE_CLUSTER - 1;
-				si->cluster_next = offset;
-				si->cluster_nr = SWAPFILE_CLUSTER - 1;
-				goto checks;
-			}
-			if (unlikely(--latency_ration < 0)) {
-				cond_resched();
-				latency_ration = LATENCY_LIMIT;
-			}
-		}
-
-		offset = scan_base;
-		spin_lock(&si->lock);
-		si->cluster_nr = SWAPFILE_CLUSTER - 1;
-	}
-
-checks:
-	if (!(si->flags & SWP_WRITEOK))
-		goto no_page;
-	if (!si->highest_bit)
-		goto no_page;
-	if (offset > si->highest_bit)
-		scan_base = offset = si->lowest_bit;
-
-	/* reuse swap entry of cache-only swap if not busy. */
-	if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
-		int swap_was_freed;
-		spin_unlock(&si->lock);
-		swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT);
-		spin_lock(&si->lock);
-		/* entry was freed successfully, try to use this again */
-		if (swap_was_freed > 0)
-			goto checks;
-		goto scan; /* check next one */
-	}
-
-	if (si->swap_map[offset]) {
-		if (!n_ret)
-			goto scan;
-		else
-			goto done;
-	}
-	memset(si->swap_map + offset, usage, nr_pages);
-
-	swap_range_alloc(si, offset, nr_pages);
-	slots[n_ret++] = swp_entry(si->type, offset);
-
-	/* got enough slots or reach max slots? */
-	if ((n_ret == nr) || (offset >= si->highest_bit))
-		goto done;
-
-	/* search for next available slot */
-
-	/* time to take a break? */
-	if (unlikely(--latency_ration < 0)) {
-		if (n_ret)
-			goto done;
-		spin_unlock(&si->lock);
-		cond_resched();
-		spin_lock(&si->lock);
-		latency_ration = LATENCY_LIMIT;
-	}
-
-	if (si->cluster_nr && !si->swap_map[++offset]) {
-		/* non-ssd case, still more slots in cluster? */
-		--si->cluster_nr;
-		goto checks;
-	}
-
-	/*
-	 * Even if there's no free clusters available (fragmented),
-	 * try to scan a little more quickly with lock held unless we
-	 * have scanned too many slots already.
-	 */
-	if (!scanned_many) {
-		unsigned long scan_limit;
-
-		if (offset < scan_base)
-			scan_limit = scan_base;
-		else
-			scan_limit = si->highest_bit;
-		for (; offset <= scan_limit && --latency_ration > 0;
-		     offset++) {
-			if (!si->swap_map[offset])
-				goto checks;
-		}
-	}
-
-done:
-	if (order == 0)
-		set_cluster_next(si, offset + 1);
-	si->flags -= SWP_SCANNING;
-	return n_ret;
-
-scan:
-	VM_WARN_ON(order > 0);
-	spin_unlock(&si->lock);
-	while (++offset <= READ_ONCE(si->highest_bit)) {
-		if (unlikely(--latency_ration < 0)) {
-			cond_resched();
-			latency_ration = LATENCY_LIMIT;
-			scanned_many = true;
-		}
-		if (swap_offset_available_and_locked(si, offset))
-			goto checks;
-	}
-	offset = si->lowest_bit;
-	while (offset < scan_base) {
-		if (unlikely(--latency_ration < 0)) {
-			cond_resched();
-			latency_ration = LATENCY_LIMIT;
-			scanned_many = true;
-		}
-		if (swap_offset_available_and_locked(si, offset))
-			goto checks;
-		offset++;
-	}
-	spin_lock(&si->lock);
-
-no_page:
-	si->flags -= SWP_SCANNING;
-	return n_ret;
+	return cluster_alloc_swap(si, usage, nr, slots, order);
 }
 
 int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
@@ -2871,8 +2670,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	mutex_unlock(&swapon_mutex);
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
-	free_percpu(p->cluster_next_cpu);
-	p->cluster_next_cpu = NULL;
 	vfree(swap_map);
 	kvfree(zeromap);
 	kvfree(cluster_info);
@@ -3184,8 +2981,6 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
 	}
 
 	si->lowest_bit = 1;
-	si->cluster_next = 1;
-	si->cluster_nr = 0;
 
 	maxpages = swapfile_maximum_size;
 	last_page = swap_header->info.last_page;
@@ -3271,7 +3066,6 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 						unsigned long maxpages)
 {
 	unsigned long nr_clusters = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
-	unsigned long col = si->cluster_next / SWAPFILE_CLUSTER % SWAP_CLUSTER_COLS;
 	struct swap_cluster_info *cluster_info;
 	unsigned long i, j, k, idx;
 	int cpu, err = -ENOMEM;
@@ -3283,15 +3077,6 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	for (i = 0; i < nr_clusters; i++)
 		spin_lock_init(&cluster_info[i].lock);
 
-	si->cluster_next_cpu = alloc_percpu(unsigned int);
-	if (!si->cluster_next_cpu)
-		goto err_free;
-
-	/* Random start position to help with wear leveling */
-	for_each_possible_cpu(cpu)
-		per_cpu(*si->cluster_next_cpu, cpu) =
-		get_random_u32_inclusive(1, si->highest_bit);
-
 	si->percpu_cluster = alloc_percpu(struct percpu_cluster);
 	if (!si->percpu_cluster)
 		goto err_free;
@@ -3333,7 +3118,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	 * sharing same address space.
 	 */
 	for (k = 0; k < SWAP_CLUSTER_COLS; k++) {
-		j = (k + col) % SWAP_CLUSTER_COLS;
+		j = k % SWAP_CLUSTER_COLS;
 		for (i = 0; i < DIV_ROUND_UP(nr_clusters, SWAP_CLUSTER_COLS); i++) {
 			struct swap_cluster_info *ci;
 			idx = i * SWAP_CLUSTER_COLS + j;
@@ -3483,18 +3268,18 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 
 	if (si->bdev && bdev_nonrot(si->bdev)) {
 		si->flags |= SWP_SOLIDSTATE;
-
-		cluster_info = setup_clusters(si, swap_header, maxpages);
-		if (IS_ERR(cluster_info)) {
-			error = PTR_ERR(cluster_info);
-			cluster_info = NULL;
-			goto bad_swap_unlock_inode;
-		}
 	} else {
 		atomic_inc(&nr_rotate_swap);
 		inced_nr_rotate_swap = true;
 	}
 
+	cluster_info = setup_clusters(si, swap_header, maxpages);
+	if (IS_ERR(cluster_info)) {
+		error = PTR_ERR(cluster_info);
+		cluster_info = NULL;
+		goto bad_swap_unlock_inode;
+	}
+
 	if ((swap_flags & SWAP_FLAG_DISCARD) && si->bdev &&
 	    bdev_max_discard_sectors(si->bdev)) {
 		/*
@@ -3575,8 +3360,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 bad_swap:
 	free_percpu(si->percpu_cluster);
 	si->percpu_cluster = NULL;
-	free_percpu(si->cluster_next_cpu);
-	si->cluster_next_cpu = NULL;
 	inode = NULL;
 	destroy_swap_extents(si);
 	swap_cgroup_swapoff(si->type);