From patchwork Fri Feb 14 17:57:08 2025
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13975442
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
 "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Kalesh Singh,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 6/7] mm, swap: remove swap slot cache
Date: Sat, 15 Feb 2025 01:57:08 +0800
Message-ID: <20250214175709.76029-7-ryncsn@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250214175709.76029-1-ryncsn@gmail.com>
References: <20250214175709.76029-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
From: Kairui Song

The slot cache is no longer needed; remove it and all related code.

- vm-scalability with: `usemem --init-time -O -y -x -R -31 1G`,
  12G memory cgroup using simulated pmem as SWAP (32G pmem, 32 CPUs),
  16 test runs for each case, measuring the total throughput:

                        Before (KB/s) (stdev)    After (KB/s) (stdev)
  Random (4K):          424907.60 (24410.78)     414745.92 (34554.78)
  Random (64K):         163308.82 (11635.72)     167314.50 (18434.99)
  Sequential (4K, !-R): 6150056.79 (103205.90)   6321469.06 (115878.16)

  The performance changes are below noise level.

- Build the Linux kernel with make -j96, using 4K folios with a 1.5G
  memory cgroup limit and 64K folios with a 2G memory cgroup limit, on
  top of tmpfs, 12 test runs, measuring the system time:

                   Before (s) (stdev)   After (s) (stdev)
  make -j96 (4K):  6445.69 (61.95)      6408.80 (69.46)
  make -j96 (64K): 6841.71 (409.04)     6437.99 (435.55)

  Similar to above, the 64K mTHP case showed a slight improvement.

Signed-off-by: Kairui Song
---
 include/linux/swap.h       |   2 -
 include/linux/swap_slots.h |  28 ----
 mm/Makefile                |   2 +-
 mm/swap_slots.c            | 295 -------------------------------------
 mm/swap_state.c            |   8 +-
 mm/swapfile.c              | 173 ++++++++--------------
 6 files changed, 64 insertions(+), 444 deletions(-)
 delete mode 100644 include/linux/swap_slots.h
 delete mode 100644 mm/swap_slots.c

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a8d84f22357e..456833705ea0 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -465,7 +465,6 @@ void free_pages_and_swap_cache(struct encoded_page **, int);
 extern atomic_long_t nr_swap_pages;
 extern long total_swap_pages;
 extern atomic_t nr_rotate_swap;
-extern bool has_usable_swap(void);
 
 /* Swap 50% full? Release swapcache more aggressively.. */
 static inline bool vm_swap_full(void)
@@ -489,7 +488,6 @@ extern void swap_shmem_alloc(swp_entry_t, int);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t entry, int nr);
 extern void swap_free_nr(swp_entry_t entry, int nr_pages);
-extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern void free_swap_and_cache_nr(swp_entry_t entry, int nr);
 int swap_type_of(dev_t device, sector_t offset);
 int find_first_swap(dev_t *device);
diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h
deleted file mode 100644
index 840aec3523b2..000000000000
--- a/include/linux/swap_slots.h
+++ /dev/null
@@ -1,28 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_SWAP_SLOTS_H
-#define _LINUX_SWAP_SLOTS_H
-
-#include
-#include
-#include
-
-#define SWAP_SLOTS_CACHE_SIZE			SWAP_BATCH
-#define THRESHOLD_ACTIVATE_SWAP_SLOTS_CACHE	(5*SWAP_SLOTS_CACHE_SIZE)
-#define THRESHOLD_DEACTIVATE_SWAP_SLOTS_CACHE	(2*SWAP_SLOTS_CACHE_SIZE)
-
-struct swap_slots_cache {
-	bool		lock_initialized;
-	struct mutex	alloc_lock; /* protects slots, nr, cur */
-	swp_entry_t	*slots;
-	int		nr;
-	int		cur;
-	int		n_ret;
-};
-
-void disable_swap_slots_cache_lock(void);
-void reenable_swap_slots_cache_unlock(void);
-void enable_swap_slots_cache(void);
-
-extern bool swap_slot_cache_enabled;
-
-#endif /* _LINUX_SWAP_SLOTS_H */
diff --git a/mm/Makefile b/mm/Makefile
index 53392d2af3a5..ea16e472b294 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -75,7 +75,7 @@ ifdef CONFIG_MMU
 obj-$(CONFIG_ADVISE_SYSCALLS)	+= madvise.o
 endif
 
-obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o swap_slots.o
+obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o
 obj-$(CONFIG_ZSWAP)	+= zswap.o
 obj-$(CONFIG_HAS_DMA)	+= dmapool.o
 obj-$(CONFIG_HUGETLBFS)	+= hugetlb.o
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
deleted file mode 100644
index 9c7c171df7ba..000000000000
--- a/mm/swap_slots.c
+++ /dev/null
@@ -1,295 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Manage cache of swap slots to be used for and returned from
- * swap.
- *
- * Copyright(c) 2016 Intel Corporation.
- *
- * Author: Tim Chen
- *
- * We allocate the swap slots from the global pool and put
- * it into local per cpu caches. This has the advantage
- * of no needing to acquire the swap_info lock every time
- * we need a new slot.
- *
- * There is also opportunity to simply return the slot
- * to local caches without needing to acquire swap_info
- * lock. We do not reuse the returned slots directly but
- * move them back to the global pool in a batch. This
- * allows the slots to coalesce and reduce fragmentation.
- *
- * The swap entry allocated is marked with SWAP_HAS_CACHE
- * flag in map_count that prevents it from being allocated
- * again from the global pool.
- *
- * The swap slots cache is protected by a mutex instead of
- * a spin lock as when we search for slots with scan_swap_map,
- * we can possibly sleep.
- */
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-static DEFINE_PER_CPU(struct swap_slots_cache, swp_slots);
-static bool	swap_slot_cache_active;
-bool	swap_slot_cache_enabled;
-static bool	swap_slot_cache_initialized;
-static DEFINE_MUTEX(swap_slots_cache_mutex);
-/* Serialize swap slots cache enable/disable operations */
-static DEFINE_MUTEX(swap_slots_cache_enable_mutex);
-
-static void __drain_swap_slots_cache(void);
-
-#define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled)
-
-static void deactivate_swap_slots_cache(void)
-{
-	mutex_lock(&swap_slots_cache_mutex);
-	swap_slot_cache_active = false;
-	__drain_swap_slots_cache();
-	mutex_unlock(&swap_slots_cache_mutex);
-}
-
-static void reactivate_swap_slots_cache(void)
-{
-	mutex_lock(&swap_slots_cache_mutex);
-	swap_slot_cache_active = true;
-	mutex_unlock(&swap_slots_cache_mutex);
-}
-
-/* Must not be called with cpu hot plug lock */
-void disable_swap_slots_cache_lock(void)
-{
-	mutex_lock(&swap_slots_cache_enable_mutex);
-	swap_slot_cache_enabled = false;
-	if (swap_slot_cache_initialized) {
-		/* serialize with cpu hotplug operations */
-		cpus_read_lock();
-		__drain_swap_slots_cache();
-		cpus_read_unlock();
-	}
-}
-
-static void __reenable_swap_slots_cache(void)
-{
-	swap_slot_cache_enabled = has_usable_swap();
-}
-
-void reenable_swap_slots_cache_unlock(void)
-{
-	__reenable_swap_slots_cache();
-	mutex_unlock(&swap_slots_cache_enable_mutex);
-}
-
-static bool check_cache_active(void)
-{
-	long pages;
-
-	if (!swap_slot_cache_enabled)
-		return false;
-
-	pages = get_nr_swap_pages();
-	if (!swap_slot_cache_active) {
-		if (pages > num_online_cpus() *
-		    THRESHOLD_ACTIVATE_SWAP_SLOTS_CACHE)
-			reactivate_swap_slots_cache();
-		goto out;
-	}
-
-	/* if global pool of slot caches too low, deactivate cache */
-	if (pages < num_online_cpus() * THRESHOLD_DEACTIVATE_SWAP_SLOTS_CACHE)
-		deactivate_swap_slots_cache();
-out:
-	return swap_slot_cache_active;
-}
-
-static int alloc_swap_slot_cache(unsigned int cpu)
-{
-	struct swap_slots_cache *cache;
-	swp_entry_t *slots;
-
-	/*
-	 * Do allocation outside swap_slots_cache_mutex
-	 * as kvzalloc could trigger reclaim and folio_alloc_swap,
-	 * which can lock swap_slots_cache_mutex.
-	 */
-	slots = kvcalloc(SWAP_SLOTS_CACHE_SIZE, sizeof(swp_entry_t),
-			 GFP_KERNEL);
-	if (!slots)
-		return -ENOMEM;
-
-	mutex_lock(&swap_slots_cache_mutex);
-	cache = &per_cpu(swp_slots, cpu);
-	if (cache->slots) {
-		/* cache already allocated */
-		mutex_unlock(&swap_slots_cache_mutex);
-
-		kvfree(slots);
-
-		return 0;
-	}
-
-	if (!cache->lock_initialized) {
-		mutex_init(&cache->alloc_lock);
-		cache->lock_initialized = true;
-	}
-	cache->nr = 0;
-	cache->cur = 0;
-	cache->n_ret = 0;
-	/*
-	 * We initialized alloc_lock and free_lock earlier.  We use
-	 * !cache->slots or !cache->slots_ret to know if it is safe to acquire
-	 * the corresponding lock and use the cache.  Memory barrier below
-	 * ensures the assumption.
-	 */
-	mb();
-	cache->slots = slots;
-	mutex_unlock(&swap_slots_cache_mutex);
-	return 0;
-}
-
-static void drain_slots_cache_cpu(unsigned int cpu, bool free_slots)
-{
-	struct swap_slots_cache *cache;
-
-	cache = &per_cpu(swp_slots, cpu);
-	if (cache->slots) {
-		mutex_lock(&cache->alloc_lock);
-		swapcache_free_entries(cache->slots + cache->cur, cache->nr);
-		cache->cur = 0;
-		cache->nr = 0;
-		if (free_slots && cache->slots) {
-			kvfree(cache->slots);
-			cache->slots = NULL;
-		}
-		mutex_unlock(&cache->alloc_lock);
-	}
-}
-
-static void __drain_swap_slots_cache(void)
-{
-	unsigned int cpu;
-
-	/*
-	 * This function is called during
-	 *	1) swapoff, when we have to make sure no
-	 *	   left over slots are in cache when we remove
-	 *	   a swap device;
-	 *	2) disabling of swap slot cache, when we run low
-	 *	   on swap slots when allocating memory and need
-	 *	   to return swap slots to global pool.
-	 *
-	 * We cannot acquire cpu hot plug lock here as
-	 * this function can be invoked in the cpu
-	 * hot plug path:
-	 * cpu_up -> lock cpu_hotplug -> cpu hotplug state callback
-	 *   -> memory allocation -> direct reclaim -> folio_alloc_swap
-	 *   -> drain_swap_slots_cache
-	 *
-	 * Hence the loop over current online cpu below could miss cpu that
-	 * is being brought online but not yet marked as online.
-	 * That is okay as we do not schedule and run anything on a
-	 * cpu before it has been marked online. Hence, we will not
-	 * fill any swap slots in slots cache of such cpu.
-	 * There are no slots on such cpu that need to be drained.
-	 */
-	for_each_online_cpu(cpu)
-		drain_slots_cache_cpu(cpu, false);
-}
-
-static int free_slot_cache(unsigned int cpu)
-{
-	mutex_lock(&swap_slots_cache_mutex);
-	drain_slots_cache_cpu(cpu, true);
-	mutex_unlock(&swap_slots_cache_mutex);
-	return 0;
-}
-
-void enable_swap_slots_cache(void)
-{
-	mutex_lock(&swap_slots_cache_enable_mutex);
-	if (!swap_slot_cache_initialized) {
-		int ret;
-
-		ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "swap_slots_cache",
-					alloc_swap_slot_cache, free_slot_cache);
-		if (WARN_ONCE(ret < 0, "Cache allocation failed (%s), operating "
-				       "without swap slots cache.\n", __func__))
-			goto out_unlock;
-
-		swap_slot_cache_initialized = true;
-	}
-
-	__reenable_swap_slots_cache();
-out_unlock:
-	mutex_unlock(&swap_slots_cache_enable_mutex);
-}
-
-/* called with swap slot cache's alloc lock held */
-static int refill_swap_slots_cache(struct swap_slots_cache *cache)
-{
-	if (!use_swap_slot_cache)
-		return 0;
-
-	cache->cur = 0;
-	if (swap_slot_cache_active)
-		cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE,
-					   cache->slots, 0);
-
-	return cache->nr;
-}
-
-swp_entry_t folio_alloc_swap(struct folio *folio)
-{
-	swp_entry_t entry;
-	struct swap_slots_cache *cache;
-
-	entry.val = 0;
-
-	if (folio_test_large(folio)) {
-		if (IS_ENABLED(CONFIG_THP_SWAP))
-			get_swap_pages(1, &entry, folio_order(folio));
-		goto out;
-	}
-
-	/*
-	 * Preemption is allowed here, because we may sleep
-	 * in refill_swap_slots_cache().  But it is safe, because
-	 * accesses to the per-CPU data structure are protected by the
-	 * mutex cache->alloc_lock.
-	 *
-	 * The alloc path here does not touch cache->slots_ret
-	 * so cache->free_lock is not taken.
-	 */
-	cache = raw_cpu_ptr(&swp_slots);
-
-	if (likely(check_cache_active() && cache->slots)) {
-		mutex_lock(&cache->alloc_lock);
-		if (cache->slots) {
-repeat:
-			if (cache->nr) {
-				entry = cache->slots[cache->cur];
-				cache->slots[cache->cur++].val = 0;
-				cache->nr--;
-			} else if (refill_swap_slots_cache(cache)) {
-				goto repeat;
-			}
-		}
-		mutex_unlock(&cache->alloc_lock);
-		if (entry.val)
-			goto out;
-	}
-
-	get_swap_pages(1, &entry, 0);
-out:
-	if (mem_cgroup_try_charge_swap(folio, entry)) {
-		put_swap_folio(folio, entry);
-		entry.val = 0;
-	}
-	return entry;
-}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 50840a2887a5..2b5744e211cd 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -20,7 +20,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include "internal.h"
@@ -447,13 +446,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 
 	/*
 	 * Just skip read ahead for unused swap slot.
-	 * During swap_off when swap_slot_cache is disabled,
-	 * we have to handle the race between putting
-	 * swap entry in swap cache and marking swap slot
-	 * as SWAP_HAS_CACHE.  That's done in later part of code or
-	 * else swap_off will be aborted if we return NULL.
 	 */
-	if (!swap_entry_swapped(si, entry) && swap_slot_cache_enabled)
+	if (!swap_entry_swapped(si, entry))
 		goto put_and_return;
 
 	/*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 791cd7ed5bdf..66c8869ef346 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -37,7 +36,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -892,6 +891,13 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 	struct swap_cluster_info *ci;
 	unsigned int offset, found = 0;
 
+	/*
+	 * Swapfile is not block device so unable
+	 * to allocate large entries.
+	 */
+	if (order && !(si->flags & SWP_BLKDEV))
+		return 0;
+
 	if (!(si->flags & SWP_SOLIDSTATE)) {
 		/* Serialize HDD SWAP allocation for each device. */
 		spin_lock(&si->global_cluster_lock);
@@ -1155,43 +1161,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	swap_usage_sub(si, nr_entries);
 }
 
-static int scan_swap_map_slots(struct swap_info_struct *si,
-			       unsigned char usage, int nr,
-			       swp_entry_t slots[], int order)
-{
-	unsigned int nr_pages = 1 << order;
-	int n_ret = 0;
-
-	if (order > 0) {
-		/*
-		 * Should not even be attempting large allocations when huge
-		 * page swap is disabled. Warn and fail the allocation.
-		 */
-		if (!IS_ENABLED(CONFIG_THP_SWAP) ||
-		    nr_pages > SWAPFILE_CLUSTER) {
-			VM_WARN_ON_ONCE(1);
-			return 0;
-		}
-
-		/*
-		 * Swapfile is not block device so unable
-		 * to allocate large entries.
-		 */
-		if (!(si->flags & SWP_BLKDEV))
-			return 0;
-	}
-
-	while (n_ret < nr) {
-		unsigned long offset = cluster_alloc_swap_entry(si, order, usage);
-
-		if (!offset)
-			break;
-		slots[n_ret++] = swp_entry(si->type, offset);
-	}
-
-	return n_ret;
-}
-
 static bool get_swap_device_info(struct swap_info_struct *si)
 {
 	if (!percpu_ref_tryget_live(&si->users))
@@ -1212,54 +1181,53 @@ static bool get_swap_device_info(struct swap_info_struct *si)
  * Fast path try to get swap entries with specified order from current
  * CPU's swap entry pool (a cluster).
  */
-static int swap_alloc_fast(swp_entry_t entries[],
+static int swap_alloc_fast(swp_entry_t *entry,
 			   unsigned char usage,
-			   int order, int n_goal)
+			   int order)
 {
 	struct swap_cluster_info *ci;
 	struct swap_info_struct *si;
-	unsigned int offset, found;
-	int n_ret = 0;
-
-	n_goal = min(n_goal, SWAP_BATCH);
+	unsigned int offset, found = SWAP_ENTRY_INVALID;
 
 	si = __this_cpu_read(percpu_swap_cluster.si);
 	offset = __this_cpu_read(percpu_swap_cluster.offset[order]);
 	if (!si || !offset || !get_swap_device_info(si))
-		return 0;
+		return false;
 
-	while (offset) {
-		ci = lock_cluster(si, offset);
-		found = alloc_swap_scan_cluster(si, ci, offset, order, usage);
-		if (!found)
-			break;
-		entries[n_ret++] = swp_entry(si->type, found);
-		if (n_ret == n_goal)
-			break;
-		offset = __this_cpu_read(percpu_swap_cluster.offset[order]);
-	}
+	ci = lock_cluster(si, offset);
+	found = alloc_swap_scan_cluster(si, ci, offset, order, usage);
+	if (found)
+		*entry = swp_entry(si->type, found);
 
 	put_swap_device(si);
-	return n_ret;
+	return !!found;
 }
 
-int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
+swp_entry_t folio_alloc_swap(struct folio *folio)
 {
-	int order = swap_entry_order(entry_order);
-	unsigned long size = 1 << order;
+	unsigned int order = folio_order(folio);
+	unsigned int size = 1 << order;
 	struct swap_info_struct *si, *next;
-	int n_ret = 0;
+	swp_entry_t entry = {};
+	unsigned long offset;
 	int node;
 
+	if (order) {
+		/*
+		 * Should not even be attempting large allocations when huge
+		 * page swap is disabled. Warn and fail the allocation.
+		 */
+		if (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER) {
+			VM_WARN_ON_ONCE(1);
+			return entry;
+		}
+	}
+
 	/* Fast path using percpu cluster */
 	local_lock(&percpu_swap_cluster.lock);
-	n_ret = swap_alloc_fast(swp_entries,
-				SWAP_HAS_CACHE,
-				order, n_goal);
-	if (n_ret == n_goal)
-		goto out;
+	if (swap_alloc_fast(&entry, SWAP_HAS_CACHE, order))
+		goto out_alloced;
 
-	n_goal = min_t(int, n_goal - n_ret, SWAP_BATCH);
 	/* Rotate the device and switch to a new cluster */
 	spin_lock(&swap_avail_lock);
 start_over:
@@ -1268,11 +1236,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 		plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
 		spin_unlock(&swap_avail_lock);
 		if (get_swap_device_info(si)) {
-			n_ret += scan_swap_map_slots(si, SWAP_HAS_CACHE, n_goal,
-					swp_entries + n_ret, order);
+			offset = cluster_alloc_swap_entry(si, order, SWAP_HAS_CACHE);
 			put_swap_device(si);
-			if (n_ret || size > 1)
-				goto out;
+			if (offset) {
+				entry = swp_entry(si->type, offset);
+				goto out_alloced;
+			}
+			if (order)
+				goto out_failed;
 		}
 
 		spin_lock(&swap_avail_lock);
@@ -1291,10 +1262,20 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 		goto start_over;
 	}
 	spin_unlock(&swap_avail_lock);
-out:
+out_failed:
+	local_unlock(&percpu_swap_cluster.lock);
+	return entry;
+
+out_alloced:
 	local_unlock(&percpu_swap_cluster.lock);
-	atomic_long_sub(n_ret * size, &nr_swap_pages);
-	return n_ret;
+	if (mem_cgroup_try_charge_swap(folio, entry)) {
+		put_swap_folio(folio, entry);
+		entry.val = 0;
+	} else {
+		atomic_long_sub(size, &nr_swap_pages);
+	}
+
+	return entry;
 }
 
 static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
@@ -1590,25 +1571,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
 	unlock_cluster(ci);
 }
 
-void swapcache_free_entries(swp_entry_t *entries, int n)
-{
-	int i;
-	struct swap_cluster_info *ci;
-	struct swap_info_struct *si = NULL;
-
-	if (n <= 0)
-		return;
-
-	for (i = 0; i < n; ++i) {
-		si = _swap_info_get(entries[i]);
-		if (si) {
-			ci = lock_cluster(si, swp_offset(entries[i]));
-			swap_entry_range_free(si, ci, entries[i], 1);
-			unlock_cluster(ci);
-		}
-	}
-}
-
 int __swap_count(swp_entry_t entry)
 {
 	struct swap_info_struct *si = swp_swap_info(entry);
@@ -1849,6 +1811,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
 swp_entry_t get_swap_page_of_type(int type)
 {
 	struct swap_info_struct *si = swap_type_to_swap_info(type);
+	unsigned long offset;
 	swp_entry_t entry = {0};
 
 	if (!si)
@@ -1856,8 +1819,13 @@ swp_entry_t get_swap_page_of_type(int type)
 
 	/* This is called for allocating swap entry, not cache */
 	if (get_swap_device_info(si)) {
-		if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry, 0))
-			atomic_long_dec(&nr_swap_pages);
+		if (si->flags & SWP_WRITEOK) {
+			offset = cluster_alloc_swap_entry(si, 0, 1);
+			if (offset) {
+				entry = swp_entry(si->type, offset);
+				atomic_long_dec(&nr_swap_pages);
+			}
+		}
 		put_swap_device(si);
 	}
 fail:
@@ -2623,16 +2591,6 @@ static bool __has_usable_swap(void)
 	return !plist_head_empty(&swap_active_head);
 }
 
-bool has_usable_swap(void)
-{
-	bool ret;
-
-	spin_lock(&swap_lock);
-	ret = __has_usable_swap();
-	spin_unlock(&swap_lock);
-	return ret;
-}
-
 /*
  * Called after clearing SWP_WRITEOK, ensures cluster_alloc_range
  * see the updated flags, so there will be no more allocations.
@@ -2724,8 +2682,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 
 	wait_for_allocation(p);
 
-	disable_swap_slots_cache_lock();
-
 	set_current_oom_origin();
 	err = try_to_unuse(p->type);
 	clear_current_oom_origin();
@@ -2733,12 +2689,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	if (err) {
 		/* re-insert swap space back into swap_list */
 		reinsert_swap_info(p);
-		reenable_swap_slots_cache_unlock();
 		goto out_dput;
 	}
 
-	reenable_swap_slots_cache_unlock();
-
 	/*
 	 * Wait for swap operations protected by get/put_swap_device()
 	 * to complete. Because of synchronize_rcu() here, all swap
@@ -3487,8 +3440,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	putname(name);
 	if (inode)
 		inode_unlock(inode);
-	if (!error)
-		enable_swap_slots_cache();
 	return error;
 }
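
A note for readers following the interface change: with the slot cache
gone, callers obtain swap entries one at a time through
folio_alloc_swap() instead of pulling batches through a per-CPU slot
cache. The sketch below is illustrative only; try_alloc_swap_entry()
is a hypothetical name, while folio_alloc_swap(), put_swap_folio(),
SWAP_HAS_CACHE and the entry.val == 0 failure convention are taken
from the diff above.

/*
 * Hypothetical caller sketch, not part of the patch. After this
 * series, folio_alloc_swap() sizes the allocation to the folio
 * (1 << folio_order(folio) pages) and returns a single entry, or an
 * entry with .val == 0 when allocation or the memcg swap charge
 * fails; there is no slot batch to refill or drain anymore.
 */
static bool try_alloc_swap_entry(struct folio *folio)
{
	swp_entry_t entry = folio_alloc_swap(folio);

	if (!entry.val)
		return false;

	/*
	 * The allocator hands the entry back already marked with
	 * SWAP_HAS_CACHE, so it cannot be allocated again until it is
	 * released. A real caller would add the folio to the swap
	 * cache at this point; this sketch simply drops the
	 * reservation with put_swap_folio(), the same cleanup the
	 * patch uses on its failure paths.
	 */
	put_swap_folio(folio, entry);
	return true;
}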