From patchwork Tue Dec 24 14:38:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13920194
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins,
    Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 06/13] mm, swap: clean up plist removal and adding
Date: Tue, 24 Dec 2024 22:38:04 +0800
Message-ID: <20241224143811.33462-7-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241224143811.33462-1-ryncsn@gmail.com>
References: <20241224143811.33462-1-ryncsn@gmail.com>
Reply-To: Kairui Song
From: Kairui Song

When the swap device is full (inuse_pages == pages), it should be removed
from the allocation available plist. If any slot is freed, the swap device
should be added back to the plist. Additionally, during swapon or swapoff,
the swap device is forcefully added or removed.

Currently, the condition (inuse_pages == pages) is checked after every
counter update, and the device is removed from or added to the plist
accordingly. This is serialized by si->lock.

This commit decouples it from the protection of si->lock and reworks plist
removal and adding, making it possible to get rid of the hard dependency on
si->lock in the allocation path in later commits.

To achieve this, simply using another lock is not an optimal approach, as
the overhead is observable for a hot counter and may cause complex locking
issues. Thus, this commit makes it a lock-free atomic operation by embedding
the plist state into the second highest bit of the atomic counter.

Simply making the counter atomic is not enough: if the counter update and
the plist status check are not performed atomically, we may miss an addition
or removal. With the embedded info we can update the counter and check the
plist status with a single atomic operation, and avoid any extra overhead:

If the counter is full (inuse_pages == pages) and the off-list bit is unset,
we attempt to remove the device from the plist. If the counter is not full
(inuse_pages != pages) and the off-list bit is set, we attempt to add the
device back to the plist. Removal, addition and the bit update are serialized
with a lock, which is a cold path. Ordinary counter updates remain lock-free.
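To make the scheme easier to follow, below is a minimal standalone sketch of
the same counter-with-embedded-flag pattern, written with C11 atomics and
hypothetical names (OFFLIST_BIT, usage_add, usage_sub, try_take_offlist,
try_put_onlist). It only illustrates the idea; it is not the helpers added by
this patch, and it omits the swap_avail_lock serialization of the slow path
as well as the re-check races the real code handles:

/*
 * Minimal standalone sketch of the counter + off-list bit pattern described
 * above, built on C11 atomics with made-up names; NOT the kernel's
 * atomic_long_t API, and the slow-path lock is left out for brevity.
 */
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define OFFLIST_BIT  (1L << (sizeof(long) * CHAR_BIT - 2)) /* second highest bit */
#define COUNTER_MASK (~OFFLIST_BIT)

static atomic_long inuse;          /* pages in use, off-list flag embedded */
static const long  total = 4;      /* total usable pages                   */
static bool        on_list = true; /* stands in for the avail plist        */

/* Slow path: in the real code this runs under swap_avail_lock. */
static void try_take_offlist(void)
{
        long expected = total;
        /* Only succeeds if the device is exactly full and the flag is clear. */
        if (atomic_compare_exchange_strong(&inuse, &expected, total | OFFLIST_BIT))
                on_list = false;   /* plist_del() in the real code */
}

static void try_put_onlist(void)
{
        if (!(atomic_load(&inuse) & OFFLIST_BIT))
                return;
        /* The real code also re-checks for a concurrent refill here. */
        atomic_fetch_and(&inuse, COUNTER_MASK);
        on_list = true;            /* plist_add() in the real code */
}

/* Fast path: plain lock-free counter updates; only transitions go slow. */
static void usage_add(long nr)
{
        if (atomic_fetch_add(&inuse, nr) + nr == total)
                try_take_offlist();
}

static void usage_sub(long nr)
{
        if ((atomic_fetch_sub(&inuse, nr) - nr) & OFFLIST_BIT)
                try_put_onlist();
}

int main(void)
{
        usage_add(4);              /* device becomes full, leaves the list */
        printf("on_list=%d used=%ld\n", on_list, atomic_load(&inuse) & COUNTER_MASK);
        usage_sub(1);              /* a slot is freed, device rejoins      */
        printf("on_list=%d used=%ld\n", on_list, atomic_load(&inuse) & COUNTER_MASK);
        return 0;
}

The point of the pattern is that the common case (a counter update that does
not cross the full / not-full boundary) stays a single relaxed atomic
operation, while the rare transitions fall back to the locked slow path.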
Signed-off-by: Kairui Song
---
 include/linux/swap.h |   2 +-
 mm/swapfile.c        | 184 +++++++++++++++++++++++++++++++------------
 2 files changed, 135 insertions(+), 51 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0c222017b5c6..e1eeea6307cd 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -307,7 +307,7 @@ struct swap_info_struct {
                                         /* list of cluster that are fragmented or contented */
         unsigned int frag_cluster_nr[SWAP_NR_ORDERS];
         unsigned int pages;             /* total of usable pages of swap */
-        unsigned int inuse_pages;       /* number of those currently in use */
+        atomic_long_t inuse_pages;      /* number of those currently in use */
         struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
         struct rb_root swap_extent_root;/* root of the swap extent rbtree */
         struct block_device *bdev;      /* swap device or bdev of swap file */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7963a0c646a4..ae0f7df06474 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -128,6 +128,26 @@ static inline unsigned char swap_count(unsigned char ent)
         return ent & ~SWAP_HAS_CACHE;   /* may include COUNT_CONTINUED flag */
 }
 
+/*
+ * Use the second highest bit of inuse_pages counter as the indicator
+ * of if one swap device is on the available plist, so the atomic can
+ * still be updated arithmetic while having special data embedded.
+ *
+ * inuse_pages counter is the only thing indicating if a device should
+ * be on avail_lists or not (except swapon / swapoff). By embedding the
+ * on-list bit in the atomic counter, updates no longer need any lock
+ * to check the list status.
+ *
+ * This bit will be set if the device is not on the plist and not
+ * usable, will be cleared if the device is on the plist.
+ */
+#define SWAP_USAGE_OFFLIST_BIT (1UL << (BITS_PER_TYPE(atomic_t) - 2))
+#define SWAP_USAGE_COUNTER_MASK (~SWAP_USAGE_OFFLIST_BIT)
+static long swap_usage_in_pages(struct swap_info_struct *si)
+{
+        return atomic_long_read(&si->inuse_pages) & SWAP_USAGE_COUNTER_MASK;
+}
+
 /* Reclaim the swap entry anyway if possible */
 #define TTRS_ANYWAY             0x1
 /*
@@ -717,7 +737,7 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
         int nr_reclaim;
 
         if (force)
-                to_scan = si->inuse_pages / SWAPFILE_CLUSTER;
+                to_scan = swap_usage_in_pages(si) / SWAPFILE_CLUSTER;
 
         while (!list_empty(&si->full_clusters)) {
                 ci = list_first_entry(&si->full_clusters, struct swap_cluster_info, list);
@@ -872,42 +892,124 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
         return found;
 }
 
-static void __del_from_avail_list(struct swap_info_struct *si)
+/* SWAP_USAGE_OFFLIST_BIT can only be cleared by this helper. */
+static void del_from_avail_list(struct swap_info_struct *si, bool swapoff)
 {
         int nid;
 
-        assert_spin_locked(&si->lock);
+        spin_lock(&swap_avail_lock);
+
+        if (swapoff) {
+                /*
+                 * Forcefully remove it. Clear the SWP_WRITEOK flags for
+                 * swapoff here so it's synchronized by both si->lock and
+                 * swap_avail_lock, to ensure the result can be seen by
+                 * add_to_avail_list.
+                 */
+                lockdep_assert_held(&si->lock);
+                si->flags &= ~SWP_WRITEOK;
+                atomic_long_or(SWAP_USAGE_OFFLIST_BIT, &si->inuse_pages);
+        } else {
+                /*
+                 * If not called by swapoff, take it off-list only if it's
+                 * full and SWAP_USAGE_OFFLIST_BIT is not set (strictly
+                 * si->inuse_pages == pages), any concurrent slot freeing,
+                 * or device already removed from plist by someone else
+                 * will make this return false.
+                 */
+                if (atomic_long_cmpxchg(&si->inuse_pages, si->pages,
+                                        si->pages | SWAP_USAGE_OFFLIST_BIT) != si->pages)
+                        goto skip;
+        }
+
         for_each_node(nid)
                 plist_del(&si->avail_lists[nid], &swap_avail_heads[nid]);
+
+skip:
+        spin_unlock(&swap_avail_lock);
 }
 
-static void del_from_avail_list(struct swap_info_struct *si)
+/* SWAP_USAGE_OFFLIST_BIT can only be set by this helper. */
+static void add_to_avail_list(struct swap_info_struct *si, bool swapon)
 {
+        int nid;
+        long val;
+
         spin_lock(&swap_avail_lock);
-        __del_from_avail_list(si);
+
+        /* Corresponding to SWP_WRITEOK clearing in del_from_avail_list */
+        if (swapon) {
+                lockdep_assert_held(&si->lock);
+                si->flags |= SWP_WRITEOK;
+        } else {
+                if (!(READ_ONCE(si->flags) & SWP_WRITEOK))
+                        goto skip;
+        }
+
+        if (!(atomic_long_read(&si->inuse_pages) & SWAP_USAGE_OFFLIST_BIT))
+                goto skip;
+
+        val = atomic_long_fetch_and_relaxed(~SWAP_USAGE_OFFLIST_BIT, &si->inuse_pages);
+
+        /*
+         * When device is full and device is on the plist, only one updater will
+         * see (inuse_pages == si->pages) and will call del_from_avail_list. If
+         * that updater happen to be here, just skip adding.
+         */
+        if (val == si->pages) {
+                /* Just like the cmpxchg in del_from_avail_list */
+                if (atomic_long_cmpxchg(&si->inuse_pages, si->pages,
+                                        si->pages | SWAP_USAGE_OFFLIST_BIT) == si->pages)
+                        goto skip;
+        }
+
+        for_each_node(nid)
+                plist_add(&si->avail_lists[nid], &swap_avail_heads[nid]);
+
+skip:
         spin_unlock(&swap_avail_lock);
 }
 
-static void swap_range_alloc(struct swap_info_struct *si,
-                             unsigned int nr_entries)
+/*
+ * swap_usage_add / swap_usage_sub of each slot are serialized by ci->lock
+ * within each cluster, so the total contribution to the global counter should
+ * always be positive and cannot exceed the total number of usable slots.
+ */
+static bool swap_usage_add(struct swap_info_struct *si, unsigned int nr_entries)
 {
-        WRITE_ONCE(si->inuse_pages, si->inuse_pages + nr_entries);
-        if (si->inuse_pages == si->pages) {
-                del_from_avail_list(si);
+        long val = atomic_long_add_return_relaxed(nr_entries, &si->inuse_pages);
 
-                if (si->cluster_info && vm_swap_full())
-                        schedule_work(&si->reclaim_work);
+        /*
+         * If device is full, and SWAP_USAGE_OFFLIST_BIT is not set,
+         * remove it from the plist.
+         */
+        if (unlikely(val == si->pages)) {
+                del_from_avail_list(si, false);
+                return true;
         }
+
+        return false;
 }
 
-static void add_to_avail_list(struct swap_info_struct *si)
+static void swap_usage_sub(struct swap_info_struct *si, unsigned int nr_entries)
 {
-        int nid;
+        long val = atomic_long_sub_return_relaxed(nr_entries, &si->inuse_pages);
 
-        spin_lock(&swap_avail_lock);
-        for_each_node(nid)
-                plist_add(&si->avail_lists[nid], &swap_avail_heads[nid]);
-        spin_unlock(&swap_avail_lock);
+        /*
+         * If device is not full, and SWAP_USAGE_OFFLIST_BIT is set,
+         * remove it from the plist.
+         */
+        if (unlikely(val & SWAP_USAGE_OFFLIST_BIT))
+                add_to_avail_list(si, false);
+}
+
+static void swap_range_alloc(struct swap_info_struct *si,
+                             unsigned int nr_entries)
+{
+        if (swap_usage_add(si, nr_entries)) {
+                if (si->cluster_info && vm_swap_full())
+                        schedule_work(&si->reclaim_work);
+        }
 }
 
 static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
@@ -925,8 +1027,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
         for (i = 0; i < nr_entries; i++)
                 clear_bit(offset + i, si->zeromap);
 
-        if (si->inuse_pages == si->pages)
-                add_to_avail_list(si);
         if (si->flags & SWP_BLKDEV)
                 swap_slot_free_notify =
                         si->bdev->bd_disk->fops->swap_slot_free_notify;
@@ -946,7 +1046,7 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
          */
         smp_wmb();
         atomic_long_add(nr_entries, &nr_swap_pages);
-        WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
+        swap_usage_sub(si, nr_entries);
 }
 
 static int cluster_alloc_swap(struct swap_info_struct *si,
@@ -1036,19 +1136,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
                 plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
                 spin_unlock(&swap_avail_lock);
                 spin_lock(&si->lock);
-                if ((si->inuse_pages == si->pages) || !(si->flags & SWP_WRITEOK)) {
-                        spin_lock(&swap_avail_lock);
-                        if (plist_node_empty(&si->avail_lists[node])) {
-                                spin_unlock(&si->lock);
-                                goto nextsi;
-                        }
-                        WARN(!(si->flags & SWP_WRITEOK),
-                             "swap_info %d in list but !SWP_WRITEOK\n",
-                             si->type);
-                        __del_from_avail_list(si);
-                        spin_unlock(&si->lock);
-                        goto nextsi;
-                }
                 n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
                                             n_goal, swp_entries, order);
                 spin_unlock(&si->lock);
@@ -1057,7 +1144,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
                 cond_resched();
 
                 spin_lock(&swap_avail_lock);
-nextsi:
                 /*
                  * if we got here, it's likely that si was almost full before,
                  * and since scan_swap_map_slots() can drop the si->lock,
@@ -1789,7 +1875,7 @@ unsigned int count_swap_pages(int type, int free)
                 if (sis->flags & SWP_WRITEOK) {
                         n = sis->pages;
                         if (free)
-                                n -= sis->inuse_pages;
+                                n -= swap_usage_in_pages(sis);
                 }
                 spin_unlock(&sis->lock);
         }
@@ -2124,7 +2210,7 @@ static int try_to_unuse(unsigned int type)
         swp_entry_t entry;
         unsigned int i;
 
-        if (!READ_ONCE(si->inuse_pages))
+        if (!swap_usage_in_pages(si))
                 goto success;
 
 retry:
@@ -2137,7 +2223,7 @@ static int try_to_unuse(unsigned int type)
 
         spin_lock(&mmlist_lock);
         p = &init_mm.mmlist;
-        while (READ_ONCE(si->inuse_pages) &&
+        while (swap_usage_in_pages(si) &&
                !signal_pending(current) &&
                (p = p->next) != &init_mm.mmlist) {
 
@@ -2165,7 +2251,7 @@ static int try_to_unuse(unsigned int type)
         mmput(prev_mm);
 
         i = 0;
-        while (READ_ONCE(si->inuse_pages) &&
+        while (swap_usage_in_pages(si) &&
                !signal_pending(current) &&
                (i = find_next_to_unuse(si, i)) != 0) {
 
@@ -2200,7 +2286,7 @@ static int try_to_unuse(unsigned int type)
          * folio_alloc_swap(), temporarily hiding that swap. It's easy
          * and robust (though cpu-intensive) just to keep retrying.
          */
-        if (READ_ONCE(si->inuse_pages)) {
+        if (swap_usage_in_pages(si)) {
                 if (!signal_pending(current))
                         goto retry;
                 return -EINTR;
@@ -2209,7 +2295,7 @@ static int try_to_unuse(unsigned int type)
 success:
         /*
          * Make sure that further cleanups after try_to_unuse() returns happen
-         * after swap_range_free() reduces si->inuse_pages to 0.
+         * after swap_range_free() reduces inuse_pages to 0.
          */
         smp_mb();
         return 0;
@@ -2227,7 +2313,7 @@ static void drain_mmlist(void)
         unsigned int type;
 
         for (type = 0; type < nr_swapfiles; type++)
-                if (swap_info[type]->inuse_pages)
+                if (swap_usage_in_pages(swap_info[type]))
                         return;
         spin_lock(&mmlist_lock);
         list_for_each_safe(p, next, &init_mm.mmlist)
@@ -2406,7 +2492,6 @@ static void setup_swap_info(struct swap_info_struct *si, int prio,
 
 static void _enable_swap_info(struct swap_info_struct *si)
 {
-        si->flags |= SWP_WRITEOK;
         atomic_long_add(si->pages, &nr_swap_pages);
         total_swap_pages += si->pages;
 
@@ -2423,9 +2508,8 @@ static void _enable_swap_info(struct swap_info_struct *si)
          */
         plist_add(&si->list, &swap_active_head);
 
-        /* add to available list if swap device is not full */
-        if (si->inuse_pages < si->pages)
-                add_to_avail_list(si);
+        /* Add back to available list */
+        add_to_avail_list(si, true);
 }
 
 static void enable_swap_info(struct swap_info_struct *si, int prio,
@@ -2523,7 +2607,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
                 goto out_dput;
         }
         spin_lock(&p->lock);
-        del_from_avail_list(p);
+        del_from_avail_list(p, true);
         if (p->prio < 0) {
                 struct swap_info_struct *si = p;
                 int nid;
@@ -2541,7 +2625,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
         plist_del(&p->list, &swap_active_head);
         atomic_long_sub(p->pages, &nr_swap_pages);
         total_swap_pages -= p->pages;
-        p->flags &= ~SWP_WRITEOK;
         spin_unlock(&p->lock);
         spin_unlock(&swap_lock);
 
@@ -2721,7 +2804,7 @@ static int swap_show(struct seq_file *swap, void *v)
         }
 
         bytes = K(si->pages);
-        inuse = K(READ_ONCE(si->inuse_pages));
+        inuse = K(swap_usage_in_pages(si));
 
         file = si->swap_file;
         len = seq_file_path(swap, file, " \t\n\\");
@@ -2838,6 +2921,7 @@ static struct swap_info_struct *alloc_swap_info(void)
         }
         spin_lock_init(&p->lock);
         spin_lock_init(&p->cont_lock);
+        atomic_long_set(&p->inuse_pages, SWAP_USAGE_OFFLIST_BIT);
         init_completion(&p->comp);
 
         return p;
@@ -3335,7 +3419,7 @@ void si_swapinfo(struct sysinfo *val)
                 struct swap_info_struct *si = swap_info[type];
 
                 if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK))
-                        nr_to_be_unused += READ_ONCE(si->inuse_pages);
+                        nr_to_be_unused += swap_usage_in_pages(si);
         }
         val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused;
         val->totalswap = total_swap_pages + nr_to_be_unused;