From patchwork Tue Oct 22 19:24:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13846080
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins,
 Yosry Ahmed, "Huang, Ying", Tim Chen, Nhat Pham,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 06/13] mm, swap: clean up plist removal and adding
Date: Wed, 23 Oct 2024 03:24:44 +0800
Message-ID: <20241022192451.38138-7-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241022192451.38138-1-ryncsn@gmail.com>
References: <20241022192451.38138-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
From: Kairui Song

When a swap device is full (inuse_pages == pages), it should be removed
from the plist, and when any slot is freed, it should be added back to
the plist. On swapoff / swapon, the swap device is also forcefully
removed / added. This is currently serialized by si->lock, and some
historical sanity-check code is still around. This commit decouples the
plist handling from the protection of si->lock and cleans it up to
prepare for the si->lock rework.

The inuse_pages counter is the only thing that decides whether a device
should be removed from or added to the plist (except for swapon /
swapoff as a special case), and it is a very hot counter. So, to avoid
extra overhead on the counter update hot path, and to make it possible
to check and update the plist whenever the counter value changes, embed
the plist state into the inuse_pages counter and turn the counter into
an atomic. This way the counter and the list state can be checked and
updated with a single CAS, without any extra synchronization. If the
counter is full (inuse_pages == pages) and the off-list bit is unset,
try to remove the device from the plist; if the counter is not full
(inuse_pages != pages) and the off-list bit is set, try to add it back.
Removal and adding are serialized by the lock, as is the bit setting;
ordinary counter updates stay lockless.

Signed-off-by: Kairui Song
---
 include/linux/swap.h |   2 +-
 mm/swapfile.c        | 182 +++++++++++++++++++++++++++++++------------
 2 files changed, 132 insertions(+), 52 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index c0d49dad7a4b..16dcf8bd1a4e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -306,7 +306,7 @@ struct swap_info_struct {
 	/* list of cluster that are fragmented or contented */
 	unsigned int frag_cluster_nr[SWAP_NR_ORDERS];
 	unsigned int pages;		/* total of usable pages of swap */
-	unsigned int inuse_pages;	/* number of those currently in use */
+	atomic_long_t inuse_pages;	/* number of those currently in use */
 	struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
 	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
 	struct block_device *bdev;	/* swap device or bdev of swap file */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e620b41c3120..4e629536a07c 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -128,6 +128,25 @@ static inline unsigned char swap_count(unsigned char ent)
 	return ent & ~SWAP_HAS_CACHE;	/* may include COUNT_CONTINUED flag */
 }
 
+/*
+ * Use the second highest bit of inuse_pages as the indicator
+ * of whether one swap device is on the allocation plist.
+ *
+ * inuse_pages is the only thing that decides whether a device should
+ * be on the list or not (except swapoff as a special case). By
+ * embedding the off-list bit into it, updaters don't need any lock to
+ * check the device list status.
+ *
+ * This bit will be set to 1 if the device is not on the plist and not
+ * usable, and will be cleared if the device is on the plist.
+ */
+#define SWAP_USAGE_OFFLIST_BIT (1UL << (BITS_PER_TYPE(atomic_t) - 2))
+#define SWAP_USAGE_COUNTER_MASK (~SWAP_USAGE_OFFLIST_BIT)
+static long swap_usage_in_pages(struct swap_info_struct *si)
+{
+	return atomic_long_read(&si->inuse_pages) & SWAP_USAGE_COUNTER_MASK;
+}
+
 /* Reclaim the swap entry anyway if possible */
 #define TTRS_ANYWAY		0x1
 /*
@@ -709,7 +728,7 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
 	int nr_reclaim;
 
 	if (force)
-		to_scan = si->inuse_pages / SWAPFILE_CLUSTER;
+		to_scan = swap_usage_in_pages(si) / SWAPFILE_CLUSTER;
 
 	while (!list_empty(&si->full_clusters)) {
 		ci = list_first_entry(&si->full_clusters, struct swap_cluster_info, list);
@@ -860,42 +879,121 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 	return found;
 }
 
-static void __del_from_avail_list(struct swap_info_struct *si)
+/*
+ * SWAP_USAGE_OFFLIST_BIT can only be set by this helper, and is synced
+ * with counter updaters using atomics.
+ */
+static void del_from_avail_list(struct swap_info_struct *si, bool swapoff)
 {
 	int nid;
 
-	assert_spin_locked(&si->lock);
+	spin_lock(&swap_avail_lock);
+
+	if (swapoff) {
+		/* Clear SWP_WRITEOK so add_to_avail_list won't add it back */
+		si->flags &= ~SWP_WRITEOK;
+
+		/* Force it off the list. */
+		atomic_long_or(SWAP_USAGE_OFFLIST_BIT, &si->inuse_pages);
+	} else {
+		/*
+		 * If not swapoff, take it off-list only when it's full and
+		 * SWAP_USAGE_OFFLIST_BIT is not set (inuse_pages == pages).
+		 * The cmpxchg below will fail and skip the removal if slots
+		 * have been freed or the device has been off-listed by
+		 * someone else.
+		 */
+		if (atomic_long_cmpxchg(&si->inuse_pages, si->pages,
+					si->pages | SWAP_USAGE_OFFLIST_BIT) != si->pages)
+			goto skip;
+	}
+
 	for_each_node(nid)
 		plist_del(&si->avail_lists[nid], &swap_avail_heads[nid]);
+
+skip:
+	spin_unlock(&swap_avail_lock);
 }
 
-static void del_from_avail_list(struct swap_info_struct *si)
+/*
+ * SWAP_USAGE_OFFLIST_BIT can only be cleared by this helper, and is
+ * synced with counter updaters using atomics.
+ */
+static void add_to_avail_list(struct swap_info_struct *si, bool swapon)
 {
+	int nid;
+	long val;
+	bool swapoff;
+
 	spin_lock(&swap_avail_lock);
-	__del_from_avail_list(si);
+
+	/* Special handling for swapon / swapoff */
+	if (swapon) {
+		si->flags |= SWP_WRITEOK;
+		swapoff = false;
+	} else {
+		swapoff = !(READ_ONCE(si->flags) & SWP_WRITEOK);
+	}
+
+	if (swapoff)
+		goto skip;
+
+	if (!(atomic_long_read(&si->inuse_pages) & SWAP_USAGE_OFFLIST_BIT))
+		goto skip;
+
+	val = atomic_long_fetch_and_relaxed(~SWAP_USAGE_OFFLIST_BIT, &si->inuse_pages);
+
+	/*
+	 * When the device is full and on the plist, only one updater will
+	 * see (inuse_pages == si->pages) and call del_from_avail_list. If
+	 * that updater happens to be here, just skip adding.
+	 */
+	if (val == si->pages) {
+		/* Just like the cmpxchg in del_from_avail_list */
+		if (atomic_long_cmpxchg(&si->inuse_pages, si->pages,
+					si->pages | SWAP_USAGE_OFFLIST_BIT) == si->pages)
+			goto skip;
+	}
+
+	for_each_node(nid)
+		plist_add(&si->avail_lists[nid], &swap_avail_heads[nid]);
+
+skip:
 	spin_unlock(&swap_avail_lock);
 }
 
-static void swap_range_alloc(struct swap_info_struct *si,
-			     unsigned int nr_entries)
+/*
+ * swap_usage_add / swap_usage_sub are serialized by ci->lock in each
+ * cluster, so the total contribution to the global counter should
+ * always be positive.
+ */
+static bool swap_usage_add(struct swap_info_struct *si, unsigned int nr_entries)
 {
-	WRITE_ONCE(si->inuse_pages, si->inuse_pages + nr_entries);
-	if (si->inuse_pages == si->pages) {
-		del_from_avail_list(si);
+	long val = atomic_long_add_return_relaxed(nr_entries, &si->inuse_pages);
 
-		if (vm_swap_full())
-			schedule_work(&si->reclaim_work);
+	/* If device is full and SWAP_USAGE_OFFLIST_BIT is not set, try to off-list it */
+	if (val == si->pages) {
+		del_from_avail_list(si, false);
+		return true;
 	}
+
+	return false;
 }
 
-static void add_to_avail_list(struct swap_info_struct *si)
+static void swap_usage_sub(struct swap_info_struct *si, unsigned int nr_entries)
 {
-	int nid;
+	long val = atomic_long_sub_return_relaxed(nr_entries, &si->inuse_pages);
 
-	spin_lock(&swap_avail_lock);
-	for_each_node(nid)
-		plist_add(&si->avail_lists[nid], &swap_avail_heads[nid]);
-	spin_unlock(&swap_avail_lock);
+	/* If the device is off-list, try to add it back */
+	if (val & SWAP_USAGE_OFFLIST_BIT)
+		add_to_avail_list(si, false);
+}
+
+static void swap_range_alloc(struct swap_info_struct *si,
+			     unsigned int nr_entries)
+{
+	if (swap_usage_add(si, nr_entries)) {
+		if (vm_swap_full())
+			schedule_work(&si->reclaim_work);
+	}
 }
 
 static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
@@ -913,8 +1011,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	for (i = 0; i < nr_entries; i++)
 		clear_bit(offset + i, si->zeromap);
 
-	if (si->inuse_pages == si->pages)
-		add_to_avail_list(si);
 	if (si->flags & SWP_BLKDEV)
 		swap_slot_free_notify =
 			si->bdev->bd_disk->fops->swap_slot_free_notify;
@@ -928,13 +1024,13 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	}
 	clear_shadow_from_swap_cache(si->type, begin, end);
 
+	atomic_long_add(nr_entries, &nr_swap_pages);
 	/*
 	 * Make sure that try_to_unuse() observes si->inuse_pages reaching 0
 	 * only after the above cleanups are done.
 	 */
 	smp_wmb();
-	atomic_long_add(nr_entries, &nr_swap_pages);
-	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
+	swap_usage_sub(si, nr_entries);
 }
 
 static int cluster_alloc_swap(struct swap_info_struct *si,
@@ -1020,19 +1116,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 			plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
 			spin_unlock(&swap_avail_lock);
 			spin_lock(&si->lock);
-			if ((si->inuse_pages == si->pages) || !(si->flags & SWP_WRITEOK)) {
-				spin_lock(&swap_avail_lock);
-				if (plist_node_empty(&si->avail_lists[node])) {
-					spin_unlock(&si->lock);
-					goto nextsi;
-				}
-				WARN(!(si->flags & SWP_WRITEOK),
-				     "swap_info %d in list but !SWP_WRITEOK\n",
-				     si->type);
-				__del_from_avail_list(si);
-				spin_unlock(&si->lock);
-				goto nextsi;
-			}
 			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
 						    n_goal, swp_entries, order);
 			spin_unlock(&si->lock);
@@ -1041,7 +1124,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 			cond_resched();
 
 			spin_lock(&swap_avail_lock);
-nextsi:
 			/*
 			 * if we got here, it's likely that si was almost full before,
 			 * and since scan_swap_map_slots() can drop the si->lock,
@@ -1773,7 +1855,7 @@ unsigned int count_swap_pages(int type, int free)
 		if (sis->flags & SWP_WRITEOK) {
 			n = sis->pages;
 			if (free)
-				n -= sis->inuse_pages;
+				n -= swap_usage_in_pages(sis);
 		}
 		spin_unlock(&sis->lock);
 	}
@@ -2108,7 +2190,7 @@ static int try_to_unuse(unsigned int type)
 	swp_entry_t entry;
 	unsigned int i;
 
-	if (!READ_ONCE(si->inuse_pages))
+	if (!swap_usage_in_pages(si))
 		goto success;
 
retry:
@@ -2121,7 +2203,7 @@ static int try_to_unuse(unsigned int type)
 
 	spin_lock(&mmlist_lock);
 	p = &init_mm.mmlist;
-	while (READ_ONCE(si->inuse_pages) &&
+	while (swap_usage_in_pages(si) &&
 	       !signal_pending(current) &&
 	       (p = p->next) != &init_mm.mmlist) {
 
@@ -2149,7 +2231,7 @@ static int try_to_unuse(unsigned int type)
 	mmput(prev_mm);
 
 	i = 0;
-	while (READ_ONCE(si->inuse_pages) &&
+	while (swap_usage_in_pages(si) &&
 	       !signal_pending(current) &&
 	       (i = find_next_to_unuse(si, i)) != 0) {
 
@@ -2184,7 +2266,7 @@ static int try_to_unuse(unsigned int type)
 	 * folio_alloc_swap(), temporarily hiding that swap.  It's easy
 	 * and robust (though cpu-intensive) just to keep retrying.
 	 */
-	if (READ_ONCE(si->inuse_pages)) {
+	if (swap_usage_in_pages(si)) {
 		if (!signal_pending(current))
 			goto retry;
 		return -EINTR;
@@ -2193,7 +2275,7 @@ static int try_to_unuse(unsigned int type)
 success:
 	/*
 	 * Make sure that further cleanups after try_to_unuse() returns happen
-	 * after swap_range_free() reduces si->inuse_pages to 0.
+	 * after swap_range_free() reduces inuse_pages to 0.
 	 */
 	smp_mb();
 	return 0;
 }
@@ -2211,7 +2293,7 @@ static void drain_mmlist(void)
 	unsigned int type;
 
 	for (type = 0; type < nr_swapfiles; type++)
-		if (swap_info[type]->inuse_pages)
+		if (swap_usage_in_pages(swap_info[type]))
 			return;
 	spin_lock(&mmlist_lock);
 	list_for_each_safe(p, next, &init_mm.mmlist)
@@ -2390,7 +2472,6 @@ static void setup_swap_info(struct swap_info_struct *si, int prio,
 
 static void _enable_swap_info(struct swap_info_struct *si)
 {
-	si->flags |= SWP_WRITEOK;
 	atomic_long_add(si->pages, &nr_swap_pages);
 	total_swap_pages += si->pages;
 
@@ -2407,9 +2488,8 @@ static void _enable_swap_info(struct swap_info_struct *si)
 	 */
 	plist_add(&si->list, &swap_active_head);
 
-	/* add to available list if swap device is not full */
-	if (si->inuse_pages < si->pages)
-		add_to_avail_list(si);
+	/* Add back to available list */
+	add_to_avail_list(si, true);
 }
 
 static void enable_swap_info(struct swap_info_struct *si, int prio,
@@ -2507,7 +2587,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		goto out_dput;
 	}
 	spin_lock(&p->lock);
-	del_from_avail_list(p);
+	del_from_avail_list(p, true);
 	if (p->prio < 0) {
 		struct swap_info_struct *si = p;
 		int nid;
@@ -2525,7 +2605,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	plist_del(&p->list, &swap_active_head);
 	atomic_long_sub(p->pages, &nr_swap_pages);
 	total_swap_pages -= p->pages;
-	p->flags &= ~SWP_WRITEOK;
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);
 
@@ -2705,7 +2784,7 @@ static int swap_show(struct seq_file *swap, void *v)
 	}
 
 	bytes = K(si->pages);
-	inuse = K(READ_ONCE(si->inuse_pages));
+	inuse = K(swap_usage_in_pages(si));
 
 	file = si->swap_file;
 	len = seq_file_path(swap, file, " \t\n\\");
@@ -2822,6 +2901,7 @@ static struct swap_info_struct *alloc_swap_info(void)
 	}
 	spin_lock_init(&p->lock);
 	spin_lock_init(&p->cont_lock);
+	atomic_long_set(&p->inuse_pages, SWAP_USAGE_OFFLIST_BIT);
 	init_completion(&p->comp);
 	return p;
 
@@ -3319,7 +3399,7 @@ void si_swapinfo(struct sysinfo *val)
 		struct swap_info_struct *si = swap_info[type];
 
 		if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK))
-			nr_to_be_unused += READ_ONCE(si->inuse_pages);
+			nr_to_be_unused += swap_usage_in_pages(si);
 	}
 	val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused;
 	val->totalswap = total_swap_pages + nr_to_be_unused;
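
For readers who want to experiment with the counter scheme outside the kernel,
below is a minimal userspace sketch of the idea described above: the
second-highest bit of an atomic usage counter marks "off the allocation list",
ordinary updates are plain lockless add/sub, and only the full / not-full
transitions do a CAS and touch the list. All names here (demo_device,
demo_alloc, demo_free, DEMO_OFFLIST_BIT) are made up for illustration; this is
not the patch's code, and it omits the swap_avail_lock serialization and
SWP_WRITEOK handling that the real helpers rely on.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Second-highest bit marks "device is off the allocation list". */
#define DEMO_OFFLIST_BIT	(1UL << (sizeof(unsigned long) * 8 - 2))
#define DEMO_COUNTER_MASK	(~DEMO_OFFLIST_BIT)

struct demo_device {
	unsigned long pages;	/* total usable slots */
	atomic_ulong inuse;	/* usage counter with embedded off-list bit */
	bool on_list;		/* stands in for plist membership */
};

static unsigned long demo_usage(struct demo_device *dev)
{
	return atomic_load(&dev->inuse) & DEMO_COUNTER_MASK;
}

/* Counter just hit "full": try to take the device off the list. */
static void demo_del_from_list(struct demo_device *dev)
{
	unsigned long expected = dev->pages;

	/* Fails, and skips the removal, if slots were freed meanwhile. */
	if (atomic_compare_exchange_strong(&dev->inuse, &expected,
					   dev->pages | DEMO_OFFLIST_BIT))
		dev->on_list = false;	/* plist_del() in the real code */
}

/* A free saw the off-list bit: clear it and put the device back. */
static void demo_add_to_list(struct demo_device *dev)
{
	unsigned long old = atomic_fetch_and(&dev->inuse, DEMO_COUNTER_MASK);

	/* A racing allocation may have refilled the device; re-mark it. */
	if ((old & DEMO_COUNTER_MASK) == dev->pages) {
		unsigned long expected = dev->pages;

		if (atomic_compare_exchange_strong(&dev->inuse, &expected,
						   dev->pages | DEMO_OFFLIST_BIT))
			return;
	}
	dev->on_list = true;		/* plist_add() in the real code */
}

static void demo_alloc(struct demo_device *dev, unsigned long nr)
{
	unsigned long val = atomic_fetch_add(&dev->inuse, nr) + nr;

	if (val == dev->pages)		/* full and bit unset: go off-list */
		demo_del_from_list(dev);
}

static void demo_free(struct demo_device *dev, unsigned long nr)
{
	unsigned long val = atomic_fetch_sub(&dev->inuse, nr) - nr;

	if (val & DEMO_OFFLIST_BIT)	/* off-list and no longer full */
		demo_add_to_list(dev);
}

int main(void)
{
	struct demo_device dev = { .pages = 4, .on_list = true };

	atomic_init(&dev.inuse, 0);
	demo_alloc(&dev, 4);	/* device becomes full -> taken off the list */
	printf("in use: %lu, on list: %d\n", demo_usage(&dev), dev.on_list);
	demo_free(&dev, 1);	/* a slot is freed -> added back to the list */
	printf("in use: %lu, on list: %d\n", demo_usage(&dev), dev.on_list);
	return 0;
}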