From patchwork Mon Feb 24 18:02:12 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13988668
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
 "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Baolin Wang,
 Kalesh Singh, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 7/7] mm, swap: simplify folio swap allocation
Date: Tue, 25 Feb 2025 02:02:12 +0800
Message-ID: <20250224180212.22802-8-ryncsn@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250224180212.22802-1-ryncsn@gmail.com>
References: <20250224180212.22802-1-ryncsn@gmail.com>
Reply-To: Kairui Song
From: Kairui Song

With slot cache gone, clean up the allocation helpers even more.
folio_alloc_swap() will be the only entry point for allocating a swap
entry and adding the folio to the swap cache (except for suspend),
making it the opposite of folio_free_swap().
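To illustrate the new calling convention (a sketch only, not part of the
patch; the gfp flags and the redirty label are taken from the mm/shmem.c
and mm/vmscan.c hunks below):

	/* Before: allocate an entry, then insert into the swap cache by hand. */
	swp_entry_t entry = folio_alloc_swap(folio);
	if (!entry.val)
		goto redirty;
	if (add_to_swap_cache(folio, entry,
			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN, NULL)) {
		put_swap_folio(folio, entry);	/* back out the entry */
		goto redirty;
	}

	/*
	 * After: one call allocates the entry, charges the memcg and inserts
	 * the folio into the swap cache, returning 0 or a negative errno
	 * (-EINVAL for an unsupported large folio, -ENOMEM otherwise).
	 */
	if (folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN))
		goto redirty;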
Signed-off-by: Kairui Song
---
 include/linux/swap.h |   8 ++--
 mm/shmem.c           |  21 +++------
 mm/swap.h            |   6 ---
 mm/swap_state.c      |  57 ----------------------
 mm/swapfile.c        | 110 ++++++++++++++++++++++++++++---------------
 mm/vmscan.c          |  16 ++++++-
 6 files changed, 94 insertions(+), 124 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a0a262bcaf41..3a68da686c4e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -478,7 +478,7 @@ static inline long get_nr_swap_pages(void)
 }
 
 extern void si_swapinfo(struct sysinfo *);
-swp_entry_t folio_alloc_swap(struct folio *folio);
+int folio_alloc_swap(struct folio *folio, gfp_t gfp_mask);
 bool folio_free_swap(struct folio *folio);
 void put_swap_folio(struct folio *folio, swp_entry_t entry);
 extern swp_entry_t get_swap_page_of_type(int);
@@ -587,11 +587,9 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }
 
-static inline swp_entry_t folio_alloc_swap(struct folio *folio)
+static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp_mask)
 {
-	swp_entry_t entry;
-	entry.val = 0;
-	return entry;
+	return -EINVAL;
 }
 
 static inline bool folio_free_swap(struct folio *folio)
diff --git a/mm/shmem.c b/mm/shmem.c
index 45dbcb69da0c..aad02132b75a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1546,7 +1546,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct inode *inode = mapping->host;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
-	swp_entry_t swap;
 	pgoff_t index;
 	int nr_pages;
 	bool split = false;
@@ -1628,14 +1627,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		folio_mark_uptodate(folio);
 	}
 
-	swap = folio_alloc_swap(folio);
-	if (!swap.val) {
-		if (nr_pages > 1)
-			goto try_split;
-
-		goto redirty;
-	}
-
 	/*
 	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
 	 * if it's not already there. Do it now, before the folio is
@@ -1648,20 +1639,20 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	if (list_empty(&info->swaplist))
 		list_add(&info->swaplist, &shmem_swaplist);
 
-	if (add_to_swap_cache(folio, swap,
-			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
-			NULL) == 0) {
+	if (!folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN)) {
 		shmem_recalc_inode(inode, 0, nr_pages);
-		swap_shmem_alloc(swap, nr_pages);
-		shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
+		swap_shmem_alloc(folio->swap, nr_pages);
+		shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap));
 
 		mutex_unlock(&shmem_swaplist_mutex);
 		BUG_ON(folio_mapped(folio));
 		return swap_writepage(&folio->page, wbc);
 	}
 
+	list_del_init(&info->swaplist);
 	mutex_unlock(&shmem_swaplist_mutex);
-	put_swap_folio(folio, swap);
+	if (nr_pages > 1)
+		goto try_split;
 redirty:
 	folio_mark_dirty(folio);
 	if (wbc->for_reclaim)
diff --git a/mm/swap.h b/mm/swap.h
index ad2f121de970..0abb68091b4f 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -50,7 +50,6 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
 }
 
 void show_swap_cache_info(void);
-bool add_to_swap(struct folio *folio);
 void *get_shadow_from_swap_cache(swp_entry_t entry);
 int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
 		      gfp_t gfp, void **shadowp);
@@ -163,11 +162,6 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping,
 	return filemap_get_folio(mapping, index);
 }
 
-static inline bool add_to_swap(struct folio *folio)
-{
-	return false;
-}
-
 static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
 {
 	return NULL;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 2b5744e211cd..68fd981b514f 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -166,63 +166,6 @@ void __delete_from_swap_cache(struct folio *folio,
 	__lruvec_stat_mod_folio(folio, NR_SWAPCACHE, -nr);
 }
 
-/**
- * add_to_swap - allocate swap space for a folio
- * @folio: folio we want to move to swap
- *
- * Allocate swap space for the folio and add the folio to the
- * swap cache.
- *
- * Context: Caller needs to hold the folio lock.
- * Return: Whether the folio was added to the swap cache.
- */
-bool add_to_swap(struct folio *folio)
-{
-	swp_entry_t entry;
-	int err;
-
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio);
-
-	entry = folio_alloc_swap(folio);
-	if (!entry.val)
-		return false;
-
-	/*
-	 * XArray node allocations from PF_MEMALLOC contexts could
-	 * completely exhaust the page allocator. __GFP_NOMEMALLOC
-	 * stops emergency reserves from being allocated.
-	 *
-	 * TODO: this could cause a theoretical memory reclaim
-	 * deadlock in the swap out path.
-	 */
-	/*
-	 * Add it to the swap cache.
-	 */
-	err = add_to_swap_cache(folio, entry,
-			__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, NULL);
-	if (err)
-		goto fail;
-	/*
-	 * Normally the folio will be dirtied in unmap because its
-	 * pte should be dirty. A special case is MADV_FREE page. The
-	 * page's pte could have dirty bit cleared but the folio's
-	 * SwapBacked flag is still set because clearing the dirty bit
-	 * and SwapBacked flag has no lock protected. For such folio,
-	 * unmap will not set dirty bit for it, so folio reclaim will
-	 * not write the folio out. This can cause data corruption when
-	 * the folio is swapped in later. Always setting the dirty flag
-	 * for the folio solves the problem.
-	 */
-	folio_mark_dirty(folio);
-
-	return true;
-
-fail:
-	put_swap_folio(folio, entry);
-	return false;
-}
-
 /*
  * This must be called only on folios that have
  * been verified to be in the swap cache and locked.
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1ba916109d99..628f67974a7c 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1174,9 +1174,9 @@ static bool get_swap_device_info(struct swap_info_struct *si)
  * Fast path try to get swap entries with specified order from current
  * CPU's swap entry pool (a cluster).
  */
-static int swap_alloc_fast(swp_entry_t *entry,
-			   unsigned char usage,
-			   int order)
+static bool swap_alloc_fast(swp_entry_t *entry,
+			    unsigned char usage,
+			    int order)
 {
 	struct swap_cluster_info *ci;
 	struct swap_info_struct *si;
@@ -1206,47 +1206,31 @@ static int swap_alloc_fast(swp_entry_t *entry,
 	return !!found;
 }
 
-swp_entry_t folio_alloc_swap(struct folio *folio)
+/* Rotate the device and switch to a new cluster */
+static bool swap_alloc_slow(swp_entry_t *entry,
+			    unsigned char usage,
+			    int order)
 {
-	unsigned int order = folio_order(folio);
-	unsigned int size = 1 << order;
-	struct swap_info_struct *si, *next;
-	swp_entry_t entry = {};
-	unsigned long offset;
 	int node;
+	unsigned long offset;
+	struct swap_info_struct *si, *next;
 
-	if (order) {
-		/*
-		 * Should not even be attempting large allocations when huge
-		 * page swap is disabled. Warn and fail the allocation.
-		 */
-		if (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER) {
-			VM_WARN_ON_ONCE(1);
-			return entry;
-		}
-	}
-
-	/* Fast path using percpu cluster */
-	local_lock(&percpu_swap_cluster.lock);
-	if (swap_alloc_fast(&entry, SWAP_HAS_CACHE, order))
-		goto out_alloced;
-
-	/* Rotate the device and switch to a new cluster */
+	node = numa_node_id();
 	spin_lock(&swap_avail_lock);
 start_over:
-	node = numa_node_id();
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
+		/* Rotate the device and switch to a new cluster */
 		plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
 		spin_unlock(&swap_avail_lock);
 		if (get_swap_device_info(si)) {
 			offset = cluster_alloc_swap_entry(si, order, SWAP_HAS_CACHE);
 			put_swap_device(si);
 			if (offset) {
-				entry = swp_entry(si->type, offset);
-				goto out_alloced;
+				*entry = swp_entry(si->type, offset);
+				return true;
 			}
 			if (order)
-				goto out_failed;
+				return false;
 		}
 
 		spin_lock(&swap_avail_lock);
@@ -1265,20 +1249,68 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
 		goto start_over;
 	}
 	spin_unlock(&swap_avail_lock);
-out_failed:
+	return false;
+}
+
+/**
+ * folio_alloc_swap - allocate swap space for a folio
+ * @folio: folio we want to move to swap
+ * @gfp: gfp mask for shadow nodes
+ *
+ * Allocate swap space for the folio and add the folio to the
+ * swap cache.
+ *
+ * Context: Caller needs to hold the folio lock.
+ * Return: 0 on success, or a negative error code on failure.
+ */
+int folio_alloc_swap(struct folio *folio, gfp_t gfp)
+{
+	unsigned int order = folio_order(folio);
+	unsigned int size = 1 << order;
+	swp_entry_t entry = {};
+
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio);
+
+	/*
+	 * Should not even be attempting large allocations when huge
+	 * page swap is disabled. Warn and fail the allocation.
+	 */
+	if (order && (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER)) {
+		VM_WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
+	local_lock(&percpu_swap_cluster.lock);
+	if (swap_alloc_fast(&entry, SWAP_HAS_CACHE, order))
+		goto out_alloced;
+	if (swap_alloc_slow(&entry, SWAP_HAS_CACHE, order))
+		goto out_alloced;
 	local_unlock(&percpu_swap_cluster.lock);
-	return entry;
+	return -ENOMEM;
 
 out_alloced:
 	local_unlock(&percpu_swap_cluster.lock);
-	if (mem_cgroup_try_charge_swap(folio, entry)) {
-		put_swap_folio(folio, entry);
-		entry.val = 0;
-	} else {
-		atomic_long_sub(size, &nr_swap_pages);
-	}
+	if (mem_cgroup_try_charge_swap(folio, entry))
+		goto out_free;
 
-	return entry;
+	/*
+	 * XArray node allocations from PF_MEMALLOC contexts could
+	 * completely exhaust the page allocator. __GFP_NOMEMALLOC
+	 * stops emergency reserves from being allocated.
+	 *
+	 * TODO: this could cause a theoretical memory reclaim
+	 * deadlock in the swap out path.
+	 */
+	if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL))
+		goto out_free;
+
+	atomic_long_sub(size, &nr_swap_pages);
+	return 0;
+
+out_free:
+	put_swap_folio(folio, entry);
+	return -ENOMEM;
 }
 
 static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fcca38bc640f..be00af3763b5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1289,7 +1289,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 					    split_folio_to_list(folio, folio_list))
 						goto activate_locked;
 				}
-				if (!add_to_swap(folio)) {
+				if (folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN)) {
 					int __maybe_unused order = folio_order(folio);
 
 					if (!folio_test_large(folio))
@@ -1305,9 +1305,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 					}
 #endif
 					count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK);
-					if (!add_to_swap(folio))
+					if (folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN))
 						goto activate_locked_split;
 				}
+				/*
+				 * Normally the folio will be dirtied in unmap because its
+				 * pte should be dirty. A special case is MADV_FREE page. The
+				 * page's pte could have dirty bit cleared but the folio's
+				 * SwapBacked flag is still set because clearing the dirty bit
+				 * and SwapBacked flag has no lock protected. For such folio,
+				 * unmap will not set dirty bit for it, so folio reclaim will
+				 * not write the folio out. This can cause data corruption when
+				 * the folio is swapped in later. Always setting the dirty flag
+				 * for the folio solves the problem.
+				 */
+				folio_mark_dirty(folio);
 			}
 		}
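
As a reading aid only (not part of the patch), the reworked allocator in
the mm/swapfile.c hunk above boils down to the following shape, with the
debug checks and warnings elided:

	int folio_alloc_swap(struct folio *folio, gfp_t gfp)
	{
		unsigned int order = folio_order(folio);
		swp_entry_t entry = {};
		bool alloced;

		/* Large folios are only supported with CONFIG_THP_SWAP. */
		if (order && (!IS_ENABLED(CONFIG_THP_SWAP) ||
			      (1 << order) > SWAPFILE_CLUSTER))
			return -EINVAL;

		/* Fast path: this CPU's cluster; slow path: rotate devices. */
		local_lock(&percpu_swap_cluster.lock);
		alloced = swap_alloc_fast(&entry, SWAP_HAS_CACHE, order) ||
			  swap_alloc_slow(&entry, SWAP_HAS_CACHE, order);
		local_unlock(&percpu_swap_cluster.lock);
		if (!alloced)
			return -ENOMEM;

		/* Charge the memcg, then insert into the swap cache. */
		if (mem_cgroup_try_charge_swap(folio, entry) ||
		    add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL)) {
			put_swap_folio(folio, entry);
			return -ENOMEM;
		}

		atomic_long_sub(1 << order, &nr_swap_pages);
		return 0;
	}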