From patchwork Tue Oct 10 14:21:10 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13415650
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton, David Hildenbrand, Matthew Wilcox, Huang Ying,
    Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko
Cc: Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH v1 1/2] mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags
Date: Tue, 10 Oct 2023 15:21:10 +0100
Message-Id: <20231010142111.3997780-2-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20231010142111.3997780-1-ryan.roberts@arm.com>
References: <20231010142111.3997780-1-ryan.roberts@arm.com>

As preparation for supporting small-sized THP in the swap-out path,
without first needing to split to order-0, remove CLUSTER_FLAG_HUGE.
When present, the flag always implied a PMD-sized THP, which is the
same size as the swap cluster.

The only use of the flag was in swap_page_trans_huge_swapped(), to
determine whether a swap entry refers to a single page or a PMD-sized
THP. Instead of relying on the flag, we now pass in nr_pages, which
originates from the folio's number of pages. This allows the logic to
work for folios of any order.

The one snag is that one of the swap_page_trans_huge_swapped() call
sites does not have the folio. But it was only being called there to
avoid bothering to call __try_to_reclaim_swap() in some cases.
__try_to_reclaim_swap() gets the folio and (via some other functions)
calls swap_page_trans_huge_swapped(), so I've removed the problematic
call site and believe the new logic should be equivalent.

Removing CLUSTER_FLAG_HUGE also means we can remove
split_swap_cluster(), which used to be called during folio splitting,
since split_swap_cluster()'s only job was to remove the flag.
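(Aside for illustration, not part of the patch: the reworked check can
be modelled in userspace as below. The names folio_region_swapped and
COUNT_MASK are hypothetical, and the swap_map layout and SWAP_HAS_CACHE
value are simplified stand-ins for the kernel's internals. The point is
that the scan length now comes from the folio size rather than from a
per-cluster flag.)

#include <stdbool.h>
#include <stdio.h>

#define SWAP_HAS_CACHE	0x40	/* simplified: cache-only flag bit */
#define COUNT_MASK	0x3f	/* simplified: map-count bits */

/* Stand-in for swap_count(): strip the cache flag, keep the count. */
static unsigned char swap_count(unsigned char ent)
{
	return ent & COUNT_MASK;
}

/*
 * Model of the reworked swap_page_trans_huge_swapped(): rather than
 * assuming a PMD-sized (cluster-sized) folio, scan exactly nr_pages
 * entries starting at the folio-aligned offset. nr_pages must be a
 * power of 2, as in the kernel.
 */
static bool folio_region_swapped(const unsigned char *map,
				 unsigned long offset, unsigned int nr_pages)
{
	unsigned long start = offset & ~((unsigned long)nr_pages - 1);
	unsigned int i;

	for (i = 0; i < nr_pages; i++) {
		if (swap_count(map[start + i]))
			return true;
	}
	return false;
}

int main(void)
{
	unsigned char map[8] = {
		SWAP_HAS_CACHE, SWAP_HAS_CACHE,
		SWAP_HAS_CACHE | 1, SWAP_HAS_CACHE,	/* entry 2 still mapped */
		0, 0, 0, 0,
	};

	printf("%d\n", folio_region_swapped(map, 1, 4));	/* 1: order-2 folio at 0..3 */
	printf("%d\n", folio_region_swapped(map, 5, 4));	/* 0: order-2 folio at 4..7 */
	return 0;
}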
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 include/linux/swap.h | 10 ----------
 mm/huge_memory.c     |  3 ---
 mm/swapfile.c        | 47 ++++++++------------------------------------
 3 files changed, 8 insertions(+), 52 deletions(-)

--
2.25.1

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 19f30a29e1f1..a073366a227c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -259,7 +259,6 @@ struct swap_cluster_info {
 };
 #define CLUSTER_FLAG_FREE 1 /* This cluster is free */
 #define CLUSTER_FLAG_NEXT_NULL 2 /* This cluster has no next cluster */
-#define CLUSTER_FLAG_HUGE 4 /* This cluster is backing a transparent huge page */
 
 /*
  * We assign a cluster to each CPU, so each CPU can allocate swap entry from
@@ -595,15 +594,6 @@ static inline int add_swap_extent(struct swap_info_struct *sis,
 }
 #endif /* CONFIG_SWAP */
 
-#ifdef CONFIG_THP_SWAP
-extern int split_swap_cluster(swp_entry_t entry);
-#else
-static inline int split_swap_cluster(swp_entry_t entry)
-{
-	return 0;
-}
-#endif
-
 #ifdef CONFIG_MEMCG
 static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c9cbcbf6697e..46b3fb943207 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2597,9 +2597,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		shmem_uncharge(head->mapping->host, nr_dropped);
 	remap_page(folio, nr);
 
-	if (folio_test_swapcache(folio))
-		split_swap_cluster(folio->swap);
-
 	for (i = 0; i < nr; i++) {
 		struct page *subpage = head + i;
 		if (subpage == page)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e52f486834eb..c668838fa660 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -342,18 +342,6 @@ static inline void cluster_set_null(struct swap_cluster_info *info)
 	info->data = 0;
 }
 
-static inline bool cluster_is_huge(struct swap_cluster_info *info)
-{
-	if (IS_ENABLED(CONFIG_THP_SWAP))
-		return info->flags & CLUSTER_FLAG_HUGE;
-	return false;
-}
-
-static inline void cluster_clear_huge(struct swap_cluster_info *info)
-{
-	info->flags &= ~CLUSTER_FLAG_HUGE;
-}
-
 static inline struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
 						     unsigned long offset)
 {
@@ -1021,7 +1009,7 @@ static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
 	offset = idx * SWAPFILE_CLUSTER;
 	ci = lock_cluster(si, offset);
 	alloc_cluster(si, idx);
-	cluster_set_count_flag(ci, SWAPFILE_CLUSTER, CLUSTER_FLAG_HUGE);
+	cluster_set_count_flag(ci, SWAPFILE_CLUSTER, 0);
 
 	memset(si->swap_map + offset, SWAP_HAS_CACHE, SWAPFILE_CLUSTER);
 	unlock_cluster(ci);
@@ -1354,7 +1342,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
 
 	ci = lock_cluster_or_swap_info(si, offset);
 	if (size == SWAPFILE_CLUSTER) {
-		VM_BUG_ON(!cluster_is_huge(ci));
 		map = si->swap_map + offset;
 		for (i = 0; i < SWAPFILE_CLUSTER; i++) {
 			val = map[i];
@@ -1362,7 +1349,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
 			if (val == SWAP_HAS_CACHE)
 				free_entries++;
 		}
-		cluster_clear_huge(ci);
 		if (free_entries == SWAPFILE_CLUSTER) {
 			unlock_cluster_or_swap_info(si, ci);
 			spin_lock(&si->lock);
@@ -1384,23 +1370,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
 	unlock_cluster_or_swap_info(si, ci);
 }
 
-#ifdef CONFIG_THP_SWAP
-int split_swap_cluster(swp_entry_t entry)
-{
-	struct swap_info_struct *si;
-	struct swap_cluster_info *ci;
-	unsigned long offset = swp_offset(entry);
-
-	si = _swap_info_get(entry);
-	if (!si)
-		return -EBUSY;
-	ci = lock_cluster(si, offset);
-	cluster_clear_huge(ci);
-	unlock_cluster(ci);
-	return 0;
-}
-#endif
-
 static int
 swp_entry_cmp(const void *ent1, const void *ent2)
 {
 	const swp_entry_t *e1 = ent1, *e2 = ent2;
@@ -1508,22 +1477,23 @@ int swp_swapcount(swp_entry_t entry)
 }
 
 static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
-					 swp_entry_t entry)
+					 swp_entry_t entry,
+					 unsigned int nr_pages)
 {
 	struct swap_cluster_info *ci;
 	unsigned char *map = si->swap_map;
 	unsigned long roffset = swp_offset(entry);
-	unsigned long offset = round_down(roffset, SWAPFILE_CLUSTER);
+	unsigned long offset = round_down(roffset, nr_pages);
 	int i;
 	bool ret = false;
 
 	ci = lock_cluster_or_swap_info(si, offset);
-	if (!ci || !cluster_is_huge(ci)) {
+	if (!ci || nr_pages == 1) {
 		if (swap_count(map[roffset]))
 			ret = true;
 		goto unlock_out;
 	}
-	for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+	for (i = 0; i < nr_pages; i++) {
 		if (swap_count(map[offset + i])) {
 			ret = true;
 			break;
@@ -1545,7 +1515,7 @@ static bool folio_swapped(struct folio *folio)
 	if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!folio_test_large(folio)))
 		return swap_swapcount(si, entry) != 0;
 
-	return swap_page_trans_huge_swapped(si, entry);
+	return swap_page_trans_huge_swapped(si, entry, folio_nr_pages(folio));
 }
 
 /**
@@ -1606,8 +1576,7 @@ int free_swap_and_cache(swp_entry_t entry)
 	p = _swap_info_get(entry);
 	if (p) {
 		count = __swap_entry_free(p, entry);
-		if (count == SWAP_HAS_CACHE &&
-		    !swap_page_trans_huge_swapped(p, entry))
+		if (count == SWAP_HAS_CACHE)
 			__try_to_reclaim_swap(p, swp_offset(entry),
 					      TTRS_UNMAPPED | TTRS_FULL);
 	}

From patchwork Tue Oct 10 14:21:11 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13415643
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton, David Hildenbrand, Matthew Wilcox, Huang Ying,
    Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko
Cc: Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH v1 2/2] mm: swap: Swap-out small-sized THP without splitting
Date: Tue, 10 Oct 2023 15:21:11 +0100
Message-Id: <20231010142111.3997780-3-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20231010142111.3997780-1-ryan.roberts@arm.com>
References: <20231010142111.3997780-1-ryan.roberts@arm.com>

The upcoming anonymous small-sized THP feature enables performance
improvements by allocating large folios for anonymous memory. However,
I've observed that on an arm64 system running a parallel workload (e.g.
kernel compilation) across many cores, under high memory pressure, the
speed regresses. This is due to bottlenecking on the increased number
of TLB invalidations (TLBIs) issued for all the extra folio splitting.
Therefore, solve this regression by adding support for swapping out
small-sized THP without needing to split the folio, just as is already
done for PMD-sized THP. This change only applies when CONFIG_THP_SWAP
is enabled, and when the swap backing store is a non-rotating block
device - these are the same constraints as for the existing PMD-sized
THP swap-out support. Note that no attempt is made to swap-in THP here -
this is still done page-by-page, like for PMD-sized THP.

The main change here is to improve the swap entry allocator so that it
can allocate any power-of-2 number of contiguous entries between
[4, (1 << PMD_ORDER)]. This is done by allocating a cluster for each
distinct order and allocating sequentially from it until the cluster is
full. This ensures that we don't need to search the map and we get no
fragmentation due to alignment padding for different orders in the
cluster. If there is no current cluster for a given order, we attempt
to allocate a free cluster from the list. If there are no free
clusters, we fail the allocation and the caller falls back to splitting
the folio and allocating individual entries (as per the existing
PMD-sized THP fallback).

As far as I can tell, this should not cause any extra fragmentation
concerns, given how similar it is to the existing PMD-sized THP
allocation mechanism. There will, however, be up to (PMD_ORDER - 1)
clusters in concurrent use; in practice, the number of orders in use
will be small.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 include/linux/swap.h |  7 ++++++
 mm/swapfile.c        | 60 +++++++++++++++++++++++++++++++++-----------
 mm/vmscan.c          | 10 +++++---
 3 files changed, 59 insertions(+), 18 deletions(-)

--
2.25.1

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a073366a227c..fc55b760aeff 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -320,6 +320,13 @@ struct swap_info_struct {
 					 */
 	struct work_struct discard_work; /* discard worker */
 	struct swap_cluster_list discard_clusters; /* discard clusters list */
+	unsigned int large_next[PMD_ORDER]; /*
+					 * next free offset within current
+					 * allocation cluster for large
+					 * folios, or UINT_MAX if no current
+					 * cluster. Index is (order - 1).
+					 * Only when cluster_info is used.
+					 */
 	struct plist_node avail_lists[]; /*
 					 * entries in swap_avail_heads, one
 					 * entry per node.
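(Aside for illustration: the indexing scheme implied by the new
large_next[] field above, assuming a 4K base page and a 2M PMD, so
PMD_ORDER is 9 and a PMD-sized folio spans 512 pages. The helper
large_next_index() below is hypothetical, not kernel code.)

#include <stdio.h>

#define PMD_ORDER 9	/* assumed: 4K base page, 2M PMD */

/* Hypothetical helper: which large_next[] slot serves nr_pages? */
static unsigned int large_next_index(unsigned int nr_pages)
{
	unsigned int order = 31 - __builtin_clz(nr_pages);	/* ilog2() */

	return order - 1;	/* the "Index is (order - 1)" scheme above */
}

int main(void)
{
	unsigned int nr;

	/*
	 * Supported sizes are nr_pages in [4, 1 << PMD_ORDER], i.e.
	 * orders 2..9, which land in slots 1..8 of the PMD_ORDER-sized
	 * array (slot 0, order 1, is unused given the minimum of 4).
	 */
	for (nr = 4; nr <= (1u << PMD_ORDER); nr <<= 1)
		printf("nr_pages=%3u -> large_next[%u]\n",
		       nr, large_next_index(nr));
	return 0;
}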
diff --git a/mm/swapfile.c b/mm/swapfile.c
index c668838fa660..f8093dedc866 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -987,8 +987,10 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
 	return n_ret;
 }
 
-static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
+static int swap_alloc_large(struct swap_info_struct *si, swp_entry_t *slot,
+			    unsigned int nr_pages)
 {
+	int order;
 	unsigned long idx;
 	struct swap_cluster_info *ci;
 	unsigned long offset;
@@ -1002,20 +1004,47 @@ static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
 		return 0;
 	}
 
-	if (cluster_list_empty(&si->free_clusters))
-		return 0;
+	VM_WARN_ON(nr_pages < 2);
+	VM_WARN_ON(nr_pages > SWAPFILE_CLUSTER);
+	VM_WARN_ON(!is_power_of_2(nr_pages));
 
-	idx = cluster_list_first(&si->free_clusters);
-	offset = idx * SWAPFILE_CLUSTER;
-	ci = lock_cluster(si, offset);
-	alloc_cluster(si, idx);
-	cluster_set_count_flag(ci, SWAPFILE_CLUSTER, 0);
+	order = ilog2(nr_pages);
+	offset = si->large_next[order - 1];
+
+	if (offset == UINT_MAX) {
+		if (cluster_list_empty(&si->free_clusters))
+			return 0;
 
-	memset(si->swap_map + offset, SWAP_HAS_CACHE, SWAPFILE_CLUSTER);
+		idx = cluster_list_first(&si->free_clusters);
+		offset = idx * SWAPFILE_CLUSTER;
+
+		ci = lock_cluster(si, offset);
+		alloc_cluster(si, idx);
+		cluster_set_count_flag(ci, SWAPFILE_CLUSTER, 0);
+
+		/*
+		 * If scan_swap_map_slots() can't find a free cluster, it will
+		 * check si->swap_map directly. To make sure this standby
+		 * cluster isn't taken by scan_swap_map_slots(), mark the swap
+		 * entries bad (occupied). (same approach as discard).
+		 */
+		memset(si->swap_map + offset + nr_pages, SWAP_MAP_BAD,
+		       SWAPFILE_CLUSTER - nr_pages);
+	} else {
+		idx = offset / SWAPFILE_CLUSTER;
+		ci = lock_cluster(si, offset);
+	}
+
+	memset(si->swap_map + offset, SWAP_HAS_CACHE, nr_pages);
 	unlock_cluster(ci);
-	swap_range_alloc(si, offset, SWAPFILE_CLUSTER);
+	swap_range_alloc(si, offset, nr_pages);
 	*slot = swp_entry(si->type, offset);
+	offset += nr_pages;
+	if (idx != offset / SWAPFILE_CLUSTER)
+		offset = UINT_MAX;
+	si->large_next[order - 1] = offset;
+
 	return 1;
 }
 
@@ -1041,7 +1070,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
 	int node;
 
 	/* Only single cluster request supported */
-	WARN_ON_ONCE(n_goal > 1 && size == SWAPFILE_CLUSTER);
+	WARN_ON_ONCE(n_goal > 1 && size > 1);
 
 	spin_lock(&swap_avail_lock);
 
@@ -1078,14 +1107,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
 			spin_unlock(&si->lock);
 			goto nextsi;
 		}
-		if (size == SWAPFILE_CLUSTER) {
+		if (size > 1) {
 			if (si->flags & SWP_BLKDEV)
-				n_ret = swap_alloc_cluster(si, swp_entries);
+				n_ret = swap_alloc_large(si, swp_entries, size);
 		} else
 			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
 						    n_goal, swp_entries);
 		spin_unlock(&si->lock);
-		if (n_ret || size == SWAPFILE_CLUSTER)
+		if (n_ret || size > 1)
 			goto check_out;
 		cond_resched();
 
@@ -2725,6 +2754,9 @@ static struct swap_info_struct *alloc_swap_info(void)
 	spin_lock_init(&p->cont_lock);
 	init_completion(&p->comp);
 
+	for (i = 0; i < ARRAY_SIZE(p->large_next); i++)
+		p->large_next[i] = UINT_MAX;
+
 	return p;
 }
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c16e2b1ea8ae..5984d2ae4547 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1212,11 +1212,13 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 					if (!can_split_folio(folio, NULL))
 						goto activate_locked;
 					/*
-					 * Split folios without a PMD map right
-					 * away. Chances are some or all of the
-					 * tail pages can be freed without IO.
+					 * Split PMD-mappable folios without a
+					 * PMD map right away. Chances are some
+					 * or all of the tail pages can be freed
+					 * without IO.
 					 */
-					if (!folio_entire_mapcount(folio) &&
+					if (folio_test_pmd_mappable(folio) &&
+					    !folio_entire_mapcount(folio) &&
 					    split_folio_to_list(folio, folio_list))
 						goto activate_locked;
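(Aside for illustration: a compact userspace model of the per-order
"current cluster" strategy described in the commit message. Cluster
locking, swap_map marking and the SWAP_MAP_BAD padding of standby
clusters are elided, and the names toy_swap and alloc_large are
illustrative, not the kernel's.)

#include <string.h>

#define CLUSTER_PAGES	512	/* stand-in for SWAPFILE_CLUSTER */
#define NO_CLUSTER	(~0u)	/* stand-in for the UINT_MAX sentinel */

struct toy_swap {
	unsigned int large_next[9];	/* per-order next-free offset */
	unsigned int nr_free_clusters;
	unsigned int next_cluster_idx;
};

/* Return the first entry offset of an nr_pages run, or NO_CLUSTER. */
static unsigned int alloc_large(struct toy_swap *si, unsigned int nr_pages,
				unsigned int order)
{
	unsigned int offset = si->large_next[order - 1];
	unsigned int ret;

	if (offset == NO_CLUSTER) {
		/* No current cluster for this order: take a free one whole. */
		if (!si->nr_free_clusters)
			return NO_CLUSTER;	/* caller falls back to splitting */
		si->nr_free_clusters--;
		offset = si->next_cluster_idx++ * CLUSTER_PAGES;
	}

	ret = offset;

	/* Hand out nr_pages entries; retire the cluster once exhausted. */
	offset += nr_pages;
	si->large_next[order - 1] =
		(offset % CLUSTER_PAGES == 0) ? NO_CLUSTER : offset;
	return ret;
}

int main(void)
{
	struct toy_swap si = { .nr_free_clusters = 2 };

	memset(si.large_next, 0xff, sizeof(si.large_next));	/* all NO_CLUSTER */

	/* Two order-2 folios share one cluster; an order-9 one takes its own. */
	unsigned int e1 = alloc_large(&si, 4, 2);	/* 0 */
	unsigned int e2 = alloc_large(&si, 4, 2);	/* 4 */
	unsigned int e3 = alloc_large(&si, 512, 9);	/* 512 */

	return !(e1 == 0 && e2 == 4 && e3 == 512);
}

This mirrors the commit message: sequential allocation within a
per-order current cluster, with failure (here, returning NO_CLUSTER)
when no free cluster is available, at which point the caller splits
the folio and allocates order-0 entries instead.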