From patchwork Mon Mar 4 08:13:44 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13580149
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com
Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com,
 hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
 shy828301@gmail.com, steven.price@arm.com, surenb@google.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com,
 Barry Song, Catalin Marinas, Will Deacon, Mark Rutland, Kemeng Shi,
 Anshuman Khandual, Peter Collingbourne, Peter Xu, Lorenzo Stoakes,
 "Mike Rapoport (IBM)", Hugh Dickins, "Aneesh Kumar K.V", Rick Edgecombe
Subject: [RFC PATCH v3 1/5] arm64: mm: swap: support THP_SWAP on hardware
 with MTE
Date: Mon, 4 Mar 2024 21:13:44 +1300
Message-Id: <20240304081348.197341-2-21cnbao@gmail.com>
In-Reply-To: <20240304081348.197341-1-21cnbao@gmail.com>
References: <20240304081348.197341-1-21cnbao@gmail.com>
From: Barry Song

Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brought up
THP_SWAP on ARM64, but it didn't enable THP_SWAP on hardware with MTE,
as the MTE code works under the assumption that tag save/restore always
handles a folio with only one page. This limitation should be removed,
as more and more ARM64 SoCs have the MTE feature, so the co-existence
of MTE and THP_SWAP is becoming increasingly important.

This patch makes MTE tag saving support large folios, so we no longer
need to split large folios into base pages for swapping out on ARM64
SoCs with MTE.

arch_prepare_to_swap() should take a folio rather than a page as its
parameter because we support THP swap-out as a whole: it saves the tags
for all pages in a large folio.

As we now restore tags based on the folio, arch_swap_restore() may add
some extra loops and early exits while refaulting a large folio which
is still in the swapcache in do_swap_page(). If a large folio has nr
pages, do_swap_page() will only set the PTE of the particular page that
is causing the page fault. Thus do_swap_page() runs nr times, and each
time arch_swap_restore() loops nr times over the subpages in the folio,
so the algorithmic complexity is currently O(nr^2). Once we support
mapping large folios in do_swap_page(), the extra loops and early exits
will decrease but not disappear entirely, since a large folio might be
partially tagged in corner cases such as:
1. a large folio in the swapcache can be partially unmapped, in which
   case MTE tags for the unmapped pages will be invalidated;
2. users might use mprotect() to set MTE on part of a large folio.

arch_thp_swp_supported() is dropped since ARM64 MTE was its only user.
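To make the O(nr^2) discussion above concrete, here is a minimal sketch
(illustrative only, mirroring the arch_swap_restore() added in the diff
below) of how one refault restores tags for every subpage starting from
the folio-aligned swap entry:

	/*
	 * Sketch, not the patch itself: swap offsets of a large folio
	 * are contiguous and aligned to folio_nr_pages(), so stepping
	 * the entry back to subpage 0 is a simple mask on the offset.
	 */
	static void sketch_restore_folio_tags(swp_entry_t entry, struct folio *folio)
	{
		long i, nr = folio_nr_pages(folio);

		entry.val -= swp_offset(entry) & (nr - 1);	/* entry of subpage 0 */
		for (i = 0; i < nr; i++) {
			mte_restore_tags(entry, folio_page(folio, i));
			entry.val++;
		}
	}

With nr such faults per folio before all PTEs are mapped, the loop above
runs nr times per fault, which is where the O(nr^2) figure comes from.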
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ryan Roberts
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Kemeng Shi
Cc: "Matthew Wilcox (Oracle)"
Cc: Anshuman Khandual
Cc: Peter Collingbourne
Cc: Steven Price
Cc: Yosry Ahmed
Cc: Peter Xu
Cc: Lorenzo Stoakes
Cc: "Mike Rapoport (IBM)"
Cc: Hugh Dickins
Cc: "Aneesh Kumar K.V"
Cc: Rick Edgecombe
Signed-off-by: Barry Song
Reviewed-by: Steven Price
Acked-by: Chris Li
---
 arch/arm64/include/asm/pgtable.h | 19 ++------------
 arch/arm64/mm/mteswap.c          | 43 ++++++++++++++++++++++++++++++++
 include/linux/huge_mm.h          | 12 --------
 include/linux/pgtable.h          |  2 +-
 mm/page_io.c                     |  2 +-
 mm/swap_slots.c                  |  2 +-
 6 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 401087e8a43d..7a54750770b8 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -45,12 +45,6 @@
 	__flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-static inline bool arch_thp_swp_supported(void)
-{
-	return !system_supports_mte();
-}
-#define arch_thp_swp_supported arch_thp_swp_supported
-
 /*
  * Outside of a few very special situations (e.g. hibernation), we always
  * use broadcast TLB invalidation instructions, therefore a spurious page
@@ -1095,12 +1089,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 #ifdef CONFIG_ARM64_MTE
 
 #define __HAVE_ARCH_PREPARE_TO_SWAP
-static inline int arch_prepare_to_swap(struct page *page)
-{
-	if (system_supports_mte())
-		return mte_save_tags(page);
-	return 0;
-}
+extern int arch_prepare_to_swap(struct folio *folio);
 
 #define __HAVE_ARCH_SWAP_INVALIDATE
 static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
@@ -1116,11 +1105,7 @@ static inline void arch_swap_invalidate_area(int type)
 }
 
 #define __HAVE_ARCH_SWAP_RESTORE
-static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
-{
-	if (system_supports_mte())
-		mte_restore_tags(entry, &folio->page);
-}
+extern void arch_swap_restore(swp_entry_t entry, struct folio *folio);
 
 #endif /* CONFIG_ARM64_MTE */
 
diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index a31833e3ddc5..295836fef620 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -68,6 +68,13 @@ void mte_invalidate_tags(int type, pgoff_t offset)
 	mte_free_tag_storage(tags);
 }
 
+static inline void __mte_invalidate_tags(struct page *page)
+{
+	swp_entry_t entry = page_swap_entry(page);
+
+	mte_invalidate_tags(swp_type(entry), swp_offset(entry));
+}
+
 void mte_invalidate_tags_area(int type)
 {
 	swp_entry_t entry = swp_entry(type, 0);
@@ -83,3 +90,39 @@ void mte_invalidate_tags_area(int type)
 	}
 	xa_unlock(&mte_pages);
 }
+
+int arch_prepare_to_swap(struct folio *folio)
+{
+	long i, nr;
+	int err;
+
+	if (!system_supports_mte())
+		return 0;
+
+	nr = folio_nr_pages(folio);
+
+	for (i = 0; i < nr; i++) {
+		err = mte_save_tags(folio_page(folio, i));
+		if (err)
+			goto out;
+	}
+	return 0;
+
+out:
+	while (i--)
+		__mte_invalidate_tags(folio_page(folio, i));
+	return err;
+}
+
+void arch_swap_restore(swp_entry_t entry, struct folio *folio)
+{
+	if (system_supports_mte()) {
+		long i, nr = folio_nr_pages(folio);
+
+		entry.val -= swp_offset(entry) & (nr - 1);
+		for (i = 0; i < nr; i++) {
+			mte_restore_tags(entry, folio_page(folio, i));
+			entry.val++;
+		}
+	}
+}
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index de0c89105076..e04b93c43965 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -535,16 +535,4 @@ static inline int split_folio_to_order(struct folio *folio, int new_order)
 #define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
 #define split_folio(f) split_folio_to_order(f, 0)
 
-/*
- * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
- * limitations in the implementation like arm64 MTE can override this to
- * false
- */
-#ifndef arch_thp_swp_supported
-static inline bool arch_thp_swp_supported(void)
-{
-	return true;
-}
-#endif
-
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e1b22903f709..bfcfe3386934 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1106,7 +1106,7 @@ static inline int arch_unmap_one(struct mm_struct *mm,
  * prototypes must be defined in the arch-specific asm/pgtable.h file.
  */
 #ifndef __HAVE_ARCH_PREPARE_TO_SWAP
-static inline int arch_prepare_to_swap(struct page *page)
+static inline int arch_prepare_to_swap(struct folio *folio)
 {
 	return 0;
 }
diff --git a/mm/page_io.c b/mm/page_io.c
index ae2b49055e43..a9a7c236aecc 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -189,7 +189,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 	 * Arch code may have to preserve more data than just the page
 	 * contents, e.g. memory tags.
 	 */
-	ret = arch_prepare_to_swap(&folio->page);
+	ret = arch_prepare_to_swap(folio);
 	if (ret) {
 		folio_mark_dirty(folio);
 		folio_unlock(folio);
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 90973ce7881d..53abeaf1371d 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -310,7 +310,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
 	entry.val = 0;
 
 	if (folio_test_large(folio)) {
-		if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
+		if (IS_ENABLED(CONFIG_THP_SWAP))
 			get_swap_pages(1, &entry, folio_nr_pages(folio));
 		goto out;
 	}

From patchwork Mon Mar 4 08:13:45 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13580150
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com
Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com,
 hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
 shy828301@gmail.com, steven.price@arm.com, surenb@google.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com,
 Chuanhua Han, Barry Song
Subject: [RFC PATCH v3 2/5] mm: swap: introduce swap_nr_free() for batched
 swap_free()
Date: Mon, 4 Mar 2024 21:13:45 +1300
Message-Id: <20240304081348.197341-3-21cnbao@gmail.com>
In-Reply-To: <20240304081348.197341-1-21cnbao@gmail.com>
References: <20240304081348.197341-1-21cnbao@gmail.com>

From: Chuanhua Han

While swapping in a large folio, we need to free the swap entries for
the whole folio. To avoid frequently acquiring and releasing swap
locks, it is better to introduce an API for batched free.
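As an illustration, a hypothetical caller that has just swapped in a
whole large folio could batch-free its entries like this (a sketch, not
part of the patch; it assumes folio->swap holds the first subpage's
entry for a swapcache folio, and that the offset is aligned to the
folio size, as the API below requires):

	/* Hypothetical usage sketch for swap_nr_free() */
	swp_entry_t entry = folio->swap;	/* first subpage's entry */
	int nr = folio_nr_pages(folio);

	/* one cluster-lock round trip instead of nr swap_free() calls */
	swap_nr_free(entry, nr);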
Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
---
 include/linux/swap.h |  6 ++++++
 mm/swapfile.c        | 35 +++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2955f7a78d8d..d6ab27929458 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -481,6 +481,7 @@ extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
+extern void swap_nr_free(swp_entry_t entry, int nr_pages);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern int free_swap_and_cache(swp_entry_t);
 int swap_type_of(dev_t device, sector_t offset);
@@ -561,6 +562,11 @@ static inline void swap_free(swp_entry_t swp)
 {
 }
 
+static inline void swap_nr_free(swp_entry_t entry, int nr_pages)
+{
+
+}
+
 static inline void put_swap_folio(struct folio *folio, swp_entry_t swp)
 {
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3f594be83b58..244106998a69 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1341,6 +1341,41 @@ void swap_free(swp_entry_t entry)
 		__swap_entry_free(p, entry);
 }
 
+/*
+ * Called after swapping in a large folio; batch-free the swap entries
+ * for the whole folio. entry must be for the first subpage and its
+ * offset must be aligned with nr_pages.
+ */
+void swap_nr_free(swp_entry_t entry, int nr_pages)
+{
+	int i;
+	struct swap_cluster_info *ci;
+	struct swap_info_struct *p;
+	unsigned type = swp_type(entry);
+	unsigned long offset = swp_offset(entry);
+	DECLARE_BITMAP(usage, SWAPFILE_CLUSTER) = { 0 };
+
+	/* all swap entries are within a cluster for mTHP */
+	VM_BUG_ON(offset % SWAPFILE_CLUSTER + nr_pages > SWAPFILE_CLUSTER);
+
+	if (nr_pages == 1) {
+		swap_free(entry);
+		return;
+	}
+
+	p = _swap_info_get(entry);
+
+	ci = lock_cluster(p, offset);
+	for (i = 0; i < nr_pages; i++) {
+		if (__swap_entry_free_locked(p, offset + i, 1))
+			__bitmap_set(usage, i, 1);
+	}
+	unlock_cluster(ci);
+
+	for_each_clear_bit(i, usage, nr_pages)
+		free_swap_slot(swp_entry(type, offset + i));
+}
+
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
 */

From patchwork Mon Mar 4 08:13:46 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13580151
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com
Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com,
 hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
 shy828301@gmail.com, steven.price@arm.com, surenb@google.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com,
 Chuanhua Han, Barry Song
Subject: [RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap() support
 large-folio
Date: Mon, 4 Mar 2024 21:13:46 +1300
Message-Id: <20240304081348.197341-4-21cnbao@gmail.com>
In-Reply-To: <20240304081348.197341-1-21cnbao@gmail.com>
References: <20240304081348.197341-1-21cnbao@gmail.com>
From: Chuanhua Han

should_try_to_free_swap() works under the assumption that swap-in is
always done at normal page granularity, i.e. folio_nr_pages() == 1. To
support large folio swap-in, this patch removes that assumption.

Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
Acked-by: Chris Li
Reviewed-by: Ryan Roberts
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index abd4f33d62c9..e0d34d705e07 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
 	 * reference only in case it's likely that we'll be the exlusive user.
 	 */
 	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
-		folio_ref_count(folio) == 2;
+		folio_ref_count(folio) == (1 + folio_nr_pages(folio));
 }
 
 static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
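For intuition about the new check above: a folio in the swapcache holds
one reference per subpage (add_to_swap_cache() takes folio_nr_pages()
references), and the fault handler holds one more, so an otherwise
unused folio has exactly 1 + folio_nr_pages() references. A hedged
restatement of the predicate (illustrative, not from the patch):

	/* Sketch: why the new check generalizes the old "== 2" */
	static bool sketch_can_free_swap(struct folio *folio, unsigned int fault_flags)
	{
		/* swapcache: nr refs; fault path: one more */
		long expected = 1 + folio_nr_pages(folio);

		return (fault_flags & FAULT_FLAG_WRITE) &&
		       !folio_test_ksm(folio) &&
		       folio_ref_count(folio) == expected;
	}

For a base page (folio_nr_pages() == 1) this reduces to the old
folio_ref_count(folio) == 2 check.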
From patchwork Mon Mar 4 08:13:47 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13580152
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com
Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com,
 hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
 shy828301@gmail.com, steven.price@arm.com, surenb@google.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com,
 Barry Song, Hugh Dickins, Minchan Kim, SeongJae Park
Subject: [RFC PATCH v3 4/5] mm: swap: introduce swapcache_prepare_nr and
 swapcache_clear_nr for large folios swap-in
Date: Mon, 4 Mar 2024 21:13:47 +1300
Message-Id: <20240304081348.197341-5-21cnbao@gmail.com>
In-Reply-To: <20240304081348.197341-1-21cnbao@gmail.com>
References: <20240304081348.197341-1-21cnbao@gmail.com>
From: Barry Song

Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
supports one entry only. To support large folio swap-in, we need to
handle multiple swap entries.

Cc: Kairui Song
Cc: "Huang, Ying"
Cc: David Hildenbrand
Cc: Chris Li
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Yosry Ahmed
Cc: Yu Zhao
Cc: SeongJae Park
Signed-off-by: Barry Song
---
 include/linux/swap.h |   1 +
 mm/swap.h            |   1 +
 mm/swapfile.c        | 118 ++++++++++++++++++++++++++-----------------
 3 files changed, 74 insertions(+), 46 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index d6ab27929458..22105f0fe2d4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -480,6 +480,7 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
+extern int swapcache_prepare_nr(swp_entry_t entry, int nr);
 extern void swap_free(swp_entry_t);
 extern void swap_nr_free(swp_entry_t entry, int nr_pages);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
diff --git a/mm/swap.h b/mm/swap.h
index fc2f6ade7f80..1cec991efcda 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -42,6 +42,7 @@ void delete_from_swap_cache(struct folio *folio);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
 				  unsigned long end);
 void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry);
+void swapcache_clear_nr(struct swap_info_struct *si, swp_entry_t entry, int nr);
 struct folio *swap_cache_get_folio(swp_entry_t entry,
 		struct vm_area_struct *vma, unsigned long addr);
 struct folio *filemap_get_incore_folio(struct address_space *mapping,
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 244106998a69..bae1b8165b11 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3309,7 +3309,7 @@ void si_swapinfo(struct sysinfo *val)
 }
 
 /*
- * Verify that a swap entry is valid and increment its swap map count.
+ * Verify that nr swap entries are valid and increment their swap map count.
  *
  * Returns error code in following case.
 * - success -> 0
@@ -3319,66 +3319,76 @@ void si_swapinfo(struct sysinfo *val)
 * - swap-cache reference is requested but the entry is not used. -> ENOENT
 * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
 */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate_nr(swp_entry_t entry, int nr, unsigned char usage)
 {
 	struct swap_info_struct *p;
 	struct swap_cluster_info *ci;
 	unsigned long offset;
-	unsigned char count;
-	unsigned char has_cache;
-	int err;
+	unsigned char count[SWAPFILE_CLUSTER];
+	unsigned char has_cache[SWAPFILE_CLUSTER];
+	int err, i;
 
 	p = swp_swap_info(entry);
 
 	offset = swp_offset(entry);
 	ci = lock_cluster_or_swap_info(p, offset);
 
-	count = p->swap_map[offset];
-
-	/*
-	 * swapin_readahead() doesn't check if a swap entry is valid, so the
-	 * swap entry could be SWAP_MAP_BAD. Check here with lock held.
-	 */
-	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
-		err = -ENOENT;
-		goto unlock_out;
-	}
-
-	has_cache = count & SWAP_HAS_CACHE;
-	count &= ~SWAP_HAS_CACHE;
-	err = 0;
-
-	if (usage == SWAP_HAS_CACHE) {
+	for (i = 0; i < nr; i++) {
+		count[i] = p->swap_map[offset + i];
 
-		/* set SWAP_HAS_CACHE if there is no cache and entry is used */
-		if (!has_cache && count)
-			has_cache = SWAP_HAS_CACHE;
-		else if (has_cache) /* someone else added cache */
-			err = -EEXIST;
-		else /* no users remaining */
+		/*
+		 * swapin_readahead() doesn't check if a swap entry is valid, so the
+		 * swap entry could be SWAP_MAP_BAD. Check here with lock held.
+		 */
+		if (unlikely(swap_count(count[i]) == SWAP_MAP_BAD)) {
 			err = -ENOENT;
+			goto unlock_out;
+		}
 
-	} else if (count || has_cache) {
-
-		if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
-			count += usage;
-		else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX)
-			err = -EINVAL;
-		else if (swap_count_continued(p, offset, count))
-			count = COUNT_CONTINUED;
-		else
-			err = -ENOMEM;
-	} else
-		err = -ENOENT; /* unused swap entry */
+		has_cache[i] = count[i] & SWAP_HAS_CACHE;
+		count[i] &= ~SWAP_HAS_CACHE;
+		err = 0;
+
+		if (usage == SWAP_HAS_CACHE) {
+
+			/* set SWAP_HAS_CACHE if there is no cache and entry is used */
+			if (!has_cache[i] && count[i])
+				has_cache[i] = SWAP_HAS_CACHE;
+			else if (has_cache[i]) /* someone else added cache */
+				err = -EEXIST;
+			else /* no users remaining */
+				err = -ENOENT;
+		} else if (count[i] || has_cache[i]) {
+
+			if ((count[i] & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
+				count[i] += usage;
+			else if ((count[i] & ~COUNT_CONTINUED) > SWAP_MAP_MAX)
+				err = -EINVAL;
+			else if (swap_count_continued(p, offset + i, count[i]))
+				count[i] = COUNT_CONTINUED;
+			else
+				err = -ENOMEM;
+		} else
+			err = -ENOENT; /* unused swap entry */
 
-	if (!err)
-		WRITE_ONCE(p->swap_map[offset], count | has_cache);
+		if (err)
+			break;
+	}
 
+	if (!err) {
+		for (i = 0; i < nr; i++)
+			WRITE_ONCE(p->swap_map[offset + i], count[i] | has_cache[i]);
+	}
 unlock_out:
 	unlock_cluster_or_swap_info(p, ci);
 	return err;
 }
 
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+{
+	return __swap_duplicate_nr(entry, 1, usage);
+}
+
 /*
  * Help swapoff by noting that swap entry belongs to shmem/tmpfs
  * (in which case its reference count is never incremented).
@@ -3417,17 +3427,33 @@ int swapcache_prepare(swp_entry_t entry)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE);
 }
 
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+int swapcache_prepare_nr(swp_entry_t entry, int nr)
+{
+	return __swap_duplicate_nr(entry, nr, SWAP_HAS_CACHE);
+}
+
+void swapcache_clear_nr(struct swap_info_struct *si, swp_entry_t entry, int nr)
 {
 	struct swap_cluster_info *ci;
 	unsigned long offset = swp_offset(entry);
-	unsigned char usage;
+	unsigned char usage[SWAPFILE_CLUSTER];
+	int i;
 
 	ci = lock_cluster_or_swap_info(si, offset);
-	usage = __swap_entry_free_locked(si, offset, SWAP_HAS_CACHE);
+	for (i = 0; i < nr; i++)
+		usage[i] = __swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE);
 	unlock_cluster_or_swap_info(si, ci);
-	if (!usage)
-		free_swap_slot(entry);
+	for (i = 0; i < nr; i++) {
+		if (!usage[i])
+			free_swap_slot(entry);
+		entry.val++;
+	}
+}
+
+void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+{
+	swapcache_clear_nr(si, entry, 1);
 }
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
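To see how this pair is meant to be used (a sketch of the flow that
patch 5/5 adopts, not a verbatim excerpt): claim SWAP_HAS_CACHE on all
nr entries up front so that parallel swap-ins back off, then release
the flags once the folio has been read and mapped:

	/* Hypothetical caller sketch for the nr-entry variants */
	if (swapcache_prepare_nr(entry, nr_pages)) {
		/* another thread owns (some of) these entries; back off */
		schedule_timeout_uninterruptible(1);
		return 0;	/* hypothetical retry path */
	}
	/* ... swap_read_folio(), map the PTEs ... */
	swapcache_clear_nr(si, entry, nr_pages);	/* drop SWAP_HAS_CACHE */

Note also that the original posting advanced entry.val only for freed
slots in swapcache_clear_nr(); the loop above has been corrected to
advance the entry on every iteration.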
From patchwork Mon Mar 4 08:13:48 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13580153
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com
Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com,
 hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com,
 shy828301@gmail.com, steven.price@arm.com, surenb@google.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com,
 Chuanhua Han, Barry Song
Subject: [RFC PATCH v3 5/5] mm: support large folios swapin as a whole
Date: Mon, 4 Mar 2024 21:13:48 +1300
Message-Id: <20240304081348.197341-6-21cnbao@gmail.com>
In-Reply-To: <20240304081348.197341-1-21cnbao@gmail.com>
References: <20240304081348.197341-1-21cnbao@gmail.com>

From: Chuanhua Han

On an embedded system like Android, more than half of anonymous memory
is actually in swap devices such as zRAM. For example, while an app is
switched to the background, most of its memory might be swapped out.

Now we have mTHP features, but unfortunately, if we don't support large
folio swap-in, then once those large folios are swapped out, we
immediately lose the performance gain we got from large folios and
hardware optimizations such as CONT-PTE.

This patch brings up mTHP swap-in support. Right now, we limit mTHP
swap-in to those contiguous swap entries which were likely swapped out
from an mTHP as a whole.

Meanwhile, the current implementation only covers the SWAP_SYNCHRONOUS
case. It doesn't support swapin_readahead() as large folios yet, since
that kind of shared memory is much less common than memory mapped by a
single process.

Right now, we re-fault large folios which are still in the swapcache as
a whole. This effectively decreases the extra loops and early exits we
added in arch_swap_restore() while supporting MTE restore for folios
rather than pages. It also decreases the number of do_swap_page() calls,
since PTEs used to be set one by one even when we hit a large folio in
the swapcache.

Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
---
 mm/memory.c | 250 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 212 insertions(+), 38 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e0d34d705e07..501ede745ef3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3907,6 +3907,136 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
+/*
+ * Check if a range of PTEs are completely swap entries with
+ * contiguous swap offsets and the same SWAP_HAS_CACHE.
+ * The pte must be the first one in the range.
+ */
+static bool is_pte_range_contig_swap(pte_t *pte, int nr_pages)
+{
+	int i;
+	struct swap_info_struct *si;
+	swp_entry_t entry;
+	unsigned type;
+	pgoff_t start_offset;
+	char has_cache;
+
+	entry = pte_to_swp_entry(ptep_get_lockless(pte));
+	if (non_swap_entry(entry))
+		return false;
+	start_offset = swp_offset(entry);
+	if (start_offset % nr_pages)
+		return false;
+
+	si = swp_swap_info(entry);
+	type = swp_type(entry);
+	has_cache = si->swap_map[start_offset] & SWAP_HAS_CACHE;
+	for (i = 1; i < nr_pages; i++) {
+		entry = pte_to_swp_entry(ptep_get_lockless(pte + i));
+		if (non_swap_entry(entry))
+			return false;
+		if (swp_offset(entry) != start_offset + i)
+			return false;
+		if (swp_type(entry) != type)
+			return false;
+		/*
+		 * While allocating a large folio and doing swap_read_folio()
+		 * for the SWP_SYNCHRONOUS_IO path, which is the case where the
+		 * faulting pte doesn't have a swapcache entry, we need to
+		 * ensure all the other PTEs have no swapcache as well;
+		 * otherwise, we might read from swap devices while the
+		 * content is actually in the swapcache.
+		 */
+		if ((si->swap_map[start_offset + i] & SWAP_HAS_CACHE) != has_cache)
+			return false;
+	}
+
+	return true;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/*
+ * Get a list of all the (large) orders below PMD_ORDER that are enabled
+ * for this vma. Then filter out the orders that can't be allocated over
+ * the faulting address and still be fully contained in the vma.
+ */
+static inline unsigned long get_alloc_folio_orders(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long orders;
+
+	orders = thp_vma_allowable_orders(vma, vma->vm_flags, false, true, true,
+					  BIT(PMD_ORDER) - 1);
+	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
+	return orders;
+}
+#endif
+
+static struct folio *alloc_swap_folio(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	unsigned long orders;
+	struct folio *folio;
+	unsigned long addr;
+	pte_t *pte;
+	gfp_t gfp;
+	int order;
+
+	/*
+	 * If uffd is active for the vma we need per-page fault fidelity to
+	 * maintain the uffd semantics.
+	 */
+	if (unlikely(userfaultfd_armed(vma)))
+		goto fallback;
+
+	/*
+	 * A large folio being swapped in could be partially in zswap and
+	 * partially in swap devices. As zswap doesn't support large folios
+	 * yet, we might get corrupted zero-filled data by reading all
+	 * subpages from swap devices while some of them are actually in
+	 * zswap.
+	 */
+	if (is_zswap_enabled())
+		goto fallback;
+
+	orders = get_alloc_folio_orders(vmf);
+	if (!orders)
+		goto fallback;
+
+	pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
+	if (unlikely(!pte))
+		goto fallback;
+
+	/*
+	 * For do_swap_page, find the highest order where the aligned range is
+	 * completely swap entries with contiguous swap offsets.
+	 */
+	order = highest_order(orders);
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		if (is_pte_range_contig_swap(pte + pte_index(addr), 1 << order))
+			break;
+		order = next_order(&orders, order);
+	}
+
+	pte_unmap(pte);
+
+	/*
+	 * Try allocating the highest of the remaining orders.
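+	 * Falling back to a smaller order stays correct: any aligned
+	 * sub-range of a contiguous, aligned swap range is itself
+	 * contiguous and aligned.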
+	 */
+	gfp = vma_thp_gfp_mask(vma);
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		folio = vma_alloc_folio(gfp, order, vma, addr, true);
+		if (folio)
+			return folio;
+		order = next_order(&orders, order);
+	}
+
+fallback:
+#endif
+	return vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vmf->address, false);
+}
+
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -3928,6 +4058,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	pte_t pte;
 	vm_fault_t ret = 0;
 	void *shadow = NULL;
+	int nr_pages = 1;
+	unsigned long start_address;
+	pte_t *start_pte;
 
 	if (!pte_unmap_same(vmf))
 		goto out;
@@ -3991,35 +4124,41 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
-			/*
-			 * Prevent parallel swapin from proceeding with
-			 * the cache flag. Otherwise, another thread may
-			 * finish swapin first, free the entry, and swapout
-			 * reusing the same entry. It's undetectable as
-			 * pte_same() returns true due to entry reuse.
-			 */
-			if (swapcache_prepare(entry)) {
-				/* Relax a bit to prevent rapid repeated page faults */
-				schedule_timeout_uninterruptible(1);
-				goto out;
-			}
-			need_clear_cache = true;
-
 			/* skip swapcache */
-			folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
-						vma, vmf->address, false);
+			folio = alloc_swap_folio(vmf);
 			page = &folio->page;
 			if (folio) {
 				__folio_set_locked(folio);
 				__folio_set_swapbacked(folio);
 
+				if (folio_test_large(folio)) {
+					nr_pages = folio_nr_pages(folio);
+					entry.val = ALIGN_DOWN(entry.val, nr_pages);
+				}
+
+				/*
+				 * Prevent parallel swapin from proceeding with
+				 * the cache flag. Otherwise, another thread may
+				 * finish swapin first, free the entry, and swapout
+				 * reusing the same entry. It's undetectable as
+				 * pte_same() returns true due to entry reuse.
+				 */
+				if (swapcache_prepare_nr(entry, nr_pages)) {
+					/* Relax a bit to prevent rapid repeated page faults */
+					schedule_timeout_uninterruptible(1);
+					goto out;
+				}
+				need_clear_cache = true;
+
 				if (mem_cgroup_swapin_charge_folio(folio,
 							vma->vm_mm, GFP_KERNEL,
 							entry)) {
 					ret = VM_FAULT_OOM;
 					goto out_page;
 				}
-				mem_cgroup_swapin_uncharge_swap(entry);
+
+				for (swp_entry_t e = entry; e.val < entry.val + nr_pages; e.val++)
+					mem_cgroup_swapin_uncharge_swap(e);
 
 				shadow = get_shadow_from_swap_cache(entry);
 				if (shadow)
@@ -4118,6 +4257,42 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 */
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 			vmf->address, &vmf->ptl);
+
+	start_address = vmf->address;
+	start_pte = vmf->pte;
+	if (start_pte && folio_test_large(folio)) {
+		unsigned long nr = folio_nr_pages(folio);
+		unsigned long addr = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
+		pte_t *aligned_pte = vmf->pte - (vmf->address - addr) / PAGE_SIZE;
+
+		/*
+		 * case 1: we are allocating a large folio; try to map it as a
+		 * whole iff the swap entries are still entirely mapped;
+		 * case 2: we hit a large folio in the swapcache, and all swap
+		 * entries are still entirely mapped; try to map the large
+		 * folio as a whole.
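+		 * Both cases rely on is_pte_range_contig_swap() below, now
+		 * that the PTL is held, to confirm the entries are still
+		 * entirely mapped;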
+		 * otherwise, map only the faulting page within the large
+		 * folio which is in the swapcache.
+		 */
+		if (!is_pte_range_contig_swap(aligned_pte, nr)) {
+			if (nr_pages > 1) /* ptes have changed for case 1 */
+				goto out_nomap;
+			goto check_pte;
+		}
+
+		start_address = addr;
+		start_pte = aligned_pte;
+		/*
+		 * The below has already been done before swap_read_folio()
+		 * for case 1.
+		 */
+		if (unlikely(folio == swapcache)) {
+			nr_pages = nr;
+			entry.val = ALIGN_DOWN(entry.val, nr_pages);
+			page = &folio->page;
+		}
+	}
+
+check_pte:
 	if (unlikely(!vmf->pte || !pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
 		goto out_nomap;
@@ -4185,12 +4360,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * We're already holding a reference on the page but haven't mapped it
 	 * yet.
 	 */
-	swap_free(entry);
+	swap_nr_free(entry, nr_pages);
 	if (should_try_to_free_swap(folio, vma, vmf->flags))
 		folio_free_swap(folio);
 
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
+	folio_ref_add(folio, nr_pages - 1);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
+
 	pte = mk_pte(page, vma->vm_page_prot);
 
 	/*
@@ -4200,14 +4377,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * exclusivity.
 	 */
 	if (!folio_test_ksm(folio) &&
-	    (exclusive || folio_ref_count(folio) == 1)) {
+	    (exclusive || folio_ref_count(folio) == nr_pages)) {
 		if (vmf->flags & FAULT_FLAG_WRITE) {
 			pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 			vmf->flags &= ~FAULT_FLAG_WRITE;
 		}
 		rmap_flags |= RMAP_EXCLUSIVE;
 	}
-	flush_icache_page(vma, page);
+	flush_icache_pages(vma, page, nr_pages);
 	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
 	if (pte_swp_uffd_wp(vmf->orig_pte))
@@ -4216,17 +4393,19 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	/* ksm created a completely new copy */
 	if (unlikely(folio != swapcache && swapcache)) {
-		folio_add_new_anon_rmap(folio, vma, vmf->address);
+		folio_add_new_anon_rmap(folio, vma, start_address);
 		folio_add_lru_vma(folio, vma);
+	} else if (!folio_test_anon(folio)) {
+		folio_add_new_anon_rmap(folio, vma, start_address);
 	} else {
-		folio_add_anon_rmap_pte(folio, page, vma, vmf->address,
+		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, start_address,
 					rmap_flags);
 	}
 
 	VM_BUG_ON(!folio_test_anon(folio) ||
 			(pte_write(pte) && !PageAnonExclusive(page)));
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
-	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
+	set_ptes(vma->vm_mm, start_address, start_pte, pte, nr_pages);
+	arch_do_swap_page(vma->vm_mm, vma, start_address, pte, vmf->orig_pte);
 
 	folio_unlock(folio);
 	if (folio != swapcache && swapcache) {
@@ -4243,6 +4422,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	if (vmf->flags & FAULT_FLAG_WRITE) {
+		if (nr_pages > 1)
+			vmf->orig_pte = ptep_get(vmf->pte);
+
 		ret |= do_wp_page(vmf);
 		if (ret & VM_FAULT_ERROR)
 			ret &= VM_FAULT_ERROR;
@@ -4250,14 +4432,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1);
+	update_mmu_cache_range(vmf, vma, start_address, start_pte, nr_pages);
 unlock:
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	/* Clear the swap cache pin for direct swapin after PTL unlock */
 	if (need_clear_cache)
-		swapcache_clear(si, entry);
+		swapcache_clear_nr(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4273,7 +4455,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		folio_put(swapcache);
 	}
 	if (need_clear_cache)
-		swapcache_clear(si, entry);
+		swapcache_clear_nr(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4309,15 +4491,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	if (unlikely(userfaultfd_armed(vma)))
 		goto fallback;
 
-	/*
-	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
-	 * for this vma. Then filter out the orders that can't be allocated over
-	 * the faulting address and still be fully contained in the vma.
-	 */
-	orders = thp_vma_allowable_orders(vma, vma->vm_flags, false, true, true,
-					  BIT(PMD_ORDER) - 1);
-	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
-
+	orders = get_alloc_folio_orders(vmf);
 	if (!orders)
 		goto fallback;