From patchwork Fri Mar 22 11:41:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com> X-Patchwork-Id: 13600003 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E0E13C47DD9 for ; Fri, 22 Mar 2024 11:50:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=MmuWyQQ9UofTCC9WFNRzi2/x1xAqHQtzX1wuBo8xn3k=; b=4CH75Ds4buMEDh SHJJCP7C3t4NaSGGW93Xd4kXgYsZyUk7dHXCo0DiAt08n0IiWzARWzsO8zurQxyz+mEEeYPZd17JC 64ZTKAejDpUAtmlv21cX4zgVn97OI8KMnIfF1CkKoAiiy3uguDQ7FUqV5Tue70qb/4W0BDJUjNQ/F WhshEKS+VVi3NKzPAAaZgvVmwLy6Xst/KtlQVC2UTDwjb1/HUXmb3f4C4bOJuTHqytG+pbsYLdLMi lC8cHli/uY72qpRYplji7iiJLNeGySdpqREPVLZRVl/8utr5o36mdLi14rr3tfk++RBQhSXYlbDU1 hdb+PrM+V+Kn7CfyckTA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1rndPR-0000000714r-0MzF; Fri, 22 Mar 2024 11:50:05 +0000 Received: from mail-oa1-x2a.google.com ([2001:4860:4864:20::2a]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1rndPO-0000000713z-2AyX for linux-arm-kernel@lists.infradead.org; Fri, 22 Mar 2024 11:50:03 +0000 Received: by mail-oa1-x2a.google.com with SMTP id 586e51a60fabf-222ba2a19bdso1148520fac.1 for ; Fri, 22 Mar 2024 04:50:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1711108202; x=1711713002; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Er48O4Qnw/EUunMAmDlsUPIFAnK22vDNUKWOK0xsOIM=; b=dCWTc9YXA5zEFpOv4wjQ55HeBp3OTct5/jPTM5LABaQvbvJNaGDnwXeJTETQI1qMg+ GnF6CDwRVAFvbjlTQNq5gF0dnqmfON5lBoCsBDr0wAyXWvS/AgY0ngAHjZqu4HkE+ddL QxcfRr8E7+lPvRViVB7ArY5pNej0xqyOzMPwuufEUmwodjgPfwE9YLfdPxaIUCxStjBQ vKoAas5NfBMVCV3yJ/H68M7GeM8w7+Vr+sjQvDJHwPGTOUvmzmdLVjIDfp9UBK0fknts Ai2e9I6fu93TWHh8RP8iaxHDI+qXDwHRF/bmW3nJgOGrJ3O3ojbdMgEvDLUg9rz9tYj3 dQ0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711108202; x=1711713002; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Er48O4Qnw/EUunMAmDlsUPIFAnK22vDNUKWOK0xsOIM=; b=XGAuXMH5LgSZgdL1fgYubeTnMIh5p0aoycdcIshnOwxMVNy76wj4HqgVxlYYRrVk5/ 4Cps/UXlzSWnEDcp3se1Cm2fyveTbLgfRgsTlI8E5GQQDn8GQLqcaAkh3+7u3wPdYAO2 C8L3gPWIi2t6umXHrIGjtUgA3fLa/YNUkmWG2Gzd0sHwNdo8J5JMxWieEZMSJUjxWfH7 SsmuMbmCBwG2FjjT1vW8Yniuv+ad6FgDNaS1GNspgXoKg5g2JLePGeiIzCN/bd/WQbx4 MjEjcs30Oixen6IdVahAnG/lo20FD/BWoV17oGGxpB5RuE5eCU5ym7UPXTlowwvsmeaC Ds9Q== X-Forwarded-Encrypted: i=1; AJvYcCXHP+J1gmlbEWS5vBEMKfL6IJXQqR1VawA0Oj0DmX6X3CatLFsQSuu3X38nxZNOevFYjf20oZcBr2Qa+UbFvH8Fy56qYB3fOhvjVE+Uft9/ZXYKVqY= X-Gm-Message-State: AOJu0Ywvw4xrVbPSpCaPoLyZc69om43rfV2s9HxSDH2cL2OBeF8595ui VIYxJ8kmEQvqoNhcmuMNVqFF8wwzuNE3UCHQPxWgegTkBhIHalyk2jjDRtYtXjE= X-Google-Smtp-Source: AGHT+IHT3XyeYHRIL8a0Z8tJiZ3oQR87871x0QA0EhqpJu9zrww0j75nZ+STSZ1bMkgrc8y/tSaOxA== X-Received: by 2002:a05:6a21:3384:b0:1a3:648e:dacf with SMTP id yy4-20020a056a21338400b001a3648edacfmr2854498pzb.35.1711107724129; Fri, 22 Mar 2024 04:42:04 -0700 (PDT) Received: from localhost.localdomain ([2407:7000:8942:5500:aaa1:59ff:fe57:eb97]) by smtp.gmail.com with ESMTPSA id p14-20020a170902e74e00b001d92a58330csm1700209plf.145.2024.03.22.04.41.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Mar 2024 04:42:03 -0700 (PDT) From: Barry Song <21cnbao@gmail.com> To: catalin.marinas@arm.com, will@kernel.org, akpm@linux-foundation.org, hughd@google.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Cc: chrisl@kernel.org, mark.rutland@arm.com, ryan.roberts@arm.com, steven.price@arm.com, david@redhat.com, willy@infradead.org, Barry Song , Kemeng Shi , Anshuman Khandual , Peter Collingbourne , Yosry Ahmed , Peter Xu , Lorenzo Stoakes , "Mike Rapoport (IBM)" , "Aneesh Kumar K.V" , Rick Edgecombe Subject: [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE Date: Sat, 23 Mar 2024 00:41:36 +1300 Message-Id: <20240322114136.61386-2-21cnbao@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240322114136.61386-1-21cnbao@gmail.com> References: <20240322114136.61386-1-21cnbao@gmail.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240322_045002_580621_0DB3B6DE X-CRM114-Status: GOOD ( 26.95 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org From: Barry Song Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up THP_SWAP on ARM64, but it doesn't enable THP_SWP on hardware with MTE as the MTE code works with the assumption tags save/restore is always handling a folio with only one page. The limitation should be removed as more and more ARM64 SoCs have this feature. Co-existence of MTE and THP_SWAP becomes more and more important. This patch makes MTE tags saving support large folios, then we don't need to split large folios into base pages for swapping out on ARM64 SoCs with MTE any more. arch_prepare_to_swap() should take folio rather than page as parameter because we support THP swap-out as a whole. It saves tags for all pages in a large folio. As now we are restoring tags based-on folio, in arch_swap_restore(), we may increase some extra loops and early-exitings while refaulting a large folio which is still in swapcache in do_swap_page(). In case a large folio has nr pages, do_swap_page() will only set the PTE of the particular page which is causing the page fault. Thus do_swap_page() runs nr times, and each time, arch_swap_restore() will loop nr times for those subpages in the folio. So right now the algorithmic complexity becomes O(nr^2). Once we support mapping large folios in do_swap_page(), extra loops and early-exitings will decrease while not being completely removed as a large folio might get partially tagged in corner cases such as, 1. a large folio in swapcache can be partially unmapped, thus, MTE tags for the unmapped pages will be invalidated; 2. users might use mprotect() to set MTEs on a part of a large folio. arch_thp_swp_supported() is dropped since ARM64 MTE was the only one who needed it. Cc: Catalin Marinas Cc: Will Deacon Cc: Ryan Roberts Cc: Mark Rutland Cc: David Hildenbrand Cc: Kemeng Shi Cc: "Matthew Wilcox (Oracle)" Cc: Anshuman Khandual Cc: Peter Collingbourne Cc: Steven Price Cc: Yosry Ahmed Cc: Peter Xu Cc: Lorenzo Stoakes Cc: "Mike Rapoport (IBM)" Cc: Hugh Dickins CC: "Aneesh Kumar K.V" Cc: Rick Edgecombe Signed-off-by: Barry Song Reviewed-by: Steven Price Acked-by: Chris Li Reviewed-by: Ryan Roberts Acked-by: Catalin Marinas --- arch/arm64/include/asm/pgtable.h | 19 ++------------ arch/arm64/mm/mteswap.c | 45 ++++++++++++++++++++++++++++++++ include/linux/huge_mm.h | 12 --------- include/linux/pgtable.h | 2 +- mm/internal.h | 14 ++++++++++ mm/memory.c | 2 +- mm/page_io.c | 2 +- mm/shmem.c | 2 +- mm/swap_slots.c | 2 +- mm/swapfile.c | 2 +- 10 files changed, 67 insertions(+), 35 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index afdd56d26ad7..259325e6b7e8 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -49,12 +49,6 @@ __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1) #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -static inline bool arch_thp_swp_supported(void) -{ - return !system_supports_mte(); -} -#define arch_thp_swp_supported arch_thp_swp_supported - /* * Outside of a few very special situations (e.g. hibernation), we always * use broadcast TLB invalidation instructions, therefore a spurious page @@ -1280,12 +1274,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma, #ifdef CONFIG_ARM64_MTE #define __HAVE_ARCH_PREPARE_TO_SWAP -static inline int arch_prepare_to_swap(struct page *page) -{ - if (system_supports_mte()) - return mte_save_tags(page); - return 0; -} +extern int arch_prepare_to_swap(struct folio *folio); #define __HAVE_ARCH_SWAP_INVALIDATE static inline void arch_swap_invalidate_page(int type, pgoff_t offset) @@ -1301,11 +1290,7 @@ static inline void arch_swap_invalidate_area(int type) } #define __HAVE_ARCH_SWAP_RESTORE -static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio) -{ - if (system_supports_mte()) - mte_restore_tags(entry, &folio->page); -} +extern void arch_swap_restore(swp_entry_t entry, struct folio *folio); #endif /* CONFIG_ARM64_MTE */ diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c index a31833e3ddc5..63e8d72f202a 100644 --- a/arch/arm64/mm/mteswap.c +++ b/arch/arm64/mm/mteswap.c @@ -68,6 +68,13 @@ void mte_invalidate_tags(int type, pgoff_t offset) mte_free_tag_storage(tags); } +static inline void __mte_invalidate_tags(struct page *page) +{ + swp_entry_t entry = page_swap_entry(page); + + mte_invalidate_tags(swp_type(entry), swp_offset(entry)); +} + void mte_invalidate_tags_area(int type) { swp_entry_t entry = swp_entry(type, 0); @@ -83,3 +90,41 @@ void mte_invalidate_tags_area(int type) } xa_unlock(&mte_pages); } + +int arch_prepare_to_swap(struct folio *folio) +{ + long i, nr; + int err; + + if (!system_supports_mte()) + return 0; + + nr = folio_nr_pages(folio); + + for (i = 0; i < nr; i++) { + err = mte_save_tags(folio_page(folio, i)); + if (err) + goto out; + } + return 0; + +out: + while (i--) + __mte_invalidate_tags(folio_page(folio, i)); + return err; +} + +void arch_swap_restore(swp_entry_t entry, struct folio *folio) +{ + long i, nr; + + if (!system_supports_mte()) + return; + + nr = folio_nr_pages(folio); + + for (i = 0; i < nr; i++) { + mte_restore_tags(entry, folio_page(folio, i)); + entry.val++; + } +} diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index de0c89105076..e04b93c43965 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -535,16 +535,4 @@ static inline int split_folio_to_order(struct folio *folio, int new_order) #define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0) #define split_folio(f) split_folio_to_order(f, 0) -/* - * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to - * limitations in the implementation like arm64 MTE can override this to - * false - */ -#ifndef arch_thp_swp_supported -static inline bool arch_thp_swp_supported(void) -{ - return true; -} -#endif - #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 85fc7554cd52..b10a7dd615bd 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1052,7 +1052,7 @@ static inline int arch_unmap_one(struct mm_struct *mm, * prototypes must be defined in the arch-specific asm/pgtable.h file. */ #ifndef __HAVE_ARCH_PREPARE_TO_SWAP -static inline int arch_prepare_to_swap(struct page *page) +static inline int arch_prepare_to_swap(struct folio *folio) { return 0; } diff --git a/mm/internal.h b/mm/internal.h index 7e486f2c502c..2551e93dd885 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -76,6 +76,20 @@ static inline int folio_nr_pages_mapped(struct folio *folio) return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED; } +/* + * Retrieve the first entry of a folio based on a provided entry within the + * folio. We cannot rely on folio->swap as there is no guarantee that it has + * been initialized. Used for calling arch_swap_restore() + */ +static inline swp_entry_t folio_swap(swp_entry_t entry, struct folio *folio) +{ + swp_entry_t swap = { + .val = ALIGN_DOWN(entry.val, folio_nr_pages(folio)), + }; + + return swap; +} + static inline void *folio_raw_mapping(struct folio *folio) { unsigned long mapping = (unsigned long)folio->mapping; diff --git a/mm/memory.c b/mm/memory.c index f2bc6dd15eb8..b7cab8be8632 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4188,7 +4188,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * when reading from swap. This metadata may be indexed by swap entry * so this must be called before swap_free(). */ - arch_swap_restore(entry, folio); + arch_swap_restore(folio_swap(entry, folio), folio); /* * Remove the swap entry and conditionally try to free up the swapcache. diff --git a/mm/page_io.c b/mm/page_io.c index ae2b49055e43..a9a7c236aecc 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -189,7 +189,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) * Arch code may have to preserve more data than just the page * contents, e.g. memory tags. */ - ret = arch_prepare_to_swap(&folio->page); + ret = arch_prepare_to_swap(folio); if (ret) { folio_mark_dirty(folio); folio_unlock(folio); diff --git a/mm/shmem.c b/mm/shmem.c index 0aad0d9a621b..44c1519ba881 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1913,7 +1913,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index, * Some architectures may have to restore extra metadata to the * folio after reading from swap. */ - arch_swap_restore(swap, folio); + arch_swap_restore(folio_swap(swap, folio), folio); if (shmem_should_replace_folio(folio, gfp)) { error = shmem_replace_folio(&folio, gfp, info, index); diff --git a/mm/swap_slots.c b/mm/swap_slots.c index 90973ce7881d..53abeaf1371d 100644 --- a/mm/swap_slots.c +++ b/mm/swap_slots.c @@ -310,7 +310,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio) entry.val = 0; if (folio_test_large(folio)) { - if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported()) + if (IS_ENABLED(CONFIG_THP_SWAP)) get_swap_pages(1, &entry, folio_nr_pages(folio)); goto out; } diff --git a/mm/swapfile.c b/mm/swapfile.c index 4919423cce76..5e6d2304a2a4 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1806,7 +1806,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, * when reading from swap. This metadata may be indexed by swap entry * so this must be called before swap_free(). */ - arch_swap_restore(entry, folio); + arch_swap_restore(folio_swap(entry, folio), folio); dec_mm_counter(vma->vm_mm, MM_SWAPENTS); inc_mm_counter(vma->vm_mm, MM_ANONPAGES);