From patchwork Wed Aug 23 13:13:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362311 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7195EEE49A3 for ; Wed, 23 Aug 2023 13:14:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235523AbjHWNO0 (ORCPT ); Wed, 23 Aug 2023 09:14:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44382 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235503AbjHWNOS (ORCPT ); Wed, 23 Aug 2023 09:14:18 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 27A43E57; Wed, 23 Aug 2023 06:14:16 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6E09E11FB; Wed, 23 Aug 2023 06:14:56 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2BD863F740; Wed, 23 Aug 2023 06:14:10 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 01/37] mm: page_alloc: Rename gfp_to_alloc_flags_cma -> gfp_to_alloc_flags_fast Date: Wed, 23 Aug 2023 14:13:14 +0100 Message-Id: <20230823131350.114942-2-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org gfp_to_alloc_flags_cma() is called on the fast path of the page allocator and all it does is set the ALLOC_CMA flag if all the conditions are met for the allocation to be satisfied from the MIGRATE_CMA list. Rename it to be more generic, as it will soon have to handle another flag.
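For reference, and not part of the patch itself, the helper being renamed is a small fast-path function: today it only decides whether ALLOC_CMA may be set. A minimal sketch of its behaviour after the rename, assuming the current mainline body is otherwise unchanged:

static inline unsigned int gfp_to_alloc_flags_fast(gfp_t gfp_mask,
						   unsigned int alloc_flags)
{
#ifdef CONFIG_CMA
	/* Movable allocations may also be served from the MIGRATE_CMA list. */
	if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
		alloc_flags |= ALLOC_CMA;
#endif
	return alloc_flags;
}

A later patch in this series extends the same function to also set ALLOC_FROM_METADATA, which is what motivates the more generic name.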
Signed-off-by: Alexandru Elisei --- mm/page_alloc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7d3460c7a480..e6f950c54494 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3081,7 +3081,7 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask) } /* Must be called after current_gfp_context() which can change gfp_mask */ -static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask, +static inline unsigned int gfp_to_alloc_flags_fast(gfp_t gfp_mask, unsigned int alloc_flags) { #ifdef CONFIG_CMA @@ -3784,7 +3784,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order) } else if (unlikely(rt_task(current)) && in_task()) alloc_flags |= ALLOC_MIN_RESERVE; - alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags); + alloc_flags = gfp_to_alloc_flags_fast(gfp_mask, alloc_flags); return alloc_flags; } @@ -4074,7 +4074,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, reserve_flags = __gfp_pfmemalloc_flags(gfp_mask); if (reserve_flags) - alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, reserve_flags) | + alloc_flags = gfp_to_alloc_flags_fast(gfp_mask, reserve_flags) | (alloc_flags & ALLOC_KSWAPD); /* @@ -4250,7 +4250,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order, if (should_fail_alloc_page(gfp_mask, order)) return false; - *alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, *alloc_flags); + *alloc_flags = gfp_to_alloc_flags_fast(gfp_mask, *alloc_flags); /* Dirty zone balancing only done in the fast path */ ac->spread_dirty_pages = (gfp_mask & __GFP_WRITE); From patchwork Wed Aug 23 13:13:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362312 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3CB9EE49A0 for ; Wed, 23 Aug 2023 13:14:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235517AbjHWNOc (ORCPT ); Wed, 23 Aug 2023 09:14:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48944 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235505AbjHWNO1 (ORCPT ); Wed, 23 Aug 2023 09:14:27 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 71F72E79; Wed, 23 Aug 2023 06:14:22 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BF9EF1515; Wed, 23 Aug 2023 06:15:02 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2BAF13F740; Wed, 23 Aug 2023 06:14:16 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, 
eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 02/37] arm64: mte: Rework naming for tag manipulation functions Date: Wed, 23 Aug 2023 14:13:15 +0100 Message-Id: <20230823131350.114942-3-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org The tag save/restore/copy functions could be more explicit about where the tags are coming from and where they are being copied to. Rename the functions to make it easier to understand what they are doing: - Rename the mte_clear_page_tags() 'addr' parameter to 'page_addr', to match the other functions that take a page address as parameter. - Rename mte_save/restore_tags() to mte_save/restore_page_tags_by_swp_entry() to 1. distinguish the functions from mte_save/restore_page_tags() and 2. make it clear how they are indexed (this will become important once other ways to save the tags are added). Same applies to mte_invalidate_tags{,_area}_by_swp_entry(). - Rename mte_save/restore_page_tags() to make it clear where the tags are going to be saved, respectively from where they are restored - in a previously allocated memory buffer, not in an xarray, like with the tags preserved when swapping. - Rename mte_allocate/free_tag_storage() to mte_allocate/free_tags_mem() to make it clear the functions have nothing to do with the memory where the live tags are stored for a page. Change the parameter type for mte_free_tags_mem() to be void *, to match the return value of mte_allocate_tags_mem(), and because that memory is opaque and not meant to be directly dereferenced. In the name of consistency, rename local variables from tag_storage to tags. Give a similar treatment to the hibernation code that saves and restores the tags for all tagged pages. In the same spirit, rename MTE_PAGE_TAG_STORAGE to MTE_PAGE_TAG_STORAGE_SIZE to make it clear that it relates to the size of the memory needed to save the tags for a page. Opportunistically rename MTE_TAG_SIZE to MTE_TAG_SIZE_BITS to make it clear it is measured in bits, not bytes, like the rest of the size definitions in the same header file.
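To make the new naming concrete, here is a hedged usage sketch (not taken from the patch) of how the renamed buffer-based helpers pair up, in the style of the hibernation path: allocate an opaque buffer, save a page's tags into it, and later restore and free it.

	void *tags;

	/* Opaque buffer of MTE_PAGE_TAG_STORAGE_SIZE bytes. */
	tags = mte_allocate_tags_mem();
	if (tags) {
		mte_save_page_tags_to_mem(page_address(page), tags);
		/* ... the page contents go away and come back ... */
		mte_restore_page_tags_from_mem(page_address(page), tags);
		mte_free_tags_mem(tags);
	}

The *_by_swp_entry() variants wrap the same buffers in an xarray indexed by the swap entry value, which is what the swap code needs.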
Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/mte-def.h | 16 +++++----- arch/arm64/include/asm/mte.h | 24 +++++++++------ arch/arm64/include/asm/pgtable.h | 8 ++--- arch/arm64/kernel/elfcore.c | 14 ++++----- arch/arm64/kernel/hibernate.c | 46 ++++++++++++++--------------- arch/arm64/lib/mte.S | 14 ++++----- arch/arm64/mm/mteswap.c | 50 ++++++++++++++++---------------- 7 files changed, 89 insertions(+), 83 deletions(-) diff --git a/arch/arm64/include/asm/mte-def.h b/arch/arm64/include/asm/mte-def.h index 14ee86b019c2..eb0d76a6bdcf 100644 --- a/arch/arm64/include/asm/mte-def.h +++ b/arch/arm64/include/asm/mte-def.h @@ -5,14 +5,14 @@ #ifndef __ASM_MTE_DEF_H #define __ASM_MTE_DEF_H -#define MTE_GRANULE_SIZE UL(16) -#define MTE_GRANULE_MASK (~(MTE_GRANULE_SIZE - 1)) -#define MTE_GRANULES_PER_PAGE (PAGE_SIZE / MTE_GRANULE_SIZE) -#define MTE_TAG_SHIFT 56 -#define MTE_TAG_SIZE 4 -#define MTE_TAG_MASK GENMASK((MTE_TAG_SHIFT + (MTE_TAG_SIZE - 1)), MTE_TAG_SHIFT) -#define MTE_PAGE_TAG_STORAGE (MTE_GRANULES_PER_PAGE * MTE_TAG_SIZE / 8) +#define MTE_GRANULE_SIZE UL(16) +#define MTE_GRANULE_MASK (~(MTE_GRANULE_SIZE - 1)) +#define MTE_GRANULES_PER_PAGE (PAGE_SIZE / MTE_GRANULE_SIZE) +#define MTE_TAG_SHIFT 56 +#define MTE_TAG_SIZE_BITS 4 +#define MTE_TAG_MASK GENMASK((MTE_TAG_SHIFT + (MTE_TAG_SIZE_BITS - 1)), MTE_TAG_SHIFT) +#define MTE_PAGE_TAG_STORAGE_SIZE (MTE_GRANULES_PER_PAGE * MTE_TAG_SIZE_BITS / 8) -#define __MTE_PREAMBLE ARM64_ASM_PREAMBLE ".arch_extension memtag\n" +#define __MTE_PREAMBLE ARM64_ASM_PREAMBLE ".arch_extension memtag\n" #endif /* __ASM_MTE_DEF_H */ diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h index 4cedbaa16f41..246a561652f4 100644 --- a/arch/arm64/include/asm/mte.h +++ b/arch/arm64/include/asm/mte.h @@ -18,19 +18,25 @@ #include -void mte_clear_page_tags(void *addr); +void mte_clear_page_tags(void *page_addr); + unsigned long mte_copy_tags_from_user(void *to, const void __user *from, unsigned long n); unsigned long mte_copy_tags_to_user(void __user *to, void *from, unsigned long n); -int mte_save_tags(struct page *page); -void mte_save_page_tags(const void *page_addr, void *tag_storage); -void mte_restore_tags(swp_entry_t entry, struct page *page); -void mte_restore_page_tags(void *page_addr, const void *tag_storage); -void mte_invalidate_tags(int type, pgoff_t offset); -void mte_invalidate_tags_area(int type); -void *mte_allocate_tag_storage(void); -void mte_free_tag_storage(char *storage); + +/* page_private(page) contains the swp_entry.val value. 
*/ +int mte_save_page_tags_by_swp_entry(struct page *page); +void mte_restore_page_tags_by_swp_entry(swp_entry_t entry, struct page *page); + +void mte_save_page_tags_to_mem(const void *page_addr, void *to); +void mte_restore_page_tags_from_mem(void *page_addr, const void *from); + +void mte_invalidate_tags_by_swp_entry(int type, pgoff_t offset); +void mte_invalidate_tags_area_by_swp_entry(int type); + +void *mte_allocate_tags_mem(void); +void mte_free_tags_mem(void *tags); #ifdef CONFIG_ARM64_MTE diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index e8a252e62b12..944860d7090e 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -1020,7 +1020,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma, static inline int arch_prepare_to_swap(struct page *page) { if (system_supports_mte()) - return mte_save_tags(page); + return mte_save_page_tags_by_swp_entry(page); return 0; } @@ -1028,20 +1028,20 @@ static inline int arch_prepare_to_swap(struct page *page) static inline void arch_swap_invalidate_page(int type, pgoff_t offset) { if (system_supports_mte()) - mte_invalidate_tags(type, offset); + mte_invalidate_tags_by_swp_entry(type, offset); } static inline void arch_swap_invalidate_area(int type) { if (system_supports_mte()) - mte_invalidate_tags_area(type); + mte_invalidate_tags_area_by_swp_entry(type); } #define __HAVE_ARCH_SWAP_RESTORE static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio) { if (system_supports_mte()) - mte_restore_tags(entry, &folio->page); + mte_restore_page_tags_by_swp_entry(entry, &folio->page); } #endif /* CONFIG_ARM64_MTE */ diff --git a/arch/arm64/kernel/elfcore.c b/arch/arm64/kernel/elfcore.c index 2e94d20c4ac7..c062c2c3d10d 100644 --- a/arch/arm64/kernel/elfcore.c +++ b/arch/arm64/kernel/elfcore.c @@ -17,7 +17,7 @@ static unsigned long mte_vma_tag_dump_size(struct core_vma_metadata *m) { - return (m->dump_size >> PAGE_SHIFT) * MTE_PAGE_TAG_STORAGE; + return (m->dump_size >> PAGE_SHIFT) * MTE_PAGE_TAG_STORAGE_SIZE; } /* Derived from dump_user_range(); start/end must be page-aligned */ @@ -38,7 +38,7 @@ static int mte_dump_tag_range(struct coredump_params *cprm, * have been all zeros. 
*/ if (!page) { - dump_skip(cprm, MTE_PAGE_TAG_STORAGE); + dump_skip(cprm, MTE_PAGE_TAG_STORAGE_SIZE); continue; } @@ -48,12 +48,12 @@ static int mte_dump_tag_range(struct coredump_params *cprm, */ if (!page_mte_tagged(page)) { put_page(page); - dump_skip(cprm, MTE_PAGE_TAG_STORAGE); + dump_skip(cprm, MTE_PAGE_TAG_STORAGE_SIZE); continue; } if (!tags) { - tags = mte_allocate_tag_storage(); + tags = mte_allocate_tags_mem(); if (!tags) { put_page(page); ret = 0; @@ -61,16 +61,16 @@ static int mte_dump_tag_range(struct coredump_params *cprm, } } - mte_save_page_tags(page_address(page), tags); + mte_save_page_tags_to_mem(page_address(page), tags); put_page(page); - if (!dump_emit(cprm, tags, MTE_PAGE_TAG_STORAGE)) { + if (!dump_emit(cprm, tags, MTE_PAGE_TAG_STORAGE_SIZE)) { ret = 0; break; } } if (tags) - mte_free_tag_storage(tags); + mte_free_tags_mem(tags); return ret; } diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c index 02870beb271e..f3cdbd8ba8f9 100644 --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -215,41 +215,41 @@ static int create_safe_exec_page(void *src_start, size_t length, #ifdef CONFIG_ARM64_MTE -static DEFINE_XARRAY(mte_pages); +static DEFINE_XARRAY(tags_by_pfn); -static int save_tags(struct page *page, unsigned long pfn) +static int save_page_tags_by_pfn(struct page *page, unsigned long pfn) { - void *tag_storage, *ret; + void *tags, *ret; - tag_storage = mte_allocate_tag_storage(); - if (!tag_storage) + tags = mte_allocate_tags_mem(); + if (!tags) return -ENOMEM; - mte_save_page_tags(page_address(page), tag_storage); + mte_save_page_tags_to_mem(page_address(page), tags); - ret = xa_store(&mte_pages, pfn, tag_storage, GFP_KERNEL); + ret = xa_store(&tags_by_pfn, pfn, tags, GFP_KERNEL); if (WARN(xa_is_err(ret), "Failed to store MTE tags")) { - mte_free_tag_storage(tag_storage); + mte_free_tags_mem(tags); return xa_err(ret); } else if (WARN(ret, "swsusp: %s: Duplicate entry", __func__)) { - mte_free_tag_storage(ret); + mte_free_tags_mem(ret); } return 0; } -static void swsusp_mte_free_storage(void) +static void swsusp_mte_free_tags(void) { - XA_STATE(xa_state, &mte_pages, 0); + XA_STATE(xa_state, &tags_by_pfn, 0); void *tags; - xa_lock(&mte_pages); + xa_lock(&tags_by_pfn); xas_for_each(&xa_state, tags, ULONG_MAX) { - mte_free_tag_storage(tags); + mte_free_tags_mem(tags); } - xa_unlock(&mte_pages); + xa_unlock(&tags_by_pfn); - xa_destroy(&mte_pages); + xa_destroy(&tags_by_pfn); } static int swsusp_mte_save_tags(void) @@ -273,9 +273,9 @@ static int swsusp_mte_save_tags(void) if (!page_mte_tagged(page)) continue; - ret = save_tags(page, pfn); + ret = save_page_tags_by_pfn(page, pfn); if (ret) { - swsusp_mte_free_storage(); + swsusp_mte_free_tags(); goto out; } @@ -290,25 +290,25 @@ static int swsusp_mte_save_tags(void) static void swsusp_mte_restore_tags(void) { - XA_STATE(xa_state, &mte_pages, 0); + XA_STATE(xa_state, &tags_by_pfn, 0); int n = 0; void *tags; - xa_lock(&mte_pages); + xa_lock(&tags_by_pfn); xas_for_each(&xa_state, tags, ULONG_MAX) { unsigned long pfn = xa_state.xa_index; struct page *page = pfn_to_online_page(pfn); - mte_restore_page_tags(page_address(page), tags); + mte_restore_page_tags_from_mem(page_address(page), tags); - mte_free_tag_storage(tags); + mte_free_tags_mem(tags); n++; } - xa_unlock(&mte_pages); + xa_unlock(&tags_by_pfn); pr_info("Restored %d MTE pages\n", n); - xa_destroy(&mte_pages); + xa_destroy(&tags_by_pfn); } #else /* CONFIG_ARM64_MTE */ diff --git a/arch/arm64/lib/mte.S 
b/arch/arm64/lib/mte.S index 5018ac03b6bf..d3c4ff70f48b 100644 --- a/arch/arm64/lib/mte.S +++ b/arch/arm64/lib/mte.S @@ -119,7 +119,7 @@ SYM_FUNC_START(mte_copy_tags_to_user) cbz x2, 2f 1: ldg x4, [x1] - ubfx x4, x4, #MTE_TAG_SHIFT, #MTE_TAG_SIZE + ubfx x4, x4, #MTE_TAG_SHIFT, #MTE_TAG_SIZE_BITS USER(2f, sttrb w4, [x0]) add x0, x0, #1 add x1, x1, #MTE_GRANULE_SIZE @@ -134,9 +134,9 @@ SYM_FUNC_END(mte_copy_tags_to_user) /* * Save the tags in a page * x0 - page address - * x1 - tag storage, MTE_PAGE_TAG_STORAGE bytes + * x1 - memory buffer, MTE_PAGE_TAG_STORAGE_SIZE bytes */ -SYM_FUNC_START(mte_save_page_tags) +SYM_FUNC_START(mte_save_page_tags_to_mem) multitag_transfer_size x7, x5 1: mov x2, #0 @@ -153,14 +153,14 @@ SYM_FUNC_START(mte_save_page_tags) b.ne 1b ret -SYM_FUNC_END(mte_save_page_tags) +SYM_FUNC_END(mte_save_page_tags_to_mem) /* * Restore the tags in a page * x0 - page address - * x1 - tag storage, MTE_PAGE_TAG_STORAGE bytes + * x1 - memory buffer, MTE_PAGE_TAG_STORAGE_SIZE bytes */ -SYM_FUNC_START(mte_restore_page_tags) +SYM_FUNC_START(mte_restore_page_tags_from_mem) multitag_transfer_size x7, x5 1: ldr x2, [x1], #8 @@ -174,4 +174,4 @@ SYM_FUNC_START(mte_restore_page_tags) b.ne 1b ret -SYM_FUNC_END(mte_restore_page_tags) +SYM_FUNC_END(mte_restore_page_tags_from_mem) diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c index cd508ba80ab1..aaeca57f36cc 100644 --- a/arch/arm64/mm/mteswap.c +++ b/arch/arm64/mm/mteswap.c @@ -7,78 +7,78 @@ #include #include -static DEFINE_XARRAY(mte_pages); +static DEFINE_XARRAY(tags_by_swp_entry); -void *mte_allocate_tag_storage(void) +void *mte_allocate_tags_mem(void) { /* tags granule is 16 bytes, 2 tags stored per byte */ - return kmalloc(MTE_PAGE_TAG_STORAGE, GFP_KERNEL); + return kmalloc(MTE_PAGE_TAG_STORAGE_SIZE, GFP_KERNEL); } -void mte_free_tag_storage(char *storage) +void mte_free_tags_mem(void *tags) { - kfree(storage); + kfree(tags); } -int mte_save_tags(struct page *page) +int mte_save_page_tags_by_swp_entry(struct page *page) { - void *tag_storage, *ret; + void *tags, *ret; if (!page_mte_tagged(page)) return 0; - tag_storage = mte_allocate_tag_storage(); - if (!tag_storage) + tags = mte_allocate_tags_mem(); + if (!tags) return -ENOMEM; - mte_save_page_tags(page_address(page), tag_storage); + mte_save_page_tags_to_mem(page_address(page), tags); /* page_private contains the swap entry.val set in do_swap_page */ - ret = xa_store(&mte_pages, page_private(page), tag_storage, GFP_KERNEL); + ret = xa_store(&tags_by_swp_entry, page_private(page), tags, GFP_KERNEL); if (WARN(xa_is_err(ret), "Failed to store MTE tags")) { - mte_free_tag_storage(tag_storage); + mte_free_tags_mem(tags); return xa_err(ret); } else if (ret) { /* Entry is being replaced, free the old entry */ - mte_free_tag_storage(ret); + mte_free_tags_mem(ret); } return 0; } -void mte_restore_tags(swp_entry_t entry, struct page *page) +void mte_restore_page_tags_by_swp_entry(swp_entry_t entry, struct page *page) { - void *tags = xa_load(&mte_pages, entry.val); + void *tags = xa_load(&tags_by_swp_entry, entry.val); if (!tags) return; if (try_page_mte_tagging(page)) { - mte_restore_page_tags(page_address(page), tags); + mte_restore_page_tags_from_mem(page_address(page), tags); set_page_mte_tagged(page); } } -void mte_invalidate_tags(int type, pgoff_t offset) +void mte_invalidate_tags_by_swp_entry(int type, pgoff_t offset) { swp_entry_t entry = swp_entry(type, offset); - void *tags = xa_erase(&mte_pages, entry.val); + void *tags = xa_erase(&tags_by_swp_entry, entry.val); - 
mte_free_tag_storage(tags); + mte_free_tags_mem(tags); } -void mte_invalidate_tags_area(int type) +void mte_invalidate_tags_area_by_swp_entry(int type) { swp_entry_t entry = swp_entry(type, 0); swp_entry_t last_entry = swp_entry(type + 1, 0); void *tags; - XA_STATE(xa_state, &mte_pages, entry.val); + XA_STATE(xa_state, &tags_by_swp_entry, entry.val); - xa_lock(&mte_pages); + xa_lock(&tags_by_swp_entry); xas_for_each(&xa_state, tags, last_entry.val - 1) { - __xa_erase(&mte_pages, xa_state.xa_index); - mte_free_tag_storage(tags); + __xa_erase(&tags_by_swp_entry, xa_state.xa_index); + mte_free_tags_mem(tags); } - xa_unlock(&mte_pages); + xa_unlock(&tags_by_swp_entry); } From patchwork Wed Aug 23 13:13:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362313 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31441EE49A3 for ; Wed, 23 Aug 2023 13:14:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235530AbjHWNOd (ORCPT ); Wed, 23 Aug 2023 09:14:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40050 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235514AbjHWNOb (ORCPT ); Wed, 23 Aug 2023 09:14:31 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 93402E57; Wed, 23 Aug 2023 06:14:28 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DC3DF1516; Wed, 23 Aug 2023 06:15:08 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7BE823F740; Wed, 23 Aug 2023 06:14:22 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 03/37] arm64: mte: Rename __GFP_ZEROTAGS to __GFP_TAGGED Date: Wed, 23 Aug 2023 14:13:16 +0100 Message-Id: <20230823131350.114942-4-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org __GFP_ZEROTAGS is used to instruct the page allocator to zero the tags at the same time as the physical frame is zeroed. 
The name can be slightly misleading, because it doesn't mean that the code will zero the tags unconditionally, but that the tags will be zeroed if and only if the physical frame is also zeroed (either __GFP_ZERO is set or init_on_alloc is 1). Rename it to __GFP_TAGGED, in preparation for it to be used by the page allocator to recognize when an allocation is tagged (has metadata). Signed-off-by: Alexandru Elisei --- arch/arm64/mm/fault.c | 2 +- include/linux/gfp_types.h | 14 +++++++------- include/trace/events/mmflags.h | 2 +- mm/page_alloc.c | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 3fe516b32577..0ca89ebcdc63 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -949,7 +949,7 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, * separate DC ZVA and STGM. */ if (vma->vm_flags & VM_MTE) - flags |= __GFP_ZEROTAGS; + flags |= __GFP_TAGGED; return vma_alloc_folio(flags, 0, vma, vaddr, false); } diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h index 6583a58670c5..37b9e265d77e 100644 --- a/include/linux/gfp_types.h +++ b/include/linux/gfp_types.h @@ -45,7 +45,7 @@ typedef unsigned int __bitwise gfp_t; #define ___GFP_HARDWALL 0x100000u #define ___GFP_THISNODE 0x200000u #define ___GFP_ACCOUNT 0x400000u -#define ___GFP_ZEROTAGS 0x800000u +#define ___GFP_TAGGED 0x800000u #ifdef CONFIG_KASAN_HW_TAGS #define ___GFP_SKIP_ZERO 0x1000000u #define ___GFP_SKIP_KASAN 0x2000000u @@ -226,11 +226,11 @@ typedef unsigned int __bitwise gfp_t; * * %__GFP_ZERO returns a zeroed page on success. * - * %__GFP_ZEROTAGS zeroes memory tags at allocation time if the memory itself - * is being zeroed (either via __GFP_ZERO or via init_on_alloc, provided that - * __GFP_SKIP_ZERO is not set). This flag is intended for optimization: setting - * memory tags at the same time as zeroing memory has minimal additional - * performace impact. + * %__GFP_TAGGED marks the allocation as having tags, which will be zeroed it + * allocation time if the memory itself is being zeroed (either via __GFP_ZERO + * or via init_on_alloc, provided that __GFP_SKIP_ZERO is not set). This flag is + * intended for optimization: setting memory tags at the same time as zeroing + * memory has minimal additional performace impact. * * %__GFP_SKIP_KASAN makes KASAN skip unpoisoning on page allocation. 
* Used for userspace and vmalloc pages; the latter are unpoisoned by @@ -241,7 +241,7 @@ typedef unsigned int __bitwise gfp_t; #define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN) #define __GFP_COMP ((__force gfp_t)___GFP_COMP) #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO) -#define __GFP_ZEROTAGS ((__force gfp_t)___GFP_ZEROTAGS) +#define __GFP_TAGGED ((__force gfp_t)___GFP_TAGGED) #define __GFP_SKIP_ZERO ((__force gfp_t)___GFP_SKIP_ZERO) #define __GFP_SKIP_KASAN ((__force gfp_t)___GFP_SKIP_KASAN) diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 1478b9dd05fa..4ccca8e73c93 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -50,7 +50,7 @@ gfpflag_string(__GFP_RECLAIM), \ gfpflag_string(__GFP_DIRECT_RECLAIM), \ gfpflag_string(__GFP_KSWAPD_RECLAIM), \ - gfpflag_string(__GFP_ZEROTAGS) + gfpflag_string(__GFP_TAGGED) #ifdef CONFIG_KASAN_HW_TAGS #define __def_gfpflag_names_kasan , \ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e6f950c54494..fdc230440a44 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1516,7 +1516,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, { bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) && !should_skip_init(gfp_flags); - bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS); + bool zero_tags = init && (gfp_flags & __GFP_TAGGED); int i; set_page_private(page, 0); From patchwork Wed Aug 23 13:13:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362314 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8CD01EE4993 for ; Wed, 23 Aug 2023 13:14:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235544AbjHWNOs (ORCPT ); Wed, 23 Aug 2023 09:14:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40144 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235515AbjHWNOk (ORCPT ); Wed, 23 Aug 2023 09:14:40 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7A2AAE74; Wed, 23 Aug 2023 06:14:34 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C581B152B; Wed, 23 Aug 2023 06:15:14 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9A1373F740; Wed, 23 Aug 2023 06:14:28 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, 
linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 04/37] mm: Add MIGRATE_METADATA allocation policy Date: Wed, 23 Aug 2023 14:13:17 +0100 Message-Id: <20230823131350.114942-5-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Some architectures implement hardware memory coloring to catch incorrect usage of memory allocation. One such architecture is arm64, which calls its hardware implementation Memory Tagging Extension. So far, the memory which stores the metadata has been configured by firmware and hidden from Linux. For arm64, it is impossible to have the entire system RAM allocated with metadata because executable memory cannot be tagged. Furthermore, in practice, only a chunk of all the memory that can have tags is actually used as tagged, which leaves a portion of metadata memory unused. As such, it would be beneficial to use this memory, which so far has been inaccessible to Linux, to service allocation requests. To prepare for exposing this metadata memory, a new migratetype is being added to the page allocator, called MIGRATE_METADATA. One important aspect is that for arm64 the memory that stores metadata cannot have metadata associated with it; it can only be used to store metadata for other pages. This means that the page allocator will *not* allocate from this migratetype if at least one of the following is true: - The allocation also needs metadata to be allocated. - The allocation isn't movable. A metadata page storing data must be able to be migrated at any given time so it can be repurposed to store metadata. Both cases are specific to arm64's implementation of memory metadata. For now, metadata storage page management is disabled, and it will be enabled once the architecture-specific handling is added. Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/memory_metadata.h | 21 ++++++++++++++++++ arch/arm64/mm/fault.c | 3 +++ include/asm-generic/Kbuild | 1 + include/asm-generic/memory_metadata.h | 18 +++++++++++++++ include/linux/mmzone.h | 11 ++++++++++ mm/Kconfig | 3 +++ mm/internal.h | 5 +++++ mm/page_alloc.c | 28 ++++++++++++++++++++++++ 8 files changed, 90 insertions(+) create mode 100644 arch/arm64/include/asm/memory_metadata.h create mode 100644 include/asm-generic/memory_metadata.h diff --git a/arch/arm64/include/asm/memory_metadata.h b/arch/arm64/include/asm/memory_metadata.h new file mode 100644 index 000000000000..5269be7f455f --- /dev/null +++ b/arch/arm64/include/asm/memory_metadata.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023 ARM Ltd.
+ */ +#ifndef __ASM_MEMORY_METADATA_H +#define __ASM_MEMORY_METADATA_H + +#include + +#ifdef CONFIG_MEMORY_METADATA +static inline bool metadata_storage_enabled(void) +{ + return false; +} +static inline bool alloc_can_use_metadata_pages(gfp_t gfp_mask) +{ + return false; +} +#endif /* CONFIG_MEMORY_METADATA */ + +#endif /* __ASM_MEMORY_METADATA_H */ diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 0ca89ebcdc63..1ca421c11ebc 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -956,6 +957,8 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, void tag_clear_highpage(struct page *page) { + /* Tag storage pages cannot be tagged. */ + WARN_ON_ONCE(is_migrate_metadata_page(page)); /* Newly allocated page, shouldn't have been tagged yet */ WARN_ON_ONCE(!try_page_mte_tagging(page)); mte_zero_clear_page_tags(page_address(page)); diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 941be574bbe0..048ecffc430c 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -36,6 +36,7 @@ mandatory-y += kprobes.h mandatory-y += linkage.h mandatory-y += local.h mandatory-y += local64.h +mandatory-y += memory_metadata.h mandatory-y += mmiowb.h mandatory-y += mmu.h mandatory-y += mmu_context.h diff --git a/include/asm-generic/memory_metadata.h b/include/asm-generic/memory_metadata.h new file mode 100644 index 000000000000..dc0c84408a8e --- /dev/null +++ b/include/asm-generic/memory_metadata.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_GENERIC_MEMORY_METADATA_H +#define __ASM_GENERIC_MEMORY_METADATA_H + +#include + +#ifndef CONFIG_MEMORY_METADATA +static inline bool metadata_storage_enabled(void) +{ + return false; +} +static inline bool alloc_can_use_metadata_pages(gfp_t gfp_mask) +{ + return false; +} +#endif /* !CONFIG_MEMORY_METADATA */ + +#endif /* __ASM_GENERIC_MEMORY_METADATA_H */ diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 5e50b78d58ea..74925806687e 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -61,6 +61,9 @@ enum migratetype { */ MIGRATE_CMA, #endif +#ifdef CONFIG_MEMORY_METADATA + MIGRATE_METADATA, +#endif #ifdef CONFIG_MEMORY_ISOLATION MIGRATE_ISOLATE, /* can't allocate from here */ #endif @@ -78,6 +81,14 @@ extern const char * const migratetype_names[MIGRATE_TYPES]; # define is_migrate_cma_page(_page) false #endif +#ifdef CONFIG_MEMORY_METADATA +# define is_migrate_metadata(migratetype) unlikely((migratetype) == MIGRATE_METADATA) +# define is_migrate_metadata_page(_page) (get_pageblock_migratetype(_page) == MIGRATE_METADATA) +#else +# define is_migrate_metadata(migratetype) false +# define is_migrate_metadata_page(_page) false +#endif + static inline bool is_migrate_movable(int mt) { return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE; diff --git a/mm/Kconfig b/mm/Kconfig index 09130434e30d..838193522e20 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1236,6 +1236,9 @@ config LOCK_MM_AND_FIND_VMA bool depends on !STACK_GROWSUP +config MEMORY_METADATA + bool + source "mm/damon/Kconfig" endmenu diff --git a/mm/internal.h b/mm/internal.h index a7d9e980429a..efd52c9f1578 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -824,6 +824,11 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone, #define ALLOC_NOFRAGMENT 0x0 #endif #define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */ +#ifdef CONFIG_MEMORY_METADATA +#define 
ALLOC_FROM_METADATA 0x400 /* allow allocations from MIGRATE_METADATA list */ +#else +#define ALLOC_FROM_METADATA 0x0 +#endif #define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */ /* Flags that allow allocations below the min watermark. */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index fdc230440a44..7baa78abf351 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -53,6 +53,7 @@ #include #include #include +#include #include "internal.h" #include "shuffle.h" #include "page_reporting.h" @@ -1645,6 +1646,17 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone, unsigned int order) { return NULL; } #endif +#ifdef CONFIG_MEMORY_METADATA +static __always_inline struct page *__rmqueue_metadata_fallback(struct zone *zone, + unsigned int order) +{ + return __rmqueue_smallest(zone, order, MIGRATE_METADATA); +} +#else +static inline struct page *__rmqueue_metadata_fallback(struct zone *zone, + unsigned int order) { return NULL; } +#endif + /* * Move the free pages in a range to the freelist tail of the requested type. * Note that start_page and end_pages are not aligned on a pageblock @@ -2144,6 +2156,15 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, if (alloc_flags & ALLOC_CMA) page = __rmqueue_cma_fallback(zone, order); + /* + * Allocate data pages from MIGRATE_METADATA only if the regular + * allocation path fails to increase the chance that the + * metadata page is available when the associated data page + * needs it. + */ + if (!page && (alloc_flags & ALLOC_FROM_METADATA)) + page = __rmqueue_metadata_fallback(zone, order); + if (!page && __rmqueue_fallback(zone, order, migratetype, alloc_flags)) goto retry; @@ -3088,6 +3109,13 @@ static inline unsigned int gfp_to_alloc_flags_fast(gfp_t gfp_mask, if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE) alloc_flags |= ALLOC_CMA; #endif +#ifdef CONFIG_MEMORY_METADATA + if (metadata_storage_enabled() && + gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE && + alloc_can_use_metadata_pages(gfp_mask)) + alloc_flags |= ALLOC_FROM_METADATA; +#endif + return alloc_flags; } From patchwork Wed Aug 23 13:13:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362315 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15E3CEE49B0 for ; Wed, 23 Aug 2023 13:14:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235543AbjHWNO7 (ORCPT ); Wed, 23 Aug 2023 09:14:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35002 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235558AbjHWNOw (ORCPT ); Wed, 23 Aug 2023 09:14:52 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 524A210EC; Wed, 23 Aug 2023 06:14:41 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 85FE01570; Wed, 23 Aug 2023 06:15:21 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B54533F740; Wed, 23 Aug 2023 06:14:34 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, 
maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 05/37] mm: Add memory statistics for the MIGRATE_METADATA allocation policy Date: Wed, 23 Aug 2023 14:13:18 +0100 Message-Id: <20230823131350.114942-6-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Keep track of the total number of metadata pages available in the system, as well as the per-zone pages. Opportunistically add braces to an "if" block from rmqueue_bulk() where the body contains multiple lines of code. Signed-off-by: Alexandru Elisei --- fs/proc/meminfo.c | 8 ++++++++ include/asm-generic/memory_metadata.h | 2 ++ include/linux/mmzone.h | 13 +++++++++++++ include/linux/vmstat.h | 2 ++ mm/page_alloc.c | 18 +++++++++++++++++- mm/page_owner.c | 3 ++- mm/show_mem.c | 4 ++++ mm/vmstat.c | 8 ++++++-- 8 files changed, 54 insertions(+), 4 deletions(-) diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index 8dca4d6d96c7..c9970860b5be 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -17,6 +17,9 @@ #ifdef CONFIG_CMA #include #endif +#ifdef CONFIG_MEMORY_METADATA +#include +#endif #include #include "internal.h" @@ -167,6 +170,11 @@ static int meminfo_proc_show(struct seq_file *m, void *v) show_val_kb(m, "CmaFree: ", global_zone_page_state(NR_FREE_CMA_PAGES)); #endif +#ifdef CONFIG_MEMORY_METADATA + show_val_kb(m, "MetadataTotal: ", totalmetadata_pages); + show_val_kb(m, "MetadataFree: ", + global_zone_page_state(NR_FREE_METADATA_PAGES)); +#endif #ifdef CONFIG_UNACCEPTED_MEMORY show_val_kb(m, "Unaccepted: ", diff --git a/include/asm-generic/memory_metadata.h b/include/asm-generic/memory_metadata.h index dc0c84408a8e..63ea661b354d 100644 --- a/include/asm-generic/memory_metadata.h +++ b/include/asm-generic/memory_metadata.h @@ -4,6 +4,8 @@ #include +extern unsigned long totalmetadata_pages; + #ifndef CONFIG_MEMORY_METADATA static inline bool metadata_storage_enabled(void) { diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 74925806687e..48c237248d87 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -160,6 +160,7 @@ enum zone_stat_item { #ifdef CONFIG_UNACCEPTED_MEMORY NR_UNACCEPTED, #endif + NR_FREE_METADATA_PAGES, NR_VM_ZONE_STAT_ITEMS }; enum node_stat_item { @@ -914,6 +915,9 @@ struct zone { #ifdef CONFIG_CMA unsigned long cma_pages; #endif +#ifdef CONFIG_MEMORY_METADATA + unsigned long metadata_pages; +#endif const char *name; @@ -1026,6 +1030,15 @@ static inline unsigned long zone_cma_pages(struct zone *zone) #endif } +static inline unsigned long zone_metadata_pages(struct zone *zone) +{ +#ifdef CONFIG_MEMORY_METADATA + return 
zone->metadata_pages; +#else + return 0; +#endif +} + static inline unsigned long zone_end_pfn(const struct zone *zone) { return zone->zone_start_pfn + zone->spanned_pages; diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index fed855bae6d8..15aa069df6b1 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -493,6 +493,8 @@ static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages, __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages); if (is_migrate_cma(migratetype)) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages); + if (is_migrate_metadata(migratetype)) + __mod_zone_page_state(zone, NR_FREE_METADATA_PAGES, nr_pages); } extern const char * const vmstat_text[]; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7baa78abf351..829134a4dfa8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2202,9 +2202,14 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * pages are ordered properly. */ list_add_tail(&page->pcp_list, list); - if (is_migrate_cma(get_pcppage_migratetype(page))) + if (is_migrate_cma(get_pcppage_migratetype(page))) { __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, -(1 << order)); + } + if (is_migrate_metadata(get_pcppage_migratetype(page))) { + __mod_zone_page_state(zone, NR_FREE_METADATA_PAGES, + -(1 << order)); + } } __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); @@ -2894,6 +2899,10 @@ static inline long __zone_watermark_unusable_free(struct zone *z, #ifdef CONFIG_UNACCEPTED_MEMORY unusable_free += zone_page_state(z, NR_UNACCEPTED); #endif +#ifdef CONFIG_MEMORY_METADATA + if (!(alloc_flags & ALLOC_FROM_METADATA)) + unusable_free += zone_page_state(z, NR_FREE_METADATA_PAGES); +#endif return unusable_free; } @@ -2974,6 +2983,13 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, return true; } #endif + +#ifdef CONFIG_MEMORY_METADATA + if ((alloc_flags & ALLOC_FROM_METADATA) && + !free_area_empty(area, MIGRATE_METADATA)) { + return true; + } +#endif if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) && !free_area_empty(area, MIGRATE_HIGHATOMIC)) { return true; diff --git a/mm/page_owner.c b/mm/page_owner.c index c93baef0148f..c66e25536068 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -333,7 +333,8 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m, page_owner = get_page_owner(page_ext); page_mt = gfp_migratetype(page_owner->gfp_mask); if (pageblock_mt != page_mt) { - if (is_migrate_cma(pageblock_mt)) + if (is_migrate_cma(pageblock_mt) || + is_migrate_metadata(pageblock_mt)) count[MIGRATE_MOVABLE]++; else count[pageblock_mt]++; diff --git a/mm/show_mem.c b/mm/show_mem.c index 01f8e9905817..3935410c98ac 100644 --- a/mm/show_mem.c +++ b/mm/show_mem.c @@ -22,6 +22,7 @@ atomic_long_t _totalram_pages __read_mostly; EXPORT_SYMBOL(_totalram_pages); unsigned long totalreserve_pages __read_mostly; unsigned long totalcma_pages __read_mostly; +unsigned long totalmetadata_pages __read_mostly; static inline void show_node(struct zone *zone) { @@ -423,6 +424,9 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx) #ifdef CONFIG_CMA printk("%lu pages cma reserved\n", totalcma_pages); #endif +#ifdef CONFIG_MEMORY_METADATA + printk("%lu pages metadata reserved\n", totalmetadata_pages); +#endif #ifdef CONFIG_MEMORY_FAILURE printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages)); #endif diff --git a/mm/vmstat.c b/mm/vmstat.c index b731d57996c5..07caa284a724 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1184,6 +1184,7 @@ const 
char * const vmstat_text[] = { #ifdef CONFIG_UNACCEPTED_MEMORY "nr_unaccepted", #endif + "nr_free_metadata", /* enum numa_stat_item counters */ #ifdef CONFIG_NUMA @@ -1695,7 +1696,8 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, "\n spanned %lu" "\n present %lu" "\n managed %lu" - "\n cma %lu", + "\n cma %lu" + "\n metadata %lu", zone_page_state(zone, NR_FREE_PAGES), zone->watermark_boost, min_wmark_pages(zone), @@ -1704,7 +1706,8 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, zone->spanned_pages, zone->present_pages, zone_managed_pages(zone), - zone_cma_pages(zone)); + zone_cma_pages(zone), + zone_metadata_pages(zone)); seq_printf(m, "\n protection: (%ld", @@ -1909,6 +1912,7 @@ int vmstat_refresh(struct ctl_table *table, int write, switch (i) { case NR_ZONE_WRITE_PENDING: case NR_FREE_CMA_PAGES: + case NR_FREE_METADATA_PAGES: continue; } val = atomic_long_read(&vm_zone_stat[i]); From patchwork Wed Aug 23 13:13:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362316 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 553AAEE49A0 for ; Wed, 23 Aug 2023 13:15:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235539AbjHWNPQ (ORCPT ); Wed, 23 Aug 2023 09:15:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35870 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235534AbjHWNPP (ORCPT ); Wed, 23 Aug 2023 09:15:15 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 54FCA170E; Wed, 23 Aug 2023 06:14:52 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 919B61576; Wed, 23 Aug 2023 06:15:27 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 415F23F740; Wed, 23 Aug 2023 06:14:41 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA Date: Wed, 23 Aug 2023 14:13:19 +0100 Message-Id: <20230823131350.114942-7-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: 
bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org pcp lists keep MIGRATE_METADATA pages on the MIGRATE_MOVABLE list. Make sure pages from the movable list are allocated only when the ALLOC_FROM_METADATA alloc flag is set, as otherwise the page allocator could end up allocating a metadata page when that page cannot be used. __alloc_pages_bulk() sidesteps rmqueue() and calls __rmqueue_pcplist() directly. Add a check for the flag before calling __rmqueue_pcplist(), and fallback to __alloc_pages() if the check is false. Note that CMA isn't a problem for __alloc_pages_bulk(): an allocation can always use CMA pages if the requested migratetype is MIGRATE_MOVABLE, which is not the case with MIGRATE_METADATA pages. Signed-off-by: Alexandru Elisei --- mm/page_alloc.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 829134a4dfa8..a693e23c4733 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2845,11 +2845,16 @@ struct page *rmqueue(struct zone *preferred_zone, if (likely(pcp_allowed_order(order))) { /* - * MIGRATE_MOVABLE pcplist could have the pages on CMA area and - * we need to skip it when CMA area isn't allowed. + * PCP lists keep MIGRATE_CMA/MIGRATE_METADATA pages on the same + * movable list. Make sure it's allowed to allocate both type of + * pages before allocating from the movable list. */ - if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA || - migratetype != MIGRATE_MOVABLE) { + bool movable_allowed = (!IS_ENABLED(CONFIG_CMA) || + (alloc_flags & ALLOC_CMA)) && + (!IS_ENABLED(CONFIG_MEMORY_METADATA) || + (alloc_flags & ALLOC_FROM_METADATA)); + + if (migratetype != MIGRATE_MOVABLE || movable_allowed) { page = rmqueue_pcplist(preferred_zone, zone, order, migratetype, alloc_flags); if (likely(page)) @@ -4388,6 +4393,14 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, goto out; gfp = alloc_gfp; + /* + * pcp lists puts MIGRATE_METADATA on the MIGRATE_MOVABLE list, don't + * use pcp if allocating metadata pages is not allowed. + */ + if (metadata_storage_enabled() && ac.migratetype == MIGRATE_MOVABLE && + !(alloc_flags & ALLOC_FROM_METADATA)) + goto failed; + /* Find an allowed local zone that meets the low watermark. 
*/ for_each_zone_zonelist_nodemask(zone, z, ac.zonelist, ac.highest_zoneidx, ac.nodemask) { unsigned long mark; From patchwork Wed Aug 23 13:13:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362317 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 905DBEE49A3 for ; Wed, 23 Aug 2023 13:15:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235574AbjHWNPS (ORCPT ); Wed, 23 Aug 2023 09:15:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235564AbjHWNPR (ORCPT ); Wed, 23 Aug 2023 09:15:17 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 8C2D9171C; Wed, 23 Aug 2023 06:14:53 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B6AC91595; Wed, 23 Aug 2023 06:15:33 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 51AF33F740; Wed, 23 Aug 2023 06:14:47 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 07/37] mm: page_alloc: Bypass pcp when freeing MIGRATE_METADATA pages Date: Wed, 23 Aug 2023 14:13:20 +0100 Message-Id: <20230823131350.114942-8-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org When a metadata page is returned to the page allocator because all the associated pages with metadata were freed, the page will be returned to the pcp list, which makes it very likely that it will be used to satisfy an allocation request. This is not optimal, because metadata pages should be used as a last resort, to increase the chances they are not in use when they are needed, to avoid costly page migration. Bypass the pcp lists when freeing metadata pages. 
Note that metadata pages can still end up on the pcp lists when a list is refilled, but this should only happen when memory is running low, which is as intended Signed-off-by: Alexandru Elisei --- mm/page_alloc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a693e23c4733..bbb49b489230 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2478,7 +2478,8 @@ void free_unref_page(struct page *page, unsigned int order) */ migratetype = get_pcppage_migratetype(page); if (unlikely(migratetype >= MIGRATE_PCPTYPES)) { - if (unlikely(is_migrate_isolate(migratetype))) { + if (unlikely(is_migrate_isolate(migratetype) || + is_migrate_metadata(migratetype))) { free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE); return; } @@ -2522,7 +2523,8 @@ void free_unref_page_list(struct list_head *list) * comment in free_unref_page. */ migratetype = get_pcppage_migratetype(page); - if (unlikely(is_migrate_isolate(migratetype))) { + if (unlikely(is_migrate_isolate(migratetype) || + is_migrate_metadata(migratetype))) { list_del(&page->lru); free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE); continue; From patchwork Wed Aug 23 13:13:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362318 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95420EE4993 for ; Wed, 23 Aug 2023 13:15:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235575AbjHWNPb (ORCPT ); Wed, 23 Aug 2023 09:15:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59196 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235564AbjHWNPa (ORCPT ); Wed, 23 Aug 2023 09:15:30 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 72AA810E5; Wed, 23 Aug 2023 06:14:59 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AB0511596; Wed, 23 Aug 2023 06:15:39 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 756C53F740; Wed, 23 Aug 2023 06:14:53 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 08/37] mm: compaction: Account for free metadata pages in __compact_finished() Date: Wed, 23 Aug 2023 14:13:21 +0100 
Message-Id: <20230823131350.114942-9-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org __compact_finished() signals the end of compaction if a page of an order greater than or equal to the requested order is found on a free_area. When allocation of MIGRATE_METADATA pages is allowed, count the number of free metadata storage pages towards the requested order. Signed-off-by: Alexandru Elisei --- mm/compaction.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/mm/compaction.c b/mm/compaction.c index dbc9f86b1934..f132c02b0655 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2208,6 +2208,13 @@ static enum compact_result __compact_finished(struct compact_control *cc) if (migratetype == MIGRATE_MOVABLE && !free_area_empty(area, MIGRATE_CMA)) return COMPACT_SUCCESS; +#endif +#ifdef CONFIG_MEMORY_METADATA + if (metadata_storage_enabled() && + migratetype == MIGRATE_MOVABLE && + (cc->alloc_flags & ALLOC_FROM_METADATA) && + !free_area_empty(area, MIGRATE_METADATA)) + return COMPACT_SUCCESS; #endif /* * Job done if allocation would steal freepages from From patchwork Wed Aug 23 13:13:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362319 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2D1B0EE4993 for ; Wed, 23 Aug 2023 13:15:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235534AbjHWNPo (ORCPT ); Wed, 23 Aug 2023 09:15:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38144 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234904AbjHWNPm (ORCPT ); Wed, 23 Aug 2023 09:15:42 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1049CE74; Wed, 23 Aug 2023 06:15:13 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A7E7415A1; Wed, 23 Aug 2023 06:15:45 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 647253F740; Wed, 23 Aug 2023 06:14:59 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 09/37] mm:
compaction: Handle metadata pages as source for direct compaction Date: Wed, 23 Aug 2023 14:13:22 +0100 Message-Id: <20230823131350.114942-10-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Metadata pages can have special requirements and can only be allocated if an architecture allows it. In the direct compaction case, the source pages that will be migrated will then be used to satisfy the allocation request that triggered the compaction. Make sure that the allocation allows the use of metadata pages when considering them for migration. When a page is freed during direct compaction, the page allocator will try to use that page to satisfy the allocation request. Don't capture a metadata page in this case, even if the allocation request would allow it, to increase the chances that the page is free when it needs to be taken from the allocator to store metadata. Signed-off-by: Alexandru Elisei --- mm/compaction.c | 10 ++++++++-- mm/page_alloc.c | 1 + 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index f132c02b0655..a29db409c5cc 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "internal.h" #ifdef CONFIG_COMPACTION @@ -1307,11 +1308,16 @@ static bool suitable_migration_source(struct compact_control *cc, if (pageblock_skip_persistent(page)) return false; + block_mt = get_pageblock_migratetype(page); + + if (metadata_storage_enabled() && cc->direct_compaction && + is_migrate_metadata(block_mt) && + !(cc->alloc_flags & ALLOC_FROM_METADATA)) + return false; + if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction) return true; - block_mt = get_pageblock_migratetype(page); - if (cc->migratetype == MIGRATE_MOVABLE) return is_migrate_movable(block_mt); else diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bbb49b489230..011645d07ce9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -654,6 +654,7 @@ compaction_capture(struct capture_control *capc, struct page *page, /* Do not accidentally pollute CMA or isolated regions*/ if (is_migrate_cma(migratetype) || + is_migrate_metadata(migratetype) || is_migrate_isolate(migratetype)) return false; From patchwork Wed Aug 23 13:13:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362320 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F86EEE49A3 for ; Wed, 23 Aug 2023 13:15:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235594AbjHWNP6 (ORCPT ); Wed, 23 Aug 2023 09:15:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53208 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235612AbjHWNP4 (ORCPT ); Wed, 23 Aug 2023 09:15:56 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id D2D6D10C8; Wed, 23 Aug 2023 06:15:22 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 72F7D15BF; 
Wed, 23 Aug 2023 06:15:52 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 69A753F740; Wed, 23 Aug 2023 06:15:05 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 10/37] mm: compaction: Do not use MIGRATE_METADATA to replace pages with metadata Date: Wed, 23 Aug 2023 14:13:23 +0100 Message-Id: <20230823131350.114942-11-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org MIGRATE_METADATA pages are special because for the one architecture (arm64) that uses them, it is not possible to have metadata associated with a page used to store metadata. To avoid a situation where a page with metadata is being migrated to a page which cannot have metadata, keep track of whether such pages have been isolated as the source for migration. When allocating a destination page for migration, deny allocations from MIGRATE_METADATA if that's the case. fast_isolate_freepages() takes pages only from the MIGRATE_MOVABLE list, which means it is not necessary to have a similar check, as MIGRATE_METADATA pages will never be considered.
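Putting the two hunks of this patch together, the intended flow is roughly the following (an illustrative sketch only, reusing the names the patch itself introduces):

	/* isolate_migratepages_block(): remember that at least one isolated
	 * source page carries metadata (tags). */
	if (page_has_metadata(&folio->page))
		cc->source_has_metadata = true;

	/* suitable_migration_target(): in that case never pick a
	 * MIGRATE_METADATA block as the destination, because a page that
	 * stores tags cannot itself have tags. */
	if (is_migrate_metadata(get_pageblock_migratetype(page)) &&
	    cc->source_has_metadata)
		return false;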
Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/memory_metadata.h | 5 +++++ include/asm-generic/memory_metadata.h | 5 +++++ include/linux/mmzone.h | 2 +- mm/compaction.c | 19 +++++++++++++++++-- mm/internal.h | 1 + 5 files changed, 29 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/memory_metadata.h b/arch/arm64/include/asm/memory_metadata.h index 5269be7f455f..c57c435c8ba3 100644 --- a/arch/arm64/include/asm/memory_metadata.h +++ b/arch/arm64/include/asm/memory_metadata.h @@ -7,6 +7,8 @@ #include +#include + #ifdef CONFIG_MEMORY_METADATA static inline bool metadata_storage_enabled(void) { @@ -16,6 +18,9 @@ static inline bool alloc_can_use_metadata_pages(gfp_t gfp_mask) { return false; } + +#define page_has_metadata(page) page_mte_tagged(page) + #endif /* CONFIG_MEMORY_METADATA */ #endif /* __ASM_MEMORY_METADATA_H */ diff --git a/include/asm-generic/memory_metadata.h b/include/asm-generic/memory_metadata.h index 63ea661b354d..02b279823920 100644 --- a/include/asm-generic/memory_metadata.h +++ b/include/asm-generic/memory_metadata.h @@ -3,6 +3,7 @@ #define __ASM_GENERIC_MEMORY_METADATA_H #include +#include extern unsigned long totalmetadata_pages; @@ -15,6 +16,10 @@ static inline bool alloc_can_use_metadata_pages(gfp_t gfp_mask) { return false; } +static inline bool page_has_metadata(struct page *page) +{ + return false; +} #endif /* !CONFIG_MEMORY_METADATA */ #endif /* __ASM_GENERIC_MEMORY_METADATA_H */ diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 48c237248d87..12d5072668ab 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -91,7 +91,7 @@ extern const char * const migratetype_names[MIGRATE_TYPES]; static inline bool is_migrate_movable(int mt) { - return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE; + return is_migrate_cma(mt) || is_migrate_metadata(mt) || mt == MIGRATE_MOVABLE; } /* diff --git a/mm/compaction.c b/mm/compaction.c index a29db409c5cc..cc0139fa0cb0 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1153,6 +1153,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, nr_isolated += folio_nr_pages(folio); nr_scanned += folio_nr_pages(folio) - 1; + if (page_has_metadata(&folio->page)) + cc->source_has_metadata = true; + /* * Avoid isolating too much unless this block is being * fully scanned (e.g. dirty/writeback pages, parallel allocation) @@ -1328,6 +1331,15 @@ static bool suitable_migration_source(struct compact_control *cc, static bool suitable_migration_target(struct compact_control *cc, struct page *page) { + int block_mt; + + block_mt = get_pageblock_migratetype(page); + + /* Pages from MIGRATE_METADATA cannot have metadata. */ + if (is_migrate_metadata(block_mt) && cc->source_has_metadata) + return false; + + /* If the page is a large free page, then disallow migration */ if (PageBuddy(page)) { /* @@ -1342,8 +1354,11 @@ static bool suitable_migration_target(struct compact_control *cc, if (cc->ignore_block_suitable) return true; - /* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */ - if (is_migrate_movable(get_pageblock_migratetype(page))) + /* + * If the block is MIGRATE_MOVABLE, MIGRATE_CMA or MIGRATE_METADATA, + * allow migration. + */ + if (is_migrate_movable(block_mt)) return true; /* Otherwise skip the block */ diff --git a/mm/internal.h b/mm/internal.h index efd52c9f1578..d28ac0085f61 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -491,6 +491,7 @@ struct compact_control { * ensure forward progress. 
*/ bool alloc_contig; /* alloc_contig_range allocation */ + bool source_has_metadata; /* source pages have associated metadata */ }; /* From patchwork Wed Aug 23 13:13:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362321 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB4CAEE49B2 for ; Wed, 23 Aug 2023 13:16:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234904AbjHWNQI (ORCPT ); Wed, 23 Aug 2023 09:16:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54648 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235615AbjHWNQD (ORCPT ); Wed, 23 Aug 2023 09:16:03 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 4DB9EE5C; Wed, 23 Aug 2023 06:15:32 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9081C15DB; Wed, 23 Aug 2023 06:15:58 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2C7E63F740; Wed, 23 Aug 2023 06:15:12 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 11/37] mm: migrate/mempolicy: Allocate metadata-enabled destination page Date: Wed, 23 Aug 2023 14:13:24 +0100 Message-Id: <20230823131350.114942-12-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org With explicit metadata page management support, it's important to know if the source page for migration has metadata associated with it for two reasons: - The page allocator knows to skip metadata pages (which cannot have metadata) when allocating the destination page. - The associated metadata page is correctly reserved when fulfilling the allocation for the destination page. When choosing the destination during migration, keep track if the source page has metadata. The mbind() system call changes the NUMA allocation policy for the specified memory range and nodemask. If the MPOL_MF_MOVE or MPOL_MF_MOVE_ALL flags are set, then any existing allocations that fall within the range which don't conform to the specified policy will be migrated. 
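For reference, the userspace side of that scenario looks roughly like this (a generic mbind() sketch, nothing MTE-specific; it assumes a machine with a NUMA node 0 and libnuma's <numaif.h> wrapper, linked with -lnuma):

	#include <numaif.h>
	#include <sys/mman.h>
	#include <string.h>

	int main(void)
	{
		size_t len = 4UL << 20;
		unsigned long nodemask = 1UL << 0;	/* allow node 0 only */
		char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;
		memset(buf, 0, len);	/* fault pages in before changing the policy */

		/*
		 * MPOL_MF_MOVE asks the kernel to migrate the pages that already
		 * exist in the range so they conform to the new policy; the
		 * destination pages allocated for that migration are what this
		 * patch teaches about metadata.
		 */
		if (mbind(buf, len, MPOL_BIND, &nodemask, 8 * sizeof(nodemask),
			  MPOL_MF_MOVE))
			return 1;
		return 0;
	}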
The function that allocates the destination page for migration is new_page(), teach it too about source pages with metadata. Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/memory_metadata.h | 4 ++++ include/asm-generic/memory_metadata.h | 4 ++++ mm/mempolicy.c | 4 ++++ mm/migrate.c | 6 ++++++ 4 files changed, 18 insertions(+) diff --git a/arch/arm64/include/asm/memory_metadata.h b/arch/arm64/include/asm/memory_metadata.h index c57c435c8ba3..132707fce9ab 100644 --- a/arch/arm64/include/asm/memory_metadata.h +++ b/arch/arm64/include/asm/memory_metadata.h @@ -21,6 +21,10 @@ static inline bool alloc_can_use_metadata_pages(gfp_t gfp_mask) #define page_has_metadata(page) page_mte_tagged(page) +static inline bool folio_has_metadata(struct folio *folio) +{ + return page_has_metadata(&folio->page); +} #endif /* CONFIG_MEMORY_METADATA */ #endif /* __ASM_MEMORY_METADATA_H */ diff --git a/include/asm-generic/memory_metadata.h b/include/asm-generic/memory_metadata.h index 02b279823920..8f4e2fba222f 100644 --- a/include/asm-generic/memory_metadata.h +++ b/include/asm-generic/memory_metadata.h @@ -20,6 +20,10 @@ static inline bool page_has_metadata(struct page *page) { return false; } +static inline bool folio_has_metadata(struct folio *folio) +{ + return false; +} #endif /* !CONFIG_MEMORY_METADATA */ #endif /* __ASM_GENERIC_MEMORY_METADATA_H */ diff --git a/mm/mempolicy.c b/mm/mempolicy.c index edc25195f5bd..d164b5c50243 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -103,6 +103,7 @@ #include #include +#include #include #include #include @@ -1219,6 +1220,9 @@ static struct folio *new_folio(struct folio *src, unsigned long start) if (folio_test_large(src)) gfp = GFP_TRANSHUGE; + if (folio_has_metadata(src)) + gfp |= __GFP_TAGGED; + /* * if !vma, vma_alloc_folio() will use task or system default policy */ diff --git a/mm/migrate.c b/mm/migrate.c index 24baad2571e3..c6826562220a 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -51,6 +51,7 @@ #include #include +#include #include #include @@ -1990,6 +1991,9 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private) if (nid == NUMA_NO_NODE) nid = folio_nid(src); + if (folio_has_metadata(src)) + gfp_mask |= __GFP_TAGGED; + if (folio_test_hugetlb(src)) { struct hstate *h = folio_hstate(src); @@ -2476,6 +2480,8 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src, __GFP_NOWARN; gfp &= ~__GFP_RECLAIM; } + if (folio_has_metadata(src)) + gfp |= __GFP_TAGGED; return __folio_alloc_node(gfp, order, nid); } From patchwork Wed Aug 23 13:13:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362322 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B675AEE4993 for ; Wed, 23 Aug 2023 13:16:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235624AbjHWNQ3 (ORCPT ); Wed, 23 Aug 2023 09:16:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33348 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235612AbjHWNQR (ORCPT ); Wed, 23 Aug 2023 09:16:17 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 3368C1704; Wed, 23 Aug 2023 06:15:54 -0700 (PDT) Received: from 
usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 91C481650; Wed, 23 Aug 2023 06:16:04 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4AFC03F740; Wed, 23 Aug 2023 06:15:18 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 12/37] mm: gup: Don't allow longterm pinning of MIGRATE_METADATA pages Date: Wed, 23 Aug 2023 14:13:25 +0100 Message-Id: <20230823131350.114942-13-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Treat MIGRATE_METADATA pages just like movable or CMA pages and don't allow them to be pinned longterm. No special handling needed for migrate_longterm_unpinnable_pages() because the gfp mask for allocating the destination pages is GFP_USER. GFP_USER doesn't include __GFP_MOVABLE, which makes it impossible to accidentally allocate metadata pages for migrating the pinned pages. Signed-off-by: Alexandru Elisei --- include/linux/mm.h | 10 +++++++--- mm/Kconfig | 2 ++ 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2dd73e4f3d8e..ce87d55ecf87 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1925,16 +1925,20 @@ static inline bool is_zero_folio(const struct folio *folio) return is_zero_page(&folio->page); } -/* MIGRATE_CMA and ZONE_MOVABLE do not allow pin folios */ +/* MIGRATE_CMA, MIGRATE_METADATA and ZONE_MOVABLE do not allow pin folios */ #ifdef CONFIG_MIGRATION static inline bool folio_is_longterm_pinnable(struct folio *folio) { -#ifdef CONFIG_CMA +#if defined(CONFIG_CMA) || defined(CONFIG_MEMORY_METADATA) int mt = folio_migratetype(folio); - if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) + if (mt == MIGRATE_ISOLATE) + return false; + + if (is_migrate_cma(mt) || is_migrate_metadata(mt)) return false; #endif + /* The zero page can be "pinned" but gets special handling.
*/ if (is_zero_folio(folio)) return true; diff --git a/mm/Kconfig b/mm/Kconfig index 838193522e20..847e1669dba0 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1238,6 +1238,8 @@ config LOCK_MM_AND_FIND_VMA config MEMORY_METADATA bool + select MEMORY_ISOLATION + select MIGRATION source "mm/damon/Kconfig" From patchwork Wed Aug 23 13:13:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362323 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33CA2EE49A3 for ; Wed, 23 Aug 2023 13:16:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235039AbjHWNQl (ORCPT ); Wed, 23 Aug 2023 09:16:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235632AbjHWNQc (ORCPT ); Wed, 23 Aug 2023 09:16:32 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5AD9410E7; Wed, 23 Aug 2023 06:16:09 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1E9101655; Wed, 23 Aug 2023 06:16:11 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 54BD83F740; Wed, 23 Aug 2023 06:15:24 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 13/37] arm64: mte: Reserve tag storage memory Date: Wed, 23 Aug 2023 14:13:26 +0100 Message-Id: <20230823131350.114942-14-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Allow the kernel to get the size and location of the MTE tag storage memory from the DTB. This memory is marked as reserved for now, with later patches adding support for making use of it. The DTB node for the tag storage region is defined as: metadata1: metadata@8c0000000 { compatible = "arm,mte-tag-storage"; reg = <0x08 0xc0000000 0x00 0x1000000>; block-size = <0x1000>; // 4k memory = <&memory1>; // Associated tagged memory }; The tag storage region represents the largest contiguous memory region that holds all the tags for the associated contiguous memory region which can be tagged. 
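The sizing follows from MTE keeping 4 bits of tag for every 16 bytes of data, so a tag storage region is 1/32 the size of the memory it covers; as a quick, purely illustrative check in plain C:

	#include <stdio.h>

	int main(void)
	{
		unsigned long long tagged = 32ULL << 30;	/* 32GB of tag-capable memory */
		unsigned long long tags = tagged / 32;		/* 4 bits per 16 bytes -> 1/32 */

		printf("tag storage needed: %llu MiB\n", tags >> 20);	/* prints 1024 */
		return 0;
	}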
For example, for 32GB of contiguous tagged memory, the corresponding tag storage region is 1GB of contiguous memory, not two adjacent 512M memory regions. "block-size" represents the minimum multiple of 4K of tag storage where all the tags stored in the block correspond to a contiguous memory region. This is needed for platforms where the memory controller interleaves tag writes to memory. For example, if the memory controller interleaves tag writes for 256KB of contiguous memory across 8K of tag storage (2-way interleave), then the correct value for "block-size" is 0x2000. Signed-off-by: Alexandru Elisei --- arch/arm64/Kconfig | 12 ++ arch/arm64/include/asm/memory_metadata.h | 3 +- arch/arm64/include/asm/mte_tag_storage.h | 15 ++ arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/mte_tag_storage.c | 262 +++++++++++++++++++++++ arch/arm64/kernel/setup.c | 7 + 6 files changed, 299 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/include/asm/mte_tag_storage.h create mode 100644 arch/arm64/kernel/mte_tag_storage.c diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index a2511b30d0f6..ed27bb87babd 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -2077,6 +2077,18 @@ config ARM64_MTE Documentation/arch/arm64/memory-tagging-extension.rst. +if ARM64_MTE +config ARM64_MTE_TAG_STORAGE + bool "Dynamic MTE tag storage management" + select MEMORY_METADATA + help + Adds support for dynamic management of the memory used by the hardware + for storing MTE tags. This memory can be used as regular data memory + when it's not used for storing tags. + + If unsure, say N +endif # ARM64_MTE + endmenu # "ARMv8.5 architectural features" menu "ARMv8.7 architectural features" diff --git a/arch/arm64/include/asm/memory_metadata.h b/arch/arm64/include/asm/memory_metadata.h index 132707fce9ab..3287b2776af1 100644 --- a/arch/arm64/include/asm/memory_metadata.h +++ b/arch/arm64/include/asm/memory_metadata.h @@ -14,9 +14,10 @@ static inline bool metadata_storage_enabled(void) { return false; } + static inline bool alloc_can_use_metadata_pages(gfp_t gfp_mask) { - return false; + return !(gfp_mask & __GFP_TAGGED); } #define page_has_metadata(page) page_mte_tagged(page) diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h new file mode 100644 index 000000000000..8f86c4f9a7c3 --- /dev/null +++ b/arch/arm64/include/asm/mte_tag_storage.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023 ARM Ltd.
+ */ +#ifndef __ASM_MTE_TAG_STORAGE_H +#define __ASM_MTE_TAG_STORAGE_H + +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE +void mte_tag_storage_init(void); +#else +static inline void mte_tag_storage_init(void) +{ +} +#endif /* CONFIG_ARM64_MTE_TAG_STORAGE */ +#endif /* __ASM_MTE_TAG_STORAGE_H */ diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index d95b3d6b471a..5f031bf9f8f1 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -70,6 +70,7 @@ obj-$(CONFIG_CRASH_CORE) += crash_core.o obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o obj-$(CONFIG_ARM64_MTE) += mte.o +obj-$(CONFIG_ARM64_MTE_TAG_STORAGE) += mte_tag_storage.o obj-y += vdso-wrap.o obj-$(CONFIG_COMPAT_VDSO) += vdso32-wrap.o obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS) += patch-scs.o diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c new file mode 100644 index 000000000000..5014dda9bf35 --- /dev/null +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -0,0 +1,262 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Support for dynamic tag storage. + * + * Copyright (C) 2023 ARM Ltd. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +struct tag_region { + struct range mem_range; /* Memory associated with the tag storage, in PFNs. */ + struct range tag_range; /* Tag storage memory, in PFNs. */ + u32 block_size; /* Tag block size, in pages. */ +}; + +#define MAX_TAG_REGIONS 32 + +static struct tag_region tag_regions[MAX_TAG_REGIONS]; +static int num_tag_regions; + +static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg, + int reg_len, struct range *range) +{ + int addr_cells = dt_root_addr_cells; + int size_cells = dt_root_size_cells; + u64 size; + + if (reg_len / 4 > addr_cells + size_cells) + return -EINVAL; + + range->start = PHYS_PFN(of_read_number(reg, addr_cells)); + size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells)); + if (size == 0) { + pr_err("Invalid node"); + return -EINVAL; + } + range->end = range->start + size - 1; + + return 0; +} + +static int __init tag_storage_of_flat_get_tag_range(unsigned long node, + struct range *tag_range) +{ + const __be32 *reg; + int reg_len; + + reg = of_get_flat_dt_prop(node, "reg", ®_len); + if (reg == NULL) { + pr_err("Invalid metadata node"); + return -EINVAL; + } + + return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range); +} + +static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem) +{ + const __be32 *reg; + int reg_len; + + reg = of_get_flat_dt_prop(node, "linux,usable-memory", ®_len); + if (reg == NULL) + reg = of_get_flat_dt_prop(node, "reg", ®_len); + + if (reg == NULL) { + pr_err("Invalid memory node"); + return -EINVAL; + } + + return tag_storage_of_flat_get_range(node, reg, reg_len, mem); +} + +struct find_memory_node_arg { + unsigned long node; + u32 phandle; +}; + +static int __init fdt_find_memory_node(unsigned long node, const char *uname, + int depth, void *data) +{ + const char *type = of_get_flat_dt_prop(node, "device_type", NULL); + struct find_memory_node_arg *arg = data; + + if (depth != 1 || !type || strcmp(type, "memory") != 0) + return 0; + + if (of_get_flat_dt_phandle(node) == arg->phandle) { + arg->node = node; + return 1; + } + + return 0; +} + +static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node) +{ + struct find_memory_node_arg arg = { 0 }; + const __be32 *memory_prop; + u32 mem_phandle; 
+ int ret, reg_len; + + memory_prop = of_get_flat_dt_prop(tag_node, "memory", ®_len); + if (!memory_prop) { + pr_err("Missing 'memory' property in the tag storage node"); + return -EINVAL; + } + + mem_phandle = be32_to_cpup(memory_prop); + arg.phandle = mem_phandle; + + ret = of_scan_flat_dt(fdt_find_memory_node, &arg); + if (ret != 1) { + pr_err("Associated memory node not found"); + return -EINVAL; + } + + *mem_node = arg.node; + + return 0; +} + +static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname, + u32 *retval) +{ + const __be32 *reg; + + reg = of_get_flat_dt_prop(node, propname, NULL); + if (!reg) + return -EINVAL; + + *retval = be32_to_cpup(reg); + return 0; +} + +static u32 __init get_block_size_pages(u32 block_size_bytes) +{ + u32 a = PAGE_SIZE; + u32 b = block_size_bytes; + u32 r; + + /* Find greatest common divisor using the Euclidian algorithm. */ + do { + r = a % b; + a = b; + b = r; + } while (b != 0); + + return PHYS_PFN(PAGE_SIZE * block_size_bytes / a); +} + +static int __init fdt_init_tag_storage(unsigned long node, const char *uname, + int depth, void *data) +{ + struct tag_region *region; + unsigned long mem_node; + struct range *mem_range; + struct range *tag_range; + u32 block_size_bytes; + u32 nid; + int ret; + + if (depth != 1 || !strstr(uname, "metadata")) + return 0; + + if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage")) + return 0; + + if (num_tag_regions == MAX_TAG_REGIONS) { + pr_err("Maximum number of tag storage regions exceeded"); + return -EINVAL; + } + + region = &tag_regions[num_tag_regions]; + mem_range = ®ion->mem_range; + tag_range = ®ion->tag_range; + + ret = tag_storage_of_flat_get_tag_range(node, tag_range); + if (ret) { + pr_err("Invalid tag storage node"); + return ret; + } + + ret = tag_storage_get_memory_node(node, &mem_node); + if (ret) + return ret; + + ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range); + if (ret) { + pr_err("Invalid address for associated data memory node"); + return ret; + } + + /* The tag region must exactly match the corresponding memory. 
*/ + if (range_len(tag_range) * 32 != range_len(mem_range)) { + pr_err("Tag region doesn't cover exactly the corresponding memory region"); + return -EINVAL; + } + + ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes); + if (ret || block_size_bytes == 0) { + pr_err("Invalid or missing 'block-size' property"); + return -EINVAL; + } + region->block_size = get_block_size_pages(block_size_bytes); + if (range_len(tag_range) % region->block_size != 0) { + pr_err("Tag storage region size is not a multiple of allocation block size"); + return -EINVAL; + } + + ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid); + if (ret) + nid = numa_node_id(); + + ret = memblock_add_node(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)), + nid, MEMBLOCK_NONE); + if (ret) { + pr_err("Error adding tag memblock (%d)", ret); + return ret; + } + memblock_reserve(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range))); + + pr_info("Found MTE tag storage region 0x%llx@0x%llx, block size %u pages", + PFN_PHYS(range_len(tag_range)), PFN_PHYS(tag_range->start), region->block_size); + + num_tag_regions++; + + return 0; +} + +void __init mte_tag_storage_init(void) +{ + struct range *tag_range; + int i, ret; + + ret = of_scan_flat_dt(fdt_init_tag_storage, NULL); + if (ret) { + pr_err("MTE tag storage management disabled"); + goto out_err; + } + + if (num_tag_regions == 0) + pr_info("No MTE tag storage regions detected"); + + return; + +out_err: + for (i = 0; i < num_tag_regions; i++) { + tag_range = &tag_regions[i].tag_range; + memblock_remove(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range))); + } + num_tag_regions = 0; +} diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 417a8a86b2db..1b77138c1aa5 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -42,6 +42,7 @@ #include #include #include +#include #include #include #include @@ -342,6 +343,12 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p) FW_BUG "Booted with MMU enabled!"); } + /* + * Must be called before memory limits are enforced by + * arm64_memblock_init(). 
+ */ + mte_tag_storage_init(); + arm64_memblock_init(); paging_init(); From patchwork Wed Aug 23 13:13:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362324 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C407EE4993 for ; Wed, 23 Aug 2023 13:17:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235680AbjHWNRA (ORCPT ); Wed, 23 Aug 2023 09:17:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49274 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235478AbjHWNQ7 (ORCPT ); Wed, 23 Aug 2023 09:16:59 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id BBCAAE75; Wed, 23 Aug 2023 06:16:18 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8850B165C; Wed, 23 Aug 2023 06:16:18 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CEDA63F740; Wed, 23 Aug 2023 06:15:30 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 14/37] arm64: mte: Expose tag storage pages to the MIGRATE_METADATA freelist Date: Wed, 23 Aug 2023 14:13:27 +0100 Message-Id: <20230823131350.114942-15-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Add the MTE tag storage pages to the MIGRATE_METADATA freelist, which allows the page allocator to manage them like (almost) regular pages. Signed-off-by: Alexandru Elisei --- arch/arm64/kernel/mte_tag_storage.c | 47 +++++++++++++++++++++++++++++ include/linux/gfp.h | 8 +++++ mm/mm_init.c | 24 +++++++++++++-- 3 files changed, 76 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index 5014dda9bf35..87160f53960f 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -5,10 +5,12 @@ * Copyright (C) 2023 ARM Ltd. 
*/ +#include #include #include #include #include +#include #include #include #include @@ -190,6 +192,12 @@ static int __init fdt_init_tag_storage(unsigned long node, const char *uname, return ret; } + /* Pages are managed in pageblock_nr_pages chunks */ + if (!IS_ALIGNED(tag_range->start | range_len(tag_range), pageblock_nr_pages)) { + pr_err("Tag storage region not aligned to 0x%lx", pageblock_nr_pages); + return -EINVAL; + } + ret = tag_storage_get_memory_node(node, &mem_node); if (ret) return ret; @@ -260,3 +268,42 @@ void __init mte_tag_storage_init(void) } num_tag_regions = 0; } + +static int __init mte_tag_storage_activate_regions(void) +{ + phys_addr_t dram_start, dram_end; + struct range *tag_range; + unsigned long pfn; + int i; + + if (num_tag_regions == 0) + return 0; + + dram_start = memblock_start_of_DRAM(); + dram_end = memblock_end_of_DRAM(); + + for (i = 0; i < num_tag_regions; i++) { + tag_range = &tag_regions[i].tag_range; + /* + * Tag storage region was clipped by arm64_bootmem_init() + * enforcing addressing limits. + */ + if (PFN_PHYS(tag_range->start) < dram_start || + PFN_PHYS(tag_range->end) >= dram_end) { + pr_err("Tag storage region 0x%llx-0x%llx outside addressable memory", + PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end + 1)); + return -EINVAL; + } + } + + for (i = 0; i < num_tag_regions; i++) { + tag_range = &tag_regions[i].tag_range; + for (pfn = tag_range->start; pfn <= tag_range->end; pfn += pageblock_nr_pages) { + init_metadata_reserved_pageblock(pfn_to_page(pfn)); + totalmetadata_pages += pageblock_nr_pages; + } + } + + return 0; +} +core_initcall(mte_tag_storage_activate_regions) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 665f06675c83..fb344baa9a9b 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -354,4 +354,12 @@ extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask, #endif void free_contig_range(unsigned long pfn, unsigned long nr_pages); +#ifdef CONFIG_MEMORY_METADATA +extern void init_metadata_reserved_pageblock(struct page *page); +#else +static inline void init_metadata_reserved_pageblock(struct page *page) +{ +} +#endif + #endif /* __LINUX_GFP_H */ diff --git a/mm/mm_init.c b/mm/mm_init.c index a1963c3322af..467c80e9dacc 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2329,8 +2329,9 @@ bool __init deferred_grow_zone(struct zone *zone, unsigned int order) #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ -#ifdef CONFIG_CMA -void __init init_cma_reserved_pageblock(struct page *page) +#if defined(CONFIG_CMA) || defined(CONFIG_MEMORY_METADATA) +static void __init init_reserved_pageblock(struct page *page, + enum migratetype migratetype) { unsigned i = pageblock_nr_pages; struct page *p = page; @@ -2340,15 +2341,32 @@ void __init init_cma_reserved_pageblock(struct page *page) set_page_count(p, 0); } while (++p, --i); - set_pageblock_migratetype(page, MIGRATE_CMA); + set_pageblock_migratetype(page, migratetype); set_page_refcounted(page); __free_pages(page, pageblock_order); adjust_managed_page_count(page, pageblock_nr_pages); +} + +#ifdef CONFIG_CMA +/* Free whole pageblock and set its migration type to MIGRATE_CMA. */ +void __init init_cma_reserved_pageblock(struct page *page) +{ + init_reserved_pageblock(page, MIGRATE_CMA); page_zone(page)->cma_pages += pageblock_nr_pages; } #endif +#ifdef CONFIG_MEMORY_METADATA +/* Free whole pageblock and set its migration type to MIGRATE_METADATA. 
*/ +void __init init_metadata_reserved_pageblock(struct page *page) +{ + init_reserved_pageblock(page, MIGRATE_METADATA); + page_zone(page)->metadata_pages += pageblock_nr_pages; +} +#endif +#endif /* CONFIG_CMA || CONFIG_MEMORY_METADATA */ + void set_zone_contiguous(struct zone *zone) { unsigned long block_start_pfn = zone->zone_start_pfn; From patchwork Wed Aug 23 13:13:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362325 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51C92EE49A0 for ; Wed, 23 Aug 2023 13:17:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235658AbjHWNRL (ORCPT ); Wed, 23 Aug 2023 09:17:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49372 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235689AbjHWNRA (ORCPT ); Wed, 23 Aug 2023 09:17:00 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 179F8173D; Wed, 23 Aug 2023 06:16:27 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D1B7C1682; Wed, 23 Aug 2023 06:16:24 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4F9513F740; Wed, 23 Aug 2023 06:15:38 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 15/37] arm64: mte: Make tag storage depend on ARCH_KEEP_MEMBLOCK Date: Wed, 23 Aug 2023 14:13:28 +0100 Message-Id: <20230823131350.114942-16-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Tag storage memory requires that the tag storage pages used for data can be migrated when they need to be repurposed to store tags. If ARCH_KEEP_MEMBLOCK is enabled, kexec will scan all non-reserved memblocks to find a suitable location for copying the kernel image. The kernel image, once loaded, cannot be moved to another location in physical memory. 
The initialization code for the tag storage reserves the memblocks for the tag storage pages, which means kexec will not use them, and the tag storage pages can be migrated at any time, which is the desired behaviour. However, if ARCH_KEEP_MEMBLOCK is not selected, kexec will not skip a region unless the memory resource has the IORESOURCE_SYSRAM_DRIVER_MANAGED flag, which isn't currently set by the initialization code. Make ARM64_MTE_TAG_STORAGE depend on ARCH_KEEP_MEMBLOCK to make it explicit that that is required for it to work correctly. Signed-off-by: Alexandru Elisei --- arch/arm64/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index ed27bb87babd..1e3d23ee22ab 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -2081,6 +2081,7 @@ if ARM64_MTE config ARM64_MTE_TAG_STORAGE bool "Dynamic MTE tag storage management" select MEMORY_METADATA + depends on ARCH_KEEP_MEMBLOCK help Adds support for dynamic management of the memory used by the hardware for storing MTE tags. This memory can be used as regular data memory From patchwork Wed Aug 23 13:13:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362595 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9658EE49B7 for ; Wed, 23 Aug 2023 14:01:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235877AbjHWOBi (ORCPT ); Wed, 23 Aug 2023 10:01:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53788 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236008AbjHWN1i (ORCPT ); Wed, 23 Aug 2023 09:27:38 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5ECF110C7; Wed, 23 Aug 2023 06:27:06 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CC9FC1684; Wed, 23 Aug 2023 06:16:30 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 8E4913F740; Wed, 23 Aug 2023 06:15:44 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 16/37] arm64: mte: Move tag storage to MIGRATE_MOVABLE when MTE is disabled Date: Wed, 23 Aug 2023 14:13:29 +0100 Message-Id: <20230823131350.114942-17-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: 
<20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org If MTE is disabled (for example, from the kernel command line with the arm64.nomte option), the tag storage pages behave just like normal pages, because they will never be used to store tags. If that's the case, expose them to the page allocator as MIGRATE_MOVABLE pages. MIGRATE_MOVABLE has been chosen because the bulk of memory allocations comes from userspace, and the migratetype for those allocations is MIGRATE_MOVABLE. MIGRATE_RECLAIMABLE and MIGRATE_UNMOVABLE requests can still use the pages as a fallback. Signed-off-by: Alexandru Elisei --- arch/arm64/kernel/mte_tag_storage.c | 18 ++++++++++++++++++ include/linux/gfp.h | 2 ++ mm/mm_init.c | 3 +-- 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index 87160f53960f..4a6bfdf88458 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -296,6 +296,24 @@ static int __init mte_tag_storage_activate_regions(void) } } + /* + * MTE disabled, tag storage pages can be used like any other pages. The + * only restriction is that the pages cannot be used by kexec because + * the memory is marked as reserved in the memblock allocator. + */ + if (!system_supports_mte()) { + for (i = 0; i< num_tag_regions; i++) { + tag_range = &tag_regions[i].tag_range; + for (pfn = tag_range->start; + pfn <= tag_range->end; + pfn += pageblock_nr_pages) { + init_reserved_pageblock(pfn_to_page(pfn), MIGRATE_MOVABLE); + } + } + + return 0; + } + for (i = 0; i < num_tag_regions; i++) { tag_range = &tag_regions[i].tag_range; for (pfn = tag_range->start; pfn <= tag_range->end; pfn += pageblock_nr_pages) { diff --git a/include/linux/gfp.h b/include/linux/gfp.h index fb344baa9a9b..622bb9406cae 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -354,6 +354,8 @@ extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask, #endif void free_contig_range(unsigned long pfn, unsigned long nr_pages); +extern void init_reserved_pageblock(struct page *page, enum migratetype migratetype); + #ifdef CONFIG_MEMORY_METADATA extern void init_metadata_reserved_pageblock(struct page *page); #else diff --git a/mm/mm_init.c b/mm/mm_init.c index 467c80e9dacc..eedaacdf153d 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2330,8 +2330,7 @@ bool __init deferred_grow_zone(struct zone *zone, unsigned int order) #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ #if defined(CONFIG_CMA) || defined(CONFIG_MEMORY_METADATA) -static void __init init_reserved_pageblock(struct page *page, - enum migratetype migratetype) +void __init init_reserved_pageblock(struct page *page, enum migratetype migratetype) { unsigned i = pageblock_nr_pages; struct page *p = page; From patchwork Wed Aug 23 13:13:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362327 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF35FEE49B0 for ; Wed, 23 Aug 2023 13:17:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235669AbjHWNRe (ORCPT ); Wed, 23 Aug 2023 
09:17:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34090 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235397AbjHWNRd (ORCPT ); Wed, 23 Aug 2023 09:17:33 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id BDC4AE7E; Wed, 23 Aug 2023 06:17:08 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id ECD8715BF; Wed, 23 Aug 2023 06:16:36 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 8BF703F740; Wed, 23 Aug 2023 06:15:50 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 17/37] arm64: mte: Disable dynamic tag storage management if HW KASAN is enabled Date: Wed, 23 Aug 2023 14:13:30 +0100 Message-Id: <20230823131350.114942-18-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Reserving the tag storage associated with a tagged page requires the ability to migrate existing data if the tag storage is in use for data. The kernel allocates pages, which are now tagged because of HW KASAN, in non-preemptible contexts, which can make reserving the associated tag storage impossible. Don't expose the tag storage pages to the memory allocator if HW KASAN is enabled. Signed-off-by: Alexandru Elisei --- arch/arm64/kernel/mte_tag_storage.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index 4a6bfdf88458..f45128d0244e 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -314,6 +314,18 @@ static int __init mte_tag_storage_activate_regions(void) return 0; } + /* + * The kernel allocates memory in non-preemptible contexts, which makes + * migration impossible when reserving the associated tag storage. + * + * The check is safe to make because KASAN HW tags are enabled before + * the rest of the init functions are called, in smp_prepare_boot_cpu().
+ */ + if (kasan_hw_tags_enabled()) { + pr_info("KASAN HW tags enabled, disabling tag storage"); + return 0; + } + for (i = 0; i < num_tag_regions; i++) { tag_range = &tag_regions[i].tag_range; for (pfn = tag_range->start; pfn <= tag_range->end; pfn += pageblock_nr_pages) { From patchwork Wed Aug 23 13:13:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362326 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6224FEE4993 for ; Wed, 23 Aug 2023 13:17:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230458AbjHWNRY (ORCPT ); Wed, 23 Aug 2023 09:17:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230108AbjHWNRV (ORCPT ); Wed, 23 Aug 2023 09:17:21 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id BAB841736; Wed, 23 Aug 2023 06:16:55 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DA46F15DB; Wed, 23 Aug 2023 06:16:43 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A7DD83F740; Wed, 23 Aug 2023 06:15:56 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 18/37] arm64: mte: Check that tag storage blocks are in the same zone Date: Wed, 23 Aug 2023 14:13:31 +0100 Message-Id: <20230823131350.114942-19-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org alloc_contig_range() requires that the requested pages are in the same zone. Check that this is indeed the case before initializing the tag storage blocks. 
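For readers less familiar with the constraint, a minimal sketch of this kind of check follows (names are illustrative only; the series' actual implementation is mte_tag_storage_check_zone() in the diff below): a candidate PFN range is rejected if any page in it sits in a different zone than the first page.

#include <linux/mm.h>

/* Illustrative sketch, not the patch's code. */
static bool __init pfn_range_in_one_zone(unsigned long start_pfn,
					 unsigned long nr_pages)
{
	struct zone *zone = page_zone(pfn_to_page(start_pfn));
	unsigned long pfn;

	for (pfn = start_pfn + 1; pfn < start_pfn + nr_pages; pfn++) {
		/* A single tag block must not span e.g. ZONE_DMA and ZONE_NORMAL. */
		if (page_zone(pfn_to_page(pfn)) != zone)
			return false;
	}
	return true;
}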
Signed-off-by: Alexandru Elisei --- arch/arm64/kernel/mte_tag_storage.c | 35 ++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index f45128d0244e..3e0123aa3fb3 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -269,12 +269,41 @@ void __init mte_tag_storage_init(void) num_tag_regions = 0; } +/* alloc_contig_range() requires all pages to be in the same zone. */ +static int __init mte_tag_storage_check_zone(void) +{ + struct range *tag_range; + struct zone *zone; + unsigned long pfn; + u32 block_size; + int i, j; + + for (i = 0; i < num_tag_regions; i++) { + block_size = tag_regions[i].block_size; + if (block_size == 1) + continue; + + tag_range = &tag_regions[i].tag_range; + for (pfn = tag_range->start; pfn <= tag_range->end; pfn += block_size) { + zone = page_zone(pfn_to_page(pfn)); + for (j = 1; j < block_size; j++) { + if (page_zone(pfn_to_page(pfn + j)) != zone) { + pr_err("Tag block pages in different zones"); + return -EINVAL; + } + } + } + } + + return 0; +} + static int __init mte_tag_storage_activate_regions(void) { phys_addr_t dram_start, dram_end; struct range *tag_range; unsigned long pfn; - int i; + int i, ret; if (num_tag_regions == 0) return 0; @@ -326,6 +355,10 @@ static int __init mte_tag_storage_activate_regions(void) return 0; } + ret = mte_tag_storage_check_zone(); + if (ret) + return ret; + for (i = 0; i < num_tag_regions; i++) { tag_range = &tag_regions[i].tag_range; for (pfn = tag_range->start; pfn <= tag_range->end; pfn += pageblock_nr_pages) { From patchwork Wed Aug 23 13:13:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362330 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63EC7EE49B2 for ; Wed, 23 Aug 2023 13:18:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235718AbjHWNSE (ORCPT ); Wed, 23 Aug 2023 09:18:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35208 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235739AbjHWNR7 (ORCPT ); Wed, 23 Aug 2023 09:17:59 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 92A01E58; Wed, 23 Aug 2023 06:17:29 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 472231688; Wed, 23 Aug 2023 06:16:50 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 99D093F740; Wed, 23 Aug 2023 06:16:03 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, 
vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 19/37] mm: page_alloc: Manage metadata storage on page allocation Date: Wed, 23 Aug 2023 14:13:32 +0100 Message-Id: <20230823131350.114942-20-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org When a page is allocated with metadata, the associated metadata storage cannot be used for data because the page metadata will overwrite the metadata storage contents. Reserve metadata storage when the associated page is allocated with metadata enabled. If metadata storage cannot be reserved, because, for example, of a short term pin, then the page with metadata enabled which triggered the reservation will be put back at the tail of the free list and the page allocator will repeat the process for a new page. If the page allocator exhausts all allocation paths, then it must mean that the system is out of memory and this is treated like any other OOM situation. When a metadata-enabled page is freed, then also free the associated metadata storage, so it can be used to data allocations. For the direct reclaim slowpath, no special handling for metadata pages has been added - metadata pages are still considered for reclaim even if they cannot be used to satisfy the allocation request. This behaviour has been preserved to increase the chance that the metadata storage is free when the associated page is allocated with metadata enabled. 
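The shape of the fast-path hook described above, as a hedged sketch: reserve_metadata_storage(), alloc_requires_metadata(), metadata_storage_enabled() and return_page_to_buddy() are helpers introduced by this series (see the diff below), while take_page_from_freelist() is a hypothetical stand-in for the buddy allocator's internal rmqueue().

#include <linux/gfp.h>

/* Hypothetical stand-in for rmqueue(); declared here only to keep the sketch self-contained. */
struct page *take_page_from_freelist(struct zone *zone, unsigned int order,
				     gfp_t gfp_mask, unsigned int alloc_flags);

static struct page *alloc_page_with_metadata(struct zone *zone, unsigned int order,
					     gfp_t gfp_mask, unsigned int alloc_flags)
{
	struct page *page = take_page_from_freelist(zone, order, gfp_mask, alloc_flags);

	if (page && metadata_storage_enabled() && alloc_requires_metadata(gfp_mask) &&
	    reserve_metadata_storage(page, order, gfp_mask) != 0) {
		/*
		 * Tag storage could not be reserved (for example because of a
		 * short-term pin): put the page back at the tail of the free
		 * list and let the caller retry with another page.
		 */
		return_page_to_buddy(page, order);
		return NULL;
	}
	return page;
}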
Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/memory_metadata.h | 14 ++++++++ include/asm-generic/memory_metadata.h | 11 ++++++ include/linux/vm_event_item.h | 5 +++ mm/page_alloc.c | 43 ++++++++++++++++++++++++ mm/vmstat.c | 5 +++ 5 files changed, 78 insertions(+) diff --git a/arch/arm64/include/asm/memory_metadata.h b/arch/arm64/include/asm/memory_metadata.h index 3287b2776af1..1b18e3217dd0 100644 --- a/arch/arm64/include/asm/memory_metadata.h +++ b/arch/arm64/include/asm/memory_metadata.h @@ -20,12 +20,26 @@ static inline bool alloc_can_use_metadata_pages(gfp_t gfp_mask) return !(gfp_mask & __GFP_TAGGED); } +static inline bool alloc_requires_metadata(gfp_t gfp_mask) +{ + return gfp_mask & __GFP_TAGGED; +} + #define page_has_metadata(page) page_mte_tagged(page) static inline bool folio_has_metadata(struct folio *folio) { return page_has_metadata(&folio->page); } + +static inline int reserve_metadata_storage(struct page *page, int order, gfp_t gfp_mask) +{ + return 0; +} + +static inline void free_metadata_storage(struct page *page, int order) +{ +} #endif /* CONFIG_MEMORY_METADATA */ #endif /* __ASM_MEMORY_METADATA_H */ diff --git a/include/asm-generic/memory_metadata.h b/include/asm-generic/memory_metadata.h index 8f4e2fba222f..111d6edc0997 100644 --- a/include/asm-generic/memory_metadata.h +++ b/include/asm-generic/memory_metadata.h @@ -16,6 +16,17 @@ static inline bool alloc_can_use_metadata_pages(gfp_t gfp_mask) { return false; } +static inline bool alloc_requires_metadata(gfp_t gfp_mask) +{ + return false; +} +static inline int reserve_metadata_storage(struct page *page, int order, gfp_t gfp_mask) +{ + return 0; +} +static inline void free_metadata_storage(struct page *page, int order) +{ +} static inline bool page_has_metadata(struct page *page) { return false; diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 8abfa1240040..3163b85d2bc6 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -86,6 +86,11 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #ifdef CONFIG_CMA CMA_ALLOC_SUCCESS, CMA_ALLOC_FAIL, +#endif +#ifdef CONFIG_MEMORY_METADATA + METADATA_RESERVE_SUCCESS, + METADATA_RESERVE_FAIL, + METADATA_RESERVE_FREE, #endif UNEVICTABLE_PGCULLED, /* culled to noreclaim list */ UNEVICTABLE_PGSCANNED, /* scanned for reclaimability */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 011645d07ce9..911d3c362848 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1111,6 +1111,9 @@ static __always_inline bool free_pages_prepare(struct page *page, trace_mm_page_free(page, order); kmsan_free_page(page, order); + if (metadata_storage_enabled() && page_has_metadata(page)) + free_metadata_storage(page, order); + if (unlikely(PageHWPoison(page)) && !order) { /* * Do not let hwpoison pages hit pcplists/buddy @@ -3143,6 +3146,24 @@ static inline unsigned int gfp_to_alloc_flags_fast(gfp_t gfp_mask, return alloc_flags; } +#ifdef CONFIG_MEMORY_METADATA +static void return_page_to_buddy(struct page *page, int order) +{ + struct zone *zone = page_zone(page); + unsigned long pfn = page_to_pfn(page); + unsigned long flags; + int migratetype = get_pfnblock_migratetype(page, pfn); + + spin_lock_irqsave(&zone->lock, flags); + __free_one_page(page, pfn, zone, order, migratetype, FPI_TO_TAIL); + spin_unlock_irqrestore(&zone->lock, flags); +} +#else +static void return_page_to_buddy(struct page *page, int order) +{ +} +#endif + /* * get_page_from_freelist goes through the zonelist trying to allocate * a page. 
@@ -3156,6 +3177,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, struct pglist_data *last_pgdat = NULL; bool last_pgdat_dirty_ok = false; bool no_fallback; + int ret; retry: /* @@ -3270,6 +3292,15 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, page = rmqueue(ac->preferred_zoneref->zone, zone, order, gfp_mask, alloc_flags, ac->migratetype); if (page) { + if (metadata_storage_enabled() && alloc_requires_metadata(gfp_mask)) { + ret = reserve_metadata_storage(page, order, gfp_mask); + if (ret != 0) { + return_page_to_buddy(page, order); + page = NULL; + goto no_page; + } + } + prep_new_page(page, order, gfp_mask, alloc_flags); /* @@ -3285,7 +3316,10 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, if (try_to_accept_memory(zone, order)) goto try_this_zone; } + } +no_page: + if (!page) { #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* Try again if zone has deferred pages */ if (deferred_pages_enabled()) { @@ -3475,6 +3509,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, struct page *page = NULL; unsigned long pflags; unsigned int noreclaim_flag; + int ret; if (!order) return NULL; @@ -3498,6 +3533,14 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, */ count_vm_event(COMPACTSTALL); + if (metadata_storage_enabled() && page && alloc_requires_metadata(gfp_mask)) { + ret = reserve_metadata_storage(page, order, gfp_mask); + if (ret != 0) { + return_page_to_buddy(page, order); + page = NULL; + } + } + /* Prep a captured page if available */ if (page) prep_new_page(page, order, gfp_mask, alloc_flags); diff --git a/mm/vmstat.c b/mm/vmstat.c index 07caa284a724..807b514718d2 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1338,6 +1338,11 @@ const char * const vmstat_text[] = { #ifdef CONFIG_CMA "cma_alloc_success", "cma_alloc_fail", +#endif +#ifdef CONFIG_MEMORY_METADATA + "metadata_reserve_success", + "metadata_reserve_fail", + "metadata_reserve_free", #endif "unevictable_pgs_culled", "unevictable_pgs_scanned", From patchwork Wed Aug 23 13:13:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362414 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 065B5EE4993 for ; Wed, 23 Aug 2023 13:27:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235366AbjHWN1H (ORCPT ); Wed, 23 Aug 2023 09:27:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235167AbjHWN1G (ORCPT ); Wed, 23 Aug 2023 09:27:06 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 30E2A10CF; Wed, 23 Aug 2023 06:26:42 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 93C321691; Wed, 23 Aug 2023 06:16:56 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 033CD3F740; Wed, 23 Aug 2023 06:16:09 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, 
suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 20/37] mm: compaction: Reserve metadata storage in compaction_alloc() Date: Wed, 23 Aug 2023 14:13:33 +0100 Message-Id: <20230823131350.114942-21-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org If the source page being migrated has metadata associated with it, make sure to reserve the metadata storage when choosing a suitable destination page from the free list. Signed-off-by: Alexandru Elisei --- mm/compaction.c | 9 +++++++++ mm/internal.h | 1 + 2 files changed, 10 insertions(+) diff --git a/mm/compaction.c b/mm/compaction.c index cc0139fa0cb0..af2ee3085623 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -570,6 +570,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, bool locked = false; unsigned long blockpfn = *start_pfn; unsigned int order; + int ret; /* Strict mode is for isolation, speed is secondary */ if (strict) @@ -626,6 +627,11 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, /* Found a free page, will break it into order-0 pages */ order = buddy_order(page); + if (metadata_storage_enabled() && cc->reserve_metadata) { + ret = reserve_metadata_storage(page, order, cc->gfp_mask); + if (ret) + goto isolate_fail; + } isolated = __isolate_free_page(page, order); if (!isolated) break; @@ -1757,6 +1763,9 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data) struct compact_control *cc = (struct compact_control *)data; struct folio *dst; + if (metadata_storage_enabled()) + cc->reserve_metadata = folio_has_metadata(src); + if (list_empty(&cc->freepages)) { isolate_freepages(cc); diff --git a/mm/internal.h b/mm/internal.h index d28ac0085f61..046cc264bfbe 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -492,6 +492,7 @@ struct compact_control { */ bool alloc_contig; /* alloc_contig_range allocation */ bool source_has_metadata; /* source pages have associated metadata */ + bool reserve_metadata; }; /* From patchwork Wed Aug 23 13:13:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362596 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA9D5EE49BA for ; Wed, 23 Aug 2023 14:01:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235948AbjHWOBj (ORCPT ); Wed, 23 Aug 2023 10:01:39 -0400 Received: from 
lindbergh.monkeyblade.net ([23.128.96.19]:53770 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236007AbjHWN1i (ORCPT ); Wed, 23 Aug 2023 09:27:38 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 548CE10CA; Wed, 23 Aug 2023 06:27:05 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C95B2168F; Wed, 23 Aug 2023 06:17:03 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B4A793F740; Wed, 23 Aug 2023 06:16:16 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 21/37] mm: khugepaged: Handle metadata-enabled VMAs Date: Wed, 23 Aug 2023 14:13:34 +0100 Message-Id: <20230823131350.114942-22-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Both madvise(MADV_COLLAPSE) and khugepaged can collapse a contiguous THP-sized memory region mapped as PTEs into a THP. If metadata is enabled for the VMA where the PTEs are mapped, make sure to allocate metadata storage for the compound page that will be replacing them. 
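A minimal sketch of the flag propagation this patch wires into struct collapse_control; vma_has_metadata() and __GFP_TAGGED come from earlier patches in the series, and the helper name below is illustrative rather than taken from the patch.

#include <linux/gfp.h>
#include <linux/mm.h>

static gfp_t collapse_gfp_for_vma(struct vm_area_struct *vma, gfp_t gfp)
{
	/*
	 * The THP that replaces the PTEs must carry tags for the whole VMA,
	 * so its tag storage has to be reserved when the compound page is
	 * allocated.
	 */
	if (vma_has_metadata(vma))
		gfp |= __GFP_TAGGED;
	return gfp;
}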
Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/memory_metadata.h | 7 +++++++ include/asm-generic/memory_metadata.h | 4 ++++ mm/khugepaged.c | 7 +++++++ 3 files changed, 18 insertions(+) diff --git a/arch/arm64/include/asm/memory_metadata.h b/arch/arm64/include/asm/memory_metadata.h index 1b18e3217dd0..ade37331a5c8 100644 --- a/arch/arm64/include/asm/memory_metadata.h +++ b/arch/arm64/include/asm/memory_metadata.h @@ -5,6 +5,8 @@ #ifndef __ASM_MEMORY_METADATA_H #define __ASM_MEMORY_METADATA_H +#include + #include #include @@ -40,6 +42,11 @@ static inline int reserve_metadata_storage(struct page *page, int order, gfp_t g static inline void free_metadata_storage(struct page *page, int order) { } + +static inline bool vma_has_metadata(struct vm_area_struct *vma) +{ + return vma && (vma->vm_flags & VM_MTE); +} #endif /* CONFIG_MEMORY_METADATA */ #endif /* __ASM_MEMORY_METADATA_H */ diff --git a/include/asm-generic/memory_metadata.h b/include/asm-generic/memory_metadata.h index 111d6edc0997..35a0d6a8b5fc 100644 --- a/include/asm-generic/memory_metadata.h +++ b/include/asm-generic/memory_metadata.h @@ -35,6 +35,10 @@ static inline bool folio_has_metadata(struct folio *folio) { return false; } +static inline bool vma_has_metadata(struct vm_area_struct *vma) +{ + return false; +} #endif /* !CONFIG_MEMORY_METADATA */ #endif /* __ASM_GENERIC_MEMORY_METADATA_H */ diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 78c8d5d8b628..174710d941c2 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -20,6 +20,7 @@ #include #include +#include #include #include #include "internal.h" @@ -96,6 +97,7 @@ static struct kmem_cache *mm_slot_cache __read_mostly; struct collapse_control { bool is_khugepaged; + bool has_metadata; /* Num pages scanned per node */ u32 node_load[MAX_NUMNODES]; @@ -1069,6 +1071,9 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm, int node = hpage_collapse_find_target_node(cc); struct folio *folio; + if (cc->has_metadata) + gfp |= __GFP_TAGGED; + if (!hpage_collapse_alloc_page(hpage, gfp, node, &cc->alloc_nmask)) return SCAN_ALLOC_HUGE_PAGE_FAIL; @@ -2497,6 +2502,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, if (khugepaged_scan.address < hstart) khugepaged_scan.address = hstart; VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK); + cc->has_metadata = vma_has_metadata(vma); while (khugepaged_scan.address < hend) { bool mmap_locked = true; @@ -2838,6 +2844,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, if (!cc) return -ENOMEM; cc->is_khugepaged = false; + cc->has_metadata = vma_has_metadata(vma); mmgrab(mm); lru_add_drain_all(); From patchwork Wed Aug 23 13:13:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362419 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C41EEE49B8 for ; Wed, 23 Aug 2023 13:27:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235902AbjHWN16 (ORCPT ); Wed, 23 Aug 2023 09:27:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53782 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236006AbjHWN1i (ORCPT ); Wed, 23 Aug 2023 09:27:38 -0400 Received: from foss.arm.com (foss.arm.com 
[217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5F2C410CF; Wed, 23 Aug 2023 06:27:06 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CC2FD169C; Wed, 23 Aug 2023 06:17:09 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 866CD3F740; Wed, 23 Aug 2023 06:16:23 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 22/37] mm: shmem: Allocate metadata storage for in-memory filesystems Date: Wed, 23 Aug 2023 14:13:35 +0100 Message-Id: <20230823131350.114942-23-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Set __GFP_TAGGED when a new page is faulted in, so the page allocator reserves the corresponding metadata storage. Signed-off-by: Alexandru Elisei --- mm/shmem.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/mm/shmem.c b/mm/shmem.c index 2f2e0e618072..0b772ec34caa 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -81,6 +81,8 @@ static struct vfsmount *shm_mnt; #include +#include + #include "internal.h" #define BLOCKS_PER_PAGE (PAGE_SIZE/512) @@ -1530,7 +1532,7 @@ static struct folio *shmem_swapin(swp_entry_t swap, gfp_t gfp, */ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp) { - gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM; + gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM | __GFP_TAGGED; gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY; gfp_t zoneflags = limit_gfp & GFP_ZONEMASK; gfp_t result = huge_gfp & ~(allowflags | GFP_ZONEMASK); @@ -1941,6 +1943,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, goto alloc_nohuge; huge_gfp = vma_thp_gfp_mask(vma); + if (vma_has_metadata(vma)) + huge_gfp |= __GFP_TAGGED; huge_gfp = limit_gfp_mask(huge_gfp, gfp); folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true); if (IS_ERR(folio)) { @@ -2101,6 +2105,10 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) int err; vm_fault_t ret = VM_FAULT_LOCKED; + /* Fixup gfp flags for metadata enabled VMAs. 
*/ + if (vma_has_metadata(vma)) + gfp |= __GFP_TAGGED; + /* * Trinity finds that probing a hole which tmpfs is punching can * prevent the hole-punch from ever completing: which in turn From patchwork Wed Aug 23 13:13:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362417 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C52F0EE49B8 for ; Wed, 23 Aug 2023 13:27:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235837AbjHWN1J (ORCPT ); Wed, 23 Aug 2023 09:27:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234891AbjHWN1H (ORCPT ); Wed, 23 Aug 2023 09:27:07 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id C74BA10C3; Wed, 23 Aug 2023 06:26:42 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6F7C11655; Wed, 23 Aug 2023 06:17:16 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 880023F740; Wed, 23 Aug 2023 06:16:29 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 23/37] mm: Teach vma_alloc_folio() about metadata-enabled VMAs Date: Wed, 23 Aug 2023 14:13:36 +0100 Message-Id: <20230823131350.114942-24-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org When an anonymous page is mapped into the user address space as a result of a write fault, that page is zeroed. On arm64, when the VMA has metadata enabled, the tags are zeroed at the same time as the page contents, with the combination of gfp flags __GFP_ZERO | __GFP_TAGGED (which used be called __GFP_ZEROTAGS for this reason). For this use case, it is enough to set the __GFP_TAGGED flag in vma_alloc_zeroed_movable_folio(). But with dynamic tag storage reuse, it becomes necessary to have the __GFP_TAGGED flag set when allocating a page to be mapped in a VMA with metadata enabled in order reserve the corresponding metadata storage. Change vma_alloc_folio() to take into account VMAs with metadata enabled. 
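As a sketch of what the anonymous write-fault allocation looks like once vma_alloc_folio() sets __GFP_TAGGED itself, mirroring the new vma_alloc_zeroed_movable_folio() define in the diff below; the wrapper name here is illustrative.

#include <linux/gfp.h>
#include <linux/mm.h>

static struct folio *anon_fault_alloc(struct vm_area_struct *vma, unsigned long vaddr)
{
	/*
	 * vma_alloc_folio() now adds __GFP_TAGGED for VM_MTE VMAs, so
	 * __GFP_ZERO is enough here: data and tags are zeroed together and
	 * the matching tag storage is reserved by the page allocator.
	 */
	return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false);
}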
Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/page.h | 5 ++--- arch/arm64/mm/fault.c | 19 ------------------- mm/mempolicy.c | 3 +++ 3 files changed, 5 insertions(+), 22 deletions(-) diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h index 2312e6ee595f..88bab032a493 100644 --- a/arch/arm64/include/asm/page.h +++ b/arch/arm64/include/asm/page.h @@ -29,9 +29,8 @@ void copy_user_highpage(struct page *to, struct page *from, void copy_highpage(struct page *to, struct page *from); #define __HAVE_ARCH_COPY_HIGHPAGE -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, - unsigned long vaddr); -#define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \ + vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false) void tag_clear_highpage(struct page *to); #define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 1ca421c11ebc..7e2dcf5e3baf 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -936,25 +936,6 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr, } NOKPROBE_SYMBOL(do_debug_exception); -/* - * Used during anonymous page fault handling. - */ -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, - unsigned long vaddr) -{ - gfp_t flags = GFP_HIGHUSER_MOVABLE | __GFP_ZERO; - - /* - * If the page is mapped with PROT_MTE, initialise the tags at the - * point of allocation and page zeroing as this is usually faster than - * separate DC ZVA and STGM. - */ - if (vma->vm_flags & VM_MTE) - flags |= __GFP_TAGGED; - - return vma_alloc_folio(flags, 0, vma, vaddr, false); -} - void tag_clear_highpage(struct page *page) { /* Tag storage pages cannot be tagged. 
*/ diff --git a/mm/mempolicy.c b/mm/mempolicy.c index d164b5c50243..782e0771cabd 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2170,6 +2170,9 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma, int preferred_nid; nodemask_t *nmask; + if (vma->vm_flags & VM_MTE) + gfp |= __GFP_TAGGED; + pol = get_vma_policy(vma, addr); if (pol->mode == MPOL_INTERLEAVE) { From patchwork Wed Aug 23 13:13:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362590 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B4171EE49A0 for ; Wed, 23 Aug 2023 13:56:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231497AbjHWN4W (ORCPT ); Wed, 23 Aug 2023 09:56:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59716 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236343AbjHWN4V (ORCPT ); Wed, 23 Aug 2023 09:56:21 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 37C15E45; Wed, 23 Aug 2023 06:56:17 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id EB6C41692; Wed, 23 Aug 2023 06:17:22 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2EA123F740; Wed, 23 Aug 2023 06:16:36 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 24/37] mm: page_alloc: Teach alloc_contig_range() about MIGRATE_METADATA Date: Wed, 23 Aug 2023 14:13:37 +0100 Message-Id: <20230823131350.114942-25-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org alloc_contig_range() allocates a contiguous range of physical memory. Metadata pages in use for data will have to be migrated and then taken from the free lists when they are repurposed to store tags, and that will be accomplished by calling alloc_contig_range(). The first step in alloc_contig_range() is to isolate the requested pages. If the pages are part of a larger huge page, then the hugepage must be split before the pages can be isolated. 
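As a usage sketch (not code from this patch), repurposing one tag storage block boils down to a call of this shape; block and block_size are in PFNs, and MIGRATE_METADATA is the migratetype this series assigns to tag storage pageblocks.

#include <linux/gfp.h>

static int claim_tag_block(unsigned long block, unsigned long block_size, gfp_t gfp)
{
	/*
	 * Migrate away any data in [block, block + block_size) and take the
	 * pages off the free lists; returns -EBUSY if a page cannot be
	 * isolated, in which case real callers retry a few times.
	 */
	return alloc_contig_range(block, block + block_size, MIGRATE_METADATA, gfp);
}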
Add support for metadata pages in isolate_single_pageblock(). __isolate_free_page() checks the WMARK_MIN watermark before deleting the page from the free list. alloc_contig_range(), when called to allocate MIGRATE_METADATA pages, ends up calling this function from isolate_freepages_range() -> isolate_freepages_block(). As such, take into account the number of free metadata pages when checking the watermark to avoid false negatives. Signed-off-by: Alexandru Elisei --- mm/compaction.c | 4 ++-- mm/page_alloc.c | 9 +++++---- mm/page_isolation.c | 19 +++++++++++++------ 3 files changed, 20 insertions(+), 12 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index af2ee3085623..314793ec8bdb 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -46,7 +46,7 @@ static inline void count_compact_events(enum vm_event_item item, long delta) #define count_compact_events(item, delta) do { } while (0) #endif -#if defined CONFIG_COMPACTION || defined CONFIG_CMA +#if defined CONFIG_COMPACTION || defined CONFIG_CMA || defined CONFIG_MEMORY_METADATA #define CREATE_TRACE_POINTS #include @@ -1306,7 +1306,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn, return ret; } -#endif /* CONFIG_COMPACTION || CONFIG_CMA */ +#endif /* CONFIG_COMPACTION || CONFIG_CMA || CONFIG_MEMORY_METADATA */ #ifdef CONFIG_COMPACTION static bool suitable_migration_source(struct compact_control *cc, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 911d3c362848..1adaefa22208 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2624,7 +2624,8 @@ int __isolate_free_page(struct page *page, unsigned int order) * exists. */ watermark = zone->_watermark[WMARK_MIN] + (1UL << order); - if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA)) + if (!zone_watermark_ok(zone, 0, watermark, 0, + ALLOC_CMA | ALLOC_FROM_METADATA)) return 0; __mod_zone_freepage_state(zone, -(1UL << order), mt); @@ -6246,9 +6247,9 @@ int __alloc_contig_migrate_range(struct compact_control *cc, * @start: start PFN to allocate * @end: one-past-the-last PFN to allocate * @migratetype: migratetype of the underlying pageblocks (either - * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks - * in range must have the same migratetype and it must - * be either of the two. + * #MIGRATE_MOVABLE, #MIGRATE_CMA or #MIGRATE_METADATA). + * All pageblocks in range must have the same migratetype + * and it must be either of the three. * @gfp_mask: GFP mask to use during compaction * * The PFN range does not have to be pageblock aligned. The PFN range must diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 6599cc965e21..bb2a72ce201b 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -52,6 +52,13 @@ static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long e return page; } + if (is_migrate_metadata_page(page)) { + if (is_migrate_metadata(migratetype)) + return NULL; + else + return page; + } + for (pfn = start_pfn; pfn < end_pfn; pfn++) { page = pfn_to_page(pfn); @@ -396,7 +403,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, pfn = head_pfn + nr_pages; continue; } -#if defined CONFIG_COMPACTION || defined CONFIG_CMA +#if defined CONFIG_COMPACTION || defined CONFIG_CMA || defined CONFIG_MEMORY_METADATA /* * hugetlb, lru compound (THP), and movable compound pages * can be migrated. Otherwise, fail the isolation. 
@@ -466,7 +473,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, pfn = outer_pfn; continue; } else -#endif +#endif /* CONFIG_COMPACTION || CONFIG_CMA || CONFIG_MEMORY_METADATA */ goto failed; } @@ -495,10 +502,10 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, * @gfp_flags: GFP flags used for migrating pages that sit across the * range boundaries. * - * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in - * the range will never be allocated. Any free pages and pages freed in the - * future will not be allocated again. If specified range includes migrate types - * other than MOVABLE or CMA, this will fail with -EBUSY. For isolating all + * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in the + * range will never be allocated. Any free pages and pages freed in the future + * will not be allocated again. If specified range includes migrate types other + * than MOVABLE, CMA or METADATA, this will fail with -EBUSY. For isolating all * pages in the range finally, the caller have to free all pages in the range. * test_page_isolated() can be used for test it. * From patchwork Wed Aug 23 13:13:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362589 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6DEBEE4993 for ; Wed, 23 Aug 2023 13:56:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236336AbjHWN4U (ORCPT ); Wed, 23 Aug 2023 09:56:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232939AbjHWN4U (ORCPT ); Wed, 23 Aug 2023 09:56:20 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2B5F3CFE; Wed, 23 Aug 2023 06:56:17 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E999E165C; Wed, 23 Aug 2023 06:17:30 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id AC7ED3F740; Wed, 23 Aug 2023 06:16:42 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 25/37] arm64: mte: Manage tag storage on page allocation Date: Wed, 23 Aug 2023 14:13:38 +0100 Message-Id: 
<20230823131350.114942-26-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Reserve tag storage for a tagged allocation by migrating the contents of the tag storage (if in use for data) and removing the pages from page allocator by using alloc_contig_range(). When all the associated tagged pages have been freed, put the tag storage pages back to the page allocator, where they can be used for data allocations. Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/memory_metadata.h | 16 +- arch/arm64/include/asm/mte.h | 12 ++ arch/arm64/include/asm/mte_tag_storage.h | 8 + arch/arm64/kernel/mte_tag_storage.c | 242 ++++++++++++++++++++++- fs/proc/page.c | 1 + include/linux/kernel-page-flags.h | 1 + include/linux/page-flags.h | 1 + include/trace/events/mmflags.h | 3 +- mm/huge_memory.c | 1 + 9 files changed, 273 insertions(+), 12 deletions(-) diff --git a/arch/arm64/include/asm/memory_metadata.h b/arch/arm64/include/asm/memory_metadata.h index ade37331a5c8..167b039f06cf 100644 --- a/arch/arm64/include/asm/memory_metadata.h +++ b/arch/arm64/include/asm/memory_metadata.h @@ -12,9 +12,11 @@ #include #ifdef CONFIG_MEMORY_METADATA +DECLARE_STATIC_KEY_FALSE(metadata_storage_enabled_key); + static inline bool metadata_storage_enabled(void) { - return false; + return static_branch_likely(&metadata_storage_enabled_key); } static inline bool alloc_can_use_metadata_pages(gfp_t gfp_mask) @@ -34,19 +36,13 @@ static inline bool folio_has_metadata(struct folio *folio) return page_has_metadata(&folio->page); } -static inline int reserve_metadata_storage(struct page *page, int order, gfp_t gfp_mask) -{ - return 0; -} - -static inline void free_metadata_storage(struct page *page, int order) -{ -} - static inline bool vma_has_metadata(struct vm_area_struct *vma) { return vma && (vma->vm_flags & VM_MTE); } + +int reserve_metadata_storage(struct page *page, int order, gfp_t gfp_mask); +void free_metadata_storage(struct page *page, int order); #endif /* CONFIG_MEMORY_METADATA */ #endif /* __ASM_MEMORY_METADATA_H */ diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h index 246a561652f4..70cfd09b4a11 100644 --- a/arch/arm64/include/asm/mte.h +++ b/arch/arm64/include/asm/mte.h @@ -44,9 +44,21 @@ void mte_free_tags_mem(void *tags); #define PG_mte_tagged PG_arch_2 /* simple lock to avoid multiple threads tagging the same page */ #define PG_mte_lock PG_arch_3 +/* Track if a tagged page has tag storage reserved */ +#define PG_tag_storage_reserved PG_arch_4 + +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE +DECLARE_STATIC_KEY_FALSE(metadata_storage_enabled_key); +extern bool page_tag_storage_reserved(struct page *page); +#endif static inline void set_page_mte_tagged(struct page *page) { +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE + /* Open code mte_tag_storage_enabled() */ + WARN_ON_ONCE(static_branch_likely(&metadata_storage_enabled_key) && + !page_tag_storage_reserved(page)); +#endif /* * Ensure that the tags written prior to this function are visible * before the page flags update. 
diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h index 8f86c4f9a7c3..7b69a8af13f3 100644 --- a/arch/arm64/include/asm/mte_tag_storage.h +++ b/arch/arm64/include/asm/mte_tag_storage.h @@ -5,11 +5,19 @@ #ifndef __ASM_MTE_TAG_STORAGE_H #define __ASM_MTE_TAG_STORAGE_H +#include + #ifdef CONFIG_ARM64_MTE_TAG_STORAGE void mte_tag_storage_init(void); +bool page_tag_storage_reserved(struct page *page); #else static inline void mte_tag_storage_init(void) { } +static inline bool page_tag_storage_reserved(struct page *page) +{ + return true; +} #endif /* CONFIG_ARM64_MTE_TAG_STORAGE */ + #endif /* __ASM_MTE_TAG_STORAGE_H */ diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index 3e0123aa3fb3..075231443dec 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -11,13 +11,19 @@ #include #include #include +#include +#include #include +#include #include +#include #include #include #include +__ro_after_init DEFINE_STATIC_KEY_FALSE(metadata_storage_enabled_key); + struct tag_region { struct range mem_range; /* Memory associated with the tag storage, in PFNs. */ struct range tag_range; /* Tag storage memory, in PFNs. */ @@ -29,6 +35,30 @@ struct tag_region { static struct tag_region tag_regions[MAX_TAG_REGIONS]; static int num_tag_regions; +/* + * A note on locking. Reserving tag storage takes the tag_blocks_lock mutex, + * because alloc_contig_range() might sleep. + * + * Freeing tag storage takes the xa_lock spinlock with interrupts disabled + * because pages can be freed from non-preemptible contexts, including from an + * interrupt handler. + * + * Because tag storage freeing can happen from interrupt contexts, the xarray is + * defined with the XA_FLAGS_LOCK_IRQ flag to disable interrupts when calling + * xa_store() to prevent a deadlock. + * + * This means that reserve_metadata_storage() cannot run concurrently with + * itself (no concurrent insertions), but it can run at the same time as + * free_metadata_storage(). The first thing that reserve_metadata_storage() does + * after taking the mutex is increase the refcount on all present tag storage + * blocks with the xa_lock held, to serialize against freeing the blocks. This + * is an optimization to avoid taking and releasing the xa_lock after each + * iteration if the refcount operation was moved inside the loop, where it would + * have had to be executed for each block. 
+ */ +static DEFINE_XARRAY_FLAGS(tag_blocks_reserved, XA_FLAGS_LOCK_IRQ); +static DEFINE_MUTEX(tag_blocks_lock); + static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg, int reg_len, struct range *range) { @@ -367,6 +397,216 @@ static int __init mte_tag_storage_activate_regions(void) } } + return ret; +} +core_initcall(mte_tag_storage_activate_regions); + +bool page_tag_storage_reserved(struct page *page) +{ + return test_bit(PG_tag_storage_reserved, &page->flags); +} + +static int tag_storage_find_block_in_region(struct page *page, unsigned long *blockp, + struct tag_region *region) +{ + struct range *tag_range = ®ion->tag_range; + struct range *mem_range = ®ion->mem_range; + u64 page_pfn = page_to_pfn(page); + u64 block, block_offset; + + if (!(mem_range->start <= page_pfn && page_pfn <= mem_range->end)) + return -ERANGE; + + block_offset = (page_pfn - mem_range->start) / 32; + block = tag_range->start + rounddown(block_offset, region->block_size); + + if (block + region->block_size - 1 > tag_range->end) { + pr_err("Block 0x%llx-0x%llx is outside tag region 0x%llx-0x%llx\n", + PFN_PHYS(block), PFN_PHYS(block + region->block_size), + PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end)); + return -ERANGE; + } + *blockp = block; + + return 0; +} + +static int tag_storage_find_block(struct page *page, unsigned long *block, + struct tag_region **region) +{ + int i, ret; + + for (i = 0; i < num_tag_regions; i++) { + ret = tag_storage_find_block_in_region(page, block, &tag_regions[i]); + if (ret == 0) { + *region = &tag_regions[i]; + return 0; + } + } + + return -EINVAL; +} + +static void block_ref_add(unsigned long block, struct tag_region *region, int order) +{ + int count; + + count = min(1u << order, 32 * region->block_size); + page_ref_add(pfn_to_page(block), count); +} + +static int block_ref_sub_return(unsigned long block, struct tag_region *region, int order) +{ + int count; + + count = min(1u << order, 32 * region->block_size); + return page_ref_sub_return(pfn_to_page(block), count); +} + +static bool tag_storage_block_is_reserved(unsigned long block) +{ + return xa_load(&tag_blocks_reserved, block) != NULL; +} + +static int tag_storage_reserve_block(unsigned long block, struct tag_region *region, int order) +{ + int ret; + + ret = xa_err(xa_store(&tag_blocks_reserved, block, pfn_to_page(block), GFP_KERNEL)); + if (!ret) + block_ref_add(block, region, order); + + return ret; +} + +bool alloc_can_use_tag_storage(gfp_t gfp_mask) +{ + return !(gfp_mask & __GFP_TAGGED); +} + +bool alloc_requires_tag_storage(gfp_t gfp_mask) +{ + return gfp_mask & __GFP_TAGGED; +} + +static int order_to_num_blocks(int order) +{ + return max((1 << order) / 32, 1); +} + +int reserve_metadata_storage(struct page *page, int order, gfp_t gfp) +{ + unsigned long start_block, end_block; + struct tag_region *region; + unsigned long block; + unsigned long flags; + int i, tries; + int ret = 0; + + VM_WARN_ON_ONCE(!preemptible()); + + /* + * __alloc_contig_migrate_range() ignores gfp when allocating the + * destination page for migration. Regardless, massage gfp flags and + * remove __GFP_TAGGED to avoid recursion in case gfp stops being + * ignored. 
+ */ + gfp &= ~__GFP_TAGGED; + if (!(gfp & __GFP_NORETRY)) + gfp |= __GFP_RETRY_MAYFAIL; + + ret = tag_storage_find_block(page, &start_block, ®ion); + if (WARN_ONCE(ret, "Missing tag storage block for pfn 0x%lx", page_to_pfn(page))) + return 0; + + end_block = start_block + order_to_num_blocks(order) * region->block_size; + + mutex_lock(&tag_blocks_lock); + + /* Make sure existing entries are not freed from out under out feet. */ + xa_lock_irqsave(&tag_blocks_reserved, flags); + for (block = start_block; block < end_block; block += region->block_size) { + if (tag_storage_block_is_reserved(block)) + block_ref_add(block, region, order); + } + xa_unlock_irqrestore(&tag_blocks_reserved, flags); + + for (block = start_block; block < end_block; block += region->block_size) { + /* Refcount incremented above. */ + if (tag_storage_block_is_reserved(block)) + continue; + + tries = 5; + while (tries--) { + ret = alloc_contig_range(block, block + region->block_size, MIGRATE_METADATA, gfp); + if (ret == 0 || ret != -EBUSY) + break; + } + + if (ret) + goto out_error; + + ret = tag_storage_reserve_block(block, region, order); + if (ret) { + free_contig_range(block, region->block_size); + goto out_error; + } + + count_vm_events(METADATA_RESERVE_SUCCESS, region->block_size); + } + + for (i = 0; i < (1 << order); i++) + set_bit(PG_tag_storage_reserved, &(page + i)->flags); + + mutex_unlock(&tag_blocks_lock); + return 0; + +out_error: + xa_lock_irqsave(&tag_blocks_reserved, flags); + for (block = start_block; block < end_block; block += region->block_size) { + if (tag_storage_block_is_reserved(block) && + block_ref_sub_return(block, region, order) == 1) { + __xa_erase(&tag_blocks_reserved, block); + free_contig_range(block, region->block_size); + } + } + xa_unlock_irqrestore(&tag_blocks_reserved, flags); + + mutex_unlock(&tag_blocks_lock); + + count_vm_events(METADATA_RESERVE_FAIL, region->block_size); + + return ret; +} + +void free_metadata_storage(struct page *page, int order) +{ + unsigned long block, start_block, end_block; + struct tag_region *region; + unsigned long flags; + int ret; + + if (WARN_ONCE(!page_mte_tagged(page), "pfn 0x%lx is not tagged", page_to_pfn(page))) + return; + + ret = tag_storage_find_block(page, &start_block, ®ion); + if (WARN_ONCE(ret, "Missing tag storage block for pfn 0x%lx", page_to_pfn(page))) + return; + + end_block = start_block + order_to_num_blocks(order) * region->block_size; + + xa_lock_irqsave(&tag_blocks_reserved, flags); + for (block = start_block; block < end_block; block += region->block_size) { + if (WARN_ONCE(!tag_storage_block_is_reserved(block), + "Block 0x%lx is not reserved for pfn 0x%lx", block, page_to_pfn(page))) + continue; + + if (block_ref_sub_return(block, region, order) == 1) { + __xa_erase(&tag_blocks_reserved, block); + free_contig_range(block, region->block_size); + count_vm_events(METADATA_RESERVE_FREE, region->block_size); + } + } + xa_unlock_irqrestore(&tag_blocks_reserved, flags); } -core_initcall(mte_tag_storage_activate_regions) diff --git a/fs/proc/page.c b/fs/proc/page.c index 195b077c0fac..e7eb584a9234 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -221,6 +221,7 @@ u64 stable_page_flags(struct page *page) #ifdef CONFIG_ARCH_USES_PG_ARCH_X u |= kpf_copy_bit(k, KPF_ARCH_2, PG_arch_2); u |= kpf_copy_bit(k, KPF_ARCH_3, PG_arch_3); + u |= kpf_copy_bit(k, KPF_ARCH_4, PG_arch_4); #endif return u; diff --git a/include/linux/kernel-page-flags.h b/include/linux/kernel-page-flags.h index 859f4b0c1b2b..4a0d719ffdd4 100644 --- 
a/include/linux/kernel-page-flags.h +++ b/include/linux/kernel-page-flags.h @@ -19,5 +19,6 @@ #define KPF_SOFTDIRTY 40 #define KPF_ARCH_2 41 #define KPF_ARCH_3 42 +#define KPF_ARCH_4 43 #endif /* LINUX_KERNEL_PAGE_FLAGS_H */ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 92a2063a0a23..42fb54cb9a54 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -135,6 +135,7 @@ enum pageflags { #ifdef CONFIG_ARCH_USES_PG_ARCH_X PG_arch_2, PG_arch_3, + PG_arch_4, #endif __NR_PAGEFLAGS, diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 4ccca8e73c93..23f1a76d66a7 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -125,7 +125,8 @@ IF_HAVE_PG_HWPOISON(hwpoison) \ IF_HAVE_PG_IDLE(idle) \ IF_HAVE_PG_IDLE(young) \ IF_HAVE_PG_ARCH_X(arch_2) \ -IF_HAVE_PG_ARCH_X(arch_3) +IF_HAVE_PG_ARCH_X(arch_3) \ +IF_HAVE_PG_ARCH_X(arch_4) #define show_page_flags(flags) \ (flags) ? __print_flags(flags, "|", \ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index eb3678360b97..cf5247b012de 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2458,6 +2458,7 @@ static void __split_huge_page_tail(struct page *head, int tail, #ifdef CONFIG_ARCH_USES_PG_ARCH_X (1L << PG_arch_2) | (1L << PG_arch_3) | + (1L << PG_arch_4) | #endif (1L << PG_dirty) | LRU_GEN_MASK | LRU_REFS_MASK)); From patchwork Wed Aug 23 13:13:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362418 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10AEDEE49BD for ; Wed, 23 Aug 2023 13:27:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235864AbjHWN1M (ORCPT ); Wed, 23 Aug 2023 09:27:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235805AbjHWN1H (ORCPT ); Wed, 23 Aug 2023 09:27:07 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 27F5C10FB; Wed, 23 Aug 2023 06:26:43 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D68951682; Wed, 23 Aug 2023 06:17:38 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id BFDA13F740; Wed, 23 Aug 2023 06:16:50 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, 
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 26/37] arm64: mte: Perform CMOs for tag blocks on tagged page allocation/free Date: Wed, 23 Aug 2023 14:13:39 +0100 Message-Id: <20230823131350.114942-27-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Make sure that the contents of the tag storage block are not corrupted by performing: 1. A tag dcache inval when the associated tagged pages are freed, to avoid dirty tag cache lines being evicted and corrupting the tag storage block when it's being used to store data. 2. A data cache inval when the tag storage block is being reserved, to ensure that no dirty data cache lines are present, which would trigger a writeback that could corrupt the tags stored in the block. Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/assembler.h | 10 ++++++++++ arch/arm64/include/asm/mte_tag_storage.h | 2 ++ arch/arm64/kernel/mte_tag_storage.c | 14 ++++++++++++++ arch/arm64/lib/mte.S | 16 ++++++++++++++++ 4 files changed, 42 insertions(+) diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 376a980f2bad..8d41c8cfdc69 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -310,6 +310,16 @@ alternative_cb_end lsl \reg, \reg, \tmp // actual cache line size .endm +/* + * tcache_line_size - get the safe tag cache line size across all CPUs + */ + .macro tcache_line_size, reg, tmp + read_ctr \tmp + ubfm \tmp, \tmp, #32, #37 // tag cache line size encoding + mov \reg, #4 // bytes per word + lsl \reg, \reg, \tmp // actual tag cache line size + .endm + /* * raw_icache_line_size - get the minimum I-cache line size on this CPU * from the CTR register. diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h index 7b69a8af13f3..bad865866eeb 100644 --- a/arch/arm64/include/asm/mte_tag_storage.h +++ b/arch/arm64/include/asm/mte_tag_storage.h @@ -7,6 +7,8 @@ #include +extern void dcache_inval_tags_poc(unsigned long start, unsigned long end); + #ifdef CONFIG_ARM64_MTE_TAG_STORAGE void mte_tag_storage_init(void); bool page_tag_storage_reserved(struct page *page); diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index 075231443dec..7dff93492a7b 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -19,6 +19,7 @@ #include #include +#include #include #include @@ -470,8 +471,13 @@ static bool tag_storage_block_is_reserved(unsigned long block) static int tag_storage_reserve_block(unsigned long block, struct tag_region *region, int order) { + unsigned long block_va; int ret; + block_va = (unsigned long)page_to_virt(pfn_to_page(block)); + /* Avoid writeback of dirty data cache lines corrupting tags.
*/ + dcache_inval_poc(block_va, block_va + region->block_size * PAGE_SIZE); + ret = xa_err(xa_store(&tag_blocks_reserved, block, pfn_to_page(block), GFP_KERNEL)); if (!ret) block_ref_add(block, region, order); @@ -584,6 +590,7 @@ void free_metadata_storage(struct page *page, int order) { unsigned long block, start_block, end_block; struct tag_region *region; + unsigned long page_va; unsigned long flags; int ret; @@ -594,6 +601,13 @@ void free_metadata_storage(struct page *page, int order) if (WARN_ONCE(ret, "Missing tag storage block for pfn 0x%lx", page_to_pfn(page))) return; + page_va = (unsigned long)page_to_virt(page); + /* + * Remove dirty tag cache lines to avoid corruption of the tag storage + * page contents when it gets freed back to the page allocator. + */ + dcache_inval_tags_poc(page_va, page_va + (PAGE_SIZE << order)); + end_block = start_block + order_to_num_blocks(order) * region->block_size; xa_lock_irqsave(&tag_blocks_reserved, flags); diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S index d3c4ff70f48b..97f0bb071284 100644 --- a/arch/arm64/lib/mte.S +++ b/arch/arm64/lib/mte.S @@ -175,3 +175,19 @@ SYM_FUNC_START(mte_restore_page_tags_from_mem) ret SYM_FUNC_END(mte_restore_page_tags_from_mem) + +/* + * dcache_inval_tags_poc(start, end) + * + * Ensure that any tags in the D-cache for the interval [start, end) + * are invalidated to PoC. + * + * - start - virtual start address of region + * - end - virtual end address of region + */ +SYM_FUNC_START(__pi_dcache_inval_tags_poc) + tcache_line_size x2, x3 + dcache_by_myline_op igvac, sy, x0, x1, x2, x3 + ret +SYM_FUNC_END(__pi_dcache_inval_tags_poc) +SYM_FUNC_ALIAS(dcache_inval_tags_poc, __pi_dcache_inval_tags_poc) From patchwork Wed Aug 23 13:13:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362594 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3A53EE49B2 for ; Wed, 23 Aug 2023 14:01:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235705AbjHWOBh (ORCPT ); Wed, 23 Aug 2023 10:01:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236004AbjHWN1i (ORCPT ); Wed, 23 Aug 2023 09:27:38 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 68E6510E7; Wed, 23 Aug 2023 06:27:06 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1A4C816F2; Wed, 23 Aug 2023 06:17:45 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9BF043F740; Wed, 23 Aug 2023 06:16:58 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: 
pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 27/37] arm64: mte: Reserve tag block for the zero page Date: Wed, 23 Aug 2023 14:13:40 +0100 Message-Id: <20230823131350.114942-28-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org On arm64, the zero page receives special treatment by having the tagged flag set on MTE initialization, not when the page is mapped in a process address space. Reserve the corresponding tag block when tag storage is being activated. Signed-off-by: Alexandru Elisei --- arch/arm64/kernel/mte_tag_storage.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index 7dff93492a7b..1ab875be5f9b 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -398,6 +398,8 @@ static int __init mte_tag_storage_activate_regions(void) } } + ret = reserve_metadata_storage(ZERO_PAGE(0), 0, GFP_HIGHUSER_MOVABLE); + return ret; } core_initcall(mte_tag_storage_activate_regions); From patchwork Wed Aug 23 13:13:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362416 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9DC00EE49B7 for ; Wed, 23 Aug 2023 13:27:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235822AbjHWN1I (ORCPT ); Wed, 23 Aug 2023 09:27:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232438AbjHWN1H (ORCPT ); Wed, 23 Aug 2023 09:27:07 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id C2EB410D0; Wed, 23 Aug 2023 06:26:42 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5D19016F3; Wed, 23 Aug 2023 06:17:51 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C9D2F3F740; Wed, 23 Aug 2023 06:17:04 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, 
kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 28/37] mm: sched: Introduce PF_MEMALLOC_ISOLATE Date: Wed, 23 Aug 2023 14:13:41 +0100 Message-Id: <20230823131350.114942-29-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org On arm64, when reserving tag storage for an allocated page, if the tag storage is in use, the tag storage must be migrated before it can be reserved. As part of the migration process, the tag storage block is first isolated. Compaction also isolates the source pages before migrating them. If the target for compaction requires metadata pages to be reserved, those metadata pages might also need to be isolated, which, in rare circumstances, can lead to the threshold in too_many_isolated() being reached, and isolate_migratepages_pageblock() will get stuck in an infinite loop. Add the flag PF_MEMALLOC_ISOLATE for the current thread, which makes too_many_isolated() ignore the threshold to make forward progress in isolate_migratepages_pageblock(). For consistency, the similarly named function too_many_isolated() called during reclaim has received the same treatment. Signed-off-by: Alexandru Elisei --- arch/arm64/kernel/mte_tag_storage.c | 5 ++++- include/linux/sched.h | 2 +- include/linux/sched/mm.h | 13 +++++++++++++ mm/compaction.c | 3 +++ mm/vmscan.c | 3 +++ 5 files changed, 24 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index 1ab875be5f9b..ba316ffb9aef 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -505,9 +505,9 @@ static int order_to_num_blocks(int order) int reserve_metadata_storage(struct page *page, int order, gfp_t gfp) { unsigned long start_block, end_block; + unsigned long flags, cflags; struct tag_region *region; unsigned long block; - unsigned long flags; int i, tries; int ret = 0; @@ -539,6 +539,7 @@ int reserve_metadata_storage(struct page *page, int order, gfp_t gfp) } xa_unlock_irqrestore(&tag_blocks_reserved, flags); + cflags = memalloc_isolate_save(); for (block = start_block; block < end_block; block += region->block_size) { /* Refcount incremented above. 
*/ if (tag_storage_block_is_reserved(block)) @@ -566,6 +567,7 @@ int reserve_metadata_storage(struct page *page, int order, gfp_t gfp) for (i = 0; i < (1 << order); i++) set_bit(PG_tag_storage_reserved, &(page + i)->flags); + memalloc_isolate_restore(cflags); mutex_unlock(&tag_blocks_lock); return 0; @@ -581,6 +583,7 @@ int reserve_metadata_storage(struct page *page, int order, gfp_t gfp) } xa_unlock_irqrestore(&tag_blocks_reserved, flags); + memalloc_isolate_restore(cflags); mutex_unlock(&tag_blocks_lock); count_vm_events(METADATA_RESERVE_FAIL, region->block_size); diff --git a/include/linux/sched.h b/include/linux/sched.h index 609bde814cb0..a2a930cab31a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1734,7 +1734,7 @@ extern struct pid *cad_pid; #define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */ #define PF_USER_WORKER 0x00004000 /* Kernel thread cloned from userspace thread */ #define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */ -#define PF__HOLE__00010000 0x00010000 +#define PF_MEMALLOC_ISOLATE 0x00010000 /* Ignore isolation limits */ #define PF_KSWAPD 0x00020000 /* I am kswapd */ #define PF_MEMALLOC_NOFS 0x00040000 /* All allocation requests will inherit GFP_NOFS */ #define PF_MEMALLOC_NOIO 0x00080000 /* All allocation requests will inherit GFP_NOIO */ diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 8d89c8c4fac1..8db491208746 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -393,6 +393,19 @@ static inline void memalloc_pin_restore(unsigned int flags) current->flags = (current->flags & ~PF_MEMALLOC_PIN) | flags; } +static inline unsigned int memalloc_isolate_save(void) +{ + unsigned int flags = current->flags & PF_MEMALLOC_ISOLATE; + + current->flags |= PF_MEMALLOC_ISOLATE; + return flags; +} + +static inline void memalloc_isolate_restore(unsigned int flags) +{ + current->flags = (current->flags & ~PF_MEMALLOC_ISOLATE) | flags; +} + #ifdef CONFIG_MEMCG DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg); /** diff --git a/mm/compaction.c b/mm/compaction.c index 314793ec8bdb..fdb75316f0cc 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -778,6 +778,9 @@ static bool too_many_isolated(struct compact_control *cc) unsigned long active, inactive, isolated; + if (current->flags & PF_MEMALLOC_ISOLATE) + return false; + inactive = node_page_state(pgdat, NR_INACTIVE_FILE) + node_page_state(pgdat, NR_INACTIVE_ANON); active = node_page_state(pgdat, NR_ACTIVE_FILE) + diff --git a/mm/vmscan.c b/mm/vmscan.c index 1080209a568b..912ebb6003a0 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2453,6 +2453,9 @@ static int too_many_isolated(struct pglist_data *pgdat, int file, if (current_is_kswapd()) return 0; + if (current->flags & PF_MEMALLOC_ISOLATE) + return 0; + if (!writeback_throttling_sane(sc)) return 0; From patchwork Wed Aug 23 13:13:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362328 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B19DDEE49B0 for ; Wed, 23 Aug 2023 13:17:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235689AbjHWNRt (ORCPT ); Wed, 23 Aug 2023 09:17:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40174 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235397AbjHWNRr (ORCPT ); Wed, 23 Aug 2023 09:17:47 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 17DE8E6A; Wed, 23 Aug 2023 06:17:21 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 740FB16F8; Wed, 23 Aug 2023 06:17:57 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 1BF5A3F740; Wed, 23 Aug 2023 06:17:10 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 29/37] mm: arm64: Define the PAGE_METADATA_NONE page protection Date: Wed, 23 Aug 2023 14:13:42 +0100 Message-Id: <20230823131350.114942-30-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Define the PAGE_METADATA_NONE page protection to be used when a page with metadata doesn't have metadata storage reserved. For arm64, this is accomplished by adding a new page table entry software bit PTE_METADATA_NONE. Linux doesn't set any of the PBHA bits in entries from the last level of the translation table and it doesn't use the TCR_ELx.HWUxx bits. This makes it safe to define PTE_METADATA_NONE as bit 59. Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/pgtable-prot.h | 2 ++ arch/arm64/include/asm/pgtable.h | 16 ++++++++++++++-- include/linux/pgtable.h | 12 ++++++++++++ 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index eed814b00a38..ed2a98ec4e95 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -19,6 +19,7 @@ #define PTE_SPECIAL (_AT(pteval_t, 1) << 56) #define PTE_DEVMAP (_AT(pteval_t, 1) << 57) #define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */ +#define PTE_METADATA_NONE (_AT(pteval_t, 1) << 59) /* only when PTE_PROT_NONE */ /* * This bit indicates that the entry is present i.e. 
pmd_page() @@ -98,6 +99,7 @@ extern bool arm64_use_ng_mappings; }) #define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN) +#define PAGE_METADATA_NONE __pgprot((_PAGE_DEFAULT & ~PTE_VALID) | PTE_PROT_NONE | PTE_METADATA_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN) /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */ #define PAGE_SHARED __pgprot(_PAGE_SHARED) #define PAGE_SHARED_EXEC __pgprot(_PAGE_SHARED_EXEC) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 944860d7090e..2e42f7713425 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -451,6 +451,18 @@ static inline int pmd_protnone(pmd_t pmd) } #endif +#ifdef CONFIG_MEMORY_METADATA +static inline bool pte_metadata_none(pte_t pte) +{ + return (((pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE) + && (pte_val(pte) & PTE_METADATA_NONE)); +} +static inline bool pmd_metadata_none(pmd_t pmd) +{ + return pte_metadata_none(pmd_pte(pmd)); +} +#endif + #define pmd_present_invalid(pmd) (!!(pmd_val(pmd) & PMD_PRESENT_INVALID)) static inline int pmd_present(pmd_t pmd) @@ -809,8 +821,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) * in MAIR_EL1. The mask below has to include PTE_ATTRINDX_MASK. */ const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY | - PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP | - PTE_ATTRINDX_MASK; + PTE_PROT_NONE | PTE_METADATA_NONE | PTE_VALID | + PTE_WRITE | PTE_GP | PTE_ATTRINDX_MASK; /* preserve the hardware dirty information */ if (pte_hw_dirty(pte)) pte = pte_mkdirty(pte); diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5063b482e34f..0119ffa2c0ab 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1340,6 +1340,18 @@ static inline int pmd_protnone(pmd_t pmd) } #endif /* CONFIG_NUMA_BALANCING */ +#ifndef CONFIG_MEMORY_METADATA +static inline bool pte_metadata_none(pte_t pte) +{ + return false; +} + +static inline bool pmd_metadata_none(pmd_t pmd) +{ + return false; +} +#endif /* CONFIG_MEMORY_METADATA */ + #endif /* CONFIG_MMU */ #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP From patchwork Wed Aug 23 13:13:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362329 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49D7DEE49B5 for ; Wed, 23 Aug 2023 13:18:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235744AbjHWNSG (ORCPT ); Wed, 23 Aug 2023 09:18:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41344 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235757AbjHWNSE (ORCPT ); Wed, 23 Aug 2023 09:18:04 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 17F1EE7A; Wed, 23 Aug 2023 06:17:33 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B515415DB; Wed, 23 Aug 2023 06:18:03 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 376473F740; Wed, 23 Aug 2023 06:17:17 -0700 (PDT) 
From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 30/37] mm: mprotect: arm64: Set PAGE_METADATA_NONE for mprotect(PROT_MTE) Date: Wed, 23 Aug 2023 14:13:43 +0100 Message-Id: <20230823131350.114942-31-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org To enable tagging on a memory range, userspace can use mprotect() with the PROT_MTE access flag. Pages already mapped in the VMA obviously don't have the associated tag storage block reserved, so mark the PTEs as PAGE_METADATA_NONE to trigger a fault next time they are accessed, and reserve the tag storage as part of the fault handling. If the tag storage for the page cannot be reserved, then migrate the page, because alloc_migration_target() will do the right thing and allocate a destination page with the tag storage reserved. If the mapped page is a metadata storage page, which cannot have metadata associated with it, the page is unconditionally migrated. This has several benefits over reserving the tag storage as part of the mprotect() call handling: - Tag storage is reserved only for pages that are accessed. - Reduces the latency of the mprotect() call. - Eliminates races with page migration. But all of this is at the expense of an extra page fault until the pages being accessed all have their corresponding tag storage reserved. This is only implemented for PTE mappings; PMD mappings will follow. Signed-off-by: Alexandru Elisei --- arch/arm64/kernel/mte_tag_storage.c | 6 ++ include/linux/migrate_mode.h | 1 + include/linux/mm.h | 2 + mm/memory.c | 152 +++++++++++++++++++++++++++- mm/mprotect.c | 46 +++++++++ 5 files changed, 206 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index ba316ffb9aef..27bde1d2609c 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -531,6 +531,10 @@ int reserve_metadata_storage(struct page *page, int order, gfp_t gfp) mutex_lock(&tag_blocks_lock); + /* Can happen for concurrent accesses to a METADATA_NONE page. */ + if (page_tag_storage_reserved(page)) + goto out_unlock; + /* Make sure existing entries are not freed from out under out feet. 
*/ xa_lock_irqsave(&tag_blocks_reserved, flags); for (block = start_block; block < end_block; block += region->block_size) { @@ -568,6 +572,8 @@ int reserve_metadata_storage(struct page *page, int order, gfp_t gfp) set_bit(PG_tag_storage_reserved, &(page + i)->flags); memalloc_isolate_restore(cflags); + +out_unlock: mutex_unlock(&tag_blocks_lock); return 0; diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h index f37cc03f9369..5a9af239e425 100644 --- a/include/linux/migrate_mode.h +++ b/include/linux/migrate_mode.h @@ -29,6 +29,7 @@ enum migrate_reason { MR_CONTIG_RANGE, MR_LONGTERM_PIN, MR_DEMOTION, + MR_METADATA_NONE, MR_TYPES }; diff --git a/include/linux/mm.h b/include/linux/mm.h index ce87d55ecf87..6bd7d5810122 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2466,6 +2466,8 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma, #define MM_CP_UFFD_WP_RESOLVE (1UL << 3) /* Resolve wp */ #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ MM_CP_UFFD_WP_RESOLVE) +/* Whether this protection change is to allocate metadata on next access */ +#define MM_CP_PROT_METADATA_NONE (1UL << 4) bool vma_needs_dirty_tracking(struct vm_area_struct *vma); int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot); diff --git a/mm/memory.c b/mm/memory.c index 01f39e8144ef..6c4a6151c7b2 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include #include @@ -82,6 +83,7 @@ #include #include +#include #include #include #include @@ -4681,6 +4683,151 @@ static vm_fault_t do_fault(struct vm_fault *vmf) return ret; } +/* Returns with the page reference dropped. */ +static void migrate_metadata_none_page(struct page *page, struct vm_area_struct *vma) +{ + struct migration_target_control mtc = { + .nid = NUMA_NO_NODE, + .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_TAGGED, + }; + LIST_HEAD(pagelist); + int ret, tries; + + lru_cache_disable(); + + if (!isolate_lru_page(page)) { + put_page(page); + lru_cache_enable(); + return; + } + /* Isolate just grabbed another reference, drop ours. */ + put_page(page); + + list_add_tail(&page->lru, &pagelist); + + tries = 5; + while (tries--) { + ret = migrate_pages(&pagelist, alloc_migration_target, NULL, + (unsigned long)&mtc, MIGRATE_SYNC, MR_METADATA_NONE, NULL); + if (ret == 0 || ret != -EBUSY) + break; + } + + if (ret != 0) { + list_del(&page->lru); + putback_movable_pages(&pagelist); + } + lru_cache_enable(); +} + +static vm_fault_t do_metadata_none_page(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + struct page *page = NULL; + bool do_migrate = false; + pte_t new_pte, old_pte; + bool writable = false; + vm_fault_t err; + int ret; + + /* + * The pte at this point cannot be used safely without validation + * through pte_same(). + */ + vmf->ptl = pte_lockptr(vma->vm_mm, vmf->pmd); + spin_lock(vmf->ptl); + if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; + } + + /* Get the normal PTE */ + old_pte = ptep_get(vmf->pte); + new_pte = pte_modify(old_pte, vma->vm_page_prot); + + /* + * Detect now whether the PTE could be writable; this information + * is only valid while holding the PT lock. 
+ */ + writable = pte_write(new_pte); + if (!writable && vma_wants_manual_pte_write_upgrade(vma) && + can_change_pte_writable(vma, vmf->address, new_pte)) + writable = true; + + page = vm_normal_page(vma, vmf->address, new_pte); + if (!page) + goto out_map; + + /* + * This should never happen: once a VMA has been marked as tagged, that + * cannot be changed. + */ + if (!(vma->vm_flags & VM_MTE)) + goto out_map; + + /* Prevent the page from being unmapped from under us. */ + get_page(page); + vma_set_access_pid_bit(vma); + + pte_unmap_unlock(vmf->pte, vmf->ptl); + + /* + * Probably the page is being isolated for migration, replay the fault + * to give time for the entry to be replaced by a migration pte. + */ + if (unlikely(is_migrate_isolate_page(page))) { + if (!(vmf->flags & FAULT_FLAG_TRIED)) + err = VM_FAULT_RETRY; + else + err = 0; + put_page(page); + return 0; + } else if (is_migrate_metadata_page(page)) { + do_migrate = true; + } else { + ret = reserve_metadata_storage(page, 0, GFP_HIGHUSER_MOVABLE); + if (ret == -EINTR) { + put_page(page); + return VM_FAULT_RETRY; + } else if (ret) { + do_migrate = true; + } + } + if (do_migrate) { + migrate_metadata_none_page(page, vma); + /* + * Either the page was migrated, in which case there's nothing + * we need to do; or migration failed, in which case all we + * can do is try again. So don't change the pte. + */ + return 0; + } + + put_page(page); + + spin_lock(vmf->ptl); + if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; + } + +out_map: + /* + * Make it present again, depending on how arch implements + * non-accessible ptes, some can allow access by kernel mode. + */ + old_pte = ptep_modify_prot_start(vma, vmf->address, vmf->pte); + new_pte = pte_modify(old_pte, vma->vm_page_prot); + new_pte = pte_mkyoung(new_pte); + if (writable) + new_pte = pte_mkwrite(new_pte); + ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, new_pte); + update_mmu_cache(vma, vmf->address, vmf->pte); + pte_unmap_unlock(vmf->pte, vmf->ptl); + + return 0; +} + int numa_migrate_prep(struct page *page, struct vm_area_struct *vma, unsigned long addr, int page_nid, int *flags) { @@ -4941,8 +5088,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) if (!pte_present(vmf->orig_pte)) return do_swap_page(vmf); - if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) + if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) { + if (metadata_storage_enabled() && pte_metadata_none(vmf->orig_pte)) + return do_metadata_none_page(vmf); return do_numa_page(vmf); + } spin_lock(vmf->ptl); entry = vmf->orig_pte; diff --git a/mm/mprotect.c b/mm/mprotect.c index 6f658d483704..2c022133aed3 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -89,6 +90,7 @@ static long change_pte_range(struct mmu_gather *tlb, long pages = 0; int target_node = NUMA_NO_NODE; bool prot_numa = cp_flags & MM_CP_PROT_NUMA; + bool prot_metadata_none = cp_flags & MM_CP_PROT_METADATA_NONE; bool uffd_wp = cp_flags & MM_CP_UFFD_WP; bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; @@ -161,6 +163,40 @@ static long change_pte_range(struct mmu_gather *tlb, jiffies_to_msecs(jiffies)); } + if (prot_metadata_none) { + struct page *page; + + /* + * Skip METADATA_NONE pages, but not NUMA pages, + * just so we don't get two faults, one after + * the other.
The page fault handling code + * might end up migrating the current page + * anyway, so there really is no need to keep + * the pte marked for NUMA balancing. + */ + if (pte_protnone(oldpte) && pte_metadata_none(oldpte)) + continue; + + page = vm_normal_page(vma, addr, oldpte); + if (!page || is_zone_device_page(page)) + continue; + + /* Page already mapped as tagged in a shared VMA. */ + if (page_has_metadata(page)) + continue; + + /* + * The LRU takes a page reference, which means + * that page_count > 1 is true even if the page + * is not COW. Reserving tag storage for a COW + * page is ok, because one mapping of that page + * won't be migrated; but not reserving tag + * storage for a page is definitely wrong. So + * don't skip pages that might be COW, like + * NUMA does. + */ + } + oldpte = ptep_modify_prot_start(vma, addr, pte); ptent = pte_modify(oldpte, newprot); @@ -531,6 +567,13 @@ long change_protection(struct mmu_gather *tlb, WARN_ON_ONCE(cp_flags & MM_CP_PROT_NUMA); #endif +#ifdef CONFIG_MEMORY_METADATA + if (cp_flags & MM_CP_PROT_METADATA_NONE) + newprot = PAGE_METADATA_NONE; +#else + WARN_ON_ONCE(cp_flags & MM_CP_PROT_METADATA_NONE); +#endif + if (is_vm_hugetlb_page(vma)) pages = hugetlb_change_protection(vma, start, end, newprot, cp_flags); @@ -661,6 +704,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE; vma_set_page_prot(vma); + if (metadata_storage_enabled() && (newflags & VM_MTE) && !(oldflags & VM_MTE)) + mm_cp_flags |= MM_CP_PROT_METADATA_NONE; + change_protection(tlb, vma, start, end, mm_cp_flags); /* From patchwork Wed Aug 23 13:13:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362387 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02550EE49B2 for ; Wed, 23 Aug 2023 13:18:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235840AbjHWNSf (ORCPT ); Wed, 23 Aug 2023 09:18:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235788AbjHWNSc (ORCPT ); Wed, 23 Aug 2023 09:18:32 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 9118D10FE; Wed, 23 Aug 2023 06:17:46 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4D5841756; Wed, 23 Aug 2023 06:18:10 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6F2923F740; Wed, 23 Aug 2023 06:17:23 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, 
david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 31/37] mm: arm64: Set PAGE_METADATA_NONE in set_pte_at() if missing metadata storage Date: Wed, 23 Aug 2023 14:13:44 +0100 Message-Id: <20230823131350.114942-32-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org When a metadata page is mapped in the process address space and then mprotect(PROT_MTE) changes the VMA flags to allow the use of tags, the page is migrated out when it is first accessed. But this creates an interesting corner case. Let's consider the scenario: Initial conditions: metadata page M1 and page P1 are mapped in a VMA without VM_MTE. The metadata storage for page P1 is **metadata page M1**. 1. mprotect(PROT_MTE) changes the VMA, so now all pages must have the associated metadata storage reserved. The to-be-tagged pages are marked as PAGE_METADATA_NONE. 2. Page P1 is accessed and metadata page M1 must be reserved. 3. Because it is mapped, the metadata storage code will migrate metadata page M1. The replacement page for M1, page P2, is allocated without metadata storage (__GFP_TAGGED is not set). This is done intentionally in reserve_metadata_storage() to avoid recursion and deadlock. 4. Migration finishes and page P2 replaces M1 in a VMA with VM_MTE set. The result: P2 is mapped in a VM_MTE VMA, but the associated metadata storage is not reserved. Fix this by teaching set_pte_at() -> mte_sync_tags() to change the PTE protection to PAGE_METADATA_NONE when the associated metadata storage is not reserved. 
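To make the scenario above concrete, here is a minimal userspace sketch (illustrative only, not part of this patch) of the sequence that leads to it: memory is mapped and populated without PROT_MTE, tagging is enabled afterwards with mprotect(), and the next access triggers the METADATA_NONE fault handling added earlier in the series. The PROT_MTE and HWCAP2_MTE values are the ones from the arm64 uapi headers; everything else is an assumption made only for this example.

/*
 * Illustrative sketch, not part of this series: the userspace sequence
 * that exercises the mprotect(PROT_MTE)-then-access path described above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef HWCAP2_MTE
#define HWCAP2_MTE	(1UL << 18)	/* from arch/arm64/include/uapi/asm/hwcap.h */
#endif
#ifndef PROT_MTE
#define PROT_MTE	0x20		/* from arch/arm64/include/uapi/asm/mman.h */
#endif

int main(void)
{
	size_t len = 4 * sysconf(_SC_PAGESIZE);
	char *p;

	if (!(getauxval(AT_HWCAP2) & HWCAP2_MTE)) {
		fprintf(stderr, "MTE not supported\n");
		return EXIT_FAILURE;
	}

	/* Pages are mapped and populated without tag storage reserved. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	if (p == MAP_FAILED)
		return EXIT_FAILURE;
	memset(p, 1, len);

	/* The VMA becomes VM_MTE; existing PTEs are marked PAGE_METADATA_NONE. */
	if (mprotect(p, len, PROT_READ | PROT_WRITE | PROT_MTE))
		return EXIT_FAILURE;

	/* The first access faults; the kernel reserves tag storage or migrates. */
	memset(p, 2, len);

	munmap(p, len);
	return EXIT_SUCCESS;
}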
Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/mte.h | 4 ++-- arch/arm64/include/asm/pgtable.h | 2 +- arch/arm64/kernel/mte.c | 14 +++++++++++--- 3 files changed, 14 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h index 70cfd09b4a11..e89d1fa3f410 100644 --- a/arch/arm64/include/asm/mte.h +++ b/arch/arm64/include/asm/mte.h @@ -108,7 +108,7 @@ static inline bool try_page_mte_tagging(struct page *page) } void mte_zero_clear_page_tags(void *addr); -void mte_sync_tags(pte_t pte); +void mte_sync_tags(pte_t *pteval); void mte_copy_page_tags(void *kto, const void *kfrom); void mte_thread_init_user(void); void mte_thread_switch(struct task_struct *next); @@ -140,7 +140,7 @@ static inline bool try_page_mte_tagging(struct page *page) static inline void mte_zero_clear_page_tags(void *addr) { } -static inline void mte_sync_tags(pte_t pte) +static inline void mte_sync_tags(pte_t *pteval) { } static inline void mte_copy_page_tags(void *kto, const void *kfrom) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 2e42f7713425..e5e1c23afb14 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -338,7 +338,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr, */ if (system_supports_mte() && pte_access_permitted(pte, false) && !pte_special(pte) && pte_tagged(pte)) - mte_sync_tags(pte); + mte_sync_tags(&pte); __check_safe_pte_update(mm, ptep, pte); diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c index 4edecaac8f91..4556989f0b9e 100644 --- a/arch/arm64/kernel/mte.c +++ b/arch/arm64/kernel/mte.c @@ -20,7 +20,9 @@ #include #include +#include #include +#include #include #include @@ -35,13 +37,19 @@ DEFINE_STATIC_KEY_FALSE(mte_async_or_asymm_mode); EXPORT_SYMBOL_GPL(mte_async_or_asymm_mode); #endif -void mte_sync_tags(pte_t pte) +void mte_sync_tags(pte_t *pteval) { - struct page *page = pte_page(pte); + struct page *page = pte_page(*pteval); long i, nr_pages = compound_nr(page); - /* if PG_mte_tagged is set, tags have already been initialised */ for (i = 0; i < nr_pages; i++, page++) { + if (metadata_storage_enabled() && + unlikely(!page_tag_storage_reserved(page))) { + *pteval = pte_modify(*pteval, PAGE_METADATA_NONE); + continue; + } + + /* if PG_mte_tagged is set, tags have already been initialised */ if (try_page_mte_tagging(page)) { mte_clear_page_tags(page_address(page)); set_page_mte_tagged(page); From patchwork Wed Aug 23 13:13:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362389 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06928EE49A3 for ; Wed, 23 Aug 2023 13:18:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235778AbjHWNS4 (ORCPT ); Wed, 23 Aug 2023 09:18:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235822AbjHWNSe (ORCPT ); Wed, 23 Aug 2023 09:18:34 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 22E64E58; Wed, 23 Aug 2023 06:18:03 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown 
[10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8531815BF; Wed, 23 Aug 2023 06:18:17 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0FC3D3F740; Wed, 23 Aug 2023 06:17:29 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 32/37] mm: Call arch_swap_prepare_to_restore() before arch_swap_restore() Date: Wed, 23 Aug 2023 14:13:45 +0100 Message-Id: <20230823131350.114942-33-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org arch_swap_restore() allows an architecture to restore metadata before the page is swapped in and it's called in atomic context (with the ptl lock held). Introduce arch_swap_prepare_to_restore() to allow such architectures to perform extra work in a blocking context. Signed-off-by: Alexandru Elisei --- include/linux/pgtable.h | 7 +++++++ mm/memory.c | 11 +++++++++++ mm/shmem.c | 4 ++++ mm/swapfile.c | 4 ++++ 4 files changed, 26 insertions(+) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 0119ffa2c0ab..0bce12f9eaab 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -816,6 +816,13 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio) } #endif +#ifndef __HAVE_ARCH_SWAP_PREPARE_TO_RESTORE +static inline int arch_swap_prepare_to_restore(swp_entry_t entry, struct folio *folio) +{ + return 0; +} +#endif + #ifndef __HAVE_ARCH_PGD_OFFSET_GATE #define pgd_offset_gate(mm, addr) pgd_offset(mm, addr) #endif diff --git a/mm/memory.c b/mm/memory.c index 6c4a6151c7b2..5f7587109ac2 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3724,6 +3724,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) swp_entry_t entry; pte_t pte; int locked; + int error; vm_fault_t ret = 0; void *shadow = NULL; @@ -3892,6 +3893,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_throttle_swaprate(folio, GFP_KERNEL); + /* + * Some architecture may need to perform certain operations before + * arch_swap_restore() in preemptible context (like memory allocations). + */ + error = arch_swap_prepare_to_restore(entry, folio); + if (error) { + ret = VM_FAULT_ERROR; + goto out_page; + } + /* * Back out if somebody else already faulted in this pte. 
*/ diff --git a/mm/shmem.c b/mm/shmem.c index 0b772ec34caa..4704be6a4e9b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1796,6 +1796,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index, } folio_wait_writeback(folio); + error = arch_swap_prepare_to_restore(swap, folio); + if (error) + goto unlock; + /* * Some architectures may have to restore extra metadata to the * folio after reading from swap. diff --git a/mm/swapfile.c b/mm/swapfile.c index 6d719ed5c616..387971e2c5f0 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1756,6 +1756,10 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, else if (unlikely(PTR_ERR(page) == -EHWPOISON)) hwposioned = true; + ret = arch_swap_prepare_to_restore(entry, folio); + if (ret) + return ret; + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); if (unlikely(!pte || !pte_same_as_swp(ptep_get(pte), swp_entry_to_pte(entry)))) { From patchwork Wed Aug 23 13:13:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362388 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49302EE49A0 for ; Wed, 23 Aug 2023 13:18:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235892AbjHWNSz (ORCPT ); Wed, 23 Aug 2023 09:18:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235813AbjHWNSe (ORCPT ); Wed, 23 Aug 2023 09:18:34 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 660181708; Wed, 23 Aug 2023 06:17:56 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6FA3A1713; Wed, 23 Aug 2023 06:18:24 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5E4203F740; Wed, 23 Aug 2023 06:17:37 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 33/37] arm64: mte: swap/copypage: Handle tag restoring when missing tag storage Date: Wed, 23 Aug 2023 14:13:46 +0100 Message-Id: <20230823131350.114942-34-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: 
linux-trace-kernel@vger.kernel.org Linux restores tags when a page is swapped in and there are tags saved for the swap entry which the new page will replace. The tags are restored even if the page will not be mapped as tagged. This is done so when a shared page is swapped in as untagged, followed by mprotect(PROT_MTE), the process can still access the correct tags. But this poses a challenge for tag storage: when a page is swapped in for the process where it is untagged, the corresponding tag storage block is not reserved, and restoring the tags can overwrite data in the tag storage block, leading to data corruption. Get around this issue by saving the tags in a new xarray, this time indexed by the page pfn, and then restoring them in set_pte_at(). Something similar can happen when a page is migrated: the migration process starts and the destination page is allocated when the VMA does not have MTE enabled (so tag storage is not reserved as part of the allocation), mprotect(PROT_MTE) is called before migration finishes and the source page is accessed (thus marking it as tagged). When folio_copy() is called, the code will try to copy the tags to the destination page, which doesn't have tag storage reserved. Fix this in a similar way to tag restoring when doing swap in, by saving the tags of the source page in a buffer, then restoring them in set_pte_at(). Signed-off-by: Alexandru Elisei --- arch/arm64/include/asm/memory_metadata.h | 1 + arch/arm64/include/asm/mte_tag_storage.h | 11 +++++ arch/arm64/include/asm/pgtable.h | 7 +++ arch/arm64/kernel/mte.c | 17 +++++++ arch/arm64/kernel/mte_tag_storage.c | 9 +++- arch/arm64/mm/copypage.c | 26 ++++++++++ arch/arm64/mm/mteswap.c | 63 ++++++++++++++++++++++++ include/asm-generic/memory_metadata.h | 4 ++ mm/memory.c | 12 +++++ 9 files changed, 149 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/memory_metadata.h b/arch/arm64/include/asm/memory_metadata.h index 167b039f06cf..25b4d790e92b 100644 --- a/arch/arm64/include/asm/memory_metadata.h +++ b/arch/arm64/include/asm/memory_metadata.h @@ -43,6 +43,7 @@ static inline bool vma_has_metadata(struct vm_area_struct *vma) int reserve_metadata_storage(struct page *page, int order, gfp_t gfp_mask); void free_metadata_storage(struct page *page, int order); +bool page_metadata_in_swap(struct page *page); #endif /* CONFIG_MEMORY_METADATA */ #endif /* __ASM_MEMORY_METADATA_H */ diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h index bad865866eeb..cafbb618d97a 100644 --- a/arch/arm64/include/asm/mte_tag_storage.h +++ b/arch/arm64/include/asm/mte_tag_storage.h @@ -12,6 +12,9 @@ extern void dcache_inval_tags_poc(unsigned long start, unsigned long end); #ifdef CONFIG_ARM64_MTE_TAG_STORAGE void mte_tag_storage_init(void); bool page_tag_storage_reserved(struct page *page); + +void *mte_erase_page_tags_by_pfn(struct page *page); +int mte_save_page_tags_by_pfn(struct page *page, void *tags); #else static inline void mte_tag_storage_init(void) { @@ -20,6 +23,14 @@ static inline bool page_tag_storage_reserved(struct page *page) { return true; } +static inline void *mte_erase_page_tags_by_pfn(struct page *page) +{ + return NULL; +} +static inline int mte_save_page_tags_by_pfn(struct page *page, void *tags) +{ + return 0; +} #endif /* CONFIG_ARM64_MTE_TAG_STORAGE */ #endif /* __ASM_MTE_TAG_STORAGE_H */ diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index e5e1c23afb14..a1e93d3228fa 100644 --- a/arch/arm64/include/asm/pgtable.h 
+++ b/arch/arm64/include/asm/pgtable.h @@ -1056,6 +1056,13 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio) mte_restore_page_tags_by_swp_entry(entry, &folio->page); } +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE + +#define __HAVE_ARCH_SWAP_PREPARE_TO_RESTORE +int arch_swap_prepare_to_restore(swp_entry_t entry, struct folio *folio); + +#endif /* CONFIG_ARM64_MTE_TAG_STORAGE */ + #endif /* CONFIG_ARM64_MTE */ /* diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c index 4556989f0b9e..5139ce6952ff 100644 --- a/arch/arm64/kernel/mte.c +++ b/arch/arm64/kernel/mte.c @@ -37,6 +37,20 @@ DEFINE_STATIC_KEY_FALSE(mte_async_or_asymm_mode); EXPORT_SYMBOL_GPL(mte_async_or_asymm_mode); #endif +static bool mte_restore_saved_tags(struct page *page) +{ + void *tags = mte_erase_page_tags_by_pfn(page); + + if (likely(!tags)) + return false; + + mte_restore_page_tags_from_mem(page_address(page), tags); + mte_free_tags_mem(tags); + set_page_mte_tagged(page); + + return true; +} + void mte_sync_tags(pte_t *pteval) { struct page *page = pte_page(*pteval); @@ -51,6 +65,9 @@ void mte_sync_tags(pte_t *pteval) /* if PG_mte_tagged is set, tags have already been initialised */ if (try_page_mte_tagging(page)) { + if (metadata_storage_enabled() && + unlikely(mte_restore_saved_tags(page))) + continue; mte_clear_page_tags(page_address(page)); set_page_mte_tagged(page); } diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index 27bde1d2609c..ce378f45f866 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -603,7 +603,8 @@ void free_metadata_storage(struct page *page, int order) struct tag_region *region; unsigned long page_va; unsigned long flags; - int ret; + void *tags; + int i, ret; if (WARN_ONCE(!page_mte_tagged(page), "pfn 0x%lx is not tagged", page_to_pfn(page))) return; @@ -619,6 +620,12 @@ void free_metadata_storage(struct page *page, int order) */ dcache_inval_tags_poc(page_va, page_va + (PAGE_SIZE << order)); + for (i = 0; i < (1 << order); i++) { + tags = mte_erase_page_tags_by_pfn(page + i); + if (unlikely(tags)) + mte_free_tags_mem(tags); + } + end_block = start_block + order_to_num_blocks(order) * region->block_size; xa_lock_irqsave(&tag_blocks_reserved, flags); diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c index a7bb20055ce0..e4ac3806b994 100644 --- a/arch/arm64/mm/copypage.c +++ b/arch/arm64/mm/copypage.c @@ -12,7 +12,29 @@ #include #include #include +#include #include +#include + +static bool copy_page_tags_to_page(struct page *to, struct page *from) +{ + void *kfrom = page_address(from); + void *tags; + + if (likely(page_tag_storage_reserved(to))) + return false; + + tags = mte_allocate_tags_mem(); + if (WARN_ON(!tags)) + goto out; + + mte_save_page_tags_to_mem(kfrom, tags); + + if (WARN_ON(mte_save_page_tags_by_pfn(to, tags))) + mte_free_tags_mem(tags); +out: + return true; +} void copy_highpage(struct page *to, struct page *from) { @@ -25,6 +47,10 @@ void copy_highpage(struct page *to, struct page *from) page_kasan_tag_reset(to); if (system_supports_mte() && page_mte_tagged(from)) { + if (metadata_storage_enabled() && + unlikely(copy_page_tags_to_page(to, from))) + return; + /* It's a new page, shouldn't have been tagged yet */ WARN_ON_ONCE(!try_page_mte_tagging(to)); mte_copy_page_tags(kto, kfrom); diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c index aaeca57f36cc..f6a9b6f889e6 100644 --- a/arch/arm64/mm/mteswap.c +++ b/arch/arm64/mm/mteswap.c @@ -5,7 +5,9 @@ 
#include #include #include +#include #include +#include static DEFINE_XARRAY(tags_by_swp_entry); @@ -20,6 +22,62 @@ void mte_free_tags_mem(void *tags) kfree(tags); } +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE +static DEFINE_XARRAY(tags_by_pfn); + +int mte_save_page_tags_by_pfn(struct page *page, void *tags) +{ + void *entry; + + entry = xa_store(&tags_by_pfn, page_to_pfn(page), tags, GFP_KERNEL); + if (xa_is_err(entry)) + return xa_err(entry); + else if (entry) + mte_free_tags_mem(entry); + + return 0; +} + +int arch_swap_prepare_to_restore(swp_entry_t entry, struct folio *folio) +{ + struct page *page = &folio->page; + void *swp_tags, *pfn_tags; + int ret; + + might_sleep(); + + if (!metadata_storage_enabled() || page_mte_tagged(page) || + page_tag_storage_reserved(page)) + return 0; + + swp_tags = xa_load(&tags_by_swp_entry, entry.val); + if (!swp_tags) + return 0; + + pfn_tags = mte_allocate_tags_mem(); + if (!pfn_tags) + return -ENOMEM; + + memcpy(pfn_tags, swp_tags, MTE_PAGE_TAG_STORAGE_SIZE); + + ret = mte_save_page_tags_by_pfn(page, pfn_tags); + if (ret) + mte_free_tags_mem(pfn_tags); + + return ret; +} + +void *mte_erase_page_tags_by_pfn(struct page *page) +{ + return xa_erase(&tags_by_pfn, page_to_pfn(page)); +} + +bool page_metadata_in_swap(struct page *page) +{ + return xa_load(&tags_by_pfn, page_to_pfn(page)) != NULL; +} +#endif + int mte_save_page_tags_by_swp_entry(struct page *page) { void *tags, *ret; @@ -53,6 +111,11 @@ void mte_restore_page_tags_by_swp_entry(swp_entry_t entry, struct page *page) if (!tags) return; + /* Tags already saved in mte_swap_prepare_to_restore(). */ + if (metadata_storage_enabled() && + unlikely(!page_tag_storage_reserved(page))) + return; + if (try_page_mte_tagging(page)) { mte_restore_page_tags_from_mem(page_address(page), tags); set_page_mte_tagged(page); diff --git a/include/asm-generic/memory_metadata.h b/include/asm-generic/memory_metadata.h index 35a0d6a8b5fc..4176fd89ef41 100644 --- a/include/asm-generic/memory_metadata.h +++ b/include/asm-generic/memory_metadata.h @@ -39,6 +39,10 @@ static inline bool vma_has_metadata(struct vm_area_struct *vma) { return false; } +static inline bool page_metadata_in_swap(struct page *page) +{ + return false; +} #endif /* !CONFIG_MEMORY_METADATA */ #endif /* __ASM_GENERIC_MEMORY_METADATA_H */ diff --git a/mm/memory.c b/mm/memory.c index 5f7587109ac2..ade71f38b2ff 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4801,6 +4801,18 @@ static vm_fault_t do_metadata_none_page(struct vm_fault *vmf) put_page(page); return VM_FAULT_RETRY; } else if (ret) { + // TODO: support migrating swap metadata with the page. 
+ if (unlikely(page_metadata_in_swap(page))) { + vm_fault_t err; + + if (vmf->flags & FAULT_FLAG_TRIED) + err = VM_FAULT_OOM; + else + err = VM_FAULT_RETRY; + + put_page(page); + return err; + } do_migrate = true; } } From patchwork Wed Aug 23 13:13:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362391 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B605EE49A3 for ; Wed, 23 Aug 2023 13:19:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235885AbjHWNTd (ORCPT ); Wed, 23 Aug 2023 09:19:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235887AbjHWNSz (ORCPT ); Wed, 23 Aug 2023 09:18:55 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 9E88A1996; Wed, 23 Aug 2023 06:18:23 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6AF6E16F8; Wed, 23 Aug 2023 06:18:30 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2CE8A3F740; Wed, 23 Aug 2023 06:17:44 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 34/37] arm64: mte: Handle fatal signal in reserve_metadata_storage() Date: Wed, 23 Aug 2023 14:13:47 +0100 Message-Id: <20230823131350.114942-35-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org As long as a fatal signal is pending, alloc_contig_range() will fail with -EINTR. This makes it impossible for tag storage allocation to succeed, and the page allocator will print an OOM splat. The process is going to be killed, so return 0 (success) from reserve_metadata_storage() to allow the page allocator to make progress. set_pte_at() will map it with PAGE_METADATA_NONE and subsequent accesses from different threads will trap until the signal is delivered. 
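To make the intent concrete, here is a small standalone model of the error handling described above. It is not the kernel code itself: apart from the reserve_metadata_storage() and alloc_contig_range() names taken from the commit message, every identifier is an illustrative stand-in. A pending fatal signal turns the -EINTR from alloc_contig_range() into a "success, but no tag storage" result, so the caller can map the page with PAGE_METADATA_NONE instead of triggering an OOM splat.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for alloc_contig_range() failing because a fatal signal is pending. */
static int alloc_contig_range_model(bool fatal_signal_pending)
{
	return fatal_signal_pending ? -EINTR : 0;
}

/*
 * Model of reserve_metadata_storage(): -EINTR is reported as success, but the
 * page is left without tag storage, which is what later causes it to be
 * mapped with PAGE_METADATA_NONE.
 */
static int reserve_metadata_storage_model(bool fatal_signal_pending,
					  bool *storage_reserved)
{
	int ret = alloc_contig_range_model(fatal_signal_pending);

	*storage_reserved = false;
	if (ret == -EINTR)
		return 0;	/* let the page allocator make progress */
	if (ret)
		return ret;

	*storage_reserved = true;
	return 0;
}

int main(void)
{
	bool reserved;
	int ret = reserve_metadata_storage_model(true, &reserved);

	printf("ret=%d, tag storage reserved=%d\n", ret, reserved);
	return 0;
}
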
Signed-off-by: Alexandru Elisei --- arch/arm64/kernel/mte_tag_storage.c | 17 +++++++++++++++++ arch/arm64/mm/fault.c | 23 +++++++++++++++++++++++ 2 files changed, 40 insertions(+) diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index ce378f45f866..1ccbcc144979 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -556,6 +556,23 @@ int reserve_metadata_storage(struct page *page, int order, gfp_t gfp) break; } + /* + * alloc_contig_range() returns -EINTR from + * __alloc_contig_migrate_range() if a fatal signal is pending. + * As long as the signal hasn't been handled, it is impossible + * to reserve tag storage for any page. Treat it as an error, + * but return 0 so the page allocator can make forward progress, + * instead of printing an OOM splat. + * + * The tagged page with missing tag storage will be mapped with + * PAGE_METADATA_NONE in set_pte_at(), and accesses until the + * signal is delivered will cause a fault. + */ + if (ret == -EINTR) { + ret = 0; + goto out_error; + } + if (ret) goto out_error; diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 7e2dcf5e3baf..64c5d77664c8 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -37,7 +37,9 @@ #include #include #include +#include #include +#include #include #include #include @@ -936,10 +938,31 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr, } NOKPROBE_SYMBOL(do_debug_exception); +static void save_zero_page_tags(struct page *page) +{ + void *tags; + + clear_page(page_address(page)); + + tags = kmalloc(MTE_PAGE_TAG_STORAGE_SIZE, GFP_KERNEL | __GFP_ZERO); + if (WARN_ON(!tags)) + return; + + if (WARN_ON(mte_save_page_tags_by_pfn(page, tags))) + mte_free_tags_mem(tags); +} + void tag_clear_highpage(struct page *page) { /* Tag storage pages cannot be tagged. 
*/ WARN_ON_ONCE(is_migrate_metadata_page(page)); + + if (metadata_storage_enabled() && + unlikely(!page_tag_storage_reserved(page))) { + save_zero_page_tags(page); + return; + } + /* Newly allocated page, shouldn't have been tagged yet */ WARN_ON_ONCE(!try_page_mte_tagging(page)); mte_zero_clear_page_tags(page_address(page)); From patchwork Wed Aug 23 13:13:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45336EE49B6 for ; Wed, 23 Aug 2023 13:27:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235797AbjHWN1H (ORCPT ); Wed, 23 Aug 2023 09:27:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235323AbjHWN1H (ORCPT ); Wed, 23 Aug 2023 09:27:07 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 93E37CEA; Wed, 23 Aug 2023 06:26:42 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9A5CF1758; Wed, 23 Aug 2023 06:18:36 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 29A2D3F740; Wed, 23 Aug 2023 06:17:50 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 35/37] mm: hugepage: Handle PAGE_METADATA_NONE faults for huge pages Date: Wed, 23 Aug 2023 14:13:48 +0100 Message-Id: <20230823131350.114942-36-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Handle accesses to huge pages mapped with PAGE_METADATA_NONE in a similar way to how accesses to PTEs are handled. 
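As a rough guide to the flow in the new PMD handler (which mirrors the PTE-level handler), here is a standalone decision model, not the kernel code: the booleans stand in for is_migrate_isolate_page(), is_migrate_metadata_page() and page_metadata_in_swap(), reserve_ret stands in for the return value of reserve_metadata_storage(), and the outcome names are simplified.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

enum action { MAP_AS_NORMAL, RETRY_FAULT, MIGRATE_PAGE, FAIL_OOM, DO_NOTHING };

/* Simplified decision model for a PAGE_METADATA_NONE huge-page fault. */
static enum action metadata_none_fault_action(bool page_isolated,
					      bool page_is_tag_storage,
					      int reserve_ret,
					      bool tags_still_in_swap,
					      bool fault_already_tried)
{
	if (page_isolated)
		return fault_already_tried ? DO_NOTHING : RETRY_FAULT;
	if (page_is_tag_storage)
		return MIGRATE_PAGE;
	if (reserve_ret == -EINTR)
		return RETRY_FAULT;
	if (reserve_ret) {
		if (tags_still_in_swap)
			return fault_already_tried ? FAIL_OOM : RETRY_FAULT;
		return MIGRATE_PAGE;	/* move the data to a page with tag storage */
	}
	return MAP_AS_NORMAL;		/* reservation succeeded, restore the mapping */
}

int main(void)
{
	/* Tag storage could not be reserved and the tags are still in swap. */
	printf("action=%d\n", metadata_none_fault_action(false, false, -ENOMEM, true, false));
	return 0;
}
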
Signed-off-by: Alexandru Elisei --- include/asm-generic/memory_metadata.h | 2 + include/linux/huge_mm.h | 6 ++ mm/huge_memory.c | 108 ++++++++++++++++++++++++++ mm/memory.c | 7 +- 4 files changed, 121 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/memory_metadata.h b/include/asm-generic/memory_metadata.h index 4176fd89ef41..dfdf2dd82ea6 100644 --- a/include/asm-generic/memory_metadata.h +++ b/include/asm-generic/memory_metadata.h @@ -7,6 +7,8 @@ extern unsigned long totalmetadata_pages; +void migrate_metadata_none_page(struct page *page, struct vm_area_struct *vma); + #ifndef CONFIG_MEMORY_METADATA static inline bool metadata_storage_enabled(void) { diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 20284387b841..6920571b5b6d 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -229,6 +229,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, pud_t *pud, int flags, struct dev_pagemap **pgmap); vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf); +vm_fault_t do_huge_pmd_metadata_none_page(struct vm_fault *vmf); extern struct page *huge_zero_page; extern unsigned long huge_zero_pfn; @@ -356,6 +357,11 @@ static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) return 0; } +static inline vm_fault_t do_huge_pmd_metadata_none_page(struct vm_fault *vmf) +{ + return 0; +} + static inline bool is_huge_zero_page(struct page *page) { return false; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index cf5247b012de..06038424c3a7 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -38,6 +39,7 @@ #include #include +#include #include #include #include "internal.h" @@ -1490,6 +1492,112 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, return page; } +vm_fault_t do_huge_pmd_metadata_none_page(struct vm_fault *vmf) +{ + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; + struct vm_area_struct *vma = vmf->vma; + pmd_t old_pmd = vmf->orig_pmd; + struct page *page = NULL; + bool do_migrate = false; + bool writable = false; + vm_fault_t err; + pmd_t new_pmd; + int ret; + + vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); + spin_lock(vmf->ptl); + if (unlikely(!pmd_same(*vmf->pmd, old_pmd))) { + spin_unlock(vmf->ptl); + return 0; + } + + new_pmd = pmd_modify(old_pmd, vma->vm_page_prot); + + /* + * Detect now whether the PMD could be writable; this information + * is only valid while holding the PT lock. + */ + writable = pmd_write(new_pmd); + if (!writable && vma_wants_manual_pte_write_upgrade(vma) && + can_change_pmd_writable(vma, vmf->address, new_pmd)) + writable = true; + + page = vm_normal_page_pmd(vma, vmf->address, new_pmd); + if (!page) + goto out_map; + + /* + * This should never happen, once a VMA has been marked as tagged, that + * cannot be changed. + */ + if (!(vma->vm_flags & VM_MTE)) + goto out_map; + + /* Prevent the page from being unmapped from under us. 
*/ + get_page(page); + vma_set_access_pid_bit(vma); + + spin_unlock(vmf->ptl); + writable = false; + + if (unlikely(is_migrate_isolate_page(page))) { + if (!(vmf->flags & FAULT_FLAG_TRIED)) + err = VM_FAULT_RETRY; + else + err = 0; + put_page(page); + } else if (is_migrate_metadata_page(page)) { + do_migrate = true; + } else { + ret = reserve_metadata_storage(page, HPAGE_PMD_ORDER, GFP_HIGHUSER_MOVABLE); + if (ret == -EINTR) { + put_page(page); + return VM_FAULT_RETRY; + } else if (ret) { + if (unlikely(page_metadata_in_swap(page))) { + if (vmf->flags & FAULT_FLAG_TRIED) + err = VM_FAULT_OOM; + else + err = VM_FAULT_RETRY; + + put_page(page); + return err; + } + do_migrate = true; + } + } + + if (do_migrate) { + migrate_metadata_none_page(page, vma); + /* + * Either the page was migrated, in which case there's nothing + * we need to do; either migration failed, in which case all we + * can do is try again. So don't change the pte. + */ + return 0; + } + + put_page(page); + + vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(*vmf->pmd, old_pmd))) { + spin_unlock(vmf->ptl); + return 0; + } + +out_map: + new_pmd = pmd_modify(old_pmd, vma->vm_page_prot); + new_pmd = pmd_mkyoung(new_pmd); + if (writable) + new_pmd = pmd_mkwrite(new_pmd); + set_pmd_at(vma->vm_mm, haddr, vmf->pmd, new_pmd); + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); + spin_unlock(vmf->ptl); + + return 0; +} + + /* NUMA hinting page fault entry point for trans huge pmds */ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { diff --git a/mm/memory.c b/mm/memory.c index ade71f38b2ff..6d78d33ef91f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4695,7 +4695,7 @@ static vm_fault_t do_fault(struct vm_fault *vmf) } /* Returns with the page reference dropped. */ -static void migrate_metadata_none_page(struct page *page, struct vm_area_struct *vma) +void migrate_metadata_none_page(struct page *page, struct vm_area_struct *vma) { struct migration_target_control mtc = { .nid = NUMA_NO_NODE, @@ -5234,8 +5234,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, return 0; } if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) { - if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) + if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) { + if (metadata_storage_enabled() && pmd_metadata_none(vmf.orig_pmd)) + return do_huge_pmd_metadata_none_page(&vmf); return do_huge_pmd_numa_page(&vmf); + } if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) && !pmd_write(vmf.orig_pmd)) { From patchwork Wed Aug 23 13:13:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362390 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA6AEEE49A0 for ; Wed, 23 Aug 2023 13:19:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235946AbjHWNTV (ORCPT ); Wed, 23 Aug 2023 09:19:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51306 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235859AbjHWNSp (ORCPT ); Wed, 23 Aug 2023 09:18:45 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 8CF1D173E; Wed, 23 Aug 2023 06:18:16 -0700 (PDT) Received: from 
usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B64F11688; Wed, 23 Aug 2023 06:18:42 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 58BFD3F740; Wed, 23 Aug 2023 06:17:56 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 36/37] KVM: arm64: Disable MTE if tag storage is enabled Date: Wed, 23 Aug 2023 14:13:49 +0100 Message-Id: <20230823131350.114942-37-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org KVM allows MTE-enabled VMs to be created when the backing memory does not have MTE enabled. Without changes to how KVM allocates memory for a VM, it is impossible to discern when the corresponding tag storage needs to be reserved. For now, disable MTE in KVM if tag storage is enabled.
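The restriction amounts to one extra condition on the MTE capability. A minimal model of the check (the parameters stand in for system_supports_mte() and metadata_storage_enabled(); this is a sketch for illustration, not the KVM code):

#include <stdbool.h>
#include <stdio.h>

/*
 * Model of the KVM_CAP_ARM_MTE checks: the capability is advertised, and can
 * be enabled, only when the host has MTE and tag storage management is off.
 */
static bool kvm_mte_allowed(bool host_supports_mte, bool tag_storage_enabled)
{
	return host_supports_mte && !tag_storage_enabled;
}

int main(void)
{
	printf("MTE host, tag storage off: %d\n", kvm_mte_allowed(true, false)); /* 1 */
	printf("MTE host, tag storage on:  %d\n", kvm_mte_allowed(true, true));  /* 0 */
	return 0;
}
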
Signed-off-by: Alexandru Elisei --- arch/arm64/kvm/arm.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 72dc53a75d1c..1f39c2d5223d 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include @@ -85,7 +86,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, break; case KVM_CAP_ARM_MTE: mutex_lock(&kvm->lock); - if (!system_supports_mte() || kvm->created_vcpus) { + if (!system_supports_mte() || metadata_storage_enabled() || + kvm->created_vcpus) { r = -EINVAL; } else { r = 0; @@ -277,7 +279,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = 1; break; case KVM_CAP_ARM_MTE: - r = system_supports_mte(); + r = system_supports_mte() && !metadata_storage_enabled(); break; case KVM_CAP_STEAL_TIME: r = kvm_arm_pvtime_supported(); From patchwork Wed Aug 23 13:13:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 13362420 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71156EE49B9 for ; Wed, 23 Aug 2023 13:27:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235859AbjHWN15 (ORCPT ); Wed, 23 Aug 2023 09:27:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53586 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236005AbjHWN1i (ORCPT ); Wed, 23 Aug 2023 09:27:38 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5476B10C8; Wed, 23 Aug 2023 06:27:05 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D315F15DB; Wed, 23 Aug 2023 06:18:48 -0700 (PDT) Received: from e121798.cable.virginm.net (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 799E43F740; Wed, 23 Aug 2023 06:18:02 -0700 (PDT) From: Alexandru Elisei To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org, rppt@kernel.org, hughd@google.com Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com, vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com, kcc@google.com, hyesoo.yu@samsung.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH RFC 37/37] arm64: mte: Enable tag storage management Date: Wed, 23 Aug 2023 14:13:50 +0100 Message-Id: <20230823131350.114942-38-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230823131350.114942-1-alexandru.elisei@arm.com> References: <20230823131350.114942-1-alexandru.elisei@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org Everything is in 
place; enable tag storage management. Signed-off-by: Alexandru Elisei --- arch/arm64/kernel/mte_tag_storage.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c index 1ccbcc144979..18264bc8f590 100644 --- a/arch/arm64/kernel/mte_tag_storage.c +++ b/arch/arm64/kernel/mte_tag_storage.c @@ -399,6 +399,12 @@ static int __init mte_tag_storage_activate_regions(void) } ret = reserve_metadata_storage(ZERO_PAGE(0), 0, GFP_HIGHUSER_MOVABLE); + if (ret) { + pr_info("MTE tag storage disabled\n"); + } else { + static_branch_enable(&metadata_storage_enabled_key); + pr_info("MTE tag storage enabled\n"); + } return ret; }
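For reference, metadata_storage_enabled(), used throughout the series, is presumably a thin wrapper around the static key flipped above. That is an assumption inferred from the static_branch_enable() call in this diff; the real definition lives in an earlier patch not shown here. A sketch of what such a wrapper would look like:

/* Assumed definition, not taken from this series. */
#include <linux/jump_label.h>
#include <linux/types.h>

DECLARE_STATIC_KEY_FALSE(metadata_storage_enabled_key);

static inline bool metadata_storage_enabled(void)
{
	return static_branch_likely(&metadata_storage_enabled_key);
}

Using a static key keeps the check essentially free on systems without tag storage, which matters because metadata_storage_enabled() sits on hot paths such as mte_sync_tags(), copy_highpage() and the page fault handlers touched earlier in the series.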