From patchwork Fri Feb 28 02:30:42 2025
X-Patchwork-Submitter: Mathieu Desnoyers
X-Patchwork-Id: 13995482
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Linus Torvalds,
    Matthew Wilcox, Olivier Dion, linux-mm@kvack.org
Subject: [RFC PATCH 1/2] mm: Introduce SKSM: Synchronous Kernel Samepage Merging
Date: Thu, 27 Feb 2025 21:30:42 -0500
Message-Id: <20250228023043.83726-2-mathieu.desnoyers@efficios.com>
In-Reply-To: <20250228023043.83726-1-mathieu.desnoyers@efficios.com>
References: <20250228023043.83726-1-mathieu.desnoyers@efficios.com>
MIME-Version: 1.0
* Main use-case targeted by SKSM: Code patching

The main use-case targeted by SKSM is deduplication of anonymous pages
created by COW (Copy-On-Write) triggered by patching executable and
library code for a user-space implementation of "static keys" and
"alternative" code patching.

Code patching enables:

- Runtime feature detection, where a constructor can dynamically enable
  a feature by turning a no-op into a jump.

- Instrumentation activation at runtime (e.g. tracepoints), where a
  dormant no-op instrumentation site is patched into a jump.

- Runtime assembler specialisation, where a constructor can dynamically
  modify assembler instructions to select the best alternative for the
  detected hardware and software environment (e.g. CPU features, rseq
  availability).

The main distinction between doing code patching at kernel level and at
user-space level is that in user-space, executable and library code is
shared across all processes mapping the same executable or library
files. This reduces memory use and improves cache locality by sharing
executable pages across processes. Writing to those private mappings
triggers COW, which allocates anonymous pages within each process, and
thus loses the benefit of sharing the same pages from the backing
storage. Without memory deduplication, this increases memory use and
therefore degrades cache locality: populating the patched content into
separate COW pages within each process ends up using distinct CPU cache
lines, thus thrashing the CPU instruction and data caches.

* Why not use KSM?

The KSM mechanism has the following downsides which the SKSM ABI aims
to overcome:

- KSM requires careful tuning of scan parameters for the workload by
  the system administrator.

  A) This makes KSM mostly useless with a standard distro config.
  B) KSM tuning is workload-specific.
  C) Scanning pages adds overhead to the system, which is the reason
     why the scan parameters must be tuned for the workload.

- KSM has security implications, because it allows a process to confirm
  that an unrelated process has a page with known content.

  A) The documentation of madvise(2) MADV_MERGEABLE would benefit from
     advising against targeting memory that contains secret data, due
     to the risk of discovery through side-channel timing attacks.
  B) prctl(2) PR_SET_MEMORY_MERGE inherently marks the entire process
     memory as mergeable, which makes it incompatible with
     security-oriented use-cases.

* SKSM Overview

SKSM enables synchronous dynamic sharing of identical pages found in
different memory areas, even if they are not shared by fork().

Userspace must explicitly request that pages within specific address
ranges be merged, with madvise MADV_MERGE. Those ranges should *not*
contain secrets, as side-channel timing attacks can allow a process to
learn of the existence of known content within another process.

Memory merging is performed synchronously within madvise: there is no
global scan and no need for a background daemon.

The anonymous pages targeted for merge are write-protected and
checksummed, then compared to other pages targeted for merge. The
mergeable pages are added to a hash table indexed by the checksum of
their content; the comparison function is based on comparing the page
content itself.

If a page is written to after being targeted for merge, a COW is
triggered, and a new page is populated in its stead.

* Expected Use

User-space is expected to perform code patching, e.g. from a library
constructor, and then, when the text pages are expected to stay
invariant for a long time, issue madvise(2) MADV_MERGE on those pages.
At this point, the pages will be write-protected and merged with
identical SKSM pages.
Further stores to those pages will trigger COW again.

* Results

Output of "cat /proc/vmstat | grep nr_anon_pages" while running 1000
instances of "sleep 500":

- Baseline (no preload):                nr_anon_pages 39721
- COW each executable page from libc:   nr_anon_pages 419927
- madvise MADV_MERGE after COW of libc: nr_anon_pages 45525

* Limitations

- This is a proof of concept!
- It is incompatible with KSM (depends on !KSM) for now.
- Swap behavior under memory pressure is untested.
- The size of the hash table is static (65536 buckets) for now.

Signed-off-by: Mathieu Desnoyers
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Matthew Wilcox
Cc: Olivier Dion
Cc: linux-mm@kvack.org
---
 include/linux/ksm.h                    |   4 +
 include/linux/mm_types.h               |   7 +
 include/linux/page-flags.h             |  42 +++++
 include/linux/sksm.h                   |  27 +++
 include/uapi/asm-generic/mman-common.h |   2 +
 mm/Kconfig                             |   5 +
 mm/Makefile                            |   1 +
 mm/ksm-common.h                        | 228 +++++++++++++++++++++++++
 mm/ksm.c                               | 219 +----------------------
 mm/madvise.c                           |   6 +
 mm/memory.c                            |   2 +
 mm/page_alloc.c                        |   3 +
 mm/sksm.c                              | 190 +++++++++++++++++++++
 13 files changed, 518 insertions(+), 218 deletions(-)
 create mode 100644 include/linux/sksm.h
 create mode 100644 mm/ksm-common.h
 create mode 100644 mm/sksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 6a53ac4885bb..dc3ce855863c 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -118,6 +118,10 @@ static inline void ksm_exit(struct mm_struct *mm)
 {
 }
 
+static inline void ksm_map_zero_page(struct mm_struct *mm)
+{
+}
+
 static inline void ksm_might_unmap_zero_page(struct mm_struct *mm, pte_t pte)
 {
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 332cee285662..e4940562cb81 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
@@ -216,6 +217,12 @@ struct page {
 			struct page *kmsan_shadow;
 			struct page *kmsan_origin;
 #endif
+
+#ifdef CONFIG_SKSM
+	/* TODO: move those fields into unused union fields instead. */
+	struct hlist_node sksm_node;
+	u32 checksum;
+#endif
 } _struct_page_alignment;
 
 /*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 691506bdf2c5..4e96437ab94e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -701,6 +701,48 @@ static __always_inline bool PageAnon(const struct page *page)
 	return folio_test_anon(page_folio(page));
 }
 
+#ifdef CONFIG_SKSM
+static __always_inline bool folio_test_sksm(const struct folio *folio)
+{
+	return !hlist_unhashed_lockless(&folio->page.sksm_node);
+}
+#else
+static __always_inline bool folio_test_sksm(const struct folio *folio)
+{
+	return false;
+}
+#endif
+
+static __always_inline bool PageSKSM(const struct page *page)
+{
+	return folio_test_sksm(page_folio(page));
+}
+
+#ifdef CONFIG_SKSM
+static inline void set_page_checksum(struct page *page, u32 checksum)
+{
+	page->checksum = checksum;
+}
+
+static inline void init_page_sksm_node(struct page *page)
+{
+	INIT_HLIST_NODE(&page->sksm_node);
+}
+
+void __sksm_page_remove(struct page *page);
+
+static inline void sksm_page_remove(struct page *page)
+{
+	if (!PageSKSM(page))
+		return;
+	__sksm_page_remove(page);
+}
+#else
+static inline void set_page_checksum(struct page *page, u32 checksum) { }
+static inline void init_page_sksm_node(struct page *page) { }
+static inline void sksm_page_remove(struct page *page) { }
+#endif
+
 static __always_inline bool __folio_test_movable(const struct folio *folio)
 {
 	return ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) ==
diff --git a/include/linux/sksm.h b/include/linux/sksm.h
new file mode 100644
index 000000000000..4f3aaec512df
--- /dev/null
+++ b/include/linux/sksm.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_SKSM_H
+#define __LINUX_SKSM_H
+/*
+ * Synchronous memory merging support.
+ *
+ * This code enables synchronous dynamic sharing of identical pages
+ * found in different memory areas, even if they are not shared by
+ * fork().
+ */
+
+#ifdef CONFIG_SKSM
+
+int sksm_merge(struct vm_area_struct *vma, unsigned long start,
+	       unsigned long end);
+
+#else /* !CONFIG_SKSM */
+
+static inline int sksm_merge(struct vm_area_struct *vma, unsigned long start,
+			     unsigned long end)
+{
+	return 0;
+}
+
+#endif /* !CONFIG_SKSM */
+
+#endif /* __LINUX_SKSM_H */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 1ea2c4c33b86..8bd57eb21c12 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -79,6 +79,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
 
+#define MADV_MERGE	26		/* Synchronously merge identical pages */
+
 #define MADV_GUARD_INSTALL 102		/* fatal signal on access to range */
 #define MADV_GUARD_REMOVE 103		/* unguard range */
diff --git a/mm/Kconfig b/mm/Kconfig
index 84000b016808..067d4c3aa21c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -740,6 +740,11 @@ config KSM
	  until a program has madvised that an area is MADV_MERGEABLE, and
	  root has set /sys/kernel/mm/ksm/run to 1 (if CONFIG_SYSFS is set).
 
+config SKSM
+	bool "Enable Synchronous KSM for page merging"
+	depends on MMU && !KSM
+	select XXHASH
+
 config DEFAULT_MMAP_MIN_ADDR
	int "Low address space to protect from user allocation"
	depends on MMU
diff --git a/mm/Makefile b/mm/Makefile
index dba52bb0da8a..8722c3ea572c 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_KSM) += ksm.o
+obj-$(CONFIG_SKSM) += sksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_KASAN) += kasan/
 obj-$(CONFIG_KFENCE) += kfence/
diff --git a/mm/ksm-common.h b/mm/ksm-common.h
new file mode 100644
index 000000000000..b676f1f5c10f
--- /dev/null
+++ b/mm/ksm-common.h
@@ -0,0 +1,228 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Memory merging support, common code.
+ */
+#ifndef _KSM_COMMON_H
+#define _KSM_COMMON_H
+
+#include
+
+static bool vma_ksm_compatible(struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE | VM_PFNMAP |
+			     VM_IO | VM_DONTEXPAND | VM_HUGETLB |
+			     VM_MIXEDMAP| VM_DROPPABLE))
+		return false;		/* just ignore the advice */
+
+	if (vma_is_dax(vma))
+		return false;
+
+#ifdef VM_SAO
+	if (vma->vm_flags & VM_SAO)
+		return false;
+#endif
+#ifdef VM_SPARC_ADI
+	if (vma->vm_flags & VM_SPARC_ADI)
+		return false;
+#endif
+
+	return true;
+}
+
+static u32 calc_checksum(struct page *page)
+{
+	u32 checksum;
+	void *addr = kmap_local_page(page);
+	checksum = xxhash(addr, PAGE_SIZE, 0);
+	kunmap_local(addr);
+	return checksum;
+}
+
+static int write_protect_page(struct vm_area_struct *vma, struct folio *folio,
+			      pte_t *orig_pte)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, 0, 0);
+	int swapped;
+	int err = -EFAULT;
+	struct mmu_notifier_range range;
+	bool anon_exclusive;
+	pte_t entry;
+
+	if (WARN_ON_ONCE(folio_test_large(folio)))
+		return err;
+
+	pvmw.address = page_address_in_vma(folio, folio_page(folio, 0), vma);
+	if (pvmw.address == -EFAULT)
+		goto out;
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, pvmw.address,
+				pvmw.address + PAGE_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	if (!page_vma_mapped_walk(&pvmw))
+		goto out_mn;
+	if (WARN_ONCE(!pvmw.pte, "Unexpected PMD mapping?"))
+		goto out_unlock;
+
+	anon_exclusive = PageAnonExclusive(&folio->page);
+	entry = ptep_get(pvmw.pte);
+	if (pte_write(entry) || pte_dirty(entry) ||
+	    anon_exclusive || mm_tlb_flush_pending(mm)) {
+		swapped = folio_test_swapcache(folio);
+		flush_cache_page(vma, pvmw.address, folio_pfn(folio));
+		/*
+		 * Ok this is tricky, when get_user_pages_fast() run it doesn't
+		 * take any lock, therefore the check that we are going to make
+		 * with the pagecount against the mapcount is racy and
+		 * O_DIRECT can happen right after the check.
+		 * So we clear the pte and flush the tlb before the check
+		 * this assure us that no O_DIRECT can happen after the check
+		 * or in the middle of the check.
+		 *
+		 * No need to notify as we are downgrading page table to read
+		 * only not changing it to point to a new page.
+		 *
+		 * See Documentation/mm/mmu_notifier.rst
+		 */
+		entry = ptep_clear_flush(vma, pvmw.address, pvmw.pte);
+		/*
+		 * Check that no O_DIRECT or similar I/O is in progress on the
+		 * page
+		 */
+		if (folio_mapcount(folio) + 1 + swapped != folio_ref_count(folio)) {
+			set_pte_at(mm, pvmw.address, pvmw.pte, entry);
+			goto out_unlock;
+		}
+
+		/* See folio_try_share_anon_rmap_pte(): clear PTE first. */
+		if (anon_exclusive &&
+		    folio_try_share_anon_rmap_pte(folio, &folio->page)) {
+			set_pte_at(mm, pvmw.address, pvmw.pte, entry);
+			goto out_unlock;
+		}
+
+		if (pte_dirty(entry))
+			folio_mark_dirty(folio);
+		entry = pte_mkclean(entry);
+
+		if (pte_write(entry))
+			entry = pte_wrprotect(entry);
+
+		set_pte_at(mm, pvmw.address, pvmw.pte, entry);
+	}
+	*orig_pte = entry;
+	err = 0;
+
+out_unlock:
+	page_vma_mapped_walk_done(&pvmw);
+out_mn:
+	mmu_notifier_invalidate_range_end(&range);
+out:
+	return err;
+}
+
+/**
+ * replace_page - replace page in vma by new ksm page
+ * @vma:      vma that holds the pte pointing to page
+ * @page:     the page we are replacing by kpage
+ * @kpage:    the ksm page we replace page by
+ * @orig_pte: the original value of the pte
+ *
+ * Returns 0 on success, -EFAULT on failure.
+ */
+static int replace_page(struct vm_area_struct *vma, struct page *page,
+			struct page *kpage, pte_t orig_pte)
+{
+	struct folio *kfolio = page_folio(kpage);
+	struct mm_struct *mm = vma->vm_mm;
+	struct folio *folio = page_folio(page);
+	pmd_t *pmd;
+	pmd_t pmde;
+	pte_t *ptep;
+	pte_t newpte;
+	spinlock_t *ptl;
+	unsigned long addr;
+	int err = -EFAULT;
+	struct mmu_notifier_range range;
+
+	addr = page_address_in_vma(folio, page, vma);
+	if (addr == -EFAULT)
+		goto out;
+
+	pmd = mm_find_pmd(mm, addr);
+	if (!pmd)
+		goto out;
+	/*
+	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
+	 * without holding anon_vma lock for write. So when looking for a
+	 * genuine pmde (in which to find pte), test present and !THP together.
+	 */
+	pmde = pmdp_get_lockless(pmd);
+	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
+		goto out;
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
+				addr + PAGE_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	if (!ptep)
+		goto out_mn;
+	if (!pte_same(ptep_get(ptep), orig_pte)) {
+		pte_unmap_unlock(ptep, ptl);
+		goto out_mn;
+	}
+	VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
+	VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
+			kfolio);
+
+	/*
+	 * No need to check ksm_use_zero_pages here: we can only have a
+	 * zero_page here if ksm_use_zero_pages was enabled already.
+	 */
+	if (!is_zero_pfn(page_to_pfn(kpage))) {
+		folio_get(kfolio);
+		folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
+		newpte = mk_pte(kpage, vma->vm_page_prot);
+	} else {
+		/*
+		 * Use pte_mkdirty to mark the zero page mapped by KSM, and then
+		 * we can easily track all KSM-placed zero pages by checking if
+		 * the dirty bit in zero page's PTE is set.
+		 */
+		newpte = pte_mkdirty(pte_mkspecial(pfn_pte(page_to_pfn(kpage), vma->vm_page_prot)));
+		ksm_map_zero_page(mm);
+		/*
+		 * We're replacing an anonymous page with a zero page, which is
+		 * not anonymous. We need to do proper accounting otherwise we
+		 * will get wrong values in /proc, and a BUG message in dmesg
+		 * when tearing down the mm.
+		 */
+		dec_mm_counter(mm, MM_ANONPAGES);
+	}
+
+	flush_cache_page(vma, addr, pte_pfn(ptep_get(ptep)));
+	/*
+	 * No need to notify as we are replacing a read only page with another
+	 * read only page with the same content.
+	 *
+	 * See Documentation/mm/mmu_notifier.rst
+	 */
+	ptep_clear_flush(vma, addr, ptep);
+	set_pte_at(mm, addr, ptep, newpte);
+
+	folio_remove_rmap_pte(folio, page, vma);
+	if (!folio_mapped(folio))
+		folio_free_swap(folio);
+	folio_put(folio);
+
+	pte_unmap_unlock(ptep, ptl);
+	err = 0;
+out_mn:
+	mmu_notifier_invalidate_range_end(&range);
+out:
+	return err;
+}
+
+#endif /* _KSM_COMMON_H */
diff --git a/mm/ksm.c b/mm/ksm.c
index 31a9bc365437..c495469a8329 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -44,6 +44,7 @@
 #include
 #include "internal.h"
 #include "mm_slot.h"
+#include "ksm-common.h"
 
 #define CREATE_TRACE_POINTS
 #include
@@ -677,28 +678,6 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_v
 	return (ret & VM_FAULT_OOM) ? -ENOMEM : 0;
 }
 
-static bool vma_ksm_compatible(struct vm_area_struct *vma)
-{
-	if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE | VM_PFNMAP |
-			     VM_IO | VM_DONTEXPAND | VM_HUGETLB |
-			     VM_MIXEDMAP| VM_DROPPABLE))
-		return false;		/* just ignore the advice */
-
-	if (vma_is_dax(vma))
-		return false;
-
-#ifdef VM_SAO
-	if (vma->vm_flags & VM_SAO)
-		return false;
-#endif
-#ifdef VM_SPARC_ADI
-	if (vma->vm_flags & VM_SPARC_ADI)
-		return false;
-#endif
-
-	return true;
-}
-
 static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
 		unsigned long addr)
 {
@@ -1234,202 +1213,6 @@ static int unmerge_and_remove_all_rmap_items(void)
 }
 #endif /* CONFIG_SYSFS */
 
-static u32 calc_checksum(struct page *page)
-{
-	u32 checksum;
-	void *addr = kmap_local_page(page);
-	checksum = xxhash(addr, PAGE_SIZE, 0);
-	kunmap_local(addr);
-	return checksum;
-}
-
-static int write_protect_page(struct vm_area_struct *vma, struct folio *folio,
-			      pte_t *orig_pte)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, 0, 0);
-	int swapped;
-	int err = -EFAULT;
-	struct mmu_notifier_range range;
-	bool anon_exclusive;
-	pte_t entry;
-
-	if (WARN_ON_ONCE(folio_test_large(folio)))
-		return err;
-
-	pvmw.address = page_address_in_vma(folio, folio_page(folio, 0), vma);
-	if (pvmw.address == -EFAULT)
-		goto out;
-
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, pvmw.address,
-				pvmw.address + PAGE_SIZE);
-	mmu_notifier_invalidate_range_start(&range);
-
-	if (!page_vma_mapped_walk(&pvmw))
-		goto out_mn;
-	if (WARN_ONCE(!pvmw.pte, "Unexpected PMD mapping?"))
-		goto out_unlock;
-
-	anon_exclusive = PageAnonExclusive(&folio->page);
-	entry = ptep_get(pvmw.pte);
-	if (pte_write(entry) || pte_dirty(entry) ||
-	    anon_exclusive || mm_tlb_flush_pending(mm)) {
-		swapped = folio_test_swapcache(folio);
-		flush_cache_page(vma, pvmw.address, folio_pfn(folio));
-		/*
-		 * Ok this is tricky, when get_user_pages_fast() run it doesn't
-		 * take any lock, therefore the check that we are going to make
-		 * with the pagecount against the mapcount is racy and
-		 * O_DIRECT can happen right after the check.
-		 * So we clear the pte and flush the tlb before the check
-		 * this assure us that no O_DIRECT can happen after the check
-		 * or in the middle of the check.
-		 *
-		 * No need to notify as we are downgrading page table to read
-		 * only not changing it to point to a new page.
-		 *
-		 * See Documentation/mm/mmu_notifier.rst
-		 */
-		entry = ptep_clear_flush(vma, pvmw.address, pvmw.pte);
-		/*
-		 * Check that no O_DIRECT or similar I/O is in progress on the
-		 * page
-		 */
-		if (folio_mapcount(folio) + 1 + swapped != folio_ref_count(folio)) {
-			set_pte_at(mm, pvmw.address, pvmw.pte, entry);
-			goto out_unlock;
-		}
-
-		/* See folio_try_share_anon_rmap_pte(): clear PTE first. */
-		if (anon_exclusive &&
-		    folio_try_share_anon_rmap_pte(folio, &folio->page)) {
-			set_pte_at(mm, pvmw.address, pvmw.pte, entry);
-			goto out_unlock;
-		}
-
-		if (pte_dirty(entry))
-			folio_mark_dirty(folio);
-		entry = pte_mkclean(entry);
-
-		if (pte_write(entry))
-			entry = pte_wrprotect(entry);
-
-		set_pte_at(mm, pvmw.address, pvmw.pte, entry);
-	}
-	*orig_pte = entry;
-	err = 0;
-
-out_unlock:
-	page_vma_mapped_walk_done(&pvmw);
-out_mn:
-	mmu_notifier_invalidate_range_end(&range);
-out:
-	return err;
-}
-
-/**
- * replace_page - replace page in vma by new ksm page
- * @vma:      vma that holds the pte pointing to page
- * @page:     the page we are replacing by kpage
- * @kpage:    the ksm page we replace page by
- * @orig_pte: the original value of the pte
- *
- * Returns 0 on success, -EFAULT on failure.
- */
-static int replace_page(struct vm_area_struct *vma, struct page *page,
-			struct page *kpage, pte_t orig_pte)
-{
-	struct folio *kfolio = page_folio(kpage);
-	struct mm_struct *mm = vma->vm_mm;
-	struct folio *folio = page_folio(page);
-	pmd_t *pmd;
-	pmd_t pmde;
-	pte_t *ptep;
-	pte_t newpte;
-	spinlock_t *ptl;
-	unsigned long addr;
-	int err = -EFAULT;
-	struct mmu_notifier_range range;
-
-	addr = page_address_in_vma(folio, page, vma);
-	if (addr == -EFAULT)
-		goto out;
-
-	pmd = mm_find_pmd(mm, addr);
-	if (!pmd)
-		goto out;
-	/*
-	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
-	 * without holding anon_vma lock for write. So when looking for a
-	 * genuine pmde (in which to find pte), test present and !THP together.
-	 */
-	pmde = pmdp_get_lockless(pmd);
-	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
-		goto out;
-
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
-				addr + PAGE_SIZE);
-	mmu_notifier_invalidate_range_start(&range);
-
-	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
-	if (!ptep)
-		goto out_mn;
-	if (!pte_same(ptep_get(ptep), orig_pte)) {
-		pte_unmap_unlock(ptep, ptl);
-		goto out_mn;
-	}
-	VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
-	VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
-			kfolio);
-
-	/*
-	 * No need to check ksm_use_zero_pages here: we can only have a
-	 * zero_page here if ksm_use_zero_pages was enabled already.
-	 */
-	if (!is_zero_pfn(page_to_pfn(kpage))) {
-		folio_get(kfolio);
-		folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
-		newpte = mk_pte(kpage, vma->vm_page_prot);
-	} else {
-		/*
-		 * Use pte_mkdirty to mark the zero page mapped by KSM, and then
-		 * we can easily track all KSM-placed zero pages by checking if
-		 * the dirty bit in zero page's PTE is set.
- */ - newpte = pte_mkdirty(pte_mkspecial(pfn_pte(page_to_pfn(kpage), vma->vm_page_prot))); - ksm_map_zero_page(mm); - /* - * We're replacing an anonymous page with a zero page, which is - * not anonymous. We need to do proper accounting otherwise we - * will get wrong values in /proc, and a BUG message in dmesg - * when tearing down the mm. - */ - dec_mm_counter(mm, MM_ANONPAGES); - } - - flush_cache_page(vma, addr, pte_pfn(ptep_get(ptep))); - /* - * No need to notify as we are replacing a read only page with another - * read only page with the same content. - * - * See Documentation/mm/mmu_notifier.rst - */ - ptep_clear_flush(vma, addr, ptep); - set_pte_at(mm, addr, ptep, newpte); - - folio_remove_rmap_pte(folio, page, vma); - if (!folio_mapped(folio)) - folio_free_swap(folio); - folio_put(folio); - - pte_unmap_unlock(ptep, ptl); - err = 0; -out_mn: - mmu_notifier_invalidate_range_end(&range); -out: - return err; -} - /* * try_to_merge_one_page - take two pages and merge them into one * @vma: the vma that holds the pte pointing to page diff --git a/mm/madvise.c b/mm/madvise.c index 0ceae57da7da..d9d678053ca2 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -1318,6 +1319,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma, return madvise_guard_install(vma, prev, start, end); case MADV_GUARD_REMOVE: return madvise_guard_remove(vma, prev, start, end); + case MADV_MERGE: + return sksm_merge(vma, start, end); } anon_name = anon_vma_name(vma); @@ -1422,6 +1425,9 @@ madvise_behavior_valid(int behavior) #ifdef CONFIG_MEMORY_FAILURE case MADV_SOFT_OFFLINE: case MADV_HWPOISON: +#endif +#ifdef CONFIG_SKSM + case MADV_MERGE: #endif return true; diff --git a/mm/memory.c b/mm/memory.c index 398c031be9ba..782363315b31 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3618,6 +3618,8 @@ static bool wp_can_reuse_anon_folio(struct folio *folio, */ if (folio_test_ksm(folio) || 
folio_ref_count(folio) > 3) return false; + if (folio_test_sksm(folio)) + return false; if (!folio_test_lru(folio)) /* * We cannot easily detect+handle references from diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 01eab25edf89..0bb9755896ce 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1122,6 +1122,7 @@ __always_inline bool free_pages_prepare(struct page *page, return false; } + sksm_page_remove(page); page_cpupid_reset_last(page); page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; reset_page_owner(page, order); @@ -1509,6 +1510,8 @@ inline void post_alloc_hook(struct page *page, unsigned int order, set_page_private(page, 0); set_page_refcounted(page); + set_page_checksum(page, 0); + init_page_sksm_node(page); arch_alloc_page(page, order); debug_pagealloc_map_pages(page, 1 << order); diff --git a/mm/sksm.c b/mm/sksm.c new file mode 100644 index 000000000000..190f6bc05f2d --- /dev/null +++ b/mm/sksm.c @@ -0,0 +1,190 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Synchronous memory merging support. + * + * This code enables synchronous dynamic sharing of identical pages + * found in different memory areas, even if they are not shared by + * fork(). + * + * Userspace must explicitly request for pages within specific address + * ranges to be merged with madvise MADV_MERGE. Those should *not* + * contain secrets, as side-channel timing attacks can allow a process + * to learn the existence of a known content within another process. + * + * The synchronous memory merging performs the memory merging + * synchronously within madvise. There is no global scan and no need + * for background daemon. + * + * The anonymous pages targeted for merge are write-protected and + * checksummed. They are then compared to other pages targeted for + * merge. + * + * The mergeable pages are added to a hash table indexed by checksum of + * their content. 
The hash value is derived from the page content + * checksum, and its comparison function is based on comparison of + * the page content. + * + * If a page is written to after being targeted for merge, a COW will be + * triggered, and thus a new page will be populated in its stead. + * + * The typical usage pattern expected from userspace is: + * + * 1) Userspace writes non-secret content to a MAP_PRIVATE page, thus + * triggering COW. + * + * 2) After userspace has completed writing to the page, it issues + * madvise MADV_MERGE on a range containing the page, which + * write-protect, checksum, and add the page to the sksm hash + * table. It then merges this page with other mergeable pages + * that have the same content. + * + * 3) It is typically expected that this page's content stays invariant + * for a long time. If userspace issues writes to the page after + * madvise MADV_MERGE, another COW will be triggered, which will + * populate a new page copy into the process page table and release + * the reference to the old page. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "internal.h" +#include "ksm-common.h" + +#define SKSM_HT_BITS 16 + +static DEFINE_MUTEX(sksm_lock); + +/* + * The hash is derived from the page checksum. + */ +static DEFINE_HASHTABLE(sksm_ht, SKSM_HT_BITS); + +void __sksm_page_remove(struct page *page) +{ + guard(mutex)(&sksm_lock); + hash_del(&page->sksm_node); +} + +static int sksm_merge_page(struct vm_area_struct *vma, struct page *page) +{ + struct folio *folio = page_folio(page); + pte_t orig_pte = __pte(0); + struct page *kpage; + int err = 0; + + folio_lock(folio); + + if (folio_test_large(folio)) { + if (split_huge_page(page)) + goto out_unlock; + folio = page_folio(page); + } + + /* Write protect page. */ + if (write_protect_page(vma, folio, &orig_pte) != 0) + goto out_unlock; + + /* Checksum page. 
+	page->checksum = calc_checksum(page);
+
+	guard(mutex)(&sksm_lock);
+
+	/* Merge page with duplicates. */
+	hash_for_each_possible(sksm_ht, kpage, sksm_node, page->checksum) {
+		if (page->checksum != kpage->checksum ||
+		    !pages_identical(page, kpage))
+			continue;
+		if (!get_page_unless_zero(kpage))
+			continue;
+		err = replace_page(vma, page, kpage, orig_pte);
+		put_page(kpage);
+		if (!err)
+			goto out_unlock;
+	}
+
+	/*
+	 * This page is not linked to its address_space anymore because it
+	 * can be shared with other processes and replace pages originally
+	 * associated with other address spaces.
+	 */
+	page->mapping = (void *) PAGE_MAPPING_ANON;
+
+	/* Add page to hash table. */
+	hash_add(sksm_ht, &page->sksm_node, page->checksum);
+out_unlock:
+	folio_unlock(folio);
+	return err;
+}
+
+static struct page *get_vma_page_from_addr(struct vm_area_struct *vma,
+					   unsigned long addr)
+{
+	struct page *page = NULL;
+	struct folio_walk fw;
+	struct folio *folio;
+
+	folio = folio_walk_start(&fw, vma, addr, 0);
+	if (folio) {
+		if (!folio_is_zone_device(folio) &&
+		    folio_test_anon(folio)) {
+			folio_get(folio);
+			page = fw.page;
+		}
+		folio_walk_end(&fw, vma);
+	}
+	if (page) {
+		flush_anon_page(vma, page, addr);
+		flush_dcache_page(page);
+	}
+	return page;
+}
+
+/* Called with mmap write lock held. */
+int sksm_merge(struct vm_area_struct *vma, unsigned long start,
+	       unsigned long end)
+{
+	unsigned long addr;
+	int err = 0;
+
+	if (!PAGE_ALIGNED(start) || !PAGE_ALIGNED(end))
+		return -EINVAL;
+	if (!vma_ksm_compatible(vma))
+		return 0;
+
+	/*
+	 * A number of pages can hang around indefinitely in per-cpu
+	 * LRU caches, with a raised page count preventing
+	 * write_protect_page from merging them.
+	 */
+	lru_add_drain_all();
+
+	for (addr = start; addr < end && !err; addr += PAGE_SIZE) {
+		struct page *page = get_vma_page_from_addr(vma, addr);
+
+		if (!page)
+			continue;
+		err = sksm_merge_page(vma, page);
+		put_page(page);
+	}
+	return err;
+}
+
+static int __init sksm_init(void)
+{
+	struct page *zero_page = ZERO_PAGE(0);
+
+	zero_page->checksum = calc_checksum(zero_page);
+	/* Add page to hash table. */
+	hash_add(sksm_ht, &zero_page->sksm_node, zero_page->checksum);
+	return 0;
+}
+subsys_initcall(sksm_init);

From patchwork Fri Feb 28 02:30:43 2025
From: Mathieu Desnoyers
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Linus Torvalds,
 Matthew Wilcox, Olivier Dion, linux-mm@kvack.org
Subject: [RFC PATCH 2/2] selftests/kskm: Introduce SKSM basic test
Date: Thu, 27 Feb 2025 21:30:43 -0500
Message-Id: <20250228023043.83726-3-mathieu.desnoyers@efficios.com>
In-Reply-To: <20250228023043.83726-1-mathieu.desnoyers@efficios.com>
References: <20250228023043.83726-1-mathieu.desnoyers@efficios.com>
MIME-Version: 1.0
Introduce a basic selftest for SKSM.

See ./basic_test -h for options.

Signed-off-by: Mathieu Desnoyers
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Matthew Wilcox
Cc: Olivier Dion
Cc: linux-mm@kvack.org
---
 tools/testing/selftests/sksm/.gitignore   |   2 +
 tools/testing/selftests/sksm/Makefile     |  14 ++
 tools/testing/selftests/sksm/basic_test.c | 217 ++++++++++++++++++++++
 3 files changed, 233 insertions(+)
 create mode 100644 tools/testing/selftests/sksm/.gitignore
 create mode 100644 tools/testing/selftests/sksm/Makefile
 create mode 100644 tools/testing/selftests/sksm/basic_test.c

diff --git a/tools/testing/selftests/sksm/.gitignore b/tools/testing/selftests/sksm/.gitignore
new file mode 100644
index 000000000000..0f5b0baa91e7
--- /dev/null
+++ b/tools/testing/selftests/sksm/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+basic_test
diff --git a/tools/testing/selftests/sksm/Makefile b/tools/testing/selftests/sksm/Makefile
new file mode 100644
index 000000000000..ec1a10783bda
--- /dev/null
+++ b/tools/testing/selftests/sksm/Makefile
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0+ OR MIT
+
+top_srcdir = ../../../..
+
+CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -L$(OUTPUT) -Wl,-rpath=./ \
+	  $(CLANG_FLAGS) -I$(top_srcdir)/tools/include
+LDLIBS += -lpthread
+
+TEST_GEN_PROGS = basic_test
+
+include ../lib.mk
+
+$(OUTPUT)/%: %.c
+	$(CC) $(CFLAGS) $< $(LDLIBS) -o $@
diff --git a/tools/testing/selftests/sksm/basic_test.c b/tools/testing/selftests/sksm/basic_test.c
new file mode 100644
index 000000000000..1a7571a999d2
--- /dev/null
+++ b/tools/testing/selftests/sksm/basic_test.c
@@ -0,0 +1,217 @@
+// SPDX-License-Identifier: LGPL-2.1
+/*
+ * Basic test for SKSM.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <poll.h>
+#include <errno.h>
+#include <sys/mman.h>
+
+#ifndef MADV_MERGE
+#define MADV_MERGE	26
+#endif
+
+#define PAGE_SIZE	4096
+
+#define WRITE_ONCE(x, val)	((*(volatile typeof(x) *) &(x)) = (val))
+
+static int opt_stop_at = 0, opt_pause = 0;
+
+struct test_page {
+	char array[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+};
+
+struct test_page2 {
+	char array[2 * PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+};
+
+/* identical to zero page. */
+static struct test_page zero;
+
+/* a1 and a2 are identical. */
+static struct test_page a1 = {
+	.array[0] = 0x42,
+	.array[1] = 0x42,
+};
+
+static struct test_page a2 = {
+	.array[0] = 0x42,
+	.array[1] = 0x42,
+};
+
+/* b1 and b2 are identical. */
+static struct test_page2 b1 = {
+	.array[0] = 0x43,
+	.array[1] = 0x43,
+	.array[PAGE_SIZE] = 0x44,
+	.array[PAGE_SIZE + 1] = 0x44,
+};
+
+static struct test_page2 b2 = {
+	.array[0] = 0x43,
+	.array[1] = 0x43,
+	.array[PAGE_SIZE] = 0x44,
+	.array[PAGE_SIZE + 1] = 0x44,
+};
+
+static void touch_pages(void *p, size_t len)
+{
+	size_t i;
+
+	for (i = 0; i < len; i += PAGE_SIZE)
+		WRITE_ONCE(((char *)p)[i], ((char *)p)[i]);
+}
+
+static void test_step(char step)
+{
+	printf("\nTest step: <%c>\n", step);
+	if (opt_pause) {
+		printf("Press ENTER to continue...\n");
+		getchar();
+	}
+	if (opt_stop_at == step) {
+		poll(NULL, 0, -1);
+		exit(0);
+	}
+}
+
+static void show_usage(int argc, char **argv)
+{
+	printf("Usage : %s [OPTIONS]\n", argv[0]);
+	printf("OPTIONS:\n");
+	printf("	[-s stop_at]	Stop test at step A, B, C, D, E, or F and wait forever.\n");
+	printf("	[-p]		Pause test between steps (await newline from the console).\n");
+	printf("	[-h]		Show this help.\n");
+	printf("\n");
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+
+	for (i = 1; i < argc; i++) {
+		if (argv[i][0] != '-')
+			continue;
+		switch (argv[i][1]) {
+		case 's':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				return -1;
+			}
+			opt_stop_at = *argv[i + 1];
+			switch (opt_stop_at) {
+			case 'A':
+			case 'B':
+			case 'C':
+			case 'D':
+			case 'E':
+			case 'F':
+				break;
+			default:
+				show_usage(argc, argv);
+				return -1;
+			}
+			i++;
+			break;
+		case 'p':
+			opt_pause = 1;
+			break;
+		case 'h':
+			show_usage(argc, argv);
+			return 0;
+		default:
+			show_usage(argc, argv);
+			return -1;
+		}
+	}
+
+	printf("PID: %d\n", getpid());
+	printf("Shared mapping (write-protected)\n");
+
+	test_step('A');
+
+	printf("madvise MADV_MERGE a1\n");
+	if (madvise(&a1, sizeof(a1), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE a2\n");
+	if (madvise(&a2, sizeof(a2), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE b1\n");
+	if (madvise(&b1, sizeof(b1), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE b2\n");
+	if (madvise(&b2, sizeof(b2), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE zero\n");
+	if (madvise(&zero, sizeof(zero), MADV_MERGE))
+		goto error;
+
+	test_step('B');
+
+	printf("Trigger COW\n");
+	touch_pages(&zero, sizeof(zero));
+	touch_pages(&a1, sizeof(a1));
+	touch_pages(&a2, sizeof(a2));
+	touch_pages(&b1, sizeof(b1));
+	touch_pages(&b2, sizeof(b2));
+
+	test_step('C');
+
+	printf("madvise MADV_MERGE a1\n");
+	if (madvise(&a1, sizeof(a1), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE a2\n");
+	if (madvise(&a2, sizeof(a2), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE b1\n");
+	if (madvise(&b1, sizeof(b1), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE b2\n");
+	if (madvise(&b2, sizeof(b2), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE zero\n");
+	if (madvise(&zero, sizeof(zero), MADV_MERGE))
+		goto error;
+
+	test_step('D');
+
+	printf("Trigger COW\n");
+	touch_pages(&zero, sizeof(zero));
+	touch_pages(&a1, sizeof(a1));
+	touch_pages(&a2, sizeof(a2));
+	touch_pages(&b1, sizeof(b1));
+	touch_pages(&b2, sizeof(b2));
+
+	test_step('E');
+
+	printf("madvise MADV_MERGE a1\n");
+	if (madvise(&a1, sizeof(a1), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE a2\n");
+	if (madvise(&a2, sizeof(a2), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE b1\n");
+	if (madvise(&b1, sizeof(b1), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE b2\n");
+	if (madvise(&b2, sizeof(b2), MADV_MERGE))
+		goto error;
+	printf("madvise MADV_MERGE zero\n");
+	if (madvise(&zero, sizeof(zero), MADV_MERGE))
+		goto error;
+
+	test_step('F');
+
+	return 0;
+
+error:
+	perror("madvise");
+	return -1;
+}
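
For reviewers who want to reason about the lookup discipline without reading the kernel code: the merge path in sksm_merge_page() reduces to "checksum selects the bucket, a full content comparison confirms identity before merging". Below is a minimal userspace C sketch of that discipline. It is illustrative only: merge_or_insert(), the FNV-1a hash standing in for calc_checksum(), and memcmp() standing in for pages_identical() are all stand-ins, not the patch's actual implementation, and the real code additionally handles refcounts, PTE replacement, and locking.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define HT_BITS   8
#define HT_SIZE   (1u << HT_BITS)

/* Stand-in for the kernel's calc_checksum(): FNV-1a over the page. */
static uint32_t calc_checksum(const unsigned char *page)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < PAGE_SIZE; i++)
		h = (h ^ page[i]) * 16777619u;
	return h;
}

struct ht_node {
	struct ht_node *next;
	uint32_t checksum;
	unsigned char *page;
};

static struct ht_node *buckets[HT_SIZE];

/*
 * Return an existing page with identical content, or track this one.
 * Mirrors the sksm_merge_page() discipline: the checksum only selects
 * the bucket; a full memcmp() confirms identity before "merging",
 * which guards against checksum collisions.
 */
static unsigned char *merge_or_insert(unsigned char *page)
{
	uint32_t sum = calc_checksum(page);
	struct ht_node *n;

	for (n = buckets[sum & (HT_SIZE - 1)]; n; n = n->next) {
		if (n->checksum == sum &&
		    !memcmp(n->page, page, PAGE_SIZE))
			return n->page;	/* merge: share the existing copy */
	}
	n = malloc(sizeof(*n));
	if (!n)
		return page;		/* on allocation failure, don't track */
	n->checksum = sum;
	n->page = page;
	n->next = buckets[sum & (HT_SIZE - 1)];
	buckets[sum & (HT_SIZE - 1)] = n;
	return page;			/* first copy becomes the shared one */
}
```

The two-level check is the design point worth noting: the checksum gives O(1) candidate lookup, while the byte-wise comparison keeps correctness independent of hash quality.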