From patchwork Mon Jul 18 11:47:45 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12921633
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Andrew Morton, Nadav Amit, Mike Kravetz, Hugh Dickins, Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [PATCH v2 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Mon, 18 Jul 2022 04:47:45 -0700
Message-Id: <20220718114748.2623-3-namit@vmware.com>
In-Reply-To: <20220718114748.2623-1-namit@vmware.com>
References: <20220718114748.2623-1-namit@vmware.com>

From: Nadav Amit

Introduce access hints in userfaultfd. The expectation is that userspace
sets the access hint when it resolves a page fault that actually occurred
on the page, and does not set it when prefaulting memory. The exact
behavior of the kernel in response to the hints is not part of the
userfaultfd API.

At this time, the access hint is only used to decide whether to set the
access bit, similarly to the way it is done in do_set_pte(). On x86, PTEs
are currently always marked young, including prefaulted ones. On arm64,
however, prefaulted PTEs are marked old (when the access bit is
supported).

For backward compatibility, if access hints are not enabled, the kernel
behaves as if the access hint was provided.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c                 | 39 ++++++++++++++++++++++++++++----
 include/linux/userfaultfd_k.h    |  1 +
 include/uapi/linux/userfaultfd.h | 20 +++++++++++++++-
 mm/internal.h                    | 13 +++++++++++
 mm/memory.c                      | 12 ----------
 mm/userfaultfd.c                 | 11 +++++++--
 6 files changed, 77 insertions(+), 19 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 2ae24327beec..8d8792b27c53 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1708,13 +1708,21 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	ret = -EINVAL;
 	if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src)
 		goto out;
-	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
+	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP|
+				 UFFDIO_COPY_MODE_ACCESS_LIKELY))
 		goto out;
 
 	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
 
 	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
 
+	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
+		if (uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY)
+			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	} else {
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	}
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
 				   uffdio_copy.len, &ctx->mmap_changing,
@@ -1765,9 +1773,17 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	if (ret)
 		goto out;
 	ret = -EINVAL;
-	if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
+	if (uffdio_zeropage.mode & ~(UFFDIO_ZEROPAGE_MODE_DONTWAKE|
+				     UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY))
 		goto out;
 
+	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
+		if (uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY)
+			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	} else {
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	}
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
 				     uffdio_zeropage.range.len,
@@ -1817,7 +1833,8 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 		return ret;
 
 	if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE |
-			       UFFDIO_WRITEPROTECT_MODE_WP))
+			       UFFDIO_WRITEPROTECT_MODE_WP |
+			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY))
 		return -EINVAL;
 
 	mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP;
@@ -1827,6 +1844,12 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 		return -EINVAL;
 
 	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
+	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
+		if (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY)
+			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	} else {
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	}
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
@@ -1879,9 +1902,17 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	    uffdio_continue.range.start) {
 		goto out;
 	}
-	if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+	if (uffdio_continue.mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE|
+				     UFFDIO_CONTINUE_MODE_ACCESS_LIKELY))
 		goto out;
 
+	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
+		if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_ACCESS_LIKELY)
+			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	} else {
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	}
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
 				     uffdio_continue.range.len,
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a63b61823984..b326798b5677 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -59,6 +59,7 @@ typedef unsigned int __bitwise uffd_flags_t;
 
 #define UFFD_FLAGS_NONE		((__force uffd_flags_t)0)
 #define UFFD_FLAGS_WP		((__force uffd_flags_t)BIT(0))
+#define UFFD_FLAGS_ACCESS_LIKELY	((__force uffd_flags_t)BIT(1))
 
 extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				    struct vm_area_struct *dst_vma,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 7d32b1e797fb..02e0c1f56939 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -34,7 +34,8 @@
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
 			   UFFD_FEATURE_EXACT_ADDRESS |		\
-			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM |	\
+			   UFFD_FEATURE_ACCESS_HINTS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -199,6 +200,9 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
 	 * write-protection mode is supported on both shmem and hugetlbfs.
+	 *
+	 * UFFD_FEATURE_ACCESS_HINTS indicates that the ioctl operations
+	 * support the UFFDIO_*_MODE_ACCESS_LIKELY hints.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -213,6 +217,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
+#define UFFD_FEATURE_ACCESS_HINTS		(1<<13)
 	__u64 features;
 
 	__u64 ioctls;
@@ -247,8 +252,14 @@ struct uffdio_copy {
 	 * the fly. UFFDIO_COPY_MODE_WP is available only if the
 	 * write protected ioctl is implemented for the range
 	 * according to the uffdio_register.ioctls.
+	 *
+	 * UFFDIO_COPY_MODE_ACCESS_LIKELY provides a hint to the kernel that the
+	 * page is likely to be access in the near future. Providing the hint
+	 * properly can improve performance.
+	 *
 	 */
 #define UFFDIO_COPY_MODE_WP			((__u64)1<<1)
+#define UFFDIO_COPY_MODE_ACCESS_LIKELY		((__u64)1<<2)
 	__u64 mode;
 
 	/*
@@ -261,6 +272,7 @@ struct uffdio_copy {
 struct uffdio_zeropage {
 	struct uffdio_range range;
 #define UFFDIO_ZEROPAGE_MODE_DONTWAKE		((__u64)1<<0)
+#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<1)
 	__u64 mode;
 
 	/*
@@ -280,6 +292,10 @@ struct uffdio_writeprotect {
 	 * UFFDIO_WRITEPROTECT_MODE_DONTWAKE: set the flag to avoid waking up
 	 * any wait thread after the operation succeeds.
 	 *
+	 * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY provides a hint to the kernel
+	 * that the page is likely to be access in the near future. Providing
+	 * the hint properly can improve performance.
+	 *
 	 * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
 	 * therefore DONTWAKE flag is meaningless with WP=1. Removing write
 	 * protection (WP=0) in response to a page fault wakes the faulting
@@ -287,12 +303,14 @@ struct uffdio_writeprotect {
 	 */
 #define UFFDIO_WRITEPROTECT_MODE_WP		((__u64)1<<0)
 #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE	((__u64)1<<1)
+#define UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY	((__u64)1<<2)
 	__u64 mode;
 };
 
 struct uffdio_continue {
 	struct uffdio_range range;
 #define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
+#define UFFDIO_CONTINUE_MODE_ACCESS_LIKELY	((__u64)1<<1)
 	__u64 mode;
 
 	/*
diff --git a/mm/internal.h b/mm/internal.h
index c0f8fbe0445b..d035b77b4f2f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 struct folio_batch;
 
@@ -861,4 +862,16 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
 
 DECLARE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
+#ifndef arch_wants_old_prefaulted_pte
+static inline bool arch_wants_old_prefaulted_pte(void)
+{
+	/*
+	 * Transitioning a PTE from 'old' to 'young' can be expensive on
+	 * some architectures, even if it's performed in hardware. By
+	 * default, "false" means prefaulted entries will be 'young'.
+	 */
+	return false;
+}
+#endif
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/memory.c b/mm/memory.c
index 580c62febe42..31ec3f0071a2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -137,18 +137,6 @@ static inline bool arch_faults_on_old_pte(void)
 }
 #endif
 
-#ifndef arch_wants_old_prefaulted_pte
-static inline bool arch_wants_old_prefaulted_pte(void)
-{
-	/*
-	 * Transitioning a PTE from 'old' to 'young' can be expensive on
-	 * some architectures, even if it's performed in hardware. By
-	 * default, "false" means prefaulted entries will be 'young'.
-	 */
-	return false;
-}
-#endif
-
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 421784d26651..c15679f3eb6a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -65,6 +65,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	bool writable = dst_vma->vm_flags & VM_WRITE;
 	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
 	bool page_in_cache = page->mapping;
+	bool prefault = !(uffd_flags & UFFD_FLAGS_ACCESS_LIKELY);
 	spinlock_t *ptl;
 	struct inode *inode;
 	pgoff_t offset, max_off;
@@ -92,6 +93,11 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 		 */
 		_dst_pte = pte_wrprotect(_dst_pte);
 
+	if (prefault && arch_wants_old_prefaulted_pte())
+		_dst_pte = pte_mkold(_dst_pte);
+	else
+		_dst_pte = pte_sw_mkyoung(_dst_pte);
+
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 
 	if (vma_is_shmem(dst_vma)) {
@@ -202,7 +208,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 			      pmd_t *dst_pmd,
 			      struct vm_area_struct *dst_vma,
-			      unsigned long dst_addr)
+			      unsigned long dst_addr,
+			      uffd_flags_t uffd_flags)
 {
 	pte_t _dst_pte, *dst_pte;
 	spinlock_t *ptl;
@@ -495,7 +502,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 						       uffd_flags);
 		else
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
-						 dst_vma, dst_addr);
+						 dst_vma, dst_addr, uffd_flags);
 	} else {
 		err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					     dst_addr, src_addr,

From patchwork Mon Jul 18 11:47:46 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12921634
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Andrew Morton, Nadav Amit, Mike Kravetz, Hugh Dickins, Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [PATCH v2 3/5] userfaultfd: introduce write-likely mode for uffd operations
Date: Mon, 18 Jul 2022 04:47:46 -0700
Message-Id: <20220718114748.2623-4-namit@vmware.com>
In-Reply-To: <20220718114748.2623-1-namit@vmware.com>
References: <20220718114748.2623-1-namit@vmware.com>

From: Nadav Amit

Introduce write-likely hints for uffd. These hints will be used in a
future patch to decide whether to attempt to map pages as writable in the
page tables, or only to mark them logically as writable.

This allows userspace to choose between faster subsequent access to a
page and the ability to remove the page later, potentially without
writeback and a TLB flush.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c                 | 32 ++++++++++++++++++++++++--------
 include/linux/userfaultfd_k.h    |  1 +
 include/uapi/linux/userfaultfd.h | 13 ++++++++++++-
 3 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 8d8792b27c53..3027d228550a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1709,7 +1709,8 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src)
 		goto out;
 	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP|
-				 UFFDIO_COPY_MODE_ACCESS_LIKELY))
+				 UFFDIO_COPY_MODE_ACCESS_LIKELY|
+				 UFFDIO_COPY_MODE_WRITE_LIKELY))
 		goto out;
 
 	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
@@ -1719,8 +1720,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
 		if (uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY)
 			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		if (uffdio_copy.mode & UFFDIO_COPY_MODE_WRITE_LIKELY)
+			uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 	} else {
-		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY |
+			      UFFD_FLAGS_WRITE_LIKELY;
 	}
 
 	if (mmget_not_zero(ctx->mm)) {
@@ -1774,14 +1778,18 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 		goto out;
 	ret = -EINVAL;
 	if (uffdio_zeropage.mode & ~(UFFDIO_ZEROPAGE_MODE_DONTWAKE|
-				     UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY))
+				     UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY|
+				     UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY))
 		goto out;
 
 	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
 		if (uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY)
 			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		if (uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY)
+			uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 	} else {
-		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY |
+			      UFFD_FLAGS_WRITE_LIKELY;
 	}
 
 	if (mmget_not_zero(ctx->mm)) {
@@ -1834,7 +1842,8 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 
 	if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE |
 			       UFFDIO_WRITEPROTECT_MODE_WP |
-			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY))
+			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY |
+			       UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY))
 		return -EINVAL;
 
 	mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP;
@@ -1847,8 +1856,11 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
 		if (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY)
 			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		if (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY)
+			uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 	} else {
-		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY |
+			      UFFD_FLAGS_WRITE_LIKELY;
 	}
 
 	if (mmget_not_zero(ctx->mm)) {
@@ -1903,14 +1915,18 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 		goto out;
 	}
 	if (uffdio_continue.mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE|
-				     UFFDIO_CONTINUE_MODE_ACCESS_LIKELY))
+				     UFFDIO_CONTINUE_MODE_ACCESS_LIKELY|
+				     UFFDIO_CONTINUE_MODE_WRITE_LIKELY))
 		goto out;
 
 	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
 		if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_ACCESS_LIKELY)
 			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_WRITE_LIKELY)
+			uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 	} else {
-		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY |
+			      UFFD_FLAGS_WRITE_LIKELY;
 	}
 
 	if (mmget_not_zero(ctx->mm)) {
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index b326798b5677..4968c86938b2 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -60,6 +60,7 @@ typedef unsigned int __bitwise uffd_flags_t;
 #define UFFD_FLAGS_NONE		((__force uffd_flags_t)0)
 #define UFFD_FLAGS_WP		((__force uffd_flags_t)BIT(0))
 #define UFFD_FLAGS_ACCESS_LIKELY	((__force uffd_flags_t)BIT(1))
+#define UFFD_FLAGS_WRITE_LIKELY	((__force uffd_flags_t)BIT(2))
 
 extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				    struct vm_area_struct *dst_vma,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 02e0c1f56939..f52cbe4c9c44 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -202,7 +202,7 @@ struct uffdio_api {
 	 * write-protection mode is supported on both shmem and hugetlbfs.
 	 *
 	 * UFFD_FEATURE_ACCESS_HINTS indicates that the ioctl operations
-	 * support the UFFDIO_*_MODE_ACCESS_LIKELY hints.
+	 * support the UFFDIO_*_MODE_[ACCESS|WRITE]_LIKELY hints.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -257,9 +257,13 @@ struct uffdio_copy {
 	 * page is likely to be access in the near future. Providing the hint
 	 * properly can improve performance.
 	 *
+	 * UFFDIO_COPY_MODE_WRITE_LIKELY provides a hint to the kernel that the
+	 * page is likely to be written in the near future. Providing the hint
+	 * properly can improve performance.
 	 */
 #define UFFDIO_COPY_MODE_WP			((__u64)1<<1)
 #define UFFDIO_COPY_MODE_ACCESS_LIKELY		((__u64)1<<2)
+#define UFFDIO_COPY_MODE_WRITE_LIKELY		((__u64)1<<3)
 	__u64 mode;
 
 	/*
@@ -273,6 +277,7 @@ struct uffdio_zeropage {
 	struct uffdio_range range;
 #define UFFDIO_ZEROPAGE_MODE_DONTWAKE		((__u64)1<<0)
 #define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<1)
+#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY	((__u64)1<<2)
 	__u64 mode;
 
 	/*
@@ -296,6 +301,10 @@ struct uffdio_writeprotect {
 	 * that the page is likely to be access in the near future. Providing
 	 * the hint properly can improve performance.
 	 *
+	 * UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY: provides a hint to the kernel
+	 * that the page is likely to be written in the near future. Providing
+	 * the hint properly can improve performance.
+	 *
 	 * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
 	 * therefore DONTWAKE flag is meaningless with WP=1. Removing write
 	 * protection (WP=0) in response to a page fault wakes the faulting
@@ -304,6 +313,7 @@ struct uffdio_writeprotect {
 #define UFFDIO_WRITEPROTECT_MODE_WP		((__u64)1<<0)
 #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE	((__u64)1<<1)
 #define UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY	((__u64)1<<2)
+#define UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY	((__u64)1<<3)
 	__u64 mode;
 };
 
@@ -311,6 +321,7 @@ struct uffdio_continue {
 	struct uffdio_range range;
 #define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
 #define UFFDIO_CONTINUE_MODE_ACCESS_LIKELY	((__u64)1<<1)
+#define UFFDIO_CONTINUE_MODE_WRITE_LIKELY	((__u64)1<<2)
 	__u64 mode;
 
 	/*

From patchwork Mon Jul 18 11:47:47 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12921635
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Andrew Morton, Nadav Amit, David Hildenbrand, Mike Kravetz, Hugh Dickins, Axel Rasmussen, Mike Rapoport, Peter Xu
Subject: [PATCH v2 4/5] userfaultfd: zero access/write hints
Date: Mon, 18 Jul 2022 04:47:47 -0700
Message-Id: <20220718114748.2623-5-namit@vmware.com>
In-Reply-To: <20220718114748.2623-1-namit@vmware.com>
References: <20220718114748.2623-1-namit@vmware.com>

From: Nadav Amit

When userfaultfd provides a zeropage in response to an ioctl, it maps a
read-only alias to the zero page.
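A minimal userspace model of the decision this patch adds, assuming the UFFDIO_ZEROPAGE mode bits from the uapi hunk above and assuming that the WRITE_LIKELY mode bit is translated into the kernel-internal UFFD_FLAGS_WRITE_LIKELY flag tested in mfill_atomic_pte(). The SKETCH_* names and both functions are hypothetical and are not kernel or uapi API: sketch_zeropage_mode() composes the mode the way the selftest in patch 5/5 does, and sketch_zeropage_backing() mirrors the new branch, which maps a cleared page only when the write-likely hint is set and write-protection is not requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit values copied from the struct uffdio_zeropage hunk of this series.
 * The SKETCH_ prefix is hypothetical; it avoids depending on a patched
 * <linux/userfaultfd.h>.
 */
#define SKETCH_ZEROPAGE_MODE_DONTWAKE      ((uint64_t)1 << 0)
#define SKETCH_ZEROPAGE_MODE_ACCESS_LIKELY ((uint64_t)1 << 1)
#define SKETCH_ZEROPAGE_MODE_WRITE_LIKELY  ((uint64_t)1 << 2)
#define SKETCH_ZEROPAGE_KNOWN_MODES        (SKETCH_ZEROPAGE_MODE_DONTWAKE | \
					    SKETCH_ZEROPAGE_MODE_ACCESS_LIKELY | \
					    SKETCH_ZEROPAGE_MODE_WRITE_LIKELY)

/* Which page a zeropage request ends up mapping in this model. */
enum sketch_backing {
	SKETCH_ZERO_PAGE_ALIAS,	/* read-only alias of the shared zero page */
	SKETCH_CLEARED_PAGE,	/* freshly allocated, zero-filled page */
};

/* Compose the mode as the selftest's __uffdio_zeropage() does. */
static uint64_t sketch_zeropage_mode(bool access_likely, bool write_likely)
{
	uint64_t mode = 0;

	if (access_likely)
		mode |= SKETCH_ZEROPAGE_MODE_ACCESS_LIKELY;
	if (write_likely)
		mode |= SKETCH_ZEROPAGE_MODE_WRITE_LIKELY;
	return mode;
}

/*
 * Model of the new else-if branch in mfill_atomic_pte(). 'wp' stands in
 * for the kernel-internal UFFD_FLAGS_WP (assumption: not derived here).
 */
static enum sketch_backing sketch_zeropage_backing(uint64_t mode, bool wp)
{
	/*
	 * Mainline userfaultfd rejects unknown UFFDIO_ZEROPAGE mode bits
	 * with -EINVAL; this sketch treats them as a caller bug instead.
	 */
	assert((mode & ~SKETCH_ZEROPAGE_KNOWN_MODES) == 0);

	if (!wp && (mode & SKETCH_ZEROPAGE_MODE_WRITE_LIKELY))
		return SKETCH_CLEARED_PAGE;
	return SKETCH_ZERO_PAGE_ALIAS;
}
```

With both selftest modifiers enabled the mode is 0x6, which the model maps to a cleared page unless wp is true; with only the access-likely hint, the zero-page alias is kept. The model deliberately ignores the rest of the kernel path (allocation, memcg charging, PTE installation).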
If the page is later written (which is the likely scenario), a page fault
occurs, and the page-fault handler allocates a page and rewires the page
tables.

This is an expensive flow for cases in which a page is likely to be
written to. Users can instead use the copy ioctl to map a zero-filled
page (by copying zeros), but this is also wasteful.

Allow userfaultfd users to efficiently map writable, zero-initialized
pages: if UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY is provided (and
write-protection is not requested), the kernel maps a cleared page instead
of an alias to the zero page.

Suggested-by: David Hildenbrand
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Mike Rapoport
Acked-by: Peter Xu
Signed-off-by: Nadav Amit
Reviewed-by: David Hildenbrand
---
 mm/userfaultfd.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index c15679f3eb6a..954c6980b29f 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -241,6 +241,37 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	return ret;
 }
 
+static int mfill_clearpage_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+			       struct vm_area_struct *dst_vma,
+			       unsigned long dst_addr,
+			       uffd_flags_t uffd_flags)
+{
+	struct page *page;
+	int ret;
+
+	ret = -ENOMEM;
+	page = alloc_zeroed_user_highpage_movable(dst_vma, dst_addr);
+	if (!page)
+		goto out;
+
+	/* The PTE is not marked as dirty unconditionally */
+	SetPageDirty(page);
+	__SetPageUptodate(page);
+
+	if (mem_cgroup_charge(page_folio(page), dst_vma->vm_mm, GFP_KERNEL))
+		goto out_release;
+
+	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+				       page, true, uffd_flags);
+	if (ret)
+		goto out_release;
+out:
+	return ret;
+out_release:
+	put_page(page);
+	goto out;
+}
+
 /* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
 static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
 				pmd_t *dst_pmd,
@@ -500,6 +531,10 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 			err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					       dst_addr, src_addr, page,
 					       uffd_flags);
+		else if (!(uffd_flags & UFFD_FLAGS_WP) &&
+			 (uffd_flags & UFFD_FLAGS_WRITE_LIKELY))
+			err = mfill_clearpage_pte(dst_mm, dst_pmd, dst_vma,
+						  dst_addr, uffd_flags);
 		else
 			err = mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma,
 						 dst_addr, uffd_flags);

From patchwork Mon Jul 18 11:47:48 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12921636
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Andrew Morton, Nadav Amit, Mike Kravetz, Hugh Dickins, Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [PATCH v2 5/5] selftest/userfaultfd: test read/write hints
Date: Mon, 18 Jul 2022 04:47:48 -0700
Message-Id: <20220718114748.2623-6-namit@vmware.com>
In-Reply-To: <20220718114748.2623-1-namit@vmware.com>
References: <20220718114748.2623-1-namit@vmware.com>

From: Nadav Amit

Test UFFDIO_*_MODE_ACCESS_LIKELY and UFFDIO_*_MODE_WRITE_LIKELY.

Introduce test-type modifiers (":access_likely" and ":write_likely") that
trigger the use of the hints. Add the tests to run_vmtests.sh, using an
array to run the different userfaultfd configurations.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 tools/testing/selftests/vm/run_vmtests.sh | 16 ++++---
 tools/testing/selftests/vm/userfaultfd.c  | 54 +++++++++++++++++++++--
 2 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index 27c01c35c7a9..296862547ff9 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -120,11 +120,17 @@ run_test ./gup_test -a
 # Dump pages 0, 19, and 4096, using pin_user_pages:
 run_test ./gup_test -ct -F 0x1 0 19 0x1000
 
-run_test ./userfaultfd anon 20 16
-# Test requires source and destination huge pages. Size of source
-# (half_ufd_size_MB) is passed as argument to test.
-run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
-run_test ./userfaultfd shmem 20 16
+uffd_mods=("" ":access_likely" ":access_likely:write_likely" ":write_likely")
+
+for mod in "${uffd_mods[@]}"; do
+	run_test ./userfaultfd anon${mod} 20 16
+	# Hugetlb tests require source and destination huge pages. Pass in half the
+	# size ($half_ufd_size_MB), which is used for *each*.
+	run_test ./userfaultfd hugetlb${mod} "$half_ufd_size_MB" 32
+	run_test ./userfaultfd hugetlb_shared${mod} "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+	rm -f "$mnt"/uffd-test
+	run_test ./userfaultfd shmem${mod} 20 16
+done
 
 #cleanup
 umount "$mnt"
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 7c3f1b0ab468..d54f65246bd8 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -85,6 +85,8 @@ static volatile bool test_uffdio_zeropage_eexist = true;
 static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
+static bool test_access_likely;
+static bool test_write_likely;
 
 static bool map_shared;
 static int shm_fd;
@@ -518,6 +520,12 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
 	/* Undo write-protect, do wakeup after that */
 	prms.mode = wp ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
 
+	if (test_access_likely)
+		prms.mode |= UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		prms.mode |= UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY;
+
 	if (ioctl(ufd, UFFDIO_WRITEPROTECT, &prms))
 		err("clear WP failed: address=0x%"PRIx64, (uint64_t)start);
 }
@@ -531,6 +539,12 @@ static void continue_range(int ufd, __u64 start, __u64 len)
 	req.range.len = len;
 	req.mode = 0;
 
+	if (test_access_likely)
+		req.mode |= UFFDIO_CONTINUE_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		req.mode |= UFFDIO_CONTINUE_MODE_WRITE_LIKELY;
+
 	if (ioctl(ufd, UFFDIO_CONTINUE, &req))
 		err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
 		    (uint64_t)start);
@@ -621,6 +635,13 @@ static int __copy_page(int ufd, unsigned long offset, bool retry)
 		uffdio_copy.mode = UFFDIO_COPY_MODE_WP;
 	else
 		uffdio_copy.mode = 0;
+
+	if (test_access_likely)
+		uffdio_copy.mode |= UFFDIO_COPY_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		uffdio_copy.mode |= UFFDIO_COPY_MODE_WRITE_LIKELY;
+
 	uffdio_copy.copy = 0;
 	if (ioctl(ufd, UFFDIO_COPY, &uffdio_copy)) {
 		/* real retval in ufdio_copy.copy */
@@ -1048,6 +1069,13 @@ static int __uffdio_zeropage(int ufd, unsigned long offset, bool retry)
 	uffdio_zeropage.range.start = (unsigned long) area_dst + offset;
 	uffdio_zeropage.range.len = page_size;
 	uffdio_zeropage.mode = 0;
+
+	if (test_access_likely)
+		uffdio_zeropage.mode |= UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		uffdio_zeropage.mode |= UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY;
+
 	ret = ioctl(ufd, UFFDIO_ZEROPAGE, &uffdio_zeropage);
 	res = uffdio_zeropage.zeropage;
 	if (ret) {
@@ -1584,8 +1612,6 @@ unsigned long default_huge_page_size(void)
 
 static void set_test_type(const char *type)
 {
-	uint64_t features = UFFD_API_FEATURES;
-
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
@@ -1606,6 +1632,28 @@ static void set_test_type(const char *type)
 	} else {
 		err("Unknown test type: %s", type);
 	}
+}
+
+static void parse_test_type_arg(const char *raw_type)
+{
+	char *buf = strdup(raw_type);
+	uint64_t features = UFFD_API_FEATURES;
+
+	while (buf) {
+		const char *token = strsep(&buf, ":");
+
+		if (!test_type)
+			set_test_type(token);
+		else if (!strcmp(token, "access_likely"))
+			test_access_likely = true;
+		else if (!strcmp(token, "write_likely"))
+			test_write_likely = true;
+		else
+			err("unrecognized test mod '%s'", token);
+	}
+
+	if (!test_type)
+		err("failed to parse test type argument: '%s'", raw_type);
 
 	if (test_type == TEST_HUGETLB)
 		page_size = default_huge_page_size();
@@ -1653,7 +1701,7 @@ int main(int argc, char **argv)
 		err("failed to arm SIGALRM");
 	alarm(ALARM_INTERVAL_SECS);
 
-	set_test_type(argv[1]);
+	parse_test_type_arg(argv[1]);
 
 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size /