From patchwork Wed Jun 22 18:50:34 2022
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen,
    Peter Xu, Mike Rapoport, David Hildenbrand
Subject: [PATCH v1 1/5] userfaultfd: introduce uffd_flags
Date: Wed, 22 Jun 2022 11:50:34 -0700
Message-Id: <20220622185038.71740-2-namit@vmware.com>
In-Reply-To: <20220622185038.71740-1-namit@vmware.com>
References: <20220622185038.71740-1-namit@vmware.com>

From: Nadav Amit

As the next patches are going to introduce more information that needs
to be propagated for handled user requests, introduce uffd_flags, which
will be used to propagate this information. Remove the unused
UFFD_FLAGS_SET to avoid confusion in the constant names. Introducing
uffd_flags also keeps mm/userfaultfd from using uapi constants (e.g.,
UFFDIO_COPY_MODE_WP) directly.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: Mike Rapoport
Acked-by: David Hildenbrand
Signed-off-by: Nadav Amit
---
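[Editor's note] The pattern this patch establishes is small but worth
spelling out: uapi mode bits (__u64) are translated into the internal
uffd_flags_t exactly once, at the ioctl boundary, and only the internal
type travels into mm/ code. A minimal sketch, restated from the diff
below; the __bitwise/__force annotations let sparse warn whenever a raw
mode value is mixed with uffd_flags_t:

	/* Internal flags: a sparse-checked bitwise type, not uapi bits. */
	typedef unsigned int __bitwise uffd_flags_t;

	#define UFFD_FLAGS_NONE	((__force uffd_flags_t)0)
	#define UFFD_FLAGS_WP	((__force uffd_flags_t)BIT(0))

	/* ioctl boundary: translate the uapi bit once... */
	uffd_flags = (uffdio_copy.mode & UFFDIO_COPY_MODE_WP) ?
		     UFFD_FLAGS_WP : UFFD_FLAGS_NONE;

	/* ...and test only internal flags deeper in mm/: */
	if (uffd_flags & UFFD_FLAGS_WP)
		_dst_pte = pte_mkuffd_wp(_dst_pte);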
 fs/userfaultfd.c              | 21 ++++++++++----
 include/linux/hugetlb.h       |  4 +--
 include/linux/shmem_fs.h      |  8 ++++--
 include/linux/userfaultfd_k.h | 24 ++++++++++------
 mm/hugetlb.c                  |  3 +-
 mm/shmem.c                    |  6 ++--
 mm/userfaultfd.c              | 53 ++++++++++++++++++-----------------
 7 files changed, 70 insertions(+), 49 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index d398f6bf6d74..a44e46f8249f 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1700,6 +1700,8 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	struct uffdio_copy uffdio_copy;
 	struct uffdio_copy __user *user_uffdio_copy;
 	struct userfaultfd_wake_range range;
+	bool mode_wp;
+	uffd_flags_t uffd_flags;
 
 	user_uffdio_copy = (struct uffdio_copy __user *) arg;
 
@@ -1726,10 +1728,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 		goto out;
 	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
 		goto out;
+
+	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
+
+	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
 				   uffdio_copy.len, &ctx->mmap_changing,
-				   uffdio_copy.mode);
+				   uffd_flags);
 		mmput(ctx->mm);
 	} else {
 		return -ESRCH;
@@ -1757,6 +1764,7 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	struct uffdio_zeropage uffdio_zeropage;
 	struct uffdio_zeropage __user *user_uffdio_zeropage;
 	struct userfaultfd_wake_range range;
+	uffd_flags_t uffd_flags = UFFD_FLAGS_NONE;
 
 	user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;
 
@@ -1781,7 +1789,7 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
 				     uffdio_zeropage.range.len,
-				     &ctx->mmap_changing);
+				     &ctx->mmap_changing, uffd_flags);
 		mmput(ctx->mm);
 	} else {
 		return -ESRCH;
@@ -1810,6 +1818,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	struct uffdio_writeprotect __user *user_uffdio_wp;
 	struct userfaultfd_wake_range range;
 	bool mode_wp, mode_dontwake;
+	uffd_flags_t uffd_flags;
 
 	if (atomic_read(&ctx->mmap_changing))
 		return -EAGAIN;
@@ -1835,10 +1844,12 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	if (mode_wp && mode_dontwake)
 		return -EINVAL;
 
+	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
-					  uffdio_wp.range.len, mode_wp,
-					  &ctx->mmap_changing);
+					  uffdio_wp.range.len,
+					  &ctx->mmap_changing, uffd_flags);
 		mmput(ctx->mm);
 	} else {
 		return -ESRCH;
@@ -1891,7 +1902,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
 				     uffdio_continue.range.len,
-				     &ctx->mmap_changing);
+				     &ctx->mmap_changing, 0);
 		mmput(ctx->mm);
 	} else {
 		return -ESRCH;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 642a39016f9a..a4f326bc2de6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -166,7 +166,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				unsigned long src_addr,
 				enum mcopy_atomic_mode mode,
 				struct page **pagep,
-				bool wp_copy);
+				uffd_flags_t uffd_flags);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
@@ -366,7 +366,7 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 						unsigned long src_addr,
 						enum mcopy_atomic_mode mode,
 						struct page **pagep,
-						bool wp_copy)
+						uffd_flags_t uffd_flags)
 {
 	BUG();
 	return 0;
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index a68f982f22d1..f93a3c114002 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -9,6 +9,7 @@
 #include <linux/percpu_counter.h>
 #include <linux/xattr.h>
 #include <linux/fs_parser.h>
+#include <linux/userfaultfd_k.h>
 
 /* inode in-kernel data */
@@ -145,11 +146,12 @@ extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 				  pmd_t *dst_pmd,
 				  struct vm_area_struct *dst_vma,
 				  unsigned long dst_addr,
 				  unsigned long src_addr,
-				  bool zeropage, bool wp_copy,
-				  struct page **pagep);
+				  bool zeropage,
+				  struct page **pagep,
+				  uffd_flags_t uffd_flags);
 #else /* !CONFIG_SHMEM */
 #define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
-			       src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; })
+			       src_addr, zeropage, pagep, uffd_flags) ({ BUG(); 0; })
 #endif /* CONFIG_SHMEM */
 
 #endif /* CONFIG_USERFAULTFD */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index eee374c29c85..d5b3dff48a87 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -34,7 +34,6 @@
 #define UFFD_NONBLOCK O_NONBLOCK
 
 #define UFFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
-#define UFFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS)
 
 extern int sysctl_unprivileged_userfaultfd;
 
@@ -56,23 +55,30 @@ enum mcopy_atomic_mode {
 	MCOPY_ATOMIC_CONTINUE,
 };
 
+typedef unsigned int __bitwise uffd_flags_t;
+
+#define UFFD_FLAGS_NONE	((__force uffd_flags_t)0)
+#define UFFD_FLAGS_WP	((__force uffd_flags_t)BIT(0))
+
 extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				    struct vm_area_struct *dst_vma,
 				    unsigned long dst_addr, struct page *page,
-				    bool newly_allocated, bool wp_copy);
+				    bool newly_allocated,
+				    uffd_flags_t uffd_flags);
 
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 			    unsigned long src_start, unsigned long len,
-			    atomic_t *mmap_changing, __u64 mode);
-extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
-			      unsigned long dst_start,
-			      unsigned long len,
-			      atomic_t *mmap_changing);
+			    atomic_t *mmap_changing, uffd_flags_t uffd_flags);
+extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long dst_start,
+			      unsigned long len, atomic_t *mmap_changing,
+			      uffd_flags_t uffd_flags);
 extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
-			      unsigned long len, atomic_t *mmap_changing);
+			      unsigned long len, atomic_t *mmap_changing,
+			      uffd_flags_t uffd_flags);
 extern int mwriteprotect_range(struct mm_struct *dst_mm,
 			       unsigned long start, unsigned long len,
-			       bool enable_wp, atomic_t *mmap_changing);
+			       atomic_t *mmap_changing,
+			       uffd_flags_t uffd_flags);
 
 /* mm helpers */
 static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2bc9d1170e4f..2beff8a4bf7c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5875,9 +5875,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			     unsigned long src_addr,
 			     enum mcopy_atomic_mode mode,
 			     struct page **pagep,
-			     bool wp_copy)
+			     uffd_flags_t uffd_flags)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
+	bool wp_copy = uffd_flags & UFFD_FLAGS_WP;
 	struct hstate *h = hstate_vma(dst_vma);
 	struct address_space *mapping = dst_vma->vm_file->f_mapping;
 	pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
diff --git a/mm/shmem.c b/mm/shmem.c
index 12ac67dc831f..89c775275bae 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2343,8 +2343,8 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr,
 			   unsigned long src_addr,
-			   bool zeropage, bool wp_copy,
-			   struct page **pagep)
+			   bool zeropage, struct page **pagep,
+			   uffd_flags_t uffd_flags)
 {
 	struct inode *inode = file_inode(dst_vma->vm_file);
 	struct shmem_inode_info *info = SHMEM_I(inode);
@@ -2418,7 +2418,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release;
 
 	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-				       page, true, wp_copy);
+				       page, true, uffd_flags);
 	if (ret)
 		goto out_delete_from_cache;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 07d3befc80e4..734de6aa0b8e 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -58,7 +58,7 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
 int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 			     struct vm_area_struct *dst_vma,
 			     unsigned long dst_addr, struct page *page,
-			     bool newly_allocated, bool wp_copy)
+			     bool newly_allocated, uffd_flags_t uffd_flags)
 {
 	int ret;
 	pte_t _dst_pte, *dst_pte;
@@ -78,7 +78,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	 * Always mark a PTE as write-protected when needed, regardless of
 	 * VM_WRITE, which the user might change.
 	 */
-	if (wp_copy) {
+	if (uffd_flags & UFFD_FLAGS_WP) {
 		_dst_pte = pte_mkuffd_wp(_dst_pte);
 		writable = false;
 	}
@@ -145,7 +145,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
 			    struct page **pagep,
-			    bool wp_copy)
+			    uffd_flags_t uffd_flags)
 {
 	void *page_kaddr;
 	int ret;
@@ -189,7 +189,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release;
 
 	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-				       page, true, wp_copy);
+				       page, true, uffd_flags);
 	if (ret)
 		goto out_release;
 out:
@@ -239,7 +239,7 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
 				pmd_t *dst_pmd,
 				struct vm_area_struct *dst_vma,
 				unsigned long dst_addr,
-				bool wp_copy)
+				uffd_flags_t uffd_flags)
 {
 	struct inode *inode = file_inode(dst_vma->vm_file);
 	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
@@ -263,7 +263,7 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
 	}
 
 	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-				       page, false, wp_copy);
+				       page, false, uffd_flags);
 	if (ret)
 		goto out_release;
 
@@ -309,7 +309,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					      unsigned long src_start,
 					      unsigned long len,
 					      enum mcopy_atomic_mode mode,
-					      bool wp_copy)
+					      uffd_flags_t uffd_flags)
 {
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	ssize_t err;
@@ -406,7 +406,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 
 		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
 					       dst_addr, src_addr, mode, &page,
-					       wp_copy);
+					       uffd_flags);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -462,7 +462,7 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				      unsigned long src_start,
 				      unsigned long len,
 				      enum mcopy_atomic_mode mode,
-				      bool wp_copy);
+				      uffd_flags_t uffd_flags);
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -472,13 +472,13 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 						unsigned long src_addr,
 						struct page **page,
 						enum mcopy_atomic_mode mode,
-						bool wp_copy)
+						uffd_flags_t uffd_flags)
 {
 	ssize_t err;
 
 	if (mode == MCOPY_ATOMIC_CONTINUE) {
 		return mcontinue_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-					    wp_copy);
+					    uffd_flags);
 	}
 
 	/*
@@ -495,7 +495,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 		if (mode == MCOPY_ATOMIC_NORMAL)
 			err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					       dst_addr, src_addr, page,
-					       wp_copy);
+					       uffd_flags);
 		else
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
 						 dst_vma, dst_addr);
@@ -503,7 +503,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 		err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					     dst_addr, src_addr,
 					     mode != MCOPY_ATOMIC_NORMAL,
-					     wp_copy, page);
+					     page, uffd_flags);
 	}
 
 	return err;
@@ -515,7 +515,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 					      unsigned long len,
 					      enum mcopy_atomic_mode mcopy_mode,
 					      atomic_t *mmap_changing,
-					      __u64 mode)
+					      uffd_flags_t uffd_flags)
 {
 	struct vm_area_struct *dst_vma;
 	ssize_t err;
@@ -523,7 +523,6 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	unsigned long src_addr, dst_addr;
 	long copied;
 	struct page *page;
-	bool wp_copy;
 
 	/*
 	 * Sanitize the command parameters:
@@ -570,11 +569,10 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 		goto out_unlock;
 
 	/*
-	 * validate 'mode' now that we know the dst_vma: don't allow
+	 * validate 'flags' now that we know the dst_vma: don't allow
 	 * a wrprotect copy if the userfaultfd didn't register as WP.
 	 */
-	wp_copy = mode & UFFDIO_COPY_MODE_WP;
-	if (wp_copy && !(dst_vma->vm_flags & VM_UFFD_WP))
+	if ((uffd_flags & UFFD_FLAGS_WP) && !(dst_vma->vm_flags & VM_UFFD_WP))
 		goto out_unlock;
 
 	/*
@@ -583,7 +581,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	if (is_vm_hugetlb_page(dst_vma))
 		return  __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
 					       src_start, len, mcopy_mode,
-					       wp_copy);
+					       uffd_flags);
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;
@@ -635,7 +633,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 		BUG_ON(pmd_trans_huge(*dst_pmd));
 
 		err = mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-				       src_addr, &page, mcopy_mode, wp_copy);
+				       src_addr, &page, mcopy_mode, uffd_flags);
 		cond_resched();
 
 		if (unlikely(err == -ENOENT)) {
@@ -683,30 +681,33 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 
 ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 		     unsigned long src_start, unsigned long len,
-		     atomic_t *mmap_changing, __u64 mode)
+		     atomic_t *mmap_changing, uffd_flags_t uffd_flags)
 {
 	return __mcopy_atomic(dst_mm, dst_start, src_start, len,
-			      MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
+			      MCOPY_ATOMIC_NORMAL, mmap_changing, uffd_flags);
 }
 
 ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
-		       unsigned long len, atomic_t *mmap_changing)
+		       unsigned long len, atomic_t *mmap_changing,
+		       uffd_flags_t uffd_flags)
 {
 	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
 			      mmap_changing, 0);
 }
 
 ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
-		       unsigned long len, atomic_t *mmap_changing)
+		       unsigned long len, atomic_t *mmap_changing,
+		       uffd_flags_t uffd_flags)
 {
 	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
 			      mmap_changing, 0);
 }
 
 int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
-			unsigned long len, bool enable_wp,
-			atomic_t *mmap_changing)
+			unsigned long len,
+			atomic_t *mmap_changing, uffd_flags_t uffd_flags)
 {
+	bool enable_wp = uffd_flags & UFFD_FLAGS_WP;
 	struct vm_area_struct *dst_vma;
 	unsigned long page_mask;
 	struct mmu_gather tlb;

From patchwork Wed Jun 22 18:50:35 2022
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen,
    Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Wed, 22 Jun 2022 11:50:35 -0700
Message-Id: <20220622185038.71740-3-namit@vmware.com>
In-Reply-To: <20220622185038.71740-1-namit@vmware.com>
References: <20220622185038.71740-1-namit@vmware.com>
From: Nadav Amit

Using a PTE on x86 with a cleared access bit (aka young bit) takes ~600
more cycles than when the access bit is set. At the same time, setting
the access bit for memory that is not actually used (e.g., prefetched)
can introduce greater overheads, as the prefetched memory is then
reclaimed later than it should be.

Userfaultfd currently does not set the access bit (excluding the
huge-pages case). Arguably, it is best to let the user control whether
the access bit should be set or not. The expected use is to request
userfaultfd to set the access bit when the copy/wp operation is done to
resolve a page fault, and not to set it when the memory is prefetched.

Introduce UFFDIO_[op]_ACCESS_LIKELY to enable userspace to request that
the young bit be set.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
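[Editor's note] To make the intended use concrete, a hedged userspace
sketch using the mode flag this series proposes (not part of the
patch): a fault handler resolving a page fault asks for the access bit,
while a prefetch path would simply leave the hint out. The identifiers
uffd, fault_addr, src_buf, and page_size are assumed to come from the
surrounding handler.

	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>
	#include <err.h>

	extern int uffd;			/* userfaultfd fd (assumed) */
	extern unsigned long fault_addr;	/* faulting page, aligned (assumed) */
	extern void *src_buf;			/* bytes to install (assumed) */
	extern unsigned long page_size;		/* page size (assumed) */

	struct uffdio_copy copy = {
		.dst = fault_addr,
		.src = (unsigned long)src_buf,
		.len = page_size,
		/* Resolving a fault: the page is about to be touched, so
		 * ask for the access (young) bit. A prefetch path would
		 * leave .mode at 0 so reclaim treats the page as cold. */
		.mode = UFFDIO_COPY_MODE_ACCESS_LIKELY,
	};

	if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
		err(1, "UFFDIO_COPY");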
 fs/userfaultfd.c                 | 25 ++++++++++++++++++++-----
 include/linux/userfaultfd_k.h    |  1 +
 include/uapi/linux/userfaultfd.h | 20 +++++++++++++++++++-
 mm/userfaultfd.c                 | 16 ++++++++++++----
 4 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index a44e46f8249f..abf176bd0349 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1726,12 +1726,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	ret = -EINVAL;
 	if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src)
 		goto out;
-	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
+	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP|
+				 UFFDIO_COPY_MODE_ACCESS_LIKELY))
 		goto out;
 
 	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
 
 	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
+	if (uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY)
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
@@ -1783,9 +1786,13 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	if (ret)
 		goto out;
 	ret = -EINVAL;
-	if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
+	if (uffdio_zeropage.mode & ~(UFFDIO_ZEROPAGE_MODE_DONTWAKE|
+				     UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY))
 		goto out;
 
+	if (uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY)
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
 				     uffdio_zeropage.range.len,
@@ -1835,7 +1842,8 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 		return ret;
 
 	if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE |
-			       UFFDIO_WRITEPROTECT_MODE_WP))
+			       UFFDIO_WRITEPROTECT_MODE_WP |
+			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY))
 		return -EINVAL;
 
 	mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP;
@@ -1845,6 +1853,8 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 		return -EINVAL;
 
 	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
+	if (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY)
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
@@ -1872,6 +1882,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	struct uffdio_continue uffdio_continue;
 	struct uffdio_continue __user *user_uffdio_continue;
 	struct userfaultfd_wake_range range;
+	uffd_flags_t uffd_flags = UFFD_FLAGS_NONE;
 
 	user_uffdio_continue = (struct uffdio_continue __user *)arg;
 
@@ -1896,13 +1907,17 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	    uffdio_continue.range.start) {
 		goto out;
 	}
-	if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+	if (uffdio_continue.mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE|
+				     UFFDIO_CONTINUE_MODE_ACCESS_LIKELY))
 		goto out;
 
+	if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_ACCESS_LIKELY)
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
 				     uffdio_continue.range.len,
-				     &ctx->mmap_changing, 0);
+				     &ctx->mmap_changing, uffd_flags);
 		mmput(ctx->mm);
 	} else {
 		return -ESRCH;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index d5b3dff48a87..af268b2c2b27 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -59,6 +59,7 @@ typedef unsigned int __bitwise uffd_flags_t;
 
 #define UFFD_FLAGS_NONE	((__force uffd_flags_t)0)
 #define UFFD_FLAGS_WP	((__force uffd_flags_t)BIT(0))
+#define UFFD_FLAGS_ACCESS_LIKELY	((__force uffd_flags_t)BIT(1))
 
 extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				    struct vm_area_struct *dst_vma,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 005e5e306266..ff7150c878bb 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -38,7 +38,8 @@
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
 			   UFFD_FEATURE_EXACT_ADDRESS |		\
-			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM |	\
+			   UFFD_FEATURE_ACCESS_HINTS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -203,6 +204,9 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
 	 * write-protection mode is supported on both shmem and hugetlbfs.
+	 *
+	 * UFFD_FEATURE_ACCESS_HINTS indicates that the ioctl operations
+	 * support the UFFDIO_*_MODE_ACCESS_LIKELY hints.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -217,6 +221,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
+#define UFFD_FEATURE_ACCESS_HINTS		(1<<13)
 	__u64 features;
 
 	__u64 ioctls;
@@ -251,8 +256,14 @@ struct uffdio_copy {
 	 * the fly.  UFFDIO_COPY_MODE_WP is available only if the
 	 * write protected ioctl is implemented for the range
 	 * according to the uffdio_register.ioctls.
+	 *
+	 * UFFDIO_COPY_MODE_ACCESS_LIKELY provides a hint to the kernel that the
+	 * page is likely to be access in the near future. Providing the hint
+	 * properly can improve performance.
+	 *
 	 */
 #define UFFDIO_COPY_MODE_WP			((__u64)1<<1)
+#define UFFDIO_COPY_MODE_ACCESS_LIKELY		((__u64)1<<2)
 	__u64 mode;
 
 	/*
@@ -265,6 +276,7 @@ struct uffdio_copy {
 struct uffdio_zeropage {
 	struct uffdio_range range;
 #define UFFDIO_ZEROPAGE_MODE_DONTWAKE		((__u64)1<<0)
+#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<1)
 	__u64 mode;
 
 	/*
@@ -284,6 +296,10 @@ struct uffdio_writeprotect {
 	 * UFFDIO_WRITEPROTECT_MODE_DONTWAKE: set the flag to avoid waking up
 	 * any wait thread after the operation succeeds.
 	 *
+	 * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY provides a hint to the kernel
+	 * that the page is likely to be access in the near future. Providing
+	 * the hint properly can improve performance.
+	 *
 	 * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
 	 * therefore DONTWAKE flag is meaningless with WP=1.  Removing write
 	 * protection (WP=0) in response to a page fault wakes the faulting
@@ -291,12 +307,14 @@ struct uffdio_writeprotect {
 	 */
 #define UFFDIO_WRITEPROTECT_MODE_WP		((__u64)1<<0)
 #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE	((__u64)1<<1)
+#define UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY	((__u64)1<<2)
 	__u64 mode;
 };
 
 struct uffdio_continue {
 	struct uffdio_range range;
 #define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
+#define UFFDIO_CONTINUE_MODE_ACCESS_LIKELY	((__u64)1<<1)
 	__u64 mode;
 
 	/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 734de6aa0b8e..5051b9028722 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -92,6 +92,9 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	 */
 		_dst_pte = pte_wrprotect(_dst_pte);
 
+	if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY)
+		_dst_pte = pte_mkyoung(_dst_pte);
+
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 
 	if (vma_is_shmem(dst_vma)) {
@@ -202,7 +205,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 			      pmd_t *dst_pmd,
 			      struct vm_area_struct *dst_vma,
-			      unsigned long dst_addr)
+			      unsigned long dst_addr,
+			      uffd_flags_t uffd_flags)
 {
 	pte_t _dst_pte, *dst_pte;
 	spinlock_t *ptl;
@@ -225,6 +229,10 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	ret = -EEXIST;
 	if (!pte_none(*dst_pte))
 		goto out_unlock;
+
+	if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY)
+		_dst_pte = pte_mkyoung(_dst_pte);
+
 	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(dst_vma, dst_addr, dst_pte);
@@ -498,7 +506,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 					       uffd_flags);
 		else
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
-						 dst_vma, dst_addr);
+						 dst_vma, dst_addr, uffd_flags);
 	} else {
 		err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					     dst_addr, src_addr,
@@ -692,7 +700,7 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
 		       uffd_flags_t uffd_flags)
 {
 	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
-			      mmap_changing, 0);
+			      mmap_changing, uffd_flags);
 }
 
 ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
@@ -700,7 +708,7 @@ ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
 		       uffd_flags_t uffd_flags)
 {
 	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
-			      mmap_changing, 0);
+			      mmap_changing, uffd_flags);
 }
 
 int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,

From patchwork Wed Jun 22 18:50:36 2022
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen,
    Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [PATCH v1 3/5] userfaultfd: introduce write-likely mode for uffd operations
Date: Wed, 22 Jun 2022 11:50:36 -0700
Message-Id: <20220622185038.71740-4-namit@vmware.com>
In-Reply-To: <20220622185038.71740-1-namit@vmware.com>
References: <20220622185038.71740-1-namit@vmware.com>

From: Nadav Amit

Either always setting the dirty bit or always leaving it clear does not
seem to be the best policy. Leaving the bit clear introduces overhead
on the first write access, which is required to set the bit. Setting
the bit for pages that are eventually not written can require more TLB
flushes.

Let the userfaultfd users control whether PTEs are marked as dirty or
clean. Introduce UFFDIO_[op]_MODE_WRITE_LIKELY to enable userspace to
indicate whether pages are likely to be written, and set the dirty bit
if they are.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
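[Editor's note] A hedged sketch of how the two hints compose from
userspace, in the same assumed handler context as the sketch for patch
2/5 (uffd, fault_addr, src_buf, page_size are assumed; not part of the
patch): a page that will be read and modified right away gets both
hints, so the first write is not slowed down by a dirty-bit update.

	/* The faulting thread will read and then modify the page:
	 * ask for both the young and the dirty bit up front. */
	struct uffdio_copy copy = {
		.dst = fault_addr,
		.src = (unsigned long)src_buf,
		.len = page_size,
		.mode = UFFDIO_COPY_MODE_ACCESS_LIKELY |
			UFFDIO_COPY_MODE_WRITE_LIKELY,
	};

	if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
		err(1, "UFFDIO_COPY");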
 fs/userfaultfd.c                 | 20 ++++++++++++++++----
 include/linux/userfaultfd_k.h    |  1 +
 include/uapi/linux/userfaultfd.h | 13 ++++++++++++-
 mm/hugetlb.c                     |  3 +++
 mm/shmem.c                       |  3 +++
 mm/userfaultfd.c                 | 13 ++++++++++---
 6 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index abf176bd0349..13d73e37e230 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1727,7 +1727,8 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src)
 		goto out;
 	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP|
-				 UFFDIO_COPY_MODE_ACCESS_LIKELY))
+				 UFFDIO_COPY_MODE_ACCESS_LIKELY|
+				 UFFDIO_COPY_MODE_WRITE_LIKELY))
 		goto out;
 
 	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
@@ -1735,6 +1736,8 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
 	if (uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY)
 		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	if (uffdio_copy.mode & UFFDIO_COPY_MODE_WRITE_LIKELY)
+		uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
@@ -1790,11 +1790,14 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 		goto out;
 	ret = -EINVAL;
 	if (uffdio_zeropage.mode & ~(UFFDIO_ZEROPAGE_MODE_DONTWAKE|
-				     UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY))
+				     UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY|
+				     UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY))
 		goto out;
 
 	if (uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY)
 		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	if (uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY)
+		uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
@@ -1843,7 +1849,8 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 
 	if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE |
 			       UFFDIO_WRITEPROTECT_MODE_WP |
-			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY))
+			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY |
+			       UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY))
 		return -EINVAL;
 
 	mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP;
@@ -1855,6 +1862,8 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
 	if (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY)
 		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	if (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY)
+		uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
@@ -1908,11 +1917,14 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 		goto out;
 	}
 	if (uffdio_continue.mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE|
-				     UFFDIO_CONTINUE_MODE_ACCESS_LIKELY))
+				     UFFDIO_CONTINUE_MODE_ACCESS_LIKELY|
+				     UFFDIO_CONTINUE_MODE_WRITE_LIKELY))
 		goto out;
 
 	if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_ACCESS_LIKELY)
 		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+	if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_WRITE_LIKELY)
+		uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index af268b2c2b27..59c43ea502e7 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -60,6 +60,7 @@ typedef unsigned int __bitwise uffd_flags_t;
 #define UFFD_FLAGS_NONE	((__force uffd_flags_t)0)
 #define UFFD_FLAGS_WP	((__force uffd_flags_t)BIT(0))
 #define UFFD_FLAGS_ACCESS_LIKELY	((__force uffd_flags_t)BIT(1))
+#define UFFD_FLAGS_WRITE_LIKELY	((__force uffd_flags_t)BIT(2))
 
 extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				    struct vm_area_struct *dst_vma,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index ff7150c878bb..7b6ab0b43475 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -206,7 +206,7 @@ struct uffdio_api {
 	 * write-protection mode is supported on both shmem and hugetlbfs.
 	 *
 	 * UFFD_FEATURE_ACCESS_HINTS indicates that the ioctl operations
-	 * support the UFFDIO_*_MODE_ACCESS_LIKELY hints.
+	 * support the UFFDIO_*_MODE_[ACCESS|WRITE]_LIKELY hints.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -261,9 +261,13 @@ struct uffdio_copy {
 	 * page is likely to be access in the near future. Providing the hint
 	 * properly can improve performance.
 	 *
+	 * UFFDIO_COPY_MODE_WRITE_LIKELY provides a hint to the kernel that the
+	 * page is likely to be written in the near future. Providing the hint
+	 * properly can improve performance.
 	 */
 #define UFFDIO_COPY_MODE_WP			((__u64)1<<1)
 #define UFFDIO_COPY_MODE_ACCESS_LIKELY		((__u64)1<<2)
+#define UFFDIO_COPY_MODE_WRITE_LIKELY		((__u64)1<<3)
 	__u64 mode;
 
 	/*
@@ -277,6 +281,7 @@ struct uffdio_zeropage {
 	struct uffdio_range range;
 #define UFFDIO_ZEROPAGE_MODE_DONTWAKE		((__u64)1<<0)
 #define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<1)
+#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY	((__u64)1<<2)
 	__u64 mode;
 
 	/*
@@ -300,6 +305,10 @@ struct uffdio_writeprotect {
 	 * that the page is likely to be access in the near future. Providing
 	 * the hint properly can improve performance.
 	 *
+	 * UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY: provides a hint to the kernel
+	 * that the page is likely to be written in the near future. Providing
+	 * the hint properly can improve performance.
+	 *
 	 * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
 	 * therefore DONTWAKE flag is meaningless with WP=1.  Removing write
 	 * protection (WP=0) in response to a page fault wakes the faulting
@@ -308,6 +317,7 @@ struct uffdio_writeprotect {
 #define UFFDIO_WRITEPROTECT_MODE_WP		((__u64)1<<0)
 #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE	((__u64)1<<1)
 #define UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY	((__u64)1<<2)
+#define UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY	((__u64)1<<3)
 	__u64 mode;
 };
 
@@ -315,6 +325,7 @@ struct uffdio_continue {
 	struct uffdio_range range;
 #define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
 #define UFFDIO_CONTINUE_MODE_ACCESS_LIKELY	((__u64)1<<1)
+#define UFFDIO_CONTINUE_MODE_WRITE_LIKELY	((__u64)1<<2)
 	__u64 mode;
 
 	/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2beff8a4bf7c..46814fc7762f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5962,6 +5962,9 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		*pagep = NULL;
 	}
 
+	/* The PTE is not marked as dirty unconditionally */
+	SetPageDirty(page);
+
 	/*
 	 * The memory barrier inside __SetPageUptodate makes sure that
 	 * preceding stores to the page contents become visible before
diff --git a/mm/shmem.c b/mm/shmem.c
index 89c775275bae..7488cd186c32 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2404,6 +2404,9 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 	VM_BUG_ON(PageSwapBacked(page));
 	__SetPageLocked(page);
 	__SetPageSwapBacked(page);
+
+	/* The PTE is not marked as dirty unconditionally */
+	SetPageDirty(page);
 	__SetPageUptodate(page);
 
 	ret = -EFAULT;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 5051b9028722..6e767f1e7007 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -70,7 +70,6 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	pgoff_t offset, max_off;
 
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
-	_dst_pte = pte_mkdirty(_dst_pte);
 	if (page_in_cache && !vm_shared)
 		writable = false;
@@ -83,14 +82,19 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 		writable = false;
 	}
 
-	if (writable)
+	if (writable) {
 		_dst_pte = pte_mkwrite(_dst_pte);
-	else
+
+		/* Marking RO entries as dirty can mess with other code */
+		if (uffd_flags & UFFD_FLAGS_WRITE_LIKELY)
+			_dst_pte = pte_mkdirty(_dst_pte);
+	} else {
 		/*
 		 * We need this to make sure write bit removed; as mk_pte()
 		 * could return a pte with write bit set.
 		 */
 		_dst_pte = pte_wrprotect(_dst_pte);
+	}
 
 	if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY)
 		_dst_pte = pte_mkyoung(_dst_pte);
@@ -180,6 +184,9 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 		*pagep = NULL;
 	}
 
+	/* The PTE is not marked as dirty unconditionally */
+	SetPageDirty(page);
+
 	/*
 	 * The memory barrier inside __SetPageUptodate makes sure that
 	 * preceding stores to the page contents become visible before

From patchwork Wed Jun 22 18:50:37 2022
rGTEbdCAFlgbmtCEquvb9do90kiNvEtfwWZey/N/x5xuyzFHRwlxRxcR7pRWIZ05laR+ raMw== X-Gm-Message-State: AJIora8GcDhiEqWATIagAc/36UMNk6HJ5S78zT9Eg4+5+phycH7TwTN2 KI/yaGKw3JFDL1U00xgv0A/bqspIN0Uw6w== X-Google-Smtp-Source: AGRyM1tLtsw5WrKsqR9EZgF9ROrEkynp1ju8qc5271A+0tP8HNMqrsAv9NXwdVsC/naA3plzsc84ew== X-Received: by 2002:a17:90b:1c86:b0:1ea:4ceb:2788 with SMTP id oo6-20020a17090b1c8600b001ea4ceb2788mr1550568pjb.16.1655951155550; Wed, 22 Jun 2022 19:25:55 -0700 (PDT) Received: from sc2-haas01-esx0118.eng.vmware.com ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id ik10-20020a170902ab0a00b001617541c94fsm13423998plb.60.2022.06.22.19.25.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Jun 2022 19:25:54 -0700 (PDT) From: Nadav Amit X-Google-Original-From: Nadav Amit To: linux-mm@kvack.org Cc: Nadav Amit , David Hildenbrand , Mike Kravetz , Hugh Dickins , Andrew Morton , Axel Rasmussen , Peter Xu , Mike Rapoport Subject: [PATCH v1 4/5] userfaultfd: zero access/write hints Date: Wed, 22 Jun 2022 11:50:37 -0700 Message-Id: <20220622185038.71740-5-namit@vmware.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220622185038.71740-1-namit@vmware.com> References: <20220622185038.71740-1-namit@vmware.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655951157; a=rsa-sha256; cv=none; b=aQnWqAEjlnJALZgw0SegSBGY2NL8KDNg46LZaz0pven8scPBgjtJcqcj8gf6MjzmyxkVGM uDn3Bktllm03xIPVzxTB/e+uIbulhnKRzrBhoa0Wa5mp2ln9Wlpde/11R0VvZDBz2Yqjll R2b8NVBDINA9CMDLvH6Q7EPYul3vStg= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=IihgpJ6D; spf=none (imf04.hostedemail.com: domain of MAILER-DAEMON@hostedemail.com has no SPF policy when checking 216.40.44.17) smtp.mailfrom=MAILER-DAEMON@hostedemail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655951157; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zvPXoMWmzuR0KHLWaxsojd02ycg7ylF2SnBJl0q13Z8=; b=VhYf6waksAIcpCHgLaD6dJhN5JjMDFiKtLUE8HCFRS9c3TU+nVCkr3S1ibgf9J9irXgpSN zmU4J/0UbmEDT3kYrf2C6mjlTroaYrjHTZICtsd+kaaug7Aie/bfub1j9nkc7Iu1y3oEag OMhPyl7yw0NJ7hO3YSvvlj9wY1rzDMU= X-Stat-Signature: bhphymbgbdu8zm7ik5wc4z6sk9xdsawx X-Rspamd-Server: rspam06 Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=IihgpJ6D; spf=none (imf04.hostedemail.com: domain of MAILER-DAEMON@hostedemail.com has no SPF policy when checking 216.40.44.17) smtp.mailfrom=MAILER-DAEMON@hostedemail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspam-User: X-HE-Tag-Orig: 1655951156-659929 X-Rspamd-Queue-Id: A297E4001B X-HE-Tag: 1655951157-61537 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Nadav Amit When userfaultfd provides a zeropage in response to ioctl, it provides a readonly alias to the zero page. If the page is later written (which is the likely scenario), page-fault occurs and the page-fault allocator allocates a page and rewires the page-tables. This is an expensive flow for cases in which a page is likely be written to. Users can use the copy ioctl to initialize zero page (by copying zeros), but this is also wasteful. 
Suggested-by: David Hildenbrand
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
Acked-by: Peter Xu
---
 mm/userfaultfd.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 6e767f1e7007..48286746b0af 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -249,6 +249,37 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	return ret;
 }
 
+static int mfill_clearpage_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+			       struct vm_area_struct *dst_vma,
+			       unsigned long dst_addr,
+			       uffd_flags_t uffd_flags)
+{
+	struct page *page;
+	int ret;
+
+	ret = -ENOMEM;
+	page = alloc_zeroed_user_highpage_movable(dst_vma, dst_addr);
+	if (!page)
+		goto out;
+
+	/* The PTE is not marked as dirty unconditionally */
+	SetPageDirty(page);
+	__SetPageUptodate(page);
+
+	if (mem_cgroup_charge(page_folio(page), dst_vma->vm_mm, GFP_KERNEL))
+		goto out_release;
+
+	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+				       page, true, uffd_flags);
+	if (ret)
+		goto out_release;
+out:
+	return ret;
+out_release:
+	put_page(page);
+	goto out;
+}
+
 /* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
 static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
 				pmd_t *dst_pmd,
@@ -511,6 +542,10 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 		err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
 				       src_addr, page, uffd_flags);
+	else if (!(uffd_flags & UFFD_FLAGS_WP) &&
+		 (uffd_flags & UFFD_FLAGS_WRITE_LIKELY))
+		err = mfill_clearpage_pte(dst_mm, dst_pmd, dst_vma,
+					  dst_addr, uffd_flags);
 	else
 		err = mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma,
 					 dst_addr, uffd_flags);
From patchwork Wed Jun 22 18:50:38 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12891719
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [PATCH v1 5/5] selftest/userfaultfd: test read/write hints
Date: Wed, 22 Jun 2022 11:50:38 -0700
Message-Id: <20220622185038.71740-6-namit@vmware.com>
In-Reply-To: <20220622185038.71740-1-namit@vmware.com>
References: <20220622185038.71740-1-namit@vmware.com>

From: Nadav Amit

Test UFFDIO_*_MODE_ACCESS_LIKELY and UFFDIO_*_MODE_WRITE_LIKELY. Introduce test modifiers that trigger the use of the hints, so that, for example, "anon" is also exercised as "anon:access_likely" and "anon:access_likely:write_likely". Add the tests to run_vmtests.sh, using an array to run the different userfaultfd configurations.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 tools/testing/selftests/vm/run_vmtests.sh | 23 ++++++++--------
 tools/testing/selftests/vm/userfaultfd.c  | 32 +++++++++++++++++++++++
 2 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index 930c54eb5b4b..b90e9cf9716d 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -120,18 +120,17 @@ run_test ./gup_test -a
 # Dump pages 0, 19, and 4096, using pin_user_pages:
 run_test ./gup_test -ct -F 0x1 0 19 0x1000
 
-run_test ./userfaultfd anon 20 16
-run_test ./userfaultfd anon:dev 20 16
-# Hugetlb tests require source and destination huge pages. Pass in half the
-# size ($half_ufd_size_MB), which is used for *each*.
-run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
-run_test ./userfaultfd hugetlb:dev "$half_ufd_size_MB" 32
-run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
-rm -f "$mnt"/uffd-test
-run_test ./userfaultfd hugetlb_shared:dev "$half_ufd_size_MB" 32 "$mnt"/uffd-test
-rm -f "$mnt"/uffd-test
-run_test ./userfaultfd shmem 20 16
-run_test ./userfaultfd shmem:dev 20 16
+uffd_mods=("" ":dev" ":access_likely" ":access_likely:write_likely" ":write_likely")
+
+for mod in "${uffd_mods[@]}"; do
+	run_test ./userfaultfd anon${mod} 20 16
+	# Hugetlb tests require source and destination huge pages. Pass in half the
+	# size ($half_ufd_size_MB), which is used for *each*.
+	run_test ./userfaultfd hugetlb${mod} "$half_ufd_size_MB" 32
+	run_test ./userfaultfd hugetlb_shared${mod} "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+	rm -f "$mnt"/uffd-test
+	run_test ./userfaultfd shmem${mod} 20 16
+done
 
 #cleanup
 umount "$mnt"
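The userfaultfd.c changes below all follow one pattern: OR the new hint bits into the mode field of the respective ioctl. As a rough sketch of that pattern outside the selftest (an illustration under assumed setup: uffd, dst, src and page_size come from the caller, and the *_ACCESS_LIKELY/*_WRITE_LIKELY flags come from this series' uapi changes):

	#include <stdbool.h>
	#include <sys/ioctl.h>
	#include <linux/userfaultfd.h>

	static int copy_page_with_hints(int uffd, unsigned long dst,
					unsigned long src, unsigned long page_size,
					bool access_likely, bool write_likely)
	{
		struct uffdio_copy copy;

		copy.dst = dst;
		copy.src = src;
		copy.len = page_size;
		copy.mode = 0;
		/* Optional hints; the kernel may act on them or ignore them. */
		if (access_likely)
			copy.mode |= UFFDIO_COPY_MODE_ACCESS_LIKELY;
		if (write_likely)
			copy.mode |= UFFDIO_COPY_MODE_WRITE_LIKELY;
		copy.copy = 0;

		/* On success the number of copied bytes is returned in copy.copy. */
		return ioctl(uffd, UFFDIO_COPY, &copy);
	}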
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 28b881523d15..763458ce1d52 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -88,6 +88,8 @@ static volatile bool test_uffdio_zeropage_eexist = true;
 static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
+static bool test_access_likely;
+static bool test_write_likely;
 
 static bool map_shared;
 static int shm_fd;
@@ -550,6 +552,12 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
 	/* Undo write-protect, do wakeup after that */
 	prms.mode = wp ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
 
+	if (test_access_likely)
+		prms.mode |= UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		prms.mode |= UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY;
+
 	if (ioctl(ufd, UFFDIO_WRITEPROTECT, &prms))
 		err("clear WP failed: address=0x%"PRIx64, (uint64_t)start);
 }
@@ -563,6 +571,12 @@ static void continue_range(int ufd, __u64 start, __u64 len)
 	req.range.len = len;
 	req.mode = 0;
 
+	if (test_access_likely)
+		req.mode |= UFFDIO_CONTINUE_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		req.mode |= UFFDIO_CONTINUE_MODE_WRITE_LIKELY;
+
 	if (ioctl(ufd, UFFDIO_CONTINUE, &req))
 		err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
 		    (uint64_t)start);
@@ -653,6 +667,13 @@ static int __copy_page(int ufd, unsigned long offset, bool retry)
 		uffdio_copy.mode = UFFDIO_COPY_MODE_WP;
 	else
 		uffdio_copy.mode = 0;
+
+	if (test_access_likely)
+		uffdio_copy.mode |= UFFDIO_COPY_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		uffdio_copy.mode |= UFFDIO_COPY_MODE_WRITE_LIKELY;
+
 	uffdio_copy.copy = 0;
 	if (ioctl(ufd, UFFDIO_COPY, &uffdio_copy)) {
 		/* real retval in ufdio_copy.copy */
@@ -1080,6 +1101,13 @@ static int __uffdio_zeropage(int ufd, unsigned long offset, bool retry)
 	uffdio_zeropage.range.start = (unsigned long) area_dst + offset;
 	uffdio_zeropage.range.len = page_size;
 	uffdio_zeropage.mode = 0;
+
+	if (test_access_likely)
+		uffdio_zeropage.mode |= UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		uffdio_zeropage.mode |= UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY;
+
 	ret = ioctl(ufd, UFFDIO_ZEROPAGE, &uffdio_zeropage);
 	res = uffdio_zeropage.zeropage;
 	if (ret) {
@@ -1648,6 +1676,10 @@ static void parse_test_type_arg(const char *raw_type)
 		set_test_type(token);
 	else if (!strcmp(token, "dev"))
 		test_dev_userfaultfd = true;
+	else if (!strcmp(token, "access_likely"))
+		test_access_likely = true;
+	else if (!strcmp(token, "write_likely"))
+		test_write_likely = true;
 	else
 		err("unrecognized test mod '%s'", token);
 }
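With these changes the hint paths can also be exercised by hand, e.g. "./userfaultfd anon:access_likely:write_likely 20 16". For completeness, a sketch of the write-protect side that wp_range() above exercises (an illustration only; the UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY/_WRITE_LIKELY bits are the ones this series adds):

	#include <err.h>
	#include <sys/ioctl.h>
	#include <linux/userfaultfd.h>

	/* Remove write protection from [start, start + len) and hint that
	 * the range is about to be read and written, so the kernel may map
	 * the pages young and dirty up front rather than taking more faults.
	 */
	static void unprotect_range_with_hints(int uffd, __u64 start, __u64 len)
	{
		struct uffdio_writeprotect prms;

		prms.range.start = start;
		prms.range.len = len;
		prms.mode = UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY |
			    UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY;

		if (ioctl(uffd, UFFDIO_WRITEPROTECT, &prms))
			err(1, "UFFDIO_WRITEPROTECT");
	}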