From patchwork Fri Jun 28 14:35:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13716242 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5BD9C3064D for ; Fri, 28 Jun 2024 14:35:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 56F366B008C; Fri, 28 Jun 2024 10:35:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 51F886B0095; Fri, 28 Jun 2024 10:35:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 34B7E6B0096; Fri, 28 Jun 2024 10:35:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 0FA4A6B008C for ; Fri, 28 Jun 2024 10:35:42 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id B83F61204A5 for ; Fri, 28 Jun 2024 14:35:41 +0000 (UTC) X-FDA: 82280546082.13.945F8FD Received: from mail-lf1-f41.google.com (mail-lf1-f41.google.com [209.85.167.41]) by imf06.hostedemail.com (Postfix) with ESMTP id 9FDE7180028 for ; Fri, 28 Jun 2024 14:35:39 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Dj04VhwB; spf=pass (imf06.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.167.41 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719585330; a=rsa-sha256; cv=none; b=wXVr+4apoK9OsDEuTfqoAYH0gAj+4cufyOpTXCRjqOF2vp7kmMdcCEENnwlJSgRTbPID5A FnrR/8yEubOSfTlFx9yEzzjZf4p3nVS/pseFJROrozLf3bJ1zI+y3eIfE1vTWW8h+SmF8Z DcsaCuIIWS0kS7FHmX6E7jZ6VQBum2w= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Dj04VhwB; spf=pass (imf06.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.167.41 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719585330; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=45Fn1VR56OryxHQR2ICCsQfaXEj2VPk0P2HvHUTeh68=; b=i/zmg4oiC1zZeAE1ufAXApmAT6EltbVI6kIcT895/q5xBpMfjs3TuX6z9173t/TMxVYY6W Wn3r0yLJDeFv+1rFmyKuy0iCkNGV+WcTWj1IVrqeRaMZZV39tvAu9mwLmayVw4v2DMoryc 0JBznBwFNv5owNvKTXfqAJQygMNhADo= Received: by mail-lf1-f41.google.com with SMTP id 2adb3069b0e04-52cdd893e5cso759015e87.1 for ; Fri, 28 Jun 2024 07:35:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719585338; x=1720190138; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=45Fn1VR56OryxHQR2ICCsQfaXEj2VPk0P2HvHUTeh68=; b=Dj04VhwBfQ0m/cQLzLeul3I8VWXPzda3KfDdpqG7jdegPU4kbrI6SEIV/z1AgoxQtO KYbYMgh56IWR0g0RYi24GWZtiXS+RGRy3sMTDRLj2zNLRm0j62tb6QibPR99+paNjKHp UdtgQDY95yC+FwADENwl5aRRBTzKB1UdaoX/Oitv6i2KwWWNBqKTZoO/sgF++T+yrcLo Rn9Rz+6sihm1WFV88posEimA4F3ExV/8BxM34X7FC5e9Y4+wLgM1JMDVDsiAHxx9Tj/e 1RJK406slB9E3/5fAoI2IiFdJj+xk457MPlF7VcL8oc08Nn2h+6KCdzCxuBMdmuSMKfL jQAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719585338; x=1720190138; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=45Fn1VR56OryxHQR2ICCsQfaXEj2VPk0P2HvHUTeh68=; b=i1VbfQ6OjXBGPGf3cFBuxbAXRarVP6xPLXNQL6d7CTGiyExbk5uc7n8H1JIBy2fM65 UMZU/o/ZjSzVByA72pxTsA8HonYE8RwwDTbkLRrtF8R/r4elKpBBq1qe1XMp0/QLpDjL cmORyhHGJC5yT05Z6xyEdMnPqzka5YfhvkJfwHFcZcALI9YcoBQXDPDpl+/pyNzJLveA CTTqz9rOItqFS9I43vuFvywqnaW72jH8ENvx4a2k4Ua8VKl5GMqbJDU99q2WjEqIxNVl 7HM37zkFvRBJt+01ZBGpxVJ6MiROHOXfAfgMBjlFDegXRRWNyZ7IS5qU63AqiODQ37qY oV2Q== X-Forwarded-Encrypted: i=1; AJvYcCUOJ5fpCL1G/5/DFo09ktgY0/Dlj/v+ca6WTVkTTY9j+PA66spU+ybePYYCO3LCrfi3NGQ/LfbsI9VwLCnWfbiH9xM= X-Gm-Message-State: AOJu0YzX37BtPxrDrNBlA0Z+rMXaNMG66FZzdT2epkLB8P/61rZdpPKc KCLnS5deXbkre553TeiF4muoUYHa6wyx6CwCtdl410KMDeKwxGuf X-Google-Smtp-Source: AGHT+IGADRofLmu0u6p9UqIYdElb/weLzHiNpcCewkVldJURTAToz8vEU+B0cUlLgs5vMRuOvfV8Zg== X-Received: by 2002:a05:6512:e95:b0:52c:898b:a180 with SMTP id 2adb3069b0e04-52ce1832725mr16276974e87.12.1719585337325; Fri, 28 Jun 2024 07:35:37 -0700 (PDT) Received: from lucifer.home ([2a00:23cc:d20f:ba01:bb66:f8b2:a0e8:6447]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-4256af37828sm38985485e9.9.2024.06.28.07.35.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Jun 2024 07:35:35 -0700 (PDT) From: Lorenzo Stoakes To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Liam R . Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH v2 1/7] userfaultfd: move core VMA manipulation logic to mm/userfaultfd.c Date: Fri, 28 Jun 2024 15:35:22 +0100 Message-ID: <61741ce2b6c4a782ed29e3a1762047fb3c306309.1719584707.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Stat-Signature: 15afx87a4qik4b6nds6ngfqexyd9oxif X-Rspamd-Queue-Id: 9FDE7180028 X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1719585339-616579 X-HE-Meta: U2FsdGVkX1+gYSP1ar5+ezoE9yIHcFmQs61QFJR8t+EOsrEj/DvlkUOgfWHqxlKI6WzaMnvTdixj4sp6KAFK8lA4bR/ZMHZX1Rjt/yEl1re4CmAnXKX+F/eKaMDEr6TeIV8YlWPCTyb61KMNqD6h2lEml8If0ptijOdHD5H6chgM7/y/xFppVJ/z+o9tjzIgTbUGZmvIv1PoTPuEJh079dKVZ1k16ctpcZsimErBfrgexLZowlRVSM98CG9x5SEySa4qxQVrss8t3wdSfL7hwny0KmZq/Hu/n70T24I07EE1pTx6kA4YFhRrWhJwL1nyy3RItx7b4pLaW8TcOUZRPCvlH9noGGdmZxisJ0gCKwu/TnaUPPqe/PZ2lG7KY7z932KAhaTzWdHpm0nX10jy8yFRlUZzrBcIW8Y4lGQZoV1XOZctZMTo5GJWaMKkwx/bGoAmM3D2smqSN30+m3Qp6LDmwvMhdHOILuIb2Tl01s2QcaOmfUWKCU6hiZQwlKSKD2+wMdgZ2oHWUttmvyzGhJUCd4crOMPv8J75PVhrHb1yapp2svEJD3msXEH69GkX9+0wuSq9QBSiwuuAAIkfgG2Av1+qZ2nBacqNkyu2oGidjjzxdGIm4vtoceKC/xyyyTLTuyiRA70xLIv8LMvS/ALlEZHuN1tnNFn82ASbIP9UWQjrGKbnL/Id+c5UGk5O2J1e8ms0kN7D9xb2QSQFcIpP5yMz6A4y2sA9UKEgQ0p/5GV/nOvLSEhr7Rcg7T776rejN1DRPZB14hDiise3ZB2nR7GSWlQfHfddXrYXdSDecxN8yAOwqyaGdY1NbKg4nNZ2t8PIVRO5T0HgmI0Rj0gDF+pPDvdRGxPe0Rlg5BLTT+vrA8uHkkVAPtZ3DVioFeQOnfN2EovXkfdfx8VdgoD216GD7RqLsioIPEtkVDaWwB9+n+vs6ziXPBUY0m0cc6PaZUuQYhbm+LoGflL MxBAKYt4 LUwGh5t66zQ2xra3A2GSidb228SWIr5961wyoue13++3WpM/9sBllxgFBAAk0hgYiLbG1TES62E7Tlz2FQw5vcu/K+g2tbXsVnEnQxgZVhkP3Q5jyTUYhcWOHORHvPxCx0h8WoC2MOzntJxj1bpZ8AIWsW0K67DC5GWo1zlZGwDH2hDO6CNPe1OhJ2Fe+GMSgiubBRIwQQBBR2Z7v5z4VyTqgLGYxFbbzP0uDvlT+zw0eBbzoVGpl7Kuf4grVuRuoJBQfgKbDQuDztilj0pcGutW/LILCBOMnsxpEAVd4OcNmSD4Lxg4yk0ORN+vEVJr97MydQLnkRwVIm/WwCsA/EUK2VA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: This patch forms part of a patch series intending to separate out VMA logic and render it testable from userspace, which requires that core manipulation functions be exposed in an mm/-internal header file. In order to do this, we must abstract APIs we wish to test, in this instance functions which ultimately invoke vma_modify(). This patch therefore moves all logic which ultimately invokes vma_modify() to mm/userfaultfd.c, trying to transfer code at a functional granularity where possible. Signed-off-by: Lorenzo Stoakes --- fs/userfaultfd.c | 160 +++----------------------------- include/linux/userfaultfd_k.h | 19 ++++ mm/userfaultfd.c | 168 ++++++++++++++++++++++++++++++++++ 3 files changed, 198 insertions(+), 149 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 17e409ceaa33..31fa788d9ecd 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -104,21 +104,6 @@ bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma) return ctx->features & UFFD_FEATURE_WP_UNPOPULATED; } -static void userfaultfd_set_vm_flags(struct vm_area_struct *vma, - vm_flags_t flags) -{ - const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP; - - vm_flags_reset(vma, flags); - /* - * For shared mappings, we want to enable writenotify while - * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply - * recalculate vma->vm_page_prot whenever userfaultfd-wp changes. - */ - if ((vma->vm_flags & VM_SHARED) && uffd_wp_changed) - vma_set_page_prot(vma); -} - static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode, int wake_flags, void *key) { @@ -615,22 +600,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, spin_unlock_irq(&ctx->event_wqh.lock); if (release_new_ctx) { - struct vm_area_struct *vma; - struct mm_struct *mm = release_new_ctx->mm; - VMA_ITERATOR(vmi, mm, 0); - - /* the various vma->vm_userfaultfd_ctx still points to it */ - mmap_write_lock(mm); - for_each_vma(vmi, vma) { - if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) { - vma_start_write(vma); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - userfaultfd_set_vm_flags(vma, - vma->vm_flags & ~__VM_UFFD_FLAGS); - } - } - mmap_write_unlock(mm); - + userfaultfd_release_new(release_new_ctx); userfaultfd_ctx_put(release_new_ctx); } @@ -662,9 +632,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs) return 0; if (!(octx->features & UFFD_FEATURE_EVENT_FORK)) { - vma_start_write(vma); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS); + userfaultfd_reset_ctx(vma); return 0; } @@ -749,9 +717,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma, up_write(&ctx->map_changing_lock); } else { /* Drop uffd context if remap feature not enabled */ - vma_start_write(vma); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS); + userfaultfd_reset_ctx(vma); } } @@ -870,53 +836,13 @@ static int userfaultfd_release(struct inode *inode, struct file *file) { struct userfaultfd_ctx *ctx = file->private_data; struct mm_struct *mm = ctx->mm; - struct vm_area_struct *vma, *prev; /* len == 0 means wake all */ struct userfaultfd_wake_range range = { .len = 0, }; - unsigned long new_flags; - VMA_ITERATOR(vmi, mm, 0); WRITE_ONCE(ctx->released, true); - if (!mmget_not_zero(mm)) - goto wakeup; - - /* - * Flush page faults out of all CPUs. NOTE: all page faults - * must be retried without returning VM_FAULT_SIGBUS if - * userfaultfd_ctx_get() succeeds but vma->vma_userfault_ctx - * changes while handle_userfault released the mmap_lock. So - * it's critical that released is set to true (above), before - * taking the mmap_lock for writing. - */ - mmap_write_lock(mm); - prev = NULL; - for_each_vma(vmi, vma) { - cond_resched(); - BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^ - !!(vma->vm_flags & __VM_UFFD_FLAGS)); - if (vma->vm_userfaultfd_ctx.ctx != ctx) { - prev = vma; - continue; - } - /* Reset ptes for the whole vma range if wr-protected */ - if (userfaultfd_wp(vma)) - uffd_wp_range(vma, vma->vm_start, - vma->vm_end - vma->vm_start, false); - new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; - vma = vma_modify_flags_uffd(&vmi, prev, vma, vma->vm_start, - vma->vm_end, new_flags, - NULL_VM_UFFD_CTX); - - vma_start_write(vma); - userfaultfd_set_vm_flags(vma, new_flags); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; + userfaultfd_release_all(mm, ctx); - prev = vma; - } - mmap_write_unlock(mm); - mmput(mm); -wakeup: /* * After no new page faults can wait on this fault_*wqh, flush * the last page faults that may have been already waiting on @@ -1293,14 +1219,14 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, unsigned long arg) { struct mm_struct *mm = ctx->mm; - struct vm_area_struct *vma, *prev, *cur; + struct vm_area_struct *vma, *cur; int ret; struct uffdio_register uffdio_register; struct uffdio_register __user *user_uffdio_register; - unsigned long vm_flags, new_flags; + unsigned long vm_flags; bool found; bool basic_ioctls; - unsigned long start, end, vma_end; + unsigned long start, end; struct vma_iterator vmi; bool wp_async = userfaultfd_wp_async_ctx(ctx); @@ -1428,57 +1354,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, } for_each_vma_range(vmi, cur, end); BUG_ON(!found); - vma_iter_set(&vmi, start); - prev = vma_prev(&vmi); - if (vma->vm_start < start) - prev = vma; - - ret = 0; - for_each_vma_range(vmi, vma, end) { - cond_resched(); - - BUG_ON(!vma_can_userfault(vma, vm_flags, wp_async)); - BUG_ON(vma->vm_userfaultfd_ctx.ctx && - vma->vm_userfaultfd_ctx.ctx != ctx); - WARN_ON(!(vma->vm_flags & VM_MAYWRITE)); - - /* - * Nothing to do: this vma is already registered into this - * userfaultfd and with the right tracking mode too. - */ - if (vma->vm_userfaultfd_ctx.ctx == ctx && - (vma->vm_flags & vm_flags) == vm_flags) - goto skip; - - if (vma->vm_start > start) - start = vma->vm_start; - vma_end = min(end, vma->vm_end); - - new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; - vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end, - new_flags, - (struct vm_userfaultfd_ctx){ctx}); - if (IS_ERR(vma)) { - ret = PTR_ERR(vma); - break; - } - - /* - * In the vma_merge() successful mprotect-like case 8: - * the next vma was merged into the current one and - * the current one has not been updated yet. - */ - vma_start_write(vma); - userfaultfd_set_vm_flags(vma, new_flags); - vma->vm_userfaultfd_ctx.ctx = ctx; - - if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) - hugetlb_unshare_all_pmds(vma); - - skip: - prev = vma; - start = vma->vm_end; - } + ret = userfaultfd_register_range(ctx, vma, vm_flags, start, end, + wp_async); out_unlock: mmap_write_unlock(mm); @@ -1519,7 +1396,6 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, struct vm_area_struct *vma, *prev, *cur; int ret; struct uffdio_range uffdio_unregister; - unsigned long new_flags; bool found; unsigned long start, end, vma_end; const void __user *buf = (void __user *)arg; @@ -1622,27 +1498,13 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range); } - /* Reset ptes for the whole vma range if wr-protected */ - if (userfaultfd_wp(vma)) - uffd_wp_range(vma, start, vma_end - start, false); - - new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; - vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end, - new_flags, NULL_VM_UFFD_CTX); + vma = userfaultfd_clear_vma(&vmi, prev, vma, + start, vma_end); if (IS_ERR(vma)) { ret = PTR_ERR(vma); break; } - /* - * In the vma_merge() successful mprotect-like case 8: - * the next vma was merged into the current one and - * the current one has not been updated yet. - */ - vma_start_write(vma); - userfaultfd_set_vm_flags(vma, new_flags); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - skip: prev = vma; start = vma->vm_end; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 05d59f74fc88..6355ed5bd34b 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -264,6 +264,25 @@ extern void userfaultfd_unmap_complete(struct mm_struct *mm, extern bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma); extern bool userfaultfd_wp_async(struct vm_area_struct *vma); +extern void userfaultfd_reset_ctx(struct vm_area_struct *vma); + +extern struct vm_area_struct *userfaultfd_clear_vma(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, + unsigned long end); + +int userfaultfd_register_range(struct userfaultfd_ctx *ctx, + struct vm_area_struct *vma, + unsigned long vm_flags, + unsigned long start, unsigned long end, + bool wp_async); + +extern void userfaultfd_release_new(struct userfaultfd_ctx *ctx); + +extern void userfaultfd_release_all(struct mm_struct *mm, + struct userfaultfd_ctx *ctx); + #else /* CONFIG_USERFAULTFD */ /* mm helpers */ diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 8dedaec00486..950fe6b2f0f7 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1760,3 +1760,171 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start, VM_WARN_ON(!moved && !err); return moved ? moved : err; } + +static void userfaultfd_set_vm_flags(struct vm_area_struct *vma, + vm_flags_t flags) +{ + const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP; + + vm_flags_reset(vma, flags); + /* + * For shared mappings, we want to enable writenotify while + * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply + * recalculate vma->vm_page_prot whenever userfaultfd-wp changes. + */ + if ((vma->vm_flags & VM_SHARED) && uffd_wp_changed) + vma_set_page_prot(vma); +} + +static void userfaultfd_set_ctx(struct vm_area_struct *vma, + struct userfaultfd_ctx *ctx, + unsigned long flags) +{ + vma_start_write(vma); + vma->vm_userfaultfd_ctx = (struct vm_userfaultfd_ctx){ctx}; + userfaultfd_set_vm_flags(vma, + (vma->vm_flags & ~__VM_UFFD_FLAGS) | flags); +} + +void userfaultfd_reset_ctx(struct vm_area_struct *vma) +{ + userfaultfd_set_ctx(vma, NULL, 0); +} + +struct vm_area_struct *userfaultfd_clear_vma(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, + unsigned long end) +{ + struct vm_area_struct *ret; + + /* Reset ptes for the whole vma range if wr-protected */ + if (userfaultfd_wp(vma)) + uffd_wp_range(vma, start, end - start, false); + + ret = vma_modify_flags_uffd(vmi, prev, vma, start, end, + vma->vm_flags & ~__VM_UFFD_FLAGS, + NULL_VM_UFFD_CTX); + + /* + * In the vma_merge() successful mprotect-like case 8: + * the next vma was merged into the current one and + * the current one has not been updated yet. + */ + if (!IS_ERR(ret)) + userfaultfd_reset_ctx(vma); + + return ret; +} + +/* Assumes mmap write lock taken, and mm_struct pinned. */ +int userfaultfd_register_range(struct userfaultfd_ctx *ctx, + struct vm_area_struct *vma, + unsigned long vm_flags, + unsigned long start, unsigned long end, + bool wp_async) +{ + VMA_ITERATOR(vmi, ctx->mm, start); + struct vm_area_struct *prev = vma_prev(&vmi); + unsigned long vma_end; + unsigned long new_flags; + + if (vma->vm_start < start) + prev = vma; + + for_each_vma_range(vmi, vma, end) { + cond_resched(); + + BUG_ON(!vma_can_userfault(vma, vm_flags, wp_async)); + BUG_ON(vma->vm_userfaultfd_ctx.ctx && + vma->vm_userfaultfd_ctx.ctx != ctx); + WARN_ON(!(vma->vm_flags & VM_MAYWRITE)); + + /* + * Nothing to do: this vma is already registered into this + * userfaultfd and with the right tracking mode too. + */ + if (vma->vm_userfaultfd_ctx.ctx == ctx && + (vma->vm_flags & vm_flags) == vm_flags) + goto skip; + + if (vma->vm_start > start) + start = vma->vm_start; + vma_end = min(end, vma->vm_end); + + new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; + vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end, + new_flags, + (struct vm_userfaultfd_ctx){ctx}); + if (IS_ERR(vma)) + return PTR_ERR(vma); + + /* + * In the vma_merge() successful mprotect-like case 8: + * the next vma was merged into the current one and + * the current one has not been updated yet. + */ + userfaultfd_set_ctx(vma, ctx, vm_flags); + + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) + hugetlb_unshare_all_pmds(vma); + +skip: + prev = vma; + start = vma->vm_end; + } + + return 0; +} + +void userfaultfd_release_new(struct userfaultfd_ctx *ctx) +{ + struct mm_struct *mm = ctx->mm; + struct vm_area_struct *vma; + VMA_ITERATOR(vmi, mm, 0); + + /* the various vma->vm_userfaultfd_ctx still points to it */ + mmap_write_lock(mm); + for_each_vma(vmi, vma) { + if (vma->vm_userfaultfd_ctx.ctx == ctx) + userfaultfd_reset_ctx(vma); + } + mmap_write_unlock(mm); +} + +void userfaultfd_release_all(struct mm_struct *mm, + struct userfaultfd_ctx *ctx) +{ + struct vm_area_struct *vma, *prev; + VMA_ITERATOR(vmi, mm, 0); + + if (!mmget_not_zero(mm)) + return; + + /* + * Flush page faults out of all CPUs. NOTE: all page faults + * must be retried without returning VM_FAULT_SIGBUS if + * userfaultfd_ctx_get() succeeds but vma->vma_userfault_ctx + * changes while handle_userfault released the mmap_lock. So + * it's critical that released is set to true (above), before + * taking the mmap_lock for writing. + */ + mmap_write_lock(mm); + prev = NULL; + for_each_vma(vmi, vma) { + cond_resched(); + BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^ + !!(vma->vm_flags & __VM_UFFD_FLAGS)); + if (vma->vm_userfaultfd_ctx.ctx != ctx) { + prev = vma; + continue; + } + + vma = userfaultfd_clear_vma(&vmi, prev, vma, + vma->vm_start, vma->vm_end); + prev = vma; + } + mmap_write_unlock(mm); + mmput(mm); +}