From patchwork Thu Jun 27 10:39:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13714162 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FF98C30653 for ; Thu, 27 Jun 2024 10:39:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F22846B0092; Thu, 27 Jun 2024 06:39:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EA9196B0095; Thu, 27 Jun 2024 06:39:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C88A36B0096; Thu, 27 Jun 2024 06:39:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 9B5396B0092 for ; Thu, 27 Jun 2024 06:39:45 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 3CC7A121CE6 for ; Thu, 27 Jun 2024 10:39:45 +0000 (UTC) X-FDA: 82276322730.08.8E8C4D7 Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com [209.85.128.48]) by imf20.hostedemail.com (Postfix) with ESMTP id 380AA1C0007 for ; Thu, 27 Jun 2024 10:39:42 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=mufEcppI; spf=pass (imf20.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719484765; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=45Fn1VR56OryxHQR2ICCsQfaXEj2VPk0P2HvHUTeh68=; b=IwFNwRy/zgfIAe/I0HVEtUgvUHoS4yEqPM3YBDFoc+wqsFH1jg97Ehk2rRMpMTz9gXsaNK /WwrmMO+XiuGe4MWqddOP9Yh+0jqZmSEPB8HwTY+gOuqibjnUHBpHlkk5HXuJCgoqqMvzj jV22cBDpsgkphkJqHoeXCGWY+36gsuk= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719484765; a=rsa-sha256; cv=none; b=C2ddApkagIEV41LVYZvSBDtrGgp/bESIo3BjAUKg9td2Q670DgrylKUOKGDtWbAeqd9kOj lm8Wdxy7fpqSL/RV9WB8RNBNvbpt7WcNdDtBYosdx5q1lKDUer2RwBP/qCD9efMACxAQWb tdiBwG0nbQhPHlZ0UdZUQfaAWnWki5k= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=mufEcppI; spf=pass (imf20.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-42563a9fa58so6289875e9.0 for ; Thu, 27 Jun 2024 03:39:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719484781; x=1720089581; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=45Fn1VR56OryxHQR2ICCsQfaXEj2VPk0P2HvHUTeh68=; b=mufEcppIvQBAHsss+Zf78E0+G+fv4ZmeYat4Q2+GfWkL9ZzMjGJMQkWHu+9rQ6cr3D ILmts8OAou3D88UTSoJESmcJdWpXiSH0hBWO4pd8yKOu6EzlA2thXArGtavy5E4UJFaA uBvRafMUMG6eSDusyspoJY9WmyusZAIJj/GZyHEDF2XGnQdEVq5lOhFv8Q/XGF4uvQhX gwOprpviH39EVh69cK7fbFIN2iD8bdJfss8wWgg/2guPjgfYDfQKwUiAvbWrylK6mv1M OxGHGv5UMjUSmZUFbFfKjjqEDLURqJj+DhTjFvgBfiusbcvgo9sqkVxTQBfKdinSjawZ LfkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719484781; x=1720089581; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=45Fn1VR56OryxHQR2ICCsQfaXEj2VPk0P2HvHUTeh68=; b=KHW5etuEfvBSHi49ZnSgYsawwANFYHtJB2DisZflUO2SnKERdElRW8VDFNAPkWK3fI i9ifq4gxuFxgZkV3mOE5wb9U50O6GOxvt4Q/eJGXagOJSA0WaW6ky+anhuO2fcWtPRQL EIdkKBH+77WXgJ/lHduK1Y/6ADwiu5lroXrHSeHKPJ/6fotYs5ZJaMwEbjZLJriAPY2H IByQLTCfx4x+reTOAX+1ieMoZ7PqdjSb95iIwbQOCl1ZirzDjR1l6J8mENbWKyTDgm2s xNAxyD0HXYsqJkT3hn0Ss6MfNbMo6wDAzsG+YBy7yQLSm5X9GXZQ+Vj16155g98DfOog T0GQ== X-Forwarded-Encrypted: i=1; AJvYcCXPZAs48d3xcRZsg4b/i/yQD6yuCpzctjLbo3eJKZC84cg+EATOvj0/ggK04MjLa74bau9fwoC8tscvdu5lI4mIs5U= X-Gm-Message-State: AOJu0YxBxi9fdgs/4vNeTzU4KD6nmTrZjLdb28+KMpZ4ibCpuQFlxxEG qKqi9TAeukVuiudWEG6jRj1U5iQ8TUBSWOnn0dgd8wAnxNVP65gF X-Google-Smtp-Source: AGHT+IHbVxEq6Xcc0Ej4ydTNNZL88D6mvm5WatlhxIgqdW/boGrCfIsfA5Eysvo1nQ0GlqcvAY7Cvg== X-Received: by 2002:a05:600c:3501:b0:424:8fe4:5a42 with SMTP id 5b1f17b1804b1-4248fe45a88mr83351445e9.8.1719484780725; Thu, 27 Jun 2024 03:39:40 -0700 (PDT) Received: from lucifer.home ([2a00:23cc:d20f:ba01:bb66:f8b2:a0e8:6447]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-42564bb6caasm19957195e9.33.2024.06.27.03.39.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Jun 2024 03:39:39 -0700 (PDT) From: Lorenzo Stoakes To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Liam R . Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH 1/7] userfaultfd: move core VMA manipulation logic to mm/userfaultfd.c Date: Thu, 27 Jun 2024 11:39:26 +0100 Message-ID: <993a7b057b8959903410a22d8ac117138c271117.1719481836.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Stat-Signature: kujofgqm45f7or8gsgrtas4zgor5cp8n X-Rspamd-Queue-Id: 380AA1C0007 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1719484782-643960 X-HE-Meta: U2FsdGVkX19XH69bwwh9Wg9k1lIsfXLl6k6yNeCfEeFC3EFYrgC3npZL7HKijz6IIKxQOWEgmxFtADDcCi/fvwOqVS6l1+v0Qse7yugdnMNZP7dp1XHlLrvoY+IFBUnFi8ACqFTN8uKn30Nsz4evRqoW6Nb8/sgtlGUDZazURu3/WUx6aQq1jk5sn+XE71ACcuzilteK/flVX7YRBuL1qaekftq5rjSMtXFs36aGY+XbKboevKy/3JwvrWEpZ5QH9iw+UF5/fToroXNtwl7ZWBJ1tPUAQePGyXBBTrhgFeiOO8fjxg6mapxz/pQQllOE0NoIa9am/TsWVbYilak54hSpe8MAsQS1vA79ne+r+EucheXuUCvGm1QcTXOoYqEAvYFY9r4+XrIjhFsiac9vHCA+TOpvsCVExdUQ7eFNlPgtd6l1DsfmhgbO3wmEx/xeSFqvWpNc7tqWQWirJh5D96MVhkly71MSY+Y+/gAHvRKfJ0YRDSx2rZabuxDREM74qgBP7fsmwArAkZXy/fn8aH2aMWLMfNgWnPh+zrkA0Finv2CWdiQ5vPMcMuzuDTnDb2PqnN1Q3jiUExmcGSLXcvCYotrjpQi9dRmgQph07fwTb9KAg5dlRAUi/um56ZmfWwC47Jr2JDAA5RbpQM4B/c58/pHk6y2lpJ93P1HDMRB2Y1LQYSAoMRumhoGnEJ3gbuDhDa4ZggFg8RrRukA15GpWXcToIbwbAMt/zM/QPvqpGOcJTxs3LObHOPQfVxH2+fDrieM7xD2T6hdBLljg1Yws8ctNeBylp0p7LFdOxEV8uZuWRlG7na68JgcE7iLc8SqBXHSk/bktOdZsSMnoDZjzLMEwC7fKAYLfDFuJgFA/rywU05jNwRICHX0SzIbuETGvp0cN8qRmwhYC/9VpTai3hLRIASCpPzjbN1WDHfnk+ooNCnZG7ikx+AA+wgAQbpmewSZiEyQCk9DX5uU IWiLxOtQ fKnimVoC/A31QN5wXUggbo1ewILSRP0hrzbugasEL91s0u/FcnRrL2anER0rquvqWRsgYRCJokOQs0BJd+v/gGD1aNKQ3O142uRy3mI2WpmdO5DRaJwszJI2HEWivboU/sBdGYqoIIcVDWqZ0mMZywmPthPNvSgpKEUv9xxz10BtcRTEp6Gr56U9S9NJFQJyKy8rcYiK3XQFpMkwhygVNa3rd2pJ9/wxf9/Ropha1ODp/rBSwT8uIPhFSEoxQUoPd2QHOb9nfgbmHtxD3SOHgSNcjMIFue1mRssQemeGhE8Q5ZOL0SIqoRGniDJoAK72l9xoRuJii76Kwu0H6PuuxsQVGhw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: This patch forms part of a patch series intending to separate out VMA logic and render it testable from userspace, which requires that core manipulation functions be exposed in an mm/-internal header file. In order to do this, we must abstract APIs we wish to test, in this instance functions which ultimately invoke vma_modify(). This patch therefore moves all logic which ultimately invokes vma_modify() to mm/userfaultfd.c, trying to transfer code at a functional granularity where possible. Signed-off-by: Lorenzo Stoakes --- fs/userfaultfd.c | 160 +++----------------------------- include/linux/userfaultfd_k.h | 19 ++++ mm/userfaultfd.c | 168 ++++++++++++++++++++++++++++++++++ 3 files changed, 198 insertions(+), 149 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 17e409ceaa33..31fa788d9ecd 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -104,21 +104,6 @@ bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma) return ctx->features & UFFD_FEATURE_WP_UNPOPULATED; } -static void userfaultfd_set_vm_flags(struct vm_area_struct *vma, - vm_flags_t flags) -{ - const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP; - - vm_flags_reset(vma, flags); - /* - * For shared mappings, we want to enable writenotify while - * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply - * recalculate vma->vm_page_prot whenever userfaultfd-wp changes. - */ - if ((vma->vm_flags & VM_SHARED) && uffd_wp_changed) - vma_set_page_prot(vma); -} - static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode, int wake_flags, void *key) { @@ -615,22 +600,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, spin_unlock_irq(&ctx->event_wqh.lock); if (release_new_ctx) { - struct vm_area_struct *vma; - struct mm_struct *mm = release_new_ctx->mm; - VMA_ITERATOR(vmi, mm, 0); - - /* the various vma->vm_userfaultfd_ctx still points to it */ - mmap_write_lock(mm); - for_each_vma(vmi, vma) { - if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) { - vma_start_write(vma); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - userfaultfd_set_vm_flags(vma, - vma->vm_flags & ~__VM_UFFD_FLAGS); - } - } - mmap_write_unlock(mm); - + userfaultfd_release_new(release_new_ctx); userfaultfd_ctx_put(release_new_ctx); } @@ -662,9 +632,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs) return 0; if (!(octx->features & UFFD_FEATURE_EVENT_FORK)) { - vma_start_write(vma); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS); + userfaultfd_reset_ctx(vma); return 0; } @@ -749,9 +717,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma, up_write(&ctx->map_changing_lock); } else { /* Drop uffd context if remap feature not enabled */ - vma_start_write(vma); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS); + userfaultfd_reset_ctx(vma); } } @@ -870,53 +836,13 @@ static int userfaultfd_release(struct inode *inode, struct file *file) { struct userfaultfd_ctx *ctx = file->private_data; struct mm_struct *mm = ctx->mm; - struct vm_area_struct *vma, *prev; /* len == 0 means wake all */ struct userfaultfd_wake_range range = { .len = 0, }; - unsigned long new_flags; - VMA_ITERATOR(vmi, mm, 0); WRITE_ONCE(ctx->released, true); - if (!mmget_not_zero(mm)) - goto wakeup; - - /* - * Flush page faults out of all CPUs. NOTE: all page faults - * must be retried without returning VM_FAULT_SIGBUS if - * userfaultfd_ctx_get() succeeds but vma->vma_userfault_ctx - * changes while handle_userfault released the mmap_lock. So - * it's critical that released is set to true (above), before - * taking the mmap_lock for writing. - */ - mmap_write_lock(mm); - prev = NULL; - for_each_vma(vmi, vma) { - cond_resched(); - BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^ - !!(vma->vm_flags & __VM_UFFD_FLAGS)); - if (vma->vm_userfaultfd_ctx.ctx != ctx) { - prev = vma; - continue; - } - /* Reset ptes for the whole vma range if wr-protected */ - if (userfaultfd_wp(vma)) - uffd_wp_range(vma, vma->vm_start, - vma->vm_end - vma->vm_start, false); - new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; - vma = vma_modify_flags_uffd(&vmi, prev, vma, vma->vm_start, - vma->vm_end, new_flags, - NULL_VM_UFFD_CTX); - - vma_start_write(vma); - userfaultfd_set_vm_flags(vma, new_flags); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; + userfaultfd_release_all(mm, ctx); - prev = vma; - } - mmap_write_unlock(mm); - mmput(mm); -wakeup: /* * After no new page faults can wait on this fault_*wqh, flush * the last page faults that may have been already waiting on @@ -1293,14 +1219,14 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, unsigned long arg) { struct mm_struct *mm = ctx->mm; - struct vm_area_struct *vma, *prev, *cur; + struct vm_area_struct *vma, *cur; int ret; struct uffdio_register uffdio_register; struct uffdio_register __user *user_uffdio_register; - unsigned long vm_flags, new_flags; + unsigned long vm_flags; bool found; bool basic_ioctls; - unsigned long start, end, vma_end; + unsigned long start, end; struct vma_iterator vmi; bool wp_async = userfaultfd_wp_async_ctx(ctx); @@ -1428,57 +1354,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, } for_each_vma_range(vmi, cur, end); BUG_ON(!found); - vma_iter_set(&vmi, start); - prev = vma_prev(&vmi); - if (vma->vm_start < start) - prev = vma; - - ret = 0; - for_each_vma_range(vmi, vma, end) { - cond_resched(); - - BUG_ON(!vma_can_userfault(vma, vm_flags, wp_async)); - BUG_ON(vma->vm_userfaultfd_ctx.ctx && - vma->vm_userfaultfd_ctx.ctx != ctx); - WARN_ON(!(vma->vm_flags & VM_MAYWRITE)); - - /* - * Nothing to do: this vma is already registered into this - * userfaultfd and with the right tracking mode too. - */ - if (vma->vm_userfaultfd_ctx.ctx == ctx && - (vma->vm_flags & vm_flags) == vm_flags) - goto skip; - - if (vma->vm_start > start) - start = vma->vm_start; - vma_end = min(end, vma->vm_end); - - new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; - vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end, - new_flags, - (struct vm_userfaultfd_ctx){ctx}); - if (IS_ERR(vma)) { - ret = PTR_ERR(vma); - break; - } - - /* - * In the vma_merge() successful mprotect-like case 8: - * the next vma was merged into the current one and - * the current one has not been updated yet. - */ - vma_start_write(vma); - userfaultfd_set_vm_flags(vma, new_flags); - vma->vm_userfaultfd_ctx.ctx = ctx; - - if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) - hugetlb_unshare_all_pmds(vma); - - skip: - prev = vma; - start = vma->vm_end; - } + ret = userfaultfd_register_range(ctx, vma, vm_flags, start, end, + wp_async); out_unlock: mmap_write_unlock(mm); @@ -1519,7 +1396,6 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, struct vm_area_struct *vma, *prev, *cur; int ret; struct uffdio_range uffdio_unregister; - unsigned long new_flags; bool found; unsigned long start, end, vma_end; const void __user *buf = (void __user *)arg; @@ -1622,27 +1498,13 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range); } - /* Reset ptes for the whole vma range if wr-protected */ - if (userfaultfd_wp(vma)) - uffd_wp_range(vma, start, vma_end - start, false); - - new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; - vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end, - new_flags, NULL_VM_UFFD_CTX); + vma = userfaultfd_clear_vma(&vmi, prev, vma, + start, vma_end); if (IS_ERR(vma)) { ret = PTR_ERR(vma); break; } - /* - * In the vma_merge() successful mprotect-like case 8: - * the next vma was merged into the current one and - * the current one has not been updated yet. - */ - vma_start_write(vma); - userfaultfd_set_vm_flags(vma, new_flags); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - skip: prev = vma; start = vma->vm_end; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 05d59f74fc88..6355ed5bd34b 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -264,6 +264,25 @@ extern void userfaultfd_unmap_complete(struct mm_struct *mm, extern bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma); extern bool userfaultfd_wp_async(struct vm_area_struct *vma); +extern void userfaultfd_reset_ctx(struct vm_area_struct *vma); + +extern struct vm_area_struct *userfaultfd_clear_vma(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, + unsigned long end); + +int userfaultfd_register_range(struct userfaultfd_ctx *ctx, + struct vm_area_struct *vma, + unsigned long vm_flags, + unsigned long start, unsigned long end, + bool wp_async); + +extern void userfaultfd_release_new(struct userfaultfd_ctx *ctx); + +extern void userfaultfd_release_all(struct mm_struct *mm, + struct userfaultfd_ctx *ctx); + #else /* CONFIG_USERFAULTFD */ /* mm helpers */ diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 8dedaec00486..950fe6b2f0f7 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1760,3 +1760,171 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start, VM_WARN_ON(!moved && !err); return moved ? moved : err; } + +static void userfaultfd_set_vm_flags(struct vm_area_struct *vma, + vm_flags_t flags) +{ + const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP; + + vm_flags_reset(vma, flags); + /* + * For shared mappings, we want to enable writenotify while + * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply + * recalculate vma->vm_page_prot whenever userfaultfd-wp changes. + */ + if ((vma->vm_flags & VM_SHARED) && uffd_wp_changed) + vma_set_page_prot(vma); +} + +static void userfaultfd_set_ctx(struct vm_area_struct *vma, + struct userfaultfd_ctx *ctx, + unsigned long flags) +{ + vma_start_write(vma); + vma->vm_userfaultfd_ctx = (struct vm_userfaultfd_ctx){ctx}; + userfaultfd_set_vm_flags(vma, + (vma->vm_flags & ~__VM_UFFD_FLAGS) | flags); +} + +void userfaultfd_reset_ctx(struct vm_area_struct *vma) +{ + userfaultfd_set_ctx(vma, NULL, 0); +} + +struct vm_area_struct *userfaultfd_clear_vma(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, + unsigned long end) +{ + struct vm_area_struct *ret; + + /* Reset ptes for the whole vma range if wr-protected */ + if (userfaultfd_wp(vma)) + uffd_wp_range(vma, start, end - start, false); + + ret = vma_modify_flags_uffd(vmi, prev, vma, start, end, + vma->vm_flags & ~__VM_UFFD_FLAGS, + NULL_VM_UFFD_CTX); + + /* + * In the vma_merge() successful mprotect-like case 8: + * the next vma was merged into the current one and + * the current one has not been updated yet. + */ + if (!IS_ERR(ret)) + userfaultfd_reset_ctx(vma); + + return ret; +} + +/* Assumes mmap write lock taken, and mm_struct pinned. */ +int userfaultfd_register_range(struct userfaultfd_ctx *ctx, + struct vm_area_struct *vma, + unsigned long vm_flags, + unsigned long start, unsigned long end, + bool wp_async) +{ + VMA_ITERATOR(vmi, ctx->mm, start); + struct vm_area_struct *prev = vma_prev(&vmi); + unsigned long vma_end; + unsigned long new_flags; + + if (vma->vm_start < start) + prev = vma; + + for_each_vma_range(vmi, vma, end) { + cond_resched(); + + BUG_ON(!vma_can_userfault(vma, vm_flags, wp_async)); + BUG_ON(vma->vm_userfaultfd_ctx.ctx && + vma->vm_userfaultfd_ctx.ctx != ctx); + WARN_ON(!(vma->vm_flags & VM_MAYWRITE)); + + /* + * Nothing to do: this vma is already registered into this + * userfaultfd and with the right tracking mode too. + */ + if (vma->vm_userfaultfd_ctx.ctx == ctx && + (vma->vm_flags & vm_flags) == vm_flags) + goto skip; + + if (vma->vm_start > start) + start = vma->vm_start; + vma_end = min(end, vma->vm_end); + + new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; + vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end, + new_flags, + (struct vm_userfaultfd_ctx){ctx}); + if (IS_ERR(vma)) + return PTR_ERR(vma); + + /* + * In the vma_merge() successful mprotect-like case 8: + * the next vma was merged into the current one and + * the current one has not been updated yet. + */ + userfaultfd_set_ctx(vma, ctx, vm_flags); + + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) + hugetlb_unshare_all_pmds(vma); + +skip: + prev = vma; + start = vma->vm_end; + } + + return 0; +} + +void userfaultfd_release_new(struct userfaultfd_ctx *ctx) +{ + struct mm_struct *mm = ctx->mm; + struct vm_area_struct *vma; + VMA_ITERATOR(vmi, mm, 0); + + /* the various vma->vm_userfaultfd_ctx still points to it */ + mmap_write_lock(mm); + for_each_vma(vmi, vma) { + if (vma->vm_userfaultfd_ctx.ctx == ctx) + userfaultfd_reset_ctx(vma); + } + mmap_write_unlock(mm); +} + +void userfaultfd_release_all(struct mm_struct *mm, + struct userfaultfd_ctx *ctx) +{ + struct vm_area_struct *vma, *prev; + VMA_ITERATOR(vmi, mm, 0); + + if (!mmget_not_zero(mm)) + return; + + /* + * Flush page faults out of all CPUs. NOTE: all page faults + * must be retried without returning VM_FAULT_SIGBUS if + * userfaultfd_ctx_get() succeeds but vma->vma_userfault_ctx + * changes while handle_userfault released the mmap_lock. So + * it's critical that released is set to true (above), before + * taking the mmap_lock for writing. + */ + mmap_write_lock(mm); + prev = NULL; + for_each_vma(vmi, vma) { + cond_resched(); + BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^ + !!(vma->vm_flags & __VM_UFFD_FLAGS)); + if (vma->vm_userfaultfd_ctx.ctx != ctx) { + prev = vma; + continue; + } + + vma = userfaultfd_clear_vma(&vmi, prev, vma, + vma->vm_start, vma->vm_end); + prev = vma; + } + mmap_write_unlock(mm); + mmput(mm); +}