From patchwork Fri Jun 28 14:35:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13716242 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5BD9C3064D for ; Fri, 28 Jun 2024 14:35:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 56F366B008C; Fri, 28 Jun 2024 10:35:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 51F886B0095; Fri, 28 Jun 2024 10:35:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 34B7E6B0096; Fri, 28 Jun 2024 10:35:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 0FA4A6B008C for ; Fri, 28 Jun 2024 10:35:42 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id B83F61204A5 for ; Fri, 28 Jun 2024 14:35:41 +0000 (UTC) X-FDA: 82280546082.13.945F8FD Received: from mail-lf1-f41.google.com (mail-lf1-f41.google.com [209.85.167.41]) by imf06.hostedemail.com (Postfix) with ESMTP id 9FDE7180028 for ; Fri, 28 Jun 2024 14:35:39 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Dj04VhwB; spf=pass (imf06.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.167.41 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719585330; a=rsa-sha256; cv=none; b=wXVr+4apoK9OsDEuTfqoAYH0gAj+4cufyOpTXCRjqOF2vp7kmMdcCEENnwlJSgRTbPID5A FnrR/8yEubOSfTlFx9yEzzjZf4p3nVS/pseFJROrozLf3bJ1zI+y3eIfE1vTWW8h+SmF8Z DcsaCuIIWS0kS7FHmX6E7jZ6VQBum2w= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Dj04VhwB; spf=pass (imf06.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.167.41 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719585330; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=45Fn1VR56OryxHQR2ICCsQfaXEj2VPk0P2HvHUTeh68=; b=i/zmg4oiC1zZeAE1ufAXApmAT6EltbVI6kIcT895/q5xBpMfjs3TuX6z9173t/TMxVYY6W Wn3r0yLJDeFv+1rFmyKuy0iCkNGV+WcTWj1IVrqeRaMZZV39tvAu9mwLmayVw4v2DMoryc 0JBznBwFNv5owNvKTXfqAJQygMNhADo= Received: by mail-lf1-f41.google.com with SMTP id 2adb3069b0e04-52cdd893e5cso759015e87.1 for ; Fri, 28 Jun 2024 07:35:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719585338; x=1720190138; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=45Fn1VR56OryxHQR2ICCsQfaXEj2VPk0P2HvHUTeh68=; b=Dj04VhwBfQ0m/cQLzLeul3I8VWXPzda3KfDdpqG7jdegPU4kbrI6SEIV/z1AgoxQtO KYbYMgh56IWR0g0RYi24GWZtiXS+RGRy3sMTDRLj2zNLRm0j62tb6QibPR99+paNjKHp UdtgQDY95yC+FwADENwl5aRRBTzKB1UdaoX/Oitv6i2KwWWNBqKTZoO/sgF++T+yrcLo Rn9Rz+6sihm1WFV88posEimA4F3ExV/8BxM34X7FC5e9Y4+wLgM1JMDVDsiAHxx9Tj/e 1RJK406slB9E3/5fAoI2IiFdJj+xk457MPlF7VcL8oc08Nn2h+6KCdzCxuBMdmuSMKfL jQAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719585338; x=1720190138; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=45Fn1VR56OryxHQR2ICCsQfaXEj2VPk0P2HvHUTeh68=; b=i1VbfQ6OjXBGPGf3cFBuxbAXRarVP6xPLXNQL6d7CTGiyExbk5uc7n8H1JIBy2fM65 UMZU/o/ZjSzVByA72pxTsA8HonYE8RwwDTbkLRrtF8R/r4elKpBBq1qe1XMp0/QLpDjL cmORyhHGJC5yT05Z6xyEdMnPqzka5YfhvkJfwHFcZcALI9YcoBQXDPDpl+/pyNzJLveA CTTqz9rOItqFS9I43vuFvywqnaW72jH8ENvx4a2k4Ua8VKl5GMqbJDU99q2WjEqIxNVl 7HM37zkFvRBJt+01ZBGpxVJ6MiROHOXfAfgMBjlFDegXRRWNyZ7IS5qU63AqiODQ37qY oV2Q== X-Forwarded-Encrypted: i=1; AJvYcCUOJ5fpCL1G/5/DFo09ktgY0/Dlj/v+ca6WTVkTTY9j+PA66spU+ybePYYCO3LCrfi3NGQ/LfbsI9VwLCnWfbiH9xM= X-Gm-Message-State: AOJu0YzX37BtPxrDrNBlA0Z+rMXaNMG66FZzdT2epkLB8P/61rZdpPKc KCLnS5deXbkre553TeiF4muoUYHa6wyx6CwCtdl410KMDeKwxGuf X-Google-Smtp-Source: AGHT+IGADRofLmu0u6p9UqIYdElb/weLzHiNpcCewkVldJURTAToz8vEU+B0cUlLgs5vMRuOvfV8Zg== X-Received: by 2002:a05:6512:e95:b0:52c:898b:a180 with SMTP id 2adb3069b0e04-52ce1832725mr16276974e87.12.1719585337325; Fri, 28 Jun 2024 07:35:37 -0700 (PDT) Received: from lucifer.home ([2a00:23cc:d20f:ba01:bb66:f8b2:a0e8:6447]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-4256af37828sm38985485e9.9.2024.06.28.07.35.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Jun 2024 07:35:35 -0700 (PDT) From: Lorenzo Stoakes To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Liam R . Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH v2 1/7] userfaultfd: move core VMA manipulation logic to mm/userfaultfd.c Date: Fri, 28 Jun 2024 15:35:22 +0100 Message-ID: <61741ce2b6c4a782ed29e3a1762047fb3c306309.1719584707.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Stat-Signature: 15afx87a4qik4b6nds6ngfqexyd9oxif X-Rspamd-Queue-Id: 9FDE7180028 X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1719585339-616579 X-HE-Meta: U2FsdGVkX1+gYSP1ar5+ezoE9yIHcFmQs61QFJR8t+EOsrEj/DvlkUOgfWHqxlKI6WzaMnvTdixj4sp6KAFK8lA4bR/ZMHZX1Rjt/yEl1re4CmAnXKX+F/eKaMDEr6TeIV8YlWPCTyb61KMNqD6h2lEml8If0ptijOdHD5H6chgM7/y/xFppVJ/z+o9tjzIgTbUGZmvIv1PoTPuEJh079dKVZ1k16ctpcZsimErBfrgexLZowlRVSM98CG9x5SEySa4qxQVrss8t3wdSfL7hwny0KmZq/Hu/n70T24I07EE1pTx6kA4YFhRrWhJwL1nyy3RItx7b4pLaW8TcOUZRPCvlH9noGGdmZxisJ0gCKwu/TnaUPPqe/PZ2lG7KY7z932KAhaTzWdHpm0nX10jy8yFRlUZzrBcIW8Y4lGQZoV1XOZctZMTo5GJWaMKkwx/bGoAmM3D2smqSN30+m3Qp6LDmwvMhdHOILuIb2Tl01s2QcaOmfUWKCU6hiZQwlKSKD2+wMdgZ2oHWUttmvyzGhJUCd4crOMPv8J75PVhrHb1yapp2svEJD3msXEH69GkX9+0wuSq9QBSiwuuAAIkfgG2Av1+qZ2nBacqNkyu2oGidjjzxdGIm4vtoceKC/xyyyTLTuyiRA70xLIv8LMvS/ALlEZHuN1tnNFn82ASbIP9UWQjrGKbnL/Id+c5UGk5O2J1e8ms0kN7D9xb2QSQFcIpP5yMz6A4y2sA9UKEgQ0p/5GV/nOvLSEhr7Rcg7T776rejN1DRPZB14hDiise3ZB2nR7GSWlQfHfddXrYXdSDecxN8yAOwqyaGdY1NbKg4nNZ2t8PIVRO5T0HgmI0Rj0gDF+pPDvdRGxPe0Rlg5BLTT+vrA8uHkkVAPtZ3DVioFeQOnfN2EovXkfdfx8VdgoD216GD7RqLsioIPEtkVDaWwB9+n+vs6ziXPBUY0m0cc6PaZUuQYhbm+LoGflL MxBAKYt4 LUwGh5t66zQ2xra3A2GSidb228SWIr5961wyoue13++3WpM/9sBllxgFBAAk0hgYiLbG1TES62E7Tlz2FQw5vcu/K+g2tbXsVnEnQxgZVhkP3Q5jyTUYhcWOHORHvPxCx0h8WoC2MOzntJxj1bpZ8AIWsW0K67DC5GWo1zlZGwDH2hDO6CNPe1OhJ2Fe+GMSgiubBRIwQQBBR2Z7v5z4VyTqgLGYxFbbzP0uDvlT+zw0eBbzoVGpl7Kuf4grVuRuoJBQfgKbDQuDztilj0pcGutW/LILCBOMnsxpEAVd4OcNmSD4Lxg4yk0ORN+vEVJr97MydQLnkRwVIm/WwCsA/EUK2VA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: This patch forms part of a patch series intending to separate out VMA logic and render it testable from userspace, which requires that core manipulation functions be exposed in an mm/-internal header file. In order to do this, we must abstract APIs we wish to test, in this instance functions which ultimately invoke vma_modify(). This patch therefore moves all logic which ultimately invokes vma_modify() to mm/userfaultfd.c, trying to transfer code at a functional granularity where possible. Signed-off-by: Lorenzo Stoakes --- fs/userfaultfd.c | 160 +++----------------------------- include/linux/userfaultfd_k.h | 19 ++++ mm/userfaultfd.c | 168 ++++++++++++++++++++++++++++++++++ 3 files changed, 198 insertions(+), 149 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 17e409ceaa33..31fa788d9ecd 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -104,21 +104,6 @@ bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma) return ctx->features & UFFD_FEATURE_WP_UNPOPULATED; } -static void userfaultfd_set_vm_flags(struct vm_area_struct *vma, - vm_flags_t flags) -{ - const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP; - - vm_flags_reset(vma, flags); - /* - * For shared mappings, we want to enable writenotify while - * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply - * recalculate vma->vm_page_prot whenever userfaultfd-wp changes. - */ - if ((vma->vm_flags & VM_SHARED) && uffd_wp_changed) - vma_set_page_prot(vma); -} - static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode, int wake_flags, void *key) { @@ -615,22 +600,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, spin_unlock_irq(&ctx->event_wqh.lock); if (release_new_ctx) { - struct vm_area_struct *vma; - struct mm_struct *mm = release_new_ctx->mm; - VMA_ITERATOR(vmi, mm, 0); - - /* the various vma->vm_userfaultfd_ctx still points to it */ - mmap_write_lock(mm); - for_each_vma(vmi, vma) { - if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) { - vma_start_write(vma); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - userfaultfd_set_vm_flags(vma, - vma->vm_flags & ~__VM_UFFD_FLAGS); - } - } - mmap_write_unlock(mm); - + userfaultfd_release_new(release_new_ctx); userfaultfd_ctx_put(release_new_ctx); } @@ -662,9 +632,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs) return 0; if (!(octx->features & UFFD_FEATURE_EVENT_FORK)) { - vma_start_write(vma); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS); + userfaultfd_reset_ctx(vma); return 0; } @@ -749,9 +717,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma, up_write(&ctx->map_changing_lock); } else { /* Drop uffd context if remap feature not enabled */ - vma_start_write(vma); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS); + userfaultfd_reset_ctx(vma); } } @@ -870,53 +836,13 @@ static int userfaultfd_release(struct inode *inode, struct file *file) { struct userfaultfd_ctx *ctx = file->private_data; struct mm_struct *mm = ctx->mm; - struct vm_area_struct *vma, *prev; /* len == 0 means wake all */ struct userfaultfd_wake_range range = { .len = 0, }; - unsigned long new_flags; - VMA_ITERATOR(vmi, mm, 0); WRITE_ONCE(ctx->released, true); - if (!mmget_not_zero(mm)) - goto wakeup; - - /* - * Flush page faults out of all CPUs. NOTE: all page faults - * must be retried without returning VM_FAULT_SIGBUS if - * userfaultfd_ctx_get() succeeds but vma->vma_userfault_ctx - * changes while handle_userfault released the mmap_lock. So - * it's critical that released is set to true (above), before - * taking the mmap_lock for writing. - */ - mmap_write_lock(mm); - prev = NULL; - for_each_vma(vmi, vma) { - cond_resched(); - BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^ - !!(vma->vm_flags & __VM_UFFD_FLAGS)); - if (vma->vm_userfaultfd_ctx.ctx != ctx) { - prev = vma; - continue; - } - /* Reset ptes for the whole vma range if wr-protected */ - if (userfaultfd_wp(vma)) - uffd_wp_range(vma, vma->vm_start, - vma->vm_end - vma->vm_start, false); - new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; - vma = vma_modify_flags_uffd(&vmi, prev, vma, vma->vm_start, - vma->vm_end, new_flags, - NULL_VM_UFFD_CTX); - - vma_start_write(vma); - userfaultfd_set_vm_flags(vma, new_flags); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; + userfaultfd_release_all(mm, ctx); - prev = vma; - } - mmap_write_unlock(mm); - mmput(mm); -wakeup: /* * After no new page faults can wait on this fault_*wqh, flush * the last page faults that may have been already waiting on @@ -1293,14 +1219,14 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, unsigned long arg) { struct mm_struct *mm = ctx->mm; - struct vm_area_struct *vma, *prev, *cur; + struct vm_area_struct *vma, *cur; int ret; struct uffdio_register uffdio_register; struct uffdio_register __user *user_uffdio_register; - unsigned long vm_flags, new_flags; + unsigned long vm_flags; bool found; bool basic_ioctls; - unsigned long start, end, vma_end; + unsigned long start, end; struct vma_iterator vmi; bool wp_async = userfaultfd_wp_async_ctx(ctx); @@ -1428,57 +1354,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, } for_each_vma_range(vmi, cur, end); BUG_ON(!found); - vma_iter_set(&vmi, start); - prev = vma_prev(&vmi); - if (vma->vm_start < start) - prev = vma; - - ret = 0; - for_each_vma_range(vmi, vma, end) { - cond_resched(); - - BUG_ON(!vma_can_userfault(vma, vm_flags, wp_async)); - BUG_ON(vma->vm_userfaultfd_ctx.ctx && - vma->vm_userfaultfd_ctx.ctx != ctx); - WARN_ON(!(vma->vm_flags & VM_MAYWRITE)); - - /* - * Nothing to do: this vma is already registered into this - * userfaultfd and with the right tracking mode too. - */ - if (vma->vm_userfaultfd_ctx.ctx == ctx && - (vma->vm_flags & vm_flags) == vm_flags) - goto skip; - - if (vma->vm_start > start) - start = vma->vm_start; - vma_end = min(end, vma->vm_end); - - new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; - vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end, - new_flags, - (struct vm_userfaultfd_ctx){ctx}); - if (IS_ERR(vma)) { - ret = PTR_ERR(vma); - break; - } - - /* - * In the vma_merge() successful mprotect-like case 8: - * the next vma was merged into the current one and - * the current one has not been updated yet. - */ - vma_start_write(vma); - userfaultfd_set_vm_flags(vma, new_flags); - vma->vm_userfaultfd_ctx.ctx = ctx; - - if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) - hugetlb_unshare_all_pmds(vma); - - skip: - prev = vma; - start = vma->vm_end; - } + ret = userfaultfd_register_range(ctx, vma, vm_flags, start, end, + wp_async); out_unlock: mmap_write_unlock(mm); @@ -1519,7 +1396,6 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, struct vm_area_struct *vma, *prev, *cur; int ret; struct uffdio_range uffdio_unregister; - unsigned long new_flags; bool found; unsigned long start, end, vma_end; const void __user *buf = (void __user *)arg; @@ -1622,27 +1498,13 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range); } - /* Reset ptes for the whole vma range if wr-protected */ - if (userfaultfd_wp(vma)) - uffd_wp_range(vma, start, vma_end - start, false); - - new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; - vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end, - new_flags, NULL_VM_UFFD_CTX); + vma = userfaultfd_clear_vma(&vmi, prev, vma, + start, vma_end); if (IS_ERR(vma)) { ret = PTR_ERR(vma); break; } - /* - * In the vma_merge() successful mprotect-like case 8: - * the next vma was merged into the current one and - * the current one has not been updated yet. - */ - vma_start_write(vma); - userfaultfd_set_vm_flags(vma, new_flags); - vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - skip: prev = vma; start = vma->vm_end; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 05d59f74fc88..6355ed5bd34b 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -264,6 +264,25 @@ extern void userfaultfd_unmap_complete(struct mm_struct *mm, extern bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma); extern bool userfaultfd_wp_async(struct vm_area_struct *vma); +extern void userfaultfd_reset_ctx(struct vm_area_struct *vma); + +extern struct vm_area_struct *userfaultfd_clear_vma(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, + unsigned long end); + +int userfaultfd_register_range(struct userfaultfd_ctx *ctx, + struct vm_area_struct *vma, + unsigned long vm_flags, + unsigned long start, unsigned long end, + bool wp_async); + +extern void userfaultfd_release_new(struct userfaultfd_ctx *ctx); + +extern void userfaultfd_release_all(struct mm_struct *mm, + struct userfaultfd_ctx *ctx); + #else /* CONFIG_USERFAULTFD */ /* mm helpers */ diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 8dedaec00486..950fe6b2f0f7 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1760,3 +1760,171 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start, VM_WARN_ON(!moved && !err); return moved ? moved : err; } + +static void userfaultfd_set_vm_flags(struct vm_area_struct *vma, + vm_flags_t flags) +{ + const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP; + + vm_flags_reset(vma, flags); + /* + * For shared mappings, we want to enable writenotify while + * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply + * recalculate vma->vm_page_prot whenever userfaultfd-wp changes. + */ + if ((vma->vm_flags & VM_SHARED) && uffd_wp_changed) + vma_set_page_prot(vma); +} + +static void userfaultfd_set_ctx(struct vm_area_struct *vma, + struct userfaultfd_ctx *ctx, + unsigned long flags) +{ + vma_start_write(vma); + vma->vm_userfaultfd_ctx = (struct vm_userfaultfd_ctx){ctx}; + userfaultfd_set_vm_flags(vma, + (vma->vm_flags & ~__VM_UFFD_FLAGS) | flags); +} + +void userfaultfd_reset_ctx(struct vm_area_struct *vma) +{ + userfaultfd_set_ctx(vma, NULL, 0); +} + +struct vm_area_struct *userfaultfd_clear_vma(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, + unsigned long end) +{ + struct vm_area_struct *ret; + + /* Reset ptes for the whole vma range if wr-protected */ + if (userfaultfd_wp(vma)) + uffd_wp_range(vma, start, end - start, false); + + ret = vma_modify_flags_uffd(vmi, prev, vma, start, end, + vma->vm_flags & ~__VM_UFFD_FLAGS, + NULL_VM_UFFD_CTX); + + /* + * In the vma_merge() successful mprotect-like case 8: + * the next vma was merged into the current one and + * the current one has not been updated yet. + */ + if (!IS_ERR(ret)) + userfaultfd_reset_ctx(vma); + + return ret; +} + +/* Assumes mmap write lock taken, and mm_struct pinned. */ +int userfaultfd_register_range(struct userfaultfd_ctx *ctx, + struct vm_area_struct *vma, + unsigned long vm_flags, + unsigned long start, unsigned long end, + bool wp_async) +{ + VMA_ITERATOR(vmi, ctx->mm, start); + struct vm_area_struct *prev = vma_prev(&vmi); + unsigned long vma_end; + unsigned long new_flags; + + if (vma->vm_start < start) + prev = vma; + + for_each_vma_range(vmi, vma, end) { + cond_resched(); + + BUG_ON(!vma_can_userfault(vma, vm_flags, wp_async)); + BUG_ON(vma->vm_userfaultfd_ctx.ctx && + vma->vm_userfaultfd_ctx.ctx != ctx); + WARN_ON(!(vma->vm_flags & VM_MAYWRITE)); + + /* + * Nothing to do: this vma is already registered into this + * userfaultfd and with the right tracking mode too. + */ + if (vma->vm_userfaultfd_ctx.ctx == ctx && + (vma->vm_flags & vm_flags) == vm_flags) + goto skip; + + if (vma->vm_start > start) + start = vma->vm_start; + vma_end = min(end, vma->vm_end); + + new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; + vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end, + new_flags, + (struct vm_userfaultfd_ctx){ctx}); + if (IS_ERR(vma)) + return PTR_ERR(vma); + + /* + * In the vma_merge() successful mprotect-like case 8: + * the next vma was merged into the current one and + * the current one has not been updated yet. + */ + userfaultfd_set_ctx(vma, ctx, vm_flags); + + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) + hugetlb_unshare_all_pmds(vma); + +skip: + prev = vma; + start = vma->vm_end; + } + + return 0; +} + +void userfaultfd_release_new(struct userfaultfd_ctx *ctx) +{ + struct mm_struct *mm = ctx->mm; + struct vm_area_struct *vma; + VMA_ITERATOR(vmi, mm, 0); + + /* the various vma->vm_userfaultfd_ctx still points to it */ + mmap_write_lock(mm); + for_each_vma(vmi, vma) { + if (vma->vm_userfaultfd_ctx.ctx == ctx) + userfaultfd_reset_ctx(vma); + } + mmap_write_unlock(mm); +} + +void userfaultfd_release_all(struct mm_struct *mm, + struct userfaultfd_ctx *ctx) +{ + struct vm_area_struct *vma, *prev; + VMA_ITERATOR(vmi, mm, 0); + + if (!mmget_not_zero(mm)) + return; + + /* + * Flush page faults out of all CPUs. NOTE: all page faults + * must be retried without returning VM_FAULT_SIGBUS if + * userfaultfd_ctx_get() succeeds but vma->vma_userfault_ctx + * changes while handle_userfault released the mmap_lock. So + * it's critical that released is set to true (above), before + * taking the mmap_lock for writing. + */ + mmap_write_lock(mm); + prev = NULL; + for_each_vma(vmi, vma) { + cond_resched(); + BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^ + !!(vma->vm_flags & __VM_UFFD_FLAGS)); + if (vma->vm_userfaultfd_ctx.ctx != ctx) { + prev = vma; + continue; + } + + vma = userfaultfd_clear_vma(&vmi, prev, vma, + vma->vm_start, vma->vm_end); + prev = vma; + } + mmap_write_unlock(mm); + mmput(mm); +} From patchwork Fri Jun 28 14:35:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13716243 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD16CC2BD09 for ; Fri, 28 Jun 2024 14:35:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 177F96B0095; Fri, 28 Jun 2024 10:35:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 12A446B0096; Fri, 28 Jun 2024 10:35:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E48656B0098; Fri, 28 Jun 2024 10:35:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id C0E7C6B0095 for ; Fri, 28 Jun 2024 10:35:43 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 362BB1C0C75 for ; Fri, 28 Jun 2024 14:35:43 +0000 (UTC) X-FDA: 82280546166.29.D8DB988 Received: from mail-wm1-f50.google.com (mail-wm1-f50.google.com [209.85.128.50]) by imf08.hostedemail.com (Postfix) with ESMTP id 321C5160020 for ; Fri, 28 Jun 2024 14:35:40 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Cq6r4v+3; spf=pass (imf08.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.50 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719585327; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Z4tIycn76980srpS0Qx+f5aSwAYnJlMpkjRvY7J5k/M=; b=PDCZrwnqWu8F22xfAECldaTnb+NED0tIXbdHiVGHVaZZasEyuRqlCR0TOeC3DS7L5bOeVp +8BXlTPmjUbXpbI/qY7Ooy6z42Betn4a9GJnzUnlMmjW7/MS90kUivZdMBEi7DJDuYK1y2 JUYQy/F1WvvhxrNH5jcVtigDasUAkEE= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Cq6r4v+3; spf=pass (imf08.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.50 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719585327; a=rsa-sha256; cv=none; b=flXiCLI6nBt70xIGEFBGwstzCzFubHsn+CVJN+sJ38xQr+0y3LAe9AQrQRUdJf3Bl0uBSJ WgHt0Z1yy61ZQt6uTHypMLzjasmg7MGVV/7j/Z59pKWosTaIbGEGZX+D/i1KGG1cOHsX4g 26QLqETt45A5yvMJF11wXWcn1nk+4pw= Received: by mail-wm1-f50.google.com with SMTP id 5b1f17b1804b1-4256eec963eso4480245e9.1 for ; Fri, 28 Jun 2024 07:35:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719585340; x=1720190140; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Z4tIycn76980srpS0Qx+f5aSwAYnJlMpkjRvY7J5k/M=; b=Cq6r4v+31hwQmziUz6bUr6OUEdhdJfvVGMSPLocAePDSfe5VQeSPl/zASECk1c204F hXoEA9I42y0MWaUrRD6L6keJsnNsom1uPzlfTvwp1Qx3Sa2h/FHWG47MNtLkPkl0pNsv 7oWtJj47WChxhUgpN3BzIWX2rKTff93SDXTgp3ppYiWKBzhng2+EzyX2IXSmK7U9ZrYb 0ih3AMk8YYmLgUQPy91iWD3GtUokocJv1MrCnuisvFTVi+07zKuG7ZKsqO+lLhhXT6IY xcphVXgkhqHCDyZ/4k9Bw0fqswmdIasiLeP5ZoNBVO1ulcolTlY0FKOv4PxawL6fbK5Y OABQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719585340; x=1720190140; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Z4tIycn76980srpS0Qx+f5aSwAYnJlMpkjRvY7J5k/M=; b=D7JTr1pXpoBwtHCynfzcAHEP8v+y8ro89vzifBv+UW39fE3xfhC8CkONtiacpa3zAU ++gE5No6nEV3gQjfjSOv+LocLoMjjzAHdIPb5T8nhgfv2FK+7ps1KkpnhlReuCyI9qQm SDBidperjEAdfNZzMqST18cC7fNUm3ww4UfQ9TR+Oyi/qRfGDDb38rIEDkuo3/bJgByr rJuvSlvGjvQObEenjpltPlAft8W/RZH3RVVMvg4baw+SCoXIrYzNXk/JOEoV9O2Z7N9O CDEjx1IUxEfZ1nFXxuCPFPHXyKAEMLrLTlL7GHGyUIZqgyee86QBRaEM+RO0Laf9EBA2 gc4g== X-Forwarded-Encrypted: i=1; AJvYcCU6Q/cfTuNBat43hhLDQ1F/S2d5wR22yr6Z203FzTbrD67czfEvimt2N8PpIC2/nm/XCJGJGf6v1iLSFENt6vPHIl4= X-Gm-Message-State: AOJu0Yyr2lNnP84tNaYBxz1NsaEMIuV+ZYZ/bJdBbESc4KOC8UWl24VC y3Sjflmg2mUSQRyegL2Ia5L0JUyLJ7qXwRfyQH0wPyW9Hll8MQSS X-Google-Smtp-Source: AGHT+IFhBGe8jyb5W7chS9Spyqsf3A8ZCwGCMKmUS7z29i0piJQ3aiKvCg0rr0Q6ubvs+Umxr5BHGQ== X-Received: by 2002:a05:600c:1608:b0:425:6498:3b6c with SMTP id 5b1f17b1804b1-42564983cd8mr40449895e9.26.1719585339378; Fri, 28 Jun 2024 07:35:39 -0700 (PDT) Received: from lucifer.home ([2a00:23cc:d20f:ba01:bb66:f8b2:a0e8:6447]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-4256af37828sm38985485e9.9.2024.06.28.07.35.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Jun 2024 07:35:38 -0700 (PDT) From: Lorenzo Stoakes To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Liam R . Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH v2 2/7] mm: move vma_modify() and helpers to internal header Date: Fri, 28 Jun 2024 15:35:23 +0100 Message-ID: <0e048500da5cfd51647699c244b1575229856bd1.1719584707.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Stat-Signature: hy9gco1k4xjd4rb7his3orwa9hpqbadk X-Rspam-User: X-Rspamd-Queue-Id: 321C5160020 X-Rspamd-Server: rspam02 X-HE-Tag: 1719585340-514425 X-HE-Meta: U2FsdGVkX1/F9OFKVkOMg6rZ5FeHru9po+SkKH7R1+0Ffj+4e6/bcWQi95615u8MmDKaVCjWL4Tsd+Z0626TAkc3uMRFBKW2grpIUQjC/jQPynzVV6wNeQkPKmdWvg8zdk79W7+ScMCVtIV+apcmLCshlw6UBh0CKUw8SbVYVCAkUOhPhE1SbZ9pRqaMf7W/szNV9gz4xkbTwoyC2VYX8M7bhf6SFdGGByCFQ7EURRSoq6QlW56V1kMpSYKck/Tl64stL6kSFaTEaLeoxZOImAsMNWAm/kob7t2o3U1xJcaeYHRxRoLtC/SXz4TwN/4qmwxGvyg1nyJtL9gPcs3CxMCe0uXvAzAG5+4f0M1duh3ashwp0x9uV6JdQATpy9SzOx+yqaXN3trcRhrdm2DRqZwjxJO1NVlQtPkzwWRVsdCXMHFJbLa48cIXijFRBPq1cEofGucQCccn/soyXSLwz7LkIEPm/IEGmvJ7v0SyEuWcpOHHY+ZlaoZ+OQ/QsOnAzBjiNbatKbekJ6jFIAJNk8vIW8g/boHSRsMOAA4Ba/FjLcF587us4BbrFjuvjqSxqfSqvvgtj4eraNUQCLUTwiA5BcuzEFMAaeNKJdCSYQToTFqStgGLiS6QMzmbzvmKHehipJkY52gU0Qg9W7jc4LWl7zebT1k7fIt03owGMDlNXwYeNaCFqufr//YiaiMt03oBvTgM/VKP6Y7K5RiVJYiu0ZL7+wV0xDAUx3VlJIoDTpE3SiEQzld3K7YapcTHtFpJ6VUwytHfiqduDIXgnBr9XLK+1Ww9puDL7K+8sN/Rbi5iYXLDM0DjlOyLhNCdVqRn1mirsi8d6Nc9qs3yOluHvr7NyzDDdtHTZP6DqjUuQXUN2xvcAEhBeOO1G2X73wUzjKfnld7djP6XBOjqwppzQ9J0jkibuUi7a4BG8LQIH9qA8dIkvHs3Azeg3AOUoittZlWBxWn3zoTkhnl qSycMzoE x7Fl0226TsnCZdQCV8fM0vHhMqlgK/5SaNhXEn1PP69rUX8JZFUc6OX4WUb844bSmMzRyj9qdYm6sbzWYdBlXpV4OdSKB4SDrSfNPsPasNSqpk6l88W9dn/IV7y0MfZSLrr+yjNECyQuhvenXVhMscCBoqHrF6JGcbkN5SXViSg2Bg64Eek8Jx6KrEprKqY1nJpxZ1cO5R2ExXS0nW+of2AXj9lcunfSwzTgGRIndLvdyN4mvZhQ/xrk8H0UPYZ0gYX87ibozJnkD7MOo7MhQ4JxzHTH41UwJPZaVXz0XL9mJibgnrA5P8WpJF2m4T4w1n0/ifS+fybRZ+myhtPFtsXc0xO+/487H+ik+aMM5cHsUfp6Xa3IFrzV5mocL7CuJaIxt X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: These are core VMA manipulation functions which invoke VMA splitting and merging and should not be directly accessed from outside of mm/. Signed-off-by: Lorenzo Stoakes --- include/linux/mm.h | 60 --------------------------------------------- mm/internal.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 61 insertions(+), 60 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 5f1075d19600..4d2b5538925b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3285,66 +3285,6 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **, unsigned long addr, unsigned long len, pgoff_t pgoff, bool *need_rmap_locks); extern void exit_mmap(struct mm_struct *); -struct vm_area_struct *vma_modify(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, unsigned long end, - unsigned long vm_flags, - struct mempolicy *policy, - struct vm_userfaultfd_ctx uffd_ctx, - struct anon_vma_name *anon_name); - -/* We are about to modify the VMA's flags. */ -static inline struct vm_area_struct -*vma_modify_flags(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, unsigned long end, - unsigned long new_flags) -{ - return vma_modify(vmi, prev, vma, start, end, new_flags, - vma_policy(vma), vma->vm_userfaultfd_ctx, - anon_vma_name(vma)); -} - -/* We are about to modify the VMA's flags and/or anon_name. */ -static inline struct vm_area_struct -*vma_modify_flags_name(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, - unsigned long end, - unsigned long new_flags, - struct anon_vma_name *new_name) -{ - return vma_modify(vmi, prev, vma, start, end, new_flags, - vma_policy(vma), vma->vm_userfaultfd_ctx, new_name); -} - -/* We are about to modify the VMA's memory policy. */ -static inline struct vm_area_struct -*vma_modify_policy(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, unsigned long end, - struct mempolicy *new_pol) -{ - return vma_modify(vmi, prev, vma, start, end, vma->vm_flags, - new_pol, vma->vm_userfaultfd_ctx, anon_vma_name(vma)); -} - -/* We are about to modify the VMA's flags and/or uffd context. */ -static inline struct vm_area_struct -*vma_modify_flags_uffd(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, unsigned long end, - unsigned long new_flags, - struct vm_userfaultfd_ctx new_ctx) -{ - return vma_modify(vmi, prev, vma, start, end, new_flags, - vma_policy(vma), new_ctx, anon_vma_name(vma)); -} static inline int check_data_rlimit(unsigned long rlim, unsigned long new, diff --git a/mm/internal.h b/mm/internal.h index b264a7dabefe..164f03c6bce2 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1243,6 +1243,67 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long delta); +struct vm_area_struct *vma_modify(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, unsigned long end, + unsigned long vm_flags, + struct mempolicy *policy, + struct vm_userfaultfd_ctx uffd_ctx, + struct anon_vma_name *anon_name); + +/* We are about to modify the VMA's flags. */ +static inline struct vm_area_struct +*vma_modify_flags(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, unsigned long end, + unsigned long new_flags) +{ + return vma_modify(vmi, prev, vma, start, end, new_flags, + vma_policy(vma), vma->vm_userfaultfd_ctx, + anon_vma_name(vma)); +} + +/* We are about to modify the VMA's flags and/or anon_name. */ +static inline struct vm_area_struct +*vma_modify_flags_name(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, + unsigned long end, + unsigned long new_flags, + struct anon_vma_name *new_name) +{ + return vma_modify(vmi, prev, vma, start, end, new_flags, + vma_policy(vma), vma->vm_userfaultfd_ctx, new_name); +} + +/* We are about to modify the VMA's memory policy. */ +static inline struct vm_area_struct +*vma_modify_policy(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, unsigned long end, + struct mempolicy *new_pol) +{ + return vma_modify(vmi, prev, vma, start, end, vma->vm_flags, + new_pol, vma->vm_userfaultfd_ctx, anon_vma_name(vma)); +} + +/* We are about to modify the VMA's flags and/or uffd context. */ +static inline struct vm_area_struct +*vma_modify_flags_uffd(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, unsigned long end, + unsigned long new_flags, + struct vm_userfaultfd_ctx new_ctx) +{ + return vma_modify(vmi, prev, vma, start, end, new_flags, + vma_policy(vma), new_ctx, anon_vma_name(vma)); +} + enum { /* mark page accessed */ FOLL_TOUCH = 1 << 16, From patchwork Fri Jun 28 14:35:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13716244 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D26EC2BD09 for ; Fri, 28 Jun 2024 14:35:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B8A1B6B0096; Fri, 28 Jun 2024 10:35:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B12056B0098; Fri, 28 Jun 2024 10:35:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 93DDC6B0099; Fri, 28 Jun 2024 10:35:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 6C29F6B0096 for ; Fri, 28 Jun 2024 10:35:45 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 1C8D0A1C72 for ; Fri, 28 Jun 2024 14:35:45 +0000 (UTC) X-FDA: 82280546250.09.6134EC6 Received: from mail-lf1-f43.google.com (mail-lf1-f43.google.com [209.85.167.43]) by imf18.hostedemail.com (Postfix) with ESMTP id 2AA4B1C0003 for ; Fri, 28 Jun 2024 14:35:42 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=bzdnoOGW; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf18.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.167.43 as permitted sender) smtp.mailfrom=lstoakes@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719585322; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=9WJzFQGPim0f4heCjtjo3+B1Hrs0LYDOSzxuaubE8V4=; b=JPM87HNO+UnoT+o15ZXUeT0OfjYOVDvwzQizHqx8fNAaDfjwgeaN5Pi3foZHcRpECb25+6 slN0u3Ijv6kinIRwjSQtcsKo34Z8uErddfIBxcICT4RzE4zRgQg4Lmd4fuJvti8DCp4XCB fxWB+YJ25VAL2tHOnit1J9cfKyIxy6s= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719585322; a=rsa-sha256; cv=none; b=RCNtWCoqpNRCSnSH2BIIkz1wL3qdBhvLLyghDWpoTzqiK3//eKdPhFu581HbSijNs1FaS0 GZGH87BeFNcbLC9KuoC+Hn5AMMrQisO6eqK6VMZAPckLgN/tL8tNn/JfYjBuEVv5xgEwCV UHcYztwaw8yUJSJZtnF605mm7xsATuE= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=bzdnoOGW; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf18.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.167.43 as permitted sender) smtp.mailfrom=lstoakes@gmail.com Received: by mail-lf1-f43.google.com with SMTP id 2adb3069b0e04-52cd628f21cso705282e87.3 for ; Fri, 28 Jun 2024 07:35:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719585341; x=1720190141; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9WJzFQGPim0f4heCjtjo3+B1Hrs0LYDOSzxuaubE8V4=; b=bzdnoOGWzAcN7a0Uu1KBjgV1lhJzzoy2TMNTCwXAfqL2yiMEMftJnGmhfYW9EIsZaE 0DIxxe1HNHFy6bCAppF/HZYL3++G7926tTgP/rwB6QmpORt7iyFfGI9gemtPtyWDf1/m P0tY6U1QZye0OMU7WYz3V3soDhOYSPiZKhbj3+OWIYBnWVXcvoaZqJWO3Hp60N9XtriH 0Br48i7fK/J2Cj5RT0fJk9BUGtb/vQ0jB2Gry5kpfsdNgmoRsfuFGecanRdk/ziSvtS2 dmjMKO+Gf5euDKW6fnTHGkItQGd4LZyIIFuxBXHMUTYmDhYNYi3p6tEW41CUUkdH7m8a AaTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719585341; x=1720190141; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9WJzFQGPim0f4heCjtjo3+B1Hrs0LYDOSzxuaubE8V4=; b=VMxm5dVfAx5XjjrAcrV9gvkd/nzFx2RIUGjEtIxKysLev+XRVZCAgN1QN8Bw6juX35 7bfB6BWwia9SP7pVg1K9g2Ne9zYF+3b7zasdyaSi2puV6J2WUKsf3SD4TUvL8MvIw5b/ 68yQTS2Dl4OknXqYI0W/CswxekSfgp2UVJ9SnN9Sdm8H93HQCHzJw4hNichpEtMkUIAx hlFt0FLczGlZ1ddDrEuRQWXbNjCBISP3EZLLKH/6m9P/li4h7jM9agCOrbn09+vPWntZ Ks7B1ULZTMOEjT0v31zsLtKB+jiLrkLSN9S38UF7ReoTExMGYKo7Y6AEYj2YVEmgJbyK tHIg== X-Forwarded-Encrypted: i=1; AJvYcCWAfX/XBDcyomCeVs66Hyl6LOzXuA1A3kxCoG17GtM5eqRpHc+y6rv5YwTGeVHwUiKBXtszA//EhhnoK93lszCKmG4= X-Gm-Message-State: AOJu0Yx/+qbNFu342ftUIMOhSRpX/gEZ9prs6m2rgonYDoYjq1ADrpTd uZLifi/Ml/t+AWFJfvpM+IhyrMRQERxjoQh86sZWB6EKB+DKLYNv X-Google-Smtp-Source: AGHT+IElovPPcmb1nrio2SN5g9r5FMwlKXD72NvlhVm6ljpkGlvNAV9o84Vw6PLKxk+nipDVk0hyxg== X-Received: by 2002:a05:6512:ad3:b0:52c:df6f:a66 with SMTP id 2adb3069b0e04-52cf45c1be9mr9253395e87.58.1719585341301; Fri, 28 Jun 2024 07:35:41 -0700 (PDT) Received: from lucifer.home ([2a00:23cc:d20f:ba01:bb66:f8b2:a0e8:6447]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-4256af37828sm38985485e9.9.2024.06.28.07.35.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Jun 2024 07:35:40 -0700 (PDT) From: Lorenzo Stoakes To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Liam R . Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH v2 3/7] mm: move vma_shrink(), vma_expand() to internal header Date: Fri, 28 Jun 2024 15:35:24 +0100 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 2AA4B1C0003 X-Stat-Signature: 9zsu8a43rucfwyrnr9tz4ptexbkfdddp X-Rspam-User: X-HE-Tag: 1719585342-330729 X-HE-Meta: U2FsdGVkX1+LxsoltbK/JtgevaNYpnAC/N5s0ai31ZrEieR+7CPYVobYwQ+84r0H8QsoOWKPSgcHPm1LV9f/NRH1U516WEanj04NkRFI3lGG6eO54/ePmQ3hUXzcgp1t1+vwbHCiDKosI6+IlVXoA7Z5qG45a/OFnXY/MPraf9pbY+quDHYBaI0X+f0Wpq1gLiH1lnALQXZKcXhYK718sSM/d3zfopnBqR3vvH5WMGaOQmdmA99qclIyWrvkIzH9EZj8kiWaerWaG+yT4ZbSmGdjvIpl6UjNFW8yVvnk6Y22HeQNfSYd5rqj54lmnFNapu9pI4KHpNsPOY2Htt9JxoRYefH4to3SdzbvDfqC6neopiTE1DxjciIWi2JR6JmeBQ3x8/SW+ArT8WfDS7Hf9YuCZJM4iO2NczwZ+T9lR/uCZJL9GAAWQBMrUQjbg+KgxgHGHVLbA/kJYQsgq+U2Mf/YGhHxvM1eFLy8mvR7kWhBsRTzSB+ZXbKQ0kVkZdXBCl6lMRMkamaoXPHOH20qeW8XdcigA5D+sa31A4qHZVUCNqUoWbOiJWFppwSCCT3KUivt3xxJCUmxe5p4L5+d8r1UasFh+xCDDggaanKtzNnwAUJul2FBItYa/znA0Z5eFYzJkp6nj+5z7pRIXEe4LoUS3EML0HCgMKcoAG8gb5V4xHTHGynb4JY/oF5tnZjS98CHVpAJW7ng5vsBbwEhQz3R/q9nOhTh1A2+la3Ku1jS5HfsrZB3SoP0SuIdCOWHiSiLF0j6Krgipf4rDHJpxLQu/nKpoR5eItJAsRsrkKNxh9VM9u7rnZTSPmUV2D8As8kCJUWlqKcSr/XwY8msUgzhVnib9fbCPqnc1jGQFtUDp2GLa7fIRxzI8nJ24YKIrrsYc0yx1OwNribumnsQ/XxmT2CGqhjXb2Accc3xcR1JZ4vzcR3jLdO5AFpjp29n758hP5PYPycwHXeK8YS mkq1xs5B 9uyF4zxksbFZI5kIHLzzXDgHivTWFGvpUc9XBnG7MEwTqR84AjRd1nuQ516zrO/T+bSFV7VwUdAEyTkLmWEvwTvpcwrpDUdNptYxVO+ekZAHJz5SiR91X0SJ43ztoLBSWyPg2Mj7PAKt0YHQ3nyNZ6Jn1JCyuUHeca036HKFCHdLudboPE5fGf7wv7R/R1lh7XTinK2yn/nlWf4xT1f8PpPefmBGdHs3V/IykXeTU2AcICBqK4fxauQxsMa4s2b4pV/gV3eUuGsKi6QnK/AgBbnYLAETRktU8hcr8WxN8epI5CWp37j4N3mbzyzpC4t8LSJD73kcf5yYGV4ZmMQMVtwYrHvoZbWsWMQUo40YRw5Lfz7KOQLjkF3GVZqasL4yDGJ3m X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The vma_shrink() and vma_expand() functions are internal VMA manipulation functions which we ought to abstract for use outside of memory management code. To achieve this, we abstract the operation performed in fs/exec.c by shift_arg_pages() into a new relocate_vma() function implemented in mm/mmap.c, which enables us to also move move_page_tables() and vma_iter_prev_range() to internal.h. The purpose of doing this is to isolate key VMA manipulation functions in order that we can both abstract them and later render them easily testable. Signed-off-by: Lorenzo Stoakes --- fs/exec.c | 68 ++------------------------------------ include/linux/mm.h | 17 +--------- mm/internal.h | 18 +++++++++++ mm/mmap.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 102 insertions(+), 82 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 40073142288f..5cf53e20d8df 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -683,75 +683,11 @@ static int copy_strings_kernel(int argc, const char *const *argv, /* * During bprm_mm_init(), we create a temporary stack at STACK_TOP_MAX. Once * the binfmt code determines where the new stack should reside, we shift it to - * its final location. The process proceeds as follows: - * - * 1) Use shift to calculate the new vma endpoints. - * 2) Extend vma to cover both the old and new ranges. This ensures the - * arguments passed to subsequent functions are consistent. - * 3) Move vma's page tables to the new range. - * 4) Free up any cleared pgd range. - * 5) Shrink the vma to cover only the new range. + * its final location. */ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift) { - struct mm_struct *mm = vma->vm_mm; - unsigned long old_start = vma->vm_start; - unsigned long old_end = vma->vm_end; - unsigned long length = old_end - old_start; - unsigned long new_start = old_start - shift; - unsigned long new_end = old_end - shift; - VMA_ITERATOR(vmi, mm, new_start); - struct vm_area_struct *next; - struct mmu_gather tlb; - - BUG_ON(new_start > new_end); - - /* - * ensure there are no vmas between where we want to go - * and where we are - */ - if (vma != vma_next(&vmi)) - return -EFAULT; - - vma_iter_prev_range(&vmi); - /* - * cover the whole range: [new_start, old_end) - */ - if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL)) - return -ENOMEM; - - /* - * move the page tables downwards, on failure we rely on - * process cleanup to remove whatever mess we made. - */ - if (length != move_page_tables(vma, old_start, - vma, new_start, length, false, true)) - return -ENOMEM; - - lru_add_drain(); - tlb_gather_mmu(&tlb, mm); - next = vma_next(&vmi); - if (new_end > old_start) { - /* - * when the old and new regions overlap clear from new_end. - */ - free_pgd_range(&tlb, new_end, old_end, new_end, - next ? next->vm_start : USER_PGTABLES_CEILING); - } else { - /* - * otherwise, clean from old_start; this is done to not touch - * the address space in [new_end, old_start) some architectures - * have constraints on va-space that make this illegal (IA64) - - * for the others its just a little faster. - */ - free_pgd_range(&tlb, old_start, old_end, new_end, - next ? next->vm_start : USER_PGTABLES_CEILING); - } - tlb_finish_mmu(&tlb); - - vma_prev(&vmi); - /* Shrink the vma to just the new range */ - return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff); + return relocate_vma(vma, shift); } /* diff --git a/include/linux/mm.h b/include/linux/mm.h index 4d2b5538925b..ab4b70f2ce94 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -998,12 +998,6 @@ static inline struct vm_area_struct *vma_prev(struct vma_iterator *vmi) return mas_prev(&vmi->mas, 0); } -static inline -struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi) -{ - return mas_prev_range(&vmi->mas, 0); -} - static inline unsigned long vma_iter_addr(struct vma_iterator *vmi) { return vmi->mas.index; @@ -2523,11 +2517,6 @@ int set_page_dirty_lock(struct page *page); int get_cmdline(struct task_struct *task, char *buffer, int buflen); -extern unsigned long move_page_tables(struct vm_area_struct *vma, - unsigned long old_addr, struct vm_area_struct *new_vma, - unsigned long new_addr, unsigned long len, - bool need_rmap_locks, bool for_stack); - /* * Flags used by change_protection(). For now we make it a bitmap so * that we can pass in multiple flags just like parameters. However @@ -3273,11 +3262,6 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node); /* mmap.c */ extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin); -extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma, - unsigned long start, unsigned long end, pgoff_t pgoff, - struct vm_area_struct *next); -extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma, - unsigned long start, unsigned long end, pgoff_t pgoff); extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *); extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *); extern void unlink_file_vma(struct vm_area_struct *); @@ -3285,6 +3269,7 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **, unsigned long addr, unsigned long len, pgoff_t pgoff, bool *need_rmap_locks); extern void exit_mmap(struct mm_struct *); +extern int relocate_vma(struct vm_area_struct *vma, unsigned long shift); static inline int check_data_rlimit(unsigned long rlim, unsigned long new, diff --git a/mm/internal.h b/mm/internal.h index 164f03c6bce2..8c7aa5860df4 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1304,6 +1304,12 @@ static inline struct vm_area_struct vma_policy(vma), new_ctx, anon_vma_name(vma)); } +int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma, + unsigned long start, unsigned long end, pgoff_t pgoff, + struct vm_area_struct *next); +int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma, + unsigned long start, unsigned long end, pgoff_t pgoff); + enum { /* mark page accessed */ FOLL_TOUCH = 1 << 16, @@ -1527,6 +1533,12 @@ static inline int vma_iter_store_gfp(struct vma_iterator *vmi, return 0; } +static inline +struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi) +{ + return mas_prev_range(&vmi->mas, 0); +} + /* * VMA lock generalization */ @@ -1638,4 +1650,10 @@ void unlink_file_vma_batch_init(struct unlink_vma_file_batch *); void unlink_file_vma_batch_add(struct unlink_vma_file_batch *, struct vm_area_struct *); void unlink_file_vma_batch_final(struct unlink_vma_file_batch *); +/* mremap.c */ +unsigned long move_page_tables(struct vm_area_struct *vma, + unsigned long old_addr, struct vm_area_struct *new_vma, + unsigned long new_addr, unsigned long len, + bool need_rmap_locks, bool for_stack); + #endif /* __MM_INTERNAL_H */ diff --git a/mm/mmap.c b/mm/mmap.c index e42d89f98071..d2eebbed87b9 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -4058,3 +4058,84 @@ static int __meminit init_reserve_notifier(void) return 0; } subsys_initcall(init_reserve_notifier); + +/* + * Relocate a VMA downwards by shift bytes. There cannot be any VMAs between + * this VMA and its relocated range, which will now reside at [vma->vm_start - + * shift, vma->vm_end - shift). + * + * This function is almost certainly NOT what you want for anything other than + * early executable temporary stack relocation. + */ +int relocate_vma(struct vm_area_struct *vma, unsigned long shift) +{ + /* + * The process proceeds as follows: + * + * 1) Use shift to calculate the new vma endpoints. + * 2) Extend vma to cover both the old and new ranges. This ensures the + * arguments passed to subsequent functions are consistent. + * 3) Move vma's page tables to the new range. + * 4) Free up any cleared pgd range. + * 5) Shrink the vma to cover only the new range. + */ + + struct mm_struct *mm = vma->vm_mm; + unsigned long old_start = vma->vm_start; + unsigned long old_end = vma->vm_end; + unsigned long length = old_end - old_start; + unsigned long new_start = old_start - shift; + unsigned long new_end = old_end - shift; + VMA_ITERATOR(vmi, mm, new_start); + struct vm_area_struct *next; + struct mmu_gather tlb; + + BUG_ON(new_start > new_end); + + /* + * ensure there are no vmas between where we want to go + * and where we are + */ + if (vma != vma_next(&vmi)) + return -EFAULT; + + vma_iter_prev_range(&vmi); + /* + * cover the whole range: [new_start, old_end) + */ + if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL)) + return -ENOMEM; + + /* + * move the page tables downwards, on failure we rely on + * process cleanup to remove whatever mess we made. + */ + if (length != move_page_tables(vma, old_start, + vma, new_start, length, false, true)) + return -ENOMEM; + + lru_add_drain(); + tlb_gather_mmu(&tlb, mm); + next = vma_next(&vmi); + if (new_end > old_start) { + /* + * when the old and new regions overlap clear from new_end. + */ + free_pgd_range(&tlb, new_end, old_end, new_end, + next ? next->vm_start : USER_PGTABLES_CEILING); + } else { + /* + * otherwise, clean from old_start; this is done to not touch + * the address space in [new_end, old_start) some architectures + * have constraints on va-space that make this illegal (IA64) - + * for the others its just a little faster. + */ + free_pgd_range(&tlb, old_start, old_end, new_end, + next ? next->vm_start : USER_PGTABLES_CEILING); + } + tlb_finish_mmu(&tlb); + + vma_prev(&vmi); + /* Shrink the vma to just the new range */ + return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff); +} From patchwork Fri Jun 28 14:35:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13716247 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2042EC3064D for ; Fri, 28 Jun 2024 14:35:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 809C36B0099; Fri, 28 Jun 2024 10:35:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7B9546B009B; Fri, 28 Jun 2024 10:35:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 39AFC6B009C; Fri, 28 Jun 2024 10:35:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id F38206B0099 for ; Fri, 28 Jun 2024 10:35:48 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 8D61C1C0C64 for ; Fri, 28 Jun 2024 14:35:48 +0000 (UTC) X-FDA: 82280546376.03.4E935BD Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com [209.85.128.48]) by imf18.hostedemail.com (Postfix) with ESMTP id 11D981C0018 for ; Fri, 28 Jun 2024 14:35:45 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=ls8YhS6A; spf=pass (imf18.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719585328; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=6hJwIvoW9tciUclmSufBt/h04++QFICJXKONGqnsCqo=; b=xNZ+0he6QH3yEps4iDTZB4Ed6k1UTGAxL783WBw+WvX2o/OCObf1w8ksBn6ZAztv6HWeYr S80ar0g1d13tzkREozHlR7JBsD2MVIjro1NYkD9WJ6pikPyd3kGXn/kzSakBYMgKsuSD8A ZvU5KICNhnzqkzb2Gv2ajp9B+XUYmpU= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719585328; a=rsa-sha256; cv=none; b=fdNdu3uw/IKiOQYhRgxokRYT/dGerGL8wGjPT7t+67qzq66POeF66REfPgiTIWJubWu8wB krIAuo/z6ZUNqoaTs+zUEc2nN4Z86nPT0yHwSfEPxIfagm8NxGgtehfxglGwh9SBKefI9x q/ZJtuLy03DBvmc6JwYeoYHuNceKNYM= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=ls8YhS6A; spf=pass (imf18.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-42564316479so4728005e9.2 for ; Fri, 28 Jun 2024 07:35:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719585344; x=1720190144; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=6hJwIvoW9tciUclmSufBt/h04++QFICJXKONGqnsCqo=; b=ls8YhS6AbSyg0ZZXeXkV6qyOtqdVecgU9IE8LkjJJVGjXfKDKpfzpzvf2u79qlLsIW k8LaGm4gdU+kxc7mHoQ0/qqJEts8K6yTRZCn3YPyAc+YL1cK0R//rqIZeeBbVK8aHj7D YvnZt7RNJ4N5z31K8PtXAMVK96UVkEhFOIqV3FbmpN2llwcznrghjCknBPC6sftRpJUs Qw6O3rMudoIKyg9tYZBQ8Gqd2fXwRsoRpQRSZ3nZGJZbTLUrSO9jIR4gvpy1AY8PH1wN s5vzYol/S1feeDwEPrX5KWYNXa6RBDj3kMLd7jZgH8glw9aWKhqFkpYx7L6mU+4eD/4K Q38g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719585344; x=1720190144; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6hJwIvoW9tciUclmSufBt/h04++QFICJXKONGqnsCqo=; b=VvIZ7zAughnSg3kyGVFYM6RBT5RZ4/8bPBU/ni5HR+sqn4eAnGe6YI7s3ktpxSUgSL bZQtChU+LrfQcRQpcfs3pdsGVPY85SfFTeZInPbUxEcf/CaaYIvf3FeE2Wtok0I1I/b7 WR99EmAaR5xdRzLv3FqyQ3XS6eLXJFe4TVsQGtDga696wEcYDqrXLMgNUeGSwVZFFHtR +1S7tThtUqyxzvP4nd0Zk1jaC6EGw/4hBWFoRzs56azv1YrIwbsymneDKw7vRISzX1Yt LYSLcqbfI05agbWjJTvqzceT8xVfkRzS4y8OLo6rrhHWdDGphDcLkRMHJH1rX5hXTuvb z0cA== X-Forwarded-Encrypted: i=1; AJvYcCUCDO88NV3215bEiWi1spJiEvGtYWQBby10azet2kjmkzaGisboiPqXsO4p7M2FgfcL6txLEN3T70qCi+tKnZQCLEI= X-Gm-Message-State: AOJu0YyyTes9Pw9/N2nbpSruosZscAbzuEFjxW2Uv45Qn8lwTxr1T7nj R93p8AOOdmyvbO4RUyC2pjeomDqkGQ5Y3aUjLOd90ERDjuvLGgzG X-Google-Smtp-Source: AGHT+IFMI6otL/8uTmLOvixAgOF2Y/1f1UUvhWj6HnymPhruGIpFuHBOMIr/ndr9v0ZKDjC4PBFwUQ== X-Received: by 2002:a05:600c:2e56:b0:425:5eff:8407 with SMTP id 5b1f17b1804b1-4255eff8a4dmr53784405e9.14.1719585343604; Fri, 28 Jun 2024 07:35:43 -0700 (PDT) Received: from lucifer.home ([2a00:23cc:d20f:ba01:bb66:f8b2:a0e8:6447]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-4256af37828sm38985485e9.9.2024.06.28.07.35.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Jun 2024 07:35:42 -0700 (PDT) From: Lorenzo Stoakes To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Liam R . Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH v2 4/7] mm: move internal core VMA manipulation functions to own file Date: Fri, 28 Jun 2024 15:35:25 +0100 Message-ID: <4fd37092b65caf30187c29399c2cc320a8126a66.1719584707.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Stat-Signature: sbpqokxoq9bydf8dp75bxrpxdhruk7nj X-Rspamd-Queue-Id: 11D981C0018 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1719585345-72815 X-HE-Meta: U2FsdGVkX1+IIDyX7qduy3XnTVxhJ21CjjHR3hQqfCxABWClCWm/u2taav6o6Pnmx4M8Ch52PteS0BGEuKvE5T+IHv3W8D4DQ7U/drKNfFClM2j9oofRg5IVnZ3OS0JLpwgmIkQHUOEXFenIDsFF/K4o8ZQDGahHn0/xhVvfn3Pr0oIuo2D13FUXwTOTuyw9QzecTv2QeECh75TdyEGHkjYQxlNKeXGZW++2h6Nkj9e7TZryXGe7vUUhRBVMnlQ97O613dlXtMhxjErViQHO9quKF9eC0RKMeB7kyzg8UubKZgkQtyZvE2sxtfm6wgIjDLlmOeRy4Td7RK9a1LS3Ek4JEPngFQe6zPTluZuvfocb2cE0WIJ8gwEOaOWOL+przbAXA2cXlyrCLMyKDcblMS0bhUf+McMM2jfu3gWIwOt/n7s3VcqpOgsxLcXDHFJffXUA9tILX42/DJTgK7fG0YlpcUu84QNxGnO4yCNhEwGxCCbR4KUS2fYhdvfvLmTbp8iX01l39BGBwrQq5exA1W+5a95SZnT8SiJdhaqxmB4qNHRDJ8Wj0OzkPpKqiOEHEGyyHvMoV8a1FBzu4kVigm6Wcy1Alph42B4GEpftNLxZqbODMKYH4Cnf0f8re3V0e1e47APk7G6Hzi6kyLlnKdmEoVUSVddjndeX5Mu2/M5IuUOcG/BzMYqWikdE4K++3cxJgLe90UjuRZWHxCLDfSg9LJcjU5rFwGsHXVD5h0w1CldAJGAehplo36fTd5FRsuRnDruG092rySqPplKYrdDeMCOIv5ftQZJohhGtPfGxm4bY1ZALzz8buI1zv+B7LDUcBWEY69CGGb9gA14PNGt0ti+QvLYu5M03ZXlrA8Cnv90K+bHCJc0ekFvSgJWq4jxCpFNcyrV5af/SHo/48VwNRDYZRZ2oJOXYRp9LHHk7wNiZopRchGfY/65g8uUemQ9LyeR9yRmzCIV57k2 8bMPQz6u Mv7BE/Tyuh6uo8nnz0jrtNC8awV6RSFXiMhBShePzxdszUnkn4nGnR9kFSeXkOV+cSLInp+xbnGaD1tmuPhdigC8hWUXTop4t90LZ1pOxMOocCffKkPQl8sY8k4yKLMESjAwFfVRy+QU+2ELAigvSsCcow63nWe0g7uJNiPV4VgYTX4LiNjXOER0sMqLQsw02DSK5164MAxNMuYKC1IBG/xd6DNIJCEEW3NWtkS0TO4YfYJlvItwNkovCf04ZSTw8aFQxCqBTKAe2842DnNhaxkyBqkbCIDi0XcDirPZEBkFM4bBCh+eWeoIarrMt376p33+mFtITnyV6sSabZePu0d6BCK8yMVDTae5nadntfpWTZYawEQqEwTePeHnXmiym1cMYYQAKJ0Ud/04eiD9szScQ3JCANbjmGfD0P945FzxlCWlqFa38mwk/fECFVrDI7kQ0Y/Uw9hqS2lo= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: This patch introduces vma.c and moves internal core VMA manipulation functions to this file from mmap.c. This allows us to isolate VMA functionality in a single place such that we can create userspace testing code that invokes this functionality in an environment where we can implement simple unit tests of core functionality. This patch ensures that core VMA functionality is explicitly marked as such by its presence in mm/vma.h. It also places the header includes required by vma.c in vma_internal.h, which is simply imported by vma.c. This makes the VMA functionality testable, as userland testing code can simply stub out functionality as required. Signed-off-by: Lorenzo Stoakes --- include/linux/mm.h | 35 - mm/Makefile | 2 +- mm/internal.h | 236 +----- mm/mmap.c | 1981 +++----------------------------------------- mm/mmu_notifier.c | 2 + mm/vma.c | 1766 +++++++++++++++++++++++++++++++++++++++ mm/vma.h | 362 ++++++++ mm/vma_internal.h | 52 ++ 8 files changed, 2293 insertions(+), 2143 deletions(-) create mode 100644 mm/vma.c create mode 100644 mm/vma.h create mode 100644 mm/vma_internal.h diff --git a/include/linux/mm.h b/include/linux/mm.h index ab4b70f2ce94..3afcf6ae5854 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -998,21 +998,6 @@ static inline struct vm_area_struct *vma_prev(struct vma_iterator *vmi) return mas_prev(&vmi->mas, 0); } -static inline unsigned long vma_iter_addr(struct vma_iterator *vmi) -{ - return vmi->mas.index; -} - -static inline unsigned long vma_iter_end(struct vma_iterator *vmi) -{ - return vmi->mas.last + 1; -} -static inline int vma_iter_bulk_alloc(struct vma_iterator *vmi, - unsigned long count) -{ - return mas_expected_entries(&vmi->mas, count); -} - static inline int vma_iter_clear_gfp(struct vma_iterator *vmi, unsigned long start, unsigned long end, gfp_t gfp) { @@ -2537,21 +2522,6 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen); #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ MM_CP_UFFD_WP_RESOLVE) -bool vma_needs_dirty_tracking(struct vm_area_struct *vma); -bool vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot); -static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma) -{ - /* - * We want to check manually if we can change individual PTEs writable - * if we can't do that automatically for all PTEs in a mapping. For - * private mappings, that's always the case when we have write - * permissions as we properly have to handle COW. - */ - if (vma->vm_flags & VM_SHARED) - return vma_wants_writenotify(vma, vma->vm_page_prot); - return !!(vma->vm_flags & VM_WRITE); - -} bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr, pte_t pte); extern long change_protection(struct mmu_gather *tlb, @@ -3262,12 +3232,7 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node); /* mmap.c */ extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin); -extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *); extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *); -extern void unlink_file_vma(struct vm_area_struct *); -extern struct vm_area_struct *copy_vma(struct vm_area_struct **, - unsigned long addr, unsigned long len, pgoff_t pgoff, - bool *need_rmap_locks); extern void exit_mmap(struct mm_struct *); extern int relocate_vma(struct vm_area_struct *vma, unsigned long shift); diff --git a/mm/Makefile b/mm/Makefile index d2915f8c9dc0..140a22654dde 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -37,7 +37,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \ - pgtable-generic.o rmap.o vmalloc.o + pgtable-generic.o rmap.o vmalloc.o vma.o ifdef CONFIG_CROSS_MEMORY_ATTACH diff --git a/mm/internal.h b/mm/internal.h index 8c7aa5860df4..f61779206624 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -8,13 +8,18 @@ #define __MM_INTERNAL_H #include +#include #include +#include #include #include #include #include #include +/* Internal core VMA manipulation functions. */ +#include "vma.h" + struct folio_batch; /* @@ -778,37 +783,6 @@ static inline bool free_area_empty(struct free_area *area, int migratetype) return list_empty(&area->free_list[migratetype]); } -/* - * These three helpers classifies VMAs for virtual memory accounting. - */ - -/* - * Executable code area - executable, not writable, not stack - */ -static inline bool is_exec_mapping(vm_flags_t flags) -{ - return (flags & (VM_EXEC | VM_WRITE | VM_STACK)) == VM_EXEC; -} - -/* - * Stack area (including shadow stacks) - * - * VM_GROWSUP / VM_GROWSDOWN VMAs are always private anonymous: - * do_mmap() forbids all other combinations. - */ -static inline bool is_stack_mapping(vm_flags_t flags) -{ - return ((flags & VM_STACK) == VM_STACK) || (flags & VM_SHADOW_STACK); -} - -/* - * Data area - private, writable, not stack - */ -static inline bool is_data_mapping(vm_flags_t flags) -{ - return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE; -} - /* mm/util.c */ struct anon_vma *folio_anon_vma(struct folio *folio); @@ -1236,80 +1210,6 @@ void touch_pud(struct vm_area_struct *vma, unsigned long addr, void touch_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, bool write); -/* - * mm/mmap.c - */ -struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi, - struct vm_area_struct *vma, - unsigned long delta); - -struct vm_area_struct *vma_modify(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, unsigned long end, - unsigned long vm_flags, - struct mempolicy *policy, - struct vm_userfaultfd_ctx uffd_ctx, - struct anon_vma_name *anon_name); - -/* We are about to modify the VMA's flags. */ -static inline struct vm_area_struct -*vma_modify_flags(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, unsigned long end, - unsigned long new_flags) -{ - return vma_modify(vmi, prev, vma, start, end, new_flags, - vma_policy(vma), vma->vm_userfaultfd_ctx, - anon_vma_name(vma)); -} - -/* We are about to modify the VMA's flags and/or anon_name. */ -static inline struct vm_area_struct -*vma_modify_flags_name(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, - unsigned long end, - unsigned long new_flags, - struct anon_vma_name *new_name) -{ - return vma_modify(vmi, prev, vma, start, end, new_flags, - vma_policy(vma), vma->vm_userfaultfd_ctx, new_name); -} - -/* We are about to modify the VMA's memory policy. */ -static inline struct vm_area_struct -*vma_modify_policy(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, unsigned long end, - struct mempolicy *new_pol) -{ - return vma_modify(vmi, prev, vma, start, end, vma->vm_flags, - new_pol, vma->vm_userfaultfd_ctx, anon_vma_name(vma)); -} - -/* We are about to modify the VMA's flags and/or uffd context. */ -static inline struct vm_area_struct -*vma_modify_flags_uffd(struct vma_iterator *vmi, - struct vm_area_struct *prev, - struct vm_area_struct *vma, - unsigned long start, unsigned long end, - unsigned long new_flags, - struct vm_userfaultfd_ctx new_ctx) -{ - return vma_modify(vmi, prev, vma, start, end, new_flags, - vma_policy(vma), new_ctx, anon_vma_name(vma)); -} - -int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma, - unsigned long start, unsigned long end, pgoff_t pgoff, - struct vm_area_struct *next); -int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma, - unsigned long start, unsigned long end, pgoff_t pgoff); - enum { /* mark page accessed */ FOLL_TOUCH = 1 << 16, @@ -1436,123 +1336,6 @@ static inline bool pte_needs_soft_dirty_wp(struct vm_area_struct *vma, pte_t pte return vma_soft_dirty_enabled(vma) && !pte_soft_dirty(pte); } -static inline void vma_iter_config(struct vma_iterator *vmi, - unsigned long index, unsigned long last) -{ - __mas_set_range(&vmi->mas, index, last - 1); -} - -static inline void vma_iter_reset(struct vma_iterator *vmi) -{ - mas_reset(&vmi->mas); -} - -static inline -struct vm_area_struct *vma_iter_prev_range_limit(struct vma_iterator *vmi, unsigned long min) -{ - return mas_prev_range(&vmi->mas, min); -} - -static inline -struct vm_area_struct *vma_iter_next_range_limit(struct vma_iterator *vmi, unsigned long max) -{ - return mas_next_range(&vmi->mas, max); -} - -static inline int vma_iter_area_lowest(struct vma_iterator *vmi, unsigned long min, - unsigned long max, unsigned long size) -{ - return mas_empty_area(&vmi->mas, min, max - 1, size); -} - -static inline int vma_iter_area_highest(struct vma_iterator *vmi, unsigned long min, - unsigned long max, unsigned long size) -{ - return mas_empty_area_rev(&vmi->mas, min, max - 1, size); -} - -/* - * VMA Iterator functions shared between nommu and mmap - */ -static inline int vma_iter_prealloc(struct vma_iterator *vmi, - struct vm_area_struct *vma) -{ - return mas_preallocate(&vmi->mas, vma, GFP_KERNEL); -} - -static inline void vma_iter_clear(struct vma_iterator *vmi) -{ - mas_store_prealloc(&vmi->mas, NULL); -} - -static inline struct vm_area_struct *vma_iter_load(struct vma_iterator *vmi) -{ - return mas_walk(&vmi->mas); -} - -/* Store a VMA with preallocated memory */ -static inline void vma_iter_store(struct vma_iterator *vmi, - struct vm_area_struct *vma) -{ - -#if defined(CONFIG_DEBUG_VM_MAPLE_TREE) - if (MAS_WARN_ON(&vmi->mas, vmi->mas.status != ma_start && - vmi->mas.index > vma->vm_start)) { - pr_warn("%lx > %lx\n store vma %lx-%lx\n into slot %lx-%lx\n", - vmi->mas.index, vma->vm_start, vma->vm_start, - vma->vm_end, vmi->mas.index, vmi->mas.last); - } - if (MAS_WARN_ON(&vmi->mas, vmi->mas.status != ma_start && - vmi->mas.last < vma->vm_start)) { - pr_warn("%lx < %lx\nstore vma %lx-%lx\ninto slot %lx-%lx\n", - vmi->mas.last, vma->vm_start, vma->vm_start, vma->vm_end, - vmi->mas.index, vmi->mas.last); - } -#endif - - if (vmi->mas.status != ma_start && - ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start))) - vma_iter_invalidate(vmi); - - __mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1); - mas_store_prealloc(&vmi->mas, vma); -} - -static inline int vma_iter_store_gfp(struct vma_iterator *vmi, - struct vm_area_struct *vma, gfp_t gfp) -{ - if (vmi->mas.status != ma_start && - ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start))) - vma_iter_invalidate(vmi); - - __mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1); - mas_store_gfp(&vmi->mas, vma, gfp); - if (unlikely(mas_is_err(&vmi->mas))) - return -ENOMEM; - - return 0; -} - -static inline -struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi) -{ - return mas_prev_range(&vmi->mas, 0); -} - -/* - * VMA lock generalization - */ -struct vma_prepare { - struct vm_area_struct *vma; - struct vm_area_struct *adj_next; - struct file *file; - struct address_space *mapping; - struct anon_vma *anon_vma; - struct vm_area_struct *insert; - struct vm_area_struct *remove; - struct vm_area_struct *remove2; -}; - void __meminit __init_single_page(struct page *page, unsigned long pfn, unsigned long zone, int nid); @@ -1641,15 +1424,6 @@ static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry, void workingset_update_node(struct xa_node *node); extern struct list_lru shadow_nodes; -struct unlink_vma_file_batch { - int count; - struct vm_area_struct *vmas[8]; -}; - -void unlink_file_vma_batch_init(struct unlink_vma_file_batch *); -void unlink_file_vma_batch_add(struct unlink_vma_file_batch *, struct vm_area_struct *); -void unlink_file_vma_batch_final(struct unlink_vma_file_batch *); - /* mremap.c */ unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, diff --git a/mm/mmap.c b/mm/mmap.c index d2eebbed87b9..721870f380bf 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -57,6 +57,7 @@ #include #include "internal.h" +#include "vma.h" #ifndef arch_mmap_check #define arch_mmap_check(addr, len, flags) (0) @@ -76,16 +77,6 @@ int mmap_rnd_compat_bits __read_mostly = CONFIG_ARCH_MMAP_RND_COMPAT_BITS; static bool ignore_rlimit_data; core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644); -static void unmap_region(struct mm_struct *mm, struct ma_state *mas, - struct vm_area_struct *vma, struct vm_area_struct *prev, - struct vm_area_struct *next, unsigned long start, - unsigned long end, unsigned long tree_end, bool mm_wr_locked); - -static pgprot_t vm_pgprot_modify(pgprot_t oldprot, unsigned long vm_flags) -{ - return pgprot_modify(oldprot, vm_get_page_prot(vm_flags)); -} - /* Update vma->vm_page_prot to reflect vma->vm_flags. */ void vma_set_page_prot(struct vm_area_struct *vma) { @@ -101,100 +92,6 @@ void vma_set_page_prot(struct vm_area_struct *vma) WRITE_ONCE(vma->vm_page_prot, vm_page_prot); } -/* - * Requires inode->i_mapping->i_mmap_rwsem - */ -static void __remove_shared_vm_struct(struct vm_area_struct *vma, - struct address_space *mapping) -{ - if (vma_is_shared_maywrite(vma)) - mapping_unmap_writable(mapping); - - flush_dcache_mmap_lock(mapping); - vma_interval_tree_remove(vma, &mapping->i_mmap); - flush_dcache_mmap_unlock(mapping); -} - -/* - * Unlink a file-based vm structure from its interval tree, to hide - * vma from rmap and vmtruncate before freeing its page tables. - */ -void unlink_file_vma(struct vm_area_struct *vma) -{ - struct file *file = vma->vm_file; - - if (file) { - struct address_space *mapping = file->f_mapping; - i_mmap_lock_write(mapping); - __remove_shared_vm_struct(vma, mapping); - i_mmap_unlock_write(mapping); - } -} - -void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb) -{ - vb->count = 0; -} - -static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb) -{ - struct address_space *mapping; - int i; - - mapping = vb->vmas[0]->vm_file->f_mapping; - i_mmap_lock_write(mapping); - for (i = 0; i < vb->count; i++) { - VM_WARN_ON_ONCE(vb->vmas[i]->vm_file->f_mapping != mapping); - __remove_shared_vm_struct(vb->vmas[i], mapping); - } - i_mmap_unlock_write(mapping); - - unlink_file_vma_batch_init(vb); -} - -void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb, - struct vm_area_struct *vma) -{ - if (vma->vm_file == NULL) - return; - - if ((vb->count > 0 && vb->vmas[0]->vm_file != vma->vm_file) || - vb->count == ARRAY_SIZE(vb->vmas)) - unlink_file_vma_batch_process(vb); - - vb->vmas[vb->count] = vma; - vb->count++; -} - -void unlink_file_vma_batch_final(struct unlink_vma_file_batch *vb) -{ - if (vb->count > 0) - unlink_file_vma_batch_process(vb); -} - -/* - * Close a vm structure and free it. - */ -static void remove_vma(struct vm_area_struct *vma, bool unreachable) -{ - might_sleep(); - if (vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); - if (vma->vm_file) - fput(vma->vm_file); - mpol_put(vma_policy(vma)); - if (unreachable) - __vm_area_free(vma); - else - vm_area_free(vma); -} - -static inline struct vm_area_struct *vma_prev_limit(struct vma_iterator *vmi, - unsigned long min) -{ - return mas_prev(&vmi->mas, min); -} - /* * check_brk_limits() - Use platform specific check of range & verify mlock * limits. @@ -300,891 +197,22 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) if (do_brk_flags(&vmi, brkvma, oldbrk, newbrk - oldbrk, 0) < 0) goto out; - mm->brk = brk; - if (mm->def_flags & VM_LOCKED) - populate = true; - -success: - mmap_write_unlock(mm); -success_unlocked: - userfaultfd_unmap_complete(mm, &uf); - if (populate) - mm_populate(oldbrk, newbrk - oldbrk); - return brk; - -out: - mm->brk = origbrk; - mmap_write_unlock(mm); - return origbrk; -} - -#if defined(CONFIG_DEBUG_VM_MAPLE_TREE) -static void validate_mm(struct mm_struct *mm) -{ - int bug = 0; - int i = 0; - struct vm_area_struct *vma; - VMA_ITERATOR(vmi, mm, 0); - - mt_validate(&mm->mm_mt); - for_each_vma(vmi, vma) { -#ifdef CONFIG_DEBUG_VM_RB - struct anon_vma *anon_vma = vma->anon_vma; - struct anon_vma_chain *avc; -#endif - unsigned long vmi_start, vmi_end; - bool warn = 0; - - vmi_start = vma_iter_addr(&vmi); - vmi_end = vma_iter_end(&vmi); - if (VM_WARN_ON_ONCE_MM(vma->vm_end != vmi_end, mm)) - warn = 1; - - if (VM_WARN_ON_ONCE_MM(vma->vm_start != vmi_start, mm)) - warn = 1; - - if (warn) { - pr_emerg("issue in %s\n", current->comm); - dump_stack(); - dump_vma(vma); - pr_emerg("tree range: %px start %lx end %lx\n", vma, - vmi_start, vmi_end - 1); - vma_iter_dump_tree(&vmi); - } - -#ifdef CONFIG_DEBUG_VM_RB - if (anon_vma) { - anon_vma_lock_read(anon_vma); - list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) - anon_vma_interval_tree_verify(avc); - anon_vma_unlock_read(anon_vma); - } -#endif - i++; - } - if (i != mm->map_count) { - pr_emerg("map_count %d vma iterator %d\n", mm->map_count, i); - bug = 1; - } - VM_BUG_ON_MM(bug, mm); -} - -#else /* !CONFIG_DEBUG_VM_MAPLE_TREE */ -#define validate_mm(mm) do { } while (0) -#endif /* CONFIG_DEBUG_VM_MAPLE_TREE */ - -/* - * vma has some anon_vma assigned, and is already inserted on that - * anon_vma's interval trees. - * - * Before updating the vma's vm_start / vm_end / vm_pgoff fields, the - * vma must be removed from the anon_vma's interval trees using - * anon_vma_interval_tree_pre_update_vma(). - * - * After the update, the vma will be reinserted using - * anon_vma_interval_tree_post_update_vma(). - * - * The entire update must be protected by exclusive mmap_lock and by - * the root anon_vma's mutex. - */ -static inline void -anon_vma_interval_tree_pre_update_vma(struct vm_area_struct *vma) -{ - struct anon_vma_chain *avc; - - list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) - anon_vma_interval_tree_remove(avc, &avc->anon_vma->rb_root); -} - -static inline void -anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma) -{ - struct anon_vma_chain *avc; - - list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) - anon_vma_interval_tree_insert(avc, &avc->anon_vma->rb_root); -} - -static unsigned long count_vma_pages_range(struct mm_struct *mm, - unsigned long addr, unsigned long end) -{ - VMA_ITERATOR(vmi, mm, addr); - struct vm_area_struct *vma; - unsigned long nr_pages = 0; - - for_each_vma_range(vmi, vma, end) { - unsigned long vm_start = max(addr, vma->vm_start); - unsigned long vm_end = min(end, vma->vm_end); - - nr_pages += PHYS_PFN(vm_end - vm_start); - } - - return nr_pages; -} - -static void __vma_link_file(struct vm_area_struct *vma, - struct address_space *mapping) -{ - if (vma_is_shared_maywrite(vma)) - mapping_allow_writable(mapping); - - flush_dcache_mmap_lock(mapping); - vma_interval_tree_insert(vma, &mapping->i_mmap); - flush_dcache_mmap_unlock(mapping); -} - -static void vma_link_file(struct vm_area_struct *vma) -{ - struct file *file = vma->vm_file; - struct address_space *mapping; - - if (file) { - mapping = file->f_mapping; - i_mmap_lock_write(mapping); - __vma_link_file(vma, mapping); - i_mmap_unlock_write(mapping); - } -} - -static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma) -{ - VMA_ITERATOR(vmi, mm, 0); - - vma_iter_config(&vmi, vma->vm_start, vma->vm_end); - if (vma_iter_prealloc(&vmi, vma)) - return -ENOMEM; - - vma_start_write(vma); - vma_iter_store(&vmi, vma); - vma_link_file(vma); - mm->map_count++; - validate_mm(mm); - return 0; -} - -/* - * init_multi_vma_prep() - Initializer for struct vma_prepare - * @vp: The vma_prepare struct - * @vma: The vma that will be altered once locked - * @next: The next vma if it is to be adjusted - * @remove: The first vma to be removed - * @remove2: The second vma to be removed - */ -static inline void init_multi_vma_prep(struct vma_prepare *vp, - struct vm_area_struct *vma, struct vm_area_struct *next, - struct vm_area_struct *remove, struct vm_area_struct *remove2) -{ - memset(vp, 0, sizeof(struct vma_prepare)); - vp->vma = vma; - vp->anon_vma = vma->anon_vma; - vp->remove = remove; - vp->remove2 = remove2; - vp->adj_next = next; - if (!vp->anon_vma && next) - vp->anon_vma = next->anon_vma; - - vp->file = vma->vm_file; - if (vp->file) - vp->mapping = vma->vm_file->f_mapping; - -} - -/* - * init_vma_prep() - Initializer wrapper for vma_prepare struct - * @vp: The vma_prepare struct - * @vma: The vma that will be altered once locked - */ -static inline void init_vma_prep(struct vma_prepare *vp, - struct vm_area_struct *vma) -{ - init_multi_vma_prep(vp, vma, NULL, NULL, NULL); -} - - -/* - * vma_prepare() - Helper function for handling locking VMAs prior to altering - * @vp: The initialized vma_prepare struct - */ -static inline void vma_prepare(struct vma_prepare *vp) -{ - if (vp->file) { - uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end); - - if (vp->adj_next) - uprobe_munmap(vp->adj_next, vp->adj_next->vm_start, - vp->adj_next->vm_end); - - i_mmap_lock_write(vp->mapping); - if (vp->insert && vp->insert->vm_file) { - /* - * Put into interval tree now, so instantiated pages - * are visible to arm/parisc __flush_dcache_page - * throughout; but we cannot insert into address - * space until vma start or end is updated. - */ - __vma_link_file(vp->insert, - vp->insert->vm_file->f_mapping); - } - } - - if (vp->anon_vma) { - anon_vma_lock_write(vp->anon_vma); - anon_vma_interval_tree_pre_update_vma(vp->vma); - if (vp->adj_next) - anon_vma_interval_tree_pre_update_vma(vp->adj_next); - } - - if (vp->file) { - flush_dcache_mmap_lock(vp->mapping); - vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap); - if (vp->adj_next) - vma_interval_tree_remove(vp->adj_next, - &vp->mapping->i_mmap); - } - -} - -/* - * vma_complete- Helper function for handling the unlocking after altering VMAs, - * or for inserting a VMA. - * - * @vp: The vma_prepare struct - * @vmi: The vma iterator - * @mm: The mm_struct - */ -static inline void vma_complete(struct vma_prepare *vp, - struct vma_iterator *vmi, struct mm_struct *mm) -{ - if (vp->file) { - if (vp->adj_next) - vma_interval_tree_insert(vp->adj_next, - &vp->mapping->i_mmap); - vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap); - flush_dcache_mmap_unlock(vp->mapping); - } - - if (vp->remove && vp->file) { - __remove_shared_vm_struct(vp->remove, vp->mapping); - if (vp->remove2) - __remove_shared_vm_struct(vp->remove2, vp->mapping); - } else if (vp->insert) { - /* - * split_vma has split insert from vma, and needs - * us to insert it before dropping the locks - * (it may either follow vma or precede it). - */ - vma_iter_store(vmi, vp->insert); - mm->map_count++; - } - - if (vp->anon_vma) { - anon_vma_interval_tree_post_update_vma(vp->vma); - if (vp->adj_next) - anon_vma_interval_tree_post_update_vma(vp->adj_next); - anon_vma_unlock_write(vp->anon_vma); - } - - if (vp->file) { - i_mmap_unlock_write(vp->mapping); - uprobe_mmap(vp->vma); - - if (vp->adj_next) - uprobe_mmap(vp->adj_next); - } - - if (vp->remove) { -again: - vma_mark_detached(vp->remove, true); - if (vp->file) { - uprobe_munmap(vp->remove, vp->remove->vm_start, - vp->remove->vm_end); - fput(vp->file); - } - if (vp->remove->anon_vma) - anon_vma_merge(vp->vma, vp->remove); - mm->map_count--; - mpol_put(vma_policy(vp->remove)); - if (!vp->remove2) - WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end); - vm_area_free(vp->remove); - - /* - * In mprotect's case 6 (see comments on vma_merge), - * we are removing both mid and next vmas - */ - if (vp->remove2) { - vp->remove = vp->remove2; - vp->remove2 = NULL; - goto again; - } - } - if (vp->insert && vp->file) - uprobe_mmap(vp->insert); - validate_mm(mm); -} - -/* - * dup_anon_vma() - Helper function to duplicate anon_vma - * @dst: The destination VMA - * @src: The source VMA - * @dup: Pointer to the destination VMA when successful. - * - * Returns: 0 on success. - */ -static inline int dup_anon_vma(struct vm_area_struct *dst, - struct vm_area_struct *src, struct vm_area_struct **dup) -{ - /* - * Easily overlooked: when mprotect shifts the boundary, make sure the - * expanding vma has anon_vma set if the shrinking vma had, to cover any - * anon pages imported. - */ - if (src->anon_vma && !dst->anon_vma) { - int ret; - - vma_assert_write_locked(dst); - dst->anon_vma = src->anon_vma; - ret = anon_vma_clone(dst, src); - if (ret) - return ret; - - *dup = dst; - } - - return 0; -} - -/* - * vma_expand - Expand an existing VMA - * - * @vmi: The vma iterator - * @vma: The vma to expand - * @start: The start of the vma - * @end: The exclusive end of the vma - * @pgoff: The page offset of vma - * @next: The current of next vma. - * - * Expand @vma to @start and @end. Can expand off the start and end. Will - * expand over @next if it's different from @vma and @end == @next->vm_end. - * Checking if the @vma can expand and merge with @next needs to be handled by - * the caller. - * - * Returns: 0 on success - */ -int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma, - unsigned long start, unsigned long end, pgoff_t pgoff, - struct vm_area_struct *next) -{ - struct vm_area_struct *anon_dup = NULL; - bool remove_next = false; - struct vma_prepare vp; - - vma_start_write(vma); - if (next && (vma != next) && (end == next->vm_end)) { - int ret; - - remove_next = true; - vma_start_write(next); - ret = dup_anon_vma(vma, next, &anon_dup); - if (ret) - return ret; - } - - init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL); - /* Not merging but overwriting any part of next is not handled. */ - VM_WARN_ON(next && !vp.remove && - next != vma && end > next->vm_start); - /* Only handles expanding */ - VM_WARN_ON(vma->vm_start < start || vma->vm_end > end); - - /* Note: vma iterator must be pointing to 'start' */ - vma_iter_config(vmi, start, end); - if (vma_iter_prealloc(vmi, vma)) - goto nomem; - - vma_prepare(&vp); - vma_adjust_trans_huge(vma, start, end, 0); - vma_set_range(vma, start, end, pgoff); - vma_iter_store(vmi, vma); - - vma_complete(&vp, vmi, vma->vm_mm); - return 0; - -nomem: - if (anon_dup) - unlink_anon_vmas(anon_dup); - return -ENOMEM; -} - -/* - * vma_shrink() - Reduce an existing VMAs memory area - * @vmi: The vma iterator - * @vma: The VMA to modify - * @start: The new start - * @end: The new end - * - * Returns: 0 on success, -ENOMEM otherwise - */ -int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma, - unsigned long start, unsigned long end, pgoff_t pgoff) -{ - struct vma_prepare vp; - - WARN_ON((vma->vm_start != start) && (vma->vm_end != end)); - - if (vma->vm_start < start) - vma_iter_config(vmi, vma->vm_start, start); - else - vma_iter_config(vmi, end, vma->vm_end); - - if (vma_iter_prealloc(vmi, NULL)) - return -ENOMEM; - - vma_start_write(vma); - - init_vma_prep(&vp, vma); - vma_prepare(&vp); - vma_adjust_trans_huge(vma, start, end, 0); - - vma_iter_clear(vmi); - vma_set_range(vma, start, end, pgoff); - vma_complete(&vp, vmi, vma->vm_mm); - return 0; -} - -/* - * If the vma has a ->close operation then the driver probably needs to release - * per-vma resources, so we don't attempt to merge those if the caller indicates - * the current vma may be removed as part of the merge. - */ -static inline bool is_mergeable_vma(struct vm_area_struct *vma, - struct file *file, unsigned long vm_flags, - struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - struct anon_vma_name *anon_name, bool may_remove_vma) -{ - /* - * VM_SOFTDIRTY should not prevent from VMA merging, if we - * match the flags but dirty bit -- the caller should mark - * merged VMA as dirty. If dirty bit won't be excluded from - * comparison, we increase pressure on the memory system forcing - * the kernel to generate new VMAs when old one could be - * extended instead. - */ - if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY) - return false; - if (vma->vm_file != file) - return false; - if (may_remove_vma && vma->vm_ops && vma->vm_ops->close) - return false; - if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx)) - return false; - if (!anon_vma_name_eq(anon_vma_name(vma), anon_name)) - return false; - return true; -} - -static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1, - struct anon_vma *anon_vma2, struct vm_area_struct *vma) -{ - /* - * The list_is_singular() test is to avoid merging VMA cloned from - * parents. This can improve scalability caused by anon_vma lock. - */ - if ((!anon_vma1 || !anon_vma2) && (!vma || - list_is_singular(&vma->anon_vma_chain))) - return true; - return anon_vma1 == anon_vma2; -} - -/* - * Return true if we can merge this (vm_flags,anon_vma,file,vm_pgoff) - * in front of (at a lower virtual address and file offset than) the vma. - * - * We cannot merge two vmas if they have differently assigned (non-NULL) - * anon_vmas, nor if same anon_vma is assigned but offsets incompatible. - * - * We don't check here for the merged mmap wrapping around the end of pagecache - * indices (16TB on ia32) because do_mmap() does not permit mmap's which - * wrap, nor mmaps which cover the final page at index -1UL. - * - * We assume the vma may be removed as part of the merge. - */ -static bool -can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags, - struct anon_vma *anon_vma, struct file *file, - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - struct anon_vma_name *anon_name) -{ - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, true) && - is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { - if (vma->vm_pgoff == vm_pgoff) - return true; - } - return false; -} - -/* - * Return true if we can merge this (vm_flags,anon_vma,file,vm_pgoff) - * beyond (at a higher virtual address and file offset than) the vma. - * - * We cannot merge two vmas if they have differently assigned (non-NULL) - * anon_vmas, nor if same anon_vma is assigned but offsets incompatible. - * - * We assume that vma is not removed as part of the merge. - */ -static bool -can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags, - struct anon_vma *anon_vma, struct file *file, - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - struct anon_vma_name *anon_name) -{ - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) && - is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { - pgoff_t vm_pglen; - vm_pglen = vma_pages(vma); - if (vma->vm_pgoff + vm_pglen == vm_pgoff) - return true; - } - return false; -} - -/* - * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name), - * figure out whether that can be merged with its predecessor or its - * successor. Or both (it neatly fills a hole). - * - * In most cases - when called for mmap, brk or mremap - [addr,end) is - * certain not to be mapped by the time vma_merge is called; but when - * called for mprotect, it is certain to be already mapped (either at - * an offset within prev, or at the start of next), and the flags of - * this area are about to be changed to vm_flags - and the no-change - * case has already been eliminated. - * - * The following mprotect cases have to be considered, where **** is - * the area passed down from mprotect_fixup, never extending beyond one - * vma, PPPP is the previous vma, CCCC is a concurrent vma that starts - * at the same address as **** and is of the same or larger span, and - * NNNN the next vma after ****: - * - * **** **** **** - * PPPPPPNNNNNN PPPPPPNNNNNN PPPPPPCCCCCC - * cannot merge might become might become - * PPNNNNNNNNNN PPPPPPPPPPCC - * mmap, brk or case 4 below case 5 below - * mremap move: - * **** **** - * PPPP NNNN PPPPCCCCNNNN - * might become might become - * PPPPPPPPPPPP 1 or PPPPPPPPPPPP 6 or - * PPPPPPPPNNNN 2 or PPPPPPPPNNNN 7 or - * PPPPNNNNNNNN 3 PPPPNNNNNNNN 8 - * - * It is important for case 8 that the vma CCCC overlapping the - * region **** is never going to extended over NNNN. Instead NNNN must - * be extended in region **** and CCCC must be removed. This way in - * all cases where vma_merge succeeds, the moment vma_merge drops the - * rmap_locks, the properties of the merged vma will be already - * correct for the whole merged range. Some of those properties like - * vm_page_prot/vm_flags may be accessed by rmap_walks and they must - * be correct for the whole merged range immediately after the - * rmap_locks are released. Otherwise if NNNN would be removed and - * CCCC would be extended over the NNNN range, remove_migration_ptes - * or other rmap walkers (if working on addresses beyond the "end" - * parameter) may establish ptes with the wrong permissions of CCCC - * instead of the right permissions of NNNN. - * - * In the code below: - * PPPP is represented by *prev - * CCCC is represented by *curr or not represented at all (NULL) - * NNNN is represented by *next or not represented at all (NULL) - * **** is not represented - it will be merged and the vma containing the - * area is returned, or the function will return NULL - */ -static struct vm_area_struct -*vma_merge(struct vma_iterator *vmi, struct vm_area_struct *prev, - struct vm_area_struct *src, unsigned long addr, unsigned long end, - unsigned long vm_flags, pgoff_t pgoff, struct mempolicy *policy, - struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - struct anon_vma_name *anon_name) -{ - struct mm_struct *mm = src->vm_mm; - struct anon_vma *anon_vma = src->anon_vma; - struct file *file = src->vm_file; - struct vm_area_struct *curr, *next, *res; - struct vm_area_struct *vma, *adjust, *remove, *remove2; - struct vm_area_struct *anon_dup = NULL; - struct vma_prepare vp; - pgoff_t vma_pgoff; - int err = 0; - bool merge_prev = false; - bool merge_next = false; - bool vma_expanded = false; - unsigned long vma_start = addr; - unsigned long vma_end = end; - pgoff_t pglen = (end - addr) >> PAGE_SHIFT; - long adj_start = 0; - - /* - * We later require that vma->vm_flags == vm_flags, - * so this tests vma->vm_flags & VM_SPECIAL, too. - */ - if (vm_flags & VM_SPECIAL) - return NULL; - - /* Does the input range span an existing VMA? (cases 5 - 8) */ - curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end); - - if (!curr || /* cases 1 - 4 */ - end == curr->vm_end) /* cases 6 - 8, adjacent VMA */ - next = vma_lookup(mm, end); - else - next = NULL; /* case 5 */ - - if (prev) { - vma_start = prev->vm_start; - vma_pgoff = prev->vm_pgoff; - - /* Can we merge the predecessor? */ - if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy) - && can_vma_merge_after(prev, vm_flags, anon_vma, file, - pgoff, vm_userfaultfd_ctx, anon_name)) { - merge_prev = true; - vma_prev(vmi); - } - } - - /* Can we merge the successor? */ - if (next && mpol_equal(policy, vma_policy(next)) && - can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen, - vm_userfaultfd_ctx, anon_name)) { - merge_next = true; - } - - /* Verify some invariant that must be enforced by the caller. */ - VM_WARN_ON(prev && addr <= prev->vm_start); - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); - VM_WARN_ON(addr >= end); - - if (!merge_prev && !merge_next) - return NULL; /* Not mergeable. */ - - if (merge_prev) - vma_start_write(prev); - - res = vma = prev; - remove = remove2 = adjust = NULL; - - /* Can we merge both the predecessor and the successor? */ - if (merge_prev && merge_next && - is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) { - vma_start_write(next); - remove = next; /* case 1 */ - vma_end = next->vm_end; - err = dup_anon_vma(prev, next, &anon_dup); - if (curr) { /* case 6 */ - vma_start_write(curr); - remove = curr; - remove2 = next; - /* - * Note that the dup_anon_vma below cannot overwrite err - * since the first caller would do nothing unless next - * has an anon_vma. - */ - if (!next->anon_vma) - err = dup_anon_vma(prev, curr, &anon_dup); - } - } else if (merge_prev) { /* case 2 */ - if (curr) { - vma_start_write(curr); - if (end == curr->vm_end) { /* case 7 */ - /* - * can_vma_merge_after() assumed we would not be - * removing prev vma, so it skipped the check - * for vm_ops->close, but we are removing curr - */ - if (curr->vm_ops && curr->vm_ops->close) - err = -EINVAL; - remove = curr; - } else { /* case 5 */ - adjust = curr; - adj_start = (end - curr->vm_start); - } - if (!err) - err = dup_anon_vma(prev, curr, &anon_dup); - } - } else { /* merge_next */ - vma_start_write(next); - res = next; - if (prev && addr < prev->vm_end) { /* case 4 */ - vma_start_write(prev); - vma_end = addr; - adjust = next; - adj_start = -(prev->vm_end - addr); - err = dup_anon_vma(next, prev, &anon_dup); - } else { - /* - * Note that cases 3 and 8 are the ONLY ones where prev - * is permitted to be (but is not necessarily) NULL. - */ - vma = next; /* case 3 */ - vma_start = addr; - vma_end = next->vm_end; - vma_pgoff = next->vm_pgoff - pglen; - if (curr) { /* case 8 */ - vma_pgoff = curr->vm_pgoff; - vma_start_write(curr); - remove = curr; - err = dup_anon_vma(next, curr, &anon_dup); - } - } - } - - /* Error in anon_vma clone. */ - if (err) - goto anon_vma_fail; - - if (vma_start < vma->vm_start || vma_end > vma->vm_end) - vma_expanded = true; - - if (vma_expanded) { - vma_iter_config(vmi, vma_start, vma_end); - } else { - vma_iter_config(vmi, adjust->vm_start + adj_start, - adjust->vm_end); - } - - if (vma_iter_prealloc(vmi, vma)) - goto prealloc_fail; - - init_multi_vma_prep(&vp, vma, adjust, remove, remove2); - VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma && - vp.anon_vma != adjust->anon_vma); - - vma_prepare(&vp); - vma_adjust_trans_huge(vma, vma_start, vma_end, adj_start); - vma_set_range(vma, vma_start, vma_end, vma_pgoff); - - if (vma_expanded) - vma_iter_store(vmi, vma); - - if (adj_start) { - adjust->vm_start += adj_start; - adjust->vm_pgoff += adj_start >> PAGE_SHIFT; - if (adj_start < 0) { - WARN_ON(vma_expanded); - vma_iter_store(vmi, next); - } - } - - vma_complete(&vp, vmi, mm); - khugepaged_enter_vma(res, vm_flags); - return res; - -prealloc_fail: - if (anon_dup) - unlink_anon_vmas(anon_dup); - -anon_vma_fail: - vma_iter_set(vmi, addr); - vma_iter_load(vmi); - return NULL; -} - -/* - * Rough compatibility check to quickly see if it's even worth looking - * at sharing an anon_vma. - * - * They need to have the same vm_file, and the flags can only differ - * in things that mprotect may change. - * - * NOTE! The fact that we share an anon_vma doesn't _have_ to mean that - * we can merge the two vma's. For example, we refuse to merge a vma if - * there is a vm_ops->close() function, because that indicates that the - * driver is doing some kind of reference counting. But that doesn't - * really matter for the anon_vma sharing case. - */ -static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *b) -{ - return a->vm_end == b->vm_start && - mpol_equal(vma_policy(a), vma_policy(b)) && - a->vm_file == b->vm_file && - !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_SOFTDIRTY)) && - b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT); -} - -/* - * Do some basic sanity checking to see if we can re-use the anon_vma - * from 'old'. The 'a'/'b' vma's are in VM order - one of them will be - * the same as 'old', the other will be the new one that is trying - * to share the anon_vma. - * - * NOTE! This runs with mmap_lock held for reading, so it is possible that - * the anon_vma of 'old' is concurrently in the process of being set up - * by another page fault trying to merge _that_. But that's ok: if it - * is being set up, that automatically means that it will be a singleton - * acceptable for merging, so we can do all of this optimistically. But - * we do that READ_ONCE() to make sure that we never re-load the pointer. - * - * IOW: that the "list_is_singular()" test on the anon_vma_chain only - * matters for the 'stable anon_vma' case (ie the thing we want to avoid - * is to return an anon_vma that is "complex" due to having gone through - * a fork). - * - * We also make sure that the two vma's are compatible (adjacent, - * and with the same memory policies). That's all stable, even with just - * a read lock on the mmap_lock. - */ -static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old, struct vm_area_struct *a, struct vm_area_struct *b) -{ - if (anon_vma_compatible(a, b)) { - struct anon_vma *anon_vma = READ_ONCE(old->anon_vma); - - if (anon_vma && list_is_singular(&old->anon_vma_chain)) - return anon_vma; - } - return NULL; -} - -/* - * find_mergeable_anon_vma is used by anon_vma_prepare, to check - * neighbouring vmas for a suitable anon_vma, before it goes off - * to allocate a new anon_vma. It checks because a repetitive - * sequence of mprotects and faults may otherwise lead to distinct - * anon_vmas being allocated, preventing vma merge in subsequent - * mprotect. - */ -struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma) -{ - struct anon_vma *anon_vma = NULL; - struct vm_area_struct *prev, *next; - VMA_ITERATOR(vmi, vma->vm_mm, vma->vm_end); - - /* Try next first. */ - next = vma_iter_load(&vmi); - if (next) { - anon_vma = reusable_anon_vma(next, vma, next); - if (anon_vma) - return anon_vma; - } + mm->brk = brk; + if (mm->def_flags & VM_LOCKED) + populate = true; - prev = vma_prev(&vmi); - VM_BUG_ON_VMA(prev != vma, vma); - prev = vma_prev(&vmi); - /* Try prev next. */ - if (prev) - anon_vma = reusable_anon_vma(prev, prev, vma); +success: + mmap_write_unlock(mm); +success_unlocked: + userfaultfd_unmap_complete(mm, &uf); + if (populate) + mm_populate(oldbrk, newbrk - oldbrk); + return brk; - /* - * We might reach here with anon_vma == NULL if we can't find - * any reusable anon_vma. - * There's no absolute need to look only at touching neighbours: - * we could search further afield for "compatible" anon_vmas. - * But it would probably just be a waste of time searching, - * or lead to too many vmas hanging off the same anon_vma. - * We're trying to allow mprotect remerging later on, - * not trying to minimize memory used for anon_vmas. - */ - return anon_vma; +out: + mm->brk = origbrk; + mmap_write_unlock(mm); + return origbrk; } /* @@ -1519,85 +547,6 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg) } #endif /* __ARCH_WANT_SYS_OLD_MMAP */ -static bool vm_ops_needs_writenotify(const struct vm_operations_struct *vm_ops) -{ - return vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite); -} - -static bool vma_is_shared_writable(struct vm_area_struct *vma) -{ - return (vma->vm_flags & (VM_WRITE | VM_SHARED)) == - (VM_WRITE | VM_SHARED); -} - -static bool vma_fs_can_writeback(struct vm_area_struct *vma) -{ - /* No managed pages to writeback. */ - if (vma->vm_flags & VM_PFNMAP) - return false; - - return vma->vm_file && vma->vm_file->f_mapping && - mapping_can_writeback(vma->vm_file->f_mapping); -} - -/* - * Does this VMA require the underlying folios to have their dirty state - * tracked? - */ -bool vma_needs_dirty_tracking(struct vm_area_struct *vma) -{ - /* Only shared, writable VMAs require dirty tracking. */ - if (!vma_is_shared_writable(vma)) - return false; - - /* Does the filesystem need to be notified? */ - if (vm_ops_needs_writenotify(vma->vm_ops)) - return true; - - /* - * Even if the filesystem doesn't indicate a need for writenotify, if it - * can writeback, dirty tracking is still required. - */ - return vma_fs_can_writeback(vma); -} - -/* - * Some shared mappings will want the pages marked read-only - * to track write events. If so, we'll downgrade vm_page_prot - * to the private version (using protection_map[] without the - * VM_SHARED bit). - */ -bool vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) -{ - /* If it was private or non-writable, the write bit is already clear */ - if (!vma_is_shared_writable(vma)) - return false; - - /* The backer wishes to know when pages are first written to? */ - if (vm_ops_needs_writenotify(vma->vm_ops)) - return true; - - /* The open routine did something to the protections that pgprot_modify - * won't preserve? */ - if (pgprot_val(vm_page_prot) != - pgprot_val(vm_pgprot_modify(vm_page_prot, vma->vm_flags))) - return false; - - /* - * Do we need to track softdirty? hugetlb does not support softdirty - * tracking yet. - */ - if (vma_soft_dirty_enabled(vma) && !is_vm_hugetlb_page(vma)) - return true; - - /* Do we need write faults for uffd-wp tracking? */ - if (userfaultfd_wp(vma)) - return true; - - /* Can the mapping track the dirty pages? */ - return vma_fs_can_writeback(vma); -} - /* * We account for memory if it's a private writeable mapping, * not hugepages and VM_NORESERVE wasn't set. @@ -2238,566 +1187,129 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address) anon_vma_interval_tree_post_update_vma(vma); spin_unlock(&mm->page_table_lock); - perf_event_mmap(vma); - } - } - } - anon_vma_unlock_write(vma->anon_vma); - vma_iter_free(&vmi); - validate_mm(mm); - return error; -} - -/* enforced gap between the expanding stack and other mappings. */ -unsigned long stack_guard_gap = 256UL<close operation then the driver probably needs to release + * per-vma resources, so we don't attempt to merge those if the caller indicates + * the current vma may be removed as part of the merge. + */ +static inline bool is_mergeable_vma(struct vm_area_struct *vma, + struct file *file, unsigned long vm_flags, + struct vm_userfaultfd_ctx vm_userfaultfd_ctx, + struct anon_vma_name *anon_name, bool may_remove_vma) +{ + /* + * VM_SOFTDIRTY should not prevent from VMA merging, if we + * match the flags but dirty bit -- the caller should mark + * merged VMA as dirty. If dirty bit won't be excluded from + * comparison, we increase pressure on the memory system forcing + * the kernel to generate new VMAs when old one could be + * extended instead. + */ + if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY) + return false; + if (vma->vm_file != file) + return false; + if (may_remove_vma && vma->vm_ops && vma->vm_ops->close) + return false; + if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx)) + return false; + if (!anon_vma_name_eq(anon_vma_name(vma), anon_name)) + return false; + return true; +} + +static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1, + struct anon_vma *anon_vma2, struct vm_area_struct *vma) +{ + /* + * The list_is_singular() test is to avoid merging VMA cloned from + * parents. This can improve scalability caused by anon_vma lock. + */ + if ((!anon_vma1 || !anon_vma2) && (!vma || + list_is_singular(&vma->anon_vma_chain))) + return true; + return anon_vma1 == anon_vma2; +} + +/* + * init_multi_vma_prep() - Initializer for struct vma_prepare + * @vp: The vma_prepare struct + * @vma: The vma that will be altered once locked + * @next: The next vma if it is to be adjusted + * @remove: The first vma to be removed + * @remove2: The second vma to be removed + */ +static void init_multi_vma_prep(struct vma_prepare *vp, + struct vm_area_struct *vma, + struct vm_area_struct *next, + struct vm_area_struct *remove, + struct vm_area_struct *remove2) +{ + memset(vp, 0, sizeof(struct vma_prepare)); + vp->vma = vma; + vp->anon_vma = vma->anon_vma; + vp->remove = remove; + vp->remove2 = remove2; + vp->adj_next = next; + if (!vp->anon_vma && next) + vp->anon_vma = next->anon_vma; + + vp->file = vma->vm_file; + if (vp->file) + vp->mapping = vma->vm_file->f_mapping; + +} + +/* + * Return true if we can merge this (vm_flags,anon_vma,file,vm_pgoff) + * in front of (at a lower virtual address and file offset than) the vma. + * + * We cannot merge two vmas if they have differently assigned (non-NULL) + * anon_vmas, nor if same anon_vma is assigned but offsets incompatible. + * + * We don't check here for the merged mmap wrapping around the end of pagecache + * indices (16TB on ia32) because do_mmap() does not permit mmap's which + * wrap, nor mmaps which cover the final page at index -1UL. + * + * We assume the vma may be removed as part of the merge. + */ +bool +can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags, + struct anon_vma *anon_vma, struct file *file, + pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, + struct anon_vma_name *anon_name) +{ + if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, true) && + is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { + if (vma->vm_pgoff == vm_pgoff) + return true; + } + return false; +} + +/* + * Return true if we can merge this (vm_flags,anon_vma,file,vm_pgoff) + * beyond (at a higher virtual address and file offset than) the vma. + * + * We cannot merge two vmas if they have differently assigned (non-NULL) + * anon_vmas, nor if same anon_vma is assigned but offsets incompatible. + * + * We assume that vma is not removed as part of the merge. + */ +bool +can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags, + struct anon_vma *anon_vma, struct file *file, + pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, + struct anon_vma_name *anon_name) +{ + if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) && + is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { + pgoff_t vm_pglen; + + vm_pglen = vma_pages(vma); + if (vma->vm_pgoff + vm_pglen == vm_pgoff) + return true; + } + return false; +} + +/* + * Close a vm structure and free it. + */ +void remove_vma(struct vm_area_struct *vma, bool unreachable) +{ + might_sleep(); + if (vma->vm_ops && vma->vm_ops->close) + vma->vm_ops->close(vma); + if (vma->vm_file) + fput(vma->vm_file); + mpol_put(vma_policy(vma)); + if (unreachable) + __vm_area_free(vma); + else + vm_area_free(vma); +} + +/* + * Get rid of page table information in the indicated region. + * + * Called with the mm semaphore held. + */ +void unmap_region(struct mm_struct *mm, struct ma_state *mas, + struct vm_area_struct *vma, struct vm_area_struct *prev, + struct vm_area_struct *next, unsigned long start, + unsigned long end, unsigned long tree_end, bool mm_wr_locked) +{ + struct mmu_gather tlb; + unsigned long mt_start = mas->index; + + lru_add_drain(); + tlb_gather_mmu(&tlb, mm); + update_hiwater_rss(mm); + unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked); + mas_set(mas, mt_start); + free_pgtables(&tlb, mas, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS, + next ? next->vm_start : USER_PGTABLES_CEILING, + mm_wr_locked); + tlb_finish_mmu(&tlb); +} + +/* + * __split_vma() bypasses sysctl_max_map_count checking. We use this where it + * has already been checked or doesn't make sense to fail. + * VMA Iterator will point to the end VMA. + */ +static int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, + unsigned long addr, int new_below) +{ + struct vma_prepare vp; + struct vm_area_struct *new; + int err; + + WARN_ON(vma->vm_start >= addr); + WARN_ON(vma->vm_end <= addr); + + if (vma->vm_ops && vma->vm_ops->may_split) { + err = vma->vm_ops->may_split(vma, addr); + if (err) + return err; + } + + new = vm_area_dup(vma); + if (!new) + return -ENOMEM; + + if (new_below) { + new->vm_end = addr; + } else { + new->vm_start = addr; + new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT); + } + + err = -ENOMEM; + vma_iter_config(vmi, new->vm_start, new->vm_end); + if (vma_iter_prealloc(vmi, new)) + goto out_free_vma; + + err = vma_dup_policy(vma, new); + if (err) + goto out_free_vmi; + + err = anon_vma_clone(new, vma); + if (err) + goto out_free_mpol; + + if (new->vm_file) + get_file(new->vm_file); + + if (new->vm_ops && new->vm_ops->open) + new->vm_ops->open(new); + + vma_start_write(vma); + vma_start_write(new); + + init_vma_prep(&vp, vma); + vp.insert = new; + vma_prepare(&vp); + vma_adjust_trans_huge(vma, vma->vm_start, addr, 0); + + if (new_below) { + vma->vm_start = addr; + vma->vm_pgoff += (addr - new->vm_start) >> PAGE_SHIFT; + } else { + vma->vm_end = addr; + } + + /* vma_complete stores the new vma */ + vma_complete(&vp, vmi, vma->vm_mm); + + /* Success. */ + if (new_below) + vma_next(vmi); + return 0; + +out_free_mpol: + mpol_put(vma_policy(new)); +out_free_vmi: + vma_iter_free(vmi); +out_free_vma: + vm_area_free(new); + return err; +} + +/* + * Split a vma into two pieces at address 'addr', a new vma is allocated + * either for the first part or the tail. + */ +static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, + unsigned long addr, int new_below) +{ + if (vma->vm_mm->map_count >= sysctl_max_map_count) + return -ENOMEM; + + return __split_vma(vmi, vma, addr, new_below); +} + +/* + * Ok - we have the memory areas we should free on a maple tree so release them, + * and do the vma updates. + * + * Called with the mm semaphore held. + */ +static inline void remove_mt(struct mm_struct *mm, struct ma_state *mas) +{ + unsigned long nr_accounted = 0; + struct vm_area_struct *vma; + + /* Update high watermark before we lower total_vm */ + update_hiwater_vm(mm); + mas_for_each(mas, vma, ULONG_MAX) { + long nrpages = vma_pages(vma); + + if (vma->vm_flags & VM_ACCOUNT) + nr_accounted += nrpages; + vm_stat_account(mm, vma->vm_flags, -nrpages); + remove_vma(vma, false); + } + vm_unacct_memory(nr_accounted); +} + +/* + * init_vma_prep() - Initializer wrapper for vma_prepare struct + * @vp: The vma_prepare struct + * @vma: The vma that will be altered once locked + */ +void init_vma_prep(struct vma_prepare *vp, + struct vm_area_struct *vma) +{ + init_multi_vma_prep(vp, vma, NULL, NULL, NULL); +} + +/* + * Requires inode->i_mapping->i_mmap_rwsem + */ +static void __remove_shared_vm_struct(struct vm_area_struct *vma, + struct address_space *mapping) +{ + if (vma_is_shared_maywrite(vma)) + mapping_unmap_writable(mapping); + + flush_dcache_mmap_lock(mapping); + vma_interval_tree_remove(vma, &mapping->i_mmap); + flush_dcache_mmap_unlock(mapping); +} + +/* + * vma has some anon_vma assigned, and is already inserted on that + * anon_vma's interval trees. + * + * Before updating the vma's vm_start / vm_end / vm_pgoff fields, the + * vma must be removed from the anon_vma's interval trees using + * anon_vma_interval_tree_pre_update_vma(). + * + * After the update, the vma will be reinserted using + * anon_vma_interval_tree_post_update_vma(). + * + * The entire update must be protected by exclusive mmap_lock and by + * the root anon_vma's mutex. + */ +void +anon_vma_interval_tree_pre_update_vma(struct vm_area_struct *vma) +{ + struct anon_vma_chain *avc; + + list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) + anon_vma_interval_tree_remove(avc, &avc->anon_vma->rb_root); +} + +void +anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma) +{ + struct anon_vma_chain *avc; + + list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) + anon_vma_interval_tree_insert(avc, &avc->anon_vma->rb_root); +} + +static void __vma_link_file(struct vm_area_struct *vma, + struct address_space *mapping) +{ + if (vma_is_shared_maywrite(vma)) + mapping_allow_writable(mapping); + + flush_dcache_mmap_lock(mapping); + vma_interval_tree_insert(vma, &mapping->i_mmap); + flush_dcache_mmap_unlock(mapping); +} + +/* + * vma_prepare() - Helper function for handling locking VMAs prior to altering + * @vp: The initialized vma_prepare struct + */ +void vma_prepare(struct vma_prepare *vp) +{ + if (vp->file) { + uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end); + + if (vp->adj_next) + uprobe_munmap(vp->adj_next, vp->adj_next->vm_start, + vp->adj_next->vm_end); + + i_mmap_lock_write(vp->mapping); + if (vp->insert && vp->insert->vm_file) { + /* + * Put into interval tree now, so instantiated pages + * are visible to arm/parisc __flush_dcache_page + * throughout; but we cannot insert into address + * space until vma start or end is updated. + */ + __vma_link_file(vp->insert, + vp->insert->vm_file->f_mapping); + } + } + + if (vp->anon_vma) { + anon_vma_lock_write(vp->anon_vma); + anon_vma_interval_tree_pre_update_vma(vp->vma); + if (vp->adj_next) + anon_vma_interval_tree_pre_update_vma(vp->adj_next); + } + + if (vp->file) { + flush_dcache_mmap_lock(vp->mapping); + vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap); + if (vp->adj_next) + vma_interval_tree_remove(vp->adj_next, + &vp->mapping->i_mmap); + } + +} + +/* + * dup_anon_vma() - Helper function to duplicate anon_vma + * @dst: The destination VMA + * @src: The source VMA + * @dup: Pointer to the destination VMA when successful. + * + * Returns: 0 on success. + */ +static int dup_anon_vma(struct vm_area_struct *dst, + struct vm_area_struct *src, struct vm_area_struct **dup) +{ + /* + * Easily overlooked: when mprotect shifts the boundary, make sure the + * expanding vma has anon_vma set if the shrinking vma had, to cover any + * anon pages imported. + */ + if (src->anon_vma && !dst->anon_vma) { + int ret; + + vma_assert_write_locked(dst); + dst->anon_vma = src->anon_vma; + ret = anon_vma_clone(dst, src); + if (ret) + return ret; + + *dup = dst; + } + + return 0; +} + +#ifdef CONFIG_DEBUG_VM_MAPLE_TREE +void validate_mm(struct mm_struct *mm) +{ + int bug = 0; + int i = 0; + struct vm_area_struct *vma; + VMA_ITERATOR(vmi, mm, 0); + + mt_validate(&mm->mm_mt); + for_each_vma(vmi, vma) { +#ifdef CONFIG_DEBUG_VM_RB + struct anon_vma *anon_vma = vma->anon_vma; + struct anon_vma_chain *avc; +#endif + unsigned long vmi_start, vmi_end; + bool warn = 0; + + vmi_start = vma_iter_addr(&vmi); + vmi_end = vma_iter_end(&vmi); + if (VM_WARN_ON_ONCE_MM(vma->vm_end != vmi_end, mm)) + warn = 1; + + if (VM_WARN_ON_ONCE_MM(vma->vm_start != vmi_start, mm)) + warn = 1; + + if (warn) { + pr_emerg("issue in %s\n", current->comm); + dump_stack(); + dump_vma(vma); + pr_emerg("tree range: %px start %lx end %lx\n", vma, + vmi_start, vmi_end - 1); + vma_iter_dump_tree(&vmi); + } + +#ifdef CONFIG_DEBUG_VM_RB + if (anon_vma) { + anon_vma_lock_read(anon_vma); + list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) + anon_vma_interval_tree_verify(avc); + anon_vma_unlock_read(anon_vma); + } +#endif + i++; + } + if (i != mm->map_count) { + pr_emerg("map_count %d vma iterator %d\n", mm->map_count, i); + bug = 1; + } + VM_BUG_ON_MM(bug, mm); +} +#endif /* CONFIG_DEBUG_VM_MAPLE_TREE */ + +/* + * vma_expand - Expand an existing VMA + * + * @vmi: The vma iterator + * @vma: The vma to expand + * @start: The start of the vma + * @end: The exclusive end of the vma + * @pgoff: The page offset of vma + * @next: The current of next vma. + * + * Expand @vma to @start and @end. Can expand off the start and end. Will + * expand over @next if it's different from @vma and @end == @next->vm_end. + * Checking if the @vma can expand and merge with @next needs to be handled by + * the caller. + * + * Returns: 0 on success + */ +int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma, + unsigned long start, unsigned long end, pgoff_t pgoff, + struct vm_area_struct *next) +{ + struct vm_area_struct *anon_dup = NULL; + bool remove_next = false; + struct vma_prepare vp; + + vma_start_write(vma); + if (next && (vma != next) && (end == next->vm_end)) { + int ret; + + remove_next = true; + vma_start_write(next); + ret = dup_anon_vma(vma, next, &anon_dup); + if (ret) + return ret; + } + + init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL); + /* Not merging but overwriting any part of next is not handled. */ + VM_WARN_ON(next && !vp.remove && + next != vma && end > next->vm_start); + /* Only handles expanding */ + VM_WARN_ON(vma->vm_start < start || vma->vm_end > end); + + /* Note: vma iterator must be pointing to 'start' */ + vma_iter_config(vmi, start, end); + if (vma_iter_prealloc(vmi, vma)) + goto nomem; + + vma_prepare(&vp); + vma_adjust_trans_huge(vma, start, end, 0); + vma_set_range(vma, start, end, pgoff); + vma_iter_store(vmi, vma); + + vma_complete(&vp, vmi, vma->vm_mm); + return 0; + +nomem: + if (anon_dup) + unlink_anon_vmas(anon_dup); + return -ENOMEM; +} + +/* + * vma_shrink() - Reduce an existing VMAs memory area + * @vmi: The vma iterator + * @vma: The VMA to modify + * @start: The new start + * @end: The new end + * + * Returns: 0 on success, -ENOMEM otherwise + */ +int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma, + unsigned long start, unsigned long end, pgoff_t pgoff) +{ + struct vma_prepare vp; + + WARN_ON((vma->vm_start != start) && (vma->vm_end != end)); + + if (vma->vm_start < start) + vma_iter_config(vmi, vma->vm_start, start); + else + vma_iter_config(vmi, end, vma->vm_end); + + if (vma_iter_prealloc(vmi, NULL)) + return -ENOMEM; + + vma_start_write(vma); + + init_vma_prep(&vp, vma); + vma_prepare(&vp); + vma_adjust_trans_huge(vma, start, end, 0); + + vma_iter_clear(vmi); + vma_set_range(vma, start, end, pgoff); + vma_complete(&vp, vmi, vma->vm_mm); + return 0; +} + +/* + * vma_complete- Helper function for handling the unlocking after altering VMAs, + * or for inserting a VMA. + * + * @vp: The vma_prepare struct + * @vmi: The vma iterator + * @mm: The mm_struct + */ +void vma_complete(struct vma_prepare *vp, + struct vma_iterator *vmi, struct mm_struct *mm) +{ + if (vp->file) { + if (vp->adj_next) + vma_interval_tree_insert(vp->adj_next, + &vp->mapping->i_mmap); + vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap); + flush_dcache_mmap_unlock(vp->mapping); + } + + if (vp->remove && vp->file) { + __remove_shared_vm_struct(vp->remove, vp->mapping); + if (vp->remove2) + __remove_shared_vm_struct(vp->remove2, vp->mapping); + } else if (vp->insert) { + /* + * split_vma has split insert from vma, and needs + * us to insert it before dropping the locks + * (it may either follow vma or precede it). + */ + vma_iter_store(vmi, vp->insert); + mm->map_count++; + } + + if (vp->anon_vma) { + anon_vma_interval_tree_post_update_vma(vp->vma); + if (vp->adj_next) + anon_vma_interval_tree_post_update_vma(vp->adj_next); + anon_vma_unlock_write(vp->anon_vma); + } + + if (vp->file) { + i_mmap_unlock_write(vp->mapping); + uprobe_mmap(vp->vma); + + if (vp->adj_next) + uprobe_mmap(vp->adj_next); + } + + if (vp->remove) { +again: + vma_mark_detached(vp->remove, true); + if (vp->file) { + uprobe_munmap(vp->remove, vp->remove->vm_start, + vp->remove->vm_end); + fput(vp->file); + } + if (vp->remove->anon_vma) + anon_vma_merge(vp->vma, vp->remove); + mm->map_count--; + mpol_put(vma_policy(vp->remove)); + if (!vp->remove2) + WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end); + vm_area_free(vp->remove); + + /* + * In mprotect's case 6 (see comments on vma_merge), + * we are removing both mid and next vmas + */ + if (vp->remove2) { + vp->remove = vp->remove2; + vp->remove2 = NULL; + goto again; + } + } + if (vp->insert && vp->file) + uprobe_mmap(vp->insert); + validate_mm(mm); +} + +/* + * do_vmi_align_munmap() - munmap the aligned region from @start to @end. + * @vmi: The vma iterator + * @vma: The starting vm_area_struct + * @mm: The mm_struct + * @start: The aligned start address to munmap. + * @end: The aligned end address to munmap. + * @uf: The userfaultfd list_head + * @unlock: Set to true to drop the mmap_lock. unlocking only happens on + * success. + * + * Return: 0 on success and drops the lock if so directed, error and leaves the + * lock held otherwise. + */ +int +do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, + struct mm_struct *mm, unsigned long start, + unsigned long end, struct list_head *uf, bool unlock) +{ + struct vm_area_struct *prev, *next = NULL; + struct maple_tree mt_detach; + int count = 0; + int error = -ENOMEM; + unsigned long locked_vm = 0; + MA_STATE(mas_detach, &mt_detach, 0, 0); + mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK); + mt_on_stack(mt_detach); + + /* + * If we need to split any vma, do it now to save pain later. + * + * Note: mremap's move_vma VM_ACCOUNT handling assumes a partially + * unmapped vm_area_struct will remain in use: so lower split_vma + * places tmp vma above, and higher split_vma places tmp vma below. + */ + + /* Does it split the first one? */ + if (start > vma->vm_start) { + + /* + * Make sure that map_count on return from munmap() will + * not exceed its limit; but let map_count go just above + * its limit temporarily, to help free resources as expected. + */ + if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count) + goto map_count_exceeded; + + error = __split_vma(vmi, vma, start, 1); + if (error) + goto start_split_failed; + } + + /* + * Detach a range of VMAs from the mm. Using next as a temp variable as + * it is always overwritten. + */ + next = vma; + do { + /* Does it split the end? */ + if (next->vm_end > end) { + error = __split_vma(vmi, next, end, 0); + if (error) + goto end_split_failed; + } + vma_start_write(next); + mas_set(&mas_detach, count); + error = mas_store_gfp(&mas_detach, next, GFP_KERNEL); + if (error) + goto munmap_gather_failed; + vma_mark_detached(next, true); + if (next->vm_flags & VM_LOCKED) + locked_vm += vma_pages(next); + + count++; + if (unlikely(uf)) { + /* + * If userfaultfd_unmap_prep returns an error the vmas + * will remain split, but userland will get a + * highly unexpected error anyway. This is no + * different than the case where the first of the two + * __split_vma fails, but we don't undo the first + * split, despite we could. This is unlikely enough + * failure that it's not worth optimizing it for. + */ + error = userfaultfd_unmap_prep(next, start, end, uf); + + if (error) + goto userfaultfd_error; + } +#ifdef CONFIG_DEBUG_VM_MAPLE_TREE + BUG_ON(next->vm_start < start); + BUG_ON(next->vm_start > end); +#endif + } for_each_vma_range(*vmi, next, end); + +#if defined(CONFIG_DEBUG_VM_MAPLE_TREE) + /* Make sure no VMAs are about to be lost. */ + { + MA_STATE(test, &mt_detach, 0, 0); + struct vm_area_struct *vma_mas, *vma_test; + int test_count = 0; + + vma_iter_set(vmi, start); + rcu_read_lock(); + vma_test = mas_find(&test, count - 1); + for_each_vma_range(*vmi, vma_mas, end) { + BUG_ON(vma_mas != vma_test); + test_count++; + vma_test = mas_next(&test, count - 1); + } + rcu_read_unlock(); + BUG_ON(count != test_count); + } +#endif + + while (vma_iter_addr(vmi) > start) + vma_iter_prev_range(vmi); + + error = vma_iter_clear_gfp(vmi, start, end, GFP_KERNEL); + if (error) + goto clear_tree_failed; + + /* Point of no return */ + mm->locked_vm -= locked_vm; + mm->map_count -= count; + if (unlock) + mmap_write_downgrade(mm); + + prev = vma_iter_prev_range(vmi); + next = vma_next(vmi); + if (next) + vma_iter_prev_range(vmi); + + /* + * We can free page tables without write-locking mmap_lock because VMAs + * were isolated before we downgraded mmap_lock. + */ + mas_set(&mas_detach, 1); + unmap_region(mm, &mas_detach, vma, prev, next, start, end, count, + !unlock); + /* Statistics and freeing VMAs */ + mas_set(&mas_detach, 0); + remove_mt(mm, &mas_detach); + validate_mm(mm); + if (unlock) + mmap_read_unlock(mm); + + __mt_destroy(&mt_detach); + return 0; + +clear_tree_failed: +userfaultfd_error: +munmap_gather_failed: +end_split_failed: + mas_set(&mas_detach, 0); + mas_for_each(&mas_detach, next, end) + vma_mark_detached(next, false); + + __mt_destroy(&mt_detach); +start_split_failed: +map_count_exceeded: + validate_mm(mm); + return error; +} + +/* + * do_vmi_munmap() - munmap a given range. + * @vmi: The vma iterator + * @mm: The mm_struct + * @start: The start address to munmap + * @len: The length of the range to munmap + * @uf: The userfaultfd list_head + * @unlock: set to true if the user wants to drop the mmap_lock on success + * + * This function takes a @mas that is either pointing to the previous VMA or set + * to MA_START and sets it up to remove the mapping(s). The @len will be + * aligned and any arch_unmap work will be preformed. + * + * Return: 0 on success and drops the lock if so directed, error and leaves the + * lock held otherwise. + */ +int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, + unsigned long start, size_t len, struct list_head *uf, + bool unlock) +{ + unsigned long end; + struct vm_area_struct *vma; + + if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) + return -EINVAL; + + end = start + PAGE_ALIGN(len); + if (end == start) + return -EINVAL; + + /* + * Check if memory is sealed before arch_unmap. + * Prevent unmapping a sealed VMA. + * can_modify_mm assumes we have acquired the lock on MM. + */ + if (unlikely(!can_modify_mm(mm, start, end))) + return -EPERM; + + /* arch_unmap() might do unmaps itself. */ + arch_unmap(mm, start, end); + + /* Find the first overlapping VMA */ + vma = vma_find(vmi, end); + if (!vma) { + if (unlock) + mmap_write_unlock(mm); + return 0; + } + + return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock); +} + +/* + * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name), + * figure out whether that can be merged with its predecessor or its + * successor. Or both (it neatly fills a hole). + * + * In most cases - when called for mmap, brk or mremap - [addr,end) is + * certain not to be mapped by the time vma_merge is called; but when + * called for mprotect, it is certain to be already mapped (either at + * an offset within prev, or at the start of next), and the flags of + * this area are about to be changed to vm_flags - and the no-change + * case has already been eliminated. + * + * The following mprotect cases have to be considered, where **** is + * the area passed down from mprotect_fixup, never extending beyond one + * vma, PPPP is the previous vma, CCCC is a concurrent vma that starts + * at the same address as **** and is of the same or larger span, and + * NNNN the next vma after ****: + * + * **** **** **** + * PPPPPPNNNNNN PPPPPPNNNNNN PPPPPPCCCCCC + * cannot merge might become might become + * PPNNNNNNNNNN PPPPPPPPPPCC + * mmap, brk or case 4 below case 5 below + * mremap move: + * **** **** + * PPPP NNNN PPPPCCCCNNNN + * might become might become + * PPPPPPPPPPPP 1 or PPPPPPPPPPPP 6 or + * PPPPPPPPNNNN 2 or PPPPPPPPNNNN 7 or + * PPPPNNNNNNNN 3 PPPPNNNNNNNN 8 + * + * It is important for case 8 that the vma CCCC overlapping the + * region **** is never going to extended over NNNN. Instead NNNN must + * be extended in region **** and CCCC must be removed. This way in + * all cases where vma_merge succeeds, the moment vma_merge drops the + * rmap_locks, the properties of the merged vma will be already + * correct for the whole merged range. Some of those properties like + * vm_page_prot/vm_flags may be accessed by rmap_walks and they must + * be correct for the whole merged range immediately after the + * rmap_locks are released. Otherwise if NNNN would be removed and + * CCCC would be extended over the NNNN range, remove_migration_ptes + * or other rmap walkers (if working on addresses beyond the "end" + * parameter) may establish ptes with the wrong permissions of CCCC + * instead of the right permissions of NNNN. + * + * In the code below: + * PPPP is represented by *prev + * CCCC is represented by *curr or not represented at all (NULL) + * NNNN is represented by *next or not represented at all (NULL) + * **** is not represented - it will be merged and the vma containing the + * area is returned, or the function will return NULL + */ +static struct vm_area_struct +*vma_merge(struct vma_iterator *vmi, struct vm_area_struct *prev, + struct vm_area_struct *src, unsigned long addr, unsigned long end, + unsigned long vm_flags, pgoff_t pgoff, struct mempolicy *policy, + struct vm_userfaultfd_ctx vm_userfaultfd_ctx, + struct anon_vma_name *anon_name) +{ + struct mm_struct *mm = src->vm_mm; + struct anon_vma *anon_vma = src->anon_vma; + struct file *file = src->vm_file; + struct vm_area_struct *curr, *next, *res; + struct vm_area_struct *vma, *adjust, *remove, *remove2; + struct vm_area_struct *anon_dup = NULL; + struct vma_prepare vp; + pgoff_t vma_pgoff; + int err = 0; + bool merge_prev = false; + bool merge_next = false; + bool vma_expanded = false; + unsigned long vma_start = addr; + unsigned long vma_end = end; + pgoff_t pglen = (end - addr) >> PAGE_SHIFT; + long adj_start = 0; + + /* + * We later require that vma->vm_flags == vm_flags, + * so this tests vma->vm_flags & VM_SPECIAL, too. + */ + if (vm_flags & VM_SPECIAL) + return NULL; + + /* Does the input range span an existing VMA? (cases 5 - 8) */ + curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end); + + if (!curr || /* cases 1 - 4 */ + end == curr->vm_end) /* cases 6 - 8, adjacent VMA */ + next = vma_lookup(mm, end); + else + next = NULL; /* case 5 */ + + if (prev) { + vma_start = prev->vm_start; + vma_pgoff = prev->vm_pgoff; + + /* Can we merge the predecessor? */ + if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy) + && can_vma_merge_after(prev, vm_flags, anon_vma, file, + pgoff, vm_userfaultfd_ctx, anon_name)) { + merge_prev = true; + vma_prev(vmi); + } + } + + /* Can we merge the successor? */ + if (next && mpol_equal(policy, vma_policy(next)) && + can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen, + vm_userfaultfd_ctx, anon_name)) { + merge_next = true; + } + + /* Verify some invariant that must be enforced by the caller. */ + VM_WARN_ON(prev && addr <= prev->vm_start); + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); + VM_WARN_ON(addr >= end); + + if (!merge_prev && !merge_next) + return NULL; /* Not mergeable. */ + + if (merge_prev) + vma_start_write(prev); + + res = vma = prev; + remove = remove2 = adjust = NULL; + + /* Can we merge both the predecessor and the successor? */ + if (merge_prev && merge_next && + is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) { + vma_start_write(next); + remove = next; /* case 1 */ + vma_end = next->vm_end; + err = dup_anon_vma(prev, next, &anon_dup); + if (curr) { /* case 6 */ + vma_start_write(curr); + remove = curr; + remove2 = next; + /* + * Note that the dup_anon_vma below cannot overwrite err + * since the first caller would do nothing unless next + * has an anon_vma. + */ + if (!next->anon_vma) + err = dup_anon_vma(prev, curr, &anon_dup); + } + } else if (merge_prev) { /* case 2 */ + if (curr) { + vma_start_write(curr); + if (end == curr->vm_end) { /* case 7 */ + /* + * can_vma_merge_after() assumed we would not be + * removing prev vma, so it skipped the check + * for vm_ops->close, but we are removing curr + */ + if (curr->vm_ops && curr->vm_ops->close) + err = -EINVAL; + remove = curr; + } else { /* case 5 */ + adjust = curr; + adj_start = (end - curr->vm_start); + } + if (!err) + err = dup_anon_vma(prev, curr, &anon_dup); + } + } else { /* merge_next */ + vma_start_write(next); + res = next; + if (prev && addr < prev->vm_end) { /* case 4 */ + vma_start_write(prev); + vma_end = addr; + adjust = next; + adj_start = -(prev->vm_end - addr); + err = dup_anon_vma(next, prev, &anon_dup); + } else { + /* + * Note that cases 3 and 8 are the ONLY ones where prev + * is permitted to be (but is not necessarily) NULL. + */ + vma = next; /* case 3 */ + vma_start = addr; + vma_end = next->vm_end; + vma_pgoff = next->vm_pgoff - pglen; + if (curr) { /* case 8 */ + vma_pgoff = curr->vm_pgoff; + vma_start_write(curr); + remove = curr; + err = dup_anon_vma(next, curr, &anon_dup); + } + } + } + + /* Error in anon_vma clone. */ + if (err) + goto anon_vma_fail; + + if (vma_start < vma->vm_start || vma_end > vma->vm_end) + vma_expanded = true; + + if (vma_expanded) { + vma_iter_config(vmi, vma_start, vma_end); + } else { + vma_iter_config(vmi, adjust->vm_start + adj_start, + adjust->vm_end); + } + + if (vma_iter_prealloc(vmi, vma)) + goto prealloc_fail; + + init_multi_vma_prep(&vp, vma, adjust, remove, remove2); + VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma && + vp.anon_vma != adjust->anon_vma); + + vma_prepare(&vp); + vma_adjust_trans_huge(vma, vma_start, vma_end, adj_start); + vma_set_range(vma, vma_start, vma_end, vma_pgoff); + + if (vma_expanded) + vma_iter_store(vmi, vma); + + if (adj_start) { + adjust->vm_start += adj_start; + adjust->vm_pgoff += adj_start >> PAGE_SHIFT; + if (adj_start < 0) { + WARN_ON(vma_expanded); + vma_iter_store(vmi, next); + } + } + + vma_complete(&vp, vmi, mm); + khugepaged_enter_vma(res, vm_flags); + return res; + +prealloc_fail: + if (anon_dup) + unlink_anon_vmas(anon_dup); + +anon_vma_fail: + vma_iter_set(vmi, addr); + vma_iter_load(vmi); + return NULL; +} + +/* + * We are about to modify one or multiple of a VMA's flags, policy, userfaultfd + * context and anonymous VMA name within the range [start, end). + * + * As a result, we might be able to merge the newly modified VMA range with an + * adjacent VMA with identical properties. + * + * If no merge is possible and the range does not span the entirety of the VMA, + * we then need to split the VMA to accommodate the change. + * + * The function returns either the merged VMA, the original VMA if a split was + * required instead, or an error if the split failed. + */ +struct vm_area_struct *vma_modify(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, unsigned long end, + unsigned long vm_flags, + struct mempolicy *policy, + struct vm_userfaultfd_ctx uffd_ctx, + struct anon_vma_name *anon_name) +{ + pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); + struct vm_area_struct *merged; + + merged = vma_merge(vmi, prev, vma, start, end, vm_flags, + pgoff, policy, uffd_ctx, anon_name); + if (merged) + return merged; + + if (vma->vm_start < start) { + int err = split_vma(vmi, vma, start, 1); + + if (err) + return ERR_PTR(err); + } + + if (vma->vm_end > end) { + int err = split_vma(vmi, vma, end, 0); + + if (err) + return ERR_PTR(err); + } + + return vma; +} + +/* + * Attempt to merge a newly mapped VMA with those adjacent to it. The caller + * must ensure that [start, end) does not overlap any existing VMA. + */ +struct vm_area_struct +*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev, + struct vm_area_struct *vma, unsigned long start, + unsigned long end, pgoff_t pgoff) +{ + return vma_merge(vmi, prev, vma, start, end, vma->vm_flags, pgoff, + vma_policy(vma), vma->vm_userfaultfd_ctx, anon_vma_name(vma)); +} + +/* + * Expand vma by delta bytes, potentially merging with an immediately adjacent + * VMA with identical properties. + */ +struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi, + struct vm_area_struct *vma, + unsigned long delta) +{ + pgoff_t pgoff = vma->vm_pgoff + vma_pages(vma); + + /* vma is specified as prev, so case 1 or 2 will apply. */ + return vma_merge(vmi, vma, vma, vma->vm_end, vma->vm_end + delta, + vma->vm_flags, pgoff, vma_policy(vma), + vma->vm_userfaultfd_ctx, anon_vma_name(vma)); +} + +void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb) +{ + vb->count = 0; +} + +static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb) +{ + struct address_space *mapping; + int i; + + mapping = vb->vmas[0]->vm_file->f_mapping; + i_mmap_lock_write(mapping); + for (i = 0; i < vb->count; i++) { + VM_WARN_ON_ONCE(vb->vmas[i]->vm_file->f_mapping != mapping); + __remove_shared_vm_struct(vb->vmas[i], mapping); + } + i_mmap_unlock_write(mapping); + + unlink_file_vma_batch_init(vb); +} + +void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb, + struct vm_area_struct *vma) +{ + if (vma->vm_file == NULL) + return; + + if ((vb->count > 0 && vb->vmas[0]->vm_file != vma->vm_file) || + vb->count == ARRAY_SIZE(vb->vmas)) + unlink_file_vma_batch_process(vb); + + vb->vmas[vb->count] = vma; + vb->count++; +} + +void unlink_file_vma_batch_final(struct unlink_vma_file_batch *vb) +{ + if (vb->count > 0) + unlink_file_vma_batch_process(vb); +} + +/* + * Unlink a file-based vm structure from its interval tree, to hide + * vma from rmap and vmtruncate before freeing its page tables. + */ +void unlink_file_vma(struct vm_area_struct *vma) +{ + struct file *file = vma->vm_file; + + if (file) { + struct address_space *mapping = file->f_mapping; + + i_mmap_lock_write(mapping); + __remove_shared_vm_struct(vma, mapping); + i_mmap_unlock_write(mapping); + } +} + +void vma_link_file(struct vm_area_struct *vma) +{ + struct file *file = vma->vm_file; + struct address_space *mapping; + + if (file) { + mapping = file->f_mapping; + i_mmap_lock_write(mapping); + __vma_link_file(vma, mapping); + i_mmap_unlock_write(mapping); + } +} + +int vma_link(struct mm_struct *mm, struct vm_area_struct *vma) +{ + VMA_ITERATOR(vmi, mm, 0); + + vma_iter_config(&vmi, vma->vm_start, vma->vm_end); + if (vma_iter_prealloc(&vmi, vma)) + return -ENOMEM; + + vma_start_write(vma); + vma_iter_store(&vmi, vma); + vma_link_file(vma); + mm->map_count++; + validate_mm(mm); + return 0; +} + +/* + * Copy the vma structure to a new location in the same mm, + * prior to moving page table entries, to effect an mremap move. + */ +struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, + unsigned long addr, unsigned long len, pgoff_t pgoff, + bool *need_rmap_locks) +{ + struct vm_area_struct *vma = *vmap; + unsigned long vma_start = vma->vm_start; + struct mm_struct *mm = vma->vm_mm; + struct vm_area_struct *new_vma, *prev; + bool faulted_in_anon_vma = true; + VMA_ITERATOR(vmi, mm, addr); + + /* + * If anonymous vma has not yet been faulted, update new pgoff + * to match new location, to increase its chance of merging. + */ + if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma)) { + pgoff = addr >> PAGE_SHIFT; + faulted_in_anon_vma = false; + } + + new_vma = find_vma_prev(mm, addr, &prev); + if (new_vma && new_vma->vm_start < addr + len) + return NULL; /* should never get here */ + + new_vma = vma_merge_new_vma(&vmi, prev, vma, addr, addr + len, pgoff); + if (new_vma) { + /* + * Source vma may have been merged into new_vma + */ + if (unlikely(vma_start >= new_vma->vm_start && + vma_start < new_vma->vm_end)) { + /* + * The only way we can get a vma_merge with + * self during an mremap is if the vma hasn't + * been faulted in yet and we were allowed to + * reset the dst vma->vm_pgoff to the + * destination address of the mremap to allow + * the merge to happen. mremap must change the + * vm_pgoff linearity between src and dst vmas + * (in turn preventing a vma_merge) to be + * safe. It is only safe to keep the vm_pgoff + * linear if there are no pages mapped yet. + */ + VM_BUG_ON_VMA(faulted_in_anon_vma, new_vma); + *vmap = vma = new_vma; + } + *need_rmap_locks = (new_vma->vm_pgoff <= vma->vm_pgoff); + } else { + new_vma = vm_area_dup(vma); + if (!new_vma) + goto out; + vma_set_range(new_vma, addr, addr + len, pgoff); + if (vma_dup_policy(vma, new_vma)) + goto out_free_vma; + if (anon_vma_clone(new_vma, vma)) + goto out_free_mempol; + if (new_vma->vm_file) + get_file(new_vma->vm_file); + if (new_vma->vm_ops && new_vma->vm_ops->open) + new_vma->vm_ops->open(new_vma); + if (vma_link(mm, new_vma)) + goto out_vma_link; + *need_rmap_locks = false; + } + return new_vma; + +out_vma_link: + if (new_vma->vm_ops && new_vma->vm_ops->close) + new_vma->vm_ops->close(new_vma); + + if (new_vma->vm_file) + fput(new_vma->vm_file); + + unlink_anon_vmas(new_vma); +out_free_mempol: + mpol_put(vma_policy(new_vma)); +out_free_vma: + vm_area_free(new_vma); +out: + return NULL; +} + +/* + * Rough compatibility check to quickly see if it's even worth looking + * at sharing an anon_vma. + * + * They need to have the same vm_file, and the flags can only differ + * in things that mprotect may change. + * + * NOTE! The fact that we share an anon_vma doesn't _have_ to mean that + * we can merge the two vma's. For example, we refuse to merge a vma if + * there is a vm_ops->close() function, because that indicates that the + * driver is doing some kind of reference counting. But that doesn't + * really matter for the anon_vma sharing case. + */ +static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *b) +{ + return a->vm_end == b->vm_start && + mpol_equal(vma_policy(a), vma_policy(b)) && + a->vm_file == b->vm_file && + !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_SOFTDIRTY)) && + b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT); +} + +/* + * Do some basic sanity checking to see if we can re-use the anon_vma + * from 'old'. The 'a'/'b' vma's are in VM order - one of them will be + * the same as 'old', the other will be the new one that is trying + * to share the anon_vma. + * + * NOTE! This runs with mmap_lock held for reading, so it is possible that + * the anon_vma of 'old' is concurrently in the process of being set up + * by another page fault trying to merge _that_. But that's ok: if it + * is being set up, that automatically means that it will be a singleton + * acceptable for merging, so we can do all of this optimistically. But + * we do that READ_ONCE() to make sure that we never re-load the pointer. + * + * IOW: that the "list_is_singular()" test on the anon_vma_chain only + * matters for the 'stable anon_vma' case (ie the thing we want to avoid + * is to return an anon_vma that is "complex" due to having gone through + * a fork). + * + * We also make sure that the two vma's are compatible (adjacent, + * and with the same memory policies). That's all stable, even with just + * a read lock on the mmap_lock. + */ +static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old, + struct vm_area_struct *a, + struct vm_area_struct *b) +{ + if (anon_vma_compatible(a, b)) { + struct anon_vma *anon_vma = READ_ONCE(old->anon_vma); + + if (anon_vma && list_is_singular(&old->anon_vma_chain)) + return anon_vma; + } + return NULL; +} + +/* + * find_mergeable_anon_vma is used by anon_vma_prepare, to check + * neighbouring vmas for a suitable anon_vma, before it goes off + * to allocate a new anon_vma. It checks because a repetitive + * sequence of mprotects and faults may otherwise lead to distinct + * anon_vmas being allocated, preventing vma merge in subsequent + * mprotect. + */ +struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma) +{ + struct anon_vma *anon_vma = NULL; + struct vm_area_struct *prev, *next; + VMA_ITERATOR(vmi, vma->vm_mm, vma->vm_end); + + /* Try next first. */ + next = vma_iter_load(&vmi); + if (next) { + anon_vma = reusable_anon_vma(next, vma, next); + if (anon_vma) + return anon_vma; + } + + prev = vma_prev(&vmi); + VM_BUG_ON_VMA(prev != vma, vma); + prev = vma_prev(&vmi); + /* Try prev next. */ + if (prev) + anon_vma = reusable_anon_vma(prev, prev, vma); + + /* + * We might reach here with anon_vma == NULL if we can't find + * any reusable anon_vma. + * There's no absolute need to look only at touching neighbours: + * we could search further afield for "compatible" anon_vmas. + * But it would probably just be a waste of time searching, + * or lead to too many vmas hanging off the same anon_vma. + * We're trying to allow mprotect remerging later on, + * not trying to minimize memory used for anon_vmas. + */ + return anon_vma; +} + +static bool vm_ops_needs_writenotify(const struct vm_operations_struct *vm_ops) +{ + return vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite); +} + +static bool vma_is_shared_writable(struct vm_area_struct *vma) +{ + return (vma->vm_flags & (VM_WRITE | VM_SHARED)) == + (VM_WRITE | VM_SHARED); +} + +static bool vma_fs_can_writeback(struct vm_area_struct *vma) +{ + /* No managed pages to writeback. */ + if (vma->vm_flags & VM_PFNMAP) + return false; + + return vma->vm_file && vma->vm_file->f_mapping && + mapping_can_writeback(vma->vm_file->f_mapping); +} + +/* + * Does this VMA require the underlying folios to have their dirty state + * tracked? + */ +bool vma_needs_dirty_tracking(struct vm_area_struct *vma) +{ + /* Only shared, writable VMAs require dirty tracking. */ + if (!vma_is_shared_writable(vma)) + return false; + + /* Does the filesystem need to be notified? */ + if (vm_ops_needs_writenotify(vma->vm_ops)) + return true; + + /* + * Even if the filesystem doesn't indicate a need for writenotify, if it + * can writeback, dirty tracking is still required. + */ + return vma_fs_can_writeback(vma); +} + +/* + * Some shared mappings will want the pages marked read-only + * to track write events. If so, we'll downgrade vm_page_prot + * to the private version (using protection_map[] without the + * VM_SHARED bit). + */ +bool vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) +{ + /* If it was private or non-writable, the write bit is already clear */ + if (!vma_is_shared_writable(vma)) + return false; + + /* The backer wishes to know when pages are first written to? */ + if (vm_ops_needs_writenotify(vma->vm_ops)) + return true; + + /* The open routine did something to the protections that pgprot_modify + * won't preserve? */ + if (pgprot_val(vm_page_prot) != + pgprot_val(vm_pgprot_modify(vm_page_prot, vma->vm_flags))) + return false; + + /* + * Do we need to track softdirty? hugetlb does not support softdirty + * tracking yet. + */ + if (vma_soft_dirty_enabled(vma) && !is_vm_hugetlb_page(vma)) + return true; + + /* Do we need write faults for uffd-wp tracking? */ + if (userfaultfd_wp(vma)) + return true; + + /* Can the mapping track the dirty pages? */ + return vma_fs_can_writeback(vma); +} + +unsigned long count_vma_pages_range(struct mm_struct *mm, + unsigned long addr, unsigned long end) +{ + VMA_ITERATOR(vmi, mm, addr); + struct vm_area_struct *vma; + unsigned long nr_pages = 0; + + for_each_vma_range(vmi, vma, end) { + unsigned long vm_start = max(addr, vma->vm_start); + unsigned long vm_end = min(end, vma->vm_end); + + nr_pages += PHYS_PFN(vm_end - vm_start); + } + + return nr_pages; +} + +static DEFINE_MUTEX(mm_all_locks_mutex); + +static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma) +{ + if (!test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_root.rb_node)) { + /* + * The LSB of head.next can't change from under us + * because we hold the mm_all_locks_mutex. + */ + down_write_nest_lock(&anon_vma->root->rwsem, &mm->mmap_lock); + /* + * We can safely modify head.next after taking the + * anon_vma->root->rwsem. If some other vma in this mm shares + * the same anon_vma we won't take it again. + * + * No need of atomic instructions here, head.next + * can't change from under us thanks to the + * anon_vma->root->rwsem. + */ + if (__test_and_set_bit(0, (unsigned long *) + &anon_vma->root->rb_root.rb_root.rb_node)) + BUG(); + } +} + +static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping) +{ + if (!test_bit(AS_MM_ALL_LOCKS, &mapping->flags)) { + /* + * AS_MM_ALL_LOCKS can't change from under us because + * we hold the mm_all_locks_mutex. + * + * Operations on ->flags have to be atomic because + * even if AS_MM_ALL_LOCKS is stable thanks to the + * mm_all_locks_mutex, there may be other cpus + * changing other bitflags in parallel to us. + */ + if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags)) + BUG(); + down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_lock); + } +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. + * + * The caller must take the mmap_lock in write mode before calling + * mm_take_all_locks(). The caller isn't allowed to release the + * mmap_lock until mm_drop_all_locks() returns. + * + * mmap_lock in write mode is required in order to block all operations + * that could modify pagetables and free pages without need of + * altering the vma layout. It's also needed in write mode to avoid new + * anon_vmas to be associated with existing vmas. + * + * A single task can't take more than one mm_take_all_locks() in a row + * or it would deadlock. + * + * The LSB in anon_vma->rb_root.rb_node and the AS_MM_ALL_LOCKS bitflag in + * mapping->flags avoid to take the same lock twice, if more than one + * vma in this mm is backed by the same anon_vma or address_space. + * + * We take locks in following order, accordingly to comment at beginning + * of mm/rmap.c: + * - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for + * hugetlb mapping); + * - all vmas marked locked + * - all i_mmap_rwsem locks; + * - all anon_vma->rwseml + * + * We can take all locks within these types randomly because the VM code + * doesn't nest them and we protected from parallel mm_take_all_locks() by + * mm_all_locks_mutex. + * + * mm_take_all_locks() and mm_drop_all_locks are expensive operations + * that may have to take thousand of locks. + * + * mm_take_all_locks() can fail if it's interrupted by signals. + */ +int mm_take_all_locks(struct mm_struct *mm) +{ + struct vm_area_struct *vma; + struct anon_vma_chain *avc; + VMA_ITERATOR(vmi, mm, 0); + + mmap_assert_write_locked(mm); + + mutex_lock(&mm_all_locks_mutex); + + /* + * vma_start_write() does not have a complement in mm_drop_all_locks() + * because vma_start_write() is always asymmetrical; it marks a VMA as + * being written to until mmap_write_unlock() or mmap_write_downgrade() + * is reached. + */ + for_each_vma(vmi, vma) { + if (signal_pending(current)) + goto out_unlock; + vma_start_write(vma); + } + + vma_iter_init(&vmi, mm, 0); + for_each_vma(vmi, vma) { + if (signal_pending(current)) + goto out_unlock; + if (vma->vm_file && vma->vm_file->f_mapping && + is_vm_hugetlb_page(vma)) + vm_lock_mapping(mm, vma->vm_file->f_mapping); + } + + vma_iter_init(&vmi, mm, 0); + for_each_vma(vmi, vma) { + if (signal_pending(current)) + goto out_unlock; + if (vma->vm_file && vma->vm_file->f_mapping && + !is_vm_hugetlb_page(vma)) + vm_lock_mapping(mm, vma->vm_file->f_mapping); + } + + vma_iter_init(&vmi, mm, 0); + for_each_vma(vmi, vma) { + if (signal_pending(current)) + goto out_unlock; + if (vma->anon_vma) + list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) + vm_lock_anon_vma(mm, avc->anon_vma); + } + + return 0; + +out_unlock: + mm_drop_all_locks(mm); + return -EINTR; +} + +static void vm_unlock_anon_vma(struct anon_vma *anon_vma) +{ + if (test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_root.rb_node)) { + /* + * The LSB of head.next can't change to 0 from under + * us because we hold the mm_all_locks_mutex. + * + * We must however clear the bitflag before unlocking + * the vma so the users using the anon_vma->rb_root will + * never see our bitflag. + * + * No need of atomic instructions here, head.next + * can't change from under us until we release the + * anon_vma->root->rwsem. + */ + if (!__test_and_clear_bit(0, (unsigned long *) + &anon_vma->root->rb_root.rb_root.rb_node)) + BUG(); + anon_vma_unlock_write(anon_vma); + } +} + +static void vm_unlock_mapping(struct address_space *mapping) +{ + if (test_bit(AS_MM_ALL_LOCKS, &mapping->flags)) { + /* + * AS_MM_ALL_LOCKS can't change to 0 from under us + * because we hold the mm_all_locks_mutex. + */ + i_mmap_unlock_write(mapping); + if (!test_and_clear_bit(AS_MM_ALL_LOCKS, + &mapping->flags)) + BUG(); + } +} + +/* + * The mmap_lock cannot be released by the caller until + * mm_drop_all_locks() returns. + */ +void mm_drop_all_locks(struct mm_struct *mm) +{ + struct vm_area_struct *vma; + struct anon_vma_chain *avc; + VMA_ITERATOR(vmi, mm, 0); + + mmap_assert_write_locked(mm); + BUG_ON(!mutex_is_locked(&mm_all_locks_mutex)); + + for_each_vma(vmi, vma) { + if (vma->anon_vma) + list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) + vm_unlock_anon_vma(avc->anon_vma); + if (vma->vm_file && vma->vm_file->f_mapping) + vm_unlock_mapping(vma->vm_file->f_mapping); + } + + mutex_unlock(&mm_all_locks_mutex); +} diff --git a/mm/vma.h b/mm/vma.h new file mode 100644 index 000000000000..cd184f9233ec --- /dev/null +++ b/mm/vma.h @@ -0,0 +1,362 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * vma.h + * + * Core VMA manipulation API implemented in vma.c. + */ +#ifndef __MM_VMA_H +#define __MM_VMA_H + +/* + * VMA lock generalization + */ +struct vma_prepare { + struct vm_area_struct *vma; + struct vm_area_struct *adj_next; + struct file *file; + struct address_space *mapping; + struct anon_vma *anon_vma; + struct vm_area_struct *insert; + struct vm_area_struct *remove; + struct vm_area_struct *remove2; +}; + +struct unlink_vma_file_batch { + int count; + struct vm_area_struct *vmas[8]; +}; + +#ifdef CONFIG_DEBUG_VM_MAPLE_TREE +void validate_mm(struct mm_struct *mm); +#else +#define validate_mm(mm) do { } while (0) +#endif + +/* Required for expand_downwards(). */ +void anon_vma_interval_tree_pre_update_vma(struct vm_area_struct *vma); + +/* Required for expand_downwards(). */ +void anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma); + +/* Required for do_brk_flags(). */ +void vma_prepare(struct vma_prepare *vp); + +/* Required for do_brk_flags(). */ +void init_vma_prep(struct vma_prepare *vp, + struct vm_area_struct *vma); + +/* Required for do_brk_flags(). */ +void vma_complete(struct vma_prepare *vp, + struct vma_iterator *vmi, struct mm_struct *mm); + +int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma, + unsigned long start, unsigned long end, pgoff_t pgoff, + struct vm_area_struct *next); + +int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma, + unsigned long start, unsigned long end, pgoff_t pgoff); + +int +do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, + struct mm_struct *mm, unsigned long start, + unsigned long end, struct list_head *uf, bool unlock); + +int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, + unsigned long start, size_t len, struct list_head *uf, + bool unlock); + +void remove_vma(struct vm_area_struct *vma, bool unreachable); + +void unmap_region(struct mm_struct *mm, struct ma_state *mas, + struct vm_area_struct *vma, struct vm_area_struct *prev, + struct vm_area_struct *next, unsigned long start, + unsigned long end, unsigned long tree_end, bool mm_wr_locked); + +/* Required by mmap_region(). */ +bool +can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags, + struct anon_vma *anon_vma, struct file *file, + pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, + struct anon_vma_name *anon_name); + +/* Required by mmap_region() and do_brk_flags(). */ +bool +can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags, + struct anon_vma *anon_vma, struct file *file, + pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, + struct anon_vma_name *anon_name); + +struct vm_area_struct *vma_modify(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, unsigned long end, + unsigned long vm_flags, + struct mempolicy *policy, + struct vm_userfaultfd_ctx uffd_ctx, + struct anon_vma_name *anon_name); + +/* We are about to modify the VMA's flags. */ +static inline struct vm_area_struct +*vma_modify_flags(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, unsigned long end, + unsigned long new_flags) +{ + return vma_modify(vmi, prev, vma, start, end, new_flags, + vma_policy(vma), vma->vm_userfaultfd_ctx, + anon_vma_name(vma)); +} + +/* We are about to modify the VMA's flags and/or anon_name. */ +static inline struct vm_area_struct +*vma_modify_flags_name(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, + unsigned long end, + unsigned long new_flags, + struct anon_vma_name *new_name) +{ + return vma_modify(vmi, prev, vma, start, end, new_flags, + vma_policy(vma), vma->vm_userfaultfd_ctx, new_name); +} + +/* We are about to modify the VMA's memory policy. */ +static inline struct vm_area_struct +*vma_modify_policy(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, unsigned long end, + struct mempolicy *new_pol) +{ + return vma_modify(vmi, prev, vma, start, end, vma->vm_flags, + new_pol, vma->vm_userfaultfd_ctx, anon_vma_name(vma)); +} + +/* We are about to modify the VMA's flags and/or uffd context. */ +static inline struct vm_area_struct +*vma_modify_flags_uffd(struct vma_iterator *vmi, + struct vm_area_struct *prev, + struct vm_area_struct *vma, + unsigned long start, unsigned long end, + unsigned long new_flags, + struct vm_userfaultfd_ctx new_ctx) +{ + return vma_modify(vmi, prev, vma, start, end, new_flags, + vma_policy(vma), new_ctx, anon_vma_name(vma)); +} + +struct vm_area_struct +*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev, + struct vm_area_struct *vma, unsigned long start, + unsigned long end, pgoff_t pgoff); + +struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi, + struct vm_area_struct *vma, + unsigned long delta); + +void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb); + +void unlink_file_vma_batch_final(struct unlink_vma_file_batch *vb); + +void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb, + struct vm_area_struct *vma); + +void unlink_file_vma(struct vm_area_struct *vma); + +void vma_link_file(struct vm_area_struct *vma); + +int vma_link(struct mm_struct *mm, struct vm_area_struct *vma); + +struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, + unsigned long addr, unsigned long len, pgoff_t pgoff, + bool *need_rmap_locks); + +struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma); + +bool vma_needs_dirty_tracking(struct vm_area_struct *vma); +bool vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot); + +int mm_take_all_locks(struct mm_struct *mm); +void mm_drop_all_locks(struct mm_struct *mm); +unsigned long count_vma_pages_range(struct mm_struct *mm, + unsigned long addr, unsigned long end); + +static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma) +{ + /* + * We want to check manually if we can change individual PTEs writable + * if we can't do that automatically for all PTEs in a mapping. For + * private mappings, that's always the case when we have write + * permissions as we properly have to handle COW. + */ + if (vma->vm_flags & VM_SHARED) + return vma_wants_writenotify(vma, vma->vm_page_prot); + return !!(vma->vm_flags & VM_WRITE); +} + +static inline pgprot_t vm_pgprot_modify(pgprot_t oldprot, unsigned long vm_flags) +{ + return pgprot_modify(oldprot, vm_get_page_prot(vm_flags)); +} + +static inline struct vm_area_struct *vma_prev_limit(struct vma_iterator *vmi, + unsigned long min) +{ + return mas_prev(&vmi->mas, min); +} + +static inline int vma_iter_store_gfp(struct vma_iterator *vmi, + struct vm_area_struct *vma, gfp_t gfp) +{ + if (vmi->mas.status != ma_start && + ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start))) + vma_iter_invalidate(vmi); + + __mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1); + mas_store_gfp(&vmi->mas, vma, gfp); + if (unlikely(mas_is_err(&vmi->mas))) + return -ENOMEM; + + return 0; +} + + +/* + * These three helpers classifies VMAs for virtual memory accounting. + */ + +/* + * Executable code area - executable, not writable, not stack + */ +static inline bool is_exec_mapping(vm_flags_t flags) +{ + return (flags & (VM_EXEC | VM_WRITE | VM_STACK)) == VM_EXEC; +} + +/* + * Stack area (including shadow stacks) + * + * VM_GROWSUP / VM_GROWSDOWN VMAs are always private anonymous: + * do_mmap() forbids all other combinations. + */ +static inline bool is_stack_mapping(vm_flags_t flags) +{ + return ((flags & VM_STACK) == VM_STACK) || (flags & VM_SHADOW_STACK); +} + +/* + * Data area - private, writable, not stack + */ +static inline bool is_data_mapping(vm_flags_t flags) +{ + return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE; +} + + +static inline void vma_iter_config(struct vma_iterator *vmi, + unsigned long index, unsigned long last) +{ + __mas_set_range(&vmi->mas, index, last - 1); +} + +static inline void vma_iter_reset(struct vma_iterator *vmi) +{ + mas_reset(&vmi->mas); +} + +static inline +struct vm_area_struct *vma_iter_prev_range_limit(struct vma_iterator *vmi, unsigned long min) +{ + return mas_prev_range(&vmi->mas, min); +} + +static inline +struct vm_area_struct *vma_iter_next_range_limit(struct vma_iterator *vmi, unsigned long max) +{ + return mas_next_range(&vmi->mas, max); +} + +static inline int vma_iter_area_lowest(struct vma_iterator *vmi, unsigned long min, + unsigned long max, unsigned long size) +{ + return mas_empty_area(&vmi->mas, min, max - 1, size); +} + +static inline int vma_iter_area_highest(struct vma_iterator *vmi, unsigned long min, + unsigned long max, unsigned long size) +{ + return mas_empty_area_rev(&vmi->mas, min, max - 1, size); +} + +/* + * VMA Iterator functions shared between nommu and mmap + */ +static inline int vma_iter_prealloc(struct vma_iterator *vmi, + struct vm_area_struct *vma) +{ + return mas_preallocate(&vmi->mas, vma, GFP_KERNEL); +} + +static inline void vma_iter_clear(struct vma_iterator *vmi) +{ + mas_store_prealloc(&vmi->mas, NULL); +} + +static inline struct vm_area_struct *vma_iter_load(struct vma_iterator *vmi) +{ + return mas_walk(&vmi->mas); +} + +/* Store a VMA with preallocated memory */ +static inline void vma_iter_store(struct vma_iterator *vmi, + struct vm_area_struct *vma) +{ + +#if defined(CONFIG_DEBUG_VM_MAPLE_TREE) + if (MAS_WARN_ON(&vmi->mas, vmi->mas.status != ma_start && + vmi->mas.index > vma->vm_start)) { + pr_warn("%lx > %lx\n store vma %lx-%lx\n into slot %lx-%lx\n", + vmi->mas.index, vma->vm_start, vma->vm_start, + vma->vm_end, vmi->mas.index, vmi->mas.last); + } + if (MAS_WARN_ON(&vmi->mas, vmi->mas.status != ma_start && + vmi->mas.last < vma->vm_start)) { + pr_warn("%lx < %lx\nstore vma %lx-%lx\ninto slot %lx-%lx\n", + vmi->mas.last, vma->vm_start, vma->vm_start, vma->vm_end, + vmi->mas.index, vmi->mas.last); + } +#endif + + if (vmi->mas.status != ma_start && + ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start))) + vma_iter_invalidate(vmi); + + __mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1); + mas_store_prealloc(&vmi->mas, vma); +} + +static inline unsigned long vma_iter_addr(struct vma_iterator *vmi) +{ + return vmi->mas.index; +} + +static inline unsigned long vma_iter_end(struct vma_iterator *vmi) +{ + return vmi->mas.last + 1; +} + +static inline int vma_iter_bulk_alloc(struct vma_iterator *vmi, + unsigned long count) +{ + return mas_expected_entries(&vmi->mas, count); +} + +static inline +struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi) +{ + return mas_prev_range(&vmi->mas, 0); +} + +#endif /* __MM_VMA_H */ diff --git a/mm/vma_internal.h b/mm/vma_internal.h new file mode 100644 index 000000000000..e13e5950df78 --- /dev/null +++ b/mm/vma_internal.h @@ -0,0 +1,52 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * vma_internal.h + * + * Headers required by vma.c, which can be substituted accordingly when testing + * VMA functionality. + */ + +#ifndef __MM_VMA_INTERNAL_H +#define __MM_VMA_INTERNAL_H + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "internal.h" + +#endif /* __MM_VMA_INTERNAL_H */ From patchwork Fri Jun 28 14:35:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13716245 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA337C2BD09 for ; Fri, 28 Jun 2024 14:35:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CC8E16B009B; Fri, 28 Jun 2024 10:35:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BFD5C6B009C; Fri, 28 Jun 2024 10:35:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A9C0B6B009D; Fri, 28 Jun 2024 10:35:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 83CBC6B009C for ; Fri, 28 Jun 2024 10:35:49 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id EED7C1A03DB for ; Fri, 28 Jun 2024 14:35:48 +0000 (UTC) X-FDA: 82280546376.22.1D86B85 Received: from mail-wm1-f51.google.com (mail-wm1-f51.google.com [209.85.128.51]) by imf15.hostedemail.com (Postfix) with ESMTP id DC4FBA0005 for ; Fri, 28 Jun 2024 14:35:46 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=f7SeRtxM; spf=pass (imf15.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.51 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719585337; a=rsa-sha256; cv=none; b=7o601y1YVO4LCvY4jDZ1L5W68HsqEH7/A3wsOayALE86GLGL7mcpS/z5ya2JY2bLQaPbCU WWoHeFI7crtlDI1LkqFvnJxhpJzatMPUq/QQ9Hs8PvP1tvfBW50mil6hm5JjnAbTF4oz3q 2U81iZcm2J0RRfmZcGUKZ24sxrEZCjg= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=f7SeRtxM; spf=pass (imf15.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.51 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719585337; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=+VCM3WkcoPjwO3U6PrjTKLd51325lQXJKBmY5RFuANs=; b=2DPlyYNnpl6znfA69eNZIE6t1egIJn0/Rj2flG4CYy4fMyO4UlLsHgXru3fjp8ot9bNyZC 4jgNrRmWtckkDl9/yjB6HqmA7+KprEbaXF8qyU9oS8g/OHjPpzfVgCGbE+z9sHPmnX9eWn oRLeVzec98SghT8sjlyrhdjgDaTyB1I= Received: by mail-wm1-f51.google.com with SMTP id 5b1f17b1804b1-424aa70fbc4so4807335e9.1 for ; Fri, 28 Jun 2024 07:35:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719585345; x=1720190145; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+VCM3WkcoPjwO3U6PrjTKLd51325lQXJKBmY5RFuANs=; b=f7SeRtxMBLBayYcYwHIcORkilWovp4qpmpdOiqBSwPY235lHH4c4zI667/jyHFJjvX n7DufDNHkGJZ5jfOY2b0G+PHMKn7D829pyhaUap0QyOPPE8yzJPURy3BIDbhjGdpk45k IRfWOP5r0Uc0kwkQh04jrz/pKpddAHyW2xGGFvhjQSZ4srn6M85xFWwg4VwvckTiyBFv QNjTqC1rDY272urft64SeM/6UUbJ/ZgyYan1c5mbUZ0iR9xUC3CHvTGKGJlnz9lUwQBP eh7IZp3nZN0079J4vraS9qdXYEOY5GVF7aWgZRitd3bm6IZKfyQttyFXw3wh7SRkQjaR gRag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719585345; x=1720190145; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+VCM3WkcoPjwO3U6PrjTKLd51325lQXJKBmY5RFuANs=; b=jQe0GX6gb4ctmclJbPMgONHEHbGssvrLVHhCK+OCaIF/SxK1azAMhpqJD44V3qa9NN UP2zSnoQcquecDd9D0GwGxDM73EKzFgjoP4mgKXs/qUVqUOyYi6rcy1SoDMD0xg22E2Q 2EHlaCJQnjI3RSQU3KNw0+EGJNoEOrJy030GCCsPhvZc24NmN1Gw/b6Ur10vkVc07rq6 WJLTj4UbPbT5Jig9JYIbFD1LzqJsXOC72H71DafDM6enOj2y2RrN98xUG/ASE2V4oc89 0IzyxyzV8NvpRNQ7y23YCkDb5AUrwYj2BSWqPIFgAiJM4po9DP5CJkHTqqHWi67mijU+ iRKw== X-Forwarded-Encrypted: i=1; AJvYcCVSq1sTn2zFJIRrJywsYIN89j8tKPpKMHEzbqYMfyNAtUOn57dbpNt7ZR7B1y0a0ZtGrDOBzaSHvs+fQusDxxF5ZQ8= X-Gm-Message-State: AOJu0YzfWciZAi9w//Xj8D6GAiMPDvHSsu0gl38WSRtVeDmVAzF5lv92 I0kdOvIyv/oQyshilSJwoaSVFbOQMVXHo1XAj9x0Eib7bUpj1pOy X-Google-Smtp-Source: AGHT+IG+GT7vM/vuCP4Ik73lva+ehQFqIN8obARO7xTMmlpMIbkWtm6BmGJCs2S1YQbgjFGCpfZfTw== X-Received: by 2002:a05:600c:470d:b0:422:7eca:db41 with SMTP id 5b1f17b1804b1-4248cc18101mr133385125e9.2.1719585345445; Fri, 28 Jun 2024 07:35:45 -0700 (PDT) Received: from lucifer.home ([2a00:23cc:d20f:ba01:bb66:f8b2:a0e8:6447]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-4256af37828sm38985485e9.9.2024.06.28.07.35.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Jun 2024 07:35:44 -0700 (PDT) From: Lorenzo Stoakes To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Liam R . Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH v2 5/7] MAINTAINERS: Add entry for new VMA files Date: Fri, 28 Jun 2024 15:35:26 +0100 Message-ID: <0319419d965adc03bf22fee66e39244fc3d65528.1719584707.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Stat-Signature: and4r6eqg6f9hisad6th5yk8yi5noqtr X-Rspamd-Queue-Id: DC4FBA0005 X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1719585346-285900 X-HE-Meta: U2FsdGVkX1/CMA0Qprpu/jxSqHzOV7GnwcuZbb7xZAxyA3dGGwrqZPJ3acHFjx2gWBbvtd7fo4oTrMsOgjOmTEurn2x8xFpXl7eX4WuFTY10Hm2HkEpJ+LCMSDjVyW81DaTBeeYObpXaQCpom1/Y5SNGh9ocR2aBzNcy5dXZxVKcwXCzDIMNsFuqGjKIvEkAWK9TZIdZnXlypnw7EBOXrxXWenJP+e62F2S8mXXAkjeR0Eps/YtomlanM5E+KvoQY1uNtSXN9j6lOFc6ajiwsg9YqjGjaYbZEXvmnXCAJDK0TgD2DsdBo9Of3b58Fh3MYVUN7nxNkAaalfEc4BA2AlcDpdexA2trKX3BmcmnE96Go1E1cAghno/HVvMIRAnqBnSuTec8x3Ht01x1evBZrvhRh1WyNC918B0L2K9GZrkS1ggfrSaNVKRvrBDyz050vreK3tZF1iabADnfaBSJKXoJKyU7DB6hD6n6X+YOho72jgmTNRFva+2CAzZezF5dl7B2HNkVVee2n6ysb008LsbfWMBtJjvhJZAincxe1uGFHCsmrA++bQPyrtpIqwiKCzjH5/PpUxA41S4Lm3yuQzW/oRcTtUMBbJwF7DcFuWtnROoV9bQBJw9toiz5QfB08WslK+UL4PQvd0xSE+2Xp+L2deVt9DMWGmZ65FhKvqdreaYP/aYj8pglWgE59ZMFu7akdHSf0B6mmzg3qNMbiksF8vs6X5Ed+tGxGxK0Qhq3W2gwPqXRo+/kr/Wywh9sKdw4yoWVG+ToLtyEs65l08sy6trcVg9tZwvtx2k3s/MFCnOwvRf6rgpKt2f7/4og6R8K9o/N6s907hCdLQfEdF/n3LuME5KF79UANmPa5p4HY/DbVCptbAFC/SHYkCI+GIRkg5wKjgpOG4dsqczeBgujA8MxZpwo/+egZlOWtKoiodowl4Q2yuF17yMbMEKA7b+iYN/k1kwErTxR6xQ rqCvn6Zs wvDuWKeFmzK6g1YFz339XH0QGFWIhHYXQqDhUvdiDC4XFhTrOLqYGVslWiNcULP2h2zb2J/miVYADvHxDmP6hc061jI4QvMHGIl7aY5dsDULXpH+V5soj2S3VThS/gNn9mVAeVSHsKtx3weULCS1FQmDSk7x8ZiTdBe/vu14Qygnk/4bxmVm2t76TFzkuwYqThHFz263J12iX5poH02cqzW4VHmj26/2Ptg10A8ZC36D5l07hV/SKVOgB6pFSs9IETLAimpn2I9I4NqeScflN6fXZuohHzeQ8OyHMnFL4dNBvioULwt9VfN9k26R+OsEPsUcWAL2GrU2SsRqn3gnpnEmmcnfjrlxERDDSjCpagUWfonaAoFBwZLvoS8SSItFh7WL8D10f885uvhoNpB0Y2pYnNSX7IJEJ9KwsAzt3zvh1txnbOPIA4id1AIflQ+6mD4Fg8wNDpec6j/2dedGWXOuVARPFY+vhRRdpCUAGIYhWgwo59YHH3PzOdqHOpiPmj0ZEW4iNWGz159W9P3tZHgLVDkSuTgmSiXT0lncErEu9d+pKvArNQlsSAkdjxIYvypo6a230HuaPHBT6CjFDfpjFhg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The vma files contain logic split from mmap.c for the most part and are all relevant to VMA logic, so maintain the same reviewers for both. Signed-off-by: Lorenzo Stoakes --- MAINTAINERS | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 098d214f78d9..0847cb5903ab 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -23971,6 +23971,19 @@ F: include/uapi/linux/vsockmon.h F: net/vmw_vsock/ F: tools/testing/vsock/ +VMA +M: Andrew Morton +R: Liam R. Howlett +R: Vlastimil Babka +R: Lorenzo Stoakes +L: linux-mm@kvack.org +S: Maintained +W: http://www.linux-mm.org +T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm +F: mm/vma.c +F: mm/vma.h +F: mm/vma_internal.h + VMALLOC M: Andrew Morton R: Uladzislau Rezki From patchwork Fri Jun 28 14:35:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13716246 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D535C2BD09 for ; Fri, 28 Jun 2024 14:35:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7481B6B009C; Fri, 28 Jun 2024 10:35:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6F3AD6B009D; Fri, 28 Jun 2024 10:35:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 546BB6B009E; Fri, 28 Jun 2024 10:35:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 2E95F6B009C for ; Fri, 28 Jun 2024 10:35:51 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id C4ACD16042A for ; Fri, 28 Jun 2024 14:35:50 +0000 (UTC) X-FDA: 82280546460.10.4A0A3B2 Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com [209.85.128.48]) by imf03.hostedemail.com (Postfix) with ESMTP id B6B3E2001C for ; Fri, 28 Jun 2024 14:35:48 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=dthyPOkR; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf03.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=lstoakes@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719585335; a=rsa-sha256; cv=none; b=XA3Jl5otDQSnJkwGGNYmVIrqv45evDOhNM+3Ad56Jjfs/+uIhHjAKXQBr5tD5B2c0aLbIk E5dSnNrhQXUK1H++A7xrxCASxxI4vxPLep1Nb0cjS96rssQVnKecnJNl/MgPWRtCaa2TPU j5q01iZTK/utv7NHo8McD1TlzVoXHls= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=dthyPOkR; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf03.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=lstoakes@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719585335; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=NWYOXuCXP6e46p+DWPm2mdKXMIhkuMXGRD48/DvCa1o=; b=2q3WhNyo1EKkyxklb1erMXDeGs4zULk+tSYh5E6LDwrWbX84C2m8E4+iWLsqzecso2CSvj YMJYSXnw39fWbnZUIIGXeo0zAMYKifln62hdUuIOkXNL4nfRtVz3vbRS1bzQAGpSZSdVjT pJMcqWDr1i+QVFc3OHXYg7+/IXOkTuM= Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-424acf3226fso5874725e9.1 for ; Fri, 28 Jun 2024 07:35:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719585347; x=1720190147; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=NWYOXuCXP6e46p+DWPm2mdKXMIhkuMXGRD48/DvCa1o=; b=dthyPOkRChZQEkr3j2ayIszVQH9Y/sBkjtAHswf0ZCG5ahNCIq3h/nst1waddl0Dwm RtHdBSS4+pz5/lInHwW9B4g2RUTLfgqHaSlPDb8C8p2eqpyh3uEPmPuy4XVBJMuh+2oi m/XlQnCX+n7emzeQAnlKPlS7c7mnGP8jKA2JtxMsV7KpSfkDKoh3Bbt2UfweIpZr4lQF CJIgfwHIZgxHP2MfTQb3LuFDYbuRX8wzGHC5B0duT/MAiL6Z3UTP3HcyONR6ZNiRDRPM 4oB+odBMnDFBVQdI5pJhk2275VUhz3Sfo/LkF+2zfX3WJL5NAlBkOvOLoReMhhXKjTZp wzCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719585347; x=1720190147; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=NWYOXuCXP6e46p+DWPm2mdKXMIhkuMXGRD48/DvCa1o=; b=BaolXskzeVomRest09Jf+zhu5W/OcMbv/IIUYdsgPW4SOLakpJZPonWgZk6ATwSunk Uz1GDp8GfMWPqnUYy8WAxgO1FqRoGnFB+xX3MSgZZANJ0EWbqtBBpeDZJdmhA6MUP8h7 P4S7y+CEndPdHSNaajakYDt22SaRq6jisEGp5GGYq3wCeOiysbTwGax0414dbSDXK37A le0Ck0DBUG/QKuF5SZhsGx1f9nm7vg8J9SEDuu0cecTQ4U1DLvEW+LOyVQOA3GF2oRBF OmjBC328iVqs7m6QhDJ5aha/T9cfzSmpUs9jbWzRs5nOneEqUW7vkqs3pShkp1z3zMfx qTQw== X-Forwarded-Encrypted: i=1; AJvYcCWdRx6NQTvGVq4yx4kyLgzy0Vl9vPd3e143fqdQSehykU6GdHaqeepFVDPofV2zJyLuSFBy/gE1RVBoYAc5/el+t3Y= X-Gm-Message-State: AOJu0YxxP44vNlvLpAFJpRU7fhV+xX9tspLHEQ5JxedY1/K7IwfXuesJ R8nH30AG2gOCCqIDcvUqhRVSRDawWn+VP1ZLLaiDnneiWSY1kN5A X-Google-Smtp-Source: AGHT+IGSZOJ0dbHVRYSGryimv6aA1rUgySqZQVkdvN/lo/Uk+BdxK1YgYZ6RG0nn2JHkVBFdBxSOAw== X-Received: by 2002:a05:600c:1c8e:b0:425:680b:a691 with SMTP id 5b1f17b1804b1-425680ba7a5mr25979305e9.31.1719585347172; Fri, 28 Jun 2024 07:35:47 -0700 (PDT) Received: from lucifer.home ([2a00:23cc:d20f:ba01:bb66:f8b2:a0e8:6447]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-4256af37828sm38985485e9.9.2024.06.28.07.35.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Jun 2024 07:35:45 -0700 (PDT) From: Lorenzo Stoakes To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Liam R . Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH v2 6/7] tools: separate out shared radix-tree components Date: Fri, 28 Jun 2024 15:35:27 +0100 Message-ID: <053b1fcaf694ff37b4342dd4b394ebde936de337.1719584707.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Rspamd-Queue-Id: B6B3E2001C X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: r3w8d6i4xqr8hspo1ct3yq4hkfin9kfg X-HE-Tag: 1719585348-480863 X-HE-Meta: U2FsdGVkX1/oRtjDnLUsrrakO3nP2YtzO9UmCxVaZSoZ2e+tg3tC+FtIsh5oHbEmrk4XQnnLkhf31KVkiPxVDsdQWH7Ci2DDJJpxPJHgM2clzmufGR7qAVw/o1xRlYTrZf3zNZ/ZmECljOnsKYMe+VKHTHCwBOzZrvmjTdd+S416b/i8D6PoiLMCkEM2HIuqlsVTNttJk/k4+cEjg9nG3bLd8HIIkQk9auzNLEy6tRhRvd/nG8lQII1RY+uscshCw5Acr/gqpJPzTBTXoOHZWkQktzN9VcSz41ov82hW10dAEqV8307Rst3hdfk3i5xqOQiQNgxkTZSPzvanax55zy3RfsnS8SI77hdzntJXEVwJKjD4WUgHFtiXK1kC+spYI1n+7HvrWaSWz+QaVqmK/FxtRXm0k6qWIqMxPOXo6c6+omuIgxmgv6nCMrZDScb1TAWTpHqAzVDF4NJZOrWJA60LDZm237HNwLG4x8wAo6Ga72MTzaB7ldf3QW1mnAYYDhUlEXG+zQE+2fN9b4wQ7VLzfkMr1w67YVPswKlMb5CHyasBwIeWu/4b1d7hLlpsV2TUq0iEMQVLQSF90FcPc6Px4uvgY9VKSpi0miLFDQSuT2UeroPJYcbxXjsu05mMXAV/C0ZtPK24M/Zwaw/GkszhI7oFFR6s0tA+24OLF918CK+Qm3iqJ5I7pNCxdJhCyekplvVnBhv3g0ZmigShHO+GcsQCRBviJE5ZgYkveM5zy2Ua5YyZsoXi/BnO44OLsIsyLY/79b69x5Hsz39iOpsPoOOgN6WofItR+MdsRK5oErC4RofJNrbHQ3SIcW8Uy6Zt1yMAF86m5rXaeu6aukpDXkUDKJ4PzWkzCu5QK3tzQAZKQqFQ6LFcJ6kaMIKWBWWb/UG09fRxnSa+v2OiYUuCEsvbnsRsmMCI4aBeM1E0IWWYKCdM3V4LGx6+MtxmMoFXIAPuendagXH5OUL widmMJgU 87pZ1p1D4xPJCoe63PgbpsY0tJ8H4KscuAZEH+GoeRq9aRisXWJN+5kUk8k3neuohH4PPp4WFxx6b8p+faItjVn1XpVOd0NbToeGfP/E58Xm9gKscOOjtwjtn9/UvBPA3rQPVPZDuljPbsq3e4xLBuqMnJknIF1IlmGDW3N2PT9loQpwcMIFRarmBvrokTrAJ4YpvuCA3kNRMyN6Ve/QidwABptAoZQ6QBOluBuowerCJ1Iy/Gyv4GSQyG7J6zpXZN8cfQvCSYhrzX/7O5iRBkCBWUV2iLLpEKeCfiuIbQ4x70g8J3ojMToLsualS3hca1P8VwCCVMh30J2kGv8OtAaj2JEafE7u2h/7AB6KeCLiuDoSmU26wNhnG0B96yrKoN0D/3ynLdw0D/Hfj8025Jfntqlyd8cegzg3LI081fE0Jsyypf+xl5YBxGUQSv2t1cWhVypyGwHQTr+DFodUYF/pt3uAaPA8kJJj1EZMGLP7XVms= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The core components contained within the radix-tree tests which provide shims for kernel headers and access to the maple tree are useful for testing other things, so separate them out and make the radix tree tests dependent on the shared components. This lays the groundwork for us to add VMA tests of the newly introduced vma.c file. Signed-off-by: Lorenzo Stoakes --- tools/testing/radix-tree/Makefile | 68 +++---------------- tools/testing/radix-tree/maple.c | 14 +--- tools/testing/radix-tree/xarray.c | 9 +-- tools/testing/shared/autoconf.h | 2 + tools/testing/{radix-tree => shared}/bitmap.c | 0 tools/testing/{radix-tree => shared}/linux.c | 0 .../{radix-tree => shared}/linux/bug.h | 0 .../{radix-tree => shared}/linux/cpu.h | 0 .../{radix-tree => shared}/linux/idr.h | 0 .../{radix-tree => shared}/linux/init.h | 0 .../{radix-tree => shared}/linux/kconfig.h | 0 .../{radix-tree => shared}/linux/kernel.h | 0 .../{radix-tree => shared}/linux/kmemleak.h | 0 .../{radix-tree => shared}/linux/local_lock.h | 0 .../{radix-tree => shared}/linux/lockdep.h | 0 .../{radix-tree => shared}/linux/maple_tree.h | 0 .../{radix-tree => shared}/linux/percpu.h | 0 .../{radix-tree => shared}/linux/preempt.h | 0 .../{radix-tree => shared}/linux/radix-tree.h | 0 .../{radix-tree => shared}/linux/rcupdate.h | 0 .../{radix-tree => shared}/linux/xarray.h | 0 tools/testing/shared/maple-shared.h | 9 +++ tools/testing/shared/maple-shim.c | 7 ++ tools/testing/shared/shared.h | 34 ++++++++++ tools/testing/shared/shared.mk | 68 +++++++++++++++++++ .../testing/shared/trace/events/maple_tree.h | 5 ++ tools/testing/shared/xarray-shared.c | 5 ++ tools/testing/shared/xarray-shared.h | 4 ++ 28 files changed, 147 insertions(+), 78 deletions(-) create mode 100644 tools/testing/shared/autoconf.h rename tools/testing/{radix-tree => shared}/bitmap.c (100%) rename tools/testing/{radix-tree => shared}/linux.c (100%) rename tools/testing/{radix-tree => shared}/linux/bug.h (100%) rename tools/testing/{radix-tree => shared}/linux/cpu.h (100%) rename tools/testing/{radix-tree => shared}/linux/idr.h (100%) rename tools/testing/{radix-tree => shared}/linux/init.h (100%) rename tools/testing/{radix-tree => shared}/linux/kconfig.h (100%) rename tools/testing/{radix-tree => shared}/linux/kernel.h (100%) rename tools/testing/{radix-tree => shared}/linux/kmemleak.h (100%) rename tools/testing/{radix-tree => shared}/linux/local_lock.h (100%) rename tools/testing/{radix-tree => shared}/linux/lockdep.h (100%) rename tools/testing/{radix-tree => shared}/linux/maple_tree.h (100%) rename tools/testing/{radix-tree => shared}/linux/percpu.h (100%) rename tools/testing/{radix-tree => shared}/linux/preempt.h (100%) rename tools/testing/{radix-tree => shared}/linux/radix-tree.h (100%) rename tools/testing/{radix-tree => shared}/linux/rcupdate.h (100%) rename tools/testing/{radix-tree => shared}/linux/xarray.h (100%) create mode 100644 tools/testing/shared/maple-shared.h create mode 100644 tools/testing/shared/maple-shim.c create mode 100644 tools/testing/shared/shared.h create mode 100644 tools/testing/shared/shared.mk create mode 100644 tools/testing/shared/trace/events/maple_tree.h create mode 100644 tools/testing/shared/xarray-shared.c create mode 100644 tools/testing/shared/xarray-shared.h diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile index 7527f738b4a1..29d607063749 100644 --- a/tools/testing/radix-tree/Makefile +++ b/tools/testing/radix-tree/Makefile @@ -1,29 +1,16 @@ # SPDX-License-Identifier: GPL-2.0 -CFLAGS += -I. -I../../include -I../../../lib -g -Og -Wall \ - -D_LGPL_SOURCE -fsanitize=address -fsanitize=undefined -LDFLAGS += -fsanitize=address -fsanitize=undefined -LDLIBS+= -lpthread -lurcu -TARGETS = main idr-test multiorder xarray maple -CORE_OFILES := xarray.o radix-tree.o idr.o linux.o test.o find_bit.o bitmap.o \ - slab.o maple.o -OFILES = main.o $(CORE_OFILES) regression1.o regression2.o regression3.o \ - regression4.o tag_check.o multiorder.o idr-test.o iteration_check.o \ - iteration_check_2.o benchmark.o +.PHONY: default -ifndef SHIFT - SHIFT=3 -endif +default: main -ifeq ($(BUILD), 32) - CFLAGS += -m32 - LDFLAGS += -m32 -LONG_BIT := 32 -endif +include ../shared/shared.mk -ifndef LONG_BIT -LONG_BIT := $(shell getconf LONG_BIT) -endif +TARGETS = main idr-test multiorder xarray maple +CORE_OFILES = $(SHARED_OFILES) xarray.o maple.o test.o +OFILES = main.o $(CORE_OFILES) regression1.o regression2.o \ + regression3.o regression4.o tag_check.o multiorder.o idr-test.o \ + iteration_check.o iteration_check_2.o benchmark.o targets: generated/map-shift.h generated/bit-length.h $(TARGETS) @@ -32,46 +19,13 @@ main: $(OFILES) idr-test.o: ../../../lib/test_ida.c idr-test: idr-test.o $(CORE_OFILES) -xarray: $(CORE_OFILES) +xarray: $(CORE_OFILES) xarray.o -maple: $(CORE_OFILES) +maple: $(CORE_OFILES) maple.o multiorder: multiorder.o $(CORE_OFILES) clean: $(RM) $(TARGETS) *.o radix-tree.c idr.c generated/map-shift.h generated/bit-length.h -vpath %.c ../../lib - -$(OFILES): Makefile *.h */*.h generated/map-shift.h generated/bit-length.h \ - ../../include/linux/*.h \ - ../../include/asm/*.h \ - ../../../include/linux/xarray.h \ - ../../../include/linux/maple_tree.h \ - ../../../include/linux/radix-tree.h \ - ../../../lib/radix-tree.h \ - ../../../include/linux/idr.h - -radix-tree.c: ../../../lib/radix-tree.c - sed -e 's/^static //' -e 's/__always_inline //' -e 's/inline //' < $< > $@ - -idr.c: ../../../lib/idr.c - sed -e 's/^static //' -e 's/__always_inline //' -e 's/inline //' < $< > $@ - -xarray.o: ../../../lib/xarray.c ../../../lib/test_xarray.c - -maple.o: ../../../lib/maple_tree.c ../../../lib/test_maple_tree.c - -generated/map-shift.h: - @if ! grep -qws $(SHIFT) generated/map-shift.h; then \ - echo "#define XA_CHUNK_SHIFT $(SHIFT)" > \ - generated/map-shift.h; \ - fi - -generated/bit-length.h: FORCE - @if ! grep -qws CONFIG_$(LONG_BIT)BIT generated/bit-length.h; then \ - echo "Generating $@"; \ - echo "#define CONFIG_$(LONG_BIT)BIT 1" > $@; \ - fi - -FORCE: ; +$(OFILES): $(SHARED_DEPS) *.h */*.h diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c index f1caf4bcf937..5b53ecf22fc4 100644 --- a/tools/testing/radix-tree/maple.c +++ b/tools/testing/radix-tree/maple.c @@ -8,20 +8,8 @@ * difficult to handle in kernel tests. */ -#define CONFIG_DEBUG_MAPLE_TREE -#define CONFIG_MAPLE_SEARCH -#define MAPLE_32BIT (MAPLE_NODE_SLOTS > 31) +#include "maple-shared.h" #include "test.h" -#include -#include -#include "linux/init.h" - -#define module_init(x) -#define module_exit(x) -#define MODULE_AUTHOR(x) -#define MODULE_LICENSE(x) -#define dump_stack() assert(0) - #include "../../../lib/maple_tree.c" #include "../../../lib/test_maple_tree.c" diff --git a/tools/testing/radix-tree/xarray.c b/tools/testing/radix-tree/xarray.c index f20e12cbbfd4..253208a8541b 100644 --- a/tools/testing/radix-tree/xarray.c +++ b/tools/testing/radix-tree/xarray.c @@ -4,16 +4,9 @@ * Copyright (c) 2018 Matthew Wilcox */ -#define XA_DEBUG +#include "xarray-shared.h" #include "test.h" -#define module_init(x) -#define module_exit(x) -#define MODULE_AUTHOR(x) -#define MODULE_LICENSE(x) -#define dump_stack() assert(0) - -#include "../../../lib/xarray.c" #undef XA_DEBUG #include "../../../lib/test_xarray.c" diff --git a/tools/testing/shared/autoconf.h b/tools/testing/shared/autoconf.h new file mode 100644 index 000000000000..92dc474c349b --- /dev/null +++ b/tools/testing/shared/autoconf.h @@ -0,0 +1,2 @@ +#include "bit-length.h" +#define CONFIG_XARRAY_MULTI 1 diff --git a/tools/testing/radix-tree/bitmap.c b/tools/testing/shared/bitmap.c similarity index 100% rename from tools/testing/radix-tree/bitmap.c rename to tools/testing/shared/bitmap.c diff --git a/tools/testing/radix-tree/linux.c b/tools/testing/shared/linux.c similarity index 100% rename from tools/testing/radix-tree/linux.c rename to tools/testing/shared/linux.c diff --git a/tools/testing/radix-tree/linux/bug.h b/tools/testing/shared/linux/bug.h similarity index 100% rename from tools/testing/radix-tree/linux/bug.h rename to tools/testing/shared/linux/bug.h diff --git a/tools/testing/radix-tree/linux/cpu.h b/tools/testing/shared/linux/cpu.h similarity index 100% rename from tools/testing/radix-tree/linux/cpu.h rename to tools/testing/shared/linux/cpu.h diff --git a/tools/testing/radix-tree/linux/idr.h b/tools/testing/shared/linux/idr.h similarity index 100% rename from tools/testing/radix-tree/linux/idr.h rename to tools/testing/shared/linux/idr.h diff --git a/tools/testing/radix-tree/linux/init.h b/tools/testing/shared/linux/init.h similarity index 100% rename from tools/testing/radix-tree/linux/init.h rename to tools/testing/shared/linux/init.h diff --git a/tools/testing/radix-tree/linux/kconfig.h b/tools/testing/shared/linux/kconfig.h similarity index 100% rename from tools/testing/radix-tree/linux/kconfig.h rename to tools/testing/shared/linux/kconfig.h diff --git a/tools/testing/radix-tree/linux/kernel.h b/tools/testing/shared/linux/kernel.h similarity index 100% rename from tools/testing/radix-tree/linux/kernel.h rename to tools/testing/shared/linux/kernel.h diff --git a/tools/testing/radix-tree/linux/kmemleak.h b/tools/testing/shared/linux/kmemleak.h similarity index 100% rename from tools/testing/radix-tree/linux/kmemleak.h rename to tools/testing/shared/linux/kmemleak.h diff --git a/tools/testing/radix-tree/linux/local_lock.h b/tools/testing/shared/linux/local_lock.h similarity index 100% rename from tools/testing/radix-tree/linux/local_lock.h rename to tools/testing/shared/linux/local_lock.h diff --git a/tools/testing/radix-tree/linux/lockdep.h b/tools/testing/shared/linux/lockdep.h similarity index 100% rename from tools/testing/radix-tree/linux/lockdep.h rename to tools/testing/shared/linux/lockdep.h diff --git a/tools/testing/radix-tree/linux/maple_tree.h b/tools/testing/shared/linux/maple_tree.h similarity index 100% rename from tools/testing/radix-tree/linux/maple_tree.h rename to tools/testing/shared/linux/maple_tree.h diff --git a/tools/testing/radix-tree/linux/percpu.h b/tools/testing/shared/linux/percpu.h similarity index 100% rename from tools/testing/radix-tree/linux/percpu.h rename to tools/testing/shared/linux/percpu.h diff --git a/tools/testing/radix-tree/linux/preempt.h b/tools/testing/shared/linux/preempt.h similarity index 100% rename from tools/testing/radix-tree/linux/preempt.h rename to tools/testing/shared/linux/preempt.h diff --git a/tools/testing/radix-tree/linux/radix-tree.h b/tools/testing/shared/linux/radix-tree.h similarity index 100% rename from tools/testing/radix-tree/linux/radix-tree.h rename to tools/testing/shared/linux/radix-tree.h diff --git a/tools/testing/radix-tree/linux/rcupdate.h b/tools/testing/shared/linux/rcupdate.h similarity index 100% rename from tools/testing/radix-tree/linux/rcupdate.h rename to tools/testing/shared/linux/rcupdate.h diff --git a/tools/testing/radix-tree/linux/xarray.h b/tools/testing/shared/linux/xarray.h similarity index 100% rename from tools/testing/radix-tree/linux/xarray.h rename to tools/testing/shared/linux/xarray.h diff --git a/tools/testing/shared/maple-shared.h b/tools/testing/shared/maple-shared.h new file mode 100644 index 000000000000..3d847edd149d --- /dev/null +++ b/tools/testing/shared/maple-shared.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#define CONFIG_DEBUG_MAPLE_TREE +#define CONFIG_MAPLE_SEARCH +#define MAPLE_32BIT (MAPLE_NODE_SLOTS > 31) +#include "shared.h" +#include +#include +#include "linux/init.h" diff --git a/tools/testing/shared/maple-shim.c b/tools/testing/shared/maple-shim.c new file mode 100644 index 000000000000..640df76f483e --- /dev/null +++ b/tools/testing/shared/maple-shim.c @@ -0,0 +1,7 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* Very simple shim around the maple tree. */ + +#include "maple-shared.h" + +#include "../../../lib/maple_tree.c" diff --git a/tools/testing/shared/shared.h b/tools/testing/shared/shared.h new file mode 100644 index 000000000000..495602e60b65 --- /dev/null +++ b/tools/testing/shared/shared.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include +#include +#include +#include + +#include +#include +#include + +#ifndef module_init +#define module_init(x) +#endif + +#ifndef module_exit +#define module_exit(x) +#endif + +#ifndef MODULE_AUTHOR +#define MODULE_AUTHOR(x) +#endif + +#ifndef MODULE_LICENSE +#define MODULE_LICENSE(x) +#endif + +#ifndef MODULE_DESCRIPTION +#define MODULE_DESCRIPTION(x) +#endif + +#ifndef dump_stack +#define dump_stack() assert(0) +#endif diff --git a/tools/testing/shared/shared.mk b/tools/testing/shared/shared.mk new file mode 100644 index 000000000000..69a6a528eaed --- /dev/null +++ b/tools/testing/shared/shared.mk @@ -0,0 +1,68 @@ +# SPDX-License-Identifier: GPL-2.0 + +CFLAGS += -I../shared -I. -I../../include -I../../../lib -g -Og -Wall \ + -D_LGPL_SOURCE -fsanitize=address -fsanitize=undefined +LDFLAGS += -fsanitize=address -fsanitize=undefined +LDLIBS += -lpthread -lurcu +SHARED_OFILES = xarray-shared.o radix-tree.o idr.o linux.o find_bit.o bitmap.o \ + slab.o +SHARED_DEPS = Makefile ../shared/shared.mk ../shared/*.h generated/map-shift.h \ + generated/bit-length.h generated/autoconf.h \ + ../../include/linux/*.h \ + ../../include/asm/*.h \ + ../../../include/linux/xarray.h \ + ../../../include/linux/maple_tree.h \ + ../../../include/linux/radix-tree.h \ + ../../../lib/radix-tree.h \ + ../../../include/linux/idr.h + +ifndef SHIFT + SHIFT=3 +endif + +ifeq ($(BUILD), 32) + CFLAGS += -m32 + LDFLAGS += -m32 +LONG_BIT := 32 +endif + +ifndef LONG_BIT +LONG_BIT := $(shell getconf LONG_BIT) +endif + +%.o: ../shared/%.c + $(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@ + +vpath %.c ../../lib + +$(SHARED_OFILES): $(SHARED_DEPS) + +radix-tree.c: ../../../lib/radix-tree.c + sed -e 's/^static //' -e 's/__always_inline //' -e 's/inline //' < $< > $@ + +idr.c: ../../../lib/idr.c + sed -e 's/^static //' -e 's/__always_inline //' -e 's/inline //' < $< > $@ + +xarray-shared.o: ../shared/xarray-shared.c ../../../lib/xarray.c \ + ../../../lib/test_xarray.c + +maple-shared.o: ../shared/maple-shared.c ../../../lib/maple_tree.c \ + ../../../lib/test_maple_tree.c + +generated/autoconf.h: + cp ../shared/autoconf.h generated/autoconf.h + +generated/map-shift.h: + @if ! grep -qws $(SHIFT) generated/map-shift.h; then \ + echo "Generating $@"; \ + echo "#define XA_CHUNK_SHIFT $(SHIFT)" > \ + generated/map-shift.h; \ + fi + +generated/bit-length.h: FORCE + @if ! grep -qws CONFIG_$(LONG_BIT)BIT generated/bit-length.h; then \ + echo "Generating $@"; \ + echo "#define CONFIG_$(LONG_BIT)BIT 1" > $@; \ + fi + +FORCE: ; diff --git a/tools/testing/shared/trace/events/maple_tree.h b/tools/testing/shared/trace/events/maple_tree.h new file mode 100644 index 000000000000..97d0e1ddcf08 --- /dev/null +++ b/tools/testing/shared/trace/events/maple_tree.h @@ -0,0 +1,5 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#define trace_ma_op(a, b) do {} while (0) +#define trace_ma_read(a, b) do {} while (0) +#define trace_ma_write(a, b, c, d) do {} while (0) diff --git a/tools/testing/shared/xarray-shared.c b/tools/testing/shared/xarray-shared.c new file mode 100644 index 000000000000..e90901958dcd --- /dev/null +++ b/tools/testing/shared/xarray-shared.c @@ -0,0 +1,5 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include "xarray-shared.h" + +#include "../../../lib/xarray.c" diff --git a/tools/testing/shared/xarray-shared.h b/tools/testing/shared/xarray-shared.h new file mode 100644 index 000000000000..ac2d16ff53ae --- /dev/null +++ b/tools/testing/shared/xarray-shared.h @@ -0,0 +1,4 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#define XA_DEBUG +#include "shared.h" From patchwork Fri Jun 28 14:35:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13716248 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70518C3065B for ; Fri, 28 Jun 2024 14:36:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5C0196B009E; Fri, 28 Jun 2024 10:35:54 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 545A66B009F; Fri, 28 Jun 2024 10:35:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 359A66B00A0; Fri, 28 Jun 2024 10:35:54 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 0C4276B009E for ; Fri, 28 Jun 2024 10:35:54 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id A84C4A1C72 for ; Fri, 28 Jun 2024 14:35:53 +0000 (UTC) X-FDA: 82280546586.11.5F17C91 Received: from mail-wm1-f47.google.com (mail-wm1-f47.google.com [209.85.128.47]) by imf20.hostedemail.com (Postfix) with ESMTP id 4ABAE1C0020 for ; Fri, 28 Jun 2024 14:35:51 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=kyJVHmZo; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.47 as permitted sender) smtp.mailfrom=lstoakes@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719585336; a=rsa-sha256; cv=none; b=A45H9EDnA+Bfjo1/yOSu9pz4msILl3XoSQwFEhYkZ4Fc6gsiuAYctFtmkDnI3bFzjAtkzI e0ZbHKfrOwHbIP9XJX5iiCKq3ZBZaczfN+C70p17T7/8NbHXAmHHDO2OOYfjWPb5VRMJw5 YZzMo00zWpw+zMtq3iBdL/Qwr2QMryg= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=kyJVHmZo; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.47 as permitted sender) smtp.mailfrom=lstoakes@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719585336; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=y4z0i7Y8HIpyBHs7U6k/ncTPiwaVGdjuzgwtoOXRYYA=; b=q7i0YWSv03jc3jDb4x0c6HxceSmuL3MxAJ+jAeJBm1DMwWQL2F4DQ6Qu2eGGVW1s/SKvI5 U3nnC+t5mDvD5Xx/XtAOZUxXloWfR/Z+tQ08E79XNalD3AVzVXli+e2p1B7e4WvSJna6eQ A79EFm8ds3JWs/NC4gQZy6FoLTM0JUQ= Received: by mail-wm1-f47.google.com with SMTP id 5b1f17b1804b1-4248ea53493so4771755e9.3 for ; Fri, 28 Jun 2024 07:35:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719585350; x=1720190150; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=y4z0i7Y8HIpyBHs7U6k/ncTPiwaVGdjuzgwtoOXRYYA=; b=kyJVHmZoe2cEpKR5y+Yk9Rm6qTxt7rus6g6sU1G3fWLaoTqTJLadP0WXYo59YE/yAr IJL1PgVgLZ+8TtYMBwryMSVQJrE5BZjhI5xJg+cWuznKHg1h2sIu4Mqbo4LXcGtmB4xC SfQJ5fVrLGjvFqYDqwbyEvhiDWpTDIbfFEbNDRDFVTvqr0u/TgAenx85E+cu4Y8hGaVB sRsXpMO2Z0tdSlZ5Y1bL7b9RFGTw1j2exmILfEAWg1udaLJKW7HP4NHWnu4vYrEJvEVF 7qEpv79hNyJXfw3/9S4zNz25c2/wvk8zBVmRDZUf7Vi3BIb+tx1p8+tNj9FxSo6VBzAz wSmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719585350; x=1720190150; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=y4z0i7Y8HIpyBHs7U6k/ncTPiwaVGdjuzgwtoOXRYYA=; b=cG3tHTa7JN4nGJcpow243wz+XiW5w3ofCM8z2kcvXUxkjQ/MfmZUvSnZP6fySiAmoh kuKUf+ZSzozYsQHhVKD5+pmNn95xWCf9n2noj/w+HJ3AlJ3xCrHNYBQE1Vp93zZBdiHX eQgWOvNtDMZ9E0AM96S5JYJwqZt74ka6nl4XBZoS05J26lwmJNtE3ppD3KarYlzUG+fR xR2XXi38sjKplpub8JIUkQOhakpNRhj9vFgP1Ef40sIf8ZKl3KxuNQks3LclgL2YFO5r Bx4ZJI1OUPhgMc3ctpocYmyVzQw1YfTkTj9l5acW+ZLh2bJ0/gkWPsyY24s9TJXdTuuX 4vWA== X-Forwarded-Encrypted: i=1; AJvYcCX9R937jAE33HP31S8xCtOQ4xzjkp+a/WXSIjTgYmrz/fLiC3iJFnjyvNuMApArKvk9hJLx29sx/PnrXlxxhx7VADc= X-Gm-Message-State: AOJu0YwdAuj0VhyeEGEO+hegXj7pEqT+BvHCsM4ZiPod49ar18Xamb+i pLjn7Z8f/Ft2NqftqUpV1M8rbhsU8nvfN3S8L5zxiZ1l2vbuySYz X-Google-Smtp-Source: AGHT+IGd3eQ3elvTRjJLRy2OtZfHZAmmKqAw7Y+Ixh6aVlwD1cFyrqWl7mIT5eMk3LnO8WFZlwnOkA== X-Received: by 2002:a05:600c:3592:b0:424:798a:f7ee with SMTP id 5b1f17b1804b1-4248cc18166mr121847805e9.7.1719585349608; Fri, 28 Jun 2024 07:35:49 -0700 (PDT) Received: from lucifer.home ([2a00:23cc:d20f:ba01:bb66:f8b2:a0e8:6447]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-4256af37828sm38985485e9.9.2024.06.28.07.35.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Jun 2024 07:35:47 -0700 (PDT) From: Lorenzo Stoakes To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Liam R . Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH v2 7/7] tools: add skeleton code for userland testing of VMA logic Date: Fri, 28 Jun 2024 15:35:28 +0100 Message-ID: <73c7a094524bdb21e25d8c436c9059820ad82cb5.1719584707.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 4ABAE1C0020 X-Stat-Signature: 8ucoqi648ybebjt1fb61ajci5mcchc5p X-Rspam-User: X-HE-Tag: 1719585351-195658 X-HE-Meta: U2FsdGVkX18foRWm7Lnc+knQ4A8zxe6NttTCMZkANS/iurxObtrSwdRwHBKiyhjbmrrRHKtsEF5ZIhMSsb9hdWzvicWdzmjczw/uJzBOlqQNEqQG3jPihCABpYFIokiluUklTEm72/GiYY4akwHmvE6Iv/GlJS2vTa0j+97DaA5DBvMzjT5RwU//EOVAgo+fBJL5ponSzx9t8QySI/LJuivAWM6fjLWdATiFYcUZUNXFHX6dEyX8nimcSpznMXzeTiPFf9UAt6arWmCl3QEmR1KuSewmrxpPe5+F6Uf99XKXqLtDhf2m1ani093G0WB8ZYc2IyHfF1Mt4NkQwONzJta9dRTHFXGnPURw7zxf+fwg/hGvg4xlR3skg9lekz9iBUQfRvh/ZgEshO6FPp7SSiELyitoi5mh2B2mmcU7FZN1VV0gEamLt85hORodJrDlbq0qGAaeH3SDbyP9/KEZ2XtB3pkGay/g0oAnA1bV8zdH7SDXPxX/ASfD40Dm80OeL2gm+9lIVYUPHLipWLGdGyHfbitsqBJFgCpDjLcUNHRLft8+hCIDujbEAq2isDR4qN+pt0sOhbNNA8PsD90+jxg+1INCAVTTg2+mQFAhk6Hq8OLgbSnao8j5Sk2h4nRH1NW24DohTY8SX0k4nDGwPEUGWV9ebhtidL2QFRN1WsKMXJdm3LACrbeAMlTFLFzc49sSwc798WMT1TVaPZsfONRG4zNxTHvxdiFxLHqmX+QWzvPJ/4VtDZsTzQ6fJR0F4l5nyROWCXY4c4qZkC/PCu5zbjvbgO7rCYbZgWuG3VOUfy3v/UNXIZfDtwsxG5b33d1Lu47FrI/BRK6nTqqqRPNnb6WZ99RGbpW5zvxhrHC7jOkNyXJ8h/x+9vAJi8rOD1EFp8owcbIufjxXlO9N24Idk9DeyvRVZfWFIzFC19oSooA+5m7mdsI5mDDy8g3HplqkQWaYkaWODRGUSkd RIgq6shK jN4dm19T3yE2lrxOXiHHMRNmwjYeSnqpWB5fYqzSur2pLLldgXIeY11BG/iD3TO8mxUZ1womGMTPuCgyK67lioXU91DBWusBOYd3SaSF/3mvBa4/1F75hwn4Q4piKupaqXhjsJz87oEtyVPtor2rZvMD8e9ZWhg7p5FAv0cvwzxCc2WtsN0iYxv3K8AVO9pOCCIJFl3viBVxkFr9tdEWZ7CtejODWVSWIQIV6AuiEq3D2I3dyOjPZ3VhqhBN5ghE2f3L+Eizb1BtrhQ8vL1OJlCnPumO2KcOnTNlnchH8uwVAo6opyMaEkofSLn17K+V4bJECvO9757pwSq796au0FLQ4VDdLRM9FGWIQudhZMiZv2kqumhGHMDaGG3OuKMoNnSkwCy+XYh7mS0mYe5fGdkbVOWXf1B5gQgt/gXCGT4KEk5Y3i2+VZd2nOq1W1WsItVBXiTBZB18SsXQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Establish a new userland VMA unit testing implementation under tools/testing which utilises existing logic providing maple tree support in userland utilising the now-shared code previously exclusive to radix tree testing. This provides fundamental VMA operations whose API is defined in mm/vma.h, while stubbing out superfluous functionality. This exists as a proof-of-concept, with the test implementation functional and sufficient to allow userland compilation of vma.c, but containing only cursory tests to demonstrate basic functionality. Signed-off-by: Lorenzo Stoakes --- MAINTAINERS | 1 + include/linux/atomic.h | 2 +- include/linux/mmzone.h | 3 +- tools/testing/vma/.gitignore | 6 + tools/testing/vma/Makefile | 15 + tools/testing/vma/errors.txt | 0 tools/testing/vma/generated/autoconf.h | 2 + tools/testing/vma/linux/atomic.h | 12 + tools/testing/vma/linux/mmzone.h | 38 ++ tools/testing/vma/vma.c | 207 ++++++ tools/testing/vma/vma_internal.h | 882 +++++++++++++++++++++++++ 11 files changed, 1166 insertions(+), 2 deletions(-) create mode 100644 tools/testing/vma/.gitignore create mode 100644 tools/testing/vma/Makefile create mode 100644 tools/testing/vma/errors.txt create mode 100644 tools/testing/vma/generated/autoconf.h create mode 100644 tools/testing/vma/linux/atomic.h create mode 100644 tools/testing/vma/linux/mmzone.h create mode 100644 tools/testing/vma/vma.c create mode 100644 tools/testing/vma/vma_internal.h diff --git a/MAINTAINERS b/MAINTAINERS index 0847cb5903ab..410062bd8e21 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -23983,6 +23983,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm F: mm/vma.c F: mm/vma.h F: mm/vma_internal.h +F: tools/testing/vma VMALLOC M: Andrew Morton diff --git a/include/linux/atomic.h b/include/linux/atomic.h index 8dd57c3a99e9..badfba2fd10f 100644 --- a/include/linux/atomic.h +++ b/include/linux/atomic.h @@ -81,4 +81,4 @@ #include #include -#endif /* _LINUX_ATOMIC_H */ +#endif /* _LINUX_ATOMIC_H */ diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 41458892bc8a..30a22e57fa50 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1,4 +1,5 @@ -/* SPDX-License-Identifier: GPL-2.0 */ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + #ifndef _LINUX_MMZONE_H #define _LINUX_MMZONE_H diff --git a/tools/testing/vma/.gitignore b/tools/testing/vma/.gitignore new file mode 100644 index 000000000000..d915f7d7fb1a --- /dev/null +++ b/tools/testing/vma/.gitignore @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0-only +generated/bit-length.h +generated/map-shift.h +idr.c +radix-tree.c +vma diff --git a/tools/testing/vma/Makefile b/tools/testing/vma/Makefile new file mode 100644 index 000000000000..a12fb6596016 --- /dev/null +++ b/tools/testing/vma/Makefile @@ -0,0 +1,15 @@ +# SPDX-License-Identifier: GPL-2.0-or-later + +.PHONY: default + +default: vma + +include ../shared/shared.mk + +OFILES = vma.o $(SHARED_OFILES) maple-shim.o +TARGETS = vma + +vma: $(OFILES) + +clean: + $(RM) $(TARGETS) *.o radix-tree.c idr.c generated/map-shift.h generated/bit-length.h diff --git a/tools/testing/vma/errors.txt b/tools/testing/vma/errors.txt new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/tools/testing/vma/generated/autoconf.h b/tools/testing/vma/generated/autoconf.h new file mode 100644 index 000000000000..92dc474c349b --- /dev/null +++ b/tools/testing/vma/generated/autoconf.h @@ -0,0 +1,2 @@ +#include "bit-length.h" +#define CONFIG_XARRAY_MULTI 1 diff --git a/tools/testing/vma/linux/atomic.h b/tools/testing/vma/linux/atomic.h new file mode 100644 index 000000000000..e01f66f98982 --- /dev/null +++ b/tools/testing/vma/linux/atomic.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +#ifndef _LINUX_ATOMIC_H +#define _LINUX_ATOMIC_H + +#define atomic_t int32_t +#define atomic_inc(x) uatomic_inc(x) +#define atomic_read(x) uatomic_read(x) +#define atomic_set(x, y) do {} while (0) +#define U8_MAX UCHAR_MAX + +#endif /* _LINUX_ATOMIC_H */ diff --git a/tools/testing/vma/linux/mmzone.h b/tools/testing/vma/linux/mmzone.h new file mode 100644 index 000000000000..e6a96c686610 --- /dev/null +++ b/tools/testing/vma/linux/mmzone.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_MMZONE_H +#define _LINUX_MMZONE_H + +#include + +struct pglist_data *first_online_pgdat(void); +struct pglist_data *next_online_pgdat(struct pglist_data *pgdat); + +#define for_each_online_pgdat(pgdat) \ + for (pgdat = first_online_pgdat(); \ + pgdat; \ + pgdat = next_online_pgdat(pgdat)) + +enum zone_type { + __MAX_NR_ZONES +}; + +#define MAX_NR_ZONES __MAX_NR_ZONES +#define MAX_PAGE_ORDER 10 +#define MAX_ORDER_NR_PAGES (1 << MAX_PAGE_ORDER) + +#define pageblock_order MAX_PAGE_ORDER +#define pageblock_nr_pages BIT(pageblock_order) +#define pageblock_align(pfn) ALIGN((pfn), pageblock_nr_pages) +#define pageblock_start_pfn(pfn) ALIGN_DOWN((pfn), pageblock_nr_pages) + +struct zone { + atomic_long_t managed_pages; +}; + +typedef struct pglist_data { + struct zone node_zones[MAX_NR_ZONES]; + +} pg_data_t; + +#endif /* _LINUX_MMZONE_H */ diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c new file mode 100644 index 000000000000..ac7f917ab108 --- /dev/null +++ b/tools/testing/vma/vma.c @@ -0,0 +1,207 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include +#include +#include + +#include "maple-shared.h" +#include "vma_internal.h" + +/* + * Directly import the VMA implementation here. Our vma_internal.h wrapper + * provides userland-equivalent functionality for everything vma.c uses. + */ +#include "../../../mm/vma.c" + +const struct vm_operations_struct vma_dummy_vm_ops; + +#define ASSERT_TRUE(_expr) \ + do { \ + if (!(_expr)) { \ + fprintf(stderr, \ + "Assert FAILED at %s:%d:%s(): %s is FALSE.\n", \ + __FILE__, __LINE__, __FUNCTION__, #_expr); \ + return false; \ + } \ + } while (0) +#define ASSERT_FALSE(_expr) ASSERT_TRUE(!(_expr)) +#define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2)) +#define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2)) + +static struct vm_area_struct *alloc_vma(struct mm_struct *mm, + unsigned long start, + unsigned long end, + pgoff_t pgoff, + vm_flags_t flags) +{ + struct vm_area_struct *ret = vm_area_alloc(mm); + + if (ret == NULL) + return NULL; + + ret->vm_start = start; + ret->vm_end = end; + ret->vm_pgoff = pgoff; + ret->__vm_flags = flags; + + return ret; +} + +static bool test_simple_merge(void) +{ + struct vm_area_struct *vma; + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE; + struct mm_struct mm = {}; + struct vm_area_struct *vma_left = alloc_vma(&mm, 0, 0x1000, 0, flags); + struct vm_area_struct *vma_middle = alloc_vma(&mm, 0x1000, 0x2000, 1, flags); + struct vm_area_struct *vma_right = alloc_vma(&mm, 0x2000, 0x3000, 2, flags); + VMA_ITERATOR(vmi, &mm, 0x1000); + + ASSERT_FALSE(vma_link(&mm, vma_left)); + ASSERT_FALSE(vma_link(&mm, vma_middle)); + ASSERT_FALSE(vma_link(&mm, vma_right)); + + vma = vma_merge_new_vma(&vmi, vma_left, vma_middle, 0x1000, + 0x2000, 1); + ASSERT_NE(vma, NULL); + + ASSERT_EQ(vma->vm_start, 0); + ASSERT_EQ(vma->vm_end, 0x3000); + ASSERT_EQ(vma->vm_pgoff, 0); + ASSERT_EQ(vma->vm_flags, flags); + + vm_area_free(vma); + mtree_destroy(&mm.mm_mt); + + return true; +} + +static bool test_simple_modify(void) +{ + struct vm_area_struct *vma; + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE; + struct mm_struct mm = {}; + struct vm_area_struct *init_vma = alloc_vma(&mm, 0, 0x3000, 0, flags); + VMA_ITERATOR(vmi, &mm, 0x1000); + + ASSERT_FALSE(vma_link(&mm, init_vma)); + + /* + * The flags will not be changed, the vma_modify_flags() function + * performs the merge/split only. + */ + vma = vma_modify_flags(&vmi, NULL, init_vma, + 0x1000, 0x2000, VM_READ | VM_MAYREAD); + ASSERT_NE(vma, NULL); + /* We modify the provided VMA, and on split allocate new VMAs. */ + ASSERT_EQ(vma, init_vma); + + ASSERT_EQ(vma->vm_start, 0x1000); + ASSERT_EQ(vma->vm_end, 0x2000); + ASSERT_EQ(vma->vm_pgoff, 1); + + /* + * Now walk through the three split VMAs and make sure they are as + * expected. + */ + + vma_iter_set(&vmi, 0); + vma = vma_iter_load(&vmi); + + ASSERT_EQ(vma->vm_start, 0); + ASSERT_EQ(vma->vm_end, 0x1000); + ASSERT_EQ(vma->vm_pgoff, 0); + + vm_area_free(vma); + vma_iter_clear(&vmi); + + vma = vma_next(&vmi); + + ASSERT_EQ(vma->vm_start, 0x1000); + ASSERT_EQ(vma->vm_end, 0x2000); + ASSERT_EQ(vma->vm_pgoff, 1); + + vm_area_free(vma); + vma_iter_clear(&vmi); + + vma = vma_next(&vmi); + + ASSERT_EQ(vma->vm_start, 0x2000); + ASSERT_EQ(vma->vm_end, 0x3000); + ASSERT_EQ(vma->vm_pgoff, 2); + + vm_area_free(vma); + mtree_destroy(&mm.mm_mt); + + return true; +} + +static bool test_simple_expand(void) +{ + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE; + struct mm_struct mm = {}; + struct vm_area_struct *vma = alloc_vma(&mm, 0, 0x1000, 0, flags); + VMA_ITERATOR(vmi, &mm, 0); + + ASSERT_FALSE(vma_link(&mm, vma)); + + ASSERT_FALSE(vma_expand(&vmi, vma, 0, 0x3000, 0, NULL)); + + ASSERT_EQ(vma->vm_start, 0); + ASSERT_EQ(vma->vm_end, 0x3000); + ASSERT_EQ(vma->vm_pgoff, 0); + + vm_area_free(vma); + mtree_destroy(&mm.mm_mt); + + return true; +} + +static bool test_simple_shrink(void) +{ + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE; + struct mm_struct mm = {}; + struct vm_area_struct *vma = alloc_vma(&mm, 0, 0x3000, 0, flags); + VMA_ITERATOR(vmi, &mm, 0); + + ASSERT_FALSE(vma_link(&mm, vma)); + + ASSERT_FALSE(vma_shrink(&vmi, vma, 0, 0x1000, 0)); + + ASSERT_EQ(vma->vm_start, 0); + ASSERT_EQ(vma->vm_end, 0x1000); + ASSERT_EQ(vma->vm_pgoff, 0); + + vm_area_free(vma); + mtree_destroy(&mm.mm_mt); + + return true; +} + +int main(void) +{ + int num_tests = 0, num_fail = 0; + + maple_tree_init(); + +#define TEST(name) \ + do { \ + num_tests++; \ + if (!test_##name()) { \ + num_fail++; \ + fprintf(stderr, "Test " #name " FAILED\n"); \ + } \ + } while (0) + + TEST(simple_merge); + TEST(simple_modify); + TEST(simple_expand); + TEST(simple_shrink); + +#undef TEST + + printf("%d tests run, %d passed, %d failed.\n", + num_tests, num_tests - num_fail, num_fail); + + return EXIT_SUCCESS; +} diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h new file mode 100644 index 000000000000..093560e5b2ac --- /dev/null +++ b/tools/testing/vma/vma_internal.h @@ -0,0 +1,882 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * vma_internal.h + * + * Header providing userland wrappers and shims for the functionality provided + * by mm/vma_internal.h. + * + * We make the header guard the same as mm/vma_internal.h, so if this shim + * header is included, it precludes the inclusion of the kernel one. + */ + +#ifndef __MM_VMA_INTERNAL_H +#define __MM_VMA_INTERNAL_H + +#define __private +#define __bitwise +#define __randomize_layout + +#define CONFIG_MMU +#define CONFIG_PER_VMA_LOCK + +#include + +#include +#include +#include +#include +#include + +#define VM_WARN_ON(_expr) (WARN_ON(_expr)) +#define VM_WARN_ON_ONCE(_expr) (WARN_ON_ONCE(_expr)) +#define VM_BUG_ON(_expr) (BUG_ON(_expr)) +#define VM_BUG_ON_VMA(_expr, _vma) (BUG_ON(_expr)) + +#define VM_NONE 0x00000000 +#define VM_READ 0x00000001 +#define VM_WRITE 0x00000002 +#define VM_EXEC 0x00000004 +#define VM_SHARED 0x00000008 +#define VM_MAYREAD 0x00000010 +#define VM_MAYWRITE 0x00000020 +#define VM_GROWSDOWN 0x00000100 +#define VM_PFNMAP 0x00000400 +#define VM_LOCKED 0x00002000 +#define VM_IO 0x00004000 +#define VM_DONTEXPAND 0x00040000 +#define VM_ACCOUNT 0x00100000 +#define VM_MIXEDMAP 0x10000000 +#define VM_STACK VM_GROWSDOWN +#define VM_SHADOW_STACK VM_NONE +#define VM_SOFTDIRTY 0 + +#define VM_ACCESS_FLAGS (VM_READ | VM_WRITE | VM_EXEC) +#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP) + +#define FIRST_USER_ADDRESS 0UL +#define USER_PGTABLES_CEILING 0UL + +#define vma_policy(vma) NULL + +#define down_write_nest_lock(sem, nest_lock) + +#define pgprot_val(x) ((x).pgprot) +#define __pgprot(x) ((pgprot_t) { (x) } ) + +#define for_each_vma(__vmi, __vma) \ + while (((__vma) = vma_next(&(__vmi))) != NULL) + +/* The MM code likes to work with exclusive end addresses */ +#define for_each_vma_range(__vmi, __vma, __end) \ + while (((__vma) = vma_find(&(__vmi), (__end))) != NULL) + +#define offset_in_page(p) ((unsigned long)(p) & ~PAGE_MASK) + +#define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT)) + +#define test_and_set_bit(nr, addr) __test_and_set_bit(nr, addr) +#define test_and_clear_bit(nr, addr) __test_and_clear_bit(nr, addr) + +#define TASK_SIZE ((1ul << 47)-PAGE_SIZE) + +#define AS_MM_ALL_LOCKS 2 + +#define current NULL + +/* We hardcode this for now. */ +#define sysctl_max_map_count 0x1000000UL + +#define pgoff_t unsigned long +typedef unsigned long pgprotval_t; +typedef struct pgprot { pgprotval_t pgprot; } pgprot_t; +typedef unsigned long vm_flags_t; +typedef __bitwise unsigned int vm_fault_t; + +typedef struct refcount_struct { + atomic_t refs; +} refcount_t; + +struct kref { + refcount_t refcount; +}; + +struct anon_vma { + struct anon_vma *root; + struct rb_root_cached rb_root; +}; + +struct anon_vma_chain { + struct anon_vma *anon_vma; + struct list_head same_vma; +}; + +struct anon_vma_name { + struct kref kref; + /* The name needs to be at the end because it is dynamically sized. */ + char name[]; +}; + +struct vma_iterator { + struct ma_state mas; +}; + +#define VMA_ITERATOR(name, __mm, __addr) \ + struct vma_iterator name = { \ + .mas = { \ + .tree = &(__mm)->mm_mt, \ + .index = __addr, \ + .node = NULL, \ + .status = ma_start, \ + }, \ + } + +struct address_space { + struct rb_root_cached i_mmap; + unsigned long flags; + atomic_t i_mmap_writable; +}; + +struct vm_userfaultfd_ctx {}; +struct mempolicy {}; +struct mmu_gather {}; +struct mutex {}; +#define DEFINE_MUTEX(mutexname) \ + struct mutex mutexname = {} + +struct mm_struct { + struct maple_tree mm_mt; + int map_count; /* number of VMAs */ + unsigned long total_vm; /* Total pages mapped */ + unsigned long locked_vm; /* Pages that have PG_mlocked set */ + unsigned long data_vm; /* VM_WRITE & ~VM_SHARED & ~VM_STACK */ + unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE & ~VM_STACK */ + unsigned long stack_vm; /* VM_STACK */ +}; + +struct vma_lock { + struct rw_semaphore lock; +}; + + +struct file { + struct address_space *f_mapping; +}; + +struct vm_area_struct { + /* The first cache line has the info for VMA tree walking. */ + + union { + struct { + /* VMA covers [vm_start; vm_end) addresses within mm */ + unsigned long vm_start; + unsigned long vm_end; + }; +#ifdef CONFIG_PER_VMA_LOCK + struct rcu_head vm_rcu; /* Used for deferred freeing. */ +#endif + }; + + struct mm_struct *vm_mm; /* The address space we belong to. */ + pgprot_t vm_page_prot; /* Access permissions of this VMA. */ + + /* + * Flags, see mm.h. + * To modify use vm_flags_{init|reset|set|clear|mod} functions. + */ + union { + const vm_flags_t vm_flags; + vm_flags_t __private __vm_flags; + }; + +#ifdef CONFIG_PER_VMA_LOCK + /* Flag to indicate areas detached from the mm->mm_mt tree */ + bool detached; + + /* + * Can only be written (using WRITE_ONCE()) while holding both: + * - mmap_lock (in write mode) + * - vm_lock->lock (in write mode) + * Can be read reliably while holding one of: + * - mmap_lock (in read or write mode) + * - vm_lock->lock (in read or write mode) + * Can be read unreliably (using READ_ONCE()) for pessimistic bailout + * while holding nothing (except RCU to keep the VMA struct allocated). + * + * This sequence counter is explicitly allowed to overflow; sequence + * counter reuse can only lead to occasional unnecessary use of the + * slowpath. + */ + int vm_lock_seq; + struct vma_lock *vm_lock; +#endif + + /* + * For areas with an address space and backing store, + * linkage into the address_space->i_mmap interval tree. + * + */ + struct { + struct rb_node rb; + unsigned long rb_subtree_last; + } shared; + + /* + * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma + * list, after a COW of one of the file pages. A MAP_SHARED vma + * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack + * or brk vma (with NULL file) can only be in an anon_vma list. + */ + struct list_head anon_vma_chain; /* Serialized by mmap_lock & + * page_table_lock */ + struct anon_vma *anon_vma; /* Serialized by page_table_lock */ + + /* Function pointers to deal with this struct. */ + const struct vm_operations_struct *vm_ops; + + /* Information about our backing store: */ + unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE + units */ + struct file * vm_file; /* File we map to (can be NULL). */ + void * vm_private_data; /* was vm_pte (shared mem) */ + +#ifdef CONFIG_ANON_VMA_NAME + /* + * For private and shared anonymous mappings, a pointer to a null + * terminated string containing the name given to the vma, or NULL if + * unnamed. Serialized by mmap_lock. Use anon_vma_name to access. + */ + struct anon_vma_name *anon_name; +#endif +#ifdef CONFIG_SWAP + atomic_long_t swap_readahead_info; +#endif +#ifndef CONFIG_MMU + struct vm_region *vm_region; /* NOMMU mapping region */ +#endif +#ifdef CONFIG_NUMA + struct mempolicy *vm_policy; /* NUMA policy for the VMA */ +#endif +#ifdef CONFIG_NUMA_BALANCING + struct vma_numab_state *numab_state; /* NUMA Balancing state */ +#endif + struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +} __randomize_layout; + +struct vm_fault {}; + +struct vm_operations_struct { + void (*open)(struct vm_area_struct * area); + /** + * @close: Called when the VMA is being removed from the MM. + * Context: User context. May sleep. Caller holds mmap_lock. + */ + void (*close)(struct vm_area_struct * area); + /* Called any time before splitting to check if it's allowed */ + int (*may_split)(struct vm_area_struct *area, unsigned long addr); + int (*mremap)(struct vm_area_struct *area); + /* + * Called by mprotect() to make driver-specific permission + * checks before mprotect() is finalised. The VMA must not + * be modified. Returns 0 if mprotect() can proceed. + */ + int (*mprotect)(struct vm_area_struct *vma, unsigned long start, + unsigned long end, unsigned long newflags); + vm_fault_t (*fault)(struct vm_fault *vmf); + vm_fault_t (*huge_fault)(struct vm_fault *vmf, unsigned int order); + vm_fault_t (*map_pages)(struct vm_fault *vmf, + pgoff_t start_pgoff, pgoff_t end_pgoff); + unsigned long (*pagesize)(struct vm_area_struct * area); + + /* notification that a previously read-only page is about to become + * writable, if an error is returned it will cause a SIGBUS */ + vm_fault_t (*page_mkwrite)(struct vm_fault *vmf); + + /* same as page_mkwrite when using VM_PFNMAP|VM_MIXEDMAP */ + vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf); + + /* called by access_process_vm when get_user_pages() fails, typically + * for use by special VMAs. See also generic_access_phys() for a generic + * implementation useful for any iomem mapping. + */ + int (*access)(struct vm_area_struct *vma, unsigned long addr, + void *buf, int len, int write); + + /* Called by the /proc/PID/maps code to ask the vma whether it + * has a special name. Returning non-NULL will also cause this + * vma to be dumped unconditionally. */ + const char *(*name)(struct vm_area_struct *vma); + +#ifdef CONFIG_NUMA + /* + * set_policy() op must add a reference to any non-NULL @new mempolicy + * to hold the policy upon return. Caller should pass NULL @new to + * remove a policy and fall back to surrounding context--i.e. do not + * install a MPOL_DEFAULT policy, nor the task or system default + * mempolicy. + */ + int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new); + + /* + * get_policy() op must add reference [mpol_get()] to any policy at + * (vma,addr) marked as MPOL_SHARED. The shared policy infrastructure + * in mm/mempolicy.c will do this automatically. + * get_policy() must NOT add a ref if the policy at (vma,addr) is not + * marked as MPOL_SHARED. vma policies are protected by the mmap_lock. + * If no [shared/vma] mempolicy exists at the addr, get_policy() op + * must return NULL--i.e., do not "fallback" to task or system default + * policy. + */ + struct mempolicy *(*get_policy)(struct vm_area_struct *vma, + unsigned long addr, pgoff_t *ilx); +#endif + /* + * Called by vm_normal_page() for special PTEs to find the + * page for @addr. This is useful if the default behavior + * (using pte_page()) would not find the correct page. + */ + struct page *(*find_special_page)(struct vm_area_struct *vma, + unsigned long addr); +}; + +static inline void vma_iter_invalidate(struct vma_iterator *vmi) +{ + mas_pause(&vmi->mas); +} + +static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot) +{ + return __pgprot(pgprot_val(oldprot) | pgprot_val(newprot)); +} + +static inline pgprot_t vm_get_page_prot(unsigned long vm_flags) +{ + return __pgprot(vm_flags); +} + +static inline bool is_shared_maywrite(vm_flags_t vm_flags) +{ + return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == + (VM_SHARED | VM_MAYWRITE); +} + +static inline bool vma_is_shared_maywrite(struct vm_area_struct *vma) +{ + return is_shared_maywrite(vma->vm_flags); +} + +static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi) +{ + /* + * Uses mas_find() to get the first VMA when the iterator starts. + * Calling mas_next() could skip the first entry. + */ + return mas_find(&vmi->mas, ULONG_MAX); +} + +static inline bool vma_lock_alloc(struct vm_area_struct *vma) +{ + vma->vm_lock = calloc(1, sizeof(struct vma_lock)); + + if (!vma->vm_lock) + return false; + + init_rwsem(&vma->vm_lock->lock); + vma->vm_lock_seq = -1; + + return true; +} + +static inline void vma_assert_write_locked(struct vm_area_struct *); +static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached) +{ + /* When detaching vma should be write-locked */ + if (detached) + vma_assert_write_locked(vma); + vma->detached = detached; +} + +extern const struct vm_operations_struct vma_dummy_vm_ops; + +static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm) +{ + memset(vma, 0, sizeof(*vma)); + vma->vm_mm = mm; + vma->vm_ops = &vma_dummy_vm_ops; + INIT_LIST_HEAD(&vma->anon_vma_chain); + vma_mark_detached(vma, false); +} + +static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm) +{ + struct vm_area_struct *vma = calloc(1, sizeof(struct vm_area_struct)); + + if (!vma) + return NULL; + + vma_init(vma, mm); + if (!vma_lock_alloc(vma)) { + free(vma); + return NULL; + } + + return vma; +} + +static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig) +{ + struct vm_area_struct *new = calloc(1, sizeof(struct vm_area_struct)); + + if (!new) + return NULL; + + memcpy(new, orig, sizeof(*new)); + if (!vma_lock_alloc(new)) { + free(new); + return NULL; + } + INIT_LIST_HEAD(&new->anon_vma_chain); + + return new; +} + +/* + * These are defined in vma.h, but sadly vm_stat_account() is referenced by + * kernel/fork.c, so we have to these broadly available there, and temporarily + * define them here to resolve the dependency cycle. + */ + +#define is_exec_mapping(flags) \ + ((flags & (VM_EXEC | VM_WRITE | VM_STACK)) == VM_EXEC) + +#define is_stack_mapping(flags) \ + (((flags & VM_STACK) == VM_STACK) || (flags & VM_SHADOW_STACK)) + +#define is_data_mapping(flags) \ + ((flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE) + +static inline void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, + long npages) +{ + WRITE_ONCE(mm->total_vm, READ_ONCE(mm->total_vm)+npages); + + if (is_exec_mapping(flags)) + mm->exec_vm += npages; + else if (is_stack_mapping(flags)) + mm->stack_vm += npages; + else if (is_data_mapping(flags)) + mm->data_vm += npages; +} + +#undef is_exec_mapping +#undef is_stack_mapping +#undef is_data_mapping + +/* Currently stubbed but we may later wish to un-stub. */ +static inline void vm_acct_memory(long pages); +static inline void vm_unacct_memory(long pages) +{ + vm_acct_memory(-pages); +} + +static inline void mapping_allow_writable(struct address_space *mapping) +{ + atomic_inc(&mapping->i_mmap_writable); +} + +static inline void vma_set_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end, + pgoff_t pgoff) +{ + vma->vm_start = start; + vma->vm_end = end; + vma->vm_pgoff = pgoff; +} + +static inline +struct vm_area_struct *vma_find(struct vma_iterator *vmi, unsigned long max) +{ + return mas_find(&vmi->mas, max - 1); +} + +static inline int vma_iter_clear_gfp(struct vma_iterator *vmi, + unsigned long start, unsigned long end, gfp_t gfp) +{ + __mas_set_range(&vmi->mas, start, end - 1); + mas_store_gfp(&vmi->mas, NULL, gfp); + if (unlikely(mas_is_err(&vmi->mas))) + return -ENOMEM; + + return 0; +} + +static inline void mmap_assert_locked(struct mm_struct *); +static inline struct vm_area_struct *find_vma_intersection(struct mm_struct *mm, + unsigned long start_addr, + unsigned long end_addr) +{ + unsigned long index = start_addr; + + mmap_assert_locked(mm); + return mt_find(&mm->mm_mt, &index, end_addr - 1); +} + +static inline +struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr) +{ + return mtree_load(&mm->mm_mt, addr); +} + +static inline struct vm_area_struct *vma_prev(struct vma_iterator *vmi) +{ + return mas_prev(&vmi->mas, 0); +} + +static inline void vma_iter_set(struct vma_iterator *vmi, unsigned long addr) +{ + mas_set(&vmi->mas, addr); +} + +static inline bool vma_is_anonymous(struct vm_area_struct *vma) +{ + return !vma->vm_ops; +} + +/* Defined in vma.h, so temporarily define here to avoid circular dependency. */ +#define vma_iter_load(vmi) \ + mas_walk(&(vmi)->mas) + +static inline struct vm_area_struct * +find_vma_prev(struct mm_struct *mm, unsigned long addr, + struct vm_area_struct **pprev) +{ + struct vm_area_struct *vma; + VMA_ITERATOR(vmi, mm, addr); + + vma = vma_iter_load(&vmi); + *pprev = vma_prev(&vmi); + if (!vma) + vma = vma_next(&vmi); + return vma; +} + +#undef vma_iter_load + +static inline void vma_iter_init(struct vma_iterator *vmi, + struct mm_struct *mm, unsigned long addr) +{ + mas_init(&vmi->mas, &mm->mm_mt, addr); +} + +/* Stubbed functions. */ + +static inline struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma) +{ + return NULL; +} + +static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, + struct vm_userfaultfd_ctx vm_ctx) +{ + return true; +} + +static inline bool anon_vma_name_eq(struct anon_vma_name *anon_name1, + struct anon_vma_name *anon_name2) +{ + return true; +} + +static inline void might_sleep(void) +{ +} + +static inline unsigned long vma_pages(struct vm_area_struct *vma) +{ + return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; +} + +static inline void fput(struct file *) +{ +} + +static inline void mpol_put(struct mempolicy *) +{ +} + +static inline void vma_lock_free(struct vm_area_struct *vma) +{ + free(vma->vm_lock); +} + +static inline void __vm_area_free(struct vm_area_struct *vma) +{ + vma_lock_free(vma); + free(vma); +} + +static inline void vm_area_free(struct vm_area_struct *vma) +{ + __vm_area_free(vma); +} + +static inline void lru_add_drain(void) +{ +} + +static inline void tlb_gather_mmu(struct mmu_gather *, struct mm_struct *) +{ +} + +static inline void update_hiwater_rss(struct mm_struct *) +{ +} + +static inline void update_hiwater_vm(struct mm_struct *) +{ +} + +static inline void unmap_vmas(struct mmu_gather *tlb, struct ma_state *mas, + struct vm_area_struct *vma, unsigned long start_addr, + unsigned long end_addr, unsigned long tree_end, + bool mm_wr_locked) +{ + (void)tlb; + (void)mas; + (void)vma; + (void)start_addr; + (void)end_addr; + (void)tree_end; + (void)mm_wr_locked; +} + +static inline void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas, + struct vm_area_struct *vma, unsigned long floor, + unsigned long ceiling, bool mm_wr_locked) +{ + (void)tlb; + (void)mas; + (void)vma; + (void)floor; + (void)ceiling; + (void)mm_wr_locked; +} + +static inline void mapping_unmap_writable(struct address_space *) +{ +} + +static inline void flush_dcache_mmap_lock(struct address_space *) +{ +} + +static inline void tlb_finish_mmu(struct mmu_gather *) +{ +} + +static inline void get_file(struct file *) +{ +} + +static inline int vma_dup_policy(struct vm_area_struct *, struct vm_area_struct *) +{ + return 0; +} + +static inline int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *) +{ + return 0; +} + +static inline void vma_start_write(struct vm_area_struct *) +{ +} + +static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, + unsigned long start, + unsigned long end, + long adjust_next) +{ + (void)vma; + (void)start; + (void)end; + (void)adjust_next; +} + +static inline void vma_iter_free(struct vma_iterator *vmi) +{ + mas_destroy(&vmi->mas); +} + +static inline void vm_acct_memory(long pages) +{ +} + +static inline void vma_interval_tree_insert(struct vm_area_struct *, + struct rb_root_cached *) +{ +} + +static inline void vma_interval_tree_remove(struct vm_area_struct *, + struct rb_root_cached *) +{ +} + +static inline void flush_dcache_mmap_unlock(struct address_space *) +{ +} + +static inline void anon_vma_interval_tree_insert(struct anon_vma_chain*, + struct rb_root_cached *) +{ +} + +static inline void anon_vma_interval_tree_remove(struct anon_vma_chain*, + struct rb_root_cached *) +{ +} + +static inline void uprobe_mmap(struct vm_area_struct *) +{ +} + +static inline void uprobe_munmap(struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + (void)vma; + (void)start; + (void)end; +} + +static inline void i_mmap_lock_write(struct address_space *) +{ +} + +static inline void anon_vma_lock_write(struct anon_vma *) +{ +} + +static inline void vma_assert_write_locked(struct vm_area_struct *) +{ +} + +static inline void unlink_anon_vmas(struct vm_area_struct *) +{ +} + +static inline void anon_vma_unlock_write(struct anon_vma *) +{ +} + +static inline void i_mmap_unlock_write(struct address_space *) +{ +} + +static inline void anon_vma_merge(struct vm_area_struct *, + struct vm_area_struct *) +{ +} + +static inline int userfaultfd_unmap_prep(struct vm_area_struct *vma, + unsigned long start, + unsigned long end, + struct list_head *unmaps) +{ + (void)vma; + (void)start; + (void)end; + (void)unmaps; + + return 0; +} + +static inline void mmap_write_downgrade(struct mm_struct *) +{ +} + +static inline void mmap_read_unlock(struct mm_struct *) +{ +} + +static inline void mmap_write_unlock(struct mm_struct *) +{ +} + +static inline bool can_modify_mm(struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + (void)mm; + (void)start; + (void)end; + + return true; +} + +static inline void arch_unmap(struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + (void)mm; + (void)start; + (void)end; +} + +static inline void mmap_assert_locked(struct mm_struct *) +{ +} + +static inline bool mpol_equal(struct mempolicy *, struct mempolicy *) +{ + return true; +} + +static inline void khugepaged_enter_vma(struct vm_area_struct *vma, + unsigned long vm_flags) +{ + (void)vma; + (void)vm_flags; +} + +static inline bool mapping_can_writeback(struct address_space *) +{ + return true; +} + +static inline bool is_vm_hugetlb_page(struct vm_area_struct *) +{ + return false; +} + +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *) +{ + return false; +} + +static inline bool userfaultfd_wp(struct vm_area_struct *) +{ + return false; +} + +static inline void mmap_assert_write_locked(struct mm_struct *) +{ +} + +static inline void mutex_lock(struct mutex *) +{ +} + +static inline void mutex_unlock(struct mutex *) +{ +} + +static inline bool mutex_is_locked(struct mutex *) +{ + return true; +} + +static inline bool signal_pending(void *) +{ + return false; +} + +#endif /* __MM_VMA_INTERNAL_H */