From patchwork Fri Jun 28 14:35:24 2024
X-Patchwork-Submitter: Lorenzo Stoakes
X-Patchwork-Id: 13716244
From: Lorenzo Stoakes <lstoakes@gmail.com>
To: Andrew Morton
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, "Liam R. Howlett", Vlastimil Babka,
    Matthew Wilcox, Alexander Viro, Christian Brauner, Jan Kara,
    Eric Biederman, Kees Cook, Suren Baghdasaryan, Lorenzo Stoakes
Howlett" , Vlastimil Babka , Matthew Wilcox , Alexander Viro , Christian Brauner , Jan Kara , Eric Biederman , Kees Cook , Suren Baghdasaryan , Lorenzo Stoakes Subject: [RFC PATCH v2 3/7] mm: move vma_shrink(), vma_expand() to internal header Date: Fri, 28 Jun 2024 15:35:24 +0100 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 2AA4B1C0003 X-Stat-Signature: 9zsu8a43rucfwyrnr9tz4ptexbkfdddp X-Rspam-User: X-HE-Tag: 1719585342-330729 X-HE-Meta: U2FsdGVkX1+LxsoltbK/JtgevaNYpnAC/N5s0ai31ZrEieR+7CPYVobYwQ+84r0H8QsoOWKPSgcHPm1LV9f/NRH1U516WEanj04NkRFI3lGG6eO54/ePmQ3hUXzcgp1t1+vwbHCiDKosI6+IlVXoA7Z5qG45a/OFnXY/MPraf9pbY+quDHYBaI0X+f0Wpq1gLiH1lnALQXZKcXhYK718sSM/d3zfopnBqR3vvH5WMGaOQmdmA99qclIyWrvkIzH9EZj8kiWaerWaG+yT4ZbSmGdjvIpl6UjNFW8yVvnk6Y22HeQNfSYd5rqj54lmnFNapu9pI4KHpNsPOY2Htt9JxoRYefH4to3SdzbvDfqC6neopiTE1DxjciIWi2JR6JmeBQ3x8/SW+ArT8WfDS7Hf9YuCZJM4iO2NczwZ+T9lR/uCZJL9GAAWQBMrUQjbg+KgxgHGHVLbA/kJYQsgq+U2Mf/YGhHxvM1eFLy8mvR7kWhBsRTzSB+ZXbKQ0kVkZdXBCl6lMRMkamaoXPHOH20qeW8XdcigA5D+sa31A4qHZVUCNqUoWbOiJWFppwSCCT3KUivt3xxJCUmxe5p4L5+d8r1UasFh+xCDDggaanKtzNnwAUJul2FBItYa/znA0Z5eFYzJkp6nj+5z7pRIXEe4LoUS3EML0HCgMKcoAG8gb5V4xHTHGynb4JY/oF5tnZjS98CHVpAJW7ng5vsBbwEhQz3R/q9nOhTh1A2+la3Ku1jS5HfsrZB3SoP0SuIdCOWHiSiLF0j6Krgipf4rDHJpxLQu/nKpoR5eItJAsRsrkKNxh9VM9u7rnZTSPmUV2D8As8kCJUWlqKcSr/XwY8msUgzhVnib9fbCPqnc1jGQFtUDp2GLa7fIRxzI8nJ24YKIrrsYc0yx1OwNribumnsQ/XxmT2CGqhjXb2Accc3xcR1JZ4vzcR3jLdO5AFpjp29n758hP5PYPycwHXeK8YS mkq1xs5B 9uyF4zxksbFZI5kIHLzzXDgHivTWFGvpUc9XBnG7MEwTqR84AjRd1nuQ516zrO/T+bSFV7VwUdAEyTkLmWEvwTvpcwrpDUdNptYxVO+ekZAHJz5SiR91X0SJ43ztoLBSWyPg2Mj7PAKt0YHQ3nyNZ6Jn1JCyuUHeca036HKFCHdLudboPE5fGf7wv7R/R1lh7XTinK2yn/nlWf4xT1f8PpPefmBGdHs3V/IykXeTU2AcICBqK4fxauQxsMa4s2b4pV/gV3eUuGsKi6QnK/AgBbnYLAETRktU8hcr8WxN8epI5CWp37j4N3mbzyzpC4t8LSJD73kcf5yYGV4ZmMQMVtwYrHvoZbWsWMQUo40YRw5Lfz7KOQLjkF3GVZqasL4yDGJ3m X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The vma_shrink() and vma_expand() functions are internal VMA manipulation functions which we ought to abstract for use outside of memory management code. To achieve this, we abstract the operation performed in fs/exec.c by shift_arg_pages() into a new relocate_vma() function implemented in mm/mmap.c, which enables us to also move move_page_tables() and vma_iter_prev_range() to internal.h. The purpose of doing this is to isolate key VMA manipulation functions in order that we can both abstract them and later render them easily testable. Signed-off-by: Lorenzo Stoakes --- fs/exec.c | 68 ++------------------------------------ include/linux/mm.h | 17 +--------- mm/internal.h | 18 +++++++++++ mm/mmap.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 102 insertions(+), 82 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 40073142288f..5cf53e20d8df 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -683,75 +683,11 @@ static int copy_strings_kernel(int argc, const char *const *argv, /* * During bprm_mm_init(), we create a temporary stack at STACK_TOP_MAX. Once * the binfmt code determines where the new stack should reside, we shift it to - * its final location. The process proceeds as follows: - * - * 1) Use shift to calculate the new vma endpoints. - * 2) Extend vma to cover both the old and new ranges. This ensures the - * arguments passed to subsequent functions are consistent. 
diff --git a/fs/exec.c b/fs/exec.c
index 40073142288f..5cf53e20d8df 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -683,75 +683,11 @@ static int copy_strings_kernel(int argc, const char *const *argv,
 /*
  * During bprm_mm_init(), we create a temporary stack at STACK_TOP_MAX. Once
  * the binfmt code determines where the new stack should reside, we shift it to
- * its final location. The process proceeds as follows:
- *
- * 1) Use shift to calculate the new vma endpoints.
- * 2) Extend vma to cover both the old and new ranges. This ensures the
- *    arguments passed to subsequent functions are consistent.
- * 3) Move vma's page tables to the new range.
- * 4) Free up any cleared pgd range.
- * 5) Shrink the vma to cover only the new range.
+ * its final location.
  */
 static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 {
-	struct mm_struct *mm = vma->vm_mm;
-	unsigned long old_start = vma->vm_start;
-	unsigned long old_end = vma->vm_end;
-	unsigned long length = old_end - old_start;
-	unsigned long new_start = old_start - shift;
-	unsigned long new_end = old_end - shift;
-	VMA_ITERATOR(vmi, mm, new_start);
-	struct vm_area_struct *next;
-	struct mmu_gather tlb;
-
-	BUG_ON(new_start > new_end);
-
-	/*
-	 * ensure there are no vmas between where we want to go
-	 * and where we are
-	 */
-	if (vma != vma_next(&vmi))
-		return -EFAULT;
-
-	vma_iter_prev_range(&vmi);
-	/*
-	 * cover the whole range: [new_start, old_end)
-	 */
-	if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
-		return -ENOMEM;
-
-	/*
-	 * move the page tables downwards, on failure we rely on
-	 * process cleanup to remove whatever mess we made.
-	 */
-	if (length != move_page_tables(vma, old_start,
-		       vma, new_start, length, false, true))
-		return -ENOMEM;
-
-	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm);
-	next = vma_next(&vmi);
-	if (new_end > old_start) {
-		/*
-		 * when the old and new regions overlap clear from new_end.
-		 */
-		free_pgd_range(&tlb, new_end, old_end, new_end,
-			next ? next->vm_start : USER_PGTABLES_CEILING);
-	} else {
-		/*
-		 * otherwise, clean from old_start; this is done to not touch
-		 * the address space in [new_end, old_start) some architectures
-		 * have constraints on va-space that make this illegal (IA64) -
-		 * for the others its just a little faster.
-		 */
-		free_pgd_range(&tlb, old_start, old_end, new_end,
-			next ? next->vm_start : USER_PGTABLES_CEILING);
-	}
-	tlb_finish_mmu(&tlb);
-
-	vma_prev(&vmi);
-	/* Shrink the vma to just the new range */
-	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
+	return relocate_vma(vma, shift);
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4d2b5538925b..ab4b70f2ce94 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -998,12 +998,6 @@ static inline struct vm_area_struct *vma_prev(struct vma_iterator *vmi)
 	return mas_prev(&vmi->mas, 0);
 }
 
-static inline
-struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi)
-{
-	return mas_prev_range(&vmi->mas, 0);
-}
-
 static inline unsigned long vma_iter_addr(struct vma_iterator *vmi)
 {
 	return vmi->mas.index;
@@ -2523,11 +2517,6 @@ int set_page_dirty_lock(struct page *page);
 
 int get_cmdline(struct task_struct *task, char *buffer, int buflen);
 
-extern unsigned long move_page_tables(struct vm_area_struct *vma,
-		unsigned long old_addr, struct vm_area_struct *new_vma,
-		unsigned long new_addr, unsigned long len,
-		bool need_rmap_locks, bool for_stack);
-
 /*
  * Flags used by change_protection(). For now we make it a bitmap so
  * that we can pass in multiple flags just like parameters. However
@@ -3273,11 +3262,6 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
 
 /* mmap.c */
 extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
-extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
-		      unsigned long start, unsigned long end, pgoff_t pgoff,
-		      struct vm_area_struct *next);
-extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
-		      unsigned long start, unsigned long end, pgoff_t pgoff);
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
 extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
 extern void unlink_file_vma(struct vm_area_struct *);
@@ -3285,6 +3269,7 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
 	unsigned long addr, unsigned long len, pgoff_t pgoff,
 	bool *need_rmap_locks);
 extern void exit_mmap(struct mm_struct *);
+extern int relocate_vma(struct vm_area_struct *vma, unsigned long shift);
 
 static inline int check_data_rlimit(unsigned long rlim,
 				    unsigned long new,
diff --git a/mm/internal.h b/mm/internal.h
index 164f03c6bce2..8c7aa5860df4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1304,6 +1304,12 @@ static inline struct vm_area_struct
 			 vma_policy(vma), new_ctx, anon_vma_name(vma));
 }
 
+int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+	       unsigned long start, unsigned long end, pgoff_t pgoff,
+	       struct vm_area_struct *next);
+int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
+	       unsigned long start, unsigned long end, pgoff_t pgoff);
+
 enum {
 	/* mark page accessed */
 	FOLL_TOUCH = 1 << 16,
@@ -1527,6 +1533,12 @@ static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
 	return 0;
 }
 
+static inline
+struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi)
+{
+	return mas_prev_range(&vmi->mas, 0);
+}
+
 /*
  * VMA lock generalization
 */
@@ -1638,4 +1650,10 @@ void unlink_file_vma_batch_init(struct unlink_vma_file_batch *);
 void unlink_file_vma_batch_add(struct unlink_vma_file_batch *,
 			       struct vm_area_struct *);
 void unlink_file_vma_batch_final(struct unlink_vma_file_batch *);
+/* mremap.c */
+unsigned long move_page_tables(struct vm_area_struct *vma,
+	unsigned long old_addr, struct vm_area_struct *new_vma,
+	unsigned long new_addr, unsigned long len,
+	bool need_rmap_locks, bool for_stack);
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index e42d89f98071..d2eebbed87b9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -4058,3 +4058,84 @@ static int __meminit init_reserve_notifier(void)
 	return 0;
 }
 subsys_initcall(init_reserve_notifier);
+
+/*
+ * Relocate a VMA downwards by shift bytes. There cannot be any VMAs between
+ * this VMA and its relocated range, which will now reside at [vma->vm_start -
+ * shift, vma->vm_end - shift).
+ *
+ * This function is almost certainly NOT what you want for anything other than
+ * early executable temporary stack relocation.
+ */
+int relocate_vma(struct vm_area_struct *vma, unsigned long shift)
+{
+	/*
+	 * The process proceeds as follows:
+	 *
+	 * 1) Use shift to calculate the new vma endpoints.
+	 * 2) Extend vma to cover both the old and new ranges. This ensures the
+	 *    arguments passed to subsequent functions are consistent.
+	 * 3) Move vma's page tables to the new range.
+	 * 4) Free up any cleared pgd range.
+	 * 5) Shrink the vma to cover only the new range.
+	 */
+
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned long old_start = vma->vm_start;
+	unsigned long old_end = vma->vm_end;
+	unsigned long length = old_end - old_start;
+	unsigned long new_start = old_start - shift;
+	unsigned long new_end = old_end - shift;
+	VMA_ITERATOR(vmi, mm, new_start);
+	struct vm_area_struct *next;
+	struct mmu_gather tlb;
+
+	BUG_ON(new_start > new_end);
+
+	/*
+	 * ensure there are no vmas between where we want to go
+	 * and where we are
+	 */
+	if (vma != vma_next(&vmi))
+		return -EFAULT;
+
+	vma_iter_prev_range(&vmi);
+	/*
+	 * cover the whole range: [new_start, old_end)
+	 */
+	if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
+		return -ENOMEM;
+
+	/*
+	 * move the page tables downwards, on failure we rely on
+	 * process cleanup to remove whatever mess we made.
+	 */
+	if (length != move_page_tables(vma, old_start,
+		       vma, new_start, length, false, true))
+		return -ENOMEM;
+
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, mm);
+	next = vma_next(&vmi);
+	if (new_end > old_start) {
+		/*
+		 * when the old and new regions overlap clear from new_end.
+		 */
+		free_pgd_range(&tlb, new_end, old_end, new_end,
+			next ? next->vm_start : USER_PGTABLES_CEILING);
+	} else {
+		/*
+		 * otherwise, clean from old_start; this is done to not touch
+		 * the address space in [new_end, old_start) some architectures
+		 * have constraints on va-space that make this illegal (IA64) -
+		 * for the others its just a little faster.
+		 */
+		free_pgd_range(&tlb, old_start, old_end, new_end,
+			next ? next->vm_start : USER_PGTABLES_CEILING);
+	}
+	tlb_finish_mmu(&tlb);
+
+	vma_prev(&vmi);
+	/* Shrink the vma to just the new range */
+	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
+}
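
Aside for reviewers: the subtlest part of the moved code is choosing where
free_pgd_range() starts clearing. A minimal standalone sketch of that
arithmetic (userspace C with hypothetical addresses and sizes; illustrative
only, not part of the patch):

	#include <assert.h>
	#include <stdio.h>

	int main(void)
	{
		/* Hypothetical values: a 16KiB stack VMA shifted down 8KiB. */
		unsigned long old_start = 0x7f0000004000UL;
		unsigned long old_end   = 0x7f0000008000UL;
		unsigned long shift     = 0x2000UL;
		unsigned long new_start = old_start - shift;
		unsigned long new_end   = old_end - shift;

		/*
		 * Mirrors relocate_vma(): when the old and new ranges overlap
		 * (new_end > old_start), [new_start, new_end) is now live, so
		 * page tables are freed from new_end only; otherwise the
		 * whole old range is cleared starting at old_start.
		 */
		unsigned long free_from = new_end > old_start ? new_end
							      : old_start;

		assert(free_from == new_end); /* shift < length: overlap */
		printf("vma now [%#lx, %#lx)\n", new_start, new_end);
		printf("free pgd range [%#lx, %#lx)\n", free_from, old_end);
		return 0;
	}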