[RFC,0/7] Make core VMA operations internal and testable

Message ID: cover.1719481836.git.lstoakes@gmail.com

Lorenzo Stoakes June 27, 2024, 10:39 a.m. UTC
There are a number of "core" VMA manipulation functions implemented in
mm/mmap.c, notably those concerning VMA merging, splitting, modifying,
expanding and shrinking, which logically don't belong there.

More importantly this functionality represents an internal implementation
detail of memory management and should not be exposed outside of mm/
itself.

This patch series isolates core VMA manipulation functionality into its own
file, mm/vma.c, and provides an API to the rest of the mm code in mm/vma.h.

Importantly, it also carefully implements mm/vma_internal.h, which
specifies which headers need to be imported by vma.c, leading to the very
useful property that vma.c depends only on mm/vma.h and mm/vma_internal.h.

This is useful, because we can then re-implement vma_internal.h in
userland, stubbing out and adding shims for kernel mechanisms as required,
and then can directly and very easily unit test internal VMA functionality.

This patch series takes advantage of existing shim logic and full userland
maple tree support contained in tools/testing/radix-tree/ and
tools/include/linux/, separating out shared components of the radix tree
implementation to provide this testing.

Kernel functionality is stubbed and shimmed as needed in tools/testing/vma/
which contains a fully functional userland vma_internal.h file and which
imports mm/vma.c and mm/vma.h to be directly tested from userland.

A simple, skeleton testing implementation is provided in
tools/testing/vma/main.c as a proof-of-concept, asserting that simple VMA
merge, modify (testing split), expand and shrink functionality works
correctly.
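
As a rough illustration of the shim idea (not code from this series), the
sketch below keeps the logic under test dependent only on declarations that a
replaceable header would provide, so a userland build can swap in no-op stubs.
All toy_* names are hypothetical and stand in for the real vma_internal.h
contents, which are far larger.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userland stand-in for a kernel-only internal header.
 * In a real setup this part would live in its own header file.
 */
struct toy_lock { int unused; };

/* Stub: locking is a no-op in a single-threaded userland test. */
static inline void toy_lock_acquire(struct toy_lock *lock) { (void)lock; }
static inline void toy_lock_release(struct toy_lock *lock) { (void)lock; }

/* Minimal stand-in for the structure under test: the range [start, end). */
struct toy_vma {
	unsigned long start;
	unsigned long end;
};

/*
 * "Core" logic under test. It only uses what the shim above declares,
 * so the same source could be built against either implementation.
 */
static bool toy_can_merge(const struct toy_vma *prev,
			  const struct toy_vma *next)
{
	/* Only directly adjacent ranges are mergeable in this toy model. */
	return prev && next && prev->end == next->start;
}

/* Merge next into prev. Returns 0 on success, -1 if not mergeable. */
static int toy_merge(struct toy_lock *lock, struct toy_vma *prev,
		     const struct toy_vma *next)
{
	int ret = -1;

	toy_lock_acquire(lock);
	if (toy_can_merge(prev, next)) {
		prev->end = next->end;
		ret = 0;
	}
	toy_lock_release(lock);

	return ret;
}
```

A userland test can then call toy_merge() directly and assert on the resulting
ranges, with no kernel present.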

Lorenzo Stoakes (7):
  userfaultfd: move core VMA manipulation logic to mm/userfaultfd.c
  mm: move vma_modify() and helpers to internal header
  mm: unexport vma_expand() / vma_shrink()
  mm: move internal core VMA manipulation functions to own file
  MAINTAINERS: Add entry for new VMA files
  tools: separate out shared radix-tree components
  tools: add skeleton code for userland testing of VMA logic

 MAINTAINERS                                   |   14 +
 fs/exec.c                                     |   26 +-
 fs/userfaultfd.c                              |  160 +-
 include/linux/mm.h                            |  104 +-
 include/linux/userfaultfd_k.h                 |   19 +
 mm/Makefile                                   |    2 +-
 mm/gup.c                                      |    1 +
 mm/huge_memory.c                              |    1 +
 mm/internal.h                                 |  160 +-
 mm/madvise.c                                  |    1 +
 mm/memory.c                                   |    1 +
 mm/mempolicy.c                                |    1 +
 mm/mlock.c                                    |    1 +
 mm/mmap.c                                     | 1808 +----------------
 mm/mmu_notifier.c                             |    2 +
 mm/mprotect.c                                 |    1 +
 mm/mremap.c                                   |    1 +
 mm/mseal.c                                    |    2 +
 mm/rmap.c                                     |    1 +
 mm/userfaultfd.c                              |  170 ++
 mm/vma.c                                      | 1766 ++++++++++++++++
 mm/vma.h                                      |  356 ++++
 mm/vma_internal.h                             |  143 ++
 tools/testing/radix-tree/Makefile             |   68 +-
 tools/testing/radix-tree/maple.c              |   14 +-
 tools/testing/radix-tree/xarray.c             |    9 +-
 tools/testing/shared/autoconf.h               |    2 +
 tools/testing/{radix-tree => shared}/bitmap.c |    0
 tools/testing/{radix-tree => shared}/linux.c  |    0
 .../{radix-tree => shared}/linux/bug.h        |    0
 .../{radix-tree => shared}/linux/cpu.h        |    0
 .../{radix-tree => shared}/linux/idr.h        |    0
 .../{radix-tree => shared}/linux/init.h       |    0
 .../{radix-tree => shared}/linux/kconfig.h    |    0
 .../{radix-tree => shared}/linux/kernel.h     |    0
 .../{radix-tree => shared}/linux/kmemleak.h   |    0
 .../{radix-tree => shared}/linux/local_lock.h |    0
 .../{radix-tree => shared}/linux/lockdep.h    |    0
 .../{radix-tree => shared}/linux/maple_tree.h |    0
 .../{radix-tree => shared}/linux/percpu.h     |    0
 .../{radix-tree => shared}/linux/preempt.h    |    0
 .../{radix-tree => shared}/linux/radix-tree.h |    0
 .../{radix-tree => shared}/linux/rcupdate.h   |    0
 .../{radix-tree => shared}/linux/xarray.h     |    0
 tools/testing/shared/maple-shared.h           |    9 +
 tools/testing/shared/maple-shim.c             |    7 +
 tools/testing/shared/shared.h                 |   34 +
 tools/testing/shared/shared.mk                |   68 +
 .../testing/shared/trace/events/maple_tree.h  |    5 +
 tools/testing/shared/xarray-shared.c          |    5 +
 tools/testing/shared/xarray-shared.h          |    4 +
 tools/testing/vma/.gitignore                  |    7 +
 tools/testing/vma/Makefile                    |   18 +
 tools/testing/vma/errors.txt                  |    0
 tools/testing/vma/generated/autoconf.h        |    2 +
 tools/testing/vma/linux/atomic.h              |   19 +
 tools/testing/vma/linux/mmzone.h              |   37 +
 tools/testing/vma/main.c                      |  161 ++
 tools/testing/vma/vma.h                       |    3 +
 tools/testing/vma/vma_internal.h              |  843 ++++++++
 tools/testing/vma/vma_stub.c                  |    6 +
 61 files changed, 3800 insertions(+), 2262 deletions(-)
 create mode 100644 mm/vma.c
 create mode 100644 mm/vma.h
 create mode 100644 mm/vma_internal.h
 create mode 100644 tools/testing/shared/autoconf.h
 rename tools/testing/{radix-tree => shared}/bitmap.c (100%)
 rename tools/testing/{radix-tree => shared}/linux.c (100%)
 rename tools/testing/{radix-tree => shared}/linux/bug.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/cpu.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/idr.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/init.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/kconfig.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/kernel.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/kmemleak.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/local_lock.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/lockdep.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/maple_tree.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/percpu.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/preempt.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/radix-tree.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/rcupdate.h (100%)
 rename tools/testing/{radix-tree => shared}/linux/xarray.h (100%)
 create mode 100644 tools/testing/shared/maple-shared.h
 create mode 100644 tools/testing/shared/maple-shim.c
 create mode 100644 tools/testing/shared/shared.h
 create mode 100644 tools/testing/shared/shared.mk
 create mode 100644 tools/testing/shared/trace/events/maple_tree.h
 create mode 100644 tools/testing/shared/xarray-shared.c
 create mode 100644 tools/testing/shared/xarray-shared.h
 create mode 100644 tools/testing/vma/.gitignore
 create mode 100644 tools/testing/vma/Makefile
 create mode 100644 tools/testing/vma/errors.txt
 create mode 100644 tools/testing/vma/generated/autoconf.h
 create mode 100644 tools/testing/vma/linux/atomic.h
 create mode 100644 tools/testing/vma/linux/mmzone.h
 create mode 100644 tools/testing/vma/main.c
 create mode 100644 tools/testing/vma/vma.h
 create mode 100644 tools/testing/vma/vma_internal.h
 create mode 100644 tools/testing/vma/vma_stub.c

--
2.45.1

Liam R. Howlett June 27, 2024, 5:45 p.m. UTC | #1
* Lorenzo Stoakes <lstoakes@gmail.com> [240627 06:39]:
> The vma_expand() and vma_shrink() functions are core VMA manipulation
> functions which ultimately invoke VMA split/merge. In order to make these
> testable, it is convenient to place all such core functions in a header
> internal to mm/.
> 

The sole user doesn't cause a split or merge, it relocates a vma by
'sliding' the window of the vma by expand/shrink with the moving of page
tables in the middle of the slide.

It slides to relocate the vma start/end and keep the vma pointer
constant.
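
To make the slide concrete, here is a toy model of the range arithmetic only,
assuming a downward shift as in shift_arg_pages(). The toy_* names are
hypothetical; no maple tree, locking or page table work is modelled, and the
middle step is just a comment.

```c
#include <assert.h>

/* Toy stand-in for a VMA: covers the range [start, end). */
struct toy_vma {
	unsigned long start;
	unsigned long end;
};

/* Step 1: grow downwards so the VMA covers [old_start - shift, old_end). */
static int toy_expand_bottom(struct toy_vma *vma, unsigned long shift)
{
	if (shift > vma->start)
		return -1;	/* would wrap below address 0 */

	vma->start -= shift;
	return 0;
}

/* Step 3: drop the top so the VMA covers [new_start, old_end - shift). */
static int toy_shrink_top(struct toy_vma *vma, unsigned long shift)
{
	if (shift >= vma->end - vma->start)
		return -1;	/* nothing would be left of the VMA */

	vma->end -= shift;
	return 0;
}

/*
 * Slide the window down by shift bytes. The same struct toy_vma object is
 * updated throughout, so the caller's pointer stays valid.
 */
static int toy_slide_down(struct toy_vma *vma, unsigned long shift)
{
	if (toy_expand_bottom(vma, shift))
		return -1;

	/* Step 2: the real code moves the page tables at this point. */

	return toy_shrink_top(vma, shift);
}
```

For a VMA covering [0x7000, 0x9000) and a shift of 0x1000, the intermediate
range is [0x6000, 0x9000) and the final range is [0x6000, 0x8000).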

> In addition, it is safer to abstract direct access to such functionality so
> we can better control how other parts of the kernel use them, which
> provides us the freedom to change how this functionality behaves as needed
> without having to worry about how this functionality is used elsewhere.
> 
> In order to service both these requirements, we provide abstractions for
> the sole external user of these functions, shift_arg_pages() in fs/exec.c.
> 
> We provide vma_expand_bottom() and vma_shrink_top() functions which better
> match the semantics of what shift_arg_pages() is trying to accomplish by
> explicitly wrapping the safe expansion of the bottom of a VMA and the
> shrinking of the top of a VMA.
> 
> As a result, we place the vma_shrink() and vma_expand() functions into
> mm/internal.h to unexport them from use by any other part of the kernel.

There is no point to have vma_shrink() have a wrapper since this is the
only place it's ever used.  So we're wrapping a function that's only
called once.

I'd rather a vma_relocate() do everything in this function than wrap
them.  The only other thing it does is the page table moving and freeing
- which we have to do in the vma code.  We'd expose something we want no
one to use - but we already have two of those here.

> 
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> ---
>  fs/exec.c          | 26 +++++--------------
>  include/linux/mm.h |  9 +++----
>  mm/internal.h      |  6 +++++
>  mm/mmap.c          | 65 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 82 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 40073142288f..1cb3bf323e0f 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -700,25 +700,14 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
>  	unsigned long length = old_end - old_start;
>  	unsigned long new_start = old_start - shift;
>  	unsigned long new_end = old_end - shift;
> -	VMA_ITERATOR(vmi, mm, new_start);
> +	VMA_ITERATOR(vmi, mm, 0);
>  	struct vm_area_struct *next;
>  	struct mmu_gather tlb;
> +	int ret;
>  
> -	BUG_ON(new_start > new_end);
> -
> -	/*
> -	 * ensure there are no vmas between where we want to go
> -	 * and where we are
> -	 */
> -	if (vma != vma_next(&vmi))
> -		return -EFAULT;
> -
> -	vma_iter_prev_range(&vmi);
> -	/*
> -	 * cover the whole range: [new_start, old_end)
> -	 */
> -	if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
> -		return -ENOMEM;
> +	ret = vma_expand_bottom(&vmi, vma, shift, &next);
> +	if (ret)
> +		return ret;
>  
>  	/*
>  	 * move the page tables downwards, on failure we rely on
> @@ -730,7 +719,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
>  
>  	lru_add_drain();
>  	tlb_gather_mmu(&tlb, mm);
> -	next = vma_next(&vmi);
> +
>  	if (new_end > old_start) {
>  		/*
>  		 * when the old and new regions overlap clear from new_end.
> @@ -749,9 +738,8 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
>  	}
>  	tlb_finish_mmu(&tlb);
>  
> -	vma_prev(&vmi);
>  	/* Shrink the vma to just the new range */
> -	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
> +	return vma_shrink_top(&vmi, vma, shift);
>  }
>  
>  /*
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4d2b5538925b..e3220439cf75 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3273,11 +3273,10 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
>  
>  /* mmap.c */
>  extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
> -extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> -		      unsigned long start, unsigned long end, pgoff_t pgoff,
> -		      struct vm_area_struct *next);
> -extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> -		       unsigned long start, unsigned long end, pgoff_t pgoff);
> +extern int vma_expand_bottom(struct vma_iterator *vmi, struct vm_area_struct *vma,
> +			     unsigned long shift, struct vm_area_struct **next);
> +extern int vma_shrink_top(struct vma_iterator *vmi, struct vm_area_struct *vma,
> +			  unsigned long shift);
>  extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
>  extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
>  extern void unlink_file_vma(struct vm_area_struct *);
> diff --git a/mm/internal.h b/mm/internal.h
> index c8177200c943..f7779727bb78 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1305,6 +1305,12 @@ static inline struct vm_area_struct
>  			  vma_policy(vma), new_ctx, anon_vma_name(vma));
>  }
>  
> +int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> +	       unsigned long start, unsigned long end, pgoff_t pgoff,
> +		      struct vm_area_struct *next);
> +int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> +	       unsigned long start, unsigned long end, pgoff_t pgoff);
> +
>  enum {
>  	/* mark page accessed */
>  	FOLL_TOUCH = 1 << 16,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index e42d89f98071..574e69a04ebe 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3940,6 +3940,71 @@ void mm_drop_all_locks(struct mm_struct *mm)
>  	mutex_unlock(&mm_all_locks_mutex);
>  }
>  
> +/*
> + * vma_expand_bottom() - Expands the bottom of a VMA downwards. An error will
> + *                       arise if there is another VMA in the expanded range, or
> + *                       if the expansion fails. This function leaves the VMA
> + *                       iterator, vmi, positioned at the newly expanded VMA.
> + * @vmi: The VMA iterator.
> + * @vma: The VMA to modify.
> + * @shift: The number of bytes by which to expand the bottom of the VMA.
> + * @next: Output parameter, pointing at the VMA immediately succeeding the newly
> + *        expanded VMA.
> + *
> + * Returns: 0 on success, an error code otherwise.
> + */
> +int vma_expand_bottom(struct vma_iterator *vmi, struct vm_area_struct *vma,
> +		      unsigned long shift, struct vm_area_struct **next)
> +{
> +	unsigned long old_start = vma->vm_start;
> +	unsigned long old_end = vma->vm_end;
> +	unsigned long new_start = old_start - shift;
> +	unsigned long new_end = old_end - shift;
> +
> +	BUG_ON(new_start > new_end);
> +
> +	vma_iter_set(vmi, new_start);
> +
> +	/*
> +	 * ensure there are no vmas between where we want to go
> +	 * and where we are
> +	 */
> +	if (vma != vma_next(vmi))
> +		return -EFAULT;
> +
> +	vma_iter_prev_range(vmi);
> +
> +	/*
> +	 * cover the whole range: [new_start, old_end)
> +	 */
> +	if (vma_expand(vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
> +		return -ENOMEM;
> +
> +	*next = vma_next(vmi);
> +	vma_prev(vmi);
> +
> +	return 0;
> +}
> +
> +/*
> + * vma_shrink_top() - Reduce an existing VMA's memory area by shift bytes from
> + *                    the top of the VMA.
> + * @vmi: The VMA iterator, must be positioned at the VMA.
> + * @vma: The VMA to modify.
> + * @shift: The number of bytes by which to shrink the VMA.
> + *
> + * Returns: 0 on success, an error code otherwise.
> + */
> +int vma_shrink_top(struct vma_iterator *vmi, struct vm_area_struct *vma,
> +		   unsigned long shift)
> +{
> +	if (shift >= vma->vm_end - vma->vm_start)
> +		return -EINVAL;
> +
> +	return vma_shrink(vmi, vma, vma->vm_start, vma->vm_end - shift,
> +			  vma->vm_pgoff);
> +}
> +
>  /*
>   * initialise the percpu counter for VM
>   */
> -- 
> 2.45.1
>
Lorenzo Stoakes June 27, 2024, 7:38 p.m. UTC | #2
On Thu, Jun 27, 2024 at 01:45:34PM -0400, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lstoakes@gmail.com> [240627 06:39]:
> > The vma_expand() and vma_shrink() functions are core VMA manipulation
> > functions which ultimately invoke VMA split/merge. In order to make these
> > testable, it is convenient to place all such core functions in a header
> > internal to mm/.
> >
>
> The sole user doesn't cause a split or merge, it relocates a vma by
> 'sliding' the window of the vma by expand/shrink with the moving of page
> tables in the middle of the slide.
>
> It slides to relocate the vma start/end and keep the vma pointer
> constant.

Yeah sorry, I actually don't know why I said this (I did say 'ultimately'
again as well!). As you say, and as I was in fact aware, this doesn't invoke
split/merge. I will put this down to me being tired when I wrote this :)

Will fix.

>
> > In addition, it is safer to abstract direct access to such functionality so
> > we can better control how other parts of the kernel use them, which
> > provides us the freedom to change how this functionality behaves as needed
> > without having to worry about how this functionality is used elsewhere.
> >
> > In order to service both these requirements, we provide abstractions for
> > the sole external user of these functions, shift_arg_pages() in fs/exec.c.
> >
> > We provide vma_expand_bottom() and vma_shrink_top() functions which better
> > match the semantics of what shift_arg_pages() is trying to accomplish by
> > explicitly wrapping the safe expansion of the bottom of a VMA and the
> > shrinking of the top of a VMA.
> >
> > As a result, we place the vma_shrink() and vma_expand() functions into
> > mm/internal.h to unexport them from use by any other part of the kernel.
>
> There is no point to have vma_shrink() have a wrapper since this is the
> only place it's ever used.  So we're wrapping a function that's only
> called once.

Yeah, that was a sketchy part of this change. I feel the vma_expand() case
is a lot more defensible; as for the vma_shrink() one, well, I expected I
might get some feedback on it anyway :)

This was obviously to try to find a way to abstract these away from fs/ in
some vaguely sensible fashion while retaining functionality.

>
> I'd rather a vma_relocate() do everything in this function than wrap
> them.  The only other thing it does is the page table moving and freeing
> - which we have to do in the vma code.  We'd expose something we want no
> one to use - but we already have two of those here.

Right, I think I was trying to avoid exposing _the whole thing_ as it's so
specific and not so nice to make available, but at the same time, it is
perhaps the only reasonable way forward to avoid the vma_shrink()
micro-wrapper.

So yeah, will rework with a vma_relocate() or similar. As you say, we can't
really get away from exposing something nasty here.

>
> >
> > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> > ---
> >  fs/exec.c          | 26 +++++--------------
> >  include/linux/mm.h |  9 +++----
> >  mm/internal.h      |  6 +++++
> >  mm/mmap.c          | 65 ++++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 82 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 40073142288f..1cb3bf323e0f 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -700,25 +700,14 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> >  	unsigned long length = old_end - old_start;
> >  	unsigned long new_start = old_start - shift;
> >  	unsigned long new_end = old_end - shift;
> > -	VMA_ITERATOR(vmi, mm, new_start);
> > +	VMA_ITERATOR(vmi, mm, 0);
> >  	struct vm_area_struct *next;
> >  	struct mmu_gather tlb;
> > +	int ret;
> >
> > -	BUG_ON(new_start > new_end);
> > -
> > -	/*
> > -	 * ensure there are no vmas between where we want to go
> > -	 * and where we are
> > -	 */
> > -	if (vma != vma_next(&vmi))
> > -		return -EFAULT;
> > -
> > -	vma_iter_prev_range(&vmi);
> > -	/*
> > -	 * cover the whole range: [new_start, old_end)
> > -	 */
> > -	if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
> > -		return -ENOMEM;
> > +	ret = vma_expand_bottom(&vmi, vma, shift, &next);
> > +	if (ret)
> > +		return ret;
> >
> >  	/*
> >  	 * move the page tables downwards, on failure we rely on
> > @@ -730,7 +719,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> >
> >  	lru_add_drain();
> >  	tlb_gather_mmu(&tlb, mm);
> > -	next = vma_next(&vmi);
> > +
> >  	if (new_end > old_start) {
> >  		/*
> >  		 * when the old and new regions overlap clear from new_end.
> > @@ -749,9 +738,8 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> >  	}
> >  	tlb_finish_mmu(&tlb);
> >
> > -	vma_prev(&vmi);
> >  	/* Shrink the vma to just the new range */
> > -	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
> > +	return vma_shrink_top(&vmi, vma, shift);
> >  }
> >
> >  /*
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 4d2b5538925b..e3220439cf75 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3273,11 +3273,10 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
> >
> >  /* mmap.c */
> >  extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
> > -extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > -		      unsigned long start, unsigned long end, pgoff_t pgoff,
> > -		      struct vm_area_struct *next);
> > -extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > -		       unsigned long start, unsigned long end, pgoff_t pgoff);
> > +extern int vma_expand_bottom(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +			     unsigned long shift, struct vm_area_struct **next);
> > +extern int vma_shrink_top(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +			  unsigned long shift);
> >  extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
> >  extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
> >  extern void unlink_file_vma(struct vm_area_struct *);
> > diff --git a/mm/internal.h b/mm/internal.h
> > index c8177200c943..f7779727bb78 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1305,6 +1305,12 @@ static inline struct vm_area_struct
> >  			  vma_policy(vma), new_ctx, anon_vma_name(vma));
> >  }
> >
> > +int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +	       unsigned long start, unsigned long end, pgoff_t pgoff,
> > +		      struct vm_area_struct *next);
> > +int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +	       unsigned long start, unsigned long end, pgoff_t pgoff);
> > +
> >  enum {
> >  	/* mark page accessed */
> >  	FOLL_TOUCH = 1 << 16,
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index e42d89f98071..574e69a04ebe 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3940,6 +3940,71 @@ void mm_drop_all_locks(struct mm_struct *mm)
> >  	mutex_unlock(&mm_all_locks_mutex);
> >  }
> >
> > +/*
> > + * vma_expand_bottom() - Expands the bottom of a VMA downwards. An error will
> > + *                       arise if there is another VMA in the expanded range, or
> > + *                       if the expansion fails. This function leaves the VMA
> > + *                       iterator, vmi, positioned at the newly expanded VMA.
> > + * @vmi: The VMA iterator.
> > + * @vma: The VMA to modify.
> > + * @shift: The number of bytes by which to expand the bottom of the VMA.
> > + * @next: Output parameter, pointing at the VMA immediately succeeding the newly
> > + *        expanded VMA.
> > + *
> > + * Returns: 0 on success, an error code otherwise.
> > + */
> > +int vma_expand_bottom(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +		      unsigned long shift, struct vm_area_struct **next)
> > +{
> > +	unsigned long old_start = vma->vm_start;
> > +	unsigned long old_end = vma->vm_end;
> > +	unsigned long new_start = old_start - shift;
> > +	unsigned long new_end = old_end - shift;
> > +
> > +	BUG_ON(new_start > new_end);
> > +
> > +	vma_iter_set(vmi, new_start);
> > +
> > +	/*
> > +	 * ensure there are no vmas between where we want to go
> > +	 * and where we are
> > +	 */
> > +	if (vma != vma_next(vmi))
> > +		return -EFAULT;
> > +
> > +	vma_iter_prev_range(vmi);
> > +
> > +	/*
> > +	 * cover the whole range: [new_start, old_end)
> > +	 */
> > +	if (vma_expand(vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
> > +		return -ENOMEM;
> > +
> > +	*next = vma_next(vmi);
> > +	vma_prev(vmi);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * vma_shrink_top() - Reduce an existing VMA's memory area by shift bytes from
> > + *                    the top of the VMA.
> > + * @vmi: The VMA iterator, must be positioned at the VMA.
> > + * @vma: The VMA to modify.
> > + * @shift: The number of bytes by which to shrink the VMA.
> > + *
> > + * Returns: 0 on success, an error code otherwise.
> > + */
> > +int vma_shrink_top(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +		   unsigned long shift)
> > +{
> > +	if (shift >= vma->vm_end - vma->vm_start)
> > +		return -EINVAL;
> > +
> > +	return vma_shrink(vmi, vma, vma->vm_start, vma->vm_end - shift,
> > +			  vma->vm_pgoff);
> > +}
> > +
> >  /*
> >   * initialise the percpu counter for VM
> >   */
> > --
> > 2.45.1
> >