[v2,1/2] mm: Allow non-VM_DONTEXPAND and VM_PFNMAP mappings with MREMAP_DONTUNMAP

Message ID 20210317191334.564944-1-bgeffon@google.com (mailing list archive)
State New, archived

Commit Message

Brian Geffon March 17, 2021, 7:13 p.m. UTC
Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This
change will widen the support to include any mappings which are not
VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support
MREMAP_DONTUNMAP on mappings which may have been created from a memfd.

This change, which takes advantage of the existing check in vma_to_resize
for VM_DONTEXPAND and VM_PFNMAP mappings, will cause MREMAP_DONTUNMAP to
return -EFAULT if such mappings are remapped. This behavior is consistent
with existing behavior when using mremap with such mappings.

Lokesh Gidra, who works on the Android JVM, provided an explanation of how
such a feature will improve Android JVM garbage collection:
"Android is developing a new garbage collector (GC), based on userfaultfd.
The garbage collector will use userfaultfd (uffd) on the java heap during
compaction. On accessing any uncompacted page, the application threads will
find it missing, at which point the thread will create the compacted page
and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
Before starting this compaction, in a stop-the-world pause, the heap will be
remapped with mremap(MREMAP_DONTUNMAP) so that the java heap is ready to
receive UFFD_EVENT_PAGEFAULT events after resuming execution.

To speed up mremap operations, pagetable movement was optimized by moving
PUD entries instead of PTE entries [1]. It was necessary as mremap of even
modest sized memory ranges also took several milliseconds, and stopping the
application for that long isn't acceptable in response-time sensitive
cases.

With UFFDIO_CONTINUE feature [2], it will be even more efficient to
implement this GC, particularly the 'non-moveable' portions of the heap.
It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
However, for this to work, the java heap has to be on a 'shared' vma.
Currently MREMAP_DONTUNMAP only supports private anonymous mappings; this
patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
compaction."

[1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%25akpm@linux-foundation.org/
[2] https://lore.kernel.org/linux-mm/20210302000133.272579-1-axelrasmussen@google.com/

Signed-off-by: Brian Geffon <bgeffon@google.com>
---
 mm/mremap.c | 4 ----
 1 file changed, 4 deletions(-)
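
For readers who want to see the use case from userspace, the following is a
minimal sketch, not part of the patch. It maps a memfd as MAP_SHARED and then
tries mremap() with MREMAP_MAYMOVE | MREMAP_DONTUNMAP and equal old and new
sizes. The function name try_dontunmap_on_memfd and the memfd name are
hypothetical. On a kernel that still carries the check removed by this patch,
the mremap() call would be expected to fail with EINVAL; the sketch only
reports what the running kernel does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for memfd_create() and mremap() */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4     /* fallback for libc headers that lack it */
#endif

/*
 * Hypothetical helper: map a memfd MAP_SHARED, then try to move the mapping
 * with MREMAP_DONTUNMAP. Returns 0 on success, or the negative errno of the
 * step that failed.
 */
int try_dontunmap_on_memfd(size_t len)
{
	void *old, *moved;
	int ret = 0;
	int fd;

	fd = memfd_create("dontunmap-demo", 0);
	if (fd < 0)
		return -errno;

	if (ftruncate(fd, (off_t)len) < 0) {
		ret = -errno;
		goto out_close;
	}

	old = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (old == MAP_FAILED) {
		ret = -errno;
		goto out_close;
	}

	/* MREMAP_DONTUNMAP needs MREMAP_MAYMOVE and old_size == new_size. */
	moved = mremap(old, len, len, MREMAP_MAYMOVE | MREMAP_DONTUNMAP);
	if (moved == MAP_FAILED) {
		ret = -errno;
	} else {
		/* The old range stays mapped, so the new one must differ. */
		assert(moved != old);
		munmap(moved, len);
	}

	munmap(old, len);
out_close:
	close(fd);
	return ret;
}
```

A caller could pass a page-multiple length, for example
try_dontunmap_on_memfd(4 * sysconf(_SC_PAGESIZE)), and treat a return of
-EINVAL as "this kernel does not accept MREMAP_DONTUNMAP on this mapping".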

Comments

Peter Xu March 17, 2021, 8:40 p.m. UTC | #1
Hi, Brian,

On Wed, Mar 17, 2021 at 12:13:33PM -0700, Brian Geffon wrote:
> diff --git a/mm/mremap.c b/mm/mremap.c
> index ec8f840399ed..2c57dc4bc8b6 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -653,10 +653,6 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
>  		return ERR_PTR(-EINVAL);
>  	}
>  
> -	if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
> -			vma->vm_flags & VM_SHARED))
> -		return ERR_PTR(-EINVAL);
> -
>  	if (is_vm_hugetlb_page(vma))
>  		return ERR_PTR(-EINVAL);

The code change does not seem to match what the commit message says.  Did you
perhaps forget to add the check against VM_DONTEXPAND | VM_PFNMAP?  I'm
guessing the check is what is missing (rather than the commit message needing
a touch-up): since you still attached the revert patch, that check seems to
be needed.  Thanks,
Brian Geffon March 17, 2021, 8:44 p.m. UTC | #2
Hi Peter,
Thank you as always for taking a look. This change relies on the
existing check in vma_to_resize on line 686:
https://elixir.bootlin.com/linux/v5.12-rc3/source/mm/mremap.c#L686
which returns -EFAULT when the vma is VM_DONTEXPAND or VM_PFNMAP.

Thanks
Brian

Peter Xu March 17, 2021, 9:18 p.m. UTC | #3
On Wed, Mar 17, 2021 at 04:44:25PM -0400, Brian Geffon wrote:
> Hi Peter,

Hi, Brian,

> Thank you as always for taking a look. This change relies on the
> existing check in vma_to_resize on line 686:
> https://elixir.bootlin.com/linux/v5.12-rc3/source/mm/mremap.c#L686
> which returns -EFAULT when the vma is VM_DONTEXPAND or VM_PFNMAP.

Do you mean line 676?

https://elixir.bootlin.com/linux/v5.12-rc3/source/mm/mremap.c#L676

I'm not sure whether it'll work for MREMAP_DONTUNMAP, since IIUC
MREMAP_DONTUNMAP only works for the remap case with no size change.  In that
case vma_to_resize() bails out even earlier than line 676, when checking
against the size:

https://elixir.bootlin.com/linux/v5.12-rc3/source/mm/mremap.c#L667

So IIUC we'll still need the change as Hugh suggested previously.

Thanks,
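
The ordering problem described above can be illustrated with a small
standalone model. This is a sketch only, not kernel code: the struct, the
MODEL_* flag values and the function names are hypothetical stand-ins, and the
-EINVAL returned by the optional up-front check is an assumption, since the
thread does not say which errno the follow-up patch uses. The model keeps just
the two checks mentioned in this message: the same-size return (line 667) and
the VM_DONTEXPAND | VM_PFNMAP test (line 676).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel flag bits; values are arbitrary. */
#define MODEL_VM_PFNMAP        0x1UL
#define MODEL_VM_DONTEXPAND    0x2UL
#define MODEL_MREMAP_DONTUNMAP 0x4UL

struct model_vma {
	unsigned long vm_flags;
};

/*
 * Simplified model of the check ordering in vma_to_resize().
 * Returns 0 where the kernel would hand back the vma, else a negative errno.
 * A non-zero with_explicit_check adds an assumed early rejection of
 * MREMAP_DONTUNMAP on VM_DONTEXPAND / VM_PFNMAP mappings.
 */
int model_vma_to_resize(const struct model_vma *vma,
			unsigned long old_len, unsigned long new_len,
			unsigned long flags, int with_explicit_check)
{
	const unsigned long special = MODEL_VM_DONTEXPAND | MODEL_VM_PFNMAP;

	if (with_explicit_check && (flags & MODEL_MREMAP_DONTUNMAP) &&
	    (vma->vm_flags & special))
		return -EINVAL;	/* assumed errno, not taken from the thread */

	/* Same-size remap returns here, before the flag test below. */
	if (new_len == old_len)
		return 0;

	/* Only reached when the size changes. */
	if (vma->vm_flags & special)
		return -EFAULT;

	return 0;
}

/* Walk through the three cases discussed in the thread. */
void model_demo(void)
{
	struct model_vma pfnmap = { .vm_flags = MODEL_VM_PFNMAP };

	/* Resize of a VM_PFNMAP mapping: caught by the existing check. */
	assert(model_vma_to_resize(&pfnmap, 4096, 8192, 0, 0) == -EFAULT);

	/* Same-size MREMAP_DONTUNMAP: slips past the existing check. */
	assert(model_vma_to_resize(&pfnmap, 4096, 4096,
				   MODEL_MREMAP_DONTUNMAP, 0) == 0);

	/* With an explicit early check, the same call is rejected. */
	assert(model_vma_to_resize(&pfnmap, 4096, 4096,
				   MODEL_MREMAP_DONTUNMAP, 1) == -EINVAL);
}
```

The second case in model_demo() is the one at issue: because
MREMAP_DONTUNMAP always passes equal sizes, the later flag test is never
reached for it.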
Brian Geffon March 17, 2021, 9:25 p.m. UTC | #4
You're 100% correct. I'll mail a new patch in a few

Brian


Patch

diff --git a/mm/mremap.c b/mm/mremap.c
index ec8f840399ed..2c57dc4bc8b6 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -653,10 +653,6 @@  static struct vm_area_struct *vma_to_resize(unsigned long addr,
 		return ERR_PTR(-EINVAL);
 	}
 
-	if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
-			vma->vm_flags & VM_SHARED))
-		return ERR_PTR(-EINVAL);
-
 	if (is_vm_hugetlb_page(vma))
 		return ERR_PTR(-EINVAL);