
mm: Add MREMAP_DONTUNMAP to mremap().

Message ID 20200123014627.71720-1-bgeffon@google.com (mailing list archive)
State New, archived
Series mm: Add MREMAP_DONTUNMAP to mremap().

Commit Message

Brian Geffon Jan. 23, 2020, 1:46 a.m. UTC
MREMAP_DONTUNMAP is an additional flag that can be used with
MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
would then tear down the old vma, so subsequent accesses to it cause
a segfault. With this new flag, however, the old vma is kept while
its PTEs are zapped, so any access to the old vma after that point
will result in a pagefault.

This feature will find a use in ChromeOS along with userfaultfd.
Specifically, we want to register a VMA with userfaultfd and then
pull it out from under a running process. By using MREMAP_DONTUNMAP we
don't have to worry about calling mprotect() and then potentially
racing with VMA permission changes in the running process.

This feature also has a use case in Android; Lokesh Gidra has said
that "As part of using userfaultfd for GC, We'll have to move the physical
pages of the java heap to a separate location. For this purpose mremap
will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
heap, its virtual mapping will be removed as well. Therefore, we'll
require performing mmap immediately after. This is not only time consuming
but also opens a time window where a native thread may call mmap and
reserve the java heap's address range for its own usage. This flag
solves the problem."

Signed-off-by: Brian Geffon <bgeffon@google.com>
---
 include/uapi/linux/mman.h |  5 +++--
 mm/mremap.c               | 37 ++++++++++++++++++++++++++++++-------
 2 files changed, 33 insertions(+), 9 deletions(-)

Comments

Andy Lutomirski Jan. 23, 2020, 3:02 a.m. UTC | #1
> On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon@google.com> wrote:
> 
> MREMAP_DONTUNMAP is an additional flag that can be used with
> MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
> would then tear down the old vma so subsequent accesses to the vma
> cause a segfault. However, with this new flag it will keep the old
> vma with zapping PTEs so any access to the old VMA after that point
> will result in a pagefault.

This needs a vastly better description. Perhaps:

When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set, the source mapping will not be removed. Instead it will be cleared as if a brand new anonymous, private mapping had been created atomically as part of the mremap() call.  If a userfaultfd was watching the source, it will continue to watch the new mapping.  For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.

Or is it something else?

> 
> This feature will find a use in ChromeOS along with userfaultfd.
> Specifically we will want to register a VMA with userfaultfd and then
> pull it out from under a running process. By using MREMAP_DONTUNMAP we
> don't have to worry about mprotecting and then potentially racing with
> VMA permission changes from a running process.

Does this mean you yank it out but you want to replace it simultaneously?

> 
> This feature also has a use case in Android, Lokesh Gidra has said
> that "As part of using userfaultfd for GC, We'll have to move the physical
> pages of the java heap to a separate location. For this purpose mremap
> will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> heap, its virtual mapping will be removed as well. Therefore, we'll
> require performing mmap immediately after. This is not only time consuming
> but also opens a time window where a native thread may call mmap and
> reserve the java heap's address range for its own usage. This flag
> solves the problem."

Cute.
Lokesh Gidra Jan. 23, 2020, 6:47 p.m. UTC | #2
On Wed, Jan 22, 2020 at 7:02 PM Andy Lutomirski <luto@amacapital.net> wrote:

>
>
> > On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon@google.com> wrote:
> >
> > MREMAP_DONTUNMAP is an additional flag that can be used with
> > MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
> > would then tear down the old vma so subsequent accesses to the vma
> > cause a segfault. However, with this new flag it will keep the old
> > vma with zapping PTEs so any access to the old VMA after that point
> > will result in a pagefault.
>
> This needs a vastly better description. Perhaps:
>
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set,
> the source mapping will not be removed. Instead it will be cleared as if a
> brand new anonymous, private mapping had been created atomically as part of
> the mremap() call.  If a userfaultfd was watching the source, it will
> continue to watch the new mapping.  For a mapping that is shared or not
> anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.
>
This is the exact behaviour I'm looking for.


> Or is it something else?
>
> >
> > This feature will find a use in ChromeOS along with userfaultfd.
> > Specifically we will want to register a VMA with userfaultfd and then
> > pull it out from under a running process. By using MREMAP_DONTUNMAP we
> > don't have to worry about mprotecting and then potentially racing with
> > VMA permission changes from a running process.
>
> Does this mean you yank it out but you want to replace it simultaneously?
>
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the
> physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time
> consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
>
> Cute.
Brian Geffon Jan. 23, 2020, 7:02 p.m. UTC | #3
Andy,

Thanks, yes, that's a much clearer description of the feature. I'll make
sure to update the description with subsequent patches and with later man
page updates.

Brian


On Wed, Jan 22, 2020 at 7:02 PM Andy Lutomirski <luto@amacapital.net> wrote:

>
>
> > On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon@google.com> wrote:
> >
> > MREMAP_DONTUNMAP is an additional flag that can be used with
> > MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
> > would then tear down the old vma so subsequent accesses to the vma
> > cause a segfault. However, with this new flag it will keep the old
> > vma with zapping PTEs so any access to the old VMA after that point
> > will result in a pagefault.
>
> This needs a vastly better description. Perhaps:
>
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set,
> the source mapping will not be removed. Instead it will be cleared as if a
> brand new anonymous, private mapping had been created atomically as part of
> the mremap() call.  If a userfaultfd was watching the source, it will
> continue to watch the new mapping.  For a mapping that is shared or not
> anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.
>
> Or is it something else?
>
> >
> > This feature will find a use in ChromeOS along with userfaultfd.
> > Specifically we will want to register a VMA with userfaultfd and then
> > pull it out from under a running process. By using MREMAP_DONTUNMAP we
> > don't have to worry about mprotecting and then potentially racing with
> > VMA permission changes from a running process.
>
> Does this mean you yank it out but you want to replace it simultaneously?
>
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the
> physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time
> consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
>
> Cute.
Dan Carpenter Jan. 27, 2020, 4:46 a.m. UTC | #5
Hi Brian,

url:    https://github.com/0day-ci/linux/commits/Brian-Geffon/mm-Add-MREMAP_DONTUNMAP-to-mremap/20200125-013342
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4703d9119972bf586d2cca76ec6438f819ffa30e

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
mm/mremap.c:561 mremap_to() error: potentially dereferencing uninitialized 'vma'.

# https://github.com/0day-ci/linux/commit/98663ca05501623c3da7f0f30be8ba7d632cf010
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 98663ca05501623c3da7f0f30be8ba7d632cf010
vim +/vma +561 mm/mremap.c

81909b842107ef Michel Lespinasse  2013-02-22  506  static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
72f87654c69690 Pavel Emelyanov    2017-02-22  507  		unsigned long new_addr, unsigned long new_len, bool *locked,
98663ca0550162 Brian Geffon       2020-01-22  508  		unsigned long flags, struct vm_userfaultfd_ctx *uf,
b22823719302e8 Mike Rapoport      2017-08-02  509  		struct list_head *uf_unmap_early,
897ab3e0c49e24 Mike Rapoport      2017-02-24  510  		struct list_head *uf_unmap)
ecc1a8993751de Al Viro            2009-11-24  511  {
ecc1a8993751de Al Viro            2009-11-24  512  	struct mm_struct *mm = current->mm;
ecc1a8993751de Al Viro            2009-11-24  513  	struct vm_area_struct *vma;
ecc1a8993751de Al Viro            2009-11-24  514  	unsigned long ret = -EINVAL;
ecc1a8993751de Al Viro            2009-11-24  515  	unsigned long charged = 0;
097eed103862f9 Al Viro            2009-11-24  516  	unsigned long map_flags;
ecc1a8993751de Al Viro            2009-11-24  517  
f19cb115a25f3f Alexander Kuleshov 2015-11-05  518  	if (offset_in_page(new_addr))
ecc1a8993751de Al Viro            2009-11-24  519  		goto out;
ecc1a8993751de Al Viro            2009-11-24  520  
ecc1a8993751de Al Viro            2009-11-24  521  	if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len)
ecc1a8993751de Al Viro            2009-11-24  522  		goto out;
ecc1a8993751de Al Viro            2009-11-24  523  
9943242ca46814 Oleg Nesterov      2015-09-04  524  	/* Ensure the old/new locations do not overlap */
9943242ca46814 Oleg Nesterov      2015-09-04  525  	if (addr + old_len > new_addr && new_addr + new_len > addr)
ecc1a8993751de Al Viro            2009-11-24  526  		goto out;
ecc1a8993751de Al Viro            2009-11-24  527  
ea2c3f6f554561 Oscar Salvador     2019-03-05  528  	/*
ea2c3f6f554561 Oscar Salvador     2019-03-05  529  	 * move_vma() need us to stay 4 maps below the threshold, otherwise
ea2c3f6f554561 Oscar Salvador     2019-03-05  530  	 * it will bail out at the very beginning.
ea2c3f6f554561 Oscar Salvador     2019-03-05  531  	 * That is a problem if we have already unmaped the regions here
ea2c3f6f554561 Oscar Salvador     2019-03-05  532  	 * (new_addr, and old_addr), because userspace will not know the
ea2c3f6f554561 Oscar Salvador     2019-03-05  533  	 * state of the vma's after it gets -ENOMEM.
ea2c3f6f554561 Oscar Salvador     2019-03-05  534  	 * So, to avoid such scenario we can pre-compute if the whole
ea2c3f6f554561 Oscar Salvador     2019-03-05  535  	 * operation has high chances to success map-wise.
ea2c3f6f554561 Oscar Salvador     2019-03-05  536  	 * Worst-scenario case is when both vma's (new_addr and old_addr) get
ea2c3f6f554561 Oscar Salvador     2019-03-05  537  	 * split in 3 before unmaping it.
ea2c3f6f554561 Oscar Salvador     2019-03-05  538  	 * That means 2 more maps (1 for each) to the ones we already hold.
ea2c3f6f554561 Oscar Salvador     2019-03-05  539  	 * Check whether current map count plus 2 still leads us to 4 maps below
ea2c3f6f554561 Oscar Salvador     2019-03-05  540  	 * the threshold, otherwise return -ENOMEM here to be more safe.
ea2c3f6f554561 Oscar Salvador     2019-03-05  541  	 */
ea2c3f6f554561 Oscar Salvador     2019-03-05  542  	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
ea2c3f6f554561 Oscar Salvador     2019-03-05  543  		return -ENOMEM;
ea2c3f6f554561 Oscar Salvador     2019-03-05  544  
b22823719302e8 Mike Rapoport      2017-08-02  545  	ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
ecc1a8993751de Al Viro            2009-11-24  546  	if (ret)
ecc1a8993751de Al Viro            2009-11-24  547  		goto out;
ecc1a8993751de Al Viro            2009-11-24  548  
ecc1a8993751de Al Viro            2009-11-24  549  	if (old_len >= new_len) {
897ab3e0c49e24 Mike Rapoport      2017-02-24  550  		ret = do_munmap(mm, addr+new_len, old_len - new_len, uf_unmap);
ecc1a8993751de Al Viro            2009-11-24  551  		if (ret && old_len != new_len)
ecc1a8993751de Al Viro            2009-11-24  552  			goto out;
ecc1a8993751de Al Viro            2009-11-24  553  		old_len = new_len;
ecc1a8993751de Al Viro            2009-11-24  554  	}
ecc1a8993751de Al Viro            2009-11-24  555  
98663ca0550162 Brian Geffon       2020-01-22  556  	/*
98663ca0550162 Brian Geffon       2020-01-22  557  	 * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
98663ca0550162 Brian Geffon       2020-01-22  558  	 * check that we can expand by old_len and vma_to_resize will handle
98663ca0550162 Brian Geffon       2020-01-22  559  	 * the vma growing.
98663ca0550162 Brian Geffon       2020-01-22  560  	 */
98663ca0550162 Brian Geffon       2020-01-22 @561  	if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
98663ca0550162 Brian Geffon       2020-01-22  562  				vma->vm_flags, old_len >> PAGE_SHIFT))) {
                                                                                ^^^^^^^^^^^^^

98663ca0550162 Brian Geffon       2020-01-22  563  		ret = -ENOMEM;
98663ca0550162 Brian Geffon       2020-01-22  564  		goto out;
98663ca0550162 Brian Geffon       2020-01-22  565  	}
98663ca0550162 Brian Geffon       2020-01-22  566  
ecc1a8993751de Al Viro            2009-11-24  567  	vma = vma_to_resize(addr, old_len, new_len, &charged);
                                                        ^^^^^^^^^^^^^^^^^^^^

ecc1a8993751de Al Viro            2009-11-24  568  	if (IS_ERR(vma)) {
ecc1a8993751de Al Viro            2009-11-24  569  		ret = PTR_ERR(vma);
ecc1a8993751de Al Viro            2009-11-24  570  		goto out;
ecc1a8993751de Al Viro            2009-11-24  571  	}
ecc1a8993751de Al Viro            2009-11-24  572  
097eed103862f9 Al Viro            2009-11-24  573  	map_flags = MAP_FIXED;
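
The warning is real: in the patched mremap_to(), the may_expand_vm() check at line 561 dereferences vma->vm_flags before vma_to_resize() assigns vma at line 567. One way to resolve it would be to reorder the two steps so the lookup happens first; this is only a sketch of that idea, not the follow-up patch that was actually applied (the function names and labels are the ones already visible in the excerpt above):

```c
	vma = vma_to_resize(addr, old_len, new_len, &charged);
	if (IS_ERR(vma)) {
		ret = PTR_ERR(vma);
		goto out;
	}

	/*
	 * MREMAP_DONTUNMAP keeps the source vma, so the address space must
	 * be able to grow by old_len on top of whatever growth
	 * vma_to_resize() has already accounted for.
	 */
	if (unlikely((flags & MREMAP_DONTUNMAP) &&
		     !may_expand_vm(mm, vma->vm_flags,
				    old_len >> PAGE_SHIFT))) {
		ret = -ENOMEM;
		goto out1;	/* out1: undo the charge taken above */
	}
```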

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

Patch

diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index fc1a64c3447b..923cc162609c 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -5,8 +5,9 @@ 
 #include <asm/mman.h>
 #include <asm-generic/hugetlb_encode.h>
 
-#define MREMAP_MAYMOVE	1
-#define MREMAP_FIXED	2
+#define MREMAP_MAYMOVE		1
+#define MREMAP_FIXED		2
+#define MREMAP_DONTUNMAP	4
 
 #define OVERCOMMIT_GUESS		0
 #define OVERCOMMIT_ALWAYS		1
diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..bf97c3eb538b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -318,8 +318,8 @@  unsigned long move_page_tables(struct vm_area_struct *vma,
 static unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr,
-		bool *locked, struct vm_userfaultfd_ctx *uf,
-		struct list_head *uf_unmap)
+		bool *locked, unsigned long flags,
+		struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *new_vma;
@@ -408,6 +408,13 @@  static unsigned long move_vma(struct vm_area_struct *vma,
 	if (unlikely(vma->vm_flags & VM_PFNMAP))
 		untrack_pfn_moved(vma);
 
+	if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
+		if (vm_flags & VM_ACCOUNT)
+			vma->vm_flags |= VM_ACCOUNT;
+
+		goto out;
+	}
+
 	if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
 		/* OOM: unable to split vma, just get accounts right */
 		vm_unacct_memory(excess >> PAGE_SHIFT);
@@ -422,6 +429,7 @@  static unsigned long move_vma(struct vm_area_struct *vma,
 			vma->vm_next->vm_flags |= VM_ACCOUNT;
 	}
 
+out:
 	if (vm_flags & VM_LOCKED) {
 		mm->locked_vm += new_len >> PAGE_SHIFT;
 		*locked = true;
@@ -497,7 +505,7 @@  static struct vm_area_struct *vma_to_resize(unsigned long addr,
 
 static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		unsigned long new_addr, unsigned long new_len, bool *locked,
-		struct vm_userfaultfd_ctx *uf,
+		unsigned long flags, struct vm_userfaultfd_ctx *uf,
 		struct list_head *uf_unmap_early,
 		struct list_head *uf_unmap)
 {
@@ -545,6 +553,17 @@  static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		old_len = new_len;
 	}
 
+	/*
+	 * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
+	 * check that we can expand by old_len and vma_to_resize will handle
+	 * the vma growing.
+	 */
+	if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
+				vma->vm_flags, old_len >> PAGE_SHIFT))) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	vma = vma_to_resize(addr, old_len, new_len, &charged);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
@@ -561,7 +580,7 @@  static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if (IS_ERR_VALUE(ret))
 		goto out1;
 
-	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
+	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
 		       uf_unmap);
 	if (!(offset_in_page(ret)))
 		goto out;
@@ -609,12 +628,15 @@  SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	addr = untagged_addr(addr);
 	new_addr = untagged_addr(new_addr);
 
-	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
 		return ret;
 
 	if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
 		return ret;
 
+	if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
+		return ret;
+
 	if (offset_in_page(addr))
 		return ret;
 
@@ -634,7 +656,8 @@  SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 
 	if (flags & MREMAP_FIXED) {
 		ret = mremap_to(addr, old_len, new_addr, new_len,
-				&locked, &uf, &uf_unmap_early, &uf_unmap);
+				&locked, flags, &uf, &uf_unmap_early,
+				&uf_unmap);
 		goto out;
 	}
 
@@ -712,7 +735,7 @@  SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		}
 
 		ret = move_vma(vma, addr, old_len, new_len, new_addr,
-			       &locked, &uf, &uf_unmap);
+			       &locked, flags, &uf, &uf_unmap);
 	}
 out:
 	if (offset_in_page(ret)) {