
[RFC,v2,7/8] mseal: Check seal flag for mmap(2)

Message ID 20231017090815.1067790-8-jeffxu@chromium.org (mailing list archive)
State New
Series Introduce mseal() syscall

Commit Message

Jeff Xu Oct. 17, 2023, 9:08 a.m. UTC
From: Jeff Xu <jeffxu@google.com>

mmap(2) can change the protection of existing VMAs.
Sealing prevents unintended mmap(2) calls from doing so.

What this patch does:
When mmap(2) is invoked, if any of the VMAs it covers has MM_SEAL_MMAP
set by a previous mseal(2) call, the mmap(2) fails without modifying
any VMA.

The patch handles two cases: MMU and no-MMU.

For the MMU case:
1. ksys_mmap_pgoff() is currently called from two places:
SYSCALL_DEFINE1(old_mmap, ...)
SYSCALL_DEFINE6(mmap_pgoff, ...)
Since both are syscall entry points, there is no need to add a
checkSeals argument to the signature of ksys_mmap_pgoff().

2. ksys_mmap_pgoff() calls vm_mmap_pgoff() with
checkSeals = MM_SEAL_MMAP; the checkSeals flag is in turn
passed into do_mmap().
Note: Of all the call paths that go into do_mmap(),
ksys_mmap_pgoff() is the only place where
checkSeals = MM_SEAL_MMAP. The rest pass checkSeals = 0.

3. In do_mmap(), call can_modify_mm() before any update
is made to the VMAs (see the sketch below).

For the no-MMU case:
checkSeals = 0 in all cases.
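
For reference, a simplified sketch of the check that step 3 relies on.
The real can_modify_mm() is added earlier in this series; the iterator
usage is standard, but the name of the per-VMA seal field below is
illustrative:

static bool can_modify_mm_sketch(struct mm_struct *mm, unsigned long start,
				 unsigned long end, unsigned long checkSeals)
{
	struct vm_area_struct *vma;
	VMA_ITERATOR(vmi, mm, start);

	if (!checkSeals)
		return true;

	/*
	 * Walk every VMA overlapping [start, end) and refuse the
	 * operation if any of them carries one of the requested seals.
	 */
	for_each_vma_range(vmi, vma, end) {
		if (vma->vm_seals & checkSeals)	/* field name illustrative */
			return false;
	}
	return true;
}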

Signed-off-by: Jeff Xu <jeffxu@google.com>
---
 fs/aio.c           |  5 +++--
 include/linux/mm.h |  5 ++++-
 ipc/shm.c          |  3 ++-
 mm/internal.h      |  4 ++--
 mm/mmap.c          | 22 ++++++++++++++++++----
 mm/nommu.c         |  6 ++++--
 mm/util.c          |  8 +++++---
 7 files changed, 38 insertions(+), 15 deletions(-)

Comments

Linus Torvalds Oct. 17, 2023, 5:04 p.m. UTC | #1
On Tue, 17 Oct 2023 at 02:08, <jeffxu@chromium.org> wrote:
>
> Note: Of all the call paths that go into do_mmap(),
> ksys_mmap_pgoff() is the only place where
> checkSeals = MM_SEAL_MMAP. The rest pass checkSeals = 0.

Again, this is all completely nonsensical.

First off, since seals only exist on existing vma's, the "MMAP seal"
makes no sense to begin with. You cannot mmap twice - and mmap'ing
over an existing mapping involves unmapping the old one.

So a "mmap seal" is nonsensical. What should protect a mapping is "you
cannot unmap this". That automatically means that you cannot mmap over
it.

In other words, all these sealing flag semantics are completely random
noise. None of this makes sense.

You need to EXPLAIN what the concept is.

Honestly, there are only two kinds of sealing that make sense:

 - you cannot change the permissions of some area

 - you cannot unmap an area

where that first version might then have sub-cases (maybe you can make
permissions _stricter_, but not the other way)?

And dammit, once something is sealed, it is SEALED. None of this crazy
"one place honors the sealing, random other places do not".

I do *NOT* want to see another random patch series tomorrow that
modifies something small here.

I want to get an EXPLANATION and the whole "what the f*ck is the
concept". No more random rules. No more nonsensical code. No more of
this "one place honors seals, another one does not".

Seriously. As long as this is chock-full of these kinds of random
"this makes no sense", please don't send any patches AT ALL. Explain
the high-level rules first, and if you cannot explain why one random
place does something different from all the other random places, don't
even bother.

No more random code. No more random seals. None of this crazy "you
ostensibly can't unmap a vma, but you can actually unmap it by
mmap'ing over it and then unmapping the new one".

             Linus
Linus Torvalds Oct. 17, 2023, 5:43 p.m. UTC | #2
On Tue, 17 Oct 2023 at 10:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Honestly, there are only two kinds of sealing that make sense:
>
>  - you cannot change the permissions of some area
>
>  - you cannot unmap an area

Actually, I guess at least theoretically, there could be three different things:

 - you cannot move an area

although I do think that maybe just saying "you cannot unmap" might
also include "you cannot move".

But I guess it depends on whether you feel it's the virtual _address_
you are protecting, or whether it's the concept of mapping something.

I personally think that from a security perspective, what you want to
protect is a particular address. That implies that "seal from
unmapping" would thus also include "you can't move this area
elsewhere".

But at least conceptually, splitting "unmap" and "move" apart might
make some sense. I would like to hear a practical reason for it,
though.

Without that practical reason, I think the only two sane sealing operations are:

 - SEAL_MUNMAP: "don't allow this mapping address to go away"

   IOW no unmap, no shrinking, no moving mremap

 - SEAL_MPROTECT: "don't allow any mapping permission changes"

Again, that permission case might end up being "don't allow
_additional_ permissions" and "don't allow taking permissions away".
Or it could be split by operation (ie "don't allow permission changes
to writability / readability / executability respectively").

I suspect there isn't a real-life example of splitting the
SEAL_MPROTECT (the same way I doubt there's a real-life example for
splitting the UNMAP into "unmap vs move"), so unless there is some
real reason, I'd keep the sealing minimal and to just those two flags.

We could always add more flags later, if there is a real use case
(IOW, if we start with "don't allow any permission changes", we could
add a flag later that just says "don't allow writability changes").

               Linus
Jeff Xu Oct. 18, 2023, 7:01 a.m. UTC | #3
Hi Linus,

On Tue, Oct 17, 2023 at 10:43 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, 17 Oct 2023 at 10:04, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Honestly, there are only two kinds of sealing that make sense:
> >
> >  - you cannot change the permissions of some area
> >
> >  - you cannot unmap an area
>
> Actually, I guess at least theoretically, there could be three different things:
>
>  - you cannot move an area
>
Yes.

Actually, the newly added selftests cover some of the above:
1. can't change the permissions of some areas:
test_seal_mprotect
test_seal_mmap_overwrite_prot

2. can't unmap an area (and thus can't mmap() the same address later):
test_seal_munmap

3. can't move an area:
test_seal_mremap_move             // can't move from a sealed area
test_seal_mremap_move_fixed_zero  // can't move from a sealed area to a fixed address
test_seal_mremap_move_fixed       // can't move to a sealed area

4. can't expand or shrink the area:
test_seal_mremap_shrink
test_seal_mremap_expand
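
For illustration, the core of these tests looks roughly like this
(simplified sketch; the syscall number, the MM_SEAL_* bit values, the
four-argument mseal() prototype and the EACCES checks follow this RFC
and are subject to change):

#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462			/* illustrative number */
#endif
#define MM_SEAL_MMAP	(1UL << 0)	/* illustrative bit values */
#define MM_SEAL_MUNMAP	(1UL << 1)

static int sys_mseal(void *addr, size_t len, unsigned long types)
{
	return syscall(__NR_mseal, addr, len, types, 0);
}

int main(void)
{
	size_t len = 4 * 4096;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

	assert(p != MAP_FAILED);
	assert(sys_mseal(p, len, MM_SEAL_MMAP | MM_SEAL_MUNMAP) == 0);

	/* MM_SEAL_MUNMAP: unmapping the sealed range must fail */
	assert(munmap(p, len) == -1 && errno == EACCES);

	/* MM_SEAL_MMAP (this patch): mmap(MAP_FIXED) over the sealed
	 * range must fail as well, instead of silently replacing it.
	 */
	assert(mmap(p, len, PROT_READ,
		    MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0) == MAP_FAILED
	       && errno == EACCES);
	return 0;
}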

> although I do think that maybe just saying "you cannot unmap" might
> also include "you cannot move".
>
> But I guess it depends on whether you feel it's the virtual _address_
> you are protecting, or whether it's the concept of mapping something.
>
> I personally think that from a security perspective, what you want to
> protect is a particular address. That implies that "seal from
> unmapping" would thus also include "you can't move this area
> elsewhere".
>
> But at least conceptually, splitting "unmap" and "move" apart might
> make some sense. I would like to hear a practical reason for it,
> though.
>
> Without that practical reason, I think the only two sane sealing operations are:
>
>  - SEAL_MUNMAP: "don't allow this mapping address to go away"
>
>    IOW no unmap, no shrinking, no moving mremap
>
>  - SEAL_MPROTECT: "don't allow any mapping permission changes"
>
I agree with the concept in general. The separation of two seal types
is easy to understand.

For mmap(MAP_FIXED), I know for a fact that it can modify the permissions
of an existing mapping (as in selftest test_seal_mmap_overwrite_prot).
I think it can also expand an existing VMA. This is not a problem
code-wise; I mention it here because it needs extra care when coding the
mmap() change.
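
A plain-libc illustration of that first point (no sealing involved; the
4096-byte size assumes the usual page size):

#include <assert.h>
#include <sys/mman.h>

int main(void)
{
	/* read-only anonymous mapping */
	void *p = mmap(NULL, 4096, PROT_READ,
		       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	assert(p != MAP_FAILED);

	/*
	 * mmap(MAP_FIXED) over the same range silently replaces the old
	 * mapping, so the page becomes writable without any mprotect().
	 */
	void *q = mmap(p, 4096, PROT_READ | PROT_WRITE,
		       MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
	assert(q == p);
	*(int *)q = 1;
	return 0;
}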

> Again, that permission case might end up being "don't allow
> _additional_ permissions" and "don't allow taking permissions away".
> Or it could be split by operation (ie "don't allow permission changes
> to writability / readability / executability respectively").
>
Yes. If an application desires this, it can also be done,
i.e. sealing just the X bit or just the W bit, similar to file sealing.
I discussed this with Stephen before; at this point in time, Chrome
doesn't have a use case.

> I suspect there isn't a real-life example of splitting the
> SEAL_MPROTECT (the same way I doubt there's a real-life example for
> splitting the UNMAP into "unmap vs move"), so unless there is some
> real reason, I'd keep the sealing minimal and to just those two flags.
>
I think two seal types (permission and unmap/move/expand/shrink)
will work for the Chrome case. Stephen Röttger, who is the expert on
the Chrome side, is on vacation and will be back soon; I will wait for
him to confirm.

> We could always add more flags later, if there is a real use case
> (IOW, if we start with "don't allow any permission changes", we could
> add a flag later that just says "don't allow writability changes").
>
Agreed 100%, thanks for understanding.

-Jeff


>                Linus
Stephen Röttger Oct. 19, 2023, 7:27 a.m. UTC | #4
> Without that practical reason, I think the only two sane sealing operations are:
>
>  - SEAL_MUNMAP: "don't allow this mapping address to go away"
>
>    IOW no unmap, no shrinking, no moving mremap
>
>  - SEAL_MPROTECT: "don't allow any mapping permission changes"
>
> Again, that permission case might end up being "don't allow
> _additional_ permissions" and "don't allow taking permissions away".
> Or it could be split by operation (ie "don't allow permission changes
> to writability / readability / executability respectively").
>
> I suspect there isn't a real-life example of splitting the
> SEAL_MPROTECT (the same way I doubt there's a real-life example for
> splitting the UNMAP into "unmap vs move"), so unless there is some
> real reason, I'd keep the sealing minimal and to just those two flags.

These two flags are exactly what we would use in Chrome. I can't think of a
use case for a more fine-grained split either.

Patch

diff --git a/fs/aio.c b/fs/aio.c
index b3174da80ff6..7f4863d0082d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -557,8 +557,9 @@  static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	}
 
 	ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size,
-				 PROT_READ | PROT_WRITE,
-				 MAP_SHARED, 0, &unused, NULL);
+				PROT_READ | PROT_WRITE, MAP_SHARED, 0, &unused,
+				NULL, 0);
+
 	mmap_write_unlock(mm);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		ctx->mmap_size = 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f2f316522f2a..9f496c9f2970 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3274,9 +3274,12 @@  extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
 extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
 	struct list_head *uf);
+
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
-	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
+	unsigned long pgoff, unsigned long *populate, struct list_head *uf,
+	unsigned long checkSeals);
+
 extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 			 unsigned long start, size_t len, struct list_head *uf,
 			 bool unlock, unsigned long checkSeals);
diff --git a/ipc/shm.c b/ipc/shm.c
index 60e45e7045d4..3660f522ecba 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1662,7 +1662,8 @@  long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 			goto invalid;
 	}
 
-	addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL);
+	addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL,
+			0);
 	*raddr = addr;
 	err = 0;
 	if (IS_ERR_VALUE(addr))
diff --git a/mm/internal.h b/mm/internal.h
index d1d4bf4e63c0..2c074d8c6abd 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -800,8 +800,8 @@  extern u64 hwpoison_filter_memcg;
 extern u32 hwpoison_filter_enable;
 
 extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
-        unsigned long, unsigned long,
-        unsigned long, unsigned long);
+	unsigned long, unsigned long, unsigned long, unsigned long,
+	unsigned long checkSeals);
 
 extern void set_pageblock_order(void);
 unsigned long reclaim_pages(struct list_head *folio_list);
diff --git a/mm/mmap.c b/mm/mmap.c
index 62d592f16f45..edcadd2bb394 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1197,7 +1197,8 @@  static inline bool file_mmap_ok(struct file *file, struct inode *inode,
 unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long len, unsigned long prot,
 			unsigned long flags, unsigned long pgoff,
-			unsigned long *populate, struct list_head *uf)
+			unsigned long *populate, struct list_head *uf,
+			unsigned long checkSeals)
 {
 	struct mm_struct *mm = current->mm;
 	vm_flags_t vm_flags;
@@ -1365,6 +1366,9 @@  unsigned long do_mmap(struct file *file, unsigned long addr,
 			vm_flags |= VM_NORESERVE;
 	}
 
+	if (!can_modify_mm(mm, addr, addr + len, MM_SEAL_MMAP))
+		return -EACCES;
+
 	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
 	if (!IS_ERR_VALUE(addr) &&
 	    ((vm_flags & VM_LOCKED) ||
@@ -1411,7 +1415,17 @@  unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 			return PTR_ERR(file);
 	}
 
-	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+	/*
+	 * vm_mmap_pgoff() currently called from two places:
+	 * SYSCALL_DEFINE1(old_mmap, ...)
+	 * SYSCALL_DEFINE6(mmap_pgoff,...)
+	 * and not in any other places.
+	 * Therefore, omit changing the signature of vm_mmap_pgoff()
+	 * Otherwise, we might need to add checkSeals and pass it
+	 * from all callers of vm_mmap_pgoff().
+	 */
+	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff,
+				MM_SEAL_MMAP);
 out_fput:
 	if (file)
 		fput(file);
@@ -3016,8 +3030,8 @@  SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 		flags |= MAP_LOCKED;
 
 	file = get_file(vma->vm_file);
-	ret = do_mmap(vma->vm_file, start, size,
-			prot, flags, pgoff, &populate, NULL);
+	ret = do_mmap(vma->vm_file, start, size, prot, flags, pgoff,
+			&populate, NULL, 0);
 	fput(file);
 out:
 	mmap_write_unlock(mm);
diff --git a/mm/nommu.c b/mm/nommu.c
index 8dba41cfc44d..dc83651ee777 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1018,7 +1018,8 @@  unsigned long do_mmap(struct file *file,
 			unsigned long flags,
 			unsigned long pgoff,
 			unsigned long *populate,
-			struct list_head *uf)
+			struct list_head *uf,
+			unsigned long checkSeals)
 {
 	struct vm_area_struct *vma;
 	struct vm_region *region;
@@ -1262,7 +1263,8 @@  unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 			goto out;
 	}
 
-	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff,
+				0);
 
 	if (file)
 		fput(file);
diff --git a/mm/util.c b/mm/util.c
index 4ed8b9b5273c..ca9d8c69267c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -532,7 +532,8 @@  EXPORT_SYMBOL_GPL(account_locked_vm);
 
 unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,
-	unsigned long flag, unsigned long pgoff)
+	unsigned long flag, unsigned long pgoff,
+	unsigned long checkseals)
 {
 	unsigned long ret;
 	struct mm_struct *mm = current->mm;
@@ -544,7 +545,7 @@  unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 		if (mmap_write_lock_killable(mm))
 			return -EINTR;
 		ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate,
-			      &uf);
+			      &uf, checkseals);
 		mmap_write_unlock(mm);
 		userfaultfd_unmap_complete(mm, &uf);
 		if (populate)
@@ -562,7 +563,8 @@  unsigned long vm_mmap(struct file *file, unsigned long addr,
 	if (unlikely(offset_in_page(offset)))
 		return -EINVAL;
 
-	return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
+	return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT,
+				0);
 }
 EXPORT_SYMBOL(vm_mmap);