Message ID | 20231017090815.1067790-8-jeffxu@chromium.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Introduce mseal() syscall | expand |
On Tue, 17 Oct 2023 at 02:08, <jeffxu@chromium.org> wrote: > > Note: Of all the call paths that goes into do_mmap(), > ksys_mmap_pgoff() is the only place where > checkSeals = MM_SEAL_MMAP. The rest has checkSeals = 0. Again, this is all completely nonsensical. First off, since seals only exist on existing vma's, the "MMAP seal" makes no sense to begin with. You cannot mmap twice - and mmap'ing over an existing mapping involves unmapping the old one. So a "mmap seal" is nonsensical. What should protect a mapping is "you cannot unmap this". That automatically means that you cannot mmap over it. In other words, all these sealing flag semantics are completely random noise. None of this makes sense. You need to EXPLAIN what the concept is. Honestly, there is only two kinds of sealing that makes sense: - you cannot change the permissions of some area - you cannot unmap an area where that first version might then have sub-cases (maybe you can make permissions _stricter_, but not the other way)? And dammit, once something is sealed, it is SEALED. None of this crazy "one place honors the sealing, random other places do not". I do *NOT* want to see another random patch series tomorrow that modifies something small here. I want to get an EXPLANATION and the whole "what the f*ck is the concept". No more random rules. No more nonsensical code. No more of this "one place honors seals, another one does not". Seriously. As long as this is chock-full of these kinds of random "this makes no sense", please don't send any patches AT ALL. Explain the high-level rules first, and if you cannot explain why one random place does something different from all the other random places, don't even bother. No more random code. No more random seals. None of this crazy "you ostensibly can't unmap a vma, but you you can actually unmap it by mmap'ing over it and then unmapping the new one". Linus
On Tue, 17 Oct 2023 at 10:04, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Honestly, there is only two kinds of sealing that makes sense: > > - you cannot change the permissions of some area > > - you cannot unmap an area Actually, I guess at least theoretically, there could be three different things: - you cannot move an area although I do think that maybe just saying "you cannot unmap" might also include "you cannot move". But I guess it depends on whether you feel it's the virtual _address_ you are protecting, or whether it's the concept of mapping something. I personally think that from a security perspective, what you want to protect is a particular address. That implies that "seal from unmapping" would thus also include "you can't move this area elsewhere". But at least conceptually, splitting "unmap" and "move" apart might make some sense. I would like to hear a practical reason for it, though. Without that practical reason, I think the only two sane sealing operations are: - SEAL_MUNMAP: "don't allow this mapping address to go away" IOW no unmap, no shrinking, no moving mremap - SEAL_MPROTECT: "don't allow any mapping permission changes" Again, that permission case might end up being "don't allow _additional_ permissions" and "don't allow taking permissions away". Or it could be split by operation (ie "don't allow permission changes to writability / readability / executability respectively"). I suspect there isn't a real-life example of splitting the SEAL_MPROTECT (the same way I doubt there's a real-life example for splitting the UNMAP into "unmap vs move"), so unless there is some real reason, I'd keep the sealing minimal and to just those two flags. We could always add more flags later, if there is a real use case (IOW, if we start with "don't allow any permission changes", we could add a flag later that just says "don't allow writability changes"). Linus
Hi Linus, On Tue, Oct 17, 2023 at 10:43 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Tue, 17 Oct 2023 at 10:04, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > Honestly, there is only two kinds of sealing that makes sense: > > > > - you cannot change the permissions of some area > > > > - you cannot unmap an area > > Actually, I guess at least theoretically, there could be three different things: > > - you cannot move an area > Yes. Actually, the newly added selftest covers some of the above: 1. can't change the permission of some areas. test_seal_mprotect test_seal_mmap_overwrite_prot 2. can't unmap an area (thus mmap() to the same address later) test_seal_munmap 3. can't move to an area: test_seal_mremap_move //can't move from a sealed area: test_seal_mremap_move_fixed_zero //can't move from a sealed area to a fixed address test_seal_mremap_move_fixed //can't move to a sealed area. 4 can't expand or shrink the area: test_seal_mremap_shrink test_seal_mremap_expand > although I do think that maybe just saying "you cannot unmap" might > also include "you cannot move". > > But I guess it depends on whether you feel it's the virtual _address_ > you are protecting, or whether it's the concept of mapping something. > > I personally think that from a security perspective, what you want to > protect is a particular address. That implies that "seal from > unmapping" would thus also include "you can't move this area > elsewhere". > > But at least conceptually, splitting "unmap" and "move" apart might > make some sense. I would like to hear a practical reason for it, > though. > > Without that practical reason, I think the only two sane sealing operations are: > > - SEAL_MUNMAP: "don't allow this mapping address to go away" > > IOW no unmap, no shrinking, no moving mremap > > - SEAL_MPROTECT: "don't allow any mapping permission changes" > I agree with the concept in general. The separation of two seal types is easy to understand. For mmap(MAP_FIXED), I know for a fact that it can modify permission of an existing mapping, (as in selftest:test_seal_mmap_overwrite_prot). I think it can also expand an existing VMA. This is not a problem, code-wise, I mention it here, because it needs extra care when coding mmap() change. > Again, that permission case might end up being "don't allow > _additional_ permissions" and "don't allow taking permissions away". > Or it could be split by operation (ie "don't allow permission changes > to writability / readability / executability respectively"). > Yes. If the application desires this, it can also be done. i.e. seal of X bit, or seal of W bit, this will be similar to file sealing. I discussed this with Stephan before, at this point of time, Chrome doesn't have a use case. > I suspect there isn't a real-life example of splitting the > SEAL_MPROTECT (the same way I doubt there's a real-life example for > splitting the UNMAP into "unmap vs move"), so unless there is some > real reason, I'd keep the sealing minimal and to just those two flags. > I think two seal-type (permission and unmap/move/expand/shrink) will work for the Chrome case. Stephen Röttger is an expert in Chrome, on vacation/ be back soon. I will wait for Stephen to confirm. > We could always add more flags later, if there is a real use case > (IOW, if we start with "don't allow any permission changes", we could > add a flag later that just says "don't allow writability changes"). > Agreed 100%, thanks for understanding. -Jeff > Linus
> Without that practical reason, I think the only two sane sealing operations are: > > - SEAL_MUNMAP: "don't allow this mapping address to go away" > > IOW no unmap, no shrinking, no moving mremap > > - SEAL_MPROTECT: "don't allow any mapping permission changes" > > Again, that permission case might end up being "don't allow > _additional_ permissions" and "don't allow taking permissions away". > Or it could be split by operation (ie "don't allow permission changes > to writability / readability / executability respectively"). > > I suspect there isn't a real-life example of splitting the > SEAL_MPROTECT (the same way I doubt there's a real-life example for > splitting the UNMAP into "unmap vs move"), so unless there is some > real reason, I'd keep the sealing minimal and to just those two flags. These two flags are exactly what we would use in Chrome. I can't think of a use case for a more fine grained split either.
diff --git a/fs/aio.c b/fs/aio.c index b3174da80ff6..7f4863d0082d 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -557,8 +557,9 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) } ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, - PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + PROT_READ | PROT_WRITE, MAP_SHARED, 0, &unused, + NULL, 0); + mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index f2f316522f2a..9f496c9f2970 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3274,9 +3274,12 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo extern unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, struct list_head *uf); + extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + unsigned long pgoff, unsigned long *populate, struct list_head *uf, + unsigned long checkSeals); + extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, bool unlock, unsigned long checkSeals); diff --git a/ipc/shm.c b/ipc/shm.c index 60e45e7045d4..3660f522ecba 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1662,7 +1662,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; } - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL, + 0); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/internal.h b/mm/internal.h index d1d4bf4e63c0..2c074d8c6abd 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -800,8 +800,8 @@ extern u64 hwpoison_filter_memcg; extern u32 hwpoison_filter_enable; extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long, - unsigned long, unsigned long, - unsigned long, unsigned long); + unsigned long, unsigned long, unsigned long, unsigned long, + unsigned long checkSeals); extern void set_pageblock_order(void); unsigned long reclaim_pages(struct list_head *folio_list); diff --git a/mm/mmap.c b/mm/mmap.c index 62d592f16f45..edcadd2bb394 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1197,7 +1197,8 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long *populate, struct list_head *uf, + unsigned long checkSeals) { struct mm_struct *mm = current->mm; vm_flags_t vm_flags; @@ -1365,6 +1366,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; } + if (!can_modify_mm(mm, addr, addr + len, MM_SEAL_MMAP)) + return -EACCES; + addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || @@ -1411,7 +1415,17 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len, return PTR_ERR(file); } - retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff); + /* + * vm_mmap_pgoff() currently called from two places: + * SYSCALL_DEFINE1(old_mmap, ...) + * SYSCALL_DEFINE6(mmap_pgoff,...) + * and not in any other places. + * Therefore, omit changing the signature of vm_mmap_pgoff() + * Otherwise, we might need to add checkSeals and pass it + * from all callers of vm_mmap_pgoff(). + */ + retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff, + MM_SEAL_MMAP); out_fput: if (file) fput(file); @@ -3016,8 +3030,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, flags |= MAP_LOCKED; file = get_file(vma->vm_file); - ret = do_mmap(vma->vm_file, start, size, - prot, flags, pgoff, &populate, NULL); + ret = do_mmap(vma->vm_file, start, size, prot, flags, pgoff, + &populate, NULL, 0); fput(file); out: mmap_write_unlock(mm); diff --git a/mm/nommu.c b/mm/nommu.c index 8dba41cfc44d..dc83651ee777 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1018,7 +1018,8 @@ unsigned long do_mmap(struct file *file, unsigned long flags, unsigned long pgoff, unsigned long *populate, - struct list_head *uf) + struct list_head *uf, + unsigned long checkSeals) { struct vm_area_struct *vma; struct vm_region *region; @@ -1262,7 +1263,8 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len, goto out; } - retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff); + retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff, + 0); if (file) fput(file); diff --git a/mm/util.c b/mm/util.c index 4ed8b9b5273c..ca9d8c69267c 100644 --- a/mm/util.c +++ b/mm/util.c @@ -532,7 +532,8 @@ EXPORT_SYMBOL_GPL(account_locked_vm); unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flag, unsigned long pgoff) + unsigned long flag, unsigned long pgoff, + unsigned long checkseals) { unsigned long ret; struct mm_struct *mm = current->mm; @@ -544,7 +545,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (mmap_write_lock_killable(mm)) return -EINTR; ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, - &uf); + &uf, checkseals); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); if (populate) @@ -562,7 +563,8 @@ unsigned long vm_mmap(struct file *file, unsigned long addr, if (unlikely(offset_in_page(offset))) return -EINVAL; - return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT); + return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT, + 0); } EXPORT_SYMBOL(vm_mmap);