Message ID | 20160111225750.GA4256@www.outflux.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Jan 11, 2016 at 02:57:50PM -0800, Kees Cook wrote: > Normally, when a user can modify a file that has setuid or setgid bits, > those bits are cleared when they are not the file owner or a member > of the group. This is enforced when using write and truncate but not > when writing to a shared mmap on the file. This could allow the file > writer to gain privileges by changing a binary without losing the > setuid/setgid/caps bits. > > Changing the bits requires holding inode->i_mutex, so it cannot be done > during the page fault (due to mmap_sem being held during the fault). We > could do this during vm_mmap_pgoff, but that would need coverage in > mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem > again. We could clear at open() time, but it's possible things are > accidentally opening with O_RDWR and only reading. Better to clear on > close and error failures (i.e. an improvement over now, which is not > clearing at all). > > Instead, detect the need to clear the bits during the page fault, and > actually remove the bits during final fput. Since the file was open for > writing, it wouldn't have been possible to execute it yet (ETXTBSY). > > Signed-off-by: Kees Cook <keescook@chromium.org> > --- [...] > diff --git a/fs/file_table.c b/fs/file_table.c > index ad17e05ebf95..ca11b86613cf 100644 > --- a/fs/file_table.c > +++ b/fs/file_table.c > @@ -191,6 +191,21 @@ static void __fput(struct file *file) > > might_sleep(); > > + /* > + * XXX: This is a delayed removal of privs (we've already been > + * written to), since we must avoid mmap_sem. But a race shouldn't > + * be possible since when open for writing, execve() will fail > + * with ETXTBSY (via deny_write_access()). A remaining problem > + * is that since we've already been written to, we must ignore the > + * return value of file_remove_privs(), since we can't reject the > + * writes of the past. > + */ > + if (unlikely(file->f_flags & O_REMOVEPRIV)) { > + mutex_lock(&inode->i_mutex); > + file_remove_privs(file); > + mutex_unlock(&inode->i_mutex); > + } > + If there is any other setuid file I can run, can't I just do this? pid_t child = fork(); if (child == 0) { /* fd will be 3 or so */ int fd = open("setuid-file-with-bad-privs", O_WRONLY); char *ptr = mmap(..., fd, 0); memcpy(ptr, my_evil_code, sizeof(my_evil_code)); /* su --bad-option just prints usage and exits, without touching * the fd - but since su has the last reference to the fd, __fput * will run with its privileges */ execlp("su", "su", "--bad-option", NULL); } int status; wait(&status); execlp("setuid-file-with-bad-privs", "setuid-file-with-bad-privs", NULL); I think that file_remove_privs() really needs to be changed to use f_cred instead of current_cred(). That would also fix the known bypass where you pass the fd to a setuid process as fd 1, causing the setuid process to write more-or-less controlled data to a chosen offset, or similar stuff (see http://www.halfdog.net/Security/2015/SetgidDirectoryPrivilegeEscalation/). Or was there already another patch that does this that I didn't see?
On Thu, Jan 21, 2016 at 3:22 PM, Jann Horn <jann@thejh.net> wrote: > On Mon, Jan 11, 2016 at 02:57:50PM -0800, Kees Cook wrote: >> Normally, when a user can modify a file that has setuid or setgid bits, >> those bits are cleared when they are not the file owner or a member >> of the group. This is enforced when using write and truncate but not >> when writing to a shared mmap on the file. This could allow the file >> writer to gain privileges by changing a binary without losing the >> setuid/setgid/caps bits. >> >> Changing the bits requires holding inode->i_mutex, so it cannot be done >> during the page fault (due to mmap_sem being held during the fault). We >> could do this during vm_mmap_pgoff, but that would need coverage in >> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem >> again. We could clear at open() time, but it's possible things are >> accidentally opening with O_RDWR and only reading. Better to clear on >> close and error failures (i.e. an improvement over now, which is not >> clearing at all). >> >> Instead, detect the need to clear the bits during the page fault, and >> actually remove the bits during final fput. Since the file was open for >> writing, it wouldn't have been possible to execute it yet (ETXTBSY). >> >> Signed-off-by: Kees Cook <keescook@chromium.org> >> --- > [...] >> diff --git a/fs/file_table.c b/fs/file_table.c >> index ad17e05ebf95..ca11b86613cf 100644 >> --- a/fs/file_table.c >> +++ b/fs/file_table.c >> @@ -191,6 +191,21 @@ static void __fput(struct file *file) >> >> might_sleep(); >> >> + /* >> + * XXX: This is a delayed removal of privs (we've already been >> + * written to), since we must avoid mmap_sem. But a race shouldn't >> + * be possible since when open for writing, execve() will fail >> + * with ETXTBSY (via deny_write_access()). A remaining problem >> + * is that since we've already been written to, we must ignore the >> + * return value of file_remove_privs(), since we can't reject the >> + * writes of the past. >> + */ >> + if (unlikely(file->f_flags & O_REMOVEPRIV)) { >> + mutex_lock(&inode->i_mutex); >> + file_remove_privs(file); >> + mutex_unlock(&inode->i_mutex); >> + } >> + > > If there is any other setuid file I can run, can't I just do this? > > pid_t child = fork(); > if (child == 0) { > /* fd will be 3 or so */ > int fd = open("setuid-file-with-bad-privs", O_WRONLY); > char *ptr = mmap(..., fd, 0); > memcpy(ptr, my_evil_code, sizeof(my_evil_code)); > /* su --bad-option just prints usage and exits, without touching > * the fd - but since su has the last reference to the fd, __fput > * will run with its privileges */ > execlp("su", "su", "--bad-option", NULL); > } > int status; > wait(&status); > execlp("setuid-file-with-bad-privs", "setuid-file-with-bad-privs", NULL); > > I think that file_remove_privs() really needs to be changed to use f_cred > instead of current_cred(). That would also fix the known bypass where > you pass the fd to a setuid process as fd 1, causing the setuid process > to write more-or-less controlled data to a chosen offset, or similar > stuff (see > http://www.halfdog.net/Security/2015/SetgidDirectoryPrivilegeEscalation/). > > Or was there already another patch that does this that I didn't see? Andy brought it up as an issue, but I view it as a separate problem. Both things need to be fixed. :) -Kees
diff --git a/fs/file_table.c b/fs/file_table.c index ad17e05ebf95..ca11b86613cf 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -191,6 +191,21 @@ static void __fput(struct file *file) might_sleep(); + /* + * XXX: This is a delayed removal of privs (we've already been + * written to), since we must avoid mmap_sem. But a race shouldn't + * be possible since when open for writing, execve() will fail + * with ETXTBSY (via deny_write_access()). A remaining problem + * is that since we've already been written to, we must ignore the + * return value of file_remove_privs(), since we can't reject the + * writes of the past. + */ + if (unlikely(file->f_flags & O_REMOVEPRIV)) { + mutex_lock(&inode->i_mutex); + file_remove_privs(file); + mutex_unlock(&inode->i_mutex); + } + fsnotify_close(file); /* * The function eventpoll_release() should be the first called diff --git a/fs/open.c b/fs/open.c index b6f1e96a7c0b..89069d16ca80 100644 --- a/fs/open.c +++ b/fs/open.c @@ -895,7 +895,7 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o op->mode = 0; /* Must never be set by userspace */ - flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC; + flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC & ~O_REMOVEPRIV; /* * O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h index e063effe0cc1..1b6c49edb393 100644 --- a/include/uapi/asm-generic/fcntl.h +++ b/include/uapi/asm-generic/fcntl.h @@ -88,6 +88,17 @@ #define __O_TMPFILE 020000000 #endif +/* + * Reserved arch-specific values for __O_TMPFILE: + * 0040000000 (parisc) + * 0100000000 (alpha) + * 0200000000 (sparc) + */ + +#ifndef O_REMOVEPRIV +#define O_REMOVEPRIV 0400000000 +#endif + /* a horrid kludge trying to make sure that this will fail on old kernels */ #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY) #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT) diff --git a/mm/memory.c b/mm/memory.c index c387430f06c3..ad4188a8f279 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2036,6 +2036,11 @@ static inline int wp_page_reuse(struct mm_struct *mm, if (!page_mkwrite) file_update_time(vma->vm_file); + if (unlikely((vma->vm_file->f_flags & O_REMOVEPRIV) == 0)) { + spin_lock(&vma->vm_file->f_lock); + vma->vm_file->f_flags |= O_REMOVEPRIV; + spin_unlock(&vma->vm_file->f_lock); + } } return VM_FAULT_WRITE;
Normally, when a user can modify a file that has setuid or setgid bits, those bits are cleared when they are not the file owner or a member of the group. This is enforced when using write and truncate but not when writing to a shared mmap on the file. This could allow the file writer to gain privileges by changing a binary without losing the setuid/setgid/caps bits. Changing the bits requires holding inode->i_mutex, so it cannot be done during the page fault (due to mmap_sem being held during the fault). We could do this during vm_mmap_pgoff, but that would need coverage in mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem again. We could clear at open() time, but it's possible things are accidentally opening with O_RDWR and only reading. Better to clear on close and error failures (i.e. an improvement over now, which is not clearing at all). Instead, detect the need to clear the bits during the page fault, and actually remove the bits during final fput. Since the file was open for writing, it wouldn't have been possible to execute it yet (ETXTBSY). Signed-off-by: Kees Cook <keescook@chromium.org> --- v7: - document and avoid arch-specific O_* values, viro v6: - clarify ETXTBSY situation in comments, luto v5: - add to f_flags instead, viro - add i_mutex during __fput, jack v4: - delay removal instead of still needing mmap_sem for mprotect, yalin v3: - move outside of mmap_sem for real now, fengguang - check return code of file_remove_privs, akpm v2: - move to mmap from fault handler, jack --- fs/file_table.c | 15 +++++++++++++++ fs/open.c | 2 +- include/uapi/asm-generic/fcntl.h | 11 +++++++++++ mm/memory.c | 5 +++++ 4 files changed, 32 insertions(+), 1 deletion(-)