[v5] fs: clear file privilege bits when mmap writing

Message ID	20151209225148.GA14794@www.outflux.net (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> Date: Wed, 9 Dec 2015 14:51:48 -0800 From: Kees Cook <keescook@chromium.org> To: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz>, yalin wang <yalin.wang2010@gmail.com>, Willy Tarreau <w@1wt.eu>, "Eric W. Biederman" <ebiederm@xmission.com>, Alexander Viro <viro@zeniv.linux.org.uk>, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5] fs: clear file privilege bits when mmap writing Message-ID: <20151209225148.GA14794@www.outflux.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk

Message ID

20151209225148.GA14794@www.outflux.net (mailing list archive)

State

New, archived

Headers

Date: Wed, 9 Dec 2015 14:51:48 -0800
From: Kees Cook <keescook@chromium.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>, yalin wang <yalin.wang2010@gmail.com>,
	Willy Tarreau <w@1wt.eu>, "Eric W. Biederman" <ebiederm@xmission.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v5] fs: clear file privilege bits when mmap writing
Message-ID: <20151209225148.GA14794@www.outflux.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk

Commit Message

Kees Cook Dec. 9, 2015, 10:51 p.m. UTC

Normally, when a user can modify a file that has setuid or setgid bits,
those bits are cleared when they are not the file owner or a member
of the group. This is enforced when using write and truncate but not
when writing to a shared mmap on the file. This could allow the file
writer to gain privileges by changing a binary without losing the
setuid/setgid/caps bits.

Changing the bits requires holding inode->i_mutex, so it cannot be done
during the page fault (due to mmap_sem being held during the fault). We
could do this during vm_mmap_pgoff, but that would need coverage in
mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
again. We could clear at open() time, but it's possible things are
accidentally opening with O_RDWR and only reading. Better to clear on
close and error failures (i.e. an improvement over now, which is not
clearing at all).

Instead, detect the need to clear the bits during the page fault, and
actually remove the bits during final fput. Since the file was open for
writing, it wouldn't have been possible to execute it yet.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
I think this is the best we can do; everything else is blocked by mmap_sem.
---
 fs/file_table.c    | 11 +++++++++++
 include/linux/fs.h |  1 +
 mm/memory.c        |  1 +
 3 files changed, 13 insertions(+)

Comments

Christoph Hellwig Dec. 10, 2015, 1:21 a.m. UTC | #1

> Changing the bits requires holding inode->i_mutex, so it cannot be done
> during the page fault (due to mmap_sem being held during the fault). We
> could do this during vm_mmap_pgoff, but that would need coverage in
> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
> again. We could clear at open() time, but it's possible things are
> accidentally opening with O_RDWR and only reading. Better to clear on
> close and error failures (i.e. an improvement over now, which is not
> clearing at all).
> 
> Instead, detect the need to clear the bits during the page fault, and
> actually remove the bits during final fput. Since the file was open for
> writing, it wouldn't have been possible to execute it yet.


> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> I think this is the best we can do; everything else is blocked by mmap_sem.

It should be done at mmap time, before even taking mmap_sem.

Adding a new field for this to strut file isn't really acceptable.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Kees Cook Dec. 10, 2015, 3:25 a.m. UTC | #2

On Wed, Dec 9, 2015 at 5:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>> during the page fault (due to mmap_sem being held during the fault). We
>> could do this during vm_mmap_pgoff, but that would need coverage in
>> mprotect as well, but to check for MAP_SHARED, we'd need to hold mmap_sem
>> again. We could clear at open() time, but it's possible things are
>> accidentally opening with O_RDWR and only reading. Better to clear on
>> close and error failures (i.e. an improvement over now, which is not
>> clearing at all).
>>
>> Instead, detect the need to clear the bits during the page fault, and
>> actually remove the bits during final fput. Since the file was open for
>> writing, it wouldn't have been possible to execute it yet.
>
>
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>> I think this is the best we can do; everything else is blocked by mmap_sem.
>
> It should be done at mmap time, before even taking mmap_sem.
>
> Adding a new field for this to strut file isn't really acceptable.

I already covered this: there's no way to handle the mprotect case --
checking for MAP_SHARED is under mmap_sem still.

-Kees

Al Viro Dec. 10, 2015, 4:14 a.m. UTC | #3

On Wed, Dec 09, 2015 at 02:51:48PM -0800, Kees Cook wrote:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3aa514254161..409bd7047e7e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -872,6 +872,7 @@ struct file {
>  	struct list_head	f_tfile_llink;
>  #endif /* #ifdef CONFIG_EPOLL */
>  	struct address_space	*f_mapping;
> +	bool			f_remove_privs;

NAK.  If anything, such things belong in ->f_flags.  _If_ this is worth
doing at all, that is.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05ebf95..3a7eee76ea90 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -191,6 +191,17 @@  static void __fput(struct file *file)
 
 	might_sleep();
 
+	/*
+	 * XXX: While avoiding mmap_sem, we've already been written to.
+	 * We must ignore the return value, since we can't reject the
+	 * write.
+	 */
+	if (unlikely(file->f_remove_privs)) {
+		mutex_lock(&inode->i_mutex);
+		file_remove_privs(file);
+		mutex_unlock(&inode->i_mutex);
+	}
+
 	fsnotify_close(file);
 	/*
 	 * The function eventpoll_release() should be the first called
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa514254161..409bd7047e7e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -872,6 +872,7 @@  struct file {
 	struct list_head	f_tfile_llink;
 #endif /* #ifdef CONFIG_EPOLL */
 	struct address_space	*f_mapping;
+	bool			f_remove_privs;
 } __attribute__((aligned(4)));	/* lest something weird decides that 2 is OK */
 
 struct file_handle {
diff --git a/mm/memory.c b/mm/memory.c
index c387430f06c3..08a77e0cf65f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2036,6 +2036,7 @@  static inline int wp_page_reuse(struct mm_struct *mm,
 
 		if (!page_mkwrite)
 			file_update_time(vma->vm_file);
+		vma->vm_file->f_remove_privs = true;
 	}
 
 	return VM_FAULT_WRITE;

[v5] fs: clear file privilege bits when mmap writing

Commit Message

Comments

Patch