diff mbox

fs: clear file set[ug]id when writing via mmap

Message ID 20151120001043.GA28204@www.outflux.net (mailing list archive)
State New, archived
Headers show

Commit Message

Kees Cook Nov. 20, 2015, 12:10 a.m. UTC
Normally, when a user can modify a file that has setuid or setgid bits,
those bits are cleared when they are not the file owner or a member of the
group. This is enforced when using write() directly but not when writing
to a shared mmap on the file. This could allow the file writer to gain
privileges by changing the binary without losing the setuid/setgid bits.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
---
 mm/memory.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Andrew Morton Nov. 20, 2015, 12:41 a.m. UTC | #1
On Thu, 19 Nov 2015 16:10:43 -0800 Kees Cook <keescook@chromium.org> wrote:

> Normally, when a user can modify a file that has setuid or setgid bits,
> those bits are cleared when they are not the file owner or a member of the
> group. This is enforced when using write() directly but not when writing
> to a shared mmap on the file. This could allow the file writer to gain
> privileges by changing the binary without losing the setuid/setgid bits.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Cc: stable@vger.kernel.org
> ---
>  mm/memory.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index deb679c31f2a..4c970a4e0057 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
>  
>  		if (!page_mkwrite)
>  			file_update_time(vma->vm_file);
> +		file_remove_privs(vma->vm_file);
>  	}
>  
>  	return VM_FAULT_WRITE;

file_remove_privs() is depressingly heavyweight.  You'd think there was
some more lightweight way of caching the fact that we've already done
this.

Dumb question: can we run file_remove_privs() once, when the file is
opened writably, rather than for each and every write into each page?


Also, the proposed patch drops the file_remove_privs() return value on
the floor and we just go ahead with the modification.  How come?

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kees Cook Nov. 20, 2015, 12:52 a.m. UTC | #2
On Thu, Nov 19, 2015 at 4:41 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 19 Nov 2015 16:10:43 -0800 Kees Cook <keescook@chromium.org> wrote:
>
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member of the
>> group. This is enforced when using write() directly but not when writing
>> to a shared mmap on the file. This could allow the file writer to gain
>> privileges by changing the binary without losing the setuid/setgid bits.
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> Cc: stable@vger.kernel.org
>> ---
>>  mm/memory.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index deb679c31f2a..4c970a4e0057 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
>>
>>               if (!page_mkwrite)
>>                       file_update_time(vma->vm_file);
>> +             file_remove_privs(vma->vm_file);
>>       }
>>
>>       return VM_FAULT_WRITE;
>
> file_remove_privs() is depressingly heavyweight.  You'd think there was
> some more lightweight way of caching the fact that we've already done
> this.

In theory, the IS_NOSEC(inode) should be fast. Perhaps track it in the
vma or file struct?

> Dumb question: can we run file_remove_privs() once, when the file is
> opened writably, rather than for each and every write into each page?

This got discussed briefly, but I can't remember why it got shot down.

> Also, the proposed patch drops the file_remove_privs() return value on
> the floor and we just go ahead with the modification.  How come?

Oh, excellent catch. If it can't drop it, it shouldn't be writable.
I'm not sure what the right abort scenario is in wp_page_reuse. Maybe
move this to start of wp_page_shared instead?

-Kees
Willy Tarreau Nov. 20, 2015, 1 a.m. UTC | #3
Hi Kees,

On Thu, Nov 19, 2015 at 04:10:43PM -0800, Kees Cook wrote:
> Normally, when a user can modify a file that has setuid or setgid bits,
> those bits are cleared when they are not the file owner or a member of the
> group. This is enforced when using write() directly but not when writing
> to a shared mmap on the file. This could allow the file writer to gain
> privileges by changing the binary without losing the setuid/setgid bits.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Cc: stable@vger.kernel.org
> ---
>  mm/memory.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index deb679c31f2a..4c970a4e0057 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
>  
>  		if (!page_mkwrite)
>  			file_update_time(vma->vm_file);
> +		file_remove_privs(vma->vm_file);

I thought you said in one of the early mails of this thread that it
didn't work. Or maybe I misunderstood.

Also, don't you think we should move that into the if (!page_mkwrite)
just like for the time update ?

Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Willy Tarreau Nov. 20, 2015, 1:03 a.m. UTC | #4
On Fri, Nov 20, 2015 at 02:00:16AM +0100, Willy Tarreau wrote:
> > diff --git a/mm/memory.c b/mm/memory.c
> > index deb679c31f2a..4c970a4e0057 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
> >  
> >  		if (!page_mkwrite)
> >  			file_update_time(vma->vm_file);
> > +		file_remove_privs(vma->vm_file);
> 
> I thought you said in one of the early mails of this thread that it
> didn't work. Or maybe I misunderstood.

OK never mind for this one I just saw the other mail where you said
the test is OK now. But I'm still worried about the performance so
the other point below remains :

> Also, don't you think we should move that into the if (!page_mkwrite)
> just like for the time update ?

Thanks!
Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kees Cook Nov. 20, 2015, 1:03 a.m. UTC | #5
On Thu, Nov 19, 2015 at 5:00 PM, Willy Tarreau <w@1wt.eu> wrote:
> Hi Kees,
>
> On Thu, Nov 19, 2015 at 04:10:43PM -0800, Kees Cook wrote:
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member of the
>> group. This is enforced when using write() directly but not when writing
>> to a shared mmap on the file. This could allow the file writer to gain
>> privileges by changing the binary without losing the setuid/setgid bits.
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> Cc: stable@vger.kernel.org
>> ---
>>  mm/memory.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index deb679c31f2a..4c970a4e0057 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
>>
>>               if (!page_mkwrite)
>>                       file_update_time(vma->vm_file);
>> +             file_remove_privs(vma->vm_file);
>
> I thought you said in one of the early mails of this thread that it
> didn't work. Or maybe I misunderstood.

I had a think-o in my earlier attempts. I understood the meaning of
page_mkwrite incorrectly.

> Also, don't you think we should move that into the if (!page_mkwrite)
> just like for the time update ?

Nope, page_mkwrite indicates if there was a vmops call to
page_mkwrite. In this case, it means "I will update the file time if
the filesystem driver didn't take care of it like it should". For
file_remove_privs, we want to always do it, since we should not depend
on filesystems to do it.

-Kees
Willy Tarreau Nov. 20, 2015, 1:06 a.m. UTC | #6
On Thu, Nov 19, 2015 at 05:03:15PM -0800, Kees Cook wrote:
> On Thu, Nov 19, 2015 at 5:00 PM, Willy Tarreau <w@1wt.eu> wrote:
> > Hi Kees,
> >
> > On Thu, Nov 19, 2015 at 04:10:43PM -0800, Kees Cook wrote:
> >> Normally, when a user can modify a file that has setuid or setgid bits,
> >> those bits are cleared when they are not the file owner or a member of the
> >> group. This is enforced when using write() directly but not when writing
> >> to a shared mmap on the file. This could allow the file writer to gain
> >> privileges by changing the binary without losing the setuid/setgid bits.
> >>
> >> Signed-off-by: Kees Cook <keescook@chromium.org>
> >> Cc: stable@vger.kernel.org
> >> ---
> >>  mm/memory.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index deb679c31f2a..4c970a4e0057 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -2036,6 +2036,7 @@ static inline int wp_page_reuse(struct mm_struct *mm,
> >>
> >>               if (!page_mkwrite)
> >>                       file_update_time(vma->vm_file);
> >> +             file_remove_privs(vma->vm_file);
> >
> > I thought you said in one of the early mails of this thread that it
> > didn't work. Or maybe I misunderstood.
> 
> I had a think-o in my earlier attempts. I understood the meaning of
> page_mkwrite incorrectly.
>
> > Also, don't you think we should move that into the if (!page_mkwrite)
> > just like for the time update ?
> 
> Nope, page_mkwrite indicates if there was a vmops call to
> page_mkwrite. In this case, it means "I will update the file time if
> the filesystem driver didn't take care of it like it should". For
> file_remove_privs, we want to always do it, since we should not depend
> on filesystems to do it.

Ah OK, thanks for the explanation, I didn't understand it like this
at all last time I read it.

Cheers,
Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Nov. 23, 2015, 12:26 p.m. UTC | #7
On Thu 19-11-15 16:10:43, Kees Cook wrote:
> Normally, when a user can modify a file that has setuid or setgid bits,
> those bits are cleared when they are not the file owner or a member of the
> group. This is enforced when using write() directly but not when writing
> to a shared mmap on the file. This could allow the file writer to gain
> privileges by changing the binary without losing the setuid/setgid bits.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Cc: stable@vger.kernel.org

So I had another look at this and now I understand why we didn't do it from
the start:

To call file_remove_privs() safely, we need to hold inode->i_mutex since
that operations is going to modify file mode / extended attributes and
i_mutex protects those. However we cannot get i_mutex in the page fault
path as that ranks above mmap_sem which we hold during the whole page
fault.

So calling file_remove_privs() when opening the file is probably as good as
it can get. It doesn't catch the case when suid bits / IMA attrs are set
while the file is already open but I don't see easy way around this.

BTW: This is another example where page fault locking is constraining us
and life would be simpler for filesystems we they get called without
mmap_sem held...

								Honza
Eric W. Biederman Nov. 23, 2015, 12:34 p.m. UTC | #8
Jan Kara <jack@suse.cz> writes:

> On Thu 19-11-15 16:10:43, Kees Cook wrote:
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member of the
>> group. This is enforced when using write() directly but not when writing
>> to a shared mmap on the file. This could allow the file writer to gain
>> privileges by changing the binary without losing the setuid/setgid bits.
>> 
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> Cc: stable@vger.kernel.org
>
> So I had another look at this and now I understand why we didn't do it from
> the start:
>
> To call file_remove_privs() safely, we need to hold inode->i_mutex since
> that operations is going to modify file mode / extended attributes and
> i_mutex protects those. However we cannot get i_mutex in the page fault
> path as that ranks above mmap_sem which we hold during the whole page
> fault.
>
> So calling file_remove_privs() when opening the file is probably as good as
> it can get. It doesn't catch the case when suid bits / IMA attrs are set
> while the file is already open but I don't see easy way around this.

Could we perhaps do this on mmap MAP_WRITE instead of open, and simply
deny adding these attributes if a file is mapped for write?

That would seem to be a little more compatible with what we already do,
and guards against the races you mention as well.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kees Cook Dec. 2, 2015, 11:55 p.m. UTC | #9
On Mon, Nov 23, 2015 at 4:26 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 19-11-15 16:10:43, Kees Cook wrote:
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member of the
>> group. This is enforced when using write() directly but not when writing
>> to a shared mmap on the file. This could allow the file writer to gain
>> privileges by changing the binary without losing the setuid/setgid bits.
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> Cc: stable@vger.kernel.org
>
> So I had another look at this and now I understand why we didn't do it from
> the start:
>
> To call file_remove_privs() safely, we need to hold inode->i_mutex since
> that operations is going to modify file mode / extended attributes and
> i_mutex protects those. However we cannot get i_mutex in the page fault
> path as that ranks above mmap_sem which we hold during the whole page
> fault.

Ah, I see the notation in __generic_file_write_iter about i_mutex.
Should file_remove_privs() get some debug annotation to catch callers
that don't hold that mutex? (That would have alerted me much earlier.)

> So calling file_remove_privs() when opening the file is probably as good as
> it can get. It doesn't catch the case when suid bits / IMA attrs are set
> while the file is already open but I don't see easy way around this.

I agree with Eric: mmap time seems like the right place.

> BTW: This is another example where page fault locking is constraining us
> and life would be simpler for filesystems we they get called without
> mmap_sem held...
>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

-Kees
diff mbox

Patch

diff --git a/mm/memory.c b/mm/memory.c
index deb679c31f2a..4c970a4e0057 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2036,6 +2036,7 @@  static inline int wp_page_reuse(struct mm_struct *mm,
 
 		if (!page_mkwrite)
 			file_update_time(vma->vm_file);
+		file_remove_privs(vma->vm_file);
 	}
 
 	return VM_FAULT_WRITE;