diff mbox series

exec: Make suid_dumpable apply to SUID/SGID binaries irrespective of invoking users

Message ID 20211221021744.864115-1-longman@redhat.com (mailing list archive)
State New, archived
Headers show
Series exec: Make suid_dumpable apply to SUID/SGID binaries irrespective of invoking users | expand

Commit Message

Waiman Long Dec. 21, 2021, 2:17 a.m. UTC
The begin_new_exec() function checks for SUID or SGID binaries by
comparing effective uid and gid against real uid and gid and using
the suid_dumpable sysctl parameter setting only if either one of them
differs.

In the special case that the uid and/or gid of the SUID/SGID binaries
matches the id's of the user invoking it, the suid_dumpable is not
used and SUID_DUMP_USER will be used instead. The documentation for the
suid_dumpable sysctl parameter does not include that exception and so
this will be an undocumented behavior.

Eliminate this undocumented behavior by adding a flag in the linux_binprm
structure to designate a SUID/SGID binary and use it for determining
if the suid_dumpable setting should be applied or not.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/exec.c               | 6 +++---
 include/linux/binfmts.h | 5 ++++-
 2 files changed, 7 insertions(+), 4 deletions(-)

Comments

Eric W. Biederman Dec. 21, 2021, 3:55 p.m. UTC | #1
Waiman Long <longman@redhat.com> writes:

> The begin_new_exec() function checks for SUID or SGID binaries by
> comparing effective uid and gid against real uid and gid and using
> the suid_dumpable sysctl parameter setting only if either one of them
> differs.
>
> In the special case that the uid and/or gid of the SUID/SGID binaries
> matches the id's of the user invoking it, the suid_dumpable is not
> used and SUID_DUMP_USER will be used instead. The documentation for the
> suid_dumpable sysctl parameter does not include that exception and so
> this will be an undocumented behavior.
>
> Eliminate this undocumented behavior by adding a flag in the linux_binprm
> structure to designate a SUID/SGID binary and use it for determining
> if the suid_dumpable setting should be applied or not.

I see that you are making the code match the documentation.
What harm/problems does this mismatch cause in practice?
What is the motivation for this change?

I am trying to see the motivation but all I can see is that
in the case where suid and sgid do nothing in practice the code
does not change dumpable.  The point of dumpable is to refuse to
core dump when it is not safe.  In this case since nothing happened
in practice it is safe.

So how does this matter in practice.  If there isn't a good
motivation my feel is that it is the documentation that needs to be
updated rather than the code.

There are a lot of warts to the suid/sgid handling during exec.  This
just doesn't look like one of them.

Eric


> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  fs/exec.c               | 6 +++---
>  include/linux/binfmts.h | 5 ++++-
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 537d92c41105..60e02e678fb6 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1344,9 +1344,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	 * is wrong, but userspace depends on it. This should be testing
>  	 * bprm->secureexec instead.
>  	 */
> -	if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
> -	    !(uid_eq(current_euid(), current_uid()) &&
> -	      gid_eq(current_egid(), current_gid())))
> +	if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || bprm->is_sugid)
>  		set_dumpable(current->mm, suid_dumpable);
>  	else
>  		set_dumpable(current->mm, SUID_DUMP_USER);
> @@ -1619,11 +1617,13 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
>  	if (mode & S_ISUID) {
>  		bprm->per_clear |= PER_CLEAR_ON_SETID;
>  		bprm->cred->euid = uid;
> +		bprm->is_sugid = 1;
>  	}
>  
>  	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
>  		bprm->per_clear |= PER_CLEAR_ON_SETID;
>  		bprm->cred->egid = gid;
> +		bprm->is_sugid = 1;
>  	}
>  }
>  
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 049cf9421d83..6d9893c59085 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -41,7 +41,10 @@ struct linux_binprm {
>  		 * Set when errors can no longer be returned to the
>  		 * original userspace.
>  		 */
> -		point_of_no_return:1;
> +		point_of_no_return:1,
> +
> +		/* Is a SUID or SGID binary? */
> +		is_sugid:1;
>  #ifdef __alpha__
>  	unsigned int taso:1;
>  #endif
Waiman Long Dec. 21, 2021, 4:41 p.m. UTC | #2
On 12/21/21 10:55, Eric W. Biederman wrote:
> Waiman Long <longman@redhat.com> writes:
>
>> The begin_new_exec() function checks for SUID or SGID binaries by
>> comparing effective uid and gid against real uid and gid and using
>> the suid_dumpable sysctl parameter setting only if either one of them
>> differs.
>>
>> In the special case that the uid and/or gid of the SUID/SGID binaries
>> matches the id's of the user invoking it, the suid_dumpable is not
>> used and SUID_DUMP_USER will be used instead. The documentation for the
>> suid_dumpable sysctl parameter does not include that exception and so
>> this will be an undocumented behavior.
>>
>> Eliminate this undocumented behavior by adding a flag in the linux_binprm
>> structure to designate a SUID/SGID binary and use it for determining
>> if the suid_dumpable setting should be applied or not.
> I see that you are making the code match the documentation.
> What harm/problems does this mismatch cause in practice?
> What is the motivation for this change?
>
> I am trying to see the motivation but all I can see is that
> in the case where suid and sgid do nothing in practice the code
> does not change dumpable.  The point of dumpable is to refuse to
> core dump when it is not safe.  In this case since nothing happened
> in practice it is safe.
>
> So how does this matter in practice.  If there isn't a good
> motivation my feel is that it is the documentation that needs to be
> updated rather than the code.
>
> There are a lot of warts to the suid/sgid handling during exec.  This
> just doesn't look like one of them

This patch is a minor mitigation in response to the security 
vulnerability as posted in 
https://www.openwall.com/lists/oss-security/2021/10/20/2 (aka 
CVE-2021-3864). In particular, the Su PoC (tested on CentOS 7) showing 
that the su invokes /usr/sbin/unix_chkpwd which is also a SUID binary. 
The initial su invocation won't generate a core dump because the real 
uid and euid differs, but the second unix_chkpwd invocation will. This 
patch eliminates this hole by making sure that all SUID binaries follow 
suid_dumpable setting.

Cheers,
Longman
Eric W. Biederman Dec. 21, 2021, 5:35 p.m. UTC | #3
Adding a couple of other people who have expressed opinions on how
to mitigate this issue in the kernel.

Waiman Long <longman@redhat.com> writes:

> On 12/21/21 10:55, Eric W. Biederman wrote:
>> Waiman Long <longman@redhat.com> writes:
>>
>>> The begin_new_exec() function checks for SUID or SGID binaries by
>>> comparing effective uid and gid against real uid and gid and using
>>> the suid_dumpable sysctl parameter setting only if either one of them
>>> differs.
>>>
>>> In the special case that the uid and/or gid of the SUID/SGID binaries
>>> matches the id's of the user invoking it, the suid_dumpable is not
>>> used and SUID_DUMP_USER will be used instead. The documentation for the
>>> suid_dumpable sysctl parameter does not include that exception and so
>>> this will be an undocumented behavior.
>>>
>>> Eliminate this undocumented behavior by adding a flag in the linux_binprm
>>> structure to designate a SUID/SGID binary and use it for determining
>>> if the suid_dumpable setting should be applied or not.
>> I see that you are making the code match the documentation.
>> What harm/problems does this mismatch cause in practice?
>> What is the motivation for this change?
>>
>> I am trying to see the motivation but all I can see is that
>> in the case where suid and sgid do nothing in practice the code
>> does not change dumpable.  The point of dumpable is to refuse to
>> core dump when it is not safe.  In this case since nothing happened
>> in practice it is safe.
>>
>> So how does this matter in practice.  If there isn't a good
>> motivation my feel is that it is the documentation that needs to be
>> updated rather than the code.
>>
>> There are a lot of warts to the suid/sgid handling during exec.  This
>> just doesn't look like one of them
>
> This patch is a minor mitigation in response to the security
> vulnerability as posted in
> https://www.openwall.com/lists/oss-security/2021/10/20/2 (aka
> CVE-2021-3864). In particular, the Su PoC (tested on CentOS 7) showing
> that the su invokes /usr/sbin/unix_chkpwd which is also a SUID
> binary. The initial su invocation won't generate a core dump because
> the real uid and euid differs, but the second unix_chkpwd invocation
> will. This patch eliminates this hole by making sure that all SUID
> binaries follow suid_dumpable setting.

All that is required to take advantage of this vulnerability is
for an suid program to exec something that will coredump.  That
exec resets the dumpability.

While the example exploit is execing a suid program it is not required
that the exec'd program be suid.

This makes your proposed change is not a particularly effective mitigation.


The best idea I have seen to mitigate this from the kernel side is:

1) set RLIMIT_CORE to 0 during an suid exec
2) update do_coredump to honor an rlimit of 0 for pipes

Anecdotally this should not effect the common systems that pipe
coredumps into programs as those programs are reported to honor
RLIMIT_CORE of 0.  This needs to be verified.

If those programs do honor RLIMIT_CORE of 0 we won't have any user
visible changes if they never see coredumps from a program with a
RLIMIT_CORE of 0.


I have been meaning to audit userspace and see if the common coredump
catchers truly honor an RLIMIT_CORE of 0.  Unfortunately I have not
found time to do that yet.


Eric
Waiman Long Dec. 21, 2021, 6:01 p.m. UTC | #4
On 12/21/21 12:35, Eric W. Biederman wrote:
> Adding a couple of other people who have expressed opinions on how
> to mitigate this issue in the kernel.
>
> Waiman Long <longman@redhat.com> writes:
>
>> On 12/21/21 10:55, Eric W. Biederman wrote:
>>> Waiman Long <longman@redhat.com> writes:
>>>
>>>> The begin_new_exec() function checks for SUID or SGID binaries by
>>>> comparing effective uid and gid against real uid and gid and using
>>>> the suid_dumpable sysctl parameter setting only if either one of them
>>>> differs.
>>>>
>>>> In the special case that the uid and/or gid of the SUID/SGID binaries
>>>> matches the id's of the user invoking it, the suid_dumpable is not
>>>> used and SUID_DUMP_USER will be used instead. The documentation for the
>>>> suid_dumpable sysctl parameter does not include that exception and so
>>>> this will be an undocumented behavior.
>>>>
>>>> Eliminate this undocumented behavior by adding a flag in the linux_binprm
>>>> structure to designate a SUID/SGID binary and use it for determining
>>>> if the suid_dumpable setting should be applied or not.
>>> I see that you are making the code match the documentation.
>>> What harm/problems does this mismatch cause in practice?
>>> What is the motivation for this change?
>>>
>>> I am trying to see the motivation but all I can see is that
>>> in the case where suid and sgid do nothing in practice the code
>>> does not change dumpable.  The point of dumpable is to refuse to
>>> core dump when it is not safe.  In this case since nothing happened
>>> in practice it is safe.
>>>
>>> So how does this matter in practice.  If there isn't a good
>>> motivation my feel is that it is the documentation that needs to be
>>> updated rather than the code.
>>>
>>> There are a lot of warts to the suid/sgid handling during exec.  This
>>> just doesn't look like one of them
>> This patch is a minor mitigation in response to the security
>> vulnerability as posted in
>> https://www.openwall.com/lists/oss-security/2021/10/20/2 (aka
>> CVE-2021-3864). In particular, the Su PoC (tested on CentOS 7) showing
>> that the su invokes /usr/sbin/unix_chkpwd which is also a SUID
>> binary. The initial su invocation won't generate a core dump because
>> the real uid and euid differs, but the second unix_chkpwd invocation
>> will. This patch eliminates this hole by making sure that all SUID
>> binaries follow suid_dumpable setting.
> All that is required to take advantage of this vulnerability is
> for an suid program to exec something that will coredump.  That
> exec resets the dumpability.
>
> While the example exploit is execing a suid program it is not required
> that the exec'd program be suid.
>
> This makes your proposed change is not a particularly effective mitigation.

Yes, I am aware of that. That is why I said it is just a minor 
mitigation. This patch was inspired after investigating this problem, 
but I do think it is good to make the code consistent with the 
documentation. Of course, we can go either way. I prefer my approach to 
use a flag to indicate a suid binary instead of just comparing ruid and 
euid.


>
> The best idea I have seen to mitigate this from the kernel side is:
>
> 1) set RLIMIT_CORE to 0 during an suid exec
> 2) update do_coredump to honor an rlimit of 0 for pipes
>
> Anecdotally this should not effect the common systems that pipe
> coredumps into programs as those programs are reported to honor
> RLIMIT_CORE of 0.  This needs to be verified.
>
> If those programs do honor RLIMIT_CORE of 0 we won't have any user
> visible changes if they never see coredumps from a program with a
> RLIMIT_CORE of 0.
>
>
> I have been meaning to audit userspace and see if the common coredump
> catchers truly honor an RLIMIT_CORE of 0.  Unfortunately I have not
> found time to do that yet.

Default RLIMIT_CORE to 0 will likely mitigate this vulnerability. 
However, there are still some userspace impacts as existing behavior 
will be modified. For instance, we may need to modify su to restore a 
proper value for RLIMIT_CORE after successful authentication.

Cheers,
Longman
Linus Torvalds Dec. 21, 2021, 6:19 p.m. UTC | #5
On Tue, Dec 21, 2021 at 10:01 AM Waiman Long <longman@redhat.com> wrote:
>
> Default RLIMIT_CORE to 0 will likely mitigate this vulnerability.
> However, there are still some userspace impacts as existing behavior
> will be modified. For instance, we may need to modify su to restore a
> proper value for RLIMIT_CORE after successful authentication.

We had a "clever" idea for this that I thought people were ok with.

It's been some time since this came up, but iirc the notion was to
instead of setting the rlimit to zero (which makes it really hard to
restore afterwards, because you don't know what the restored value
would be, so you are dependent on user space doing it), we just never
reset set_dumpable() when we execve.

So any suid exec will do set_dumpable() to suid_dumpable, and exec'ing
something else does nothing at all - it stays non-dumpable (obviously
"non-dumpable" here depends on the actual value for "suid_dumpable" -
you can enable suid dump debugging manually).

And instead, we say that operations like "setsid()" that start a new
session - *those* are the ones that enable core dumping again. Or
doing things like a "ulimit(RLIMIT_CORE)" (which clearly implies "I
want core-dumps").

Those will all very naturally make "login" and friends work correctly,
while keeping core-dumps disabled for some suid situation that doesn't
explicitly set up a new context.

I think the basic problem with the traditional UNIX model of "suid
exec doesn't core dump" is that the "enter non-core-dump" is a nice
clear "your privileges changed".

But then the "exit non-core-dump" thing is an exec that *doesn't*
change privileges. That's the odd and crazy part: you just disabled
core-dumps because there was a privilege level change, and then you
enable core-dumps again because there *wasn't* a privilege change -
even if you're still at those elevated privileges.

Now, this is clearly not a Linux issue - we're just doing what others
have been doing too. But I think we should just admit that "what
others have been doing" is simply broken.

And yes, some odd situation migth be broken by this kind of change,
but I think this kind of "the old model was broken" may simply require
that. I suspect we can find a solution that fixes all the regular
cases.

Hmm?

               Linus
Waiman Long Dec. 21, 2021, 7:27 p.m. UTC | #6
On 12/21/21 13:19, Linus Torvalds wrote:
> On Tue, Dec 21, 2021 at 10:01 AM Waiman Long <longman@redhat.com> wrote:
>> Default RLIMIT_CORE to 0 will likely mitigate this vulnerability.
>> However, there are still some userspace impacts as existing behavior
>> will be modified. For instance, we may need to modify su to restore a
>> proper value for RLIMIT_CORE after successful authentication.
> We had a "clever" idea for this that I thought people were ok with.
>
> It's been some time since this came up, but iirc the notion was to
> instead of setting the rlimit to zero (which makes it really hard to
> restore afterwards, because you don't know what the restored value
> would be, so you are dependent on user space doing it), we just never
> reset set_dumpable() when we execve.
>
> So any suid exec will do set_dumpable() to suid_dumpable, and exec'ing
> something else does nothing at all - it stays non-dumpable (obviously
> "non-dumpable" here depends on the actual value for "suid_dumpable" -
> you can enable suid dump debugging manually).
>
> And instead, we say that operations like "setsid()" that start a new
> session - *those* are the ones that enable core dumping again. Or
> doing things like a "ulimit(RLIMIT_CORE)" (which clearly implies "I
> want core-dumps").
>
> Those will all very naturally make "login" and friends work correctly,
> while keeping core-dumps disabled for some suid situation that doesn't
> explicitly set up a new context.
>
> I think the basic problem with the traditional UNIX model of "suid
> exec doesn't core dump" is that the "enter non-core-dump" is a nice
> clear "your privileges changed".
>
> But then the "exit non-core-dump" thing is an exec that *doesn't*
> change privileges. That's the odd and crazy part: you just disabled
> core-dumps because there was a privilege level change, and then you
> enable core-dumps again because there *wasn't* a privilege change -
> even if you're still at those elevated privileges.
>
> Now, this is clearly not a Linux issue - we're just doing what others
> have been doing too. But I think we should just admit that "what
> others have been doing" is simply broken.
>
> And yes, some odd situation migth be broken by this kind of change,
> but I think this kind of "the old model was broken" may simply require
> that. I suspect we can find a solution that fixes all the regular
> cases.
>
> Hmm?

I think this is a pretty clever idea. At least it is better than 
resetting RLIMIT_CORE to 0. As it is all done within the kernel, there 
is no need to change any userspace code. We may need to add a flag bit 
in the task structure to indicate using the suid_dumpable setting so 
that it can be inherited across fork/exec.

Thanks for the suggestion.

Cheers,
Longman
Willy Tarreau Dec. 21, 2021, 8:56 p.m. UTC | #7
On Tue, Dec 21, 2021 at 02:27:47PM -0500, Waiman Long wrote:
> On 12/21/21 13:19, Linus Torvalds wrote:
> > On Tue, Dec 21, 2021 at 10:01 AM Waiman Long <longman@redhat.com> wrote:
> > > Default RLIMIT_CORE to 0 will likely mitigate this vulnerability.
> > > However, there are still some userspace impacts as existing behavior
> > > will be modified. For instance, we may need to modify su to restore a
> > > proper value for RLIMIT_CORE after successful authentication.
> > We had a "clever" idea for this that I thought people were ok with.
> > 
> > It's been some time since this came up, but iirc the notion was to
> > instead of setting the rlimit to zero (which makes it really hard to
> > restore afterwards, because you don't know what the restored value
> > would be, so you are dependent on user space doing it), we just never
> > reset set_dumpable() when we execve.
> > 
> > So any suid exec will do set_dumpable() to suid_dumpable, and exec'ing
> > something else does nothing at all - it stays non-dumpable (obviously
> > "non-dumpable" here depends on the actual value for "suid_dumpable" -
> > you can enable suid dump debugging manually).
> > 
> > And instead, we say that operations like "setsid()" that start a new
> > session - *those* are the ones that enable core dumping again. Or
> > doing things like a "ulimit(RLIMIT_CORE)" (which clearly implies "I
> > want core-dumps").
> > 
> > Those will all very naturally make "login" and friends work correctly,
> > while keeping core-dumps disabled for some suid situation that doesn't
> > explicitly set up a new context.
> > 
> > I think the basic problem with the traditional UNIX model of "suid
> > exec doesn't core dump" is that the "enter non-core-dump" is a nice
> > clear "your privileges changed".
> > 
> > But then the "exit non-core-dump" thing is an exec that *doesn't*
> > change privileges. That's the odd and crazy part: you just disabled
> > core-dumps because there was a privilege level change, and then you
> > enable core-dumps again because there *wasn't* a privilege change -
> > even if you're still at those elevated privileges.
> > 
> > Now, this is clearly not a Linux issue - we're just doing what others
> > have been doing too. But I think we should just admit that "what
> > others have been doing" is simply broken.
> > 
> > And yes, some odd situation migth be broken by this kind of change,
> > but I think this kind of "the old model was broken" may simply require
> > that. I suspect we can find a solution that fixes all the regular
> > cases.
> > 
> > Hmm?
> 
> I think this is a pretty clever idea. At least it is better than resetting
> RLIMIT_CORE to 0.

Another problem that was raised when discussing RLIMIT_CORE to zero was
the loss of the old value.

> As it is all done within the kernel, there is no need to
> change any userspace code. We may need to add a flag bit in the task
> structure to indicate using the suid_dumpable setting so that it can be
> inherited across fork/exec.

Depending on what we change there can be some subtly visible changes.
In one of my servers I explicitly re-enable dumpable before setsid()
when a core dump is desired for debugging. But other deamons could do
the exact opposite. If setsid() systematically restores suid_dumpable,
a process that explicitly disables it before calling setsid() would
see it come back. But if we have a special "suid_in_progress" flag
to mask suid_dumpable and that's reset by setsid() and possibly
prctl(PR_SET_DUMPABLE) then I think it could even cover that unlikely
case.

Just my two cents,
Willy
Willy Tarreau Dec. 21, 2021, 10:13 p.m. UTC | #8
On Tue, Dec 21, 2021 at 09:56:35PM +0100, Willy Tarreau wrote:
> > As it is all done within the kernel, there is no need to
> > change any userspace code. We may need to add a flag bit in the task
> > structure to indicate using the suid_dumpable setting so that it can be
> > inherited across fork/exec.
> 
> Depending on what we change there can be some subtly visible changes.
> In one of my servers I explicitly re-enable dumpable before setsid()
> when a core dump is desired for debugging. But other deamons could do
> the exact opposite. If setsid() systematically restores suid_dumpable,
> a process that explicitly disables it before calling setsid() would
> see it come back. But if we have a special "suid_in_progress" flag
> to mask suid_dumpable and that's reset by setsid() and possibly
> prctl(PR_SET_DUMPABLE) then I think it could even cover that unlikely
> case.

Would there be any interest in pursuing attempts like the untested patch
below ? The intent is to set a new MMF_NOT_DUMPABLE on exec on setuid or
setgid bit, but clear it on setrlimit(RLIMIT_CORE), prctl(SET_DUMPABLE),
and setsid(). This flag makes get_dumpable() return SUID_DUMP_DISABLED
when set. I think that in the spirit it could maintain the info that a
suidexec happened and was not reset, without losing any tuning made by
the application. I never feel at ease touching all this and I certainly
did some mistakes but for now it's mostly to have a base to discuss
around, so do not hesitate to suggest or criticize.

Willy


diff --git a/fs/exec.c b/fs/exec.c
index a098c133d8d7..a80bfd835235 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1348,8 +1348,11 @@ int begin_new_exec(struct linux_binprm * bprm)
 	 */
 	if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
 	    !(uid_eq(current_euid(), current_uid()) &&
-	      gid_eq(current_egid(), current_gid())))
+	      gid_eq(current_egid(), current_gid()))) {
+		set_bit(MMF_NOT_DUMPABLE, &mm->flags);
+
 		set_dumpable(current->mm, suid_dumpable);
+	}
 	else
 		set_dumpable(current->mm, SUID_DUMP_USER);
 
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 4d9e3a656875..fd2109d036bc 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -14,23 +14,6 @@
 #define MMF_DUMPABLE_BITS 2
 #define MMF_DUMPABLE_MASK ((1 << MMF_DUMPABLE_BITS) - 1)
 
-extern void set_dumpable(struct mm_struct *mm, int value);
-/*
- * This returns the actual value of the suid_dumpable flag. For things
- * that are using this for checking for privilege transitions, it must
- * test against SUID_DUMP_USER rather than treating it as a boolean
- * value.
- */
-static inline int __get_dumpable(unsigned long mm_flags)
-{
-	return mm_flags & MMF_DUMPABLE_MASK;
-}
-
-static inline int get_dumpable(struct mm_struct *mm)
-{
-	return __get_dumpable(mm->flags);
-}
-
 /* coredump filter bits */
 #define MMF_DUMP_ANON_PRIVATE	2
 #define MMF_DUMP_ANON_SHARED	3
@@ -81,9 +64,29 @@ static inline int get_dumpable(struct mm_struct *mm)
  * lifecycle of this mm, just for simplicity.
  */
 #define MMF_HAS_PINNED		28	/* FOLL_PIN has run, never cleared */
+#define MMF_NOT_DUMPABLE	29	/* dump disable by suidexec */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
 				 MMF_DISABLE_THP_MASK)
 
+extern void set_dumpable(struct mm_struct *mm, int value);
+/*
+ * This returns the actual value of the suid_dumpable flag. For things
+ * that are using this for checking for privilege transitions, it must
+ * test against SUID_DUMP_USER rather than treating it as a boolean
+ * value.
+ */
+static inline int __get_dumpable(unsigned long mm_flags)
+{
+	if (mm_flag & MMF_NOT_DUMPABLE)
+		return SUID_DUMP_DISABLE;
+	return mm_flags & MMF_DUMPABLE_MASK;
+}
+
+static inline int get_dumpable(struct mm_struct *mm)
+{
+	return __get_dumpable(mm->flags);
+}
+
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 8fdac0d90504..a20002075496 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1215,6 +1215,13 @@ int ksys_setsid(void)
 out:
 	write_unlock_irq(&tasklist_lock);
 	if (err > 0) {
+		struct mm_struct *mm = get_task_mm(current);
+		if (mm) {
+			/* session leaders reset the not-dumpable protection */
+			clear_bit(MMF_NOT_DUMPABLE, &mm->flags);
+			mmput(mm);
+		}
+
 		proc_sid_connector(group_leader);
 		sched_autogroup_create_attach(group_leader);
 	}
@@ -1610,6 +1617,18 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
 	     new_rlim->rlim_cur != RLIM_INFINITY &&
 	     IS_ENABLED(CONFIG_POSIX_TIMERS))
 		update_rlimit_cpu(tsk, new_rlim->rlim_cur);
+
+	/*
+	 * If an application wants to change core dump settings, it means
+	 * it wants to decide on its dumpability so we reset MMF_NOT_DUMPABLE.
+	 */
+	if (resource == RLIMIT_CORE) {
+		struct mm_struct *mm = get_task_mm(tsk);
+		if (mm) {
+			clear_bit(MMF_NOT_DUMPABLE, &mm->flags);
+			mmput(mm);
+		}
+	}
 out:
 	read_unlock(&tasklist_lock);
 	return retval;
@@ -2292,6 +2311,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			error = -EINVAL;
 			break;
 		}
+		clear_bit(MMF_NOT_DUMPABLE, &me->mm->flags);
 		set_dumpable(me->mm, arg2);
 		break;
Eric W. Biederman Dec. 21, 2021, 11:35 p.m. UTC | #9
Willy Tarreau <w@1wt.eu> writes:

> On Tue, Dec 21, 2021 at 09:56:35PM +0100, Willy Tarreau wrote:
>> > As it is all done within the kernel, there is no need to
>> > change any userspace code. We may need to add a flag bit in the task
>> > structure to indicate using the suid_dumpable setting so that it can be
>> > inherited across fork/exec.
>> 
>> Depending on what we change there can be some subtly visible changes.
>> In one of my servers I explicitly re-enable dumpable before setsid()
>> when a core dump is desired for debugging. But other deamons could do
>> the exact opposite. If setsid() systematically restores suid_dumpable,
>> a process that explicitly disables it before calling setsid() would
>> see it come back. But if we have a special "suid_in_progress" flag
>> to mask suid_dumpable and that's reset by setsid() and possibly
>> prctl(PR_SET_DUMPABLE) then I think it could even cover that unlikely
>> case.
>
> Would there be any interest in pursuing attempts like the untested patch
> below ? The intent is to set a new MMF_NOT_DUMPABLE on exec on setuid or
> setgid bit, but clear it on setrlimit(RLIMIT_CORE), prctl(SET_DUMPABLE),
> and setsid(). This flag makes get_dumpable() return SUID_DUMP_DISABLED
> when set. I think that in the spirit it could maintain the info that a
> suidexec happened and was not reset, without losing any tuning made by
> the application. I never feel at ease touching all this and I certainly
> did some mistakes but for now it's mostly to have a base to discuss
> around, so do not hesitate to suggest or criticize.


Yes.  This looks like a good place to start the conversation.

We need to do something like you are doing to separate dumpability
changes due to privilege gains during exec and dumpability changes due
to privilege shuffling with setresuid.

As long as we only impact processes descending from a binary that has
gained privileges during exec (like this patch) I think we have a lot
of latitude in how we make this happen.  Basically we only need to
test su and sudo and verify that whatever we do works reasonably
well for them.

On the one hand I believe of gaining privileges during exec while
letting the caller control some aspect of our environment is a dangerous
design flaw and I would love to remove gaining privileges during exec
entirely.

On the other hand we need to introduces as few regressions as possible
and make gaining privileges during exec as safe as possible.



I do agree that RLIMIT_CORE and prctl(SET_DUMPABLE) are good places
to clear the flag.


I don't know if setsid is the proper key to re-enabling dumpability.

I ran a quick test and simply doing "su" and then running a shell
as root does not change the session, nor does "su -" (which creates
a login shell).  Also "sudo -s" does not create a new session.

So session creation does not happen naturally.

Still setsid is part of the standard formula for starting a daemon,
so I don't think system services that run as daemons will be affected.


I don't think anything we do matters for systemd.  As I understand
it "systemctl start ..." causes pid 1 to fork and exec services,
which will ensure the started processes are not descendants of
the binary the gained privileges during exec.

Eric
Willy Tarreau Dec. 22, 2021, 6:29 a.m. UTC | #10
On Tue, Dec 21, 2021 at 05:35:29PM -0600, Eric W. Biederman wrote:
> > Would there be any interest in pursuing attempts like the untested patch
> > below ? The intent is to set a new MMF_NOT_DUMPABLE on exec on setuid or
> > setgid bit, but clear it on setrlimit(RLIMIT_CORE), prctl(SET_DUMPABLE),
> > and setsid(). This flag makes get_dumpable() return SUID_DUMP_DISABLED
> > when set. I think that in the spirit it could maintain the info that a
> > suidexec happened and was not reset, without losing any tuning made by
> > the application. I never feel at ease touching all this and I certainly
> > did some mistakes but for now it's mostly to have a base to discuss
> > around, so do not hesitate to suggest or criticize.
> 
> 
> Yes.  This looks like a good place to start the conversation.

OK thanks.

> We need to do something like you are doing to separate dumpability
> changes due to privilege gains during exec and dumpability changes due
> to privilege shuffling with setresuid.
> 
> As long as we only impact processes descending from a binary that has
> gained privileges during exec (like this patch) I think we have a lot
> of latitude in how we make this happen.

Yes that's the idea. I think that fundamentally we ought to mark
that a chain of processes are potentially unsafe for dumps until the
application has shown that it could regain control of the code paths,
and hence is expected to deal properly with errors that might appear
so as not to dump a core anywhere with random permissions.

> Basically we only need to
> test su and sudo and verify that whatever we do works reasonably
> well for them.
> 
> On the one hand I believe of gaining privileges during exec while
> letting the caller control some aspect of our environment is a dangerous
> design flaw and I would love to remove gaining privileges during exec
> entirely.

You would like to postpone this ? It's not very clear to me how to do
that nor if it could reliably address this shortcoming.

> On the other hand we need to introduces as few regressions as possible
> and make gaining privileges during exec as safe as possible.

Yep I think so. Also code that is designed to run under setuid (like sudo)
is usally well tested, and quite portable thanks to various OS-specific
tweaks that we definitely don't want to break.

> I do agree that RLIMIT_CORE and prctl(SET_DUMPABLE) are good places
> to clear the flag.

There are probably other ones, but ideally we ought to avoid stuff
that could happen early in the dynamic linker.

> I don't know if setsid is the proper key to re-enabling dumpability.

It was a supposition emitted by Linus, which deserved being checked
at least given that it's part of the usual sequence when starting a
deamon.

> I ran a quick test and simply doing "su" and then running a shell
> as root does not change the session, nor does "su -" (which creates
> a login shell).  Also "sudo -s" does not create a new session.
> 
> So session creation does not happen naturally.

OK, so it will not help them. For them we could use setuid()/setreuid()
and setresuid() as good indicators that the application has taken control
of its fate. Sadly we cannot do that in set_user() because this one is
not called when the uid doesn't change.

> Still setsid is part of the standard formula for starting a daemon,
> so I don't think system services that run as daemons will be affected.
> 
> 
> I don't think anything we do matters for systemd.  As I understand
> it "systemctl start ..." causes pid 1 to fork and exec services,
> which will ensure the started processes are not descendants of
> the binary the gained privileges during exec.

Good point, I hadn't thought about that, but I agree with you.

Willy
Willy Tarreau Dec. 26, 2021, 3:03 p.m. UTC | #11
Hi Eric,

I've experimented a bit with this and am not completely convinced we're
on the right track.

First, resetting the not-dumpable status on setuid()/setsid() and friends
means that sudo itself resets it, and as such that the executable that it
launches may very well crash on CPU limits for example, thus allowing a
sudoer to produce a root core in a random directory. That's the current
state of affairs after the first attached patch.

This made me think that since we want to protect the called program and
not sudo itself, we ought to verify that the called program performs the
action itself. I.e. only programs that are marked as dumpable may reset
their not-dumpable status on various actions. That's what the second patch
does.

Doing this worked a lot better and initially made me think it solved the
issue: a user becoming root via sudo could regularly dump cores, but
crashes during startup would not work. That was until I tested with
"sudo ping" and saw root cores again, because ping, like many setuid
enabled programs, takes care of its permissions via setuid(0).

So this makes me think that trying to infer a program's intents via
snooping a few syscalls isn't going very far. Earlier in this
conversation there were a few other proposals around just playing with
RLIMIT_CORE and PR_SET_DUMPABLE.

I tend to think that if we combine the principle above (only monitor
dumpable programs) with the only two actions that *really* act on the
ability to produce a core (rlimit and prctl) then it might actually
reasonably work, because then a program could explicitly want to enable
core dumps and be allowed to do that. That's what the last patch does
and I couldn't find a case where it doesn't work. I.e. switching from
a user to root via sudo allows me to dump a core from a shell as the
shell sets RLIMIT_CORE, but will not let a program such as "ping" dump
a core by default for example. I've put that in the last patch which
replaces the first two ones.

I'm still wondering if adding a 4th value to suid_dumpable wouldn't
solve all of this: the value would automatically change when setting
RLIMIT_CORE. It could just be a bit more confusing to configure.

Other (orthogonal) approaches could consist in forcefully resetting a
few limits on suidexec. At least memory limits and CPU limits could be
reset to default (unlimited) values since there doesn't seem to be any
valid reason for an unprivileged process to change them for a privileged
one. And the core dump limits could be set to zero for the same reason.
It could be decided that we'd never dump a core on user-addressable
signals (SIGQUIT, maybe others?) and that could be even cleaner than
the currently discussed solutions.

Please let me know what you think.

Thanks,
Willy
From 8ca54b3a0bba56dbfce8ccf96ba36a1d6c189e85 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Sun, 26 Dec 2021 15:17:08 +0100
Subject: WIP: only remove non-dumpable in the non-suid program

this way sudo doesn't drop its own not-dumpable flag before executing.
---
 fs/coredump.c                  |  2 ++
 include/linux/sched/coredump.h |  4 ++--
 kernel/sys.c                   | 15 ++++++++++-----
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 3224dee44d30..5f0bfe2c00a6 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -609,6 +609,8 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		goto fail;
 	if (!__get_dumpable(cprm.mm_flags))
 		goto fail;
+	if (cprm.mm_flags & MMF_NOT_DUMPABLE_MASK)
+		goto fail;
 
 	cred = prepare_creds();
 	if (!cred)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 302f31247c90..662508d139e1 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -80,8 +80,8 @@ extern void set_dumpable(struct mm_struct *mm, int value);
  */
 static inline int __get_dumpable(unsigned long mm_flags)
 {
-	if (mm_flags & MMF_NOT_DUMPABLE_MASK)
-		return SUID_DUMP_DISABLE;
+	//if (mm_flags & MMF_NOT_DUMPABLE_MASK)
+	//	return SUID_DUMP_DISABLE;
 	return mm_flags & MMF_DUMPABLE_MASK;
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index f4405e004145..0ecdb4cc64e7 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -564,7 +564,8 @@ long __sys_setreuid(uid_t ruid, uid_t euid)
 		goto error;
 
 	/* attempt to change ID drops the not-dumpable protection */
-	clear_bit(MMF_NOT_DUMPABLE, &current->mm->flags);
+	if (get_dumpable(current->mm))
+		clear_bit(MMF_NOT_DUMPABLE, &current->mm->flags);
 
 	return commit_creds(new);
 
@@ -629,7 +630,8 @@ long __sys_setuid(uid_t uid)
 		goto error;
 
 	/* attempt to change ID drops the not-dumpable protection */
-	clear_bit(MMF_NOT_DUMPABLE, &current->mm->flags);
+	if (get_dumpable(current->mm))
+		clear_bit(MMF_NOT_DUMPABLE, &current->mm->flags);
 
 	return commit_creds(new);
 
@@ -711,7 +713,8 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
 		goto error;
 
 	/* attempt to change ID drops the not-dumpable protection */
-	clear_bit(MMF_NOT_DUMPABLE, &current->mm->flags);
+	if (get_dumpable(current->mm))
+		clear_bit(MMF_NOT_DUMPABLE, &current->mm->flags);
 
 	return commit_creds(new);
 
@@ -1225,7 +1228,8 @@ int ksys_setsid(void)
 	write_unlock_irq(&tasklist_lock);
 	if (err > 0) {
 		/* session leaders reset the not-dumpable protection */
-		clear_bit(MMF_NOT_DUMPABLE, &current->mm->flags);
+		if (get_dumpable(current->mm))
+			clear_bit(MMF_NOT_DUMPABLE, &current->mm->flags);
 
 		proc_sid_connector(group_leader);
 		sched_autogroup_create_attach(group_leader);
@@ -1627,7 +1631,8 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
 	 * If an application wants to change its own core dump settings, it
 	 * wants to decide on its dumpability so we reset MMF_NOT_DUMPABLE.
 	 */
-	if (!retval && new_rlim && resource == RLIMIT_CORE && tsk == current)
+	if (!retval && new_rlim && resource == RLIMIT_CORE && tsk == current &&
+	    get_dumpable(tsk->mm))
 		clear_bit(MMF_NOT_DUMPABLE, &tsk->mm->flags);
 out:
 	read_unlock(&tasklist_lock);
diff mbox series

Patch

diff --git a/fs/exec.c b/fs/exec.c
index 537d92c41105..60e02e678fb6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1344,9 +1344,7 @@  int begin_new_exec(struct linux_binprm * bprm)
 	 * is wrong, but userspace depends on it. This should be testing
 	 * bprm->secureexec instead.
 	 */
-	if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
-	    !(uid_eq(current_euid(), current_uid()) &&
-	      gid_eq(current_egid(), current_gid())))
+	if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || bprm->is_sugid)
 		set_dumpable(current->mm, suid_dumpable);
 	else
 		set_dumpable(current->mm, SUID_DUMP_USER);
@@ -1619,11 +1617,13 @@  static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
 	if (mode & S_ISUID) {
 		bprm->per_clear |= PER_CLEAR_ON_SETID;
 		bprm->cred->euid = uid;
+		bprm->is_sugid = 1;
 	}
 
 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
 		bprm->per_clear |= PER_CLEAR_ON_SETID;
 		bprm->cred->egid = gid;
+		bprm->is_sugid = 1;
 	}
 }
 
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 049cf9421d83..6d9893c59085 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -41,7 +41,10 @@  struct linux_binprm {
 		 * Set when errors can no longer be returned to the
 		 * original userspace.
 		 */
-		point_of_no_return:1;
+		point_of_no_return:1,
+
+		/* Is a SUID or SGID binary? */
+		is_sugid:1;
 #ifdef __alpha__
 	unsigned int taso:1;
 #endif