[v2,4/8] futex: don't leak robust_list pointer

Message ID 20160930145256.GB12862@redhat.com
State New

Commit Message

Oleg Nesterov Sept. 30, 2016, 2:52 p.m. UTC
On 09/23, Jann Horn wrote:
>
> This prevents an attacker from determining the robust_list or
> compat_robust_list userspace pointer of a process created by executing
> a setuid binary. Such an attack could be performed by racing
> get_robust_list() with a setuid execution. The impact of this issue is that
> an attacker could theoretically bypass ASLR when attacking setuid binaries.

Well. I am not sure this actually needs a fix, but I won't argue.

I can't really understand what this patch actually fixes,

> @@ -3007,31 +3007,43 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
>  	if (!futex_cmpxchg_enabled)
>  		return -ENOSYS;
>
> -	rcu_read_lock();
> -
> -	ret = -ESRCH;
> -	if (!pid)
> +	if (!pid) {
>  		p = current;
> -	else {
> +		get_task_struct(p);
> +	} else {
> +		rcu_read_lock();
>  		p = find_task_by_vpid(pid);
> +		/* pin the task to permit dropping the RCU read lock before
> +		 * acquiring the mutex
> +		 */
> +		if (p)
> +			get_task_struct(p);
> +		rcu_read_unlock();
>  		if (!p)
> -			goto err_unlock;
> +			return -ESRCH;
>  	}
>
> +	ret = mutex_lock_killable(&p->signal->cred_guard_light);
> +	if (ret)
> +		goto err_put;
> +
>  	ret = -EPERM;
>  	if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
>  		goto err_unlock;
>
>  	head = p->robust_list;
> -	rcu_read_unlock();

OK, suppose it races with setuid exec, and mutex_lock_killable() +
ptrace_may_access() comes after flush_old_exec() but before
install_exec_creds(), in this case ptrace_may_access() can wrongly
succeed.

In theory, it is possible that the execing thread can complete exec,
return to user-mode and call sys_set_robust_list() before we read
head = p->robust_list. Yes, this is unlikely, but unless I am totally
confused the race you are trying to fix is equally unlikely?

Perhaps we can make a much simpler change to prevent this, see below.
We can rely on the fact that both ptrace_may_access() and exec_mmap()
take the same task_lock(). Sure, this can "leak" robust_list too:
a set-uid binary can exec and/or lower its credentials after we
read p->robust_list, but personally I think we do not care.
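
The difference between the two orderings can be illustrated with a small
userspace toy model. This is only a sketch under stated assumptions: the
names toy_task, toy_setuid_exec(), toy_may_access() and the race hook are
hypothetical, the model is single-threaded, and the hook merely stands in
for a concurrent setuid exec. The task_lock() serialization is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in (hypothetical) for the few task fields that matter here. */
struct toy_task {
	int uid;            /* owner; 0 models a privileged process */
	void *robust_list;  /* userspace pointer registered by the task */
};

/* Stands in for an address inside the privileged image. */
static char privileged_head;

/* Models a setuid exec followed by set_robust_list() in the new image. */
static void toy_setuid_exec(struct toy_task *t, void *new_head)
{
	assert(t != NULL);
	t->uid = 0;
	t->robust_list = new_head;
}

/* Models ptrace_may_access() very crudely: only the owner may look. */
static int toy_may_access(const struct toy_task *t, int caller_uid)
{
	return t->uid == caller_uid;
}

/* Hook that simulates the exec landing inside the race window. */
static void race_exec(struct toy_task *t)
{
	toy_setuid_exec(t, &privileged_head);
}

/* Check first, read second; `race` (may be NULL) runs between the two. */
static void *get_check_then_read(struct toy_task *t, int caller_uid,
				 void (*race)(struct toy_task *))
{
	if (!toy_may_access(t, caller_uid))
		return NULL;
	if (race)
		race(t);
	return t->robust_list;
}

/* Read first, check second; `race` (may be NULL) runs between the two. */
static void *get_read_then_check(struct toy_task *t, int caller_uid,
				 void (*race)(struct toy_task *))
{
	void *head = t->robust_list;

	if (race)
		race(t);
	if (!toy_may_access(t, caller_uid))
		return NULL;
	return head;
}
```

In this model, get_check_then_read() hands back the pointer set by the
privileged image when the exec lands in the window, while
get_read_then_check() can only ever return the pointer that was there
before the exec.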

Or I missed something else?

Oleg.



Comments

Jann Horn Oct. 30, 2016, 5:16 p.m. UTC | #1
On Fri, Sep 30, 2016 at 04:52:57PM +0200, Oleg Nesterov wrote:
> OK, suppose it races with setuid exec, and mutex_lock_killable() +
> ptrace_may_access() comes after flush_old_exec() but before
> install_exec_creds(), in this case ptrace_may_access() can wrongly
> succeed.

I take cred_guard_light in flush_old_exec() and release it in
install_exec_creds(), so that shouldn't work, I think.
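
As a rough illustration of why holding the lock across that window closes
the race, here is a userspace toy model. It is a sketch only: guarded_task
and the toy_* functions are hypothetical names, a C11 atomic_flag stands in
for cred_guard_light, and the reader uses a try-lock so the example stays
single-threaded (the patch itself sleeps in mutex_lock_killable()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy task; cred_guard stands in for signal->cred_guard_light. */
struct guarded_task {
	atomic_flag cred_guard;
	int uid;
	void *robust_list;
};

enum toy_result { TOY_OK, TOY_BUSY, TOY_EPERM };

/* Models the start of the window: the guard is taken in flush_old_exec(). */
static void toy_exec_begin(struct guarded_task *t)
{
	bool was_held = atomic_flag_test_and_set(&t->cred_guard);

	assert(!was_held);  /* toy model: exec never contends with itself */
}

/* Models the end of the window: new creds are set, then the guard is dropped. */
static void toy_exec_finish(struct guarded_task *t, int new_uid)
{
	t->uid = new_uid;
	atomic_flag_clear(&t->cred_guard);
}

/* Permission check and read both happen with the guard held. */
static enum toy_result toy_get_robust_list(struct guarded_task *t,
					   int caller_uid, void **head)
{
	enum toy_result res = TOY_EPERM;

	if (atomic_flag_test_and_set(&t->cred_guard))
		return TOY_BUSY;  /* exec in progress; the real code would wait */
	if (t->uid == caller_uid) {
		*head = t->robust_list;
		res = TOY_OK;
	}
	atomic_flag_clear(&t->cred_guard);
	return res;
}
```

In the model, a reader arriving between toy_exec_begin() and
toy_exec_finish() cannot run its check at all, and one arriving afterwards
is checked against the new credentials.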


> In theory, it is possible that the execing thread can complete exec,
> return to user-mode and call sys_set_robust_list() before we read
> head = p->robust_list. Yes, this is unlikely, but unless I am totally
> confused the race you are trying to fix is equally unlikely?
> 
> Perhaps we can make a much simpler change to prevent this, see below.
> We can rely on the fact that both ptrace_may_access() and exec_mmap()
> take the same task_lock(). Sure, this can "leak" robust_list too:
> a set-uid binary can exec and/or lower its credentials after we
> read p->robust_list, but personally I think we do not care.
> 
> Or I missed something else?

No - I think your patch would work, too, apart from the potential
leak you mentioned.

But unless this breaks something, I would prefer to do it properly.

Jann Horn Nov. 2, 2016, 9:39 p.m. UTC | #2
On Sun, Oct 30, 2016 at 06:16:50PM +0100, Jann Horn wrote:
> No - I think your patch would work, too, apart from the potential
> leak you mentioned.

Changing my opinion:

This does not just affect setuid binaries. It also affects daemons like
cron and atd that execute processes with dropped privileges.

This is how atd runs jobs (strace output, with irrelevant stuff removed):

[...]
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa81b1099d0) = 14915
Process 14915 attached
[...]
[pid 14915] set_robust_list(0x7fa81b1099e0, 24) = 0
[...]
[pid 14915] setregid(0, 1)              = 0
[pid 14915] setreuid(0, 1)              = 0
[pid 14915] close(0)                    = 0
[pid 14915] close(1)                    = 0
[pid 14915] close(2)                    = 0
[pid 14915] clone(Process 14916 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa81b1099d0) = 14916
[pid 14916] set_robust_list(0x7fa81b1099e0, 24) = 0
[pid 14915] wait4(14916,  <unfinished ...>
[pid 14916] lseek(6, 0, SEEK_SET)       = 0
[pid 14916] dup2(6, 0)                  = 0
[pid 14916] dup2(5, 1)                  = 1
[pid 14916] dup2(5, 2)                  = 2
[pid 14916] close(6)                    = 0
[pid 14916] close(5)                    = 0
[pid 14916] setreuid(1, 0)              = 0
[pid 14916] setregid(1, 0)              = 0
[...]
[pid 14916] setgroups(13, [1000, [...]]) = 0
[pid 14916] setgid(1000)                = 0
[pid 14916] setuid(1000)                = 0
[pid 14916] chdir("/")                  = 0
[pid 14916] execve("/bin/sh", ["sh"], [/* 0 vars */]) = 0
[...]

Basically, you can see that the pointer 0x7fa81b1099e0, which reveals
information about the address space layout, is the robust list of pid 14916
when it calls execve(), and after that execve() call, pid 14916 will be
ptraceable for the user (modulo LSMs).

So I think that my patch is a bit safer. Yes, there aren't many local
daemons whose address space layout you can discover this way, but it's still
not great.
Jann Horn Nov. 2, 2016, 10:47 p.m. UTC | #3
I think my previous message wasn't very clear about what I think the issue is.

Basically, here, it would be plausible for uid 1000 to be able to determine
the pre-execve() robust_list pointer of pid 14916 by racing get_robust_list()
during the execve(). That itself isn't a big issue because the memory mappings
of pid 14916 are thrown away during the execve(), but what is potentially
interesting to an attacker is that before the execve(), pid 14916 shared its
address space layout with its parents, including the atd daemon. So if an
attacker has a vulnerability in atd but needs an address leak in order to
exploit it, this would be such a leak.
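
For reference, the userspace side of the query under discussion is a single
raw syscall. A minimal sketch, assuming a Linux system with the kernel UAPI
headers installed; query_robust_list() is a hypothetical wrapper name, since
the C library does not necessarily export one:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>   /* struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel for the robust_list head registered
 * by thread `pid` (0 means the calling thread). Returns 0 on success, or
 * -1 with errno set (for example ESRCH, EPERM or ENOSYS).
 */
static int query_robust_list(pid_t pid, struct robust_list_head **head,
			     size_t *len)
{
	assert(head != NULL && len != NULL);
	return (int)syscall(SYS_get_robust_list, pid, head, len);
}
```

An unprivileged caller passing another task's pid only gets an answer if the
ptrace access check passes, which is exactly the check whose timing relative
to execve() is being debated here.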

Patch

--- x/kernel/futex.c
+++ x/kernel/futex.c
@@ -3019,10 +3019,10 @@  SYSCALL_DEFINE3(get_robust_list, int, pi
 	}
 
 	ret = -EPERM;
+	head = p->robust_list;
 	if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
 		goto err_unlock;
 
-	head = p->robust_list;
 	rcu_read_unlock();
 
 	if (put_user(sizeof(*head), len_ptr))