From patchwork Fri Nov  4 15:00:38 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Eric W. Biederman" <ebiederm@xmission.com>
X-Patchwork-Id: 9412721
Return-Path: <linux-security-module-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	BA06360573
	for <patchwork-linux-security-module@patchwork.kernel.org>;
	Fri,  4 Nov 2016 15:08:28 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ABA602B150
	for <patchwork-linux-security-module@patchwork.kernel.org>;
	Fri,  4 Nov 2016 15:08:28 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 9ED152B1A4; Fri,  4 Nov 2016 15:08:28 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=unavailable version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C306C2B193
	for <patchwork-linux-security-module@patchwork.kernel.org>;
	Fri,  4 Nov 2016 15:08:26 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932272AbcKDPIU (ORCPT
	<rfc822;patchwork-linux-security-module@patchwork.kernel.org>);
	Fri, 4 Nov 2016 11:08:20 -0400
Received: from out01.mta.xmission.com ([166.70.13.231]:45044 "EHLO
	out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932271AbcKDPIS (ORCPT
	<rfc822;linux-security-module@vger.kernel.org>);
	Fri, 4 Nov 2016 11:08:18 -0400
Received: from in01.mta.xmission.com ([166.70.13.51])
	by out01.mta.xmission.com with esmtps
	(TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87)
	(envelope-from <ebiederm@xmission.com>)
	id 1c2g5A-0004Dw-G4; Fri, 04 Nov 2016 09:07:04 -0600
Received: from [205.159.154.82] (helo=x220.xmission.com)
	by in01.mta.xmission.com with esmtpsa
	(TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87)
	(envelope-from <ebiederm@xmission.com>)
	id 1c2g4L-0006Th-0l; Fri, 04 Nov 2016 09:06:24 -0600
From: ebiederm@xmission.com (Eric W. Biederman)
To: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jann@thejh.net>, Alexander Viro <viro@zeniv.linux.org.uk>,
	Roland McGrath <roland@hack.frob.com>,
	John Johansen <john.johansen@canonical.com>,
	James Morris <james.l.morris@oracle.com>,
	"Serge E. Hallyn" <serge@hallyn.com>, Paul Moore <aul@paul-moore.com>,
	Stephen Smalley <sds@tycho.nsa.gov>, Eric Paris <eparis@parisplace.org>,
	Casey Schaufler <casey@schaufler-ca.com>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Janis Danisevskis <jdanis@google.com>,
	Seth Forshee <seth.forshee@canonical.com>,
	Thomas Gleixner <tglx@linutronix.de>, Benjamin LaHaise <bcrl@kvack.org>,
	Ben Hutchings <ben@decadent.org.uk>,
	Andy Lutomirski <luto@amacapital.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Krister Johansen <kjlx@templeofstupid.com>,
	linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org, security@kernel.org
References: <1477863998-3298-1-git-send-email-jann@thejh.net>
	<1477863998-3298-2-git-send-email-jann@thejh.net>
	<20161102181806.GB1112@redhat.com>
	<20161102205011.GF8196@pc.thejh.net>
	<20161103181225.GA11212@redhat.com> <87k2cj2x6j.fsf@xmission.com>
Date: Fri, 04 Nov 2016 10:00:38 -0500
In-Reply-To: <87k2cj2x6j.fsf@xmission.com> (Eric W. Biederman's message of
	"Fri, 04 Nov 2016 08:26:28 -0500")
Message-ID: <87k2cjuw6h.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
X-XM-SPF: eid=1c2g4L-0006Th-0l; ; ; mid=<87k2cjuw6h.fsf@xmission.com>; ; ;
	hst=in01.mta.xmission.com; ; ; ip=205.159.154.82; ; ;
	frm=ebiederm@xmission.com; ; ; spf=neutral
X-XM-AID: U2FsdGVkX1+rbezMcCbJhX2JeTaBGij58iRXdmYOc+A=
X-SA-Exim-Connect-IP: 205.159.154.82
X-SA-Exim-Mail-From: ebiederm@xmission.com
Subject: Re: [PATCH v3 1/8] exec: introduce cred_guard_light
X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600)
X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com)
Sender: owner-linux-security-module@vger.kernel.org
Precedence: bulk
List-ID: <linux-security-module.vger.kernel.org>
X-Virus-Scanned: ClamAV using ClamSMTP

ebiederm@xmission.com (Eric W. Biederman) writes:

> Oleg Nesterov <oleg@redhat.com> writes:
>
>> On 11/02, Jann Horn wrote:
>>>
>>> On Wed, Nov 02, 2016 at 07:18:06PM +0100, Oleg Nesterov wrote:
>>> > On 10/30, Jann Horn wrote:
>>> > >
>>> > > This is a new per-threadgroup lock that can often be taken instead of
>>> > > cred_guard_mutex and has less deadlock potential. I'm doing this because
>>> > > Oleg Nesterov mentioned the potential for deadlocks, in particular if a
>>> > > debugged task is stuck in execve, trying to get rid of a ptrace-stopped
>>> > > thread, and the debugger attempts to inspect procfs files of the debugged
>>> > > task.
>>> >
>>> > Yes, but let me repeat that we need to fix this anyway. So I don't really
>>> > understand why should we add yet another mutex.
>>>
>>> execve() only takes the new mutex immediately after de_thread(), so this
>>> problem shouldn't occur there.
>>
>> Yes, I see.
>>
>>> Basically, I think that I'm not making the
>>> problem worse with my patches this way.
>>
>> In a sense that it doesn't add the new deadlocks, I agree. But it adds
>> yet another per-process mutex while we already have the similar one,
>>
>>> I believe that it should be possible to convert most existing users of the
>>> cred_guard_mutex to the new cred_guard_light - exceptions to that that I
>>> see are:
>>>
>>>  - PTRACE_ATTACH
>>
>> This is the main problem afaics. So "strace -f" can hang if it races
>> with mt-exec. And we need to fix this. I constantly forget about this
>> problem, but I tried many times to find a reasonable solution, still
>> can't.
>>
>> IMO, it would be nice to rework the lsm hooks, so that we could take
>> cred_guard_mutex after de_thread() (like your cred_guard_light) or
>> at least drop it earlier, but unlikely this is possible...
>>
>> So the only plan I currently have is change de_thread() to wait until
>> other threads pass exit_notify() or even exit_signals(), but I don't
>> like this.
>>
>>>  - SECCOMP_FILTER_FLAG_TSYNC (sets NO_NEW_PRIVS on remote task)
>>
>> I forgot about this one... Need to re-check but at first glance this
>> is not a real problem.
>>
>>> Beyond that, conceptually, the new cred_guard_light could also be turned
>>> into a read-write mutex
>>
>> Not sure I understand how this can help... doesn't matter.
>>
>> My point is, imo you should not add the new mutex. Just use the old
>> one in (say) 4/8 (which I do not personally like as you know ;), this
>> won't add the new problem.
>>
>>
>>> It seems to me like SECCOMP_FILTER_FLAG_TSYNC doesn't really have
>>> deadlocking issues.
>>
>> Yes, agreed.
>>
>>> PTRACE_ATTACH isn't that clear to me; if a debugger
>>> tries to attach to a newly spawned thread while another ptraced thread is
>>> dying because of de_thread() in a third thread, that might still cause
>>> the debugger to deadlock, right?
>>
>> This is the trivial test-case I wrote when the problem was initially
>> reported. And damn, I always knew that cred_guard_mutex needs fixes,
>> but somehow I completely forgot that it is used by PTRACE_ATTACH when
>> I was going to try to remove from fs/proc a long ago.
>>
>> 	void *thread(void *arg)
>> 	{
>> 		ptrace(PTRACE_TRACEME, 0,0,0);
>> 		return NULL;
>> 	}
>>
>> 	int main(void)
>> 	{
>> 		int pid = fork();
>>
>> 		if (!pid) {
>> 			pthread_t pt;
>> 			pthread_create(&pt, NULL, thread, NULL);
>> 			pthread_join(pt, NULL);
>> 			execlp("echo", "echo", "passed", NULL);
>> 		}
>>
>> 		sleep(1);
>> 		// or anything else which needs ->cred_guard_mutex,
>> 		// say open(/proc/$pid/mem)
>> 		ptrace(PTRACE_ATTACH, pid, 0,0);
>> 		kill(pid, SIGCONT);
>>
>> 		return 0;
>> 	}
>>
>> The problem is trivial. The execing thread waits until its sub-thread
>> goes away, it should be reaped by the tracer, the tracer waits for
>> cred_guard_mutex.
>
> There is a bug here but I don't believe it has anything to do with
> the cred_guard_mutex.
>
> If we reach zap_other_threads fundamentally the tracer should not
> be able to block the traced thread from exiting.  Those are the
> semantics described in the comments in the code.
>
> I have poked things a little and have a half fix for that but
> the fix appears to be the wrong, but enlightening.
>
> AKA the following prevents the hang of your test case.
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 75761acc77cf..a6f83450500e 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1200,7 +1200,7 @@ int zap_other_threads(struct task_struct *p)
>  		if (t->exit_state)
>  			continue;
>  		sigaddset(&t->pending.signal, SIGKILL);
> -		signal_wake_up(t, 1);
> +		signal_wake_up_state(t, TASK_WAKEKILL | __TASK_TRACED);
>  	}
>  
>  	return count;
>
> It looks like somewhere on the exit path the traced thread is blocking
> without setting TASK_WAKEKILL.

Apologies there was a testing mistake and that patch does not actually
help anything.

The following mostly correct patch modifies zap_other_threads in
the case of a de_thread to not wait for zombies to be reaped.  The only
case that cares is ptrace (as threads are self reaping).  So I don't
think this will cause any problems except removing the strace -f race.

Not waiting for zombies to be reaped in de_thread keeps the kernel from
holding the cred_guard_mutex while waiting for userspace.  Which should
mean we don't have to move it.

Not waiting for zombies to be reaped should also speed of mt-exec.  So I
think this is a benefit all around.


Eric
---
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/kernel/exit.c b/kernel/exit.c
index 9d68c45ebbe3..8c8556cab655 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -109,7 +109,8 @@ static void __exit_signal(struct task_struct *tsk)
 		 * If there is any task waiting for the group exit
 		 * then notify it:
 		 */
-		if (sig->notify_count > 0 && !--sig->notify_count)
+		if ((sig->flags & SIGNAL_GROUP_EXIT) &&
+		    sig->notify_count > 0 && !--sig->notify_count)
 			wake_up_process(sig->group_exit_task);
 
 		if (tsk == sig->curr_target)
@@ -690,6 +691,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 	if (tsk->exit_state == EXIT_DEAD)
 		list_add(&tsk->ptrace_entry, &dead);
 
+	if (!(tsk->signal->flags & SIGNAL_GROUP_EXIT) &&
+	    tsk->signal->notify_count > 0 && !--tsk->signal->notify_count)
+		wake_up_process(tsk->signal->group_exit_task);
+
 	/* mt-exec, de_thread() is waiting for group leader */
 	if (unlikely(tsk->signal->notify_count < 0))
 		wake_up_process(tsk->signal->group_exit_task);
diff --git a/kernel/signal.c b/kernel/signal.c
index 75761acc77cf..a3a5cd8dad0f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1194,7 +1194,9 @@ int zap_other_threads(struct task_struct *p)
 
 	while_each_thread(p, t) {
 		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-		count++;
+		if ((t->signal->flags & SIGNAL_GROUP_EXIT) ||
+		    !t->exit_state)
+			count++;
 
 		/* Don't bother with already dead threads */
 		if (t->exit_state)