[7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach

Message ID 20220426225211.308418-7-ebiederm@xmission.com (mailing list archive)
State Handled Elsewhere, archived
Series ptrace: cleaning up ptrace_stop

Commit Message

Eric W. Biederman April 26, 2022, 10:52 p.m. UTC
Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
was needed to detect when ptrace_stop would decide not to stop
after calling "set_special_state(TASK_TRACED)".  With the recent
cleanups ptrace_stop will always stop after calling set_special_state.

Take advantage of this by no longer asking wait_task_inactive to
verify the state.  If a bug is hit and wait_task_inactive does not
succeed warn and return -ESRCH.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)
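
For reference, the match_state contract this change leans on, as
wait_task_inactive() implements it in kernel/sched/core.c at the time of
this series (condensed from the excerpt Peter quotes further down the
thread):

	/* Condensed from wait_task_inactive(); not the full function. */
	while (task_running(rq, p)) {
		/* A zero match_state disables the state check entirely, */
		if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
			return 0;
		cpu_relax();
	}
	...
	ncsw = 0;
	/* ... so the caller only waits for the task to get off the CPU. */
	if (!match_state || READ_ONCE(p->__state) == match_state)
		ncsw = p->nvcsw | LONG_MIN; /* sets MSB */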

Comments

Eric W. Biederman April 27, 2022, 1:42 p.m. UTC | #1
"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> was needed to detect when ptrace_stop would decide not to stop
> after calling "set_special_state(TASK_TRACED)".  With the recent
> cleanups ptrace_stop will always stop after calling set_special_state.
>
> Take advantage of this by no longer asking wait_task_inactive to
> verify the state.  If a bug is hit and wait_task_inactive does not
> succeed warn and return -ESRCH.

As Oleg noticed upthread, there are more reasons than simply
!current->ptrace for wait_task_inactive to fail.  In particular, a fatal
signal can be received any time before JOBCTL_DELAY_SIGKILL.

So this change is not safe.  I will respin this one.

Eric


> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/ptrace.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 16d1a84a2cae..0634da7ac685 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -265,17 +265,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	}
>  	read_unlock(&tasklist_lock);
>  
> -	if (!ret && !ignore_state) {
> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
> -			/*
> -			 * This can only happen if may_ptrace_stop() fails and
> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
> -			 * so we should not worry about leaking __TASK_TRACED.
> -			 */
> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> -			ret = -ESRCH;
> -		}
> -	}
> +	if (!ret && !ignore_state &&
> +	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
> +		ret = -ESRCH;
>  
>  	return ret;
>  }

Eric
Eric W. Biederman April 27, 2022, 2:27 p.m. UTC | #2
"Eric W. Biederman" <ebiederm@xmission.com> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
>> Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
>> was needed to detect when ptrace_stop would decide not to stop
>> after calling "set_special_state(TASK_TRACED)".  With the recent
>> cleanups ptrace_stop will always stop after calling set_special_state.
>>
>> Take advantage of this by no longer asking wait_task_inactive to
>> verify the state.  If a bug is hit and wait_task_inactive does not
>> succeed warn and return -ESRCH.
>
> As Oleg noticed upthread, there are more reasons than simply
> !current->ptrace for wait_task_inactive to fail.  In particular, a fatal
> signal can be received any time before JOBCTL_DELAY_SIGKILL.
>
> So this change is not safe.  I will respin this one.

Bah.  I definitely need to update the description so there is going to
be a v2.

I confused myself.  This change is safe because ptrace_freeze_traced
fails if there is a pending fatal signal, and arranges that no new fatal
signals will wake up the task.

Eric
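
For reference, a condensed sketch of the ptrace_freeze_traced() logic
being described here (kernel/ptrace.c around this series; exact details
vary by tree):

	/* Sketch: the freeze step ptrace_check_attach() performs before it
	 * waits.  It refuses to freeze a tracee with a fatal signal already
	 * pending, and rewriting __state to __TASK_TRACED drops
	 * TASK_WAKEKILL, so a later SIGKILL cannot wake the tracee mid-wait.
	 */
	static bool ptrace_freeze_traced(struct task_struct *task)
	{
		bool ret = false;

		spin_lock_irq(&task->sighand->siglock);
		if (task_is_traced(task) && !__fatal_signal_pending(task)) {
			WRITE_ONCE(task->__state, __TASK_TRACED);
			ret = true;
		}
		spin_unlock_irq(&task->sighand->siglock);

		return ret;
	}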

>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  kernel/ptrace.c | 14 +++-----------
>>  1 file changed, 3 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
>> index 16d1a84a2cae..0634da7ac685 100644
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -265,17 +265,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>>  	}
>>  	read_unlock(&tasklist_lock);
>>  
>> -	if (!ret && !ignore_state) {
>> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
>> -			/*
>> -			 * This can only happen if may_ptrace_stop() fails and
>> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
>> -			 * so we should not worry about leaking __TASK_TRACED.
>> -			 */
>> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> -			ret = -ESRCH;
>> -		}
>> -	}
>> +	if (!ret && !ignore_state &&
>> +	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
>> +		ret = -ESRCH;
>>  
>>  	return ret;
>>  }
>
> Eric
Oleg Nesterov April 27, 2022, 3:14 p.m. UTC | #3
On 04/26, Eric W. Biederman wrote:
>
> Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> was needed to detect when ptrace_stop would decide not to stop
> after calling "set_special_state(TASK_TRACED)".  With the recent
> cleanups ptrace_stop will always stop after calling set_special_state.
>
> Take advantage of this by no longer asking wait_task_inactive to
> verify the state.  If a bug is hit and wait_task_inactive does not
> succeed warn and return -ESRCH.

ACK, but I think that the changelog is wrong.

We could do this right after may_ptrace_stop() has gone. This doesn't
depend on the previous changes in this series.

Oleg.
Peter Zijlstra April 28, 2022, 10:42 a.m. UTC | #4
On Wed, Apr 27, 2022 at 05:14:57PM +0200, Oleg Nesterov wrote:
> On 04/26, Eric W. Biederman wrote:
> >
> > Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> > was needed to detect when ptrace_stop would decide not to stop
> > after calling "set_special_state(TASK_TRACED)".  With the recent
> > cleanups ptrace_stop will always stop after calling set_special_state.
> >
> > Take advantage of this by no longer asking wait_task_inactive to
> > verify the state.  If a bug is hit and wait_task_inactive does not
> > succeed warn and return -ESRCH.
> 
> ACK, but I think that the changelog is wrong.
> 
> We could do this right after may_ptrace_stop() has gone. This doesn't
> depend on the previous changes in this series.

It very much does rely on there not being any blocking between
set_special_state() and schedule(), though.  So all those PREEMPT_RT
spinlock->rt_mutex things need to be gone.

That is also the reason I couldn't do wait_task_inactive(task, 0) in the
other patch: I had to really match 'TASK_TRACED or TASK_FROZEN'; any
other state must fail (specifically, TASK_RTLOCK_WAIT must not match).
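
To make that constraint concrete, here are the state bits involved
(values as in include/linux/sched.h of this era; TASK_FROZEN comes from
Peter's then-in-flight freezer rework, not yet upstream):

	#define __TASK_TRACED		0x0008
	#define TASK_WAKEKILL		0x0100
	#define TASK_RTLOCK_WAIT	0x1000	/* PREEMPT_RT lock sleep */
	#define TASK_TRACED		(TASK_WAKEKILL | __TASK_TRACED)

	/* With match_state == 0 the state check is skipped entirely, so a
	 * tracee sleeping in TASK_RTLOCK_WAIT between set_special_state()
	 * and schedule() would wrongly count as having stopped. */
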
Oleg Nesterov April 28, 2022, 11:19 a.m. UTC | #5
On 04/28, Peter Zijlstra wrote:
>
> On Wed, Apr 27, 2022 at 05:14:57PM +0200, Oleg Nesterov wrote:
> > On 04/26, Eric W. Biederman wrote:
> > >
> > > Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> > > was needed to detect when ptrace_stop would decide not to stop
> > > after calling "set_special_state(TASK_TRACED)".  With the recent
> > > cleanups ptrace_stop will always stop after calling set_special_state.
> > >
> > > Take advantage of this by no longer asking wait_task_inactive to
> > > verify the state.  If a bug is hit and wait_task_inactive does not
> > > succeed warn and return -ESRCH.
> >
> > ACK, but I think that the changelog is wrong.
> >
> > We could do this right after may_ptrace_stop() has gone. This doesn't
> > depend on the previous changes in this series.
>
> It very much does rely on there not being any blocking between
> set_special_state() and schedule(), though.  So all those PREEMPT_RT
> spinlock->rt_mutex things need to be gone.

Yes, sure. But this patch doesn't add new problems, imo.

Yes, we can hit the WARN_ON_ONCE(!wait_task_inactive()), but the WARN is
correct in that wait_task_inactive() should not fail, and that failure is
what we need to fix.

> That is also the reason I couldn't do wait_task_inactive(task, 0)

Ah, I didn't notice this patch uses wait_task_inactive(child, 0);
I think it should do wait_task_inactive(child, __TASK_TRACED).

Oleg.
Peter Zijlstra April 28, 2022, 1:54 p.m. UTC | #6
On Thu, Apr 28, 2022 at 01:19:11PM +0200, Oleg Nesterov wrote:
> > That is also the reason I couldn't do wait_task_inactive(task, 0)
> 
> Ah, I didn't notice this patch uses wait_task_inactive(child, 0);
> I think it should do wait_task_inactive(child, __TASK_TRACED).

Shouldn't we then switch wait_task_inactive() to have & matching instead
of the current ==?
Oleg Nesterov April 28, 2022, 2:57 p.m. UTC | #7
On 04/28, Peter Zijlstra wrote:
>
> On Thu, Apr 28, 2022 at 01:19:11PM +0200, Oleg Nesterov wrote:
> > > That is also the reason I couldn't do wait_task_inactive(task, 0)
> >
> > Ah, I didn't notice this patch uses wait_task_inactive(child, 0);
> > I think it should do wait_task_inactive(child, __TASK_TRACED).
>
> Shouldn't we then switch wait_task_inactive() to have & matching instead
> of the current ==?

Sorry, I don't understand the context...

As long as ptrace_freeze_traced() sets __state == __TASK_TRACED (as it
currently does), wait_task_inactive(__TASK_TRACED) is what we need?

After we change it to use JOBCTL_DELAY_WAKEKILL and not abuse __state,
ptrace_attach() should use wait_task_inactive(TASK_TRACED), but this
depends on what exactly we are going to do...

Oleg.
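
In code, the distinction Oleg draws here; a sketch, assuming the macro
values quoted above:

	/* Today ptrace_freeze_traced() rewrites __state, so the frozen
	 * tracee is in exactly __TASK_TRACED and == matching works: */
	WRITE_ONCE(task->__state, __TASK_TRACED);
	wait_task_inactive(child, __TASK_TRACED);

	/* With a JOBCTL_DELAY_WAKEKILL-style freeze, __state would stay
	 * TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED), so the caller
	 * would have to match that state instead: */
	wait_task_inactive(child, TASK_TRACED);
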
Peter Zijlstra April 28, 2022, 4:09 p.m. UTC | #8
On Thu, Apr 28, 2022 at 04:57:50PM +0200, Oleg Nesterov wrote:

> > Shouldn't we then switch wait_task_inactive() to have & matching instead
> > of the current ==?
> 
> Sorry, I don't understand the context...

This... I've always found it strange to have wti use a different matching
scheme from ttwu.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f259621f4c93..c039aef4c8fe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3304,7 +3304,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && unlikely(!(READ_ONCE(p->__state) & match_state)))
 				return 0;
 			cpu_relax();
 		}
@@ -3319,7 +3319,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
 		running = task_running(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || READ_ONCE(p->__state) == match_state)
+		if (!match_state || (READ_ONCE(p->__state) & match_state))
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
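
With & matching in place, a caller could then pass a mask rather than one
exact state; a hypothetical usage, borrowing the TASK_FROZEN state from
the freezer rework mentioned above:

	/* Accept a tracee frozen either by ptrace or by the freezer,
	 * while still rejecting TASK_RTLOCK_WAIT and every other state: */
	if (!wait_task_inactive(child, __TASK_TRACED | TASK_FROZEN))
		ret = -ESRCH;
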
Oleg Nesterov April 28, 2022, 4:19 p.m. UTC | #9
On 04/28, Peter Zijlstra wrote:
>
> On Thu, Apr 28, 2022 at 04:57:50PM +0200, Oleg Nesterov wrote:
>
> > > Shouldn't we then switch wait_task_inactive() to have & matching instead
> > > of the current ==?
> >
> > Sorry, I don't understand the context...
>
> This... I've always found it strange to have wti use a different matching
> scheme from ttwu.

Ah. This is what I understood (and I too thought about this); I just meant
that this patch from Eric (assuming wait_task_inactive() still uses
__TASK_TRACED) is fine without your change below.

Oleg.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f259621f4c93..c039aef4c8fe 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3304,7 +3304,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
>  		 * is actually now running somewhere else!
>  		 */
>  		while (task_running(rq, p)) {
> -			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> +			if (match_state && unlikely(!(READ_ONCE(p->__state) & match_state)))
>  				return 0;
>  			cpu_relax();
>  		}
> @@ -3319,7 +3319,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
>  		running = task_running(rq, p);
>  		queued = task_on_rq_queued(p);
>  		ncsw = 0;
> -		if (!match_state || READ_ONCE(p->__state) == match_state)
> +		if (!match_state || (READ_ONCE(p->__state) & match_state))
>  			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
>  		task_rq_unlock(rq, p, &rf);

Patch

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 16d1a84a2cae..0634da7ac685 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -265,17 +265,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
+		ret = -ESRCH;
 
 	return ret;
 }