Message ID | 20220426225211.308418-9-ebiederm@xmission.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | ptrace: cleaning up ptrace_stop | expand |
On 04/26, Eric W. Biederman wrote: > > static void ptrace_unfreeze_traced(struct task_struct *task) > { > - if (READ_ONCE(task->__state) != __TASK_TRACED) > + if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL)) > return; > > WARN_ON(!task->ptrace || task->parent != current); > @@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task) > * Recheck state under the lock to close this race. > */ > spin_lock_irq(&task->sighand->siglock); Now that we do not check __state = __TASK_TRACED, we need lock_task_sighand(). The tracee can be already woken up by ptrace_resume(), but it is possible that it didn't clear DELAY_WAKEKILL yet. Now, before we take ->siglock, the tracee can exit and another thread can do wait() and reap this task. Also, I think the comment above should be updated. I agree, it makes sense to re-check JOBCTL_DELAY_WAKEKILL under siglock just for clarity, but we no longer need to do this to close the race; jobctl &= ~JOBCTL_DELAY_WAKEKILL and wake_up_state() are safe even if JOBCTL_DELAY_WAKEKILL was already cleared. > @@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code, > > /* LISTENING can be set only during STOP traps, clear it */ > current->jobctl &= ~JOBCTL_LISTENING; > + current->jobctl &= ~JOBCTL_DELAY_WAKEKILL; minor, but current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_DELAY_WAKEKILL); looks better. Oleg.
On 04/26, Eric W. Biederman wrote: > > @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state) > */ > if (lock_task_sighand(child, &flags)) { > if (child->ptrace && child->parent == current) { > - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED); > + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL); This WARN_ON() doesn't look right. It is possible that this child was traced by another task and PTRACE_DETACH'ed, but it didn't clear DELAY_WAKEKILL. If the new debugger attaches and calls ptrace() before the child takes siglock ptrace_freeze_traced() will fail, but we can hit this WARN_ON(). Oleg.
Oleg Nesterov <oleg@redhat.com> writes: > On 04/26, Eric W. Biederman wrote: >> >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state) >> */ >> if (lock_task_sighand(child, &flags)) { >> if (child->ptrace && child->parent == current) { >> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED); >> + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL); > > This WARN_ON() doesn't look right. > > It is possible that this child was traced by another task and PTRACE_DETACH'ed, > but it didn't clear DELAY_WAKEKILL. That would be a bug. That would mean that PTRACE_DETACHED process can not be SIGKILL'd. > If the new debugger attaches and calls ptrace() before the child takes siglock > ptrace_freeze_traced() will fail, but we can hit this WARN_ON(). Eric
On 04/27, Eric W. Biederman wrote: > > Oleg Nesterov <oleg@redhat.com> writes: > > > On 04/26, Eric W. Biederman wrote: > >> > >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state) > >> */ > >> if (lock_task_sighand(child, &flags)) { > >> if (child->ptrace && child->parent == current) { > >> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED); > >> + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL); > > > > This WARN_ON() doesn't look right. > > > > It is possible that this child was traced by another task and PTRACE_DETACH'ed, > > but it didn't clear DELAY_WAKEKILL. > > That would be a bug. That would mean that PTRACE_DETACHED process can > not be SIGKILL'd. Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice SIGKILL after that. Oleg. > > If the new debugger attaches and calls ptrace() before the child takes siglock > > ptrace_freeze_traced() will fail, but we can hit this WARN_ON(). > > Eric >
On 04/27, Oleg Nesterov wrote: > > On 04/27, Eric W. Biederman wrote: > > > > Oleg Nesterov <oleg@redhat.com> writes: > > > > > On 04/26, Eric W. Biederman wrote: > > >> > > >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state) > > >> */ > > >> if (lock_task_sighand(child, &flags)) { > > >> if (child->ptrace && child->parent == current) { > > >> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED); > > >> + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL); > > > > > > This WARN_ON() doesn't look right. > > > > > > It is possible that this child was traced by another task and PTRACE_DETACH'ed, > > > but it didn't clear DELAY_WAKEKILL. > > > > That would be a bug. That would mean that PTRACE_DETACHED process can > > not be SIGKILL'd. > > Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice > SIGKILL after that. Not to mention that the tracee is TASK_RUNNING after PTRACE_DETACH wakes it up, so the pending JOBCTL_DELAY_WAKEKILL simply has no effect. Oleg. > > > If the new debugger attaches and calls ptrace() before the child takes siglock > > > ptrace_freeze_traced() will fail, but we can hit this WARN_ON(). > > > > Eric > >
Oleg Nesterov <oleg@redhat.com> writes: > On 04/27, Oleg Nesterov wrote: >> >> On 04/27, Eric W. Biederman wrote: >> > >> > Oleg Nesterov <oleg@redhat.com> writes: >> > >> > > On 04/26, Eric W. Biederman wrote: >> > >> >> > >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state) >> > >> */ >> > >> if (lock_task_sighand(child, &flags)) { >> > >> if (child->ptrace && child->parent == current) { >> > >> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED); >> > >> + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL); >> > > >> > > This WARN_ON() doesn't look right. >> > > >> > > It is possible that this child was traced by another task and PTRACE_DETACH'ed, >> > > but it didn't clear DELAY_WAKEKILL. >> > >> > That would be a bug. That would mean that PTRACE_DETACHED process can >> > not be SIGKILL'd. >> >> Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice >> SIGKILL after that. > > Not to mention that the tracee is TASK_RUNNING after PTRACE_DETACH wakes it > up, so the pending JOBCTL_DELAY_WAKEKILL simply has no effect. Oh. You are talking about the window when between clearing the traced state and when tracee resumes executing and clears JOBCTL_DELAY_WAKEKILL. I thought you were thinking about JOBCTL_DELAY_WAKEKILL being leaked. That requires both ptrace_attach and ptrace_check_attach for the new tracer to happen before the tracee is scheduled to run. I agree. I think the WARN_ON could reasonably be moved a bit later, but I don't know that the WARN_ON is important. I simply kept it because it seemed to make sense. Eric
Oleg Nesterov <oleg@redhat.com> writes: 2> On 04/26, Eric W. Biederman wrote: >> >> static void ptrace_unfreeze_traced(struct task_struct *task) >> { >> - if (READ_ONCE(task->__state) != __TASK_TRACED) >> + if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL)) >> return; >> >> WARN_ON(!task->ptrace || task->parent != current); >> @@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task) >> * Recheck state under the lock to close this race. >> */ >> spin_lock_irq(&task->sighand->siglock); > > Now that we do not check __state = __TASK_TRACED, we need lock_task_sighand(). > The tracee can be already woken up by ptrace_resume(), but it is possible that > it didn't clear DELAY_WAKEKILL yet. Yes. The subtle differences in when __TASK_TRACED and JOBCTL_DELAY_WAKEKILL are cleared are causing me some minor issues. This "WARN_ON(!task->ptrace || task->parent != current);" also now needs to be inside siglock, because the __TASK_TRACED is insufficient. > Now, before we take ->siglock, the tracee can exit and another thread can do > wait() and reap this task. > > Also, I think the comment above should be updated. I agree, it makes sense to > re-check JOBCTL_DELAY_WAKEKILL under siglock just for clarity, but we no longer > need to do this to close the race; jobctl &= ~JOBCTL_DELAY_WAKEKILL and > wake_up_state() are safe even if JOBCTL_DELAY_WAKEKILL was already > cleared. I think you are right about it being safe, but I am having a hard time convincing myself that is true. I want to be very careful sending __TASK_TRACED wake_ups as ptrace_stop fundamentally can't handle spurious wake_ups. So I think adding task_is_traced to the test to verify the task is still frozen. static void ptrace_unfreeze_traced(struct task_struct *task) { unsigned long flags; /* * Verify the task is still frozen before unfreezing it, * ptrace_resume could have unfrozen us. */ if (lock_task_sighand(task, &flags)) { if ((task->jobctl & JOBCTL_DELAY_WAKEKILL) && task_is_traced(task)) { task->jobctl &= ~JOBCTL_DELAY_WAKEKILL; if (__fatal_signal_pending(task)) wake_up_state(task, __TASK_TRACED); } unlock_task_sighand(task, &flags); } } >> @@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code, >> >> /* LISTENING can be set only during STOP traps, clear it */ >> current->jobctl &= ~JOBCTL_LISTENING; >> + current->jobctl &= ~JOBCTL_DELAY_WAKEKILL; > > minor, but > > current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_DELAY_WAKEKILL); > > looks better. Yes. Eric
"Eric W. Biederman" <ebiederm@xmission.com> writes: > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > index 3c8b34876744..1947c85aa9d9 100644 > --- a/include/linux/sched/signal.h > +++ b/include/linux/sched/signal.h > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state); > > static inline void signal_wake_up(struct task_struct *t, bool resume) > { > - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0); > + bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL); > + signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0); > } > static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume) > { Grrr. While looking through everything today I have realized that there is a bug. Suppose we have 3 processes: TRACER, TRACEE, KILLER. Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has been dropped. The TRACER process has performed ptrace_attach on TRACEE and is in the middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL. Then comes in the KILLER process and sends the TRACEE a SIGKILL. The TRACEE __state remains TASK_TRACED, as designed. The bug appears when the TRACEE makes it to schedule(). Inside schedule there is a call to signal_pending_state() which notices a SIGKILL is pending and refuses to sleep. I could avoid setting TIF_SIGPENDING in signal_wake_up but that is insufficient as another signal may be pending. I could avoid marking the task as __fatal_signal_pending but then where would the information that the task needs to become __fatal_signal_pending go. Hmm. This looks like I need my other pending cleanup which introduces a helper to get this idea to work. Eric
On 04/27, Eric W. Biederman wrote: > > "Eric W. Biederman" <ebiederm@xmission.com> writes: > > > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > > index 3c8b34876744..1947c85aa9d9 100644 > > --- a/include/linux/sched/signal.h > > +++ b/include/linux/sched/signal.h > > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state); > > > > static inline void signal_wake_up(struct task_struct *t, bool resume) > > { > > - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0); > > + bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL); > > + signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0); > > } > > static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume) > > { > > Grrr. While looking through everything today I have realized that there > is a bug. > > Suppose we have 3 processes: TRACER, TRACEE, KILLER. > > Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has > been dropped. > > The TRACER process has performed ptrace_attach on TRACEE and is in the > middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL. > > Then comes in the KILLER process and sends the TRACEE a SIGKILL. > The TRACEE __state remains TASK_TRACED, as designed. > > The bug appears when the TRACEE makes it to schedule(). Inside > schedule there is a call to signal_pending_state() which notices > a SIGKILL is pending and refuses to sleep. And I think this is fine. This doesn't really differ from the case when the tracee was killed before it takes siglock. The only problem (afaics) is that, once we introduce JOBCTL_TRACED, ptrace_stop() can leak this flag. That is why I suggested to clear it along with LISTENING/DELAY_WAKEKILL before return, exactly because schedule() won't block if fatal_signal_pending() is true. But may be I misunderstood you concern? Oleg.
Oleg Nesterov <oleg@redhat.com> writes: > On 04/27, Eric W. Biederman wrote: >> >> "Eric W. Biederman" <ebiederm@xmission.com> writes: >> >> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h >> > index 3c8b34876744..1947c85aa9d9 100644 >> > --- a/include/linux/sched/signal.h >> > +++ b/include/linux/sched/signal.h >> > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state); >> > >> > static inline void signal_wake_up(struct task_struct *t, bool resume) >> > { >> > - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0); >> > + bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL); >> > + signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0); >> > } >> > static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume) >> > { >> >> Grrr. While looking through everything today I have realized that there >> is a bug. >> >> Suppose we have 3 processes: TRACER, TRACEE, KILLER. >> >> Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has >> been dropped. >> >> The TRACER process has performed ptrace_attach on TRACEE and is in the >> middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL. >> >> Then comes in the KILLER process and sends the TRACEE a SIGKILL. >> The TRACEE __state remains TASK_TRACED, as designed. >> >> The bug appears when the TRACEE makes it to schedule(). Inside >> schedule there is a call to signal_pending_state() which notices >> a SIGKILL is pending and refuses to sleep. > > And I think this is fine. This doesn't really differ from the case > when the tracee was killed before it takes siglock. Hmm. Maybe. > The only problem (afaics) is that, once we introduce JOBCTL_TRACED, > ptrace_stop() can leak this flag. That is why I suggested to clear > it along with LISTENING/DELAY_WAKEKILL before return, exactly because > schedule() won't block if fatal_signal_pending() is true. > > But may be I misunderstood you concern? Prior to JOBCTL_DELAY_WAKEKILL once __state was set to __TASK_TRACED we were guaranteed that schedule() would stop if a SIGKILL was received after that point. As well as being immune from wake-ups from SIGKILL. I guess we are immune from wake-ups with JOBCTL_DELAY_WAKEKILL as I have implemented it. The practical concern then seems to be that we are not guaranteed wait_task_inactive will succeed. Which means that it must continue to include the TASK_TRACED bit. Previously we were actually guaranteed in ptrace_check_attach that after ptrace_freeze_traced would succeed as any pending fatal signal would cause ptrace_freeze_traced to fail. Any incoming fatal signal would not stop schedule from sleeping. The ptraced task would continue to be ptraced, as all other ptrace operations are blocked by virtue of ptrace being single threaded. I think in my tired mind yesterday I thought it would messing things up after schedule decided to sleep. Still I would like to be able to let wait_task_inactive not care about the state of the process it is going to sleep for. Eric
On 04/28, Eric W. Biederman wrote: > > Oleg Nesterov <oleg@redhat.com> writes: > > >> The bug appears when the TRACEE makes it to schedule(). Inside > >> schedule there is a call to signal_pending_state() which notices > >> a SIGKILL is pending and refuses to sleep. > > > > And I think this is fine. This doesn't really differ from the case > > when the tracee was killed before it takes siglock. > > Hmm. Maybe. I hope ;) > Previously we were actually guaranteed in ptrace_check_attach that after > ptrace_freeze_traced would succeed as any pending fatal signal would > cause ptrace_freeze_traced to fail. Any incoming fatal signal would not > stop schedule from sleeping. Yes. So let me repeat, 7/9 "ptrace: Simplify the wait_task_inactive call in ptrace_check_attach" looks good to me (except it should use wait_task_inactive(__TASK_TRACED)), but it should come before other meaningfull changes and the changelog should be updated. And then we will probably need to reconsider this wait_task_inactive() and WARN_ON() around it, but depends on what will we finally do. > I think in my tired mind yesterday I got lost too ;) > Still I would like to be able to > let wait_task_inactive not care about the state of the process it is > going to sleep for. Not sure... but to be honest I didn't really pay attention to the wait_task_inactive(match_state => 0) part... Oleg.
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h index fa067de9f1a9..4e154ad8205f 100644 --- a/include/linux/sched/jobctl.h +++ b/include/linux/sched/jobctl.h @@ -19,6 +19,7 @@ struct task_struct; #define JOBCTL_TRAPPING_BIT 21 /* switching to TRACED */ #define JOBCTL_LISTENING_BIT 22 /* ptracer is listening for events */ #define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */ +#define JOBCTL_DELAY_WAKEKILL_BIT 24 /* delay killable wakeups */ #define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT) #define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT) @@ -28,6 +29,7 @@ struct task_struct; #define JOBCTL_TRAPPING (1UL << JOBCTL_TRAPPING_BIT) #define JOBCTL_LISTENING (1UL << JOBCTL_LISTENING_BIT) #define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT) +#define JOBCTL_DELAY_WAKEKILL (1UL << JOBCTL_DELAY_WAKEKILL_BIT) #define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY) #define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK) diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 3c8b34876744..1947c85aa9d9 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state); static inline void signal_wake_up(struct task_struct *t, bool resume) { - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0); + bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL); + signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0); } static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume) { diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 842511ee9a9f..0bea74539320 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -194,7 +194,7 @@ static bool ptrace_freeze_traced(struct task_struct *task) if (task_is_traced(task) && !looks_like_a_spurious_pid(task) && !__fatal_signal_pending(task)) { - WRITE_ONCE(task->__state, __TASK_TRACED); + task->jobctl |= JOBCTL_DELAY_WAKEKILL; ret = true; } @@ -203,7 +203,7 @@ static bool ptrace_freeze_traced(struct task_struct *task) static void ptrace_unfreeze_traced(struct task_struct *task) { - if (READ_ONCE(task->__state) != __TASK_TRACED) + if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL)) return; WARN_ON(!task->ptrace || task->parent != current); @@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task) * Recheck state under the lock to close this race. */ spin_lock_irq(&task->sighand->siglock); - if (READ_ONCE(task->__state) == __TASK_TRACED) { + if (task->jobctl & JOBCTL_DELAY_WAKEKILL) { + task->jobctl &= ~JOBCTL_DELAY_WAKEKILL; if (__fatal_signal_pending(task)) wake_up_state(task, __TASK_TRACED); - else - WRITE_ONCE(task->__state, TASK_TRACED); } spin_unlock_irq(&task->sighand->siglock); } @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state) */ if (lock_task_sighand(child, &flags)) { if (child->ptrace && child->parent == current) { - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED); + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL); /* * child->sighand can't be NULL, release_task() * does ptrace_unlink() before __exit_signal(). diff --git a/kernel/signal.c b/kernel/signal.c index 584d67deb3cb..2b332f89cbad 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code, /* LISTENING can be set only during STOP traps, clear it */ current->jobctl &= ~JOBCTL_LISTENING; + current->jobctl &= ~JOBCTL_DELAY_WAKEKILL; /* * Queued signals ignored us while we were stopped for tracing.
Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing. Instead implement a new jobtl flag JOBCTL_DELAY_WAKEKILL. This new flag is set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep). In signal_wake_up_state drop TASK_WAKEKILL from state if TASK_WAKEKILL is used while JOBCTL_DELAY_WAKEKILL is set. This has the same effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use TASK_KILLABLE go through signal_wake_up except the wake_up in ptrace_unfreeze_traced. Previously the __state value of __TASK_TRACED was changed to TASK_RUNNING when woken up or back to TASK_TRACED when the code was left in ptrace_stop. Now when woken up ptrace_stop now clears JOBCTL_DELAY_WAKEKILL and when left sleeping ptrace_unfreezed_traced clears JOBCTL_DELAY_WAKEKILL. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- include/linux/sched/jobctl.h | 2 ++ include/linux/sched/signal.h | 3 ++- kernel/ptrace.c | 11 +++++------ kernel/signal.c | 1 + 4 files changed, 10 insertions(+), 7 deletions(-)