| Message ID | 20220822114648.856734578@infradead.org (mailing list archive) |
|---|---|
| State | Handled Elsewhere, archived |
| Series | Freezer Rewrite |
* Peter Zijlstra <peterz@infradead.org> wrote:

> Make wait_task_inactive()'s @match_state work like ttwu()'s @state.
>
> That is, instead of an equal comparison, use it as a mask. This allows
> matching multiple block conditions.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/core.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3295,7 +3295,7 @@ unsigned long wait_task_inactive(struct
>  	 * is actually now running somewhere else!
>  	 */
>  	while (task_running(rq, p)) {
> -		if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> +		if (match_state && !(READ_ONCE(p->__state) & match_state))
>  			return 0;

We lose the unlikely annotation there - but I guess it probably never
really mattered anyway?

Suggestion #1:

 - Shouldn't we rename task_running() to something like task_on_cpu()? The
   task_running() primitive is similar to TASK_RUNNING but is not based off
   any TASK_FLAGS.

Suggestion #2:

 - Shouldn't we eventually standardize on task->on_cpu on UP kernels too?
   They don't really matter anymore, and doing so removes #ifdefs and makes
   the code easier to read.

> 		cpu_relax();
> 	}
>
> @@ -3310,7 +3310,7 @@ unsigned long wait_task_inactive(struct
> 	running = task_running(rq, p);
> 	queued = task_on_rq_queued(p);
> 	ncsw = 0;
> -	if (!match_state || READ_ONCE(p->__state) == match_state)
> +	if (!match_state || (READ_ONCE(p->__state) & match_state))
> 		ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
> 	task_rq_unlock(rq, p, &rf);

Suggestion #3:

 - Couldn't the following users with a 0 mask:

     drivers/powercap/idle_inject.c:	wait_task_inactive(iit->tsk, 0);
     fs/coredump.c:			wait_task_inactive(ptr->task, 0);

   Use ~0 instead (exposed as TASK_ANY or so) and then we can drop the
   !match_state special case?

   They'd do something like:

     drivers/powercap/idle_inject.c:	wait_task_inactive(iit->tsk, TASK_ANY);
     fs/coredump.c:			wait_task_inactive(ptr->task, TASK_ANY);

It's not an entirely 100% equivalent transformation though, but looks OK
at first sight: ->__state will be some nonzero mask for genuine tasks
waiting to schedule out, so any match will be functionally the same as a
0 flag telling us not to check any of the bits, right? I might be missing
something though.

Thanks,

	Ingo
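To make the semantic change concrete: a task blocked killably has __state == TASK_KILLABLE == TASK_WAKEKILL | TASK_UNINTERRUPTIBLE, which the old equality test against TASK_UNINTERRUPTIBLE rejects while the new mask test accepts. Below is a minimal stand-alone user-space sketch of just that comparison; the flag values mirror include/linux/sched.h at the time (worth double-checking against the tree), and none of this is kernel code.

	#include <stdio.h>

	/* illustrative copies of the sched.h state flag values */
	#define TASK_INTERRUPTIBLE	0x0001
	#define TASK_UNINTERRUPTIBLE	0x0002
	#define TASK_WAKEKILL		0x0100
	#define TASK_KILLABLE		(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

	int main(void)
	{
		unsigned int state = TASK_KILLABLE;		/* task blocked killably */
		unsigned int match = TASK_UNINTERRUPTIBLE;	/* what the caller waits for */

		/* old semantics: exact equality -- misses the combined state */
		printf("equality: %d\n", state == match);	/* prints 0 */

		/* new semantics: any shared bit counts as a match */
		printf("mask:     %d\n", !!(state & match));	/* prints 1 */
		return 0;
	}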
On Sun, Sep 04, 2022 at 12:44:36PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Make wait_task_inactive()'s @match_state work like ttwu()'s @state.
> > 
> > That is, instead of an equal comparison, use it as a mask. This allows
> > matching multiple block conditions.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  kernel/sched/core.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3295,7 +3295,7 @@ unsigned long wait_task_inactive(struct
> >  	 * is actually now running somewhere else!
> >  	 */
> >  	while (task_running(rq, p)) {
> > -		if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> > +		if (match_state && !(READ_ONCE(p->__state) & match_state))
> >  			return 0;
> 
> We lose the unlikely annotation there - but I guess it probably never
> really mattered anyway?

So any wait_task_inactive() caller does want that case to be true, but
the whole match_state precondition mostly wrecks things anyway. If
anything it should've been:

	if (likely(match_state && !(READ_ONCE(p->__state) & match_state)))
		return 0;

but I can't find it in me to care too much here.

> Suggestion #1:
> 
>  - Shouldn't we rename task_running() to something like task_on_cpu()? The
>    task_running() primitive is similar to TASK_RUNNING but is not based off
>    any TASK_FLAGS.

That looks like a simple enough patch, lemme go do that.

> Suggestion #2:
> 
>  - Shouldn't we eventually standardize on task->on_cpu on UP kernels too?
>    They don't really matter anymore, and doing so removes #ifdefs and makes
>    the code easier to read.

Probably, but that sounds like something that'll spiral out of control
real quick, so I'll leave that on the TODO list somewhere.

> > 		cpu_relax();
> > 	}
> > 
> > @@ -3310,7 +3310,7 @@ unsigned long wait_task_inactive(struct
> > 	running = task_running(rq, p);
> > 	queued = task_on_rq_queued(p);
> > 	ncsw = 0;
> > -	if (!match_state || READ_ONCE(p->__state) == match_state)
> > +	if (!match_state || (READ_ONCE(p->__state) & match_state))
> > 		ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
> > 	task_rq_unlock(rq, p, &rf);
> 
> Suggestion #3:
> 
>  - Couldn't the following users with a 0 mask:
> 
>      drivers/powercap/idle_inject.c:	wait_task_inactive(iit->tsk, 0);
>      fs/coredump.c:			wait_task_inactive(ptr->task, 0);
> 
>    Use ~0 instead (exposed as TASK_ANY or so) and then we can drop the
>    !match_state special case?
> 
>    They'd do something like:
> 
>      drivers/powercap/idle_inject.c:	wait_task_inactive(iit->tsk, TASK_ANY);
>      fs/coredump.c:			wait_task_inactive(ptr->task, TASK_ANY);
> 
> It's not an entirely 100% equivalent transformation though, but looks OK
> at first sight: ->__state will be some nonzero mask for genuine tasks
> waiting to schedule out, so any match will be functionally the same as a
> 0 flag telling us not to check any of the bits, right? I might be missing
> something though.

I too am thinking that should work. Added patch for that.
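For context on Suggestion #2: the #ifdef in question is in the helper itself. Roughly, from kernel/sched/sched.h as it stood (the SMP half is visible in the rename patch below; the UP fallback here is recalled from the tree and worth double-checking), it reads:

	static inline int task_running(struct rq *rq, struct task_struct *p)
	{
	#ifdef CONFIG_SMP
		return p->on_cpu;		/* ->on_cpu is maintained on SMP */
	#else
		return task_current(rq, p);	/* UP: ->on_cpu is not maintained */
	#endif
	}

Standardizing on ->on_cpu for UP builds would let the #else branch (and similar #ifdefs elsewhere) go away.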
* Peter Zijlstra <peterz@infradead.org> wrote:

> On Sun, Sep 04, 2022 at 12:44:36PM +0200, Ingo Molnar wrote:
> > 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > Make wait_task_inactive()'s @match_state work like ttwu()'s @state.
> > > 
> > > That is, instead of an equal comparison, use it as a mask. This allows
> > > matching multiple block conditions.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  kernel/sched/core.c |    4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -3295,7 +3295,7 @@ unsigned long wait_task_inactive(struct
> > >  	 * is actually now running somewhere else!
> > >  	 */
> > >  	while (task_running(rq, p)) {
> > > -		if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> > > +		if (match_state && !(READ_ONCE(p->__state) & match_state))
> > >  			return 0;
> > 
> > We lose the unlikely annotation there - but I guess it probably never
> > really mattered anyway?
> 
> So any wait_task_inactive() caller does want that case to be true, but
> the whole match_state precondition mostly wrecks things anyway. If
> anything it should've been:
> 
> 	if (likely(match_state && !(READ_ONCE(p->__state) & match_state)))
> 		return 0;
> 
> but I can't find it in me to care too much here.

Yeah, I agree that this is probably the most likely branch - and default
compiler code generation behavior should be pretty close to that to begin
with. Ie. ack on dropping the unlikely() annotation. :-)

Might make sense to add a sentence to the changelog though, in case anyone
(like me) is wondering about whether the dropped annotation was intended.

Thanks,

	Ingo
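For anyone wondering what is actually being dropped here: likely()/unlikely() are (ignoring the branch-profiling variants) thin wrappers around GCC's __builtin_expect(), so they only steer code layout and static branch prediction and never change the result of the test, which is why removing one is purely a performance detail. A trivial stand-alone sketch, not kernel code:

	#include <stdio.h>

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

	static int check(unsigned int state, unsigned int match)
	{
		/* identical result with or without the annotation */
		if (unlikely(!(state & match)))
			return 0;
		return 1;
	}

	int main(void)
	{
		printf("%d %d\n", check(0x2, 0x3), check(0x2, 0x1));	/* prints: 1 0 */
		return 0;
	}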
On Tue, Sep 06, 2022 at 12:54:34PM +0200, Peter Zijlstra wrote:
> > Suggestion #1:
> > 
> >  - Shouldn't we rename task_running() to something like task_on_cpu()? The
> >    task_running() primitive is similar to TASK_RUNNING but is not based off
> >    any TASK_FLAGS.
> 
> That looks like a simple enough patch, lemme go do that.

---
Subject: sched: Rename task_running() to task_on_cpu()
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Sep 6 12:33:04 CEST 2022

There is some ambiguity about task_running() in that it is unrelated to
TASK_RUNNING but instead tests ->on_cpu.

As such, rename the thing task_on_cpu().

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c       |   10 +++++-----
 kernel/sched/core_sched.c |    2 +-
 kernel/sched/deadline.c   |    6 +++---
 kernel/sched/fair.c       |    2 +-
 kernel/sched/rt.c         |    6 +++---
 kernel/sched/sched.h      |    2 +-
 6 files changed, 14 insertions(+), 14 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2778,7 +2778,7 @@ static int affine_move_task(struct rq *r
 		return -EINVAL;
 	}
 
-	if (task_running(rq, p) || READ_ONCE(p->__state) == TASK_WAKING) {
+	if (task_on_cpu(rq, p) || READ_ONCE(p->__state) == TASK_WAKING) {
 		/*
 		 * MIGRATE_ENABLE gets here because 'p == current', but for
 		 * anything else we cannot do is_migration_disabled(), punt
@@ -3290,11 +3290,11 @@ unsigned long wait_task_inactive(struct
 		 *
 		 * NOTE! Since we don't hold any locks, it's not
 		 * even sure that "rq" stays as the right runqueue!
-		 * But we don't care, since "task_running()" will
+		 * But we don't care, since "task_on_cpu()" will
 		 * return false if the runqueue has changed and p
 		 * is actually now running somewhere else!
 		 */
-		while (task_running(rq, p)) {
+		while (task_on_cpu(rq, p)) {
 			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
 				return 0;
 			cpu_relax();
@@ -3307,7 +3307,7 @@ unsigned long wait_task_inactive(struct
 		 */
 		rq = task_rq_lock(p, &rf);
 		trace_sched_wait_task(p);
-		running = task_running(rq, p);
+		running = task_on_cpu(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
 		if (!match_state || READ_ONCE(p->__state) == match_state)
@@ -8649,7 +8649,7 @@ int __sched yield_to(struct task_struct
 	if (curr->sched_class != p->sched_class)
 		goto out_unlock;
 
-	if (task_running(p_rq, p) || !task_is_running(p))
+	if (task_on_cpu(p_rq, p) || !task_is_running(p))
 		goto out_unlock;
 
 	yielded = curr->sched_class->yield_to_task(rq, p);
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -88,7 +88,7 @@ static unsigned long sched_core_update_c
 	 * core has now entered/left forced idle state. Defer accounting to the
 	 * next scheduling edge, rather than always forcing a reschedule here.
 	 */
-	if (task_running(rq, p))
+	if (task_on_cpu(rq, p))
 		resched_curr(rq);
 
 	task_rq_unlock(rq, p, &rf);
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2087,7 +2087,7 @@ static void task_fork_dl(struct task_str
 static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu)
 {
-	if (!task_running(rq, p) &&
+	if (!task_on_cpu(rq, p) &&
 	    cpumask_test_cpu(cpu, &p->cpus_mask))
 		return 1;
 	return 0;
@@ -2241,7 +2241,7 @@ static struct rq *find_lock_later_rq(str
 		if (double_lock_balance(rq, later_rq)) {
 			if (unlikely(task_rq(task) != rq ||
 				     !cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
-				     task_running(rq, task) ||
+				     task_on_cpu(rq, task) ||
 				     !dl_task(task) ||
 				     !task_on_rq_queued(task))) {
 				double_unlock_balance(rq, later_rq);
@@ -2475,7 +2475,7 @@ static void pull_dl_task(struct rq *this
  */
 static void task_woken_dl(struct rq *rq, struct task_struct *p)
 {
-	if (!task_running(rq, p) &&
+	if (!task_on_cpu(rq, p) &&
 	    !test_tsk_need_resched(rq->curr) &&
 	    p->nr_cpus_allowed > 1 &&
 	    dl_task(rq->curr) &&
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7938,7 +7938,7 @@ int can_migrate_task(struct task_struct
 	/* Record that we found at least one task that could run on dst_cpu */
 	env->flags &= ~LBF_ALL_PINNED;
 
-	if (task_running(env->src_rq, p)) {
+	if (task_on_cpu(env->src_rq, p)) {
 		schedstat_inc(p->stats.nr_failed_migrations_running);
 		return 0;
 	}
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1849,7 +1849,7 @@ static void put_prev_task_rt(struct rq *
 static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
 {
-	if (!task_running(rq, p) &&
+	if (!task_on_cpu(rq, p) &&
 	    cpumask_test_cpu(cpu, &p->cpus_mask))
 		return 1;
 
@@ -2004,7 +2004,7 @@ static struct rq *find_lock_lowest_rq(st
 			 */
 			if (unlikely(task_rq(task) != rq ||
 				     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
-				     task_running(rq, task) ||
+				     task_on_cpu(rq, task) ||
 				     !rt_task(task) ||
 				     !task_on_rq_queued(task))) {
 
@@ -2462,7 +2462,7 @@ static void pull_rt_task(struct rq *this
  */
 static void task_woken_rt(struct rq *rq, struct task_struct *p)
 {
-	bool need_to_push = !task_running(rq, p) &&
+	bool need_to_push = !task_on_cpu(rq, p) &&
 			    !test_tsk_need_resched(rq->curr) &&
 			    p->nr_cpus_allowed > 1 &&
 			    (dl_task(rq->curr) || rt_task(rq->curr)) &&
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2060,7 +2060,7 @@ static inline int task_current(struct rq
 	return rq->curr == p;
 }
 
-static inline int task_running(struct rq *rq, struct task_struct *p)
+static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
 {
 #ifdef CONFIG_SMP
 	return p->on_cpu;
On Tue, Sep 06, 2022 at 12:54:34PM +0200, Peter Zijlstra wrote:
> > Suggestion #3:
> > 
> >  - Couldn't the following users with a 0 mask:
> > 
> >      drivers/powercap/idle_inject.c:	wait_task_inactive(iit->tsk, 0);
> >      fs/coredump.c:			wait_task_inactive(ptr->task, 0);
> > 
> >    Use ~0 instead (exposed as TASK_ANY or so) and then we can drop the
> >    !match_state special case?
> > 
> >    They'd do something like:
> > 
> >      drivers/powercap/idle_inject.c:	wait_task_inactive(iit->tsk, TASK_ANY);
> >      fs/coredump.c:			wait_task_inactive(ptr->task, TASK_ANY);
> > 
> > It's not an entirely 100% equivalent transformation though, but looks OK
> > at first sight: ->__state will be some nonzero mask for genuine tasks
> > waiting to schedule out, so any match will be functionally the same as a
> > 0 flag telling us not to check any of the bits, right? I might be missing
> > something though.
> 
> I too am thinking that should work. Added patch for that.

---
Subject: sched: Add TASK_ANY for wait_task_inactive()
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Sep 6 12:39:55 CEST 2022

Now that wait_task_inactive()'s @match_state argument is a mask (like
ttwu()) it is possible to replace the special !match_state case with an
'all-states' value such that any blocked state will match.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/powercap/idle_inject.c |    2 +-
 fs/coredump.c                  |    2 +-
 include/linux/sched.h          |    2 ++
 kernel/sched/core.c            |   16 ++++++++--------
 4 files changed, 12 insertions(+), 10 deletions(-)

--- a/drivers/powercap/idle_inject.c
+++ b/drivers/powercap/idle_inject.c
@@ -254,7 +254,7 @@ void idle_inject_stop(struct idle_inject
 		iit = per_cpu_ptr(&idle_inject_thread, cpu);
 		iit->should_run = 0;
 
-		wait_task_inactive(iit->tsk, 0);
+		wait_task_inactive(iit->tsk, TASK_ANY);
 	}
 
 	cpu_hotplug_enable();
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -412,7 +412,7 @@ static int coredump_wait(int exit_code,
 		 */
 		ptr = core_state->dumper.next;
 		while (ptr != NULL) {
-			wait_task_inactive(ptr->task, 0);
+			wait_task_inactive(ptr->task, TASK_ANY);
 			ptr = ptr->next;
 		}
 	}
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -101,6 +101,8 @@ struct task_group;
 #define TASK_RTLOCK_WAIT		0x1000
 #define TASK_STATE_MAX			0x2000
 
+#define TASK_ANY			(TASK_STATE_MAX-1)
+
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3254,12 +3254,12 @@ int migrate_swap(struct task_struct *cur
 /*
  * wait_task_inactive - wait for a thread to unschedule.
  *
- * If @match_state is nonzero, it's the @p->state value just checked and
- * not expected to change. If it changes, i.e. @p might have woken up,
- * then return zero. When we succeed in waiting for @p to be off its CPU,
- * we return a positive number (its total switch count). If a second call
- * a short while later returns the same number, the caller can be sure that
- * @p has remained unscheduled the whole time.
+ * Wait for the thread to block in any of the states set in @match_state.
+ * If it changes, i.e. @p might have woken up, then return zero. When we
+ * succeed in waiting for @p to be off its CPU, we return a positive number
+ * (its total switch count). If a second call a short while later returns the
+ * same number, the caller can be sure that @p has remained unscheduled the
+ * whole time.
  *
  * The caller must ensure that the task *will* unschedule sometime soon,
  * else this function might spin for a *long* time. This function can't
@@ -3295,7 +3295,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_on_cpu(rq, p)) {
-			if (match_state && !(READ_ONCE(p->__state) & match_state))
+			if (!(READ_ONCE(p->__state) & match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3310,7 +3310,7 @@ unsigned long wait_task_inactive(struct
 		running = task_on_cpu(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || (READ_ONCE(p->__state) & match_state))
+		if (READ_ONCE(p->__state) & match_state)
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
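A quick way to convince oneself of the "not 100% equivalent but should be OK" argument quoted above: with TASK_ANY defined as TASK_STATE_MAX-1, every nonzero __state value below TASK_STATE_MAX shares at least one bit with the mask, so the only case where the new check can differ from the old !match_state behaviour is __state == 0 (TASK_RUNNING), which is exactly the case Ingo's argument says these callers don't rely on. A stand-alone user-space sketch of that check (constants copied from the include/linux/sched.h hunk above, not kernel code):

	#include <assert.h>
	#include <stdio.h>

	/* values from the sched.h hunk above */
	#define TASK_STATE_MAX	0x2000
	#define TASK_ANY	(TASK_STATE_MAX-1)

	int main(void)
	{
		unsigned int state;

		/* every nonzero blocked state has at least one bit in TASK_ANY */
		for (state = 1; state < TASK_STATE_MAX; state++)
			assert(state & TASK_ANY);

		/* only __state == 0 (TASK_RUNNING) fails the new check */
		assert(!(0 & TASK_ANY));

		printf("TASK_ANY = %#x\n", TASK_ANY);
		return 0;
	}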
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3295,7 +3295,7 @@ unsigned long wait_task_inactive(struct
 	 * is actually now running somewhere else!
 	 */
 	while (task_running(rq, p)) {
-		if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+		if (match_state && !(READ_ONCE(p->__state) & match_state))
 			return 0;
 		cpu_relax();
 	}
@@ -3310,7 +3310,7 @@ unsigned long wait_task_inactive(struct
 	running = task_running(rq, p);
 	queued = task_on_rq_queued(p);
 	ncsw = 0;
-	if (!match_state || READ_ONCE(p->__state) == match_state)
+	if (!match_state || (READ_ONCE(p->__state) & match_state))
 		ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 	task_rq_unlock(rq, p, &rf);
Make wait_task_inactive()'s @match_state work like ttwu()'s @state.

That is, instead of an equal comparison, use it as a mask. This allows
matching multiple block conditions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
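What the mask form enables in practice is waiting for a task to block in any one of several states with a single call, rather than exactly one state value. A purely hypothetical call site, shown only as an illustration and not taken from this series, might look like:

	/* hypothetical caller -- accept either sleeping state in one call */
	if (!wait_task_inactive(p, TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE))
		pr_warn("%s: task changed state before descheduling\n", __func__);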