Message ID | 20150605161021.GJ19282@twins.programming.kicks-ass.net (mailing list archive) |
---|---|
State | New, archived |
On Fri 2015-06-05 18:10:21, Peter Zijlstra wrote:
> On Fri, Jun 05, 2015 at 05:01:08PM +0200, Petr Mladek wrote:
> > Many kthreads go into an interruptible sleep when there is nothing
> > to do. They should check if anyone did not requested the kthread
> > to terminate, freeze, or park in the meantime. It is easy to do
> > it a wrong way.
>
> INTERRUPTIBLE is the wrong state to idle in for kthreads, use
> TASK_IDLE.
>
> ---
>
> commit 80ed87c8a9ca0cad7ca66cf3bbdfb17559a66dcf
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Fri May 8 14:23:45 2015 +0200
>
>     sched/wait: Introduce TASK_NOLOAD and TASK_IDLE
>
>     Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
>     'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
>     all idle kthreads contribute to the loadavg is somewhat silly.
>
>     Now mostly this works OK, because kthreads have all their signals
>     masked. However there's a few sites where this is causing problems and
>     TASK_UNINTERRUPTIBLE should be used, except for that loadavg issue.
>
>     This patch adds TASK_NOLOAD which, when combined with
>     TASK_UNINTERRUPTIBLE avoids the loadavg accounting.
>
>     As most of imagined usage sites are loops where a thread wants to
>     idle, waiting for work, a helper TASK_IDLE is introduced.

Just to be sure. Do you suggest to use TASK_IDLE everywhere in
kthreads or only when the uninterruptible sleep is really needed?

IMHO, we should not use TASK_IDLE in freezable kthreads because
it would break freezing. Well, we could freezable_schedule() but only
on locations where it is safe to get freezed. Anyway, we need to
be careful here.

BTW: What is the preferred way of freezing, please? Is it better
to end up in the fridge or is it fine to call freezer_do_not_count();
or set PF_NOFREEZE when it is safe?

The fridge looks more clean to me but in this case we should avoid
uninterruptible sleep as much as possible.

Best Regards,
Petr

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Julian Anastasov <ja@ssi.bg>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: NeilBrown <neilb@suse.de>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index dd07ac03f82a..7de815c6fa78 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -218,9 +218,10 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
>  #define TASK_WAKEKILL           128
>  #define TASK_WAKING             256
>  #define TASK_PARKED             512
> -#define TASK_STATE_MAX          1024
> +#define TASK_NOLOAD             1024
> +#define TASK_STATE_MAX          2048
>
> -#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWP"
> +#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPN"
>
>  extern char ___assert_task_state[1 - 2*!!(
>                  sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];
> @@ -230,6 +231,8 @@ extern char ___assert_task_state[1 - 2*!!(
>  #define TASK_STOPPED            (TASK_WAKEKILL | __TASK_STOPPED)
>  #define TASK_TRACED             (TASK_WAKEKILL | __TASK_TRACED)
>
> +#define TASK_IDLE               (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
> +
>  /* Convenience macros for the sake of wake_up */
>  #define TASK_NORMAL             (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
>  #define TASK_ALL                (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)
> @@ -245,7 +248,8 @@ extern char ___assert_task_state[1 - 2*!!(
>                  ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
>  #define task_contributes_to_load(task)  \
>                                  ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
> -                                 (task->flags & PF_FROZEN) == 0)
> +                                 (task->flags & PF_FROZEN) == 0 && \
> +                                 (task->state & TASK_NOLOAD) == 0)
>
>  #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
>
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 30fedaf3e56a..d57a575fe31f 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -147,7 +147,8 @@ TRACE_EVENT(sched_switch,
>                  __print_flags(__entry->prev_state & (TASK_STATE_MAX-1), "|",
>                                  { 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" },
>                                  { 16, "Z" }, { 32, "X" }, { 64, "x" },
> -                                { 128, "K" }, { 256, "W" }, { 512, "P" }) : "R",
> +                                { 128, "K" }, { 256, "W" }, { 512, "P" },
> +                                { 1024, "N" }) : "R",
>                  __entry->prev_state & TASK_STATE_MAX ? "+" : "",
>                  __entry->next_comm, __entry->next_pid, __entry->next_prio)
>  );

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

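The load-accounting change in the quoted patch can be tried outside the kernel with a small userspace model. This is only a sketch: `struct model_task` and `model_contributes_to_load()` are hypothetical names, the state values are taken from the quoted diff and trace table, and the numeric value of `PF_FROZEN` is an assumption that does not appear in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* State bits as they appear in the quoted patch and trace table. */
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define TASK_NOLOAD           1024
#define TASK_IDLE             (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Assumption: the flag's numeric value is not given in the thread. */
#define PF_FROZEN             0x00010000

/* TASK_IDLE must still carry the uninterruptible bit. */
static_assert((TASK_IDLE & TASK_UNINTERRUPTIBLE) != 0,
              "TASK_IDLE is an uninterruptible sleep");

/* Hypothetical stand-in for the two task_struct fields the macro reads. */
struct model_task {
    long state;
    unsigned int flags;
};

/* Mirrors the patched task_contributes_to_load() condition. */
static bool model_contributes_to_load(const struct model_task *t)
{
    return (t->state & TASK_UNINTERRUPTIBLE) != 0 &&
           (t->flags & PF_FROZEN) == 0 &&
           (t->state & TASK_NOLOAD) == 0;
}
```

With this model, a task in plain TASK_UNINTERRUPTIBLE counts toward the load, while a task in TASK_IDLE, a frozen task, or a task in TASK_INTERRUPTIBLE does not.
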
On Mon, 2015-06-08 at 12:01 +0200, Petr Mladek wrote:
> Just to be sure. Do you suggest to use TASK_IDLE everywhere in
> kthreads or only when the uninterruptible sleep is really needed?

Always, only use INTERRUPTIBLE when you're actually interruptible, that
is you want signals or such muck to terminate your wait.

> IMHO, we should not use TASK_IDLE in freezable kthreads because
> it would break freezing.

How so? The task is IDLE, it's not doing anything.

> Well, we could freezable_schedule() but only
> on locations where it is safe to get freezed. Anyway, we need to
> be careful here.

s/freezed/frozen/

Bah, you made me look at the freezer code, karma reduction for you.

And this is the archetypal freeze point if ever there was one, you're
checking kthread_stop, if we can terminate the kthread, we can certainly
get frozen.

> BTW: What is the preferred way of freezing, please? Is it better
> to end up in the fridge or is it fine to call freezer_do_not_count();
> or set PF_NOFREEZE when it is safe?

freezable_schedule() is fine in this case.

On Fri, 5 Jun 2015 18:10:21 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Jun 05, 2015 at 05:01:08PM +0200, Petr Mladek wrote:
> > Many kthreads go into an interruptible sleep when there is nothing
> > to do. They should check if anyone did not requested the kthread
> > to terminate, freeze, or park in the meantime. It is easy to do
> > it a wrong way.
>
> INTERRUPTIBLE is the wrong state to idle in for kthreads, use
> TASK_IDLE.
>
> ---
>
> commit 80ed87c8a9ca0cad7ca66cf3bbdfb17559a66dcf
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Fri May 8 14:23:45 2015 +0200
>
>     sched/wait: Introduce TASK_NOLOAD and TASK_IDLE
>
>     Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
>     'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
>     all idle kthreads contribute to the loadavg is somewhat silly.

Not to mention, tasks in TASK_UNINTERRUPTIBLE state for too long will
trigger hung task detection.

>
>     Now mostly this works OK, because kthreads have all their signals
>     masked. However there's a few sites where this is causing problems and
>     TASK_UNINTERRUPTIBLE should be used, except for that loadavg issue.
>
>     This patch adds TASK_NOLOAD which, when combined with
>     TASK_UNINTERRUPTIBLE avoids the loadavg accounting.
>
>     As most of imagined usage sites are loops where a thread wants to
>     idle, waiting for work, a helper TASK_IDLE is introduced.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> Cc: Julian Anastasov <ja@ssi.bg>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: NeilBrown <neilb@suse.de>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h

Hello,

On Mon, Jun 08, 2015 at 12:01:07PM +0200, Petr Mladek wrote:
> BTW: What is the preferred way of freezing, please? Is it better
> to end up in the fridge or is it fine to call freezer_do_not_count();
> or set PF_NOFREEZE when it is safe?

There's no one good answer. The closest would be "don't use freezer on
kthreads". As Peter said, exit points are always safe freezing points
and it's generally a good idea to avoid adding one anywhere else.

Thanks.

On Mon, Jun 08, 2015 at 01:48:10PM -0400, Steven Rostedt wrote:
> > commit 80ed87c8a9ca0cad7ca66cf3bbdfb17559a66dcf
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date:   Fri May 8 14:23:45 2015 +0200
> >
> >     sched/wait: Introduce TASK_NOLOAD and TASK_IDLE
> >
> >     Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
> >     'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
> >     all idle kthreads contribute to the loadavg is somewhat silly.
>
> Not to mention, tasks in TASK_UNINTERRUPTIBLE state for too long will
> trigger hung task detection.

Right, and I had not considered that, but it turns out the hung_task
detector checks p->state == TASK_UNINTERRUPTIBLE, so TASK_IDLE is indeed
safe from that.

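The point about the hung task detector comes down to an exact comparison versus a bit test. The following userspace sketch illustrates the difference; the two helper functions are hypothetical models and not the detector's actual code, and only the `p->state == TASK_UNINTERRUPTIBLE` comparison itself comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* State bits as they appear in the patch under discussion. */
#define TASK_UNINTERRUPTIBLE  2
#define TASK_NOLOAD           1024
#define TASK_IDLE             (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* The extra bit is what makes the exact comparison fail for TASK_IDLE. */
static_assert(TASK_IDLE != TASK_UNINTERRUPTIBLE,
              "TASK_IDLE differs from plain TASK_UNINTERRUPTIBLE");

/* Hypothetical model of the exact comparison described in the thread. */
static bool model_hung_check_exact(long state)
{
    return state == TASK_UNINTERRUPTIBLE;
}

/* Hypothetical alternative: a bit test, which would also match TASK_IDLE. */
static bool model_hung_check_masked(long state)
{
    return (state & TASK_UNINTERRUPTIBLE) != 0;
}
```

Because the comparison is exact, a task sleeping in TASK_IDLE is skipped; had the check been written as a mask test, idle kthreads would have been reported.
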
On Wed, 10 Jun 2015 11:07:24 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > Not to mention, tasks in TASK_UNINTERRUPTIBLE state for too long will
> > trigger hung task detection.
>
> Right, and I had not considered that, but it turns out the hung_task
> detector checks p->state == TASK_UNINTERRUPTIBLE, so TASK_IDLE is indeed
> safe from that.

Also, I would assume that TASK_IDLE only makes sense for kernel
threads, I wonder if we should add an assertion in schedule that
triggers if a task is scheduling with TASK_IDLE and is not a kernel
thread (has its own mm?)

-- Steve

On Wed, 10 Jun 2015, Steven Rostedt wrote:
> > Right, and I had not considered that, but it turns out the hung_task
> > detector checks p->state == TASK_UNINTERRUPTIBLE, so TASK_IDLE is indeed
> > safe from that.
>
> Also, I would assume that TASK_IDLE only makes sense for kernel
> threads, I wonder if we should add an assertion in schedule that
> triggers if a task is scheduling with TASK_IDLE and is not a kernel
> thread (has its own mm?)

For the sake of completeness -- testing for !task_struct->mm is not a
correct test to find out whether a given entity is a kernel thread;
kernel threads are free to temporarily adopt a user struct mm via
use_mm() (usually for handling AIO on behalf of a particular struct mm).

The correct check is to look at the PF_KTHREAD flag in task_struct->flags.

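The difference between the two kernel-thread tests can be shown with a userspace model. Everything here is a hypothetical sketch: `struct model_task`, `struct model_mm` and both helpers are invented for illustration, and the numeric value of `PF_KTHREAD` is an assumption not stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the flag's numeric value is not given in the thread. */
#define PF_KTHREAD 0x00200000

static_assert(PF_KTHREAD != 0, "the flag must be a non-zero bit");

/* Hypothetical stand-ins for struct mm_struct and struct task_struct. */
struct model_mm {
    int placeholder;
};

struct model_task {
    unsigned int flags;
    struct model_mm *mm;
};

/* The test questioned above: breaks once a kthread borrows an mm. */
static bool model_is_kthread_by_mm(const struct model_task *t)
{
    return t->mm == NULL;
}

/* The test recommended above: look at the PF_KTHREAD flag. */
static bool model_is_kthread_by_flag(const struct model_task *t)
{
    return (t->flags & PF_KTHREAD) != 0;
}
```

A model kthread that has adopted a user mm, as use_mm() would arrange, is misclassified by the mm-based test but still recognized by the flag-based one.
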
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dd07ac03f82a..7de815c6fa78 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -218,9 +218,10 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
 #define TASK_WAKEKILL           128
 #define TASK_WAKING             256
 #define TASK_PARKED             512
-#define TASK_STATE_MAX          1024
+#define TASK_NOLOAD             1024
+#define TASK_STATE_MAX          2048
 
-#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWP"
+#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPN"
 
 extern char ___assert_task_state[1 - 2*!!(
                 sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];
@@ -230,6 +231,8 @@ extern char ___assert_task_state[1 - 2*!!(
 #define TASK_STOPPED            (TASK_WAKEKILL | __TASK_STOPPED)
 #define TASK_TRACED             (TASK_WAKEKILL | __TASK_TRACED)
 
+#define TASK_IDLE               (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
+
 /* Convenience macros for the sake of wake_up */
 #define TASK_NORMAL             (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
 #define TASK_ALL                (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)
@@ -245,7 +248,8 @@ extern char ___assert_task_state[1 - 2*!!(
                 ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
 #define task_contributes_to_load(task)  \
                                 ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
-                                 (task->flags & PF_FROZEN) == 0)
+                                 (task->flags & PF_FROZEN) == 0 && \
+                                 (task->state & TASK_NOLOAD) == 0)
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 30fedaf3e56a..d57a575fe31f 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -147,7 +147,8 @@ TRACE_EVENT(sched_switch,
                 __print_flags(__entry->prev_state & (TASK_STATE_MAX-1), "|",
                                 { 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" },
                                 { 16, "Z" }, { 32, "X" }, { 64, "x" },
-                                { 128, "K" }, { 256, "W" }, { 512, "P" }) : "R",
+                                { 128, "K" }, { 256, "W" }, { 512, "P" },
+                                { 1024, "N" }) : "R",
                 __entry->prev_state & TASK_STATE_MAX ? "+" : "",
                 __entry->next_comm, __entry->next_pid, __entry->next_prio)
 );
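
The patch has to extend TASK_STATE_TO_CHAR_STR together with TASK_STATE_MAX because of the build-time size check between them: the string needs one character for state 0 plus one per state bit. The userspace sketch below restates that check with C11 `static_assert` and adds a hypothetical lookup helper. `model_state_to_char()` is an invented name, and mapping a state to its lowest set bit is an assumption about how such a table might be indexed, not something the thread states.

```c
#include <assert.h>
#include <stddef.h>

/* Values after the patch is applied. */
#define TASK_NOLOAD             1024
#define TASK_STATE_MAX          2048
#define TASK_STATE_TO_CHAR_STR  "RSDTtXZxKWPN"

/*
 * Same constraint as ___assert_task_state, written without ilog2():
 * strlen == ilog2(MAX) + 1 is equivalent to (1 << (strlen - 1)) == MAX.
 */
static_assert((1UL << (sizeof(TASK_STATE_TO_CHAR_STR) - 2)) == TASK_STATE_MAX,
              "one character per state bit, plus one for state 0");

/* Hypothetical helper: index 0 is state 0, bit n maps to index n + 1. */
static char model_state_to_char(unsigned long state)
{
    static const char table[] = TASK_STATE_TO_CHAR_STR;
    size_t idx = 1;

    if (state == 0)
        return table[0];
    while ((state & 1UL) == 0) {    /* walk to the lowest set bit */
        state >>= 1;
        idx++;
    }
    return idx < sizeof(table) - 1 ? table[idx] : '?';
}
```

Under these assumptions state 0 maps to 'R', TASK_UNINTERRUPTIBLE (2) to 'D', and the new TASK_NOLOAD bit (1024) to 'N'. Leaving the string at its old length while raising TASK_STATE_MAX would make the assertion fail at build time.
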