[RFC,09/18] kthread: Make it easier to correctly sleep in iterant kthreads

Message ID 20150605161021.GJ19282@twins.programming.kicks-ass.net (mailing list archive)
State New, archived

Commit Message

Peter Zijlstra June 5, 2015, 4:10 p.m. UTC
On Fri, Jun 05, 2015 at 05:01:08PM +0200, Petr Mladek wrote:
> Many kthreads go into an interruptible sleep when there is nothing
> to do. They should check whether anyone has requested the kthread
> to terminate, freeze, or park in the meantime. It is easy to get
> this wrong.

INTERRUPTIBLE is the wrong state to idle in for kthreads, use
TASK_IDLE.

---

commit 80ed87c8a9ca0cad7ca66cf3bbdfb17559a66dcf
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri May 8 14:23:45 2015 +0200

    sched/wait: Introduce TASK_NOLOAD and TASK_IDLE
    
    Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
    'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
    all idle kthreads contribute to the loadavg is somewhat silly.
    
    Now mostly this works OK, because kthreads have all their signals
    masked. However there's a few sites where this is causing problems and
    TASK_UNINTERRUPTIBLE should be used, except for that loadavg issue.
    
    This patch adds TASK_NOLOAD which, when combined with
    TASK_UNINTERRUPTIBLE avoids the loadavg accounting.
    
    As most of imagined usage sites are loops where a thread wants to
    idle, waiting for work, a helper TASK_IDLE is introduced.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Julian Anastasov <ja@ssi.bg>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: NeilBrown <neilb@suse.de>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
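
The effect of TASK_NOLOAD on load accounting can be modelled in userspace. The following is a minimal sketch, not kernel code: the state bits are the values shown in the patch and in the sched_switch trace table, while struct model_task, the model_* helpers, and the PF_FROZEN value are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* State bits as they appear in the patch and the trace flag table. */
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2
#define TASK_NOLOAD		1024
#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Placeholder value for this model only; not taken from the thread. */
#define PF_FROZEN		0x00010000

/* Hypothetical stand-in for the two task_struct fields the macro reads. */
struct model_task {
	long state;
	unsigned int flags;
};

/* Same three conditions as the patched task_contributes_to_load(). */
bool model_contributes_to_load(const struct model_task *t)
{
	return (t->state & TASK_UNINTERRUPTIBLE) != 0 &&
	       (t->flags & PF_FROZEN) == 0 &&
	       (t->state & TASK_NOLOAD) == 0;
}

/* Convenience wrapper: a task that is not frozen, in the given state. */
bool model_state_contributes(long state)
{
	struct model_task t = { .state = state, .flags = 0 };

	return model_contributes_to_load(&t);
}

/* Walks through the three sleep states discussed in the commit message. */
void model_load_self_check(void)
{
	assert(model_state_contributes(TASK_UNINTERRUPTIBLE));	/* counted */
	assert(!model_state_contributes(TASK_INTERRUPTIBLE));	/* not counted */
	assert(!model_state_contributes(TASK_IDLE));		/* not counted */
}
```

In this model, TASK_IDLE keeps the uninterruptible bit but fails the last condition, so it is excluded from the load average just as an interruptible sleep would be.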

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Petr Mladek June 8, 2015, 10:01 a.m. UTC | #1
On Fri 2015-06-05 18:10:21, Peter Zijlstra wrote:
> On Fri, Jun 05, 2015 at 05:01:08PM +0200, Petr Mladek wrote:
> > Many kthreads go into an interruptible sleep when there is nothing
> > to do. They should check whether anyone has requested the kthread
> > to terminate, freeze, or park in the meantime. It is easy to get
> > this wrong.
> 
> INTERRUPTIBLE is the wrong state to idle in for kthreads, use
> TASK_IDLE.
> 
> ---
> 
> commit 80ed87c8a9ca0cad7ca66cf3bbdfb17559a66dcf
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Fri May 8 14:23:45 2015 +0200
> 
>     sched/wait: Introduce TASK_NOLOAD and TASK_IDLE
>     
>     Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
>     'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
>     all idle kthreads contribute to the loadavg is somewhat silly.
>     
>     Now mostly this works OK, because kthreads have all their signals
>     masked. However there's a few sites where this is causing problems and
>     TASK_UNINTERRUPTIBLE should be used, except for that loadavg issue.
>     
>     This patch adds TASK_NOLOAD which, when combined with
>     TASK_UNINTERRUPTIBLE avoids the loadavg accounting.
>     
>     As most of imagined usage sites are loops where a thread wants to
>     idle, waiting for work, a helper TASK_IDLE is introduced.

Just to be sure. Do you suggest to use TASK_IDLE everywhere in
kthreads or only when the uninterruptible sleep is really needed?

IMHO, we should not use TASK_IDLE in freezable kthreads because
it would break freezing. Well, we could freezable_schedule() but only
on locations where it is safe to get freezed. Anyway, we need to
be careful here.

BTW: What is the preferred way of freezing, please? Is it better
to end up in the fridge or is it fine to call freezer_do_not_count();
or set PF_NOFREEZE when it is safe?

The fridge looks cleaner to me, but in this case we should avoid
uninterruptible sleep as much as possible.


Best Regards,
Petr

>     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>     Cc: Julian Anastasov <ja@ssi.bg>
>     Cc: Linus Torvalds <torvalds@linux-foundation.org>
>     Cc: NeilBrown <neilb@suse.de>
>     Cc: Oleg Nesterov <oleg@redhat.com>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index dd07ac03f82a..7de815c6fa78 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -218,9 +218,10 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
>  #define TASK_WAKEKILL		128
>  #define TASK_WAKING		256
>  #define TASK_PARKED		512
> -#define TASK_STATE_MAX		1024
> +#define TASK_NOLOAD		1024
> +#define TASK_STATE_MAX		2048
>  
> -#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWP"
> +#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPN"
>  
>  extern char ___assert_task_state[1 - 2*!!(
>  		sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];
> @@ -230,6 +231,8 @@ extern char ___assert_task_state[1 - 2*!!(
>  #define TASK_STOPPED		(TASK_WAKEKILL | __TASK_STOPPED)
>  #define TASK_TRACED		(TASK_WAKEKILL | __TASK_TRACED)
>  
> +#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
> +
>  /* Convenience macros for the sake of wake_up */
>  #define TASK_NORMAL		(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
>  #define TASK_ALL		(TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)
> @@ -245,7 +248,8 @@ extern char ___assert_task_state[1 - 2*!!(
>  			((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
>  #define task_contributes_to_load(task)	\
>  				((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
> -				 (task->flags & PF_FROZEN) == 0)
> +				 (task->flags & PF_FROZEN) == 0 && \
> +				 (task->state & TASK_NOLOAD) == 0)
>  
>  #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
>  
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 30fedaf3e56a..d57a575fe31f 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -147,7 +147,8 @@ TRACE_EVENT(sched_switch,
>  		  __print_flags(__entry->prev_state & (TASK_STATE_MAX-1), "|",
>  				{ 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" },
>  				{ 16, "Z" }, { 32, "X" }, { 64, "x" },
> -				{ 128, "K" }, { 256, "W" }, { 512, "P" }) : "R",
> +				{ 128, "K" }, { 256, "W" }, { 512, "P" },
> +				{ 1024, "N" }) : "R",
>  		__entry->prev_state & TASK_STATE_MAX ? "+" : "",
>  		__entry->next_comm, __entry->next_pid, __entry->next_prio)
>  );
Peter Zijlstra June 8, 2015, 11:39 a.m. UTC | #2
On Mon, 2015-06-08 at 12:01 +0200, Petr Mladek wrote:

> Just to be sure. Do you suggest to use TASK_IDLE everywhere in
> kthreads or only when the uninterruptible sleep is really needed?

Always. Only use INTERRUPTIBLE when you're actually interruptible, that
is, when you want signals or such muck to terminate your wait.

> IMHO, we should not use TASK_IDLE in freezable kthreads because
> it would break freezing.

How so? The task is IDLE, it's not doing anything.

>  Well, we could freezable_schedule() but only
> on locations where it is safe to get freezed. Anyway, we need to
> be careful here.

s/freezed/frozen/

Bah, you made me look at the freezer code, karma reduction for you.

And this is the archetypal freeze point if ever there was one: you're
checking kthread_stop, and if we can terminate the kthread, we can
certainly get frozen.

> BTW: What is the preferred way of freezing, please? Is it better
> to end up in the fridge or is it fine to call freezer_do_not_count();
> or set PF_NOFREEZE when it is safe?

freezable_schedule() is fine in this case.
Steven Rostedt June 8, 2015, 5:48 p.m. UTC | #3
On Fri, 5 Jun 2015 18:10:21 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Jun 05, 2015 at 05:01:08PM +0200, Petr Mladek wrote:
> > Many kthreads go into an interruptible sleep when there is nothing
> > to do. They should check whether anyone has requested the kthread
> > to terminate, freeze, or park in the meantime. It is easy to get
> > this wrong.
> 
> INTERRUPTIBLE is the wrong state to idle in for kthreads, use
> TASK_IDLE.
> 
> ---
> 
> commit 80ed87c8a9ca0cad7ca66cf3bbdfb17559a66dcf
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Fri May 8 14:23:45 2015 +0200
> 
>     sched/wait: Introduce TASK_NOLOAD and TASK_IDLE
>     
>     Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
>     'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
>     all idle kthreads contribute to the loadavg is somewhat silly.

Not to mention, tasks in TASK_UNINTERRUPTIBLE state for too long will
trigger hung task detection.


>     
>     Now mostly this works OK, because kthreads have all their signals
>     masked. However there's a few sites where this is causing problems and
>     TASK_UNINTERRUPTIBLE should be used, except for that loadavg issue.
>     
>     This patch adds TASK_NOLOAD which, when combined with
>     TASK_UNINTERRUPTIBLE avoids the loadavg accounting.
>     
>     As most of imagined usage sites are loops where a thread wants to
>     idle, waiting for work, a helper TASK_IDLE is introduced.
>     
>     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

>     Cc: Julian Anastasov <ja@ssi.bg>
>     Cc: Linus Torvalds <torvalds@linux-foundation.org>
>     Cc: NeilBrown <neilb@suse.de>
>     Cc: Oleg Nesterov <oleg@redhat.com>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
Tejun Heo June 9, 2015, 7:32 a.m. UTC | #4
Hello,

On Mon, Jun 08, 2015 at 12:01:07PM +0200, Petr Mladek wrote:
> BTW: What is the preferred way of freezing, please? Is it better
> to end up in the fridge or is it fine to call freezer_do_not_count();
> or set PF_NOFREEZE when it is safe?

There's no one good answer.  The closest would be "don't use freezer
on kthreads".  As Peter said, exit points are always safe freezing
points and it's generally a good idea to avoid adding one anywhere
else.

Thanks.
Peter Zijlstra June 10, 2015, 9:07 a.m. UTC | #5
On Mon, Jun 08, 2015 at 01:48:10PM -0400, Steven Rostedt wrote:
> > commit 80ed87c8a9ca0cad7ca66cf3bbdfb17559a66dcf
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date:   Fri May 8 14:23:45 2015 +0200
> > 
> >     sched/wait: Introduce TASK_NOLOAD and TASK_IDLE
> >     
> >     Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
> >     'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
> >     all idle kthreads contribute to the loadavg is somewhat silly.
> 
> Not to mention, tasks in TASK_UNINTERRUPTIBLE state for too long will
> trigger hung task detection.

Right, and I had not considered that, but it turns out the hung_task
detector checks p->state == TASK_UNINTERRUPTIBLE, so TASK_IDLE is indeed
safe from that.
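
The difference between an exact comparison and a bit test is what makes TASK_IDLE safe here. A minimal userspace sketch of that point follows; the state bits come from the patch, and both model_* functions are hypothetical helpers written for this illustration, not the hung_task code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* State bits as they appear in the patch. */
#define TASK_UNINTERRUPTIBLE	2
#define TASK_NOLOAD		1024
#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Exact comparison, as described above: p->state == TASK_UNINTERRUPTIBLE. */
bool model_hung_task_candidate(long state)
{
	return state == TASK_UNINTERRUPTIBLE;
}

/* Hypothetical bit-test variant, shown only for contrast. */
bool model_has_uninterruptible_bit(long state)
{
	return (state & TASK_UNINTERRUPTIBLE) != 0;
}

/* TASK_IDLE carries the bit but does not equal the plain state. */
void model_hung_task_self_check(void)
{
	assert(model_hung_task_candidate(TASK_UNINTERRUPTIBLE));
	assert(!model_hung_task_candidate(TASK_IDLE));
	assert(model_has_uninterruptible_bit(TASK_IDLE));
}
```

Had the detector used a bit test like the second helper, an idle kthread sleeping in TASK_IDLE would have matched.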
Steven Rostedt June 10, 2015, 2:07 p.m. UTC | #6
On Wed, 10 Jun 2015 11:07:24 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > Not to mention, tasks in TASK_UNINTERRUPTIBLE state for too long will
> > trigger hung task detection.
> 
> Right, and I had not considered that, but it turns out the hung_task
> detector checks p->state == TASK_UNINTERRUPTIBLE, so TASK_IDLE is indeed
> safe from that.

Also, I would assume that TASK_IDLE only makes sense for kernel
threads. I wonder if we should add an assertion in schedule() that
triggers if a task is scheduling with TASK_IDLE and is not a kernel
thread (has its own mm?)

-- Steve
Jiri Kosina June 11, 2015, 4:28 a.m. UTC | #7
On Wed, 10 Jun 2015, Steven Rostedt wrote:

> > Right, and I had not considered that, but it turns out the hung_task
> > detector checks p->state == TASK_UNINTERRUPTIBLE, so TASK_IDLE is indeed
> > safe from that.
> 
> Also, I would assume that TASK_IDLE only makes sense for kernel
> threads, I wonder if we should add an assertion in schedule that
> triggers if a task is scheduling with TASK_IDLE and is not a kernel
> thread (has its own mm?)

For the sake of completeness -- testing for !task_struct->mm is not a
correct test to find out whether a given entity is a kernel thread; kernel
threads are free to temporarily adopt a user struct mm via use_mm() (usually
for handling AIO on behalf of a particular struct mm).

The correct check is to look at the PF_KTHREAD flag in task_struct->flags.
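
A small userspace sketch of why the two tests can disagree: struct model_mm, struct model_task, the model_* helpers, and the PF_KTHREAD value used here are all assumptions for illustration; only the two conditions being compared come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit for this model only; not taken from the thread. */
#define PF_KTHREAD	0x00200000

/* Hypothetical stand-ins for struct mm_struct and struct task_struct. */
struct model_mm {
	int unused;
};

struct model_task {
	unsigned int flags;
	struct model_mm *mm;
};

/* The test that was floated: "no mm means kernel thread". */
bool model_is_kthread_by_mm(const struct model_task *t)
{
	return t->mm == NULL;
}

/* The test recommended above: look at the PF_KTHREAD flag. */
bool model_is_kthread_by_flag(const struct model_task *t)
{
	return (t->flags & PF_KTHREAD) != 0;
}

/*
 * Models a kernel thread that has temporarily adopted a user mm.
 * Returns true when the flag test still says "kthread" while the
 * mm test no longer does.
 */
bool model_checks_disagree_with_borrowed_mm(void)
{
	static struct model_mm borrowed;
	struct model_task t = { .flags = PF_KTHREAD, .mm = &borrowed };

	return model_is_kthread_by_flag(&t) && !model_is_kthread_by_mm(&t);
}
```

An assertion of the kind Steven suggests would therefore want the flag test, since the mm test misclassifies the thread for as long as it holds the borrowed mm.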

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index dd07ac03f82a..7de815c6fa78 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -218,9 +218,10 @@  print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
 #define TASK_WAKEKILL		128
 #define TASK_WAKING		256
 #define TASK_PARKED		512
-#define TASK_STATE_MAX		1024
+#define TASK_NOLOAD		1024
+#define TASK_STATE_MAX		2048
 
-#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWP"
+#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPN"
 
 extern char ___assert_task_state[1 - 2*!!(
 		sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];
@@ -230,6 +231,8 @@  extern char ___assert_task_state[1 - 2*!!(
 #define TASK_STOPPED		(TASK_WAKEKILL | __TASK_STOPPED)
 #define TASK_TRACED		(TASK_WAKEKILL | __TASK_TRACED)
 
+#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
+
 /* Convenience macros for the sake of wake_up */
 #define TASK_NORMAL		(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
 #define TASK_ALL		(TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)
@@ -245,7 +248,8 @@  extern char ___assert_task_state[1 - 2*!!(
 			((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
 #define task_contributes_to_load(task)	\
 				((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
-				 (task->flags & PF_FROZEN) == 0)
+				 (task->flags & PF_FROZEN) == 0 && \
+				 (task->state & TASK_NOLOAD) == 0)
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 30fedaf3e56a..d57a575fe31f 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -147,7 +147,8 @@  TRACE_EVENT(sched_switch,
 		  __print_flags(__entry->prev_state & (TASK_STATE_MAX-1), "|",
 				{ 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" },
 				{ 16, "Z" }, { 32, "X" }, { 64, "x" },
-				{ 128, "K" }, { 256, "W" }, { 512, "P" }) : "R",
+				{ 128, "K" }, { 256, "W" }, { 512, "P" },
+				{ 1024, "N" }) : "R",
 		__entry->prev_state & TASK_STATE_MAX ? "+" : "",
 		__entry->next_comm, __entry->next_pid, __entry->next_prio)
 );
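
The patch has to bump TASK_STATE_MAX and extend TASK_STATE_TO_CHAR_STR together, because the ___assert_task_state declaration fails to compile when the string length and ilog2(TASK_STATE_MAX)+1 differ. The following userspace sketch checks that relationship for the old and new values; model_ilog2(), model_state_table_consistent(), and the OLD_/NEW_ macro names are hypothetical helpers for this illustration, not kernel definitions.

```c
#include <assert.h>

/* Values before and after the patch, copied from the diff. */
#define OLD_STATE_MAX	1024
#define OLD_CHAR_STR	"RSDTtXZxKWP"
#define NEW_STATE_MAX	2048
#define NEW_CHAR_STR	"RSDTtXZxKWPN"

/* String length without the trailing NUL, as in the kernel assertion. */
#define STR_LEN(s)	(sizeof(s) - 1)

/* floor(log2(v)) for v > 0; a simple stand-in for the kernel's ilog2(). */
int model_ilog2(unsigned long v)
{
	int n = -1;

	while (v) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Nonzero when the length matches ilog2(state_max) + 1. */
int model_state_table_consistent(unsigned long len, unsigned long state_max)
{
	return len == (unsigned long)model_ilog2(state_max) + 1;
}

/* Old pair and new pair are consistent; mixing them is not. */
void model_state_table_self_check(void)
{
	assert(model_state_table_consistent(STR_LEN(OLD_CHAR_STR), OLD_STATE_MAX));
	assert(model_state_table_consistent(STR_LEN(NEW_CHAR_STR), NEW_STATE_MAX));
	assert(!model_state_table_consistent(STR_LEN(OLD_CHAR_STR), NEW_STATE_MAX));
}
```

Adding the "N" character without raising TASK_STATE_MAX, or the reverse, breaks the equality, which is why both lines change in the same hunk.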