[BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM

Message ID 1306358128.21578.107.camel@twins (mailing list archive)
State New, archived

Commit Message

Peter Zijlstra May 25, 2011, 9:15 p.m. UTC
On Wed, 2011-05-25 at 19:08 +0200, Peter Zijlstra wrote:
> Ooh, shiny, whilst typing this I got an NMI-watchdog error reporting me
> that CPU1 got stuck in try_to_wake_up(), so it looks like I can indeed
> reproduce some funnies.
> 
> /me goes dig in. 

Does the below make your ARM box happy again?

It restores the old ttwu behaviour for this case and seems to not mess
up my x86 with __ARCH_WANT_INTERRUPTS_ON_CTXSW.

Figuring out why the existing condition failed and writing a proper
changelog requires a mind that is slightly less deprived of sleep and I
shall attempt that tomorrow -- provided this does indeed work for you.

---

Comments

Yong Zhang May 26, 2011, 7:29 a.m. UTC | #1
On Thu, May 26, 2011 at 5:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2011-05-25 at 19:08 +0200, Peter Zijlstra wrote:
>> Ooh, shiny, whilst typing this I got an NMI-watchdog error reporting me
>> that CPU1 got stuck in try_to_wake_up(), so it looks like I can indeed
>> reproduce some funnies.
>>
>> /me goes dig in.
>
> Does the below make your ARM box happy again?
>
> It restores the old ttwu behaviour for this case and seems to not mess
> up my x86 with __ARCH_WANT_INTERRUPTS_ON_CTXSW.
>
> Figuring out why the existing condition failed

It seems 'current' changes across switch_to, since it's derived from the
sp register.
So if an interrupt comes before we switch sp, 'p == current' will
catch it, but if an interrupt comes after we switch sp, we will lose a wakeup.

Thanks,
Yong

> and writing a proper
> changelog requires a mind that is slightly less deprived of sleep and I
> shall attempt that tomorrow -- provided this does indeed work for you.
>
> ---
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 2d12893..6976eac 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2573,7 +2573,19 @@ static void ttwu_queue_remote(struct task_struct *p, int cpu)
>        if (!next)
>                smp_send_reschedule(cpu);
>  }
> -#endif
> +
> +#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
> +static void ttwu_activate_remote(struct task_struct *p, int wake_flags)
> +{
> +       struct rq *rq = __task_rq_lock(p);
> +
> +       ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
> +       ttwu_do_wakeup(rq, p, wake_flags);
> +
> +       __task_rq_unlock(rq);
> +}
> +#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
> +#endif /* CONFIG_SMP */
>
>  static void ttwu_queue(struct task_struct *p, int cpu)
>  {
> @@ -2630,18 +2642,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>         */
>        while (p->on_cpu) {
>  #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
> -               /*
> -                * If called from interrupt context we could have landed in the
> -                * middle of schedule(), in this case we should take care not
> -                * to spin on ->on_cpu if p is current, since that would
> -                * deadlock.
> -                */
> -               if (p == current) {
> -                       ttwu_queue(p, cpu);
> -                       goto stat;
> -               }
> -#endif
> +               ttwu_activate_remote(p, wake_flags);
> +               goto stat;
> +#else
>                cpu_relax();
> +#endif
>        }
>        /*
>         * Pairs with the smp_wmb() in finish_lock_switch().
>
>
Peter Zijlstra May 26, 2011, 10:32 a.m. UTC | #2
On Thu, 2011-05-26 at 15:29 +0800, Yong Zhang wrote:
> > Figuring out why the existing condition failed
> 
> It seems 'current' changes across switch_to, since it's derived from the
> sp register.
> So if an interrupt comes before we switch sp, 'p == current' will
> catch it, but if an interrupt comes after we switch sp, we will lose a wakeup.

Well, losing a wakeup isn't the problem here (although it would be a
problem); the immediate problem is that we're getting stuck
(live-locked) in that while (p->on_cpu) loop.

But yes, I think that explains it: if the interrupt hits
context_switch() after current was changed but before p->on_cpu is
cleared, we would live-lock in interrupt context.

Now we could of course go add in_interrupt() checks there, but that
would make this already fragile path more interesting, so I think I'll
stick with the proposed patch -- again provided it actually works.

Marc, any word on that?
Marc Zyngier May 26, 2011, 11:02 a.m. UTC | #3
On Thu, 2011-05-26 at 12:32 +0200, Peter Zijlstra wrote:
> On Thu, 2011-05-26 at 15:29 +0800, Yong Zhang wrote:
> > > Figuring out why the existing condition failed
> > 
> > It seems 'current' changes across switch_to, since it's derived from the
> > sp register.
> > So if an interrupt comes before we switch sp, 'p == current' will
> > catch it, but if an interrupt comes after we switch sp, we will lose a wakeup.
> 
> Well, losing a wakeup isn't the problem here (although it would be a
> problem); the immediate problem is that we're getting stuck
> (live-locked) in that while (p->on_cpu) loop.
> 
> But yes, I think that explains it: if the interrupt hits
> context_switch() after current was changed but before p->on_cpu is
> cleared, we would live-lock in interrupt context.
> 
> Now we could of course go add in_interrupt() checks there, but that
> would make this already fragile path more interesting, so I think I'll
> stick with the proposed patch -- again provided it actually works.
> 
> Marc, any word on that?

The box is currently building kernels in a loop (using -j64...). So far,
so good. Oh, and that fixed the load-average thing as well.

Oh wait (my turn...):
INFO: task gcc:10030 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

One of my ssh sessions is locking up periodically, and it generally
feels a bit sluggish.

	M.
Peter Zijlstra May 26, 2011, 11:32 a.m. UTC | #4
On Thu, 2011-05-26 at 12:02 +0100, Marc Zyngier wrote:

> The box is currently building kernels in a loop (using -j64...). So far,
> so good. Oh, and that fixed the load-average thing as well.

OK, great.

> Oh wait (my turn...):
> INFO: task gcc:10030 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 
> One of my ssh sessions is locking up periodically, and it generally
> feels a bit sluggish.

The good news is that I can indeed confirm that; I somehow failed to
notice it last night. I simply put the machine to build kernels and
walked off, only to come back 30 minutes or so later to see it was still
happily chugging along.

Further good news is that by disabling 
__ARCH_WANT_INTERRUPTS_ON_CTXSW again it goes away, so it must be
something funny with the relatively little code under that directive.

The bad news is of course that I've got a little more head-scratching to
do, will keep you informed.

Patch

diff --git a/kernel/sched.c b/kernel/sched.c
index 2d12893..6976eac 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2573,7 +2573,19 @@  static void ttwu_queue_remote(struct task_struct *p, int cpu)
 	if (!next)
 		smp_send_reschedule(cpu);
 }
-#endif
+
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+static void ttwu_activate_remote(struct task_struct *p, int wake_flags)
+{
+	struct rq *rq = __task_rq_lock(p);
+
+	ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
+	ttwu_do_wakeup(rq, p, wake_flags);
+
+	__task_rq_unlock(rq);
+}
+#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
+#endif /* CONFIG_SMP */
 
 static void ttwu_queue(struct task_struct *p, int cpu)
 {
@@ -2630,18 +2642,11 @@  try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	 */
 	while (p->on_cpu) {
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-		/*
-		 * If called from interrupt context we could have landed in the
-		 * middle of schedule(), in this case we should take care not
-		 * to spin on ->on_cpu if p is current, since that would
-		 * deadlock.
-		 */
-		if (p == current) {
-			ttwu_queue(p, cpu);
-			goto stat;
-		}
-#endif
+		ttwu_activate_remote(p, wake_flags);
+		goto stat;
+#else
 		cpu_relax();
+#endif
 	}
 	/*
 	 * Pairs with the smp_wmb() in finish_lock_switch().