From patchwork Thu May 26 12:21:51 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Zijlstra X-Patchwork-Id: 820632 Received: from canuck.infradead.org (canuck.infradead.org [134.117.69.58]) by demeter2.kernel.org (8.14.4/8.14.3) with ESMTP id p4QCP2og011522 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Thu, 26 May 2011 12:25:23 GMT Received: from localhost ([127.0.0.1] helo=canuck.infradead.org) by canuck.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux)) id 1QPZaA-0006rV-KY; Thu, 26 May 2011 12:22:30 +0000 Received: from j77219.upc-j.chello.nl ([24.132.77.219] helo=twins) by canuck.infradead.org with esmtpsa (Exim 4.76 #1 (Red Hat Linux)) id 1QPZa4-0006rL-GJ; Thu, 26 May 2011 12:22:25 +0000 Received: by twins (Postfix, from userid 1000) id C16EA81AA9BE; Thu, 26 May 2011 14:21:51 +0200 (CEST) Subject: Re: [BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM From: Peter Zijlstra To: Marc Zyngier In-Reply-To: <1306409575.1200.71.camel@twins> References: <1306260792.27474.133.camel@e102391-lin.cambridge.arm.com> <1306272750.2497.79.camel@laptop> <1306343335.21578.65.camel@twins> <1306358128.21578.107.camel@twins> <1306405979.1200.63.camel@twins> <1306407759.27474.207.camel@e102391-lin.cambridge.arm.com> <1306409575.1200.71.camel@twins> Date: Thu, 26 May 2011 14:21:51 +0200 Message-ID: <1306412511.1200.90.camel@twins> Mime-Version: 1.0 X-Mailer: Evolution 2.30.3 Cc: Frank Rowand , Oleg Nesterov , linux-kernel@vger.kernel.org, Yong Zhang , Ingo Molnar , linux-arm-kernel@lists.infradead.org X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: linux-arm-kernel-bounces@lists.infradead.org Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter2.kernel.org [140.211.167.43]); Thu, 26 May 2011 12:25:23 +0000 (UTC) On Thu, 2011-05-26 at 13:32 +0200, Peter Zijlstra wrote: > > The bad news is of course that I've got a little more head-scratching to > do, will keep you informed. OK, that wasn't too hard.. (/me crosses fingers and prays Marc doesn't find more funnies ;-). Does the below cure all woes? Tested-by: Marc Zyngier --- Subject: sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW From: Peter Zijlstra Date: Thu May 26 14:21:33 CEST 2011 Marc reported that e4a52bcb9 (sched: Remove rq->lock from the first half of ttwu()) broke his ARM-SMP machine. Now ARM is one of the few __ARCH_WANT_INTERRUPTS_ON_CTXSW users, so that exception in the ttwu() code was suspect. Yong found that the interrupt could hit hits after context_switch() changes current but before it clears p->on_cpu, if that interrupt were to attempt a wake-up of p we would indeed find ourselves spinning in IRQ context. Sort this by reverting to the old behaviour for this situation and perform a full remote wake-up. Cc: Frank Rowand Cc: Yong Zhang Cc: Oleg Nesterov Reported-by: Marc Zyngier Signed-off-by: Peter Zijlstra --- kernel/sched.c | 37 ++++++++++++++++++++++++++++--------- 1 file changed, 28 insertions(+), 9 deletions(-) Index: linux-2.6/kernel/sched.c =================================================================== --- linux-2.6.orig/kernel/sched.c +++ linux-2.6/kernel/sched.c @@ -2573,7 +2573,26 @@ static void ttwu_queue_remote(struct tas if (!next) smp_send_reschedule(cpu); } -#endif + +#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW +static int ttwu_activate_remote(struct task_struct *p, int wake_flags) +{ + struct rq *rq; + int ret = 0; + + rq = __task_rq_lock(p); + if (p->on_cpu) { + ttwu_activate(rq, p, ENQUEUE_WAKEUP); + ttwu_do_wakeup(rq, p, wake_flags); + ret = 1; + } + __task_rq_unlock(rq); + + return ret; + +} +#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */ +#endif /* CONFIG_SMP */ static void ttwu_queue(struct task_struct *p, int cpu) { @@ -2631,17 +2650,17 @@ try_to_wake_up(struct task_struct *p, un while (p->on_cpu) { #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW /* - * If called from interrupt context we could have landed in the - * middle of schedule(), in this case we should take care not - * to spin on ->on_cpu if p is current, since that would - * deadlock. + * In case the architecture enables interrupts in + * context_switch(), we cannot busy wait, since that + * would lead to live-locks when an interrupt hits and + * tries to wake up @prev. So bail and do a complete + * remote wakeup. */ - if (p == current) { - ttwu_queue(p, cpu); + if (ttwu_activate_remote(p, wake_flags)) goto stat; - } -#endif +#else cpu_relax(); +#endif } /* * Pairs with the smp_wmb() in finish_lock_switch().