From patchwork Sat Sep 22 00:05:37 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 1493781
Date: Fri, 21 Sep 2012 17:05:37 -0700
From: "Paul E. McKenney"
To: Paul Walmsley
Cc: "Hilman, Kevin", "Bruce, Becky", "Paul E. McKenney",
 "Shilimkar, Santosh", "Hunter, Jon"
Subject: Re: rcu self-detected stall messages on OMAP3, 4 boards
Message-ID: <20120922000537.GH2454@linux.vnet.ibm.com>
References: <20120913011208.GT4257@linux.vnet.ibm.com>
 <20120920000351.GI2455@linux.vnet.ibm.com>
 <20120920220130.GN2449@linux.vnet.ibm.com>
 <20120921212054.GE2454@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
User-Agent: Mutt/1.5.21 (2010-09-15)

On Fri, Sep 21, 2012 at 10:41:14PM +0000, Paul Walmsley wrote:
> On Fri, 21 Sep 2012, Paul E. McKenney wrote:
>
> > On Fri, Sep 21, 2012 at 05:47:31PM +0000, Paul Walmsley wrote:
> >
> > > I built an OMAP kernel from Linus' commit
> > > 4651afbbae968772efd6dc4ba461cba9b49bb9d8 ("Merge branch 'for-3.6-fixes' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq").  The config used
> > > was 'omap2plus_defconfig', and enabled CONFIG_CPU_IDLE by hand.  Booted it
> > > on a Pandaboard (OMAP4430ES2) into a very minimal Debian rootfs.
> >
> > Did you have the patch at https://lkml.org/lkml/2012/8/30/290 applied?
>
> No, it's just as described above.
>
> > If not, could you please try it?  (This patch cleared up a similar
> > problem for Becky, also on OMAP.)
>
> Did not seem to help, either with or without CONFIG_CPU_IDLE.

I was hoping!  ;-)

And my init=/bin/sh kernel ran idle for more than an hour without any
RCU CPU stall warnings...

I am wondering if your system somehow figured out how to start a
grace period that had no RCU callbacks waiting for it.  If that happened,
then a CONFIG_NO_HZ=y system could in theory get into a state where all
CPUs are in dyntick-idle mode, so that none of them is doing anything
to force the grace period to complete.

That should be easy to diagnose, anyway.  Please see below, which
includes the earlier diagnostic patch.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 307caf1..696f189 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -879,6 +879,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	unsigned long flags;
 	int ndetected = 0;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	long totqlen = 0;
 
 	/* Only let one CPU complain about others per time interval. */
 
@@ -923,8 +924,11 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 
 	print_cpu_stall_info_end();
-	printk(KERN_CONT "(detected by %d, t=%ld jiffies)\n",
-	       smp_processor_id(), (long)(jiffies - rsp->gp_start));
+	for_each_possible_cpu(cpu)
+		totqlen += per_cpu_ptr(rsp->rda, cpu)->qlen;
+	pr_cont("(detected by %d, t=%ld jiffies, g=%lu, c=%lu, q=%lu)\n",
+	       smp_processor_id(), (long)(jiffies - rsp->gp_start),
+	       rsp->gpnum, rsp->completed, totqlen);
 	if (ndetected == 0)
 		printk(KERN_ERR "INFO: Stall ended before state dump start\n");
 	else if (!trigger_all_cpu_backtrace())
@@ -939,8 +943,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 
 static void print_cpu_stall(struct rcu_state *rsp)
 {
+	int cpu;
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	long totqlen = 0;
 
 	/*
 	 * OK, time to rat on ourselves...
@@ -951,7 +957,10 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	print_cpu_stall_info_begin();
 	print_cpu_stall_info(rsp, smp_processor_id());
 	print_cpu_stall_info_end();
-	printk(KERN_CONT " (t=%lu jiffies)\n", jiffies - rsp->gp_start);
+	for_each_possible_cpu(cpu)
+		totqlen += per_cpu_ptr(rsp->rda, cpu)->qlen;
+	pr_cont(" (t=%lu jiffies g=%lu c=%lu q=%lu)\n",
+	       jiffies - rsp->gp_start, rsp->gpnum, rsp->completed, totqlen);
 	if (!trigger_all_cpu_backtrace())
 		dump_stack();
 
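
For anyone reading the new g=, c= and q= fields in the stall message, here
is a rough userspace sketch of the check they are meant to enable.  It is
not kernel code: struct stall_snapshot, SKETCH_NR_CPUS and the helper names
are all made up for illustration, and it assumes that g= differing from c=
means a grace period is still in flight.  Only the summing loop mirrors
what the patch above does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 2	/* hypothetical CPU count, e.g. a dual-core board */

/* Hypothetical userspace stand-in for the values the patch prints. */
struct stall_snapshot {
	unsigned long gpnum;		/* g=: most recently started grace period */
	unsigned long completed;	/* c=: most recently completed grace period */
	long qlen[SKETCH_NR_CPUS];	/* per-CPU callback counts, summed into q= */
};

/* Mirror the patch's for_each_possible_cpu() loop: sum the queue lengths. */
static long total_qlen(const struct stall_snapshot *s)
{
	long totqlen = 0;
	size_t cpu;

	assert(s != NULL);
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		totqlen += s->qlen[cpu];
	return totqlen;
}

/* Assumption: a grace period is in flight whenever g= and c= differ. */
static bool gp_in_progress(const struct stall_snapshot *s)
{
	return s->gpnum != s->completed;
}

/* The suspected state: a grace period is pending but no callback waits on it. */
static bool gp_without_callbacks(const struct stall_snapshot *s)
{
	return gp_in_progress(s) && total_qlen(s) == 0;
}
```

So a stall message showing g= and c= unequal together with q=0 would be
consistent with the theory above, while a nonzero q= would point somewhere
else.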