From patchwork Sun Aug 20 04:45:53 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 9910891
Date: Sun, 20 Aug 2017 14:45:53 +1000
From: Nicholas Piggin
To: "Paul E. McKenney"
Subject: Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
Message-ID: <20170820144553.2ab2727b@ppc64le>
In-Reply-To: <20170816162731.GA22978@linux.vnet.ibm.com>
References: <20170728182053.000072aa@huawei.com>
 <20170728190349.GM3730@linux.vnet.ibm.com>
 <20170731120847.00003d5c@huawei.com>
 <20170731150411.GA3730@linux.vnet.ibm.com>
 <20170731162757.000058ba@huawei.com>
 <20170801184646.GE3730@linux.vnet.ibm.com>
 <20170802172555.0000468a@huawei.com>
 <20170815154743.GK7017@linux.vnet.ibm.com>
 <87wp63smwn.fsf@concordia.ellerman.id.au>
 <20170816125617.GY7017@linux.vnet.ibm.com>
 <20170816162731.GA22978@linux.vnet.ibm.com>
Organization: IBM
X-Mailer: Claws Mail 3.15.0-dirty (GTK+ 2.24.31; x86_64-pc-linux-gnu)
Cc: dzickus@redhat.com, sfr@canb.auug.org.au, Michael Ellerman,
 linuxarm@huawei.com, David Miller, abdhalee@linux.vnet.ibm.com,
 john.stultz@linaro.org, Jonathan Cameron, sparclinux@vger.kernel.org,
 tglx@linutronix.de, linuxppc-dev@lists.ozlabs.org,
 akpm@linux-foundation.org, linux-arm-kernel@lists.infradead.org
Sender: "linux-arm-kernel"
Errors-To:
 linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Wed, 16 Aug 2017 09:27:31 -0700
"Paul E. McKenney" wrote:

> On Wed, Aug 16, 2017 at 05:56:17AM -0700, Paul E. McKenney wrote:

> > Thomas, John, am I misinterpreting the timer trace event messages?

So I did some digging, and what I found is that rcu_sched seems to do a
simple schedule_timeout(1) and then just goes out to lunch for many
seconds. The process_timeout timer never fires (when it finally does wake
after one of these events, it usually removes the timer with
del_timer_sync).

So this patch seems to fix it. Testing, comments welcome.

Thanks,
Nick

[PATCH] timers: Fix excessive granularity of new timers after a nohz idle

When a timer base is idle, it is forwarded when a new timer is added to
ensure that granularity does not become excessive. When not idle, the
timer tick is expected to increment the base.

However there is a window after a timer is restarted from nohz, when it
is marked not-idle, and before the timer tick on this CPU, where a timer
may be added on an ancient base that does not get forwarded (because the
timer appears not-idle).

This results in excessive granularity. So much so that a 1 jiffy timeout
has blown out to tens of seconds and triggered the RCU stall warning
detector.

Fix this by always forwarding the base when adding a new timer if it is
more than 1 jiffy behind. Another approach I looked at first was to note
if the base was idle but not yet run or forwarded, however this just
seemed to add more branches and complexity when it seems we can just
cover it with this test.

Also add a comment noting a case where we could get an unexpectedly
large granularity for a timer. I debugged this problem by adding
warnings for such cases, but it seems we can't add them in general due
to this corner case.

Signed-off-by: Nicholas Piggin
---
 kernel/time/timer.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8f5d1bf18854..8f69b3105b8f 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -859,10 +859,10 @@ static inline void forward_timer_base(struct timer_base *base)
 	unsigned long jnow = READ_ONCE(jiffies);
 
 	/*
-	 * We only forward the base when it's idle and we have a delta between
-	 * base clock and jiffies.
+	 * We only forward the base when we have a delta between base clock
+	 * and jiffies. In the common case, run_timers will take care of it.
 	 */
-	if (!base->is_idle || (long) (jnow - base->clk) < 2)
+	if ((long) (jnow - base->clk) < 2)
 		return;
 
 	/*
@@ -938,6 +938,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 	 * same array bucket then just return:
 	 */
 	if (timer_pending(timer)) {
+		/*
+		 * The downside of this optimization is that it can result in
+		 * larger granularity than you would get from adding a new
+		 * timer with this expiry. Would a timer flag for networking
+		 * be appropriate, then we can try to keep expiry of general
+		 * timers within ~1/8th of their interval?
+		 */
 		if (timer->expires == expires)
 			return 1;
 
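
For anyone who wants to poke at the condition outside the kernel, here is
a rough user-space sketch of the before/after test in forward_timer_base().
It is only an illustration: struct sketch_base and the two helper names are
made up for this example and carry just the two fields the test looks at,
not the real struct timer_base.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct timer_base. */
struct sketch_base {
	unsigned long clk;	/* jiffy the base was last advanced to */
	bool is_idle;		/* set while the CPU is in nohz idle */
};

/*
 * Test before the patch: forward only when the base is marked idle and
 * is at least 2 jiffies behind. The signed cast keeps the comparison
 * correct across jiffies wraparound.
 */
static bool old_should_forward(const struct sketch_base *base,
			       unsigned long jnow)
{
	return base->is_idle && (long)(jnow - base->clk) >= 2;
}

/* Test after the patch: forward whenever the base is 2+ jiffies behind. */
static bool new_should_forward(const struct sketch_base *base,
			       unsigned long jnow)
{
	return (long)(jnow - base->clk) >= 2;
}
```

With a base whose clk is 1000, is_idle already cleared, and jnow at 3500
(the window described in the changelog), the old test declines to forward
and the new one forwards. When the base is only 1 jiffy behind, neither
forwards, so the common case still leaves the work to run_timers.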