From patchwork Tue Dec 1 13:04:04 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Zijlstra X-Patchwork-Id: 7737171 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 312E4BEEE1 for ; Tue, 1 Dec 2015 13:12:51 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 4233F206B0 for ; Tue, 1 Dec 2015 13:12:50 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4B1FA204D1 for ; Tue, 1 Dec 2015 13:12:49 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1a3khc-00048d-B1; Tue, 01 Dec 2015 13:10:40 +0000 Received: from casper.infradead.org ([2001:770:15f::2]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1a3kbK-0003oD-2R for linux-arm-kernel@bombadil.infradead.org; Tue, 01 Dec 2015 13:04:10 +0000 Received: from j217066.upc-j.chello.nl ([24.132.217.66] helo=twins) by casper.infradead.org with esmtpsa (Exim 4.80.1 #2 (Red Hat Linux)) id 1a3P0h-0003YQ-2m; Mon, 30 Nov 2015 14:00:55 +0000 Received: by twins (Postfix, from userid 1000) id 027E012531889; Tue, 1 Dec 2015 14:04:05 +0100 (CET) Date: Tue, 1 Dec 2015 14:04:04 +0100 From: Peter Zijlstra To: Vladimir Murzin Subject: Re: [BISECTED] rcu_sched self-detected stall since 3.17 Message-ID: <20151201130404.GL3816@twins.programming.kicks-ass.net> References: <564F3DCA.1080907@arm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <564F3DCA.1080907@arm.com> User-Agent: Mutt/1.5.21 (2012-12-30) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: mark.rutland@arm.com, neilb@suse.de, linux-kernel@vger.kernel.org, oleg@redhat.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Sorry for the delay and thanks for the reminder! On Fri, Nov 20, 2015 at 03:35:38PM +0000, Vladimir Murzin wrote: > commit 743162013d40ca612b4cb53d3a200dff2d9ab26e > Author: NeilBrown > Date: Mon Jul 7 15:16:04 2014 +1000 > > sched: Remove proliferation of wait_on_bit() action functions > > The only change I noticed is from (mm/filemap.c) > > io_schedule(); > fatal_signal_pending(current) > > to (kernel/sched/wait.c) > > signal_pending_state(current->state, current) > io_schedule(); > > and if I apply following diff I don't see stalls anymore. > > diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c > index a104879..2d68cdb 100644 > --- a/kernel/sched/wait.c > +++ b/kernel/sched/wait.c > @@ -514,9 +514,10 @@ EXPORT_SYMBOL(bit_wait); > > __sched int bit_wait_io(void *word) > { > + io_schedule(); > + > if (signal_pending_state(current->state, current)) > return 1; > - io_schedule(); > return 0; > } > EXPORT_SYMBOL(bit_wait_io); > > Any ideas why it might happen and why diff above helps? Yes, the code as presented is simply wrong. And in fact most of the code it replaced was of the right form (with a few exceptions which would indeed have been subject to the same problem you've observed. Note how the late: - cifs_sb_tcon_pending_wait - fscache_wait_bit_interruptible - sleep_on_page_killable - wait_inquiry - key_wait_bit_intr All check the signal state _after_ calling schedule(). As opposed to: - gfs2_journalid_wait which follows the broken pattern. Further notice that most expect a return of -EINTR, which also seems correct given that this is a signal, those that do not return -EINTR only check for a !0 return value so would work equally well with -EINTR. The reason this is broken is that schedule() will no-op when there is a pending signal, while raising a signal will also issue a wakeup. Thus the right thing to do is check for the signal state after, that way you handle both cases: - calling schedule() with a signal pending - receiving a signal while sleeping As such, I would propose the below patch. Oleg, do you concur? Tested-by: Vladimir Murzin --- Subject: sched,wait: Fix signal handling in bit wait helpers Vladimir reported getting RCU stall warnings and bisected it back to commit 743162013d40. That commit inadvertently reversed the calls to schedule() and signal_pending(), thereby not handling the case where the signal receives while we sleep. Fixes: 743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions") Fixes: cbbce8220949 ("SCHED: add some "wait..on_bit...timeout()" interfaces.") Reported-by: Vladimir Murzin Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/wait.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index 052e02672d12..f10bd873e684 100644 --- a/kernel/sched/wait.c +++ b/kernel/sched/wait.c @@ -583,18 +583,18 @@ EXPORT_SYMBOL(wake_up_atomic_t); __sched int bit_wait(struct wait_bit_key *word) { - if (signal_pending_state(current->state, current)) - return 1; schedule(); + if (signal_pending(current)) + return -EINTR; return 0; } EXPORT_SYMBOL(bit_wait); __sched int bit_wait_io(struct wait_bit_key *word) { - if (signal_pending_state(current->state, current)) - return 1; io_schedule(); + if (signal_pending(current)) + return -EINTR; return 0; } EXPORT_SYMBOL(bit_wait_io); @@ -602,11 +602,11 @@ EXPORT_SYMBOL(bit_wait_io); __sched int bit_wait_timeout(struct wait_bit_key *word) { unsigned long now = READ_ONCE(jiffies); - if (signal_pending_state(current->state, current)) - return 1; if (time_after_eq(now, word->timeout)) return -EAGAIN; schedule_timeout(word->timeout - now); + if (signal_pending(current)) + return -EINTR; return 0; } EXPORT_SYMBOL_GPL(bit_wait_timeout); @@ -614,11 +614,11 @@ EXPORT_SYMBOL_GPL(bit_wait_timeout); __sched int bit_wait_io_timeout(struct wait_bit_key *word) { unsigned long now = READ_ONCE(jiffies); - if (signal_pending_state(current->state, current)) - return 1; if (time_after_eq(now, word->timeout)) return -EAGAIN; io_schedule_timeout(word->timeout - now); + if (signal_pending(current)) + return -EINTR; return 0; } EXPORT_SYMBOL_GPL(bit_wait_io_timeout);