From patchwork Wed Apr 12 15:08:35 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 9677597
X-Patchwork-Delegate: bhelgaas@google.com
Date: Wed, 12 Apr 2017 17:08:35 +0200
From: Frederic Weisbecker
To: Pavel Machek
Cc: Alan Stern, torvalds@linux-foundation.org, kernel list, linux-usb@vger.kernel.org, gregkh@linuxfoundation.org, bhelgaas@google.com, linux-pci@vger.kernel.org
Subject: Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
Message-ID: <20170412150832.GE21309@lerouge>
References: <20170203190414.GA3701@amd> <20170203205129.GA3791@amd> <20170203211854.GA3697@amd> <20170214175956.GA3587@amd> <20170214192743.GA3869@amd> <20170223162825.GA16646@lerouge> <20170223184013.GA5177@amd> <20170403153850.GA4418@lerouge> <20170403182050.GA6555@amd>
In-Reply-To: <20170403182050.GA6555@amd>
X-Mailing-List: linux-pci@vger.kernel.org

On Mon, Apr 03, 2017 at 08:20:50PM +0200, Pavel Machek wrote:
> > > > > > ...1d.7: PCI fixup... pass 2
> > > > > > ...1d.7: PCI fixup... pass 3
> > > > > > ...1d.7: PCI fixup... pass 3 done
> > > > > >
> > > > > > ...followed by hang. So yes, it looks USB related.
> > > > > >
> > > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU
> > > > > > startup, unfortunately useful info is off screen at that point).
> > > > >
> > > > > Forgot to say, 1d.7 is EHCI controller.
> > > > >
> > > > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI
> > > > > Controller (rev 01)
> > > >
> > > > Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller).
> > > > I hope I'll be able to reproduce the issue there.
> > > > If not, I'm sorry but I'll have to
> > > > burden you again :-)
> > >
> > > Go through more mails. It is only reproducible after cold boot. .. so
> > > I doubt it will be easy to reproduce on another machine.
> > >
> > > Now... I do have serial port, and I even might have serial cable
> > > somewhere, but.... Giving how sensitive it is, it is probably going to
> > > go away with console on ttyS...
> >
> > I also tried on an eeepc (which has ICH7/NM10 as well), with your config.
> > I even plugged a usb keyboard but even then I have been unable to
> > reproduce either :-(
>
> Ok, give me some time. I'm no longer using the affected machine, so no
> promises.

Actually, someone recently reported a very similar issue to yours. It's
probably the same bug, and I have a potential fix.

The scenario is a bit tricky again, and still theoretical. If you're
interested in the gory details: a tick scheduled for jiffies = N + 1, in
order to expire a timer_list timer, fires a tiny bit too early (i.e. a
few microseconds in advance). So it doesn't update jiffies on IRQ entry
and still sees jiffies = N. The timer_list timer therefore doesn't expire
yet, and on IRQ exit we reschedule the tick for the same time. But we see
that ts->next_tick already holds that value, so we don't reprogram it,
leaving the clockevent unprogrammed.

So if you have the time and opportunity to test the fix, you'll need to:

1) Bring back the offending change:

	git revert 558e8e27e73f53f8a512485be538b07115fe5f3c

2) Apply a delta fix:

Thanks!
Tested-by: Pavel Machek

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a3b8154..ae66515 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1071,8 +1071,10 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 	tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are running tickless */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		ts->next_tick = 0;
 		return;
+	}
 
 	hrtimer_forward(&ts->sched_timer, now, tick_period);
 	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
@@ -1172,8 +1174,10 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are in idle or full dynticks mode */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		ts->next_tick = 0;
 		return HRTIMER_NORESTART;
+	}
 
 	hrtimer_forward(timer, now, tick_period);
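The following minimal user-space model in C makes the race described in the mail concrete. It is a sketch and not kernel code. All model_* names are hypothetical. Time is reduced to a plain jiffies counter. Only the tick_stopped and next_tick fields mirror struct tick_sched. The model also assumes, as the explanation implies, that the IRQ-exit path skips reprogramming when the requested expiry equals the cached next_tick. It compares the behaviour with and without the delta fix, which clears next_tick in the tick handler.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the per-CPU tick state. Field names mirror the
 * kernel's struct tick_sched; everything else is invented for illustration.
 */
struct model_tick_sched {
	bool tick_stopped;   /* tick is stopped (idle or full dynticks) */
	uint64_t next_tick;  /* cached expiry of the programmed tick, 0 = none */
};

/* Hypothetical one-shot clockevent: fires once, then must be re-armed. */
struct model_clockevent {
	bool armed;          /* an event is pending in "hardware" */
	uint64_t expires;    /* when it is due, in jiffies */
};

static uint64_t jiffies;

/* IRQ-exit path (assumed): (re)program the stopped tick for the next timer. */
static void model_stop_sched_tick(struct model_tick_sched *ts,
				  struct model_clockevent *ce,
				  uint64_t next_timer)
{
	/* Shortcut under test: trust the cache and skip reprogramming. */
	if (ts->tick_stopped && next_timer == ts->next_tick)
		return;

	ts->tick_stopped = true;
	ts->next_tick = next_timer;
	ce->armed = true;
	ce->expires = next_timer;
}

/* Tick interrupt: the one-shot event is consumed. */
static void model_tick_handler(struct model_tick_sched *ts,
			       struct model_clockevent *ce,
			       bool fired_early, bool with_fix)
{
	ce->armed = false;               /* nothing is pending any more */
	if (!fired_early)
		jiffies = ce->expires;   /* jiffies advances only on time */

	/* Delta fix: forget the cached expiry once the event has fired. */
	if (ts->tick_stopped && with_fix)
		ts->next_tick = 0;
}

/* Returns true if a tick is still programmed after the early interrupt. */
bool model_simulate(bool with_fix)
{
	struct model_tick_sched ts = { .tick_stopped = false, .next_tick = 0 };
	struct model_clockevent ce = { .armed = false, .expires = 0 };
	uint64_t timer_expiry;

	jiffies = 100;                   /* jiffies = N */
	timer_expiry = jiffies + 1;      /* timer_list timer due at N + 1 */

	/* Going idle: stop the tick and program it for N + 1. */
	model_stop_sched_tick(&ts, &ce, timer_expiry);

	/* The tick fires a few microseconds early, so jiffies stays at N. */
	model_tick_handler(&ts, &ce, true, with_fix);

	/* The timer has not expired, so IRQ exit asks for N + 1 again. */
	if (jiffies < timer_expiry)
		model_stop_sched_tick(&ts, &ce, timer_expiry);

	return ce.armed;
}
```

In this model, model_simulate(false) returns false. After the early tick nothing is left armed, which corresponds to the hang. model_simulate(true) returns true, because resetting next_tick to 0 makes the cache comparison fail and forces the tick to be reprogrammed.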