
[Regression,Revert,request] Excessive delay or hang during resume from system suspend due to a hrtimer commit

Message ID alpine.LFD.2.02.1207161318010.32033@ionos (mailing list archive)
State Not Applicable, archived

Commit Message

Thomas Gleixner July 16, 2012, 11:26 a.m. UTC
On Mon, 16 Jul 2012, Thomas Gleixner wrote:

> On Mon, 16 Jul 2012, Rafael J. Wysocki wrote:
> 
> > On Monday, July 16, 2012, Thomas Gleixner wrote:
> > > On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> > > > To everyone involved: the fact that this change, which was likely to introduce
> > > > > regressions from the look of it alone, has been pushed to Linus (and to -stable
> > > > > at the same time!) so late in the cycle, is seriously disappointing.
> > > 
> > > > Well, we spent a massive amount of time in testing, reviewing and
> > > discussion and it definitely did not break suspend/resume here.
> > 
> > I'm not saying that you didn't consider it thoroughly, but unfortunately you
> > did overlook this particular issue, didn't you?
> > 
> > > > This was not pushed without a lot of thought and in fact what you are
> > > > seeing is another long standing bug in the timekeeping resume code,
> > > > which was just papered over by the incorrect handling of the "clock was
> > > > set" cases in the other parts of the system.
> > > 
> > > Does the following patch fix the problem for you ?
> > 
> > Yes, it does, thanks!
> > 
> > > @John: Should that clear ntp as well or is it enough to set ntp_error
> > >        to 0 ?
> > > 
> > > /me really goes on vacation now.
> > 
> > So who's going to take care of the patch? :-)
> 
> I'm still packing gear. So I'll push it into timers/urgent.

Actually that's a bad idea. John wants to double check vs. the
ntp_clear question. So John can send it to Linus directly.

@John: Should it be: timekeeping_update(true)

Now I'm gone for real.

Thanks,

	tglx
-----
Subject: timekeeping: Add missing update call in timekeeping_resume()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 16 Jul 2012 11:47:31 +0200 (CEST)

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for
quite some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-by-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

John Stultz July 16, 2012, 3:47 p.m. UTC | #1
On 07/16/2012 04:26 AM, Thomas Gleixner wrote:
> On Mon, 16 Jul 2012, Thomas Gleixner wrote:
>
>> On Mon, 16 Jul 2012, Rafael J. Wysocki wrote:
>>
>>> On Monday, July 16, 2012, Thomas Gleixner wrote:
>>>> On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
>>>>> To everyone involved: the fact that this change, which was likely to introduce
>>>>> regressions from the look of it alone, has been pushed to Linus (and to -stable
>>>>> at the same time!) so late in the cycle, is seriously disappointing.
>>>> Well, we spent a massive amount of time in testing, reviewing and
>>>> discussion and it definitely did not break suspend/resume here.
>>> I'm not saying that you didn't consider it thoroughly, but unfortunately you
>>> did overlook this particular issue, didn't you?
>>>
>>>> This was not pushed without a lot of thought and in fact what you are
>>>> seeing is another long standing bug in the timekeeping resume code,
>>>> which was just papered over by the incorrect handling of the "clock was
>>>> set" cases in the other parts of the system.
>>>>
>>>> Does the following patch fix the problem for you ?
>>> Yes, it does, thanks!
>>>
>>>> @John: Should that clear ntp as well or is it enough to set ntp_error
>>>>         to 0 ?
>>>>
>>>> /me really goes on vacation now.
>>> So who's going to take care of the patch? :-)
>> I'm still packing gear. So I'll push it into timers/urgent.
> Actually that's a bad idea. John wants to double check vs. the
> ntp_clear question. So John can send it to Linus directly.
>
> @John: Should it be: timekeeping_update(true)
I think it's better to leave it as false, so we don't reset the NTP state
machine completely after suspend.

When we come back from suspend our error is usually off by the 
persistent_clock/rtc granularity, so it might make sense, but I'd want a 
lot more testing of using ntp over suspend before changing the existing 
behavior of not doing it.

> Now I'm gone for real.
Ok. Thanks for spinning this up so quickly. I'll go ahead and send it on 
to Linus.

thanks
-john


Patch

Index: tip/kernel/time/timekeeping.c
===================================================================
--- tip.orig/kernel/time/timekeeping.c
+++ tip/kernel/time/timekeeping.c
@@ -717,6 +717,7 @@  static void timekeeping_resume(void)
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
 	timekeeper.ntp_error = 0;
 	timekeeping_suspended = 0;
+	timekeeping_update(false);
 	write_sequnlock_irqrestore(&timekeeper.lock, flags);
 
 	touch_softlockup_watchdog();