Message ID | 1405721392-30795-1-git-send-email-sboyd@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
On 07/18/2014 03:09 PM, Stephen Boyd wrote:
> During suspend we call sched_clock_poll() to update the epoch and
> accumulated time and reprogram the sched_clock_timer to fire
> before the next wrap-around time. Unfortunately,
> sched_clock_poll() doesn't restart the timer, instead it relies
> on the hrtimer layer to do that and during suspend we aren't
> calling that function from the hrtimer layer. Instead, we're
> reprogramming the expires time while the hrtimer is enqueued,
> which can cause the hrtimer tree to be corrupted. Fix this
> problem by updating the state via update_sched_clock() and
> properly restarting the timer via hrtimer_start().
>
> Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> ---
>
> I also wonder if we should be restarting the timer during resume
> instead of suspend given that the resume path modifies the epoch.
> At that point timers can't run because interrupts are disabled and
> we don't really care if the timer fires earlier than it's supposed
> to anyway because it's just there to avoid rollover events, but
> does it seem better to do it that way? I didn't send that version
> because this patch is to fix the code intention, but I'm curious
> if anyone else feels like it should be changed.

Yea, starting the timer on suspend seems unintuitive to me.

Is this something you were hoping to get in for 3.17 or is this an urgent
3.16 item?

thanks
-john
On 07/18/14 15:25, John Stultz wrote:
> On 07/18/2014 03:09 PM, Stephen Boyd wrote:
>> During suspend we call sched_clock_poll() to update the epoch and
>> accumulated time and reprogram the sched_clock_timer to fire
>> before the next wrap-around time. Unfortunately,
>> sched_clock_poll() doesn't restart the timer, instead it relies
>> on the hrtimer layer to do that and during suspend we aren't
>> calling that function from the hrtimer layer. Instead, we're
>> reprogramming the expires time while the hrtimer is enqueued,
>> which can cause the hrtimer tree to be corrupted. Fix this
>> problem by updating the state via update_sched_clock() and
>> properly restarting the timer via hrtimer_start().
>>
>> Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
>> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>> ---
>>
>> I also wonder if we should be restarting the timer during resume
>> instead of suspend given that the resume path modifies the epoch.
>> At that point timers can't run because interrupts are disabled and
>> we don't really care if the timer fires earlier than it's supposed
>> to anyway because it's just there to avoid rollover events, but
>> does it seem better to do it that way? I didn't send that version
>> because this patch is to fix the code intention, but I'm curious
>> if anyone else feels like it should be changed.
> Yea, starting the timer on suspend seems unintuitive to me.
>
> Is this something you were hoping to get in for 3.17 or is this an urgent
> 3.16 item?

OK, I'll send a follow-up patch to cancel during suspend and start during
resume, unless you want that to be part of this fix? It's a regression
going back to v3.13, so I would think it's urgent, although I haven't seen
any reports on the mailing list, just reports on some of our Android
kernels.
On 07/18/2014 03:38 PM, Stephen Boyd wrote:
> On 07/18/14 15:25, John Stultz wrote:
>> On 07/18/2014 03:09 PM, Stephen Boyd wrote:
>>> During suspend we call sched_clock_poll() to update the epoch and
>>> accumulated time and reprogram the sched_clock_timer to fire
>>> before the next wrap-around time. Unfortunately,
>>> sched_clock_poll() doesn't restart the timer, instead it relies
>>> on the hrtimer layer to do that and during suspend we aren't
>>> calling that function from the hrtimer layer. Instead, we're
>>> reprogramming the expires time while the hrtimer is enqueued,
>>> which can cause the hrtimer tree to be corrupted. Fix this
>>> problem by updating the state via update_sched_clock() and
>>> properly restarting the timer via hrtimer_start().
>>>
>>> Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
>>> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>>> ---
>>>
>>> I also wonder if we should be restarting the timer during resume
>>> instead of suspend given that the resume path modifies the epoch.
>>> At that point timers can't run because interrupts are disabled and
>>> we don't really care if the timer fires earlier than it's supposed
>>> to anyway because it's just there to avoid rollover events, but
>>> does it seem better to do it that way? I didn't send that version
>>> because this patch is to fix the code intention, but I'm curious
>>> if anyone else feels like it should be changed.
>> Yea, starting the timer on suspend seems unintuitive to me.
>>
>> Is this something you were hoping to get in for 3.17 or is this an urgent
>> 3.16 item?
> OK, I'll send a follow-up patch to cancel during suspend and start during
> resume, unless you want that to be part of this fix? It's a regression
> going back to v3.13, so I would think it's urgent, although I haven't seen
> any reports on the mailing list, just reports on some of our Android
> kernels.
If it's a regression (and needs -stable backports), it needs to go in via
tip/timers/urgent, and not via the regular merge window.

What's the additional risk, -stable wise, of canceling the timer during
suspend and starting it back up during resume?

thanks
-john
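The alternative raised in this thread (cancel the timer in the suspend path, start it again in the resume path) can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the patch that was posted and not kernel code: `toy_sched_clock`, `toy_suspend` and `toy_resume` are hypothetical names, the timer is reduced to an armed flag plus an expiry, and one counter cycle is treated as one time unit. In the kernel the two timer steps would presumably map to hrtimer_cancel() and hrtimer_start().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not the kernel's structures. */
struct toy_sched_clock {
	unsigned long epoch_cyc;     /* counter value captured at the last update */
	unsigned long wrap;          /* interval after which the counter would wrap */
	unsigned long timer_expires; /* only meaningful while timer_armed is true */
	bool timer_armed;
	bool suspended;
};

/* Suspend: record the current counter, then take the timer off the queue. */
void toy_suspend(struct toy_sched_clock *sc, unsigned long now_cyc)
{
	sc->epoch_cyc = now_cyc;
	sc->timer_armed = false;
	sc->suspended = true;
}

/* Resume: re-base the epoch on the current counter and arm the timer afresh. */
void toy_resume(struct toy_sched_clock *sc, unsigned long now_cyc)
{
	assert(sc->suspended); /* resume only makes sense after a suspend */
	sc->epoch_cyc = now_cyc;
	sc->timer_expires = now_cyc + sc->wrap;
	sc->timer_armed = true;
	sc->suspended = false;
}
```

In this model the timer is never touched while it is queued: it is either off the queue (between suspend and resume) or freshly armed relative to the epoch that the resume path has just set.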
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 445106d2c729..9e32ce88e9ee 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -191,7 +191,9 @@ void __init sched_clock_postinit(void)
 
 static int sched_clock_suspend(void)
 {
-	sched_clock_poll(&sched_clock_timer);
+	update_sched_clock();
+	/* Restart the timer because we forced an update */
+	hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL);
 	cd.suspended = true;
 	return 0;
 }
During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer, instead it relies
on the hrtimer layer to do that and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Fix this
problem by updating the state via update_sched_clock() and
properly restarting the timer via hrtimer_start().

Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---

I also wonder if we should be restarting the timer during resume
instead of suspend given that the resume path modifies the epoch.
At that point timers can't run because interrupts are disabled and
we don't really care if the timer fires earlier than it's supposed
to anyway because it's just there to avoid rollover events, but
does it seem better to do it that way? I didn't send that version
because this patch is to fix the code intention, but I'm curious
if anyone else feels like it should be changed.

 kernel/time/sched_clock.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
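
The failure mode described in the commit message can be illustrated in a few lines of userspace C. The sketch below is a toy model, not kernel code: the hrtimer tree is replaced by a small array kept sorted by expiry, and `toy_timer`, `toy_queue`, `queue_insert`, `queue_remove`, `queue_is_sorted` and the two `demo_*` functions are hypothetical names introduced only for illustration. It shows that changing an expiry in place while the timer is still enqueued breaks the ordering the queue relies on, whereas taking the timer off the queue, updating it and putting it back (roughly what restarting an already-queued timer amounts to) keeps the queue consistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TIMERS 8

/* Hypothetical stand-in for a queued timer: only the expiry matters here. */
struct toy_timer {
	long expires;
};

/* Hypothetical stand-in for the timer tree: pointers kept sorted by expiry. */
struct toy_queue {
	struct toy_timer *slot[MAX_TIMERS];
	size_t count;
};

/* Insert t so that the array stays sorted by ascending expiry. */
static void queue_insert(struct toy_queue *q, struct toy_timer *t)
{
	size_t i = q->count;

	assert(q->count < MAX_TIMERS);
	while (i > 0 && q->slot[i - 1]->expires > t->expires) {
		q->slot[i] = q->slot[i - 1];
		i--;
	}
	q->slot[i] = t;
	q->count++;
}

/* Remove t from the queue if it is present, closing the gap it leaves. */
static void queue_remove(struct toy_queue *q, struct toy_timer *t)
{
	size_t i, j;

	for (i = 0; i < q->count; i++) {
		if (q->slot[i] == t) {
			for (j = i; j + 1 < q->count; j++)
				q->slot[j] = q->slot[j + 1];
			q->count--;
			return;
		}
	}
}

/* Check the invariant every lookup depends on: ascending expiry order. */
static bool queue_is_sorted(const struct toy_queue *q)
{
	size_t i;

	for (i = 1; i < q->count; i++)
		if (q->slot[i - 1]->expires > q->slot[i]->expires)
			return false;
	return true;
}

/* Buggy pattern: push the expiry forward while the timer is still queued. */
bool demo_forward_in_place(void)
{
	struct toy_timer a = { 10 }, b = { 20 }, c = { 30 };
	struct toy_queue q = { { NULL }, 0 };

	queue_insert(&q, &a);
	queue_insert(&q, &b);
	queue_insert(&q, &c);
	a.expires = 100;	/* queue position is now stale */
	return queue_is_sorted(&q);
}

/* Fixed pattern: dequeue, update the expiry, then enqueue again. */
bool demo_restart(void)
{
	struct toy_timer a = { 10 }, b = { 20 }, c = { 30 };
	struct toy_queue q = { { NULL }, 0 };

	queue_insert(&q, &a);
	queue_insert(&q, &b);
	queue_insert(&q, &c);
	queue_remove(&q, &a);
	a.expires = 100;
	queue_insert(&q, &a);
	return queue_is_sorted(&q);
}
```

In this model `demo_forward_in_place()` returns false because the first slot still holds the timer whose expiry is now the largest, while `demo_restart()` returns true.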