Message ID | 2161372.IsD4PDzmmY@aspire.rjw.lan (mailing list archive) |
---|---|
State | Mainlined |
Delegated to: | Rafael Wysocki |
Series | sched: idle: Avoid retaining the tick when it has been stopped |
On Thu, Aug 09, 2018 at 07:08:34PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> If the tick has been stopped already, but the governor has not asked to
> stop it (which it can do sometimes), the idle loop should invoke
> tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care
> of this case properly.

IMHO, I don't think this patch is on the right track; from the idle loop
side, it needs to provide sane fundamental support, for example, it
can stop or restart the tick per the idle governor's request. On the other
hand, the idle governors can decide their own policy for how to use the
tick in the idle loop. This patch seems to mix two things, and in the end
it may couple the implementation of the idle loop and the 'menu'
governor for sched tick usage.

I still think my patch to restart the tick is valid :)

Thanks,
Leo Yan

> Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick)
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  kernel/sched/idle.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-pm/kernel/sched/idle.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/idle.c
> +++ linux-pm/kernel/sched/idle.c
> @@ -190,7 +190,7 @@ static void cpuidle_idle_call(void)
>  		 */
>  		next_state = cpuidle_select(drv, dev, &stop_tick);
>
> -		if (stop_tick)
> +		if (stop_tick || tick_nohz_tick_stopped())
>  			tick_nohz_idle_stop_tick();
>  		else
>  			tick_nohz_idle_retain_tick();
>
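For readers who want to see the effect of the one-line change in isolation, the branch in cpuidle_idle_call() can be modelled in userspace roughly as follows. This is only a sketch: the enum and the two function names are made up for illustration and are not kernel APIs; the two booleans stand in for the governor's stop_tick output and the result of tick_nohz_tick_stopped().

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum tick_action {
	TICK_ACTION_STOP,	/* models calling tick_nohz_idle_stop_tick() */
	TICK_ACTION_RETAIN	/* models calling tick_nohz_idle_retain_tick() */
};

/* Before the patch: only the governor's request is considered. */
static enum tick_action tick_action_before(bool stop_tick, bool tick_stopped)
{
	(void)tick_stopped;	/* ignored by the old condition */
	return stop_tick ? TICK_ACTION_STOP : TICK_ACTION_RETAIN;
}

/* After the patch: an already stopped tick also takes the stop path. */
static enum tick_action tick_action_after(bool stop_tick, bool tick_stopped)
{
	return (stop_tick || tick_stopped) ? TICK_ACTION_STOP
					   : TICK_ACTION_RETAIN;
}
```

The only input combination where the two versions differ is stop_tick == false with the tick already stopped, which is the case the changelog describes.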
On Fri, Aug 10, 2018 at 8:19 AM, <leo.yan@linaro.org> wrote:
> On Thu, Aug 09, 2018 at 07:08:34PM +0200, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> If the tick has been stopped already, but the governor has not asked to
>> stop it (which it can do sometimes), the idle loop should invoke
>> tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care
>> of this case properly.
>
> IMHO, I don't think this patch is on the right track;

So we disagree here, quite obviously.

> from the idle loop side, it needs to provide sane fundamental support,
> for example, it can stop or restart the tick per the idle governor's request.

No, if the tick is stopped, restarting it is pointless until we exit
the loop in do_idle().

> On the other hand, the idle governors can decide their own policy for how to
> use the tick in the idle loop. This patch seems to mix two things, and
> in the end it may couple the implementation of the idle loop
> and the 'menu' governor for sched tick usage.

I'm not following this, sorry.

> I still think my patch to restart the tick is valid :)

It changes the behavior significantly, though, and it is not clear if
the new behavior is desirable.

The patch here simply fixes a problem while leaving the overall
behavior as is.

>> Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick)
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>  kernel/sched/idle.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Index: linux-pm/kernel/sched/idle.c
>> ===================================================================
>> --- linux-pm.orig/kernel/sched/idle.c
>> +++ linux-pm/kernel/sched/idle.c
>> @@ -190,7 +190,7 @@ static void cpuidle_idle_call(void)
>>  		 */
>>  		next_state = cpuidle_select(drv, dev, &stop_tick);
>>
>> -		if (stop_tick)
>> +		if (stop_tick || tick_nohz_tick_stopped())
>>  			tick_nohz_idle_stop_tick();
>>  		else
>>  			tick_nohz_idle_retain_tick();
>>
On Thu, Aug 09, 2018 at 07:08:34PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> If the tick has been stopped already, but the governor has not asked to
> stop it (which it can do sometimes), the idle loop should invoke
> tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care
> of this case properly.
>
> Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick)
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  kernel/sched/idle.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-pm/kernel/sched/idle.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/idle.c
> +++ linux-pm/kernel/sched/idle.c
> @@ -190,7 +190,7 @@ static void cpuidle_idle_call(void)
>  		 */
>  		next_state = cpuidle_select(drv, dev, &stop_tick);
>
> -		if (stop_tick)
> +		if (stop_tick || tick_nohz_tick_stopped())
>  			tick_nohz_idle_stop_tick();
>  		else
>  			tick_nohz_idle_retain_tick();

So what if tick_nohz_idle_stop_tick() sees no timer to schedule and
cancels it, we may remain idle in a shallow state for a long while?

Otherwise we can have something like this:

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index da9455a..408c985 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -806,6 +806,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 static void tick_nohz_retain_tick(struct tick_sched *ts)
 {
 	ts->timer_expires_base = 0;
+
+	if (ts->tick_stopped)
+		tick_nohz_restart(ts, ktime_get());
 }

 #ifdef CONFIG_NO_HZ_FULL
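The alternative proposed in this message (re-arm the tick from the retain path when it is found stopped) can be sketched outside the kernel as below. Everything here is an assumption made for illustration: struct tick_model, model_retain_tick() and the 4 ms MODEL_TICK_NSEC value (which corresponds to HZ=250) are invented names, and "re-arming" is reduced to setting the next expiry to now plus one tick period. The tick_stopped flag is left untouched because the quoted diff does not change it; whatever else the real tick_nohz_restart() does is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tick period: 4 ms, i.e. HZ=250. Not taken from the thread. */
#define MODEL_TICK_NSEC 4000000ULL

/* Hypothetical stand-in for the few struct tick_sched fields used here. */
struct tick_model {
	bool tick_stopped;		/* tick was stopped since idle entry */
	uint64_t timer_expires_base;	/* cleared on the retain path */
	uint64_t next_tick_ns;		/* when the modelled tick fires next */
	unsigned int restarts;		/* bookkeeping for this sketch only */
};

/*
 * Models the proposed tick_nohz_retain_tick(): clear the expiry base and,
 * if the tick is currently stopped, program it again one period from now.
 */
static void model_retain_tick(struct tick_model *ts, uint64_t now_ns)
{
	ts->timer_expires_base = 0;

	if (ts->tick_stopped) {
		ts->next_tick_ns = now_ns + MODEL_TICK_NSEC;
		ts->restarts++;
	}
}
```

Calling model_retain_tick() on a model whose tick is still running only clears timer_expires_base, which matches the behaviour of the unpatched function as quoted in the diff context.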
On Thursday, August 16, 2018 3:27:24 PM CEST Frederic Weisbecker wrote: > On Thu, Aug 09, 2018 at 07:08:34PM +0200, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > If the tick has been stopped already, but the governor has not asked to > > stop it (which it can do sometimes), the idle loop should invoke > > tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care > > of this case properly. > > > > Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick) > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > kernel/sched/idle.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > Index: linux-pm/kernel/sched/idle.c > > =================================================================== > > --- linux-pm.orig/kernel/sched/idle.c > > +++ linux-pm/kernel/sched/idle.c > > @@ -190,7 +190,7 @@ static void cpuidle_idle_call(void) > > */ > > next_state = cpuidle_select(drv, dev, &stop_tick); > > > > - if (stop_tick) > > + if (stop_tick || tick_nohz_tick_stopped()) > > tick_nohz_idle_stop_tick(); > > else > > tick_nohz_idle_retain_tick(); > > So what if tick_nohz_idle_stop_tick() sees no timer to schedule and > cancels it, we may remain idle in a shallow state for a long while? Yes, but the governor is expected to avoid using shallow states when the tick is stopped already. 
> Otherwise we can have something like this: > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index da9455a..408c985 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -806,6 +806,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu) > static void tick_nohz_retain_tick(struct tick_sched *ts) > { > ts->timer_expires_base = 0; > + > + if (ts->tick_stopped) > + tick_nohz_restart(ts, ktime_get()); > } > > #ifdef CONFIG_NO_HZ_FULL > We could do that, but my concern with that approach is that we may end up stopping and starting the tick back and forth without exiting the loop in do_idle() just because somebody uses a periodic timer behind our back and the governor gets confused. Besides, that would be a change in behavior, while the $subject patch simply fixes a mistake in the original design. Cheers, Rafael
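The "stopping and starting the tick back and forth" concern can be made concrete with a small counting model. It is a sketch under stated assumptions: count_tick_transitions() is a hypothetical helper, the array holds the governor's successive stop_tick answers during one stay in the do_idle() loop, and restart_on_retain selects between the restart-on-retain alternative (true) and the behaviour of the patch under discussion, where a stopped tick stays stopped (false).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Count how many times the modelled tick changes state (stopped <-> running)
 * over a sequence of governor decisions taken without leaving the idle loop.
 * Hypothetical helper for illustration; not kernel code.
 */
static unsigned int count_tick_transitions(const bool *stop_requests, size_t n,
					   bool restart_on_retain)
{
	bool stopped = false;	/* the tick runs when the idle loop is entered */
	unsigned int transitions = 0;

	for (size_t i = 0; i < n; i++) {
		if (stop_requests[i] && !stopped) {
			/* governor asks to stop a running tick */
			stopped = true;
			transitions++;
		} else if (!stop_requests[i] && stopped && restart_on_retain) {
			/* retain path restarts a stopped tick */
			stopped = false;
			transitions++;
		}
		/* otherwise the tick state is left as it is */
	}

	return transitions;
}
```

With an alternating sequence of decisions, the restart-on-retain variant changes the tick state on every iteration, while the other variant changes it once.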
On Fri, Aug 17, 2018 at 11:34 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > On Thursday, August 16, 2018 3:27:24 PM CEST Frederic Weisbecker wrote: > > On Thu, Aug 09, 2018 at 07:08:34PM +0200, Rafael J. Wysocki wrote: > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > If the tick has been stopped already, but the governor has not asked to > > > stop it (which it can do sometimes), the idle loop should invoke > > > tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care > > > of this case properly. > > > > > > Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick) > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > --- > > > kernel/sched/idle.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > Index: linux-pm/kernel/sched/idle.c > > > =================================================================== > > > --- linux-pm.orig/kernel/sched/idle.c > > > +++ linux-pm/kernel/sched/idle.c > > > @@ -190,7 +190,7 @@ static void cpuidle_idle_call(void) > > > */ > > > next_state = cpuidle_select(drv, dev, &stop_tick); > > > > > > - if (stop_tick) > > > + if (stop_tick || tick_nohz_tick_stopped()) > > > tick_nohz_idle_stop_tick(); > > > else > > > tick_nohz_idle_retain_tick(); > > > > So what if tick_nohz_idle_stop_tick() sees no timer to schedule and > > cancels it, we may remain idle in a shallow state for a long while? > > Yes, but the governor is expected to avoid using shallow states when the > tick is stopped already. 
> > > Otherwise we can have something like this: > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > index da9455a..408c985 100644 > > --- a/kernel/time/tick-sched.c > > +++ b/kernel/time/tick-sched.c > > @@ -806,6 +806,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu) > > static void tick_nohz_retain_tick(struct tick_sched *ts) > > { > > ts->timer_expires_base = 0; > > + > > + if (ts->tick_stopped) > > + tick_nohz_restart(ts, ktime_get()); > > } > > > > #ifdef CONFIG_NO_HZ_FULL > > > > We could do that, but my concern with that approach is that we may end up > stopping and starting the tick back and forth without exiting the loop > in do_idle() just because somebody uses a periodic timer behind our > back and the governor gets confused. > > Besides, that would be a change in behavior, while the $subject patch > simply fixes a mistake in the original design. Anyway, I'm sort of divided here. We need to do something, this way or another, because the current code is not strictly correct. If there are no concerns about the possible extra overhead related to restarting the tick, I'd just add a tick_nohz_idle_restart_tick() to the tick_nohz_idle_retain_tick() branch in cpuidle_idle_call() (it would do what's needed in there without affecting any other places). Then, of course, governors would not need to worry about leaving the tick stopped, so menu could be simplified somewhat, which may be a good thing after all. Cheers, Rafael
On Fri, Aug 17, 2018 at 11:32:07AM +0200, Rafael J. Wysocki wrote:
> On Thursday, August 16, 2018 3:27:24 PM CEST Frederic Weisbecker wrote:
> > On Thu, Aug 09, 2018 at 07:08:34PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > If the tick has been stopped already, but the governor has not asked to
> > > stop it (which it can do sometimes), the idle loop should invoke
> > > tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care
> > > of this case properly.
> > >
> > > Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick)
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >  kernel/sched/idle.c |    2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > Index: linux-pm/kernel/sched/idle.c
> > > ===================================================================
> > > --- linux-pm.orig/kernel/sched/idle.c
> > > +++ linux-pm/kernel/sched/idle.c
> > > @@ -190,7 +190,7 @@ static void cpuidle_idle_call(void)
> > >  		 */
> > >  		next_state = cpuidle_select(drv, dev, &stop_tick);
> > >
> > > -		if (stop_tick)
> > > +		if (stop_tick || tick_nohz_tick_stopped())
> > >  			tick_nohz_idle_stop_tick();
> > >  		else
> > >  			tick_nohz_idle_retain_tick();
> >
> > So what if tick_nohz_idle_stop_tick() sees no timer to schedule and
> > cancels it, we may remain idle in a shallow state for a long while?
>
> Yes, but the governor is expected to avoid using shallow states when the
> tick is stopped already.

So what kind of sleep do we enter to when an idle tick fires and we go
back to idle? Is it always deep?

I believe that ts->tick_stopped == 1 shouldn't be too relevant for the governor.
We can definitely have scenarios where the idle tick is stopped for a long while,
then it fires and schedules the next timer at NOW() + TICK_NSEC (as if the tick
had been restarted). This can even repeat that way for some time, because
ts->tick_stopped == 1 only implies that the tick has been stopped once since
we entered the idle loop. After that we may well have a periodic tick behaviour.
In that case we probably don't want deep idle state. Especially if we have:

    idle_loop() {
        tick_stop (scheduled several seconds forward)
        deep_idle_sleep()
        //several seconds later
        tick()
        tick_stop (scheduled TICK_NSEC forward)
        deep_idle_sleep()
        tick() {
            set_need_resched()
        }
        exit idle loop
    }

Here the last deep idle state isn't necessary.

>
> > Otherwise we can have something like this:
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index da9455a..408c985 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -806,6 +806,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
> >  static void tick_nohz_retain_tick(struct tick_sched *ts)
> >  {
> >  	ts->timer_expires_base = 0;
> > +
> > +	if (ts->tick_stopped)
> > +		tick_nohz_restart(ts, ktime_get());
> >  }
> >
> >  #ifdef CONFIG_NO_HZ_FULL
> >
>
> We could do that, but my concern with that approach is that we may end up
> stopping and starting the tick back and forth without exiting the loop
> in do_idle() just because somebody uses a periodic timer behind our
> back and the governor gets confused.
>
> Besides, that would be a change in behavior, while the $subject patch
> simply fixes a mistake in the original design.

Ok, let's take the safe approach for now as this is a fix and it should even be
routed to stable.

But then in the longer term, perhaps cpuidle_select() should think that
through.

Thanks.
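The scenario in the pseudocode above can be turned into a small runnable model. This is a sketch of the worry being raised, not of what any real governor does: wasted_deep_entries() is a hypothetical helper, the 4 ms tick period and the 10 ms deep-state target residency are made-up numbers, and flag_forces_deep models a policy that picks a deep state purely because ts->tick_stopped is set (the flag is assumed to be set for every sleep in the array).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numbers, chosen only to make the example concrete. */
#define MODEL_TICK_NSEC			4000000ULL	/* 4 ms, HZ=250 */
#define MODEL_DEEP_RESIDENCY_NSEC	10000000ULL	/* 10 ms */

/*
 * For each sleep length (time until the next wakeup), decide whether a deep
 * state would be entered and count the entries that are too short to pay off.
 * If flag_forces_deep is true, a deep state is always picked, modelling a
 * governor that looks only at the "tick stopped" flag.
 */
static unsigned int wasted_deep_entries(const uint64_t *sleep_ns, size_t n,
					bool flag_forces_deep)
{
	unsigned int wasted = 0;

	for (size_t i = 0; i < n; i++) {
		bool fits = sleep_ns[i] >= MODEL_DEEP_RESIDENCY_NSEC;
		bool deep = flag_forces_deep || fits;

		if (deep && !fits)
			wasted++;
	}

	return wasted;
}
```

Feeding it the two sleeps from the pseudocode (several seconds, then one tick period) gives one wasted deep entry under the flag-only policy and none when the state is chosen from the sleep length.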
On Fri, Aug 17, 2018 at 4:12 PM Frederic Weisbecker <frederic@kernel.org> wrote: > > On Fri, Aug 17, 2018 at 11:32:07AM +0200, Rafael J. Wysocki wrote: > > On Thursday, August 16, 2018 3:27:24 PM CEST Frederic Weisbecker wrote: > > > On Thu, Aug 09, 2018 at 07:08:34PM +0200, Rafael J. Wysocki wrote: > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > If the tick has been stopped already, but the governor has not asked to > > > > stop it (which it can do sometimes), the idle loop should invoke > > > > tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care > > > > of this case properly. > > > > > > > > Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick) > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > --- > > > > kernel/sched/idle.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > Index: linux-pm/kernel/sched/idle.c > > > > =================================================================== > > > > --- linux-pm.orig/kernel/sched/idle.c > > > > +++ linux-pm/kernel/sched/idle.c > > > > @@ -190,7 +190,7 @@ static void cpuidle_idle_call(void) > > > > */ > > > > next_state = cpuidle_select(drv, dev, &stop_tick); > > > > > > > > - if (stop_tick) > > > > + if (stop_tick || tick_nohz_tick_stopped()) > > > > tick_nohz_idle_stop_tick(); > > > > else > > > > tick_nohz_idle_retain_tick(); > > > > > > So what if tick_nohz_idle_stop_tick() sees no timer to schedule and > > > cancels it, we may remain idle in a shallow state for a long while? > > > > Yes, but the governor is expected to avoid using shallow states when the > > tick is stopped already. > > So what kind of sleep do we enter to when an idle tick fires and we go > back to idle? Is it always deep? No, it isn't. The state to select must always fit the time till the closest timer event and that may be shorter than the tick period. 
If there's a non-tick timer to wake the CPU up, we don't need to worry about restarting the tick, though. :-) > I believe that ts->tick_stopped == 1 shouldn't be too relevant for the governor. > We can definetly have scenarios where the idle tick is stopped for a long while, > then it fires and schedules the next timer at NOW() + TICK_NSEC (as if the tick > had been restarted). This can even repeat that way for some time, because > ts->tick_stopped == 1 only implies that the tick has been stopped once since > we entered the idle loop. After that we may well have a periodic tick behaviour. > In that case we probably don't want deep idle state. Especially if we have: > > idle_loop() { > tick_stop (scheduled several seconds forward) > deep_idle_sleep() > //several seconds later > tick() > tick_stop (scheduled TICK_NSEC forward) > deep_idle_sleep() > tick() { > set_need_resched() > } > exit idle loop > } > > Here the last deep idle state isn't necessary. No, it isn't. However, that is not relevant for the question of whether or not to restart the tick before entering the idle state IMO (see the considerations below). > > > > > Otherwise we can have something like this: > > > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > > index da9455a..408c985 100644 > > > --- a/kernel/time/tick-sched.c > > > +++ b/kernel/time/tick-sched.c > > > @@ -806,6 +806,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu) > > > static void tick_nohz_retain_tick(struct tick_sched *ts) > > > { > > > ts->timer_expires_base = 0; > > > + > > > + if (ts->tick_stopped) > > > + tick_nohz_restart(ts, ktime_get()); > > > } > > > > > > #ifdef CONFIG_NO_HZ_FULL > > > > > > > We could do that, but my concern with that approach is that we may end up > > stopping and starting the tick back and forth without exiting the loop > > in do_idle() just because somebody uses a periodic timer behind our > > back and the governor gets confused. 
> > > > Besides, that would be a change in behavior, while the $subject patch > > simply fixes a mistake in the original design. > > Ok, let's take the safe approach for now as this is a fix and it should even be > routed to stable. Right. I'll queue up this patch, then. > But then in the longer term, perhaps cpuidle_select() should think that > through. So I have given more consideration to this and my conclusion is that restarting the tick between cpuidle_select() and call_cpuidle() is a bad idea. First off, if need_resched() is "false", the primary reason for running the tick on the given CPU is not there, so it only might be useful as a "backup" timer to wake up the CPU from an inadequate idle state. Now, in general, there are two reasons for the idle governor (whatever it is) to select an idle state with a target residency below the tick period length. The first reason is when the governor knows that the closest timer event is going to occur in this time frame, but in that case (as stated above), it is not necessary to worry about the tick, because the other timer will trigger soon enough anyway. The second reason is when the governor predicts a wakeup which is not by a timer in this time frame and it is quite arguable what the governor should do then. IMO it at least is not unreasonable to throw the prediction away and still go for the closest timer event in that case (which is the current approach). There's more, though. Restarting the tick between cpuidle_select() and call_cpuidle() might introduce quite a bit of latency into that point and that would mess up with the idle state selection (e.g. selecting a very shallow idle state might not make a lot of sense if that latency was high enough, because the expected wakeup might very well take place when the tick was being restarted), so it should rather be avoided IMO. Cheers, Rafael
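Two points from this message lend themselves to a short sketch: the selected state must fit the time until the closest wakeup, and any latency added between selection and idle entry eats into that time. The code below is a userspace illustration only. The state table, its residency values, pick_state() and idle_left_us() are all hypothetical and are not the cpuidle data structures or the menu governor's algorithm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle state table, sorted from shallowest to deepest. */
struct model_state {
	const char *name;
	uint64_t target_residency_us;
};

static const struct model_state model_states[] = {
	{ "shallow", 1 },
	{ "medium", 20 },
	{ "deep", 500 },
};

#define MODEL_NR_STATES (sizeof(model_states) / sizeof(model_states[0]))

/* Return the index of the deepest state whose target residency fits. */
static size_t pick_state(uint64_t idle_us)
{
	size_t pick = 0;	/* state 0 is always allowed in this model */

	for (size_t i = 1; i < MODEL_NR_STATES; i++) {
		if (model_states[i].target_residency_us <= idle_us)
			pick = i;
	}

	return pick;
}

/*
 * Idle time that is actually left if some extra work (for example
 * restarting the tick) takes restart_latency_us before idle entry.
 */
static uint64_t idle_left_us(uint64_t predicted_us, uint64_t restart_latency_us)
{
	return predicted_us > restart_latency_us ?
		predicted_us - restart_latency_us : 0;
}
```

In this model, a state that was a good fit for a 50 us prediction is no longer a good fit once 40 us of that window have been spent before entering idle, which is the kind of mismatch the message warns about.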
On Sat, Aug 18, 2018 at 11:57:00PM +0200, Rafael J. Wysocki wrote:

[...]

> > > > Otherwise we can have something like this:
> > > >
> > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > > index da9455a..408c985 100644
> > > > --- a/kernel/time/tick-sched.c
> > > > +++ b/kernel/time/tick-sched.c
> > > > @@ -806,6 +806,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
> > > >  static void tick_nohz_retain_tick(struct tick_sched *ts)
> > > >  {
> > > >  	ts->timer_expires_base = 0;
> > > > +
> > > > +	if (ts->tick_stopped)
> > > > +		tick_nohz_restart(ts, ktime_get());
> > > >  }
> > > >
> > > >  #ifdef CONFIG_NO_HZ_FULL
> > > >
> > >
> > > We could do that, but my concern with that approach is that we may end up
> > > stopping and starting the tick back and forth without exiting the loop
> > > in do_idle() just because somebody uses a periodic timer behind our
> > > back and the governor gets confused.
> > >
> > > Besides, that would be a change in behavior, while the $subject patch
> > > simply fixes a mistake in the original design.
> >
> > Ok, let's take the safe approach for now as this is a fix and it should even be
> > routed to stable.
>
> Right.  I'll queue up this patch, then.
>
> > But then in the longer term, perhaps cpuidle_select() should think that
> > through.
>
> So I have given more consideration to this and my conclusion is that
> restarting the tick between cpuidle_select() and call_cpuidle() is a
> bad idea.
>
> First off, if need_resched() is "false", the primary reason for
> running the tick on the given CPU is not there, so it only might be
> useful as a "backup" timer to wake up the CPU from an inadequate idle
> state.
>
> Now, in general, there are two reasons for the idle governor (whatever
> it is) to select an idle state with a target residency below the tick
> period length.  The first reason is when the governor knows that the
> closest timer event is going to occur in this time frame, but in that
> case (as stated above), it is not necessary to worry about the tick,
> because the other timer will trigger soon enough anyway.  The second
> reason is when the governor predicts a wakeup which is not by a timer
> in this time frame and it is quite arguable what the governor should
> do then.  IMO it at least is not unreasonable to throw the prediction
> away and still go for the closest timer event in that case (which is
> the current approach).
>
> There's more, though.  Restarting the tick between cpuidle_select()
> and call_cpuidle() might introduce quite a bit of latency into that
> point and that would mess up with the idle state selection (e.g.
> selecting a very shallow idle state might not make a lot of sense if
> that latency was high enough, because the expected wakeup might very
> well take place when the tick was being restarted), so it should
> rather be avoided IMO.

I expect the idle governor doesn't introduce many tick restart
operations. The reason is that if there is a close timer event, then the
idle governor can trust it to wake up the CPU, so in this case the idle
governor will not restart the tick. If the timer event is a long delta
away and the shallow state selection is caused by other factors (e.g. a
typical pattern), then we need to restart the tick to avoid
powernightmares; for this case we can restart the tick only once, at the
beginning of the typical pattern interrupt events. After the typical
pattern interrupts stop, we can rely on the tick to rescue the idle
state to a deep one.

Thanks,
Leo Yan
On Sun, Aug 19, 2018 at 2:36 AM <leo.yan@linaro.org> wrote: > > On Sat, Aug 18, 2018 at 11:57:00PM +0200, Rafael J. Wysocki wrote: > > [...] > > > > > > Otherwise we can have something like this: > > > > > > > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > > > > index da9455a..408c985 100644 > > > > > --- a/kernel/time/tick-sched.c > > > > > +++ b/kernel/time/tick-sched.c > > > > > @@ -806,6 +806,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu) > > > > > static void tick_nohz_retain_tick(struct tick_sched *ts) > > > > > { > > > > > ts->timer_expires_base = 0; > > > > > + > > > > > + if (ts->tick_stopped) > > > > > + tick_nohz_restart(ts, ktime_get()); > > > > > } > > > > > > > > > > #ifdef CONFIG_NO_HZ_FULL > > > > > > > > > > > > > We could do that, but my concern with that approach is that we may end up > > > > stopping and starting the tick back and forth without exiting the loop > > > > in do_idle() just because somebody uses a periodic timer behind our > > > > back and the governor gets confused. > > > > > > > > Besides, that would be a change in behavior, while the $subject patch > > > > simply fixes a mistake in the original design. > > > > > > Ok, let's take the safe approach for now as this is a fix and it should even be > > > routed to stable. > > > > Right. I'll queue up this patch, then. > > > > > But then in the longer term, perhaps cpuidle_select() should think that > > > through. > > > > So I have given more consideration to this and my conclusion is that > > restarting the tick between cpuidle_select() and call_cpuidle() is a > > bad idea. > > > > First off, if need_resched() is "false", the primary reason for > > running the tick on the given CPU is not there, so it only might be > > useful as a "backup" timer to wake up the CPU from an inadequate idle > > state. 
> > > > Now, in general, there are two reasons for the idle governor (whatever > > it is) to select an idle state with a target residency below the tick > > period length. The first reason is when the governor knows that the > > closest timer event is going to occur in this time frame, but in that > > case (as stated above), it is not necessary to worry about the tick, > > because the other timer will trigger soon enough anyway. The second > > reason is when the governor predicts a wakeup which is not by a timer > > in this time frame and it is quite arguable what the governor should > > do then. IMO it at least is not unreasonable to throw the prediction > > away and still go for the closest timer event in that case (which is > > the current approach). > > > > There's more, though. Restarting the tick between cpuidle_select() > > and call_cpuidle() might introduce quite a bit of latency into that > > point and that would mess up with the idle state selection (e.g. > > selecting a very shallow idle state might not make a lot of sense if > > that latency was high enough, because the expected wakeup might very > > well take place when the tick was being restarted), so it should > > rather be avoided IMO. > > I expect the idle governor doesn't introduce many restarting tick > operations, the reason is if there have a close timer event than idle > governor can trust it to wake up CPU so in this case the idle governor > will not restart tick; if the the timer event is long delta and the > shallow state selection is caused by factors (e.g. typical pattern), > then we need restart tick to avoid powernightmares, for this case we > can restart tick only once at the beginning for the typical pattern > interrupt events; after the typical pattern interrupt doesn't continue > then we can rely on the tick to rescue the idle state to deep one. No, we don't need to restart the tick at all. 
We just need to require the governor to disregard "typical patterns" (which are not timer-induced, mind you) when it knows that the tick has been stopped already. Unfortunately, the menu governor cannot distinguish a timer-induced "typical" pattern from one related to device interrupts, but I don't really see a reason to worry about the latter when the CPU is idle and with stopped tick (which means that the workload can tolerate extra latency from deep idle states anyway).
On Sat, Aug 18, 2018 at 11:57:00PM +0200, Rafael J. Wysocki wrote:
> So I have given more consideration to this and my conclusion is that
> restarting the tick between cpuidle_select() and call_cpuidle() is a
> bad idea.

Ack, we should only restart the tick once we leave the idle loop.

> First off, if need_resched() is "false", the primary reason for
> running the tick on the given CPU is not there, so it only might be
> useful as a "backup" timer to wake up the CPU from an inadequate idle
> state.

this..

<snip>

> The second
> reason is when the governor predicts a wakeup which is not by a timer
> in this time frame and it is quite arguable what the governor should
> do then.  IMO it at least is not unreasonable to throw the prediction
> away and still go for the closest timer event in that case (which is
> the current approach).

Yes, I think I can agree with that, predictions at that scale are just
not that useful. The primary point of the governor is to stay shallow
when we can, but once we're deep and have disabled the tick and lost
caches, there's really no point anymore. Waking up is going to hurt.

> There's more, though.  Restarting the tick between cpuidle_select()
> and call_cpuidle() might introduce quite a bit of latency into that
> point and that would mess up with the idle state selection (e.g.
> selecting a very shallow idle state might not make a lot of sense if
> that latency was high enough, because the expected wakeup might very
> well take place when the tick was being restarted), so it should
> rather be avoided IMO.

Absolutely, mucking with the tick just because of a hunch is the wrong
thing.

So,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

for this one.
On Sat, Aug 18, 2018 at 11:57:00PM +0200, Rafael J. Wysocki wrote: > On Fri, Aug 17, 2018 at 4:12 PM Frederic Weisbecker <frederic@kernel.org> wrote: > > > > On Fri, Aug 17, 2018 at 11:32:07AM +0200, Rafael J. Wysocki wrote: > > > On Thursday, August 16, 2018 3:27:24 PM CEST Frederic Weisbecker wrote: > > > > On Thu, Aug 09, 2018 at 07:08:34PM +0200, Rafael J. Wysocki wrote: > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > > > If the tick has been stopped already, but the governor has not asked to > > > > > stop it (which it can do sometimes), the idle loop should invoke > > > > > tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care > > > > > of this case properly. > > > > > > > > > > Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick) > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > --- > > > > > kernel/sched/idle.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > Index: linux-pm/kernel/sched/idle.c > > > > > =================================================================== > > > > > --- linux-pm.orig/kernel/sched/idle.c > > > > > +++ linux-pm/kernel/sched/idle.c > > > > > @@ -190,7 +190,7 @@ static void cpuidle_idle_call(void) > > > > > */ > > > > > next_state = cpuidle_select(drv, dev, &stop_tick); > > > > > > > > > > - if (stop_tick) > > > > > + if (stop_tick || tick_nohz_tick_stopped()) > > > > > tick_nohz_idle_stop_tick(); > > > > > else > > > > > tick_nohz_idle_retain_tick(); > > > > > > > > So what if tick_nohz_idle_stop_tick() sees no timer to schedule and > > > > cancels it, we may remain idle in a shallow state for a long while? > > > > > > Yes, but the governor is expected to avoid using shallow states when the > > > tick is stopped already. > > > > So what kind of sleep do we enter to when an idle tick fires and we go > > back to idle? Is it always deep? > > No, it isn't. 
> > The state to select must always fit the time till the closest timer > event and that may be shorter than the tick period. Ah ok, so that's fine then. > > If there's a non-tick timer to wake the CPU up, we don't need to worry > about restarting the tick, though. :-) Ok. > > > I believe that ts->tick_stopped == 1 shouldn't be too relevant for the governor. > > We can definetly have scenarios where the idle tick is stopped for a long while, > > then it fires and schedules the next timer at NOW() + TICK_NSEC (as if the tick > > had been restarted). This can even repeat that way for some time, because > > ts->tick_stopped == 1 only implies that the tick has been stopped once since > > we entered the idle loop. After that we may well have a periodic tick behaviour. > > In that case we probably don't want deep idle state. Especially if we have: > > > > idle_loop() { > > tick_stop (scheduled several seconds forward) > > deep_idle_sleep() > > //several seconds later > > tick() > > tick_stop (scheduled TICK_NSEC forward) > > deep_idle_sleep() > > tick() { > > set_need_resched() > > } > > exit idle loop > > } > > > > Here the last deep idle state isn't necessary. > > No, it isn't. > > However, that is not relevant for the question of whether or not to > restart the tick before entering the idle state IMO (see the > considerations below). Yes indeed. > > But then in the longer term, perhaps cpuidle_select() should think that > > through. > > So I have given more consideration to this and my conclusion is that > restarting the tick between cpuidle_select() and call_cpuidle() is a > bad idea. > > First off, if need_resched() is "false", the primary reason for > running the tick on the given CPU is not there, so it only might be > useful as a "backup" timer to wake up the CPU from an inadequate idle > state. > > Now, in general, there are two reasons for the idle governor (whatever > it is) to select an idle state with a target residency below the tick > period length. 
> The first reason is when the governor knows that the
> closest timer event is going to occur in this time frame, but in that
> case (as stated above), it is not necessary to worry about the tick,
> because the other timer will trigger soon enough anyway. The second
> reason is when the governor predicts a wakeup which is not by a timer
> in this time frame and it is quite arguable what the governor should
> do then. IMO it at least is not unreasonable to throw the prediction
> away and still go for the closest timer event in that case (which is
> the current approach).

Then in this case, when you say you throw away that prediction, does it
mean you select an idle state that only takes the next timer event into
consideration?

So for example we predict a wake up event TICK_NSEC ahead but the next
timer event is a few seconds, you're going to select an idle state
according to that "few seconds" ahead next event, right? (which in
practice is likely to be deep I guess).

I guess so but, just want to be sure I understand you correctly.

>
> There's more, though. Restarting the tick between cpuidle_select()
> and call_cpuidle() might introduce quite a bit of latency into that
> point and that would mess up with the idle state selection (e.g.
> selecting a very shallow idle state might not make a lot of sense if
> that latency was high enough, because the expected wakeup might very
> well take place when the tick was being restarted), so it should
> rather be avoided IMO.

Yes indeed.

Thanks.
On Mon, Aug 20, 2018 at 4:42 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Sat, Aug 18, 2018 at 11:57:00PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 17, 2018 at 4:12 PM Frederic Weisbecker <frederic@kernel.org> wrote:

[cut]

> >
> > Now, in general, there are two reasons for the idle governor (whatever
> > it is) to select an idle state with a target residency below the tick
> > period length. The first reason is when the governor knows that the
> > closest timer event is going to occur in this time frame, but in that
> > case (as stated above), it is not necessary to worry about the tick,
> > because the other timer will trigger soon enough anyway. The second
> > reason is when the governor predicts a wakeup which is not by a timer
> > in this time frame and it is quite arguable what the governor should
> > do then. IMO it at least is not unreasonable to throw the prediction
> > away and still go for the closest timer event in that case (which is
> > the current approach).
>
> Then in this case, when you say you throw away that prediction, does it
> mean you select an idle state that only takes the next timer event into
> consideration?

Yes, it does.

> So for example we predict a wake up event TICK_NSEC ahead but the next
> timer event is a few seconds, you're going to select an idle state
> according to that "few seconds" ahead next event, right? (which in
> practice is likely to be deep I guess).
>
> I guess so but, just want to be sure I understand you correctly.

More precisely, if the original predicted idle duration is less than
TICK_USEC and the tick has been stopped, the governor takes the time
till the next timer event instead of the predicted value (so
effectively the predicted value is discarded).

If the original predicted idle duration is TICK_USEC or more, the tick
would have been stopped anyway (had it not been stopped already), so
it may as well be used for idle state selection in the stopped tick
case.

Cheers,
Rafael
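The rule described in the reply above can be sketched as a small standalone C model. This is only an illustration of the described behavior, not the menu governor's code: effective_idle_us(), its parameters and TICK_USEC_MODEL (which assumes HZ=250) are hypothetical names, and the model assumes the caller passes a prediction that does not exceed the time till the next timer event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick period in microseconds (HZ=250); illustrative value only. */
#define TICK_USEC_MODEL 4000u

/*
 * Hypothetical model of the rule described in the thread: return the
 * idle duration (in microseconds) that idle state selection is based on.
 */
static uint64_t effective_idle_us(uint64_t predicted_us,
				  uint64_t next_timer_us,
				  bool tick_stopped)
{
	/*
	 * The tick is already stopped and the prediction is shorter than
	 * one tick period: no tick will wake the CPU out of a too-shallow
	 * state, so the prediction is discarded and the time till the
	 * next timer event is used instead.
	 */
	if (tick_stopped && predicted_us < TICK_USEC_MODEL)
		return next_timer_us;

	/* In every other case the prediction is used as is. */
	return predicted_us;
}
```

With the tick stopped, a 1000 us prediction and a timer 3 s away, the model selects for the 3 s timer; with the tick still running, the same inputs keep the 1000 us prediction.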
* Rafael J. Wysocki <rjw@rjwysocki.net> [691231 23:00]:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> If the tick has been stopped already, but the governor has not asked to
> stop it (which it can do sometimes), the idle loop should invoke
> tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care
> of this case properly.
>
> Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick)

This patch seems to fix an issue where boot hangs occasionally on
beagleboard-xm with ARM multi_v7_defconfig as reported by kernelci.org
and Mark Brown earlier at [0].

At least so far no boot hangs for me with this fix, so:

Tested-by: Tony Lindgren <tony@atomide.com>

[0] https://www.spinics.net/lists/linux-mmc/msg50480.html
Index: linux-pm/kernel/sched/idle.c
===================================================================
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -190,7 +190,7 @@ static void cpuidle_idle_call(void)
 		 */
 		next_state = cpuidle_select(drv, dev, &stop_tick);
 
-		if (stop_tick)
+		if (stop_tick || tick_nohz_tick_stopped())
 			tick_nohz_idle_stop_tick();
 		else
 			tick_nohz_idle_retain_tick();
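
To make the effect of the one-line change easier to see, here is a minimal userspace sketch of the branch before and after the patch. It models only the condition in the hunk above; enum tick_action and the two helper functions are hypothetical names introduced for illustration and do not exist in the kernel.

```c
#include <stdbool.h>

/* Which of the two tick helpers the idle loop would end up calling. */
enum tick_action {
	TICK_RETAIN,	/* models tick_nohz_idle_retain_tick() */
	TICK_STOP	/* models tick_nohz_idle_stop_tick() */
};

/* Before the patch: only the governor's request is considered. */
static enum tick_action tick_action_before(bool stop_tick, bool tick_stopped)
{
	(void)tick_stopped;	/* ignored by the old condition */
	return stop_tick ? TICK_STOP : TICK_RETAIN;
}

/*
 * After the patch: an already stopped tick also takes the stop path,
 * so that the tick code can handle that case itself.
 */
static enum tick_action tick_action_after(bool stop_tick, bool tick_stopped)
{
	return (stop_tick || tick_stopped) ? TICK_STOP : TICK_RETAIN;
}
```

The two functions differ for exactly one input: the governor did not ask to stop the tick, but the tick had been stopped earlier. That is the case the changelog describes.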