[RFC/RFT,v2,2/6] sched: idle: Do not stop the tick upfront in the idle loop

Message ID 3346281.BDGJiv2ZOp@aspire.rjw.lan (mailing list archive)
State RFC, archived

Commit Message

Rafael J. Wysocki March 6, 2018, 9:02 a.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Push the decision whether or not to stop the tick somewhat deeper
into the idle loop.

Stopping the tick upfront leads to unpleasant outcomes in case the
idle governor doesn't agree with the timekeeping code on the duration
of the upcoming idle period.  Specifically, if the tick has been
stopped and the idle governor predicts short idle, the situation is
bad regardless of whether or not the prediction is accurate.  If it
is accurate, the tick has been stopped unnecessarily which means
excessive overhead.  If it is not accurate, the CPU is likely to
spend too much time in the (shallow, because short idle has been
predicted) idle state selected by the governor [1].
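
The two bad outcomes above can be written down as a small classification
helper. This is only an illustrative userspace sketch: the enum, its labels
and classify_idle() are hypothetical names, not kernel identifiers, and the
three booleans are assumed to be known after the fact.

```c
#include <stdbool.h>

/* Hypothetical labels for the cases in the paragraph above; not kernel identifiers. */
enum idle_outcome {
	OUTCOME_NOT_COVERED,		/* combination not discussed in the paragraph above */
	OUTCOME_NEEDLESS_TICK_STOP,	/* prediction accurate: stopping the tick was pure overhead */
	OUTCOME_STUCK_IN_SHALLOW_STATE,	/* prediction wrong: long stay in a shallow idle state */
};

/* Classify one idle period in which the tick may have been stopped upfront. */
static enum idle_outcome classify_idle(bool tick_stopped, bool predicted_short,
					bool actually_short)
{
	/* Only "tick stopped and short idle predicted" is described as bad. */
	if (!tick_stopped || !predicted_short)
		return OUTCOME_NOT_COVERED;

	return actually_short ? OUTCOME_NEEDLESS_TICK_STOP
			      : OUTCOME_STUCK_IN_SHALLOW_STATE;
}
```

With the tick stopped and short idle predicted, both values of
actually_short map to a bad outcome, which is the point of the paragraph.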

As the first step towards addressing this problem, change the code
to make the tick stopping decision inside of the loop in do_idle().
In particular, do not stop the tick in the cpu_idle_poll() code path.
Also don't do that in tick_nohz_irq_exit(), which doesn't really have
enough information to decide whether or not to stop the tick.
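
The resulting control flow can be modeled in userspace roughly as follows.
This is a toy sketch, not the kernel code: struct toy_tick and the toy_*()
helpers are made-up stand-ins that only mirror the shape of the change
(a "prepare" step that leaves the tick alone, and a per-iteration step that
takes a flag saying whether the tick may be stopped).

```c
#include <stdbool.h>

/* Toy stand-in for per-CPU tick state; not the kernel's struct tick_sched. */
struct toy_tick {
	bool inidle;	/* the CPU has entered the idle loop */
	bool stopped;	/* the periodic tick has been stopped */
};

/* Models the "prepare" step: note that the CPU is idle, leave the tick running. */
static void toy_idle_prepare(struct toy_tick *t)
{
	t->inidle = true;
}

/*
 * Models the per-iteration step: stop the tick only if the caller allows it.
 * In this toy model, passing false leaves the tick state unchanged.
 */
static void toy_idle_go_idle(struct toy_tick *t, bool stop_tick)
{
	if (t->inidle && stop_tick)
		t->stopped = true;
}

/* One loop iteration: the polling path keeps the tick, the other path may stop it. */
static void toy_idle_iteration(struct toy_tick *t, bool force_poll)
{
	if (force_poll)
		toy_idle_go_idle(t, false);
	else
		toy_idle_go_idle(t, true);
}
```

The flag-taking helper also shows why a name implying that the tick is always
stopped would be somewhat misleading: with false, nothing is stopped.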

Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1]
Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

-> v2: No changes.

---
 kernel/sched/idle.c      |   13 ++++++++++---
 kernel/time/tick-sched.c |    2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

Comments

Frederic Weisbecker March 7, 2018, 11:39 p.m. UTC | #1
On Tue, Mar 06, 2018 at 10:02:15AM +0100, Rafael J. Wysocki wrote:
> Index: linux-pm/kernel/sched/idle.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/idle.c
> +++ linux-pm/kernel/sched/idle.c
> @@ -220,13 +220,17 @@ static void do_idle(void)
>  	 */
>  
>  	__current_set_polling();
> -	tick_nohz_idle_enter();
> +	tick_nohz_idle_prepare();

Since we leave tick_nohz_idle_exit() unchanged, can we keep tick_nohz_idle_prepare()
under the name tick_nohz_idle_enter() so that we stay symmetric? And then make Xen call
the two functions:

    tick_nohz_idle_enter();
    tick_nohz_idle_go_idle();

Also, can we rename tick_nohz_idle_go_idle() to tick_nohz_idle_stop_tick()?
This will be more self-explanatory.

Thanks.

Rafael J. Wysocki March 8, 2018, 9:05 a.m. UTC | #2
On Thursday, March 8, 2018 12:39:12 AM CET Frederic Weisbecker wrote:
> On Tue, Mar 06, 2018 at 10:02:15AM +0100, Rafael J. Wysocki wrote:
> > Index: linux-pm/kernel/sched/idle.c
> > ===================================================================
> > --- linux-pm.orig/kernel/sched/idle.c
> > +++ linux-pm/kernel/sched/idle.c
> > @@ -220,13 +220,17 @@ static void do_idle(void)
> >  	 */
> >  
> >  	__current_set_polling();
> > -	tick_nohz_idle_enter();
> > +	tick_nohz_idle_prepare();
> 
> Since we leave tick_nohz_idle_exit() unchanged, can we keep tick_nohz_idle_prepare()
> under the name tick_nohz_idle_enter() so that we stay symmetric? And then make Xen call
> the two functions:
> 
>     tick_nohz_idle_enter();
>     tick_nohz_idle_go_idle();

No problem with that.

> Also can we rename tick_nohz_idle_go_idle() to tick_nohz_idle_stop_tick() ?
> This will be more self-explanatory.

But it doesn't always stop the tick, which is why I chose the other name.

Patch

Index: linux-pm/kernel/sched/idle.c
===================================================================
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -220,13 +220,17 @@  static void do_idle(void)
 	 */
 
 	__current_set_polling();
-	tick_nohz_idle_enter();
+	tick_nohz_idle_prepare();
 
 	while (!need_resched()) {
 		check_pgt_cache();
 		rmb();
 
 		if (cpu_is_offline(cpu)) {
+			local_irq_disable();
+			tick_nohz_idle_go_idle(true);
+			local_irq_enable();
+
 			cpuhp_report_idle_dead();
 			arch_cpu_idle_dead();
 		}
@@ -240,10 +244,13 @@  static void do_idle(void)
 		 * broadcast device expired for us, we don't want to go deep
 		 * idle as we know that the IPI is going to arrive right away.
 		 */
-		if (cpu_idle_force_poll || tick_check_broadcast_expired())
+		if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
+			tick_nohz_idle_go_idle(false);
 			cpu_idle_poll();
-		else
+		} else {
+			tick_nohz_idle_go_idle(true);
 			cpuidle_idle_call();
+		}
 		arch_cpu_idle_exit();
 	}
 
Index: linux-pm/kernel/time/tick-sched.c
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.c
+++ linux-pm/kernel/time/tick-sched.c
@@ -1007,7 +1007,7 @@  void tick_nohz_irq_exit(void)
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 
 	if (ts->inidle)
-		__tick_nohz_idle_enter(ts, true);
+		__tick_nohz_idle_enter(ts, false);
 	else
 		tick_nohz_full_update_tick(ts);
 }