[v12,3/4] cpuidle: Export the next timer/tick expiration for a CPU
diff mbox series

Message ID 20190227195836.24739-4-ulf.hansson@linaro.org
State Changes Requested
Headers show
Series
  • PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)
Related show

Commit Message

Ulf Hansson Feb. 27, 2019, 7:58 p.m. UTC
To be able to predict the sleep duration for a CPU that is entering idle,
knowing when the next timer/tick is going to expire, is extremely useful.
Both the teo and the menu cpuidle governors already makes use of this
information, while selecting an idle state.

Moving forward, the similar prediction needs to be done, but for a group of
idle CPUs rather than for a single idle CPU. Following changes implements a
new genpd governor, which needs this.

Support this, by sharing a new function called
tick_nohz_get_next_hrtimer(), which returns the next hrtimer or the next
tick, whatever that expires first.

Additionally, when cpuidle is about to invoke the ->enter() callback, then
call tick_nohz_get_next_hrtimer() and store its return value in the per CPU
struct cpuidle_device, as to make it available outside cpuidle.

Do note, at the point when cpuidle calls tick_nohz_get_next_hrtimer(), the
governor's ->select() callback has already made a decision whether to stop
the tick or not. In this way, tick_nohz_get_next_hrtimer() actually returns
the next timer expiration, whatever origin.

Cc: Lina Iyer <ilina@codeaurora.org>
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v12:
	- New patch.

---
 drivers/cpuidle/cpuidle.c |  8 ++++++++
 include/linux/cpuidle.h   |  1 +
 include/linux/tick.h      |  7 ++++++-
 kernel/time/tick-sched.c  | 12 ++++++++++++
 4 files changed, 27 insertions(+), 1 deletion(-)

Comments

Daniel Lezcano March 21, 2019, 10:04 a.m. UTC | #1
On 27/02/2019 20:58, Ulf Hansson wrote:
> To be able to predict the sleep duration for a CPU that is entering idle,
> knowing when the next timer/tick is going to expire, is extremely useful.
> Both the teo and the menu cpuidle governors already makes use of this
> information, while selecting an idle state.
> 
> Moving forward, the similar prediction needs to be done, but for a group of
> idle CPUs rather than for a single idle CPU. Following changes implements a
> new genpd governor, which needs this.
> 
> Support this, by sharing a new function called
> tick_nohz_get_next_hrtimer(), which returns the next hrtimer or the next
> tick, whatever that expires first.
> 
> Additionally, when cpuidle is about to invoke the ->enter() callback, then
> call tick_nohz_get_next_hrtimer() and store its return value in the per CPU
> struct cpuidle_device, as to make it available outside cpuidle.
> 
> Do note, at the point when cpuidle calls tick_nohz_get_next_hrtimer(), the
> governor's ->select() callback has already made a decision whether to stop
> the tick or not. In this way, tick_nohz_get_next_hrtimer() actually returns
> the next timer expiration, whatever origin.
> 
> Cc: Lina Iyer <ilina@codeaurora.org>
> Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Rafael J. Wysocki March 25, 2019, 12:19 p.m. UTC | #2
On Wednesday, February 27, 2019 8:58:35 PM CET Ulf Hansson wrote:
> To be able to predict the sleep duration for a CPU that is entering idle,
> knowing when the next timer/tick is going to expire, is extremely useful.
> Both the teo and the menu cpuidle governors already makes use of this
> information, while selecting an idle state.
> 
> Moving forward, the similar prediction needs to be done, but for a group of
> idle CPUs rather than for a single idle CPU. Following changes implements a
> new genpd governor, which needs this.
> 
> Support this, by sharing a new function called
> tick_nohz_get_next_hrtimer(), which returns the next hrtimer or the next
> tick, whatever that expires first.
> 
> Additionally, when cpuidle is about to invoke the ->enter() callback, then
> call tick_nohz_get_next_hrtimer() and store its return value in the per CPU
> struct cpuidle_device, as to make it available outside cpuidle.
> 
> Do note, at the point when cpuidle calls tick_nohz_get_next_hrtimer(), the
> governor's ->select() callback has already made a decision whether to stop
> the tick or not. In this way, tick_nohz_get_next_hrtimer() actually returns
> the next timer expiration, whatever origin.
> 
> Cc: Lina Iyer <ilina@codeaurora.org>
> Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
> 
> Changes in v12:
> 	- New patch.
> 
> ---
>  drivers/cpuidle/cpuidle.c |  8 ++++++++
>  include/linux/cpuidle.h   |  1 +
>  include/linux/tick.h      |  7 ++++++-
>  kernel/time/tick-sched.c  | 12 ++++++++++++
>  4 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 7f108309e871..255365b1a6ab 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -328,6 +328,14 @@ int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  int cpuidle_enter(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  		  int index)
>  {
> +	/*
> +	 * Store the next hrtimer, which becomes either next tick or the next
> +	 * timer event, whatever expires first. Additionally, to make this data
> +	 * useful for consumers outside cpuidle, we rely on that the governor's
> +	 * ->select() callback have decided, whether to stop the tick or not.
> +	 */
> +	dev->next_hrtimer = tick_nohz_get_next_hrtimer();

I would use WRITE_ONCE() to set next_hrtimer here and READ_ONCE() for
reading that value in the next patch, as a matter of annotation if
nothing else.

> +
>  	if (cpuidle_state_is_coupled(drv, index))
>  		return cpuidle_enter_state_coupled(dev, drv, index);
>  	return cpuidle_enter_state(dev, drv, index);

Also I would clear next_hrtimer here to avoid dragging stale values
around.

Apart from this the series LGTM.

Thanks!
Ulf Hansson March 25, 2019, 2:23 p.m. UTC | #3
On Mon, 25 Mar 2019 at 13:21, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Wednesday, February 27, 2019 8:58:35 PM CET Ulf Hansson wrote:
> > To be able to predict the sleep duration for a CPU that is entering idle,
> > knowing when the next timer/tick is going to expire, is extremely useful.
> > Both the teo and the menu cpuidle governors already makes use of this
> > information, while selecting an idle state.
> >
> > Moving forward, the similar prediction needs to be done, but for a group of
> > idle CPUs rather than for a single idle CPU. Following changes implements a
> > new genpd governor, which needs this.
> >
> > Support this, by sharing a new function called
> > tick_nohz_get_next_hrtimer(), which returns the next hrtimer or the next
> > tick, whatever that expires first.
> >
> > Additionally, when cpuidle is about to invoke the ->enter() callback, then
> > call tick_nohz_get_next_hrtimer() and store its return value in the per CPU
> > struct cpuidle_device, as to make it available outside cpuidle.
> >
> > Do note, at the point when cpuidle calls tick_nohz_get_next_hrtimer(), the
> > governor's ->select() callback has already made a decision whether to stop
> > the tick or not. In this way, tick_nohz_get_next_hrtimer() actually returns
> > the next timer expiration, whatever origin.
> >
> > Cc: Lina Iyer <ilina@codeaurora.org>
> > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> > Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >
> > Changes in v12:
> >       - New patch.
> >
> > ---
> >  drivers/cpuidle/cpuidle.c |  8 ++++++++
> >  include/linux/cpuidle.h   |  1 +
> >  include/linux/tick.h      |  7 ++++++-
> >  kernel/time/tick-sched.c  | 12 ++++++++++++
> >  4 files changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index 7f108309e871..255365b1a6ab 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -328,6 +328,14 @@ int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> >  int cpuidle_enter(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> >                 int index)
> >  {
> > +     /*
> > +      * Store the next hrtimer, which becomes either next tick or the next
> > +      * timer event, whatever expires first. Additionally, to make this data
> > +      * useful for consumers outside cpuidle, we rely on that the governor's
> > +      * ->select() callback have decided, whether to stop the tick or not.
> > +      */
> > +     dev->next_hrtimer = tick_nohz_get_next_hrtimer();
>
> I would use WRITE_ONCE() to set next_hrtimer here and READ_ONCE() for
> reading that value in the next patch, as a matter of annotation if
> nothing else.

Okay!

>
> > +
> >       if (cpuidle_state_is_coupled(drv, index))
> >               return cpuidle_enter_state_coupled(dev, drv, index);
> >       return cpuidle_enter_state(dev, drv, index);
>
> Also I would clear next_hrtimer here to avoid dragging stale values
> around.

Right, I can do that.

However, at least in my case it would be an unnecessary update of the
variable, as I am never in a path where the value can be "stale". Even
if one theoretically could use a stale value, it's seems likely to not
be an issue, don't you think? Anyway, if I don't hear from you, I do
the change as you suggested.

>
> Apart from this the series LGTM.

Great, thanks. I re-spin a new version.

Kind regards
Uffe
Rafael J. Wysocki March 26, 2019, 10:36 a.m. UTC | #4
On Mon, Mar 25, 2019 at 3:24 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Mon, 25 Mar 2019 at 13:21, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Wednesday, February 27, 2019 8:58:35 PM CET Ulf Hansson wrote:
> > > To be able to predict the sleep duration for a CPU that is entering idle,
> > > knowing when the next timer/tick is going to expire, is extremely useful.
> > > Both the teo and the menu cpuidle governors already makes use of this
> > > information, while selecting an idle state.
> > >

[cut]

> >
> > > +
> > >       if (cpuidle_state_is_coupled(drv, index))
> > >               return cpuidle_enter_state_coupled(dev, drv, index);
> > >       return cpuidle_enter_state(dev, drv, index);
> >
> > Also I would clear next_hrtimer here to avoid dragging stale values
> > around.
>
> Right, I can do that.
>
> However, at least in my case it would be an unnecessary update of the
> variable, as I am never in a path where the value can be "stale".

It easily can AFAICS.  After all, cpu_power_down_ok() need not run on
the same CPU that is setting next_hrtimer here.

> Even if one theoretically could use a stale value, it's seems likely to not
> be an issue, don't you think?

That would be because of the locking in the ->enter() callback I
suppose?  But is it actually universally guaranteed that setting
next_hrtimer will never be reordered with acquiring the lock?

Also, there is some overhead to be avoided if cpu_power_down_ok()
checked the next_hrtimer of the other CPUs against 0 explicitly, isn't
it?
Ulf Hansson March 26, 2019, 11:29 a.m. UTC | #5
On Tue, 26 Mar 2019 at 11:36, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Mar 25, 2019 at 3:24 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Mon, 25 Mar 2019 at 13:21, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > On Wednesday, February 27, 2019 8:58:35 PM CET Ulf Hansson wrote:
> > > > To be able to predict the sleep duration for a CPU that is entering idle,
> > > > knowing when the next timer/tick is going to expire, is extremely useful.
> > > > Both the teo and the menu cpuidle governors already makes use of this
> > > > information, while selecting an idle state.
> > > >
>
> [cut]
>
> > >
> > > > +
> > > >       if (cpuidle_state_is_coupled(drv, index))
> > > >               return cpuidle_enter_state_coupled(dev, drv, index);
> > > >       return cpuidle_enter_state(dev, drv, index);
> > >
> > > Also I would clear next_hrtimer here to avoid dragging stale values
> > > around.
> >
> > Right, I can do that.
> >
> > However, at least in my case it would be an unnecessary update of the
> > variable, as I am never in a path where the value can be "stale".
>
> It easily can AFAICS.  After all, cpu_power_down_ok() need not run on
> the same CPU that is setting next_hrtimer here.

That's correct.

>
> > Even if one theoretically could use a stale value, it's seems likely to not
> > be an issue, don't you think?
>
> That would be because of the locking in the ->enter() callback I
> suppose?  But is it actually universally guaranteed that setting
> next_hrtimer will never be reordered with acquiring the lock?

In the PSCI case and for those CPUs that shares the same genpd
governor (even hierarchically), then yes.

Unfortunate, I haven't been able to explore this in that great detail
for other legacy ARM32 platforms, so maybe it's just better to play
safe, as you suggest and avoid a stale value.

>
> Also, there is some overhead to be avoided if cpu_power_down_ok()
> checked the next_hrtimer of the other CPUs against 0 explicitly, isn't
> it?

In regards to overhead and when using genpd for CPUs, there are a
couple of things I have in mind that we could try to improve. Yes,
checking for next_hrtimer against 0 could be one thing to consider.

Kind regards
Uffe

Patch
diff mbox series

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 7f108309e871..255365b1a6ab 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -328,6 +328,14 @@  int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 int cpuidle_enter(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		  int index)
 {
+	/*
+	 * Store the next hrtimer, which becomes either next tick or the next
+	 * timer event, whatever expires first. Additionally, to make this data
+	 * useful for consumers outside cpuidle, we rely on that the governor's
+	 * ->select() callback have decided, whether to stop the tick or not.
+	 */
+	dev->next_hrtimer = tick_nohz_get_next_hrtimer();
+
 	if (cpuidle_state_is_coupled(drv, index))
 		return cpuidle_enter_state_coupled(dev, drv, index);
 	return cpuidle_enter_state(dev, drv, index);
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3b39472324a3..bb9a0db89f1a 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -83,6 +83,7 @@  struct cpuidle_device {
 	unsigned int		use_deepest_state:1;
 	unsigned int		poll_time_limit:1;
 	unsigned int		cpu;
+	ktime_t			next_hrtimer;
 
 	int			last_residency;
 	struct cpuidle_state_usage	states_usage[CPUIDLE_STATE_MAX];
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 55388ab45fd4..8891b5ac3e40 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -122,6 +122,7 @@  extern void tick_nohz_idle_enter(void);
 extern void tick_nohz_idle_exit(void);
 extern void tick_nohz_irq_exit(void);
 extern bool tick_nohz_idle_got_tick(void);
+extern ktime_t tick_nohz_get_next_hrtimer(void);
 extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
 extern unsigned long tick_nohz_get_idle_calls(void);
 extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
@@ -145,7 +146,11 @@  static inline void tick_nohz_idle_restart_tick(void) { }
 static inline void tick_nohz_idle_enter(void) { }
 static inline void tick_nohz_idle_exit(void) { }
 static inline bool tick_nohz_idle_got_tick(void) { return false; }
-
+static inline ktime_t tick_nohz_get_next_hrtimer(void)
+{
+	/* Next wake up is the tick period, assume it starts now */
+	return ktime_add(ktime_get(), TICK_NSEC);
+}
 static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
 {
 	*delta_next = TICK_NSEC;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6fa52cd6df0b..8d18e03124ff 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1022,6 +1022,18 @@  bool tick_nohz_idle_got_tick(void)
 	return false;
 }
 
+/**
+ * tick_nohz_get_next_hrtimer - return the next expiration time for the hrtimer
+ * or the tick, whatever that expires first. Note that, if the tick has been
+ * stopped, it returns the next hrtimer.
+ *
+ * Called from power state control code with interrupts disabled
+ */
+ktime_t tick_nohz_get_next_hrtimer(void)
+{
+	return __this_cpu_read(tick_cpu_device.evtdev)->next_event;
+}
+
 /**
  * tick_nohz_get_sleep_length - return the expected length of the current sleep
  * @delta_next: duration until the next event if the tick cannot be stopped