
[v3,01/17] cpufreq: Prepare timer flags for hierarchical timer pull model

Message ID 20221025135850.51044-2-anna-maria@linutronix.de (mailing list archive)
State Handled Elsewhere, archived
Series timer: Move from a push remote at enqueue to a pull at expiry model

Commit Message

Anna-Maria Behnsen Oct. 25, 2022, 1:58 p.m. UTC
Note: This is a proposal only. I was waiting for input on how to change this
driver properly to use the already existing infrastructure. For details, see
the thread on the linux-pm mailing list:
https://lore.kernel.org/linux-pm/4c99f34b-40f1-e6cc-2669-7854b615b5fd@linutronix.de/

The gpstates timer is the only timer that uses both the TIMER_PINNED and
TIMER_DEFERRABLE flags. When moving to the hierarchical timer pull model,
pinned and deferrable timers are stored in separate bases.

To ensure that the gpstates timer always expires on the CPU it is pinned
to, keep only the TIMER_PINNED flag and drop the TIMER_DEFERRABLE flag.

While at it, rewrite the comment explaining the rule for the timer expiry
of the next interval and fix whitespace damage.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
---
 drivers/cpufreq/powernv-cpufreq.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

Comments

Frederic Weisbecker Oct. 26, 2022, 1:54 p.m. UTC | #1
On Tue, Oct 25, 2022 at 03:58:34PM +0200, Anna-Maria Behnsen wrote:
> Note: This is a proposal only. I was waiting for input on how to change this
> driver properly to use the already existing infrastructure. For details, see
> the thread on the linux-pm mailing list:
> https://lore.kernel.org/linux-pm/4c99f34b-40f1-e6cc-2669-7854b615b5fd@linutronix.de/
> 
> The gpstates timer is the only timer that uses both the TIMER_PINNED and
> TIMER_DEFERRABLE flags. When moving to the hierarchical timer pull model,
> pinned and deferrable timers are stored in separate bases.
> 
> To ensure that the gpstates timer always expires on the CPU it is pinned
> to, keep only the TIMER_PINNED flag and drop the TIMER_DEFERRABLE flag.

OTOH, there are deferrable timers out there that expect to run on a
specific CPU, because they are always queued with add_timer_on().

For example, workqueues using DECLARE_DEFERRABLE_WORK() that are queued
with queue_delayed_work_on(), like vmstat().

Those are not explicitly pinned because they don't rely on __mod_timer(),
but they expect CPU affinity.

Thanks.

Anna-Maria Behnsen Oct. 31, 2022, 3:22 p.m. UTC | #2
On Wed, 26 Oct 2022, Frederic Weisbecker wrote:

> On Tue, Oct 25, 2022 at 03:58:34PM +0200, Anna-Maria Behnsen wrote:
> > Note: This is a proposal only. I was waiting for input on how to change this
> > driver properly to use the already existing infrastructure. For details, see
> > the thread on the linux-pm mailing list:
> > https://lore.kernel.org/linux-pm/4c99f34b-40f1-e6cc-2669-7854b615b5fd@linutronix.de/
> > 
> > The gpstates timer is the only timer that uses both the TIMER_PINNED and
> > TIMER_DEFERRABLE flags. When moving to the hierarchical timer pull model,
> > pinned and deferrable timers are stored in separate bases.
> > 
> > To ensure that the gpstates timer always expires on the CPU it is pinned
> > to, keep only the TIMER_PINNED flag and drop the TIMER_DEFERRABLE flag.
> 
> OTOH, there are deferrable timers out there that expect to run on a
> specific CPU, because they are always queued with add_timer_on().
> 
> For example, workqueues using DECLARE_DEFERRABLE_WORK() that are queued
> with queue_delayed_work_on(), like vmstat().
> 
> Those are not explicitly pinned because they don't rely on __mod_timer(),
> but they expect CPU affinity.
> 

You are right. Contrary to the original plan, I am not (yet) able to
remove the deferrable timers completely. But all timers using the
add_timer_on() path need the TIMER_PINNED flag. With that, three timer
bases per CPU will be available:

- global base (TIMER_PINNED is not set)
- local base (TIMER_PINNED is set but TIMER_DEFERRABLE is not)
- deferrable pinned base (TIMER_PINNED and TIMER_DEFERRABLE are both set)
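The base selection described in the list can be modeled in a few lines of userspace C. This is only a sketch of the proposal as stated in this message: the MODEL_* flag values, the enum names, and the helper model_add_timer_on_flags() are hypothetical stand-ins, not the definitions from include/linux/timer.h or kernel/time/timer.c. The helper merely encodes the stated rule that timers queued through the add_timer_on() path carry TIMER_PINNED.

```c
/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_TIMER_DEFERRABLE 0x1u
#define MODEL_TIMER_PINNED     0x2u

/* The three per-CPU bases from the proposal (names are illustrative). */
enum model_base {
	MODEL_BASE_GLOBAL,     /* TIMER_PINNED not set */
	MODEL_BASE_LOCAL,      /* TIMER_PINNED set, TIMER_DEFERRABLE not set */
	MODEL_BASE_DEF_PINNED, /* TIMER_PINNED and TIMER_DEFERRABLE set */
};

/* Pick the base purely from the flag combination. */
static enum model_base model_select_base(unsigned int flags)
{
	if (!(flags & MODEL_TIMER_PINNED))
		return MODEL_BASE_GLOBAL;
	if (flags & MODEL_TIMER_DEFERRABLE)
		return MODEL_BASE_DEF_PINNED;
	return MODEL_BASE_LOCAL;
}

/*
 * Hypothetical helper: queueing a timer on an explicit CPU (the
 * add_timer_on() path) implies TIMER_PINNED under the proposed rule.
 */
static unsigned int model_add_timer_on_flags(unsigned int flags)
{
	return flags | MODEL_TIMER_PINNED;
}
```

Under this model, both the unchanged gpstates timer (TIMER_PINNED | TIMER_DEFERRABLE) and a deferrable work item queued on a specific CPU end up in the deferrable pinned base, which is why the driver change becomes unnecessary.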

The logic stays the same as already implemented in the patch queue: Timers
in the global base do not prevent the CPU from going idle; when the CPU has
the migrator duty, the timers in the hierarchy are taken into account.
Timers in the local base force the CPU to wake up. Timers in the deferrable
pinned base are not taken into account when going idle.
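The idle rules in the paragraph above can be sketched as a simplified userspace C model of the "when must this idle CPU wake up" decision. The structure, its field names, and the assumption that the global-base timers are already reflected in a single hierarchy_next value are illustrative simplifications, not the data structures used by the patch series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Marker for "no timer pending" in this model. */
#define MODEL_NO_TIMER UINT64_MAX

/*
 * Hypothetical per-CPU snapshot of the first expiry (in jiffies) of each
 * base. Global-base timers are assumed to be queued in the hierarchy
 * already, so hierarchy_next stands in for them.
 */
struct model_idle_state {
	uint64_t local_next;      /* TIMER_PINNED, not deferrable */
	uint64_t def_pinned_next; /* TIMER_PINNED | TIMER_DEFERRABLE */
	uint64_t hierarchy_next;  /* first global expiry in the hierarchy */
	bool is_migrator;         /* does this CPU hold the migrator duty? */
};

static uint64_t model_min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Earliest time at which the idle CPU has to wake up. */
static uint64_t model_next_wakeup(const struct model_idle_state *s)
{
	/* Local timers always force a wakeup. */
	uint64_t next = s->local_next;

	/* Hierarchy (global) timers only matter for the migrator. */
	if (s->is_migrator)
		next = model_min(next, s->hierarchy_next);

	/* def_pinned_next is deliberately ignored when going idle. */
	return next;
}
```

A deferrable pinned timer that is due earlier than everything else therefore does not shorten the idle period; it simply runs once the CPU wakes up for some other reason.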

With this, the rework of the cpufreq driver is no longer required: the
timer will end up in the deferrable pinned base, the same as vmstat.

Thanks,

	Anna-Maria

Patch

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index fddbd1ea1635..08d6bd54539d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -640,18 +640,18 @@  static inline int calc_global_pstate(unsigned int elapsed_time,
 		return highest_lpstate_idx + index_diff;
 }
 
-static inline void  queue_gpstate_timer(struct global_pstate_info *gpstates)
+static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
 {
 	unsigned int timer_interval;
 
 	/*
-	 * Setting up timer to fire after GPSTATE_TIMER_INTERVAL ms, But
-	 * if it exceeds MAX_RAMP_DOWN_TIME ms for ramp down time.
-	 * Set timer such that it fires exactly at MAX_RAMP_DOWN_TIME
-	 * seconds of ramp down time.
+	 * Timer should expire next time after GPSTATE_TIMER_INTERVAL. If
+	 * the resulting interval (elapsed time + interval) between last
+	 * and next timer expiry is greater than MAX_RAMP_DOWN_TIME, ensure
+	 * it is maximum MAX_RAMP_DOWN_TIME when queueing the next timer.
 	 */
 	if ((gpstates->elapsed_time + GPSTATE_TIMER_INTERVAL)
-	     > MAX_RAMP_DOWN_TIME)
+	    > MAX_RAMP_DOWN_TIME)
 		timer_interval = MAX_RAMP_DOWN_TIME - gpstates->elapsed_time;
 	else
 		timer_interval = GPSTATE_TIMER_INTERVAL;
@@ -865,8 +865,7 @@  static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 
 	/* initialize timer */
 	gpstates->policy = policy;
-	timer_setup(&gpstates->timer, gpstate_timer_handler,
-		    TIMER_PINNED | TIMER_DEFERRABLE);
+	timer_setup(&gpstates->timer, gpstate_timer_handler, TIMER_PINNED);
 	gpstates->timer.expires = jiffies +
 				msecs_to_jiffies(GPSTATE_TIMER_INTERVAL);
 	spin_lock_init(&gpstates->gpstate_lock);
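The clamping rule that the rewritten comment in queue_gpstate_timer() describes can be checked with a small userspace C sketch of the interval computation. The MODEL_* constants and model_gpstate_interval() are hypothetical names, and the values 2000 ms and 5120 ms are assumptions for illustration; the real GPSTATE_TIMER_INTERVAL and MAX_RAMP_DOWN_TIME are defined in drivers/cpufreq/powernv-cpufreq.c.

```c
/* Assumed values for this sketch, both in milliseconds. */
#define MODEL_GPSTATE_TIMER_INTERVAL 2000u
#define MODEL_MAX_RAMP_DOWN_TIME     5120u

/*
 * Interval until the next gpstates timer expiry: normally one full
 * GPSTATE_TIMER_INTERVAL, but clamped so that elapsed time plus interval
 * never exceeds MAX_RAMP_DOWN_TIME. Like the driver code, this assumes
 * elapsed_ms <= MAX_RAMP_DOWN_TIME; otherwise the unsigned subtraction
 * would wrap around.
 */
static unsigned int model_gpstate_interval(unsigned int elapsed_ms)
{
	if (elapsed_ms + MODEL_GPSTATE_TIMER_INTERVAL > MODEL_MAX_RAMP_DOWN_TIME)
		return MODEL_MAX_RAMP_DOWN_TIME - elapsed_ms;
	return MODEL_GPSTATE_TIMER_INTERVAL;
}
```

With these assumed values, a ramp-down starting at elapsed time 0 would be queued for 2000 ms, then 2000 ms, and finally 1120 ms, so the third expiry lands exactly at MAX_RAMP_DOWN_TIME.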