Message ID | 20221025135850.51044-2-anna-maria@linutronix.de (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Series | timer: Move from a push remote at enqueue to a pull at expiry model |
On Tue, Oct 25, 2022 at 03:58:34PM +0200, Anna-Maria Behnsen wrote:
> Note: This is a proposal only. I was waiting on input how to change this
> driver properly to use the already existing infrastructure. See therefore
> the thread on linux-pm mailing list:
> https://lore.kernel.org/linux-pm/4c99f34b-40f1-e6cc-2669-7854b615b5fd@linutronix.de/
>
> gpstates timer is the only timer using TIMER_PINNED and TIMER_DEFERRABLE
> flag. When moving to hierarchical timer pull model, pinned and deferrable
> timers are stored in separate bases.
>
> To ensure gpstates timer always expires on the CPU where it is pinned to,
> keep only TIMER_PINNED flag and drop TIMER_DEFERRABLE flag.

OTOH there are deferrable timers out there that expect to run on a
specific CPU, because they are always queued with add_timer_on().

For example workqueues using DECLARE_DEFERRABLE_WORK() that are queued
with queue_delayed_work_on(). Like vmstat().

Those are not explicitly pinned because they don't rely on __mod_timer()
but they expect CPU affinity.

Thanks.

>
> While at it, rewrite comment explaining the rule for timer expiry for the
> next interval and fix whitespace damages.
>
> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
> Cc: linux-pm@vger.kernel.org
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index fddbd1ea1635..08d6bd54539d 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -640,18 +640,18 @@ static inline int calc_global_pstate(unsigned int elapsed_time,
>  	return highest_lpstate_idx + index_diff;
>  }
>
> -static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
> +static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
>  {
>  	unsigned int timer_interval;
>
>  	/*
> -	 * Setting up timer to fire after GPSTATE_TIMER_INTERVAL ms, But
> -	 * if it exceeds MAX_RAMP_DOWN_TIME ms for ramp down time.
> -	 * Set timer such that it fires exactly at MAX_RAMP_DOWN_TIME
> -	 * seconds of ramp down time.
> +	 * Timer should expire next time after GPSTATE_TIMER_INTERVAL. If
> +	 * the resulting interval (elapsed time + interval) between last
> +	 * and next timer expiry is greater than MAX_RAMP_DOWN_TIME, ensure
> +	 * it is maximum MAX_RAMP_DOWN_TIME when queueing the next timer.
>  	 */
>  	if ((gpstates->elapsed_time + GPSTATE_TIMER_INTERVAL)
> -	 > MAX_RAMP_DOWN_TIME)
> +	    > MAX_RAMP_DOWN_TIME)
>  		timer_interval = MAX_RAMP_DOWN_TIME - gpstates->elapsed_time;
>  	else
>  		timer_interval = GPSTATE_TIMER_INTERVAL;
> @@ -865,8 +865,7 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
>
>  	/* initialize timer */
>  	gpstates->policy = policy;
> -	timer_setup(&gpstates->timer, gpstate_timer_handler,
> -		    TIMER_PINNED | TIMER_DEFERRABLE);
> +	timer_setup(&gpstates->timer, gpstate_timer_handler, TIMER_PINNED);
>  	gpstates->timer.expires = jiffies +
>  				msecs_to_jiffies(GPSTATE_TIMER_INTERVAL);
>  	spin_lock_init(&gpstates->gpstate_lock);
> --
> 2.30.2
>
On Wed, 26 Oct 2022, Frederic Weisbecker wrote:

> On Tue, Oct 25, 2022 at 03:58:34PM +0200, Anna-Maria Behnsen wrote:
> > Note: This is a proposal only. I was waiting on input how to change this
> > driver properly to use the already existing infrastructure. See therefore
> > the thread on linux-pm mailing list:
> > https://lore.kernel.org/linux-pm/4c99f34b-40f1-e6cc-2669-7854b615b5fd@linutronix.de/
> >
> > gpstates timer is the only timer using TIMER_PINNED and TIMER_DEFERRABLE
> > flag. When moving to hierarchical timer pull model, pinned and deferrable
> > timers are stored in separate bases.
> >
> > To ensure gpstates timer always expires on the CPU where it is pinned to,
> > keep only TIMER_PINNED flag and drop TIMER_DEFERRABLE flag.
>
> OTOH there are deferrable timers out there that expect to run on a
> specific CPU, because they are always queued with add_timer_on().
>
> For example workqueues using DECLARE_DEFERRABLE_WORK() that are queued
> with queue_delayed_work_on(). Like vmstat().
>
> Those are not explicitly pinned because they don't rely on __mod_timer()
> but they expect CPU affinity.
>

You are right. In contrast to the original plan, I'm not able (yet) to
remove the deferrable timers completely. But all timers using the
add_timer_on() path need the TIMER_PINNED flag. Then three timer bases per
CPU will be available:

  - global base (TIMER_PINNED is not set)
  - local base (TIMER_PINNED is set but not TIMER_DEFERRABLE)
  - deferrable pinned base (TIMER_PINNED and TIMER_DEFERRABLE are set)

The logic stays the same as already implemented in the patch queue: Timers
in the global base will not prevent the CPU from going idle. When the CPU
has the migrator duty, timers in the hierarchy are taken into account.
Timers in the local base force the CPU to wake up. Timers in the deferrable
pinned base are not taken into account when going idle.

With this, the rework of the cpufreq driver is no longer required - the
timer will end up in the deferrable pinned base, the same as vmstat.

Thanks, Anna-Maria
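
The three-base rule described in the reply above can be sketched in plain C. This is only an illustration of the rule as stated in the mail, not kernel code: the `SKETCH_*` flag values, the enum, and both helper functions are hypothetical names introduced here, and the migrator case (where timers in the hierarchy are taken into account) is deliberately left out.

```c
#include <stdbool.h>

/* Stand-in flag bits for illustration only; these are NOT the kernel's values. */
#define SKETCH_TIMER_PINNED     0x1u
#define SKETCH_TIMER_DEFERRABLE 0x2u

/* The three per-CPU bases named in the mail (hypothetical enum). */
enum sketch_base {
	SKETCH_BASE_GLOBAL,     /* TIMER_PINNED is not set */
	SKETCH_BASE_LOCAL,      /* TIMER_PINNED set, TIMER_DEFERRABLE not set */
	SKETCH_BASE_DEF_PINNED, /* TIMER_PINNED and TIMER_DEFERRABLE both set */
};

/* Map a timer's flags to the base it would be stored in. */
static enum sketch_base sketch_pick_base(unsigned int flags)
{
	if (!(flags & SKETCH_TIMER_PINNED))
		return SKETCH_BASE_GLOBAL;
	if (flags & SKETCH_TIMER_DEFERRABLE)
		return SKETCH_BASE_DEF_PINNED;
	return SKETCH_BASE_LOCAL;
}

/*
 * Per the mail, only timers in the local base force the CPU to wake up.
 * The global base does not prevent idle and the deferrable pinned base
 * is ignored when going idle. Migrator duty is not modelled here.
 */
static bool sketch_forces_wakeup(enum sketch_base base)
{
	return base == SKETCH_BASE_LOCAL;
}
```

Under this sketch, a timer set up with both flags (as the gpstates timer is before this patch) maps to the deferrable pinned base and does not force a wakeup, which matches the conclusion that the cpufreq rework is no longer required.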
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index fddbd1ea1635..08d6bd54539d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -640,18 +640,18 @@ static inline int calc_global_pstate(unsigned int elapsed_time,
 	return highest_lpstate_idx + index_diff;
 }
 
-static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
+static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
 {
 	unsigned int timer_interval;
 
 	/*
-	 * Setting up timer to fire after GPSTATE_TIMER_INTERVAL ms, But
-	 * if it exceeds MAX_RAMP_DOWN_TIME ms for ramp down time.
-	 * Set timer such that it fires exactly at MAX_RAMP_DOWN_TIME
-	 * seconds of ramp down time.
+	 * Timer should expire next time after GPSTATE_TIMER_INTERVAL. If
+	 * the resulting interval (elapsed time + interval) between last
+	 * and next timer expiry is greater than MAX_RAMP_DOWN_TIME, ensure
+	 * it is maximum MAX_RAMP_DOWN_TIME when queueing the next timer.
 	 */
 	if ((gpstates->elapsed_time + GPSTATE_TIMER_INTERVAL)
-	 > MAX_RAMP_DOWN_TIME)
+	    > MAX_RAMP_DOWN_TIME)
 		timer_interval = MAX_RAMP_DOWN_TIME - gpstates->elapsed_time;
 	else
 		timer_interval = GPSTATE_TIMER_INTERVAL;
@@ -865,8 +865,7 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 
 	/* initialize timer */
 	gpstates->policy = policy;
-	timer_setup(&gpstates->timer, gpstate_timer_handler,
-		    TIMER_PINNED | TIMER_DEFERRABLE);
+	timer_setup(&gpstates->timer, gpstate_timer_handler, TIMER_PINNED);
 	gpstates->timer.expires = jiffies +
 				msecs_to_jiffies(GPSTATE_TIMER_INTERVAL);
 	spin_lock_init(&gpstates->gpstate_lock);

Note: This is a proposal only. I was waiting on input how to change this
driver properly to use the already existing infrastructure. See therefore
the thread on linux-pm mailing list:
https://lore.kernel.org/linux-pm/4c99f34b-40f1-e6cc-2669-7854b615b5fd@linutronix.de/

gpstates timer is the only timer using TIMER_PINNED and TIMER_DEFERRABLE
flag. When moving to hierarchical timer pull model, pinned and deferrable
timers are stored in separate bases.

To ensure gpstates timer always expires on the CPU where it is pinned to,
keep only TIMER_PINNED flag and drop TIMER_DEFERRABLE flag.

While at it, rewrite comment explaining the rule for timer expiry for the
next interval and fix whitespace damages.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
---
 drivers/cpufreq/powernv-cpufreq.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)
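
The interval rule that the rewritten comment in queue_gpstate_timer() describes can be checked with a small standalone sketch. The two `SKETCH_*` constants below are assumed values chosen for illustration and are not taken from this patch; `sketch_next_interval()` is a hypothetical helper that mirrors only the if/else from the diff and assumes the elapsed time never exceeds the maximum ramp-down time.

```c
/* Assumed values in milliseconds, for illustration only. */
#define SKETCH_GPSTATE_TIMER_INTERVAL 2000u
#define SKETCH_MAX_RAMP_DOWN_TIME     5120u

/*
 * Return the delay until the next timer expiry. Normally this is the
 * regular interval; if that would push the total ramp-down time past
 * the maximum, only the remaining time up to the maximum is returned.
 * Assumes elapsed_time <= SKETCH_MAX_RAMP_DOWN_TIME (no underflow check).
 */
static unsigned int sketch_next_interval(unsigned int elapsed_time)
{
	if ((elapsed_time + SKETCH_GPSTATE_TIMER_INTERVAL) >
	    SKETCH_MAX_RAMP_DOWN_TIME)
		return SKETCH_MAX_RAMP_DOWN_TIME - elapsed_time;

	return SKETCH_GPSTATE_TIMER_INTERVAL;
}
```

With these assumed values, an elapsed time of 4000 yields a shortened interval of 1120, so the last expiry lands exactly on the maximum, while an elapsed time of 3120 still yields the full 2000 because the sum equals, but does not exceed, the maximum.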