Message ID | 20241024150413.518862-4-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | New |
Series | softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT. |
Le Thu, Oct 24, 2024 at 04:55:51PM +0200, Sebastian Andrzej Siewior a écrit :
> A timer/hrtimer softirq is raised in-IRQ context. With threaded
> interrupts enabled or on PREEMPT_RT this leads to waking the ksoftirqd
> for the processing of the softirq. ksoftirqd runs as SCHED_OTHER which
> means it will compete with other tasks for CPU resources.
> This can introduce long delays for timer processing on heavily loaded
> systems and is not desired.
>
> Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated
> timers thread and let it run at the lowest SCHED_FIFO priority.
> Wake-ups for RT tasks happen from hardirq context so only timer_list timers
> and hrtimers for "regular" tasks are processed here. The higher priority
> ensures that wakeups are performed before scheduling SCHED_OTHER tasks.
>
> Using a dedicated variable to store the pending softirq bits ensures
> that the timers are not accidentally picked up by ksoftirqd and
> other threaded interrupts.
> It shouldn't be picked up by ksoftirqd since it runs at lower priority.
> However if ksoftirqd is already running while a timer fires, then
> ksoftirqd will be PI-boosted due to the BH-lock to ktimer's priority.
> Ideally we try to avoid having ksoftirqd running.
>
> The timer thread can pick up pending softirqs from ksoftirqd but only
> if the softirq load is high. It is not desired that the picked-up
> softirqs are processed at SCHED_FIFO priority under high softirq load
> but this can already happen by a PI-boost by a force-threaded interrupt.
>
> [ frederic@kernel.org: rcutorture.c fixes, storm fix by introduction of
>   local_timers_pending() for tick_nohz_next_event() ]
>
> [ junxiao.chang@intel.com: Ensure ktimersd gets woken up even if a
>   softirq is currently served. ]
>
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> [rcutorture]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Just a few nits:

> ---
>  include/linux/interrupt.h | 44 +++++++++++++++++++++++++
>  kernel/rcu/rcutorture.c   |  6 ++++
>  kernel/softirq.c          | 69 ++++++++++++++++++++++++++++++++++++++-
>  kernel/time/hrtimer.c     |  4 +--
>  kernel/time/tick-sched.c  |  2 +-
>  kernel/time/timer.c       |  2 +-
>  6 files changed, 122 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 457151f9f263d..9637af78087f3 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -616,6 +616,50 @@ extern void __raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq(unsigned int nr);
>
> +/*
> + * Handle timers in a dedicated thread at a low SCHED_FIFO priority instead in
> + * ksoftirqd as to be prefred over SCHED_NORMAL tasks.
> + */

This doesn't parse. How about, inspired by your changelog:

"""
Wake-ups for RT tasks happen from hardirq context so only timer_list
timers and hrtimers for SCHED_OTHER tasks are processed from softirq.
As they are raised from hardirq, their processing would normally happen
from ksoftirqd which runs as SCHED_OTHER and compete with other tasks.
Moving timers softirqs to a low-prio SCHED_FIFO kthread instead ensures
that wakeups from timers are performed before scheduling the target
SCHED_OTHER tasks.
"""

Thanks.
On 2024-10-28 15:01:55 [+0100], Frederic Weisbecker wrote:
> > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > index 457151f9f263d..9637af78087f3 100644
> > --- a/include/linux/interrupt.h
> > +++ b/include/linux/interrupt.h
> > @@ -616,6 +616,50 @@ extern void __raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq(unsigned int nr);
> >
> > +/*
> > + * Handle timers in a dedicated thread at a low SCHED_FIFO priority instead in
> > + * ksoftirqd as to be prefred over SCHED_NORMAL tasks.
> > + */
>
> This doesn't parse. How about, inspired by your changelog:
…

What about this essay instead:

| With forced-threaded interrupts enabled a raised softirq is deferred to
| ksoftirqd unless it can be handled within the threaded interrupt. This
| affects timer_list timers and hrtimers which are explicitly marked with
| HRTIMER_MODE_SOFT.
| With PREEMPT_RT enabled more hrtimers are moved to softirq for processing
| which includes all timers which are not explicitly marked HRTIMER_MODE_HARD.
| Userspace controlled timers (like the clock_nanosleep() interface) is divided
| into two categories: Tasks with elevated scheduling policy including
| SCHED_{FIFO|RR|DL} and the remaining scheduling policy. The tasks with the
| elevated scheduling policy are woken up directly from the HARDIRQ while all
| other wake ups are delayed to so softirq and so to ksoftirqd.
|
| The ksoftirqd runs at SCHED_OTHER policy at which it should remain since it
| handles the softirq in an overloaded situation (not handled everything
| within its last run).
| If the timers are handled at SCHED_OTHER priority then they competes with all
| other SCHED_OTHER tasks for CPU resources are possibly delayed.
| Moving timers softirqs to a low priority SCHED_FIFO thread instead ensures
| that timer are performed before scheduling any SCHED_OTHER thread.

And with this piece of text I convinced myself to also enable this in
the forced-threaded case.

> Thanks.

Sebastian
Le Tue, Oct 29, 2024 at 02:52:31PM +0100, Sebastian Andrzej Siewior a écrit :
> On 2024-10-28 15:01:55 [+0100], Frederic Weisbecker wrote:
> > > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > > index 457151f9f263d..9637af78087f3 100644
> > > --- a/include/linux/interrupt.h
> > > +++ b/include/linux/interrupt.h
> > > @@ -616,6 +616,50 @@ extern void __raise_softirq_irqoff(unsigned int nr);
> > >  extern void raise_softirq_irqoff(unsigned int nr);
> > >  extern void raise_softirq(unsigned int nr);
> > >
> > > +/*
> > > + * Handle timers in a dedicated thread at a low SCHED_FIFO priority instead in
> > > + * ksoftirqd as to be prefred over SCHED_NORMAL tasks.
> > > + */
> >
> > This doesn't parse. How about, inspired by your changelog:
> …
>
> What about this essay instead:
>
> | With forced-threaded interrupts enabled a raised softirq is deferred to
> | ksoftirqd unless it can be handled within the threaded interrupt. This
> | affects timer_list timers and hrtimers which are explicitly marked with
> | HRTIMER_MODE_SOFT.
> | With PREEMPT_RT enabled more hrtimers are moved to softirq for processing
> | which includes all timers which are not explicitly marked HRTIMER_MODE_HARD.
> | Userspace controlled timers (like the clock_nanosleep() interface) is divided
> | into two categories: Tasks with elevated scheduling policy including
> | SCHED_{FIFO|RR|DL} and the remaining scheduling policy. The tasks with the
> | elevated scheduling policy are woken up directly from the HARDIRQ while all
> | other wake ups are delayed to so softirq and so to ksoftirqd.

First "so" is intended?

> |
> | The ksoftirqd runs at SCHED_OTHER policy at which it should remain since it
> | handles the softirq in an overloaded situation (not handled everything
> | within its last run).
> | If the timers are handled at SCHED_OTHER priority then they competes with all
> | other SCHED_OTHER tasks for CPU resources are possibly delayed.
> | Moving timers softirqs to a low priority SCHED_FIFO thread instead ensures
> | that timer are performed before scheduling any SCHED_OTHER thread.

Works for me!

> And with this piece of text I convinced myself to also enable this in
> the forced-threaded case.

Yes, that makes sense.

Thanks.

> > Thanks.
>
> Sebastian
On 2024-10-29 22:47:21 [+0100], Frederic Weisbecker wrote:
> Le Tue, Oct 29, 2024 at 02:52:31PM +0100, Sebastian Andrzej Siewior a écrit :
> > On 2024-10-28 15:01:55 [+0100], Frederic Weisbecker wrote:
> > > > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > > > index 457151f9f263d..9637af78087f3 100644
> > > > --- a/include/linux/interrupt.h
> > > > +++ b/include/linux/interrupt.h
> > > > @@ -616,6 +616,50 @@ extern void __raise_softirq_irqoff(unsigned int nr);
> > > >  extern void raise_softirq_irqoff(unsigned int nr);
> > > >  extern void raise_softirq(unsigned int nr);
> > > >
> > > > +/*
> > > > + * Handle timers in a dedicated thread at a low SCHED_FIFO priority instead in
> > > > + * ksoftirqd as to be prefred over SCHED_NORMAL tasks.
> > > > + */
> > >
> > > This doesn't parse. How about, inspired by your changelog:
> > …
> >
> > What about this essay instead:
> >
> > | With forced-threaded interrupts enabled a raised softirq is deferred to
> > | ksoftirqd unless it can be handled within the threaded interrupt. This
> > | affects timer_list timers and hrtimers which are explicitly marked with
> > | HRTIMER_MODE_SOFT.
> > | With PREEMPT_RT enabled more hrtimers are moved to softirq for processing
> > | which includes all timers which are not explicitly marked HRTIMER_MODE_HARD.
> > | Userspace controlled timers (like the clock_nanosleep() interface) is divided
> > | into two categories: Tasks with elevated scheduling policy including
> > | SCHED_{FIFO|RR|DL} and the remaining scheduling policy. The tasks with the
> > | elevated scheduling policy are woken up directly from the HARDIRQ while all
> > | other wake ups are delayed to so softirq and so to ksoftirqd.
>
> First "so" is intended?

No, it needs to go.

> > |
> > | The ksoftirqd runs at SCHED_OTHER policy at which it should remain since it
> > | handles the softirq in an overloaded situation (not handled everything
> > | within its last run).
> > | If the timers are handled at SCHED_OTHER priority then they competes with all
> > | other SCHED_OTHER tasks for CPU resources are possibly delayed.
> > | Moving timers softirqs to a low priority SCHED_FIFO thread instead ensures
> > | that timer are performed before scheduling any SCHED_OTHER thread.
>
> Works for me!

Great.

> > And with this piece of text I convinced myself to also enable this in
> > the forced-threaded case.
>
> Yes, that makes sense.

Good. Then I'll pick up your tag for that patch. Thank you.

> Thanks.

Sebastian
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 457151f9f263d..9637af78087f3 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -616,6 +616,50 @@ extern void __raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 
+/*
+ * Handle timers in a dedicated thread at a low SCHED_FIFO priority instead in
+ * ksoftirqd as to be prefred over SCHED_NORMAL tasks.
+ */
+#ifdef CONFIG_PREEMPT_RT
+DECLARE_PER_CPU(struct task_struct *, timersd);
+DECLARE_PER_CPU(unsigned long, pending_timer_softirq);
+
+void raise_ktimers_thread(unsigned int nr);
+
+static inline void raise_timer_softirq(void)
+{
+	raise_ktimers_thread(TIMER_SOFTIRQ);
+}
+
+static inline void raise_hrtimer_softirq(void)
+{
+	raise_ktimers_thread(HRTIMER_SOFTIRQ);
+}
+
+static inline unsigned int local_timers_pending(void)
+{
+	return __this_cpu_read(pending_timer_softirq);
+}
+
+#else
+static inline void raise_timer_softirq(void)
+{
+	lockdep_assert_in_irq();
+	__raise_softirq_irqoff(TIMER_SOFTIRQ);
+}
+
+static inline void raise_hrtimer_softirq(void)
+{
+	lockdep_assert_in_irq();
+	__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+}
+
+static inline unsigned int local_timers_pending(void)
+{
+	return local_softirq_pending();
+}
+#endif
+
 DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
 
 static inline struct task_struct *this_cpu_ksoftirqd(void)
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index bb75dbf5c800c..609687fd742d5 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2440,6 +2440,12 @@ static int rcutorture_booster_init(unsigned int cpu)
 		WARN_ON_ONCE(!t);
 		sp.sched_priority = 2;
 		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+#ifdef CONFIG_PREEMPT_RT
+		t = per_cpu(timersd, cpu);
+		WARN_ON_ONCE(!t);
+		sp.sched_priority = 2;
+		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+#endif
 	}
 
 	/* Don't allow time recalculation while creating a new task. */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index d082e7840f880..b452206cf93b2 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -624,6 +624,24 @@ static inline void tick_irq_exit(void)
 #endif
 }
 
+#ifdef CONFIG_PREEMPT_RT
+DEFINE_PER_CPU(struct task_struct *, timersd);
+DEFINE_PER_CPU(unsigned long, pending_timer_softirq);
+
+static void wake_timersd(void)
+{
+	struct task_struct *tsk = __this_cpu_read(timersd);
+
+	if (tsk)
+		wake_up_process(tsk);
+}
+
+#else
+
+static inline void wake_timersd(void) { }
+
+#endif
+
 static inline void __irq_exit_rcu(void)
 {
 #ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
@@ -636,6 +654,10 @@ static inline void __irq_exit_rcu(void)
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && local_timers_pending() &&
+	    !(in_nmi() | in_hardirq()))
+		wake_timersd();
+
 	tick_irq_exit();
 }
 
@@ -971,12 +993,57 @@ static struct smp_hotplug_thread softirq_threads = {
 	.thread_comm		= "ksoftirqd/%u",
 };
 
+#ifdef CONFIG_PREEMPT_RT
+static void timersd_setup(unsigned int cpu)
+{
+	/* Above SCHED_NORMAL to handle timers before regular tasks. */
+	sched_set_fifo_low(current);
+}
+
+static int timersd_should_run(unsigned int cpu)
+{
+	return local_timers_pending();
+}
+
+void raise_ktimers_thread(unsigned int nr)
+{
+	lockdep_assert_in_irq();
+	trace_softirq_raise(nr);
+	__this_cpu_or(pending_timer_softirq, 1 << nr);
+}
+
+static void run_timersd(unsigned int cpu)
+{
+	unsigned int timer_si;
+
+	ksoftirqd_run_begin();
+
+	timer_si = local_timers_pending();
+	__this_cpu_write(pending_timer_softirq, 0);
+	or_softirq_pending(timer_si);
+
+	__do_softirq();
+
+	ksoftirqd_run_end();
+}
+
+static struct smp_hotplug_thread timer_threads = {
+	.store			= &timersd,
+	.setup			= timersd_setup,
+	.thread_should_run	= timersd_should_run,
+	.thread_fn		= run_timersd,
+	.thread_comm		= "ktimers/%u",
+};
+#endif
+
 static __init int spawn_ksoftirqd(void)
 {
 	cpuhp_setup_state_nocalls(CPUHP_SOFTIRQ_DEAD, "softirq:dead", NULL,
 				  takeover_tasklets);
 	BUG_ON(smpboot_register_percpu_thread(&softirq_threads));
-
+#ifdef CONFIG_PREEMPT_RT
+	BUG_ON(smpboot_register_percpu_thread(&timer_threads));
+#endif
 	return 0;
 }
 early_initcall(spawn_ksoftirqd);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5402e0f242178..133d49f703d93 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1811,7 +1811,7 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	if (!ktime_before(now, cpu_base->softirq_expires_next)) {
 		cpu_base->softirq_expires_next = KTIME_MAX;
 		cpu_base->softirq_activated = 1;
-		__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+		raise_hrtimer_softirq();
 	}
 
 	__hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD);
@@ -1906,7 +1906,7 @@ void hrtimer_run_queues(void)
 	if (!ktime_before(now, cpu_base->softirq_expires_next)) {
 		cpu_base->softirq_expires_next = KTIME_MAX;
 		cpu_base->softirq_activated = 1;
-		__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+		raise_hrtimer_softirq();
 	}
 
 	__hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 753a184c70907..976a212cca2e8 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -859,7 +859,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 
 static inline bool local_timer_softirq_pending(void)
 {
-	return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
+	return local_timers_pending() & BIT(TIMER_SOFTIRQ);
 }
 
 /*
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 1759de934284c..79f0dc73ac436 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -2499,7 +2499,7 @@ static void run_local_timers(void)
 		 */
 		if (time_after_eq(jiffies, READ_ONCE(base->next_expiry)) ||
 		    (i == BASE_DEF && tmigr_requires_handle_remote())) {
-			__raise_softirq_irqoff(TIMER_SOFTIRQ);
+			raise_timer_softirq();
 			return;
 		}
 	}