
[3/4] rcu: Make tiny RCU use ksoftirqd to trigger a QS from idle

Message ID 20231019233543.1243121-4-frederic@kernel.org (mailing list archive)
State Accepted
Commit 023e98a802827b8a2646c54e13ee98fa9f330ec3
Series rcu: Fix PF_IDLE related issues, part. 1

Commit Message

Frederic Weisbecker Oct. 19, 2023, 11:35 p.m. UTC
The commit:

	cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")

fixed an issue where rcutiny would request a quiescent state by
setting TIF_NEED_RESCHED in early boot, when init/0 has the PF_IDLE flag
set but interrupts aren't enabled yet. A subsequent call to
cond_resched() would then enable IRQs too early.
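The early-boot hazard can be sketched with a small userspace model (all
names are illustrative, not the real kernel symbols; the actual paths are
resched_cpu()/cond_resched()/__schedule()):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace model of the early-boot hazard. */
static bool irqs_enabled;       /* early boot: IRQs still off */
static bool need_resched_flag;  /* models TIF_NEED_RESCHED on init/0 */

/* Old rcutiny behavior: request a QS by setting need_resched, even
 * though interrupts are not enabled yet. */
static void old_request_qs(void)
{
	need_resched_flag = true;   /* what resched_cpu(0) amounts to here */
}

/* Model of cond_resched(): if need_resched is set, __schedule() runs
 * and returns with interrupts enabled -- too early at this boot stage. */
static void cond_resched_model(void)
{
	if (need_resched_flag) {
		need_resched_flag = false;
		irqs_enabled = true;
	}
}
```

Running old_request_qs() followed by cond_resched_model() leaves IRQs
enabled while boot still expects them off, which is the situation the
commit above addressed.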

When callbacks are enqueued in idle, RCU currently performs the
following:

1) Call resched_cpu() to trigger exit from idle and go through the
   scheduler to call rcu_note_context_switch() -> rcu_qs()

2) rcu_qs() notes the quiescent state and raises RCU_SOFTIRQ if there
   is a callback, waking up ksoftirqd since it isn't called from an
   interrupt.
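The two steps above can be sketched as a simplified userspace model
(illustrative names, not the real kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the pre-patch flow; names are illustrative. */
static bool need_resched_flag;   /* TIF_NEED_RESCHED */
static bool have_callbacks;      /* rcutiny callback list non-empty */
static bool rcu_softirq_raised;  /* RCU_SOFTIRQ pending */
static bool ksoftirqd_woken;

/* Step 1: call_rcu() from idle -> resched_cpu() to force a trip
 * through the scheduler. */
static void old_call_rcu_from_idle(void)
{
	have_callbacks = true;
	need_resched_flag = true;
}

/* rcu_qs(): note the QS; with callbacks pending, raise RCU_SOFTIRQ,
 * and since this is not interrupt context, wake ksoftirqd to run it. */
static void rcu_qs(bool in_interrupt)
{
	if (have_callbacks) {
		rcu_softirq_raised = true;
		if (!in_interrupt)
			ksoftirqd_woken = true;
	}
}

/* Step 2: the resulting context switch calls rcu_note_context_switch(),
 * which ends up in rcu_qs() from process context. */
static void schedule_once(void)
{
	if (need_resched_flag) {
		need_resched_flag = false;
		rcu_qs(false);
	}
}
```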

However, the call to resched_cpu() can be replaced and optimized by
raising RCU_SOFTIRQ and forcing a ksoftirqd wakeup instead.

It's worth noting that RCU grace-period polling from idle then becomes
suboptimal, but such a use case can be considered very rare or even
non-existent.

The advantage of this optimization is that it also works if PF_IDLE is
set early, because ksoftirqd is created well after IRQs are enabled on
boot and it can't be awakened before its creation. If
raise_ksoftirqd_irqoff() is called after the first scheduling point
but before ksoftirqd is created, nearby voluntary schedule calls are
expected to provide the desired quiescent state, and in the worst case
the first run of ksoftirqd comes soon enough, during the first initcalls.
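The ordering argument — a wakeup issued before ksoftirqd exists is
harmless because the pending softirq is picked up on its first run — can
be sketched as follows (illustrative model, not the real
raise_ksoftirqd_irqoff()):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the post-patch flow; names are illustrative. */
static bool ksoftirqd_created;    /* spawned well after IRQs are enabled */
static bool rcu_softirq_pending;
static bool ksoftirqd_woken;

/* Models raise_ksoftirqd_irqoff(RCU_SOFTIRQ): mark the softirq pending
 * and wake ksoftirqd; before its creation the wakeup is simply a no-op. */
static void raise_ksoftirqd_irqoff_model(void)
{
	rcu_softirq_pending = true;
	if (ksoftirqd_created)
		ksoftirqd_woken = true;
}

/* When ksoftirqd is finally spawned (early initcalls), it immediately
 * processes any softirq that was left pending. */
static void spawn_ksoftirqd(void)
{
	ksoftirqd_created = true;
	if (rcu_softirq_pending)
		ksoftirqd_woken = true;
}
```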

Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tiny.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Comments

Paul E. McKenney Oct. 20, 2023, 12:49 a.m. UTC | #1
On Fri, Oct 20, 2023 at 01:35:42AM +0200, Frederic Weisbecker wrote:
> [ snip ]

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

I did a touch-test of the series, and have started overnight testing.

							Thanx, Paul

> [ snip ]

Patch

diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index fec804b79080..9460e4e9d84c 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -190,12 +190,15 @@  void call_rcu(struct rcu_head *head, rcu_callback_t func)
 	local_irq_save(flags);
 	*rcu_ctrlblk.curtail = head;
 	rcu_ctrlblk.curtail = &head->next;
-	local_irq_restore(flags);
 
 	if (unlikely(is_idle_task(current))) {
-		/* force scheduling for rcu_qs() */
-		resched_cpu(0);
+		/*
+		 * Force resched to trigger a QS and handle callbacks right after.
+		 * This also takes care of avoiding too early rescheduling on boot.
+		 */
+		raise_ksoftirqd_irqoff(RCU_SOFTIRQ);
 	}
+	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
@@ -228,8 +231,16 @@  unsigned long start_poll_synchronize_rcu(void)
 	unsigned long gp_seq = get_state_synchronize_rcu();
 
 	if (unlikely(is_idle_task(current))) {
-		/* force scheduling for rcu_qs() */
-		resched_cpu(0);
+		unsigned long flags;
+
+		/*
+		 * Force resched to trigger a QS. This also takes care of avoiding
+		 * too early rescheduling on boot. It's suboptimized but GP
+		 * polling on idle isn't expected much as a usecase.
+		 */
+		local_irq_save(flags);
+		raise_ksoftirqd_irqoff(RCU_SOFTIRQ);
+		local_irq_restore(flags);
 	}
 	return gp_seq;
 }