
[6/7] rcu/nocb: Add option to opt rcuo kthreads out of RT priority

Message ID 20220620224503.3841196-6-paulmck@kernel.org
State Accepted
Commit 8f489b4da5278fc6e5fc8f0029ae7fb51c060215
Series Callback-offload (nocb) updates for v5.20

Commit Message

Paul E. McKenney June 20, 2022, 10:45 p.m. UTC
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>

This commit introduces an RCU_NOCB_CPU_CB_BOOST Kconfig option that,
when disabled, prevents rcuo kthreads from running at real-time priority,
even in kernels built with RCU_BOOST.  This capability is important for
devices that need low-latency (as in a few milliseconds) responses from
expedited RCU grace periods but that do not run a classic real-time
workload.  On such devices, allowing the rcuo kthreads to run at real-time
priority imposes unacceptable latencies on the application tasks, which
run as SCHED_OTHER.

See for example the following trace output:

<snip>
<...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
<snip>

If that rcuop kthread were permitted to run at real-time SCHED_FIFO
priority, it would monopolize its CPU for hundreds of milliseconds
while invoking those 34619 RCU callback functions, which would cause an
unacceptably long latency spike for many application stacks on Android
platforms.
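The "hundreds of milliseconds" figure follows from simple multiplication: the time the kthread holds its CPU is roughly the number of queued callbacks times the average cost of invoking one callback. The sketch below makes that arithmetic explicit. The helper name flood_duration_ms() and the per-callback costs used in the example are illustrative assumptions, not measurements taken from the trace above.

```c
#include <assert.h>

/*
 * Back-of-the-envelope estimate of how long one rcuo kthread running at
 * SCHED_FIFO would keep its CPU while invoking a backlog of callbacks.
 * assumed_us_per_cb is an ASSUMED average cost per callback, in
 * microseconds; it is not derived from the trace in the commit message.
 */
static double flood_duration_ms(long ncallbacks, double assumed_us_per_cb)
{
	assert(ncallbacks >= 0 && assumed_us_per_cb >= 0.0);
	return (double)ncallbacks * assumed_us_per_cb / 1000.0; /* us -> ms */
}
```

For example, flood_duration_ms(34619, 10.0) evaluates to about 346 ms, and even an assumed cost of 5 microseconds per callback still gives about 173 ms, which is well beyond a few-millisecond latency budget.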

However, some existing real-time workloads require that callbacks be
invoked at SCHED_FIFO priority, for example, those running on systems
with heavy SCHED_OTHER background loads.  (It is the responsibility of
the real-time system's administrator to make sure that important
real-time tasks run at a higher priority than do RCU's kthreads.)
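An administrator checking this can read each task's scheduling policy and static priority with the standard sched_getscheduler() and sched_getparam() calls; the chrt -p utility from util-linux reports the same information from the command line. The following is a minimal sketch: the helper name describe_sched() is hypothetical, and the rcuo kthread PIDs must be obtained separately, for example with pgrep rcuo.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc: exposes SCHED_BATCH, SCHED_IDLE, SCHED_RESET_ON_FORK */
#endif
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format the scheduling policy and static priority
 * of task `pid` (0 = the calling thread) into buf.  Returns 0 on success
 * and -1 on failure, e.g. EINVAL for a negative PID or ESRCH if no such
 * task exists.
 */
static int describe_sched(pid_t pid, char *buf, size_t len)
{
	struct sched_param sp;
	const char *name;
	int policy;

	assert(buf != NULL && len > 0);
	policy = sched_getscheduler(pid);
	if (policy < 0 || sched_getparam(pid, &sp) < 0)
		return -1;
#ifdef SCHED_RESET_ON_FORK
	policy &= ~SCHED_RESET_ON_FORK; /* drop the flag bit, keep the policy */
#endif
	switch (policy) {
	case SCHED_FIFO:  name = "SCHED_FIFO";  break;
	case SCHED_RR:    name = "SCHED_RR";    break;
	case SCHED_OTHER: name = "SCHED_OTHER"; break;
#ifdef SCHED_BATCH
	case SCHED_BATCH: name = "SCHED_BATCH"; break;
#endif
#ifdef SCHED_IDLE
	case SCHED_IDLE:  name = "SCHED_IDLE";  break;
#endif
	default:          name = "unknown";     break;
	}
	snprintf(buf, len, "%s prio=%d", name, sp.sched_priority);
	return 0;
}
```

Comparing the result for each rcuop kthread against the policies and priorities of the latency-critical tasks is one way to carry out the check described in the parenthetical above.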

Therefore, this new RCU_NOCB_CPU_CB_BOOST Kconfig option defaults to
"y" on kernels built with PREEMPT_RT and defaults to "n" otherwise.
The effect is to preserve current behavior for real-time systems, but for
other systems to allow expedited RCU grace periods to run with real-time
priority while continuing to invoke RCU callbacks as SCHED_OTHER.

As you would expect, this RCU_NOCB_CPU_CB_BOOST Kconfig option has no
effect except on CPUs with offloaded RCU callbacks.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/Kconfig     | 16 ++++++++++++++++
 kernel/rcu/tree.c      |  6 +++++-
 kernel/rcu/tree_nocb.h |  3 ++-
 3 files changed, 23 insertions(+), 2 deletions(-)

Comments

Neeraj Upadhyay July 19, 2022, 9:35 a.m. UTC
On 6/21/2022 4:15 AM, Paul E. McKenney wrote:
> [ . . . ]


Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>


Thanks
Neeraj


Patch

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 27aab870ae4cf..c05ca52cdf64d 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -275,6 +275,22 @@  config RCU_NOCB_CPU_DEFAULT_ALL
 	  Say Y here if you want offload all CPUs by default on boot.
 	  Say N here if you are unsure.
 
+config RCU_NOCB_CPU_CB_BOOST
+	bool "Offload RCU callback from real-time kthread"
+	depends on RCU_NOCB_CPU && RCU_BOOST
+	default y if PREEMPT_RT
+	help
+	  Use this option to invoke offloaded callbacks as SCHED_FIFO
+	  to avoid starvation by heavy SCHED_OTHER background load.
+	  Of course, running as SCHED_FIFO during callback floods will
+	  cause the rcuo[ps] kthreads to monopolize the CPU for hundreds
+	  of milliseconds or more.  Therefore, when enabling this option,
+	  it is your responsibility to ensure that latency-sensitive
+	  tasks either run with higher priority or run on some other CPU.
+
+	  Say Y here if you want to set RT priority for offloading kthreads.
+	  Say N here if you are building a !PREEMPT_RT kernel and are unsure.
+
 config TASKS_TRACE_RCU_READ_MB
 	bool "Tasks Trace RCU readers use memory barriers in user and idle"
 	depends on RCU_EXPERT && TASKS_TRACE_RCU
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 74455671e6cf2..3b9f45ebb4999 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -154,7 +154,11 @@  static void sync_sched_exp_online_cleanup(int cpu);
 static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
 
-/* rcuc/rcub/rcuop kthread realtime priority */
+/*
+ * rcuc/rcub/rcuop kthread realtime priority. The "rcuop"
+ * real-time priority(enabling/disabling) is controlled by
+ * the extra CONFIG_RCU_NOCB_CPU_CB_BOOST configuration.
+ */
 static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
 module_param(kthread_prio, int, 0444);
 
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 60cc92cc66552..fa8e4f82e60c0 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1315,8 +1315,9 @@  static void rcu_spawn_cpu_nocb_kthread(int cpu)
 	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
 		goto end;
 
-	if (kthread_prio)
+	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST) && kthread_prio)
 		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+
 	WRITE_ONCE(rdp->nocb_cb_kthread, t);
 	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
 	return;
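The new check in rcu_spawn_cpu_nocb_kthread() relies on IS_ENABLED(), which expands to a compile-time constant 1 or 0, so on kernels built with CONFIG_RCU_NOCB_CPU_CB_BOOST=n the compiler can discard the sched_setscheduler_nocheck() call entirely. The sketch below is a simplified userspace re-creation of the placeholder-token trick used in include/linux/kconfig.h; it handles only bool options defined as 1 (not =m tristates), and the MY_-prefixed macros, the CONFIG_DEMO_* symbols, and the demo_*() functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified version of the kconfig.h trick (bool options only).  If the
 * option is defined as 1, MY_ARG_PLACEHOLDER_1 pastes into "0," and shifts
 * a 1 into the second slot; otherwise the junk token leaves 0 there.  The
 * extra trailing 0 keeps the variadic part non-empty for strict ISO C.
 */
#define MY_ARG_PLACEHOLDER_1 0,
#define MY_TAKE_SECOND_ARG(ignored, val, ...) val
#define MY_IS_DEFINED__(arg1_or_junk) MY_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define MY_IS_DEFINED_(val) MY_IS_DEFINED__(MY_ARG_PLACEHOLDER_##val)
#define MY_IS_DEFINED(x) MY_IS_DEFINED_(x)
#define MY_IS_ENABLED(option) MY_IS_DEFINED(option)

/* Hypothetical config symbols, as a generated header would have them. */
#define CONFIG_DEMO_CB_BOOST 1
/* CONFIG_DEMO_NO_BOOST is deliberately left undefined (=n). */

static_assert(MY_IS_ENABLED(CONFIG_DEMO_CB_BOOST) == 1, "=y must give 1");
static_assert(MY_IS_ENABLED(CONFIG_DEMO_NO_BOOST) == 0, "=n must give 0");

/* Same shape as the patched condition in rcu_spawn_cpu_nocb_kthread(). */
static bool demo_should_boost(int kthread_prio)
{
	return MY_IS_ENABLED(CONFIG_DEMO_CB_BOOST) && kthread_prio;
}

/* Constant-false left operand: code guarded by this can be dropped. */
static bool demo_never_boost(int kthread_prio)
{
	return MY_IS_ENABLED(CONFIG_DEMO_NO_BOOST) && kthread_prio;
}
```

Because the left operand of && is a constant, the runtime kthread_prio test only matters when the option is built in, which matches the behavior described in the commit message.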