diff mbox series

[2/2] rcu: Robustify rcu_is_cpu_rrupt_from_idle()

Message ID 20250318135619.4300-3-frederic@kernel.org (mailing list archive)
State New
Headers show
Series rcu: Random updates | expand

Commit Message

Frederic Weisbecker March 18, 2025, 1:56 p.m. UTC
RCU relies on the context tracking nesting counter in order to determine
if it is running in extended quiescent state.

However the context tracking nesting counter is not completely
synchronized with the actual context tracking state:

* The nesting counter is set to 1 or incremented further _after_ the
  actual state is set to RCU not watching.

   (then we know for sure we interrupted RCU not watching)

* The nesting counter is set to 0 or decremented further _before_ the
  actual state is set to RCU watching.

Therefore it is safe to assume that if ct_nesting() > 0, RCU is not
watching. But if ct_nesting() <= 0, RCU is watching except for a tiny
window.

This hasn't been a problem so far because rcu_is_cpu_rrupt_from_idle()
has only been called from interrupts. However the code is confusing
and abuses the role of the context tracking nesting counter while there
are more accurate indicators available.

Clarify and robustify accordingly.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

Comments

Joel Fernandes March 22, 2025, 5 p.m. UTC | #1
On 3/18/2025 2:56 PM, Frederic Weisbecker wrote:
> RCU relies on the context tracking nesting counter in order to determine
> if it is running in extended quiescent state.
> 
> However the context tracking nesting counter is not completely
> synchronized with the actual context tracking state:
> 
> * The nesting counter is set to 1 or incremented further _after_ the
>   actual state is set to RCU not watching.

I agree with patch, but this line is a bit confusing ->nesting is set to 1
*after* the RCU state is set to "watching".  Did you mean "watching" ?

But I think you meant "After RCU transitions from a state of not-watching to
watching' instead of 'actual state is set to RCU not watching'..

ct_kernel_entry():

	// RCU is not watching here ...
	ct_kernel_enter_state(offset);
	// ... but is watching here.
	WRITE_ONCE(ct->nesting, 1);

>    (then we know for sure we interrupted RCU not watching)
> 
> * The nesting counter is set to 0 or decremented further _before_ the
>   actual state is set to RCU watching.
> 
> Therefore it is safe to assume that if ct_nesting() > 0, RCU is not
> watching. But if ct_nesting() <= 0, RCU is watching except for a tiny
> window.
> 
> This hasn't been a problem so far because rcu_is_cpu_rrupt_from_idle()
> has only been called from interrupts. However the code is confusing

Agreed, and I could also see the existing code's snippet:
	WARN_ON_ONCE(!nesting && !is_idle_task(current));

.. not working if this function were to be called from non-interrupt kernel context.


> and abuses the role of the context tracking nesting counter while there
> are more accurate indicators available.
> 
> Clarify and robustify accordingly.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79dced5fb72e..90c43061c981 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -367,7 +367,7 @@ EXPORT_SYMBOL_GPL(rcu_momentary_eqs);
>   */
>  static int rcu_is_cpu_rrupt_from_idle(void)
>  {
> -	long nesting;
> +	long nmi_nesting = ct_nmi_nesting();
>  
>  	/*
>  	 * Usually called from the tick; but also used from smp_function_call()
> @@ -379,21 +379,28 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>  	/* Check for counter underflows */
>  	RCU_LOCKDEP_WARN(ct_nesting() < 0,
>  			 "RCU nesting counter underflow!");
> -	RCU_LOCKDEP_WARN(ct_nmi_nesting() <= 0,
> -			 "RCU nmi_nesting counter underflow/zero!");
>  
> -	/* Are we at first interrupt nesting level? */
> -	nesting = ct_nmi_nesting();
> -	if (nesting > 1)
> +	/* Non-idle interrupt or nested idle interrupt */
> +	if (nmi_nesting > 1)
>  		return false;
>  
>  	/*
> -	 * If we're not in an interrupt, we must be in the idle task!
> +	 * Non nested idle interrupt (interrupting section where RCU
> +	 * wasn't watching).
>  	 */
> -	WARN_ON_ONCE(!nesting && !is_idle_task(current));
> +	if (nmi_nesting == 1)
> +		return true;
>  
> -	/* Does CPU appear to be idle from an RCU standpoint? */
> -	return ct_nesting() == 0;
> +	/* Not in an interrupt */
> +	if (!nmi_nesting) {
> +		RCU_LOCKDEP_WARN(!in_task() || !is_idle_task(current),
> +				 "RCU nmi_nesting counter not in idle task!");
> +		return !rcu_is_watching_curr_cpu();

Makes sense to me and it is also consistent with rcu_watching_snap_in_eqs().

thanks,

 - Joel
Frederic Weisbecker March 23, 2025, 10:31 p.m. UTC | #2
Le Sat, Mar 22, 2025 at 06:00:13PM +0100, Joel Fernandes a écrit :
> 
> 
> On 3/18/2025 2:56 PM, Frederic Weisbecker wrote:
> > RCU relies on the context tracking nesting counter in order to determine
> > if it is running in extended quiescent state.
> > 
> > However the context tracking nesting counter is not completely
> > synchronized with the actual context tracking state:
> > 
> > * The nesting counter is set to 1 or incremented further _after_ the
> >   actual state is set to RCU not watching.
> 
> I agree with patch, but this line is a bit confusing ->nesting is set to 1
> *after* the RCU state is set to "watching".  Did you mean "watching" ?
> 
> But I think you meant "After RCU transitions from a state of not-watching to
> watching' instead of 'actual state is set to RCU not watching'..
> 
> ct_kernel_entry():
> 
> 	// RCU is not watching here ...
> 	ct_kernel_enter_state(offset);
> 	// ... but is watching here.
> 	WRITE_ONCE(ct->nesting, 1);

Oh I completely inverted the thing in the changelog!

> 
> >    (then we know for sure we interrupted RCU not watching)
> > 
> > * The nesting counter is set to 0 or decremented further _before_ the
> >   actual state is set to RCU watching.
> > 
> > Therefore it is safe to assume that if ct_nesting() > 0, RCU is not
> > watching. But if ct_nesting() <= 0, RCU is watching except for a tiny
> > window.
> > 
> > This hasn't been a problem so far because rcu_is_cpu_rrupt_from_idle()
> > has only been called from interrupts. However the code is confusing
> 
> Agreed, and I could also see the existing code's snippet:
> 	WARN_ON_ONCE(!nesting && !is_idle_task(current));
> 
> .. not working if this function were to be called from non-interrupt kernel
> context.

Right.

I'll reissue that one.

Thanks!
diff mbox series

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79dced5fb72e..90c43061c981 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -367,7 +367,7 @@  EXPORT_SYMBOL_GPL(rcu_momentary_eqs);
  */
 static int rcu_is_cpu_rrupt_from_idle(void)
 {
-	long nesting;
+	long nmi_nesting = ct_nmi_nesting();
 
 	/*
 	 * Usually called from the tick; but also used from smp_function_call()
@@ -379,21 +379,28 @@  static int rcu_is_cpu_rrupt_from_idle(void)
 	/* Check for counter underflows */
 	RCU_LOCKDEP_WARN(ct_nesting() < 0,
 			 "RCU nesting counter underflow!");
-	RCU_LOCKDEP_WARN(ct_nmi_nesting() <= 0,
-			 "RCU nmi_nesting counter underflow/zero!");
 
-	/* Are we at first interrupt nesting level? */
-	nesting = ct_nmi_nesting();
-	if (nesting > 1)
+	/* Non-idle interrupt or nested idle interrupt */
+	if (nmi_nesting > 1)
 		return false;
 
 	/*
-	 * If we're not in an interrupt, we must be in the idle task!
+	 * Non nested idle interrupt (interrupting section where RCU
+	 * wasn't watching).
 	 */
-	WARN_ON_ONCE(!nesting && !is_idle_task(current));
+	if (nmi_nesting == 1)
+		return true;
 
-	/* Does CPU appear to be idle from an RCU standpoint? */
-	return ct_nesting() == 0;
+	/* Not in an interrupt */
+	if (!nmi_nesting) {
+		RCU_LOCKDEP_WARN(!in_task() || !is_idle_task(current),
+				 "RCU nmi_nesting counter not in idle task!");
+		return !rcu_is_watching_curr_cpu();
+	}
+
+	RCU_LOCKDEP_WARN(1, "RCU nmi_nesting counter underflow/zero!");
+
+	return false;
 }
 
 #define DEFAULT_RCU_BLIMIT (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 1000 : 10)