diff mbox series

[01/12] rcu: Decrease FQS scan wait time in case of callback overloading

Message ID 20220620222032.3839547-1-paulmck@kernel.org (mailing list archive)
State New, archived
Headers show
Series Miscellaneous fixes for v5.20 | expand

Commit Message

Paul E. McKenney June 20, 2022, 10:20 p.m. UTC
The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
callback overloading and does an immediate initial scan for idle CPUs
if so.  However, subsequent rescans will be carried out at as leisurely a
rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
module parameter.  It might be tempting to just continue immediately
rescanning, but this turns the RCU grace-period kthread into a CPU hog.
It might also be tempting to reduce the time between rescans to a single
jiffy, but this can be problematic on larger systems.

This commit therefore divides the normal time between rescans by three,
rounding up.  Thus a small system running at HZ=1000 that is suffering
from callback overload will wait only one jiffy instead of the normal
three between rescans.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Neeraj Upadhyay June 21, 2022, 5:29 a.m. UTC | #1
On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
> callback overloading and does an immediate initial scan for idle CPUs
> if so.  However, subsequent rescans will be carried out at as leisurely a
> rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
> module parameter.  It might be tempting to just continue immediately
> rescanning, but this turns the RCU grace-period kthread into a CPU hog.
> It might also be tempting to reduce the time between rescans to a single
> jiffy, but this can be problematic on larger systems.
> 
> This commit therefore divides the normal time between rescans by three,
> rounding up.  Thus a small system running at HZ=1000 that is suffering
> from callback overload will wait only one jiffy instead of the normal
> three between rescans.
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>   kernel/rcu/tree.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a6..c19d5926886fb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>   			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
>   				   jiffies + (j ? 3 * j : 2));
>   		}
> +		if (rcu_state.cbovld) {
> +			j = (j + 2) / 3;
> +			if (j <= 0)
> +				j = 1;
> +		}

We update 'j' here, after setting rcu_state.jiffies_force_qs

     WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j)

So, we return from swait_event_idle_timeout_exclusive after 1/3 time 
duration.

     swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
				 rcu_gp_fqs_check_wake(&gf), j);

This can result in !timer_after check to return false and we will
enter the 'else' (stray signal block) code?

This might not matter for first 2 fqs loop iterations, where 
RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
from this patch?


if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
	(gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
			...
} else {
	/* Deal with stray signal. */
}


So, do we need to move this calculation above the 'if' block which sets 
rcu_state.jiffies_force_qs?
		if (!ret) {

			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies +
						j);...
		}

Thanks
Neeraj

>   		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
>   				       TPS("fqswait"));
>   		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);
Paul E. McKenney June 21, 2022, 10:19 p.m. UTC | #2
On Tue, Jun 21, 2022 at 10:59:58AM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
> > callback overloading and does an immediate initial scan for idle CPUs
> > if so.  However, subsequent rescans will be carried out at as leisurely a
> > rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
> > module parameter.  It might be tempting to just continue immediately
> > rescanning, but this turns the RCU grace-period kthread into a CPU hog.
> > It might also be tempting to reduce the time between rescans to a single
> > jiffy, but this can be problematic on larger systems.
> > 
> > This commit therefore divides the normal time between rescans by three,
> > rounding up.  Thus a small system running at HZ=1000 that is suffering
> > from callback overload will wait only one jiffy instead of the normal
> > three between rescans.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >   kernel/rcu/tree.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c25ba442044a6..c19d5926886fb 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
> >   			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> >   				   jiffies + (j ? 3 * j : 2));
> >   		}
> > +		if (rcu_state.cbovld) {
> > +			j = (j + 2) / 3;
> > +			if (j <= 0)
> > +				j = 1;
> > +		}
> 
> We update 'j' here, after setting rcu_state.jiffies_force_qs
> 
>     WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j)
> 
> So, we return from swait_event_idle_timeout_exclusive after 1/3 time
> duration.
> 
>     swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
> 				 rcu_gp_fqs_check_wake(&gf), j);
> 
> This can result in !timer_after check to return false and we will
> enter the 'else' (stray signal block) code?
> 
> This might not matter for first 2 fqs loop iterations, where
> RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
> from this patch?
> 
> 
> if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> 	(gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
> 			...
> } else {
> 	/* Deal with stray signal. */
> }
> 
> 
> So, do we need to move this calculation above the 'if' block which sets
> rcu_state.jiffies_force_qs?
> 		if (!ret) {
> 
> 			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies +
> 						j);...
> 		}

Good catch, thank you!  How about the updated patch shown below?

							Thanx, Paul

------------------------------------------------------------------------

commit 77de092c78f549b5c28075bfee9998a525d21f84
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue Apr 12 15:08:14 2022 -0700

    rcu: Decrease FQS scan wait time in case of callback overloading
    
    The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
    callback overloading and does an immediate initial scan for idle CPUs
    if so.  However, subsequent rescans will be carried out at as leisurely a
    rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
    module parameter.  It might be tempting to just continue immediately
    rescanning, but this turns the RCU grace-period kthread into a CPU hog.
    It might also be tempting to reduce the time between rescans to a single
    jiffy, but this can be problematic on larger systems.
    
    This commit therefore divides the normal time between rescans by three,
    rounding up.  Thus a small system running at HZ=1000 that is suffering
    from callback overload will wait only one jiffy instead of the normal
    three between rescans.
    
    [ paulmck: Apply Neeraj Upadhyay feedback. ]
    
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a6..52094e72866e5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1983,7 +1983,12 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
 		gf = RCU_GP_FLAG_OVLD;
 	ret = 0;
 	for (;;) {
-		if (!ret) {
+		if (rcu_state.cbovld) {
+			j = (j + 2) / 3;
+			if (j <= 0)
+				j = 1;
+		}
+		if (!ret || time_before(jiffies + j, rcu_state.jiffies_force_qs)) {
 			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
 			/*
 			 * jiffies_force_qs before RCU_GP_WAIT_FQS state
Neeraj Upadhyay June 22, 2022, 11:46 a.m. UTC | #3
On 6/22/2022 3:49 AM, Paul E. McKenney wrote:
> On Tue, Jun 21, 2022 at 10:59:58AM +0530, Neeraj Upadhyay wrote:
>>
>>
>> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
>>> The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
>>> callback overloading and does an immediate initial scan for idle CPUs
>>> if so.  However, subsequent rescans will be carried out at as leisurely a
>>> rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
>>> module parameter.  It might be tempting to just continue immediately
>>> rescanning, but this turns the RCU grace-period kthread into a CPU hog.
>>> It might also be tempting to reduce the time between rescans to a single
>>> jiffy, but this can be problematic on larger systems.
>>>
>>> This commit therefore divides the normal time between rescans by three,
>>> rounding up.  Thus a small system running at HZ=1000 that is suffering
>>> from callback overload will wait only one jiffy instead of the normal
>>> three between rescans.
>>>
>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>> ---
>>>    kernel/rcu/tree.c | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index c25ba442044a6..c19d5926886fb 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>>>    			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
>>>    				   jiffies + (j ? 3 * j : 2));
>>>    		}
>>> +		if (rcu_state.cbovld) {
>>> +			j = (j + 2) / 3;
>>> +			if (j <= 0)
>>> +				j = 1;
>>> +		}
>>
>> We update 'j' here, after setting rcu_state.jiffies_force_qs
>>
>>      WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j)
>>
>> So, we return from swait_event_idle_timeout_exclusive after 1/3 time
>> duration.
>>
>>      swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
>> 				 rcu_gp_fqs_check_wake(&gf), j);
>>
>> This can result in !timer_after check to return false and we will
>> enter the 'else' (stray signal block) code?
>>
>> This might not matter for first 2 fqs loop iterations, where
>> RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
>> from this patch?
>>
>>
>> if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
>> 	(gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
>> 			...
>> } else {
>> 	/* Deal with stray signal. */
>> }
>>
>>
>> So, do we need to move this calculation above the 'if' block which sets
>> rcu_state.jiffies_force_qs?
>> 		if (!ret) {
>>
>> 			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies +
>> 						j);...
>> 		}
> 
> Good catch, thank you!  How about the updated patch shown below?
> 

Looks good to me.


Thanks
Neeraj

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 77de092c78f549b5c28075bfee9998a525d21f84
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Tue Apr 12 15:08:14 2022 -0700
> 
>      rcu: Decrease FQS scan wait time in case of callback overloading
>      
>      The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
>      callback overloading and does an immediate initial scan for idle CPUs
>      if so.  However, subsequent rescans will be carried out at as leisurely a
>      rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
>      module parameter.  It might be tempting to just continue immediately
>      rescanning, but this turns the RCU grace-period kthread into a CPU hog.
>      It might also be tempting to reduce the time between rescans to a single
>      jiffy, but this can be problematic on larger systems.
>      
>      This commit therefore divides the normal time between rescans by three,
>      rounding up.  Thus a small system running at HZ=1000 that is suffering
>      from callback overload will wait only one jiffy instead of the normal
>      three between rescans.
>      
>      [ paulmck: Apply Neeraj Upadhyay feedback. ]
>      
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a6..52094e72866e5 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1983,7 +1983,12 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>   		gf = RCU_GP_FLAG_OVLD;
>   	ret = 0;
>   	for (;;) {
> -		if (!ret) {
> +		if (rcu_state.cbovld) {
> +			j = (j + 2) / 3;
> +			if (j <= 0)
> +				j = 1;
> +		}
> +		if (!ret || time_before(jiffies + j, rcu_state.jiffies_force_qs)) {
>   			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
>   			/*
>   			 * jiffies_force_qs before RCU_GP_WAIT_FQS state
diff mbox series

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a6..c19d5926886fb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1993,6 +1993,11 @@  static noinline_for_stack void rcu_gp_fqs_loop(void)
 			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
 				   jiffies + (j ? 3 * j : 2));
 		}
+		if (rcu_state.cbovld) {
+			j = (j + 2) / 3;
+			if (j <= 0)
+				j = 1;
+		}
 		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 				       TPS("fqswait"));
 		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);