| Message ID | 20220620222032.3839547-1-paulmck@kernel.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Miscellaneous fixes for v5.20 |
On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
> callback overloading and does an immediate initial scan for idle CPUs
> if so. However, subsequent rescans will be carried out at as leisurely a
> rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
> module parameter. It might be tempting to just continue immediately
> rescanning, but this turns the RCU grace-period kthread into a CPU hog.
> It might also be tempting to reduce the time between rescans to a single
> jiffy, but this can be problematic on larger systems.
>
> This commit therefore divides the normal time between rescans by three,
> rounding up. Thus a small system running at HZ=1000 that is suffering
> from callback overload will wait only one jiffy instead of the normal
> three between rescans.
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  kernel/rcu/tree.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a6..c19d5926886fb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
>  				   jiffies + (j ? 3 * j : 2));
>  		}
> +		if (rcu_state.cbovld) {
> +			j = (j + 2) / 3;
> +			if (j <= 0)
> +				j = 1;
> +		}

We update 'j' here, after setting rcu_state.jiffies_force_qs

  WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j)

So, we return from swait_event_idle_timeout_exclusive() after 1/3 of the
original wait duration.

  swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
                                     rcu_gp_fqs_check_wake(&gf), j);

This can result in the !time_after() check returning false, and we will
enter the 'else' (stray signal) block?

This might not matter for the first 2 fqs loop iterations, where
RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
from this patch?

  if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
      (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
          ...
  } else {
          /* Deal with stray signal. */
  }

So, do we need to move this calculation above the 'if' block which sets
rcu_state.jiffies_force_qs?

  if (!ret) {
          WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
          ...
  }

Thanks
Neeraj

>  		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
>  				       TPS("fqswait"));
>  		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);
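[Editorial note: the ordering concern raised above can be made concrete with a
minimal userspace sketch. This is plain C, not kernel code; the variables are
local stand-ins for jiffies, jiffies_till_next_fqs, and
rcu_state.jiffies_force_qs, and time_after() is reduced to an ordinary
comparison.]

/*
 * Standalone sketch of the ordering concern: the deadline is computed
 * from the full j before j is shrunk, so the shortened wait expires
 * well before the deadline.
 */
#include <stdio.h>

int main(void)
{
	unsigned long jiffies = 1000;	/* pretend current time */
	unsigned long j = 3;		/* jiffies_till_next_fqs at HZ=1000 */

	/* Original patch order: deadline computed from the full j... */
	unsigned long jiffies_force_qs = jiffies + j;	/* 1003 */

	/* ...and only then is j divided by three, rounding up. */
	j = (j + 2) / 3;				/* 1 */

	/* The wait times out after the shortened j. */
	jiffies += j;					/* 1001 */

	/*
	 * The deadline is still in the future, so !time_after() is false
	 * and, without an FQS/OVLD flag, the loop takes the stray-signal
	 * branch instead of forcing quiescent states.
	 */
	printf("deadline %lu, now %lu -> stray-signal path: %s\n",
	       jiffies_force_qs, jiffies,
	       jiffies_force_qs > jiffies ? "yes" : "no");
	return 0;
}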
On Tue, Jun 21, 2022 at 10:59:58AM +0530, Neeraj Upadhyay wrote:
> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
> > callback overloading and does an immediate initial scan for idle CPUs
> > if so. However, subsequent rescans will be carried out at as leisurely a
> > rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
> > module parameter. It might be tempting to just continue immediately
> > rescanning, but this turns the RCU grace-period kthread into a CPU hog.
> > It might also be tempting to reduce the time between rescans to a single
> > jiffy, but this can be problematic on larger systems.
> >
> > This commit therefore divides the normal time between rescans by three,
> > rounding up. Thus a small system running at HZ=1000 that is suffering
> > from callback overload will wait only one jiffy instead of the normal
> > three between rescans.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  kernel/rcu/tree.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c25ba442044a6..c19d5926886fb 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
> >  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> >  				   jiffies + (j ? 3 * j : 2));
> >  		}
> > +		if (rcu_state.cbovld) {
> > +			j = (j + 2) / 3;
> > +			if (j <= 0)
> > +				j = 1;
> > +		}
>
> We update 'j' here, after setting rcu_state.jiffies_force_qs
>
>   WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j)
>
> So, we return from swait_event_idle_timeout_exclusive() after 1/3 of the
> original wait duration.
>
>   swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
>                                      rcu_gp_fqs_check_wake(&gf), j);
>
> This can result in the !time_after() check returning false, and we will
> enter the 'else' (stray signal) block?
>
> This might not matter for the first 2 fqs loop iterations, where
> RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
> from this patch?
>
>   if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
>       (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
>           ...
>   } else {
>           /* Deal with stray signal. */
>   }
>
> So, do we need to move this calculation above the 'if' block which sets
> rcu_state.jiffies_force_qs?
>
>   if (!ret) {
>           WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
>           ...
>   }

Good catch, thank you!  How about the updated patch shown below?

							Thanx, Paul

------------------------------------------------------------------------

commit 77de092c78f549b5c28075bfee9998a525d21f84
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue Apr 12 15:08:14 2022 -0700

    rcu: Decrease FQS scan wait time in case of callback overloading

    The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
    callback overloading and does an immediate initial scan for idle CPUs
    if so.  However, subsequent rescans will be carried out at as leisurely a
    rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
    module parameter.  It might be tempting to just continue immediately
    rescanning, but this turns the RCU grace-period kthread into a CPU hog.
    It might also be tempting to reduce the time between rescans to a single
    jiffy, but this can be problematic on larger systems.

    This commit therefore divides the normal time between rescans by three,
    rounding up.  Thus a small system running at HZ=1000 that is suffering
    from callback overload will wait only one jiffy instead of the normal
    three between rescans.

    [ paulmck: Apply Neeraj Upadhyay feedback. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a6..52094e72866e5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1983,7 +1983,12 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
 	gf = RCU_GP_FLAG_OVLD;
 	ret = 0;
 	for (;;) {
-		if (!ret) {
+		if (rcu_state.cbovld) {
+			j = (j + 2) / 3;
+			if (j <= 0)
+				j = 1;
+		}
+		if (!ret || time_before(jiffies + j, rcu_state.jiffies_force_qs)) {
 			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
 			/*
 			 * jiffies_force_qs before RCU_GP_WAIT_FQS state
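[Editorial note: a quick standalone illustration of the reworked ordering in
the updated patch, again plain C rather than kernel code, with local stand-ins
for the rcu_state fields and time_before() reduced to a plain comparison. The
point is that j is shrunk first, and the deadline is refreshed whenever
jiffies + j would land before the previously programmed deadline, even when
the wait did not time out.]

#include <stdbool.h>
#include <stdio.h>

static unsigned long maybe_refresh(unsigned long jiffies, unsigned long j,
				   unsigned long deadline, bool ret)
{
	/* Mirrors: if (!ret || time_before(jiffies + j, jiffies_force_qs)) */
	if (!ret || jiffies + j < deadline)
		deadline = jiffies + j;
	return deadline;
}

int main(void)
{
	unsigned long jiffies = 1000, j = 3;
	unsigned long deadline = jiffies + j;	/* 1003, from an earlier pass */

	j = (j + 2) / 3;	/* callback overload: 3 jiffies become 1 */

	/* Even with ret != 0, the deadline is pulled in to jiffies + j. */
	deadline = maybe_refresh(jiffies, j, deadline, true);
	printf("deadline now %lu (was 1003)\n", deadline);	/* 1001 */
	return 0;
}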
On 6/22/2022 3:49 AM, Paul E. McKenney wrote:
> On Tue, Jun 21, 2022 at 10:59:58AM +0530, Neeraj Upadhyay wrote:
>> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
>>> The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
>>> callback overloading and does an immediate initial scan for idle CPUs
>>> if so. However, subsequent rescans will be carried out at as leisurely a
>>> rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
>>> module parameter. It might be tempting to just continue immediately
>>> rescanning, but this turns the RCU grace-period kthread into a CPU hog.
>>> It might also be tempting to reduce the time between rescans to a single
>>> jiffy, but this can be problematic on larger systems.
>>>
>>> This commit therefore divides the normal time between rescans by three,
>>> rounding up. Thus a small system running at HZ=1000 that is suffering
>>> from callback overload will wait only one jiffy instead of the normal
>>> three between rescans.
>>>
>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>> ---
>>>   kernel/rcu/tree.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index c25ba442044a6..c19d5926886fb 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>>>  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
>>>  				   jiffies + (j ? 3 * j : 2));
>>>  		}
>>> +		if (rcu_state.cbovld) {
>>> +			j = (j + 2) / 3;
>>> +			if (j <= 0)
>>> +				j = 1;
>>> +		}
>>
>> We update 'j' here, after setting rcu_state.jiffies_force_qs
>>
>>    WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j)
>>
>> So, we return from swait_event_idle_timeout_exclusive() after 1/3 of the
>> original wait duration.
>>
>>    swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
>>                                       rcu_gp_fqs_check_wake(&gf), j);
>>
>> This can result in the !time_after() check returning false, and we will
>> enter the 'else' (stray signal) block?
>>
>> This might not matter for the first 2 fqs loop iterations, where
>> RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
>> from this patch?
>>
>>    if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
>>        (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
>>            ...
>>    } else {
>>            /* Deal with stray signal. */
>>    }
>>
>> So, do we need to move this calculation above the 'if' block which sets
>> rcu_state.jiffies_force_qs?
>>
>>    if (!ret) {
>>            WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
>>            ...
>>    }
>
> Good catch, thank you!  How about the updated patch shown below?
>

Looks good to me.

Thanks
Neeraj

> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 77de092c78f549b5c28075bfee9998a525d21f84
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Tue Apr 12 15:08:14 2022 -0700
>
>     rcu: Decrease FQS scan wait time in case of callback overloading
>
>     The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
>     callback overloading and does an immediate initial scan for idle CPUs
>     if so.  However, subsequent rescans will be carried out at as leisurely a
>     rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
>     module parameter.  It might be tempting to just continue immediately
>     rescanning, but this turns the RCU grace-period kthread into a CPU hog.
>     It might also be tempting to reduce the time between rescans to a single
>     jiffy, but this can be problematic on larger systems.
>
>     This commit therefore divides the normal time between rescans by three,
>     rounding up.  Thus a small system running at HZ=1000 that is suffering
>     from callback overload will wait only one jiffy instead of the normal
>     three between rescans.
>
>     [ paulmck: Apply Neeraj Upadhyay feedback. ]
>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a6..52094e72866e5 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1983,7 +1983,12 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>  	gf = RCU_GP_FLAG_OVLD;
>  	ret = 0;
>  	for (;;) {
> -		if (!ret) {
> +		if (rcu_state.cbovld) {
> +			j = (j + 2) / 3;
> +			if (j <= 0)
> +				j = 1;
> +		}
> +		if (!ret || time_before(jiffies + j, rcu_state.jiffies_force_qs)) {
>  			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
>  			/*
>  			 * jiffies_force_qs before RCU_GP_WAIT_FQS state
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a6..c19d5926886fb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
 			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
 				   jiffies + (j ? 3 * j : 2));
 		}
+		if (rcu_state.cbovld) {
+			j = (j + 2) / 3;
+			if (j <= 0)
+				j = 1;
+		}
 		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 				       TPS("fqswait"));
 		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);
The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
callback overloading and does an immediate initial scan for idle CPUs
if so.  However, subsequent rescans will be carried out at as leisurely a
rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
module parameter.  It might be tempting to just continue immediately
rescanning, but this turns the RCU grace-period kthread into a CPU hog.
It might also be tempting to reduce the time between rescans to a single
jiffy, but this can be problematic on larger systems.

This commit therefore divides the normal time between rescans by three,
rounding up.  Thus a small system running at HZ=1000 that is suffering
from callback overload will wait only one jiffy instead of the normal
three between rescans.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 5 +++++
 1 file changed, 5 insertions(+)
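[Editorial note: the "divide by three, rounding up" arithmetic described above
can be checked with a few lines of plain C; this is not part of the patch.]

#include <stdio.h>

int main(void)
{
	/* j = (j + 2) / 3 rounds up when dividing by three. */
	for (unsigned long j = 1; j <= 6; j++)
		printf("j = %lu -> %lu\n", j, (j + 2) / 3);
	/* Prints 1, 1, 1, 2, 2, 2: the default three-jiffy wait at HZ=1000
	 * becomes a single jiffy under callback overload. */
	return 0;
}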