Message ID | 20230531101736.12981-5-frederic@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | rcu: Support for lazy callbacks on !CONFIG_RCU_NOCB_CPU | expand |
On Wed, May 31, 2023 at 12:17:31PM +0200, Frederic Weisbecker wrote: > The lazy and the regular bypass queues share the same thresholds in > terms of number of callbacks after which a flush to the main list is > performed: 10 000 callbacks. > > However lazy and regular bypass don't have the same purposes and neither > should their respective thresholds: > > * The bypass queue stands for relieving the main queue in case of a > callback storm. It makes sense to allow a high number of callbacks to > pile up before flushing to the main queue, especially as the life > cycle for this queue is very short (1 jiffy). Sure. > > * The lazy queue aims to spare wake ups and reduce the number of grace > periods. There it doesn't make sense to allow a huge number of > callbacks before flushing so as not to introduce memory pressure, > especially as the life cycle for this queue is very long (10 > seconds). It does make sense as we have a shrinker, and it is better to avoid RCU disturbing the system unwantedly when there's lots of memory lying around. > > For those reasons, set the default threshold for the lazy queue to > 100. I am OK with splitting the qhimark, but this lazy default is too low. In typical workloads on ChromeOS, we see 1000s of callback even when CPU utilization is low. So considering that, we would be flushing all the time. Eventually I want the mm subsystem to tell us when flushing is needed so we could flush sooner at that time if really needed, but for now we have a shrinker so it should be OK. Is there a reason the shrinker does not work for you? thanks, - Joel > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > --- > kernel/rcu/tree.c | 2 ++ > kernel/rcu/tree_nocb.h | 9 ++++----- > 2 files changed, 6 insertions(+), 5 deletions(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index bc4e7c9b51cb..9b98d87fa22e 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -379,6 +379,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) > static long blimit = DEFAULT_RCU_BLIMIT; > #define DEFAULT_RCU_QHIMARK 10000 // If this many pending, ignore blimit. > static long qhimark = DEFAULT_RCU_QHIMARK; > +#define DEFAULT_RCU_QHIMARK_LAZY 100 // If this many pending, flush. > +static long qhimark_lazy = DEFAULT_RCU_QHIMARK_LAZY; > #define DEFAULT_RCU_QLOMARK 100 // Once only this many pending, use blimit. > static long qlowmark = DEFAULT_RCU_QLOMARK; > #define DEFAULT_RCU_QOVLD_MULT 2 > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > index 8320eb77b58b..c08447db5a2e 100644 > --- a/kernel/rcu/tree_nocb.h > +++ b/kernel/rcu/tree_nocb.h > @@ -480,10 +480,9 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp, > > // If ->nocb_bypass has been used too long or is too full, > // flush ->nocb_bypass to ->cblist. > - if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) || > - (ncbs && bypass_is_lazy && > - (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush))) || > - ncbs >= qhimark) { > + if (ncbs && > + ((!bypass_is_lazy && ((j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark)) || > + (bypass_is_lazy && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) || ncbs >= qhimark_lazy)))) { > rcu_nocb_lock(rdp); > *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist); > > @@ -724,7 +723,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) > > if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) && > (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) || > - bypass_ncbs > 2 * qhimark)) { > + bypass_ncbs > 2 * qhimark_lazy)) { > flush_bypass = true; > } else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) && > (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) || > -- > 2.40.1 >
Le Sat, Jun 03, 2023 at 01:23:38AM +0000, Joel Fernandes a écrit : > On Wed, May 31, 2023 at 12:17:31PM +0200, Frederic Weisbecker wrote: > > The lazy and the regular bypass queues share the same thresholds in > > terms of number of callbacks after which a flush to the main list is > > performed: 10 000 callbacks. > > > > However lazy and regular bypass don't have the same purposes and neither > > should their respective thresholds: > > > > * The bypass queue stands for relieving the main queue in case of a > > callback storm. It makes sense to allow a high number of callbacks to > > pile up before flushing to the main queue, especially as the life > > cycle for this queue is very short (1 jiffy). > > Sure. > > > > > * The lazy queue aims to spare wake ups and reduce the number of grace > > periods. There it doesn't make sense to allow a huge number of > > callbacks before flushing so as not to introduce memory pressure, > > especially as the life cycle for this queue is very long (10 > > seconds). > > It does make sense as we have a shrinker, and it is better to avoid RCU > disturbing the system unwantedly when there's lots of memory lying around. > > > > > For those reasons, set the default threshold for the lazy queue to > > 100. > > I am OK with splitting the qhimark, but this lazy default is too low. In > typical workloads on ChromeOS, we see 1000s of callback even when CPU > utilization is low. So considering that, we would be flushing all the time. > > Eventually I want the mm subsystem to tell us when flushing is needed so we > could flush sooner at that time if really needed, but for now we have a > shrinker so it should be OK. Is there a reason the shrinker does not work for > you? So you mean dynamically adapting the lazy qhimark on top of shrinker calls, right? That would make sense indeed. Thanks.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index bc4e7c9b51cb..9b98d87fa22e 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -379,6 +379,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) static long blimit = DEFAULT_RCU_BLIMIT; #define DEFAULT_RCU_QHIMARK 10000 // If this many pending, ignore blimit. static long qhimark = DEFAULT_RCU_QHIMARK; +#define DEFAULT_RCU_QHIMARK_LAZY 100 // If this many pending, flush. +static long qhimark_lazy = DEFAULT_RCU_QHIMARK_LAZY; #define DEFAULT_RCU_QLOMARK 100 // Once only this many pending, use blimit. static long qlowmark = DEFAULT_RCU_QLOMARK; #define DEFAULT_RCU_QOVLD_MULT 2 diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index 8320eb77b58b..c08447db5a2e 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -480,10 +480,9 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp, // If ->nocb_bypass has been used too long or is too full, // flush ->nocb_bypass to ->cblist. - if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) || - (ncbs && bypass_is_lazy && - (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush))) || - ncbs >= qhimark) { + if (ncbs && + ((!bypass_is_lazy && ((j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark)) || + (bypass_is_lazy && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) || ncbs >= qhimark_lazy)))) { rcu_nocb_lock(rdp); *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist); @@ -724,7 +723,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) || - bypass_ncbs > 2 * qhimark)) { + bypass_ncbs > 2 * qhimark_lazy)) { flush_bypass = true; } else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
The lazy and the regular bypass queues share the same thresholds in terms of number of callbacks after which a flush to the main list is performed: 10 000 callbacks. However lazy and regular bypass don't have the same purposes and neither should their respective thresholds: * The bypass queue stands for relieving the main queue in case of a callback storm. It makes sense to allow a high number of callbacks to pile up before flushing to the main queue, especially as the life cycle for this queue is very short (1 jiffy). * The lazy queue aims to spare wake ups and reduce the number of grace periods. There it doesn't make sense to allow a huge number of callbacks before flushing so as not to introduce memory pressure, especially as the life cycle for this queue is very long (10 seconds) For those reasons, set the default threshold for the lazy queue to 100. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> --- kernel/rcu/tree.c | 2 ++ kernel/rcu/tree_nocb.h | 9 ++++----- 2 files changed, 6 insertions(+), 5 deletions(-)