
[4/9] rcu: Introduce lazy queue's own qhimark

Message ID 20230531101736.12981-5-frederic@kernel.org (mailing list archive)
State New, archived
Series rcu: Support for lazy callbacks on !CONFIG_RCU_NOCB_CPU

Commit Message

Frederic Weisbecker May 31, 2023, 10:17 a.m. UTC
The lazy and the regular bypass queues share the same threshold for the
number of callbacks after which a flush to the main list is performed:
10,000 callbacks.

However, the lazy and regular bypass queues don't serve the same purpose,
and neither should their respective thresholds:

* The bypass queue exists to relieve the main queue in case of a
  callback storm. It makes sense to allow a high number of callbacks to
  pile up before flushing to the main queue, especially as the life
  cycle for this queue is very short (1 jiffy).

* The lazy queue aims to spare wakeups and reduce the number of grace
  periods. There it doesn't make sense to allow a huge number of
  callbacks to pile up before flushing, so as not to introduce memory
  pressure, especially as the life cycle of this queue is very long (10
  seconds).

For those reasons, set the default threshold for the lazy queue to
100.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c      | 2 ++
 kernel/rcu/tree_nocb.h | 9 ++++-----
 2 files changed, 6 insertions(+), 5 deletions(-)
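
For readers parsing the rcu_nocb_try_bypass() hunk below, here is a
simplified restatement of the resulting flush rules as a standalone
helper. It is illustrative only, not part of the patch; it merely
re-expresses the condition using the names introduced by the patch.

/*
 * Illustration (not part of the patch): a non-lazy bypass is flushed
 * once it is older than one jiffy or holds qhimark (default 10000)
 * callbacks; a lazy bypass is flushed once it is older than
 * jiffies_lazy_flush or holds qhimark_lazy (default 100) callbacks.
 */
static bool bypass_needs_flush(struct rcu_data *rdp, bool bypass_is_lazy,
			       long ncbs, unsigned long j)
{
	unsigned long first = READ_ONCE(rdp->nocb_bypass_first);

	if (!ncbs)
		return false;
	if (!bypass_is_lazy)
		return j != first || ncbs >= qhimark;
	return time_after(j, first + jiffies_lazy_flush) || ncbs >= qhimark_lazy;
}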

Comments

Joel Fernandes June 3, 2023, 1:23 a.m. UTC | #1
On Wed, May 31, 2023 at 12:17:31PM +0200, Frederic Weisbecker wrote:
> The lazy and the regular bypass queues share the same thresholds in
> terms of number of callbacks after which a flush to the main list is
> performed: 10 000 callbacks.
> 
> However lazy and regular bypass don't have the same purposes and neither
> should their respective thresholds:
> 
> * The bypass queue stands for relieving the main queue in case of a
>   callback storm. It makes sense to allow a high number of callbacks to
>   pile up before flushing to the main queue, especially as the life
>   cycle for this queue is very short (1 jiffy).

Sure.

> 
> * The lazy queue aims to spare wake ups and reduce the number of grace
>   periods. There it doesn't make sense to allow a huge number of
>   callbacks before flushing so as not to introduce memory pressure,
>   especially as the life cycle for this queue is very long (10
>   seconds).

It does make sense, as we have a shrinker, and it is better to avoid RCU
disturbing the system needlessly when there's lots of memory lying around.

> 
> For those reasons, set the default threshold for the lazy queue to
> 100.

I am OK with splitting the qhimark, but this lazy default is too low. In
typical workloads on ChromeOS, we see 1000s of callbacks even when CPU
utilization is low. So considering that, we would be flushing all the time.

Eventually I want the mm subsystem to tell us when flushing is needed so we
could flush sooner at that time if really needed, but for now we have a
shrinker so it should be OK. Is there a reason the shrinker does not work for
you?

thanks,

 - Joel


> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree.c      | 2 ++
>  kernel/rcu/tree_nocb.h | 9 ++++-----
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index bc4e7c9b51cb..9b98d87fa22e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -379,6 +379,8 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>  static long blimit = DEFAULT_RCU_BLIMIT;
>  #define DEFAULT_RCU_QHIMARK 10000 // If this many pending, ignore blimit.
>  static long qhimark = DEFAULT_RCU_QHIMARK;
> +#define DEFAULT_RCU_QHIMARK_LAZY 100 // If this many pending, flush.
> +static long qhimark_lazy = DEFAULT_RCU_QHIMARK_LAZY;
>  #define DEFAULT_RCU_QLOMARK 100   // Once only this many pending, use blimit.
>  static long qlowmark = DEFAULT_RCU_QLOMARK;
>  #define DEFAULT_RCU_QOVLD_MULT 2
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 8320eb77b58b..c08447db5a2e 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -480,10 +480,9 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  
>  	// If ->nocb_bypass has been used too long or is too full,
>  	// flush ->nocb_bypass to ->cblist.
> -	if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) ||
> -	    (ncbs &&  bypass_is_lazy &&
> -	     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush))) ||
> -	    ncbs >= qhimark) {
> +	if (ncbs &&
> +	    ((!bypass_is_lazy && ((j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark)) ||
> +	     (bypass_is_lazy && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) || ncbs >= qhimark_lazy)))) {
>  		rcu_nocb_lock(rdp);
>  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>  
> @@ -724,7 +723,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  
>  		if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) &&
>  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) ||
> -		     bypass_ncbs > 2 * qhimark)) {
> +		     bypass_ncbs > 2 * qhimark_lazy)) {
>  			flush_bypass = true;
>  		} else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
>  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> -- 
> 2.40.1
>
Frederic Weisbecker July 5, 2023, 12:45 p.m. UTC | #2
On Sat, Jun 03, 2023 at 01:23:38AM +0000, Joel Fernandes wrote:
> On Wed, May 31, 2023 at 12:17:31PM +0200, Frederic Weisbecker wrote:
> > The lazy and the regular bypass queues share the same thresholds in
> > terms of number of callbacks after which a flush to the main list is
> > performed: 10 000 callbacks.
> > 
> > However lazy and regular bypass don't have the same purposes and neither
> > should their respective thresholds:
> > 
> > * The bypass queue stands for relieving the main queue in case of a
> >   callback storm. It makes sense to allow a high number of callbacks to
> >   pile up before flushing to the main queue, especially as the life
> >   cycle for this queue is very short (1 jiffy).
> 
> Sure.
> 
> > 
> > * The lazy queue aims to spare wake ups and reduce the number of grace
> >   periods. There it doesn't make sense to allow a huge number of
> >   callbacks before flushing so as not to introduce memory pressure,
> >   especially as the life cycle for this queue is very long (10
> >   seconds).
> 
> It does make sense as we have a shrinker, and it is better to avoid RCU
> disturbing the system unwantedly when there's lots of memory lying around.
> 
> > 
> > For those reasons, set the default threshold for the lazy queue to
> > 100.
> 
> I am OK with splitting the qhimark, but this lazy default is too low. In
> typical workloads on ChromeOS, we see 1000s of callback even when CPU
> utilization is low. So considering that, we would be flushing all the time.
> 
> Eventually I want the mm subsystem to tell us when flushing is needed so we
> could flush sooner at that time if really needed, but for now we have a
> shrinker so it should be OK. Is there a reason the shrinker does not work for
> you?

So you mean dynamically adapting the lazy qhimark on top of shrinker calls,
right? That would make sense indeed.

Thanks.
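
To make that closing question concrete, below is a purely hypothetical
sketch of what "adapting the lazy qhimark on top of shrinker calls"
could look like. None of these helpers exist in the kernel or in this
series; the idea is simply to lower the lazy threshold while the
shrinker signals memory pressure and restore it afterwards.

/*
 * Hypothetical sketch only: tighten the lazy threshold under memory
 * pressure so lazy callbacks are flushed to the main list sooner,
 * then restore the default once pressure eases.
 */
static void rcu_lazy_note_memory_pressure(void)
{
	WRITE_ONCE(qhimark_lazy, max_t(long, 1, DEFAULT_RCU_QHIMARK_LAZY / 10));
}

static void rcu_lazy_memory_pressure_eased(void)
{
	WRITE_ONCE(qhimark_lazy, DEFAULT_RCU_QHIMARK_LAZY);
}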

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bc4e7c9b51cb..9b98d87fa22e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -379,6 +379,8 @@  static int rcu_is_cpu_rrupt_from_idle(void)
 static long blimit = DEFAULT_RCU_BLIMIT;
 #define DEFAULT_RCU_QHIMARK 10000 // If this many pending, ignore blimit.
 static long qhimark = DEFAULT_RCU_QHIMARK;
+#define DEFAULT_RCU_QHIMARK_LAZY 100 // If this many pending, flush.
+static long qhimark_lazy = DEFAULT_RCU_QHIMARK_LAZY;
 #define DEFAULT_RCU_QLOMARK 100   // Once only this many pending, use blimit.
 static long qlowmark = DEFAULT_RCU_QLOMARK;
 #define DEFAULT_RCU_QOVLD_MULT 2
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 8320eb77b58b..c08447db5a2e 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -480,10 +480,9 @@  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 
 	// If ->nocb_bypass has been used too long or is too full,
 	// flush ->nocb_bypass to ->cblist.
-	if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) ||
-	    (ncbs &&  bypass_is_lazy &&
-	     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush))) ||
-	    ncbs >= qhimark) {
+	if (ncbs &&
+	    ((!bypass_is_lazy && ((j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark)) ||
+	     (bypass_is_lazy && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) || ncbs >= qhimark_lazy)))) {
 		rcu_nocb_lock(rdp);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 
@@ -724,7 +723,7 @@  static void nocb_gp_wait(struct rcu_data *my_rdp)
 
 		if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) ||
-		     bypass_ncbs > 2 * qhimark)) {
+		     bypass_ncbs > 2 * qhimark_lazy)) {
 			flush_bypass = true;
 		} else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
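
A closing note on the nocb_gp_wait() hunk above: with the defaults from
tree.c, an all-lazy bypass (lazy_ncbs == bypass_ncbs) is now flushed by
the nocb GP kthread once it holds more than 2 * qhimark_lazy = 200
callbacks (previously 2 * qhimark = 20,000), in addition to the existing
jiffies_lazy_flush timeout, while the enqueue-side check in
rcu_nocb_try_bypass() flushes at qhimark_lazy = 100.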