[v4,0/4] Reduce synchronize_rcu() latency(v4)

Message ID 20240104162510.72773-1-urezki@gmail.com (mailing list archive)

Message

Uladzislau Rezki (Sony) Jan. 4, 2024, 4:25 p.m. UTC
This is v4 of a series that aims to improve the synchronize_rcu() call.
More specifically, it reduces the waiting time (especially in the worst
case) of a caller that blocks until a grace period has elapsed.

In general, this series separates synchronize_rcu() callers from other
callbacks. We keep a dedicated, independent queue, so its processing
starts as soon as the grace period is over, and there is no need to wait
until other callbacks are processed one by one. Please note that the number
of callbacks can be 10K, 20K, 60K and so on. That is why this series
maintains a separate track for this call, which blocks the calling context.

v3 -> v4:
 - Squash patches;
 - Add more description;
 - Fix comments based on v3 feedback.

v3: https://lore.kernel.org/lkml/cd45b0b5-f86b-43fb-a5f3-47d340cd4f9f@paulmck-laptop/T/
v2: https://lore.kernel.org/all/20231030131254.488186-1-urezki@gmail.com/T/
v1: https://lore.kernel.org/lkml/20231025140915.590390-1-urezki@gmail.com/T/

Neeraj Upadhyay (1):
  rcu: Improve handling of synchronize_rcu() users

Uladzislau Rezki (Sony) (3):
  rcu: Reduce synchronize_rcu() latency
  rcu: Add a trace event for synchronize_rcu_normal()
  rcu: Support direct wake-up of synchronize_rcu() users

 .../admin-guide/kernel-parameters.txt         |  14 +
 include/trace/events/rcu.h                    |  27 ++
 kernel/rcu/Kconfig.debug                      |  12 +
 kernel/rcu/tree.c                             | 361 +++++++++++++++++-
 kernel/rcu/tree.h                             |  19 +
 kernel/rcu/tree_exp.h                         |   2 +-
 6 files changed, 433 insertions(+), 2 deletions(-)

Comments

Paul E. McKenney Jan. 27, 2024, 7:07 a.m. UTC | #1
On Thu, Jan 04, 2024 at 05:25:06PM +0100, Uladzislau Rezki (Sony) wrote:
> This is v4 of a series that aims to improve the synchronize_rcu() call.
> More specifically, it reduces the waiting time (especially in the worst
> case) of a caller that blocks until a grace period has elapsed.
> 
> In general, this series separates synchronize_rcu() callers from other
> callbacks. We keep a dedicated, independent queue, so its processing
> starts as soon as the grace period is over, and there is no need to wait
> until other callbacks are processed one by one. Please note that the number
> of callbacks can be 10K, 20K, 60K and so on. That is why this series
> maintains a separate track for this call, which blocks the calling context.

And before I forget (again), a possible follow-on to this work is to
reduce cond_synchronize_rcu() and cond_synchronize_rcu_full() latency.
Right now, these wait for a full additional grace period (and maybe
more) when the required grace period has not elapsed.  In contrast,
this work might enable waiting only for the needed portion of a grace
period to elapse.

							Thanx, Paul

> v3 -> v4:
>  - Squash patches;
>  - Add more description;
>  - Fix comments based on v3 feedback.
> 
> v3: https://lore.kernel.org/lkml/cd45b0b5-f86b-43fb-a5f3-47d340cd4f9f@paulmck-laptop/T/
> v2: https://lore.kernel.org/all/20231030131254.488186-1-urezki@gmail.com/T/
> v1: https://lore.kernel.org/lkml/20231025140915.590390-1-urezki@gmail.com/T/
> 
> Neeraj Upadhyay (1):
>   rcu: Improve handling of synchronize_rcu() users
> 
> Uladzislau Rezki (Sony) (3):
>   rcu: Reduce synchronize_rcu() latency
>   rcu: Add a trace event for synchronize_rcu_normal()
>   rcu: Support direct wake-up of synchronize_rcu() users
> 
>  .../admin-guide/kernel-parameters.txt         |  14 +
>  include/trace/events/rcu.h                    |  27 ++
>  kernel/rcu/Kconfig.debug                      |  12 +
>  kernel/rcu/tree.c                             | 361 +++++++++++++++++-
>  kernel/rcu/tree.h                             |  19 +
>  kernel/rcu/tree_exp.h                         |   2 +-
>  6 files changed, 433 insertions(+), 2 deletions(-)
> 
> -- 
> 2.39.2
>
Uladzislau Rezki (Sony) Jan. 29, 2024, 4:23 p.m. UTC | #2
On Fri, Jan 26, 2024 at 11:07:18PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 04, 2024 at 05:25:06PM +0100, Uladzislau Rezki (Sony) wrote:
> > This is v4 of a series that aims to improve the synchronize_rcu() call.
> > More specifically, it reduces the waiting time (especially in the worst
> > case) of a caller that blocks until a grace period has elapsed.
> > 
> > In general, this series separates synchronize_rcu() callers from other
> > callbacks. We keep a dedicated, independent queue, so its processing
> > starts as soon as the grace period is over, and there is no need to wait
> > until other callbacks are processed one by one. Please note that the number
> > of callbacks can be 10K, 20K, 60K and so on. That is why this series
> > maintains a separate track for this call, which blocks the calling context.
> 
> And before I forget (again), a possible follow-on to this work is to
> reduce cond_synchronize_rcu() and cond_synchronize_rcu_full() latency.
> Right now, these wait for a full additional grace period (and maybe
> more) when the required grace period has not elapsed.  In contrast,
> this work might enable waiting only for the needed portion of a grace
> period to elapse.
> 
Thanks, I see. We probably also need to move the "sync"-related
functionality out of tree.c into sync.c or a similarly named file,
IMO.

Thanks!

--
Uladzislau Rezki
Paul E. McKenney Jan. 29, 2024, 7:43 p.m. UTC | #3
On Mon, Jan 29, 2024 at 05:23:01PM +0100, Uladzislau Rezki wrote:
> On Fri, Jan 26, 2024 at 11:07:18PM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 04, 2024 at 05:25:06PM +0100, Uladzislau Rezki (Sony) wrote:
> > > This is v4 of a series that aims to improve the synchronize_rcu() call.
> > > More specifically, it reduces the waiting time (especially in the worst
> > > case) of a caller that blocks until a grace period has elapsed.
> > > 
> > > In general, this series separates synchronize_rcu() callers from other
> > > callbacks. We keep a dedicated, independent queue, so its processing
> > > starts as soon as the grace period is over, and there is no need to wait
> > > until other callbacks are processed one by one. Please note that the number
> > > of callbacks can be 10K, 20K, 60K and so on. That is why this series
> > > maintains a separate track for this call, which blocks the calling context.
> > 
> > And before I forget (again), a possible follow-on to this work is to
> > reduce cond_synchronize_rcu() and cond_synchronize_rcu_full() latency.
> > Right now, these wait for a full additional grace period (and maybe
> > more) when the required grace period has not elapsed.  In contrast,
> > this work might enable waiting only for the needed portion of a grace
> > period to elapse.
> > 
> Thanks, I see. We probably also need to move the "sync"-related
> functionality out of tree.c into sync.c or a similarly named file,
> IMO.

I would prioritize moving the kfree_rcu() code out of tree.c quite
a ways over moving out the synchronous-wait code.  ;-)

							Thanx, Paul
Uladzislau Rezki (Sony) Jan. 29, 2024, 8:36 p.m. UTC | #4
On Mon, Jan 29, 2024 at 11:43:43AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 29, 2024 at 05:23:01PM +0100, Uladzislau Rezki wrote:
> > On Fri, Jan 26, 2024 at 11:07:18PM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 04, 2024 at 05:25:06PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > This is v4 of a series that aims to improve the synchronize_rcu() call.
> > > > More specifically, it reduces the waiting time (especially in the worst
> > > > case) of a caller that blocks until a grace period has elapsed.
> > > > 
> > > > In general, this series separates synchronize_rcu() callers from other
> > > > callbacks. We keep a dedicated, independent queue, so its processing
> > > > starts as soon as the grace period is over, and there is no need to wait
> > > > until other callbacks are processed one by one. Please note that the number
> > > > of callbacks can be 10K, 20K, 60K and so on. That is why this series
> > > > maintains a separate track for this call, which blocks the calling context.
> > > 
> > > And before I forget (again), a possible follow-on to this work is to
> > > reduce cond_synchronize_rcu() and cond_synchronize_rcu_full() latency.
> > > Right now, these wait for a full additional grace period (and maybe
> > > more) when the required grace period has not elapsed.  In contrast,
> > > this work might enable waiting only for the needed portion of a grace
> > > period to elapse.
> > > 
> > Thanks, I see. We probably also need to move the "sync"-related
> > functionality out of tree.c into sync.c or a similarly named file,
> > IMO.
> 
> I would prioritize moving the kfree_rcu() code out of tree.c quite
> a ways over moving out the synchronous-wait code.  ;-)
> 
Indeed. But I am not talking about priority :)

--
Uladzislau Rezki