[v8,01/13] rcu: Fix missing nocb gp wake on rcu_barrier()

Message ID: 20221011180142.2742289-2-joel@joelfernandes.org
State: Superseded
Series: rcu: call_rcu() power improvements

Commit Message

Joel Fernandes Oct. 11, 2022, 6:01 p.m. UTC
From: Frederic Weisbecker <frederic@kernel.org>

Upon entraining a callback to a NOCB CPU, no further wake up is
issued on the corresponding nocb_gp kthread. As a result, the callback
and all the subsequent ones on that CPU may be ignored, at least until
an RCU_NOCB_WAKE_FORCE timer is eventually armed or another NOCB CPU belonging
to the same group enqueues a callback on an empty queue.

Here is a possible bad scenario:

1) CPU 0 is NOCB unlike all other CPUs.
2) CPU 0 queues a callback
3) The grace period related to that callback elapses
4) The callback is moved to the done list (but is not invoked yet),
   there are no more pending callbacks for CPU 0
5) CPU 1 calls rcu_barrier() and sends an IPI to CPU 0
6) CPU 0 entrains the callback but doesn't wake up nocb_gp
7) CPU 1 blocks forever, unless CPU 0 ever queues enough further
   callbacks to arm an RCU_NOCB_WAKE_FORCE timer.

Make sure the necessary wake up is produced whenever necessary.

This is also required to make sure lazy callbacks in future patches
don't end up making rcu_barrier() wait for multiple seconds.

Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c      | 6 ++++++
 kernel/rcu/tree.h      | 1 +
 kernel/rcu/tree_nocb.h | 5 +++++
 3 files changed, 12 insertions(+)
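
In short, the fix brackets the bypass flush and the entraining of
rdp->barrier_head with a pending-callbacks check, and issues the wakeup
only once the nocb lock has been dropped. A condensed view of the
pattern (helper names as in the diff below; not a standalone excerpt):

	rcu_nocb_lock(rdp);
	was_alldone = rcu_rdp_is_offloaded(rdp) &&
		      !rcu_segcblist_pend_cbs(&rdp->cblist);
	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
	/* rcu_segcblist_entrain() queues rdp->barrier_head here. */
	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
	rcu_nocb_unlock(rdp);
	if (wake_nocb)
		wake_nocb_gp(rdp, false);	/* wake only after unlock */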

Comments

Paul E. McKenney Oct. 14, 2022, 2:21 p.m. UTC | #1
On Tue, Oct 11, 2022 at 06:01:30PM +0000, Joel Fernandes (Google) wrote:
> From: Frederic Weisbecker <frederic@kernel.org>
> 
> Upon entraining a callback to a NOCB CPU, no further wake up is
> issued on the corresponding nocb_gp kthread. As a result, the callback
> and all the subsequent ones on that CPU may be ignored, at least until
> an RCU_NOCB_WAKE_FORCE timer is eventually armed or another NOCB CPU belonging
> to the same group enqueues a callback on an empty queue.
> 
> Here is a possible bad scenario:
> 
> 1) CPU 0 is NOCB unlike all other CPUs.
> 2) CPU 0 queues a callback

Call it CB1.

> 3) The grace period related to that callback elapses
> 4) The callback is moved to the done list (but is not invoked yet),
>    there are no more pending callbacks for CPU 0

So CB1 is on ->cblist waiting to be invoked, correct?

> 5) CPU 1 calls rcu_barrier() and sends an IPI to CPU 0
> 6) CPU 0 entrains the callback but doesn't wake up nocb_gp

And CB1 must still be there because otherwise the IPI handler would not
have entrained the callback, correct?  If so, we have both CB1 and the
rcu_barrier() callback (call it CB2) in ->cblist, but on the done list.

> 7) CPU 1 blocks forever, unless CPU 0 ever queues enough further
>    callbacks to arm an RCU_NOCB_WAKE_FORCE timer.

Except that -something- must have already been prepared to wake up in
order to invoke CB1.  And that something would invoke CB2 along with CB1,
given that they are both on the done list.  If there is no such wakeup
already, then the hang could occur with just CB1, without the help of CB2.
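
To make that ride-along concrete, here is a toy userspace model (plain
C, not kernel code) of a callback list whose done segment holds CB1;
entraining CB2 behind it means the single wakeup already owed for CB1
covers CB2 as well:

	#include <stdio.h>

	struct cb { const char *name; struct cb *next; };

	int main(void)
	{
		struct cb cb2 = { "CB2 (rcu_barrier head)", NULL };
		struct cb cb1 = { "CB1", &cb2 };  /* CB2 entrained behind CB1 */
		struct cb *done = &cb1;           /* done segment: GP elapsed */

		/* Whoever is woken to invoke CB1 drains the whole done
		 * segment, so CB2 needs no wakeup of its own. */
		for (struct cb *c = done; c; c = c->next)
			printf("invoking %s\n", c->name);
		return 0;
	}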

> Make sure the necessary wake up is produced whenever necessary.

I am not seeing that the wakeup is needed in this case.

So what am I missing here?

> This is also required to make sure lazy callbacks in future patches
> don't end up making rcu_barrier() wait for multiple seconds.

But I do see that the wakeup is needed in the lazy case, and if I remember
correctly, the ten-second rcu_barrier() delay really did happen.  If I
understand correctly, for this to happen, all of the callbacks must be
in the bypass list, that is, ->cblist must be empty.

So has the scenario steps 1-7 called out above actually happened in the
absence of lazy callbacks?
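
For contrast, a sketch of the lazy case being asked about (again a toy
model, not kernel code): every callback sits on the bypass list and
->cblist is empty, so nobody is owed a wakeup until the entrain-time
flush suddenly creates pending work:

	#include <stdbool.h>
	#include <stdio.h>

	int main(void)
	{
		int cblist = 0;    /* pending CBs on ->cblist: none    */
		int bypass = 3;    /* lazy CBs parked on ->nocb_bypass */
		bool was_alldone = (cblist == 0);

		cblist += bypass;  /* rcu_barrier_entrain() flushes the bypass */
		bypass = 0;
		cblist += 1;       /* ...and entrains the barrier callback     */

		if (was_alldone && cblist > 0)
			printf("wake nocb_gp now rather than waiting for the lazy timer\n");
		return 0;
	}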

							Thanx, Paul

> Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c      | 6 ++++++
>  kernel/rcu/tree.h      | 1 +
>  kernel/rcu/tree_nocb.h | 5 +++++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 5ec97e3f7468..dc1c502216c7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3894,6 +3894,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  {
>  	unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
>  	unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
> +	bool wake_nocb = false;
> +	bool was_alldone = false;
>  
>  	lockdep_assert_held(&rcu_state.barrier_lock);
>  	if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
> @@ -3902,6 +3904,7 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  	rdp->barrier_head.func = rcu_barrier_callback;
>  	debug_rcu_head_queue(&rdp->barrier_head);
>  	rcu_nocb_lock(rdp);
> +	was_alldone = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
>  	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
>  	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
>  		atomic_inc(&rcu_state.barrier_cpu_count);
> @@ -3909,7 +3912,10 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  		debug_rcu_head_unqueue(&rdp->barrier_head);
>  		rcu_barrier_trace(TPS("IRQNQ"), -1, rcu_state.barrier_sequence);
>  	}
> +	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
>  	rcu_nocb_unlock(rdp);
> +	if (wake_nocb)
> +		wake_nocb_gp(rdp, false);
>  	smp_store_release(&rdp->barrier_seq_snap, gseq);
>  }
>  
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index d4a97e40ea9c..925dd98f8b23 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -439,6 +439,7 @@ static void zero_cpu_stall_ticks(struct rcu_data *rdp);
>  static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
>  static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
>  static void rcu_init_one_nocb(struct rcu_node *rnp);
> +static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  				  unsigned long j);
>  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index f77a6d7e1356..094fd454b6c3 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1558,6 +1558,11 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
>  {
>  }
>  
> +static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
> +{
> +	return false;
> +}
> +
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  				  unsigned long j)
>  {
> -- 
> 2.38.0.rc1.362.ged0d419d3c-goog
>
Frederic Weisbecker Oct. 14, 2022, 2:40 p.m. UTC | #2
On Fri, Oct 14, 2022 at 07:21:27AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 11, 2022 at 06:01:30PM +0000, Joel Fernandes (Google) wrote:
> > From: Frederic Weisbecker <frederic@kernel.org>
> > 
> > Upon entraining a callback to a NOCB CPU, no further wake up is
> > issued on the corresponding nocb_gp kthread. As a result, the callback
> > and all the subsequent ones on that CPU may be ignored, at least until
> > an RCU_NOCB_WAKE_FORCE timer is eventually armed or another NOCB CPU belonging
> > to the same group enqueues a callback on an empty queue.
> > 
> > Here is a possible bad scenario:
> > 
> > 1) CPU 0 is NOCB unlike all other CPUs.
> > 2) CPU 0 queues a callback
> 
> Call it CB1.
> 
> > 3) The grace period related to that callback elapses
> > 4) The callback is moved to the done list (but is not invoked yet),
> >    there are no more pending callbacks for CPU 0
> 
> So CB1 is on ->cblist waiting to be invoked, correct?
> 
> > 5) CPU 1 calls rcu_barrier() and sends an IPI to CPU 0
> > 6) CPU 0 entrains the callback but doesn't wake up nocb_gp
> 
> And CB1 must still be there because otherwise the IPI handler would not
> have entrained the callback, correct?  If so, we have both CB1 and the
> rcu_barrier() callback (call it CB2) in ->cblist, but on the done list.
> 
> > 7) CPU 1 blocks forever, unless CPU 0 ever queues enough further
> >    callbacks to arm an RCU_NOCB_WAKE_FORCE timer.
> 
> Except that -something- must have already been prepared to wake up in
> order to invoke CB1.  And that something would invoke CB2 along with CB1,
> given that they are both on the done list.  If there is no such wakeup
> already, then the hang could occur with just CB1, without the help of CB2.

Heh good point. I was confused with CB1 on RCU_DONE_TAIL and the possibility
for CB2 to be entrained on RCU_WAIT_TAIL. But that's indeed not supposed to
happen. Ok so this patch indeed doesn't make sense outside lazy.
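
For the record, the reason CB2 cannot land in a later segment than CB1:
rcu_segcblist_entrain() appends to the last nonempty segment. A toy
model of that placement rule (illustrative only, not the kernel's
actual code):

	#include <stdio.h>

	enum { DONE, WAIT, NEXT_READY, NEXT, NSEGS };

	int main(void)
	{
		int seglen[NSEGS] = { 1, 0, 0, 0 };  /* only CB1, in DONE */
		int i;

		for (i = NEXT; i > DONE; i--)  /* find last nonempty segment */
			if (seglen[i])
				break;
		seglen[i]++;                   /* CB2 is entrained there */
		printf("CB2 lands in segment %d (0 == DONE)\n", i);
		return 0;
	}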

> > This is also required to make sure lazy callbacks in future patches
> > don't end up making rcu_barrier() wait for multiple seconds.
> 
> But I do see that the wakeup is needed in the lazy case, and if I remember
> correctly, the ten-second rcu_barrier() delay really did happen.  If I
> understand correctly, for this to happen, all of the callbacks must be
> in the bypass list, that is, ->cblist must be empty.
> 
> So has the scenario steps 1-7 called out above actually happened in the
> absence of lazy callbacks?

Nope, so I guess we can have the pending check around rcu_nocb_flush_bypass()
only...

Thanks!
Paul E. McKenney Oct. 14, 2022, 3:03 p.m. UTC | #3
On Fri, Oct 14, 2022 at 04:40:19PM +0200, Frederic Weisbecker wrote:
> On Fri, Oct 14, 2022 at 07:21:27AM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 11, 2022 at 06:01:30PM +0000, Joel Fernandes (Google) wrote:
> > > From: Frederic Weisbecker <frederic@kernel.org>
> > > 
> > > Upon entraining a callback to a NOCB CPU, no further wake up is
> > > issued on the corresponding nocb_gp kthread. As a result, the callback
> > > and all the subsequent ones on that CPU may be ignored, at least until
> > > an RCU_NOCB_WAKE_FORCE timer is eventually armed or another NOCB CPU belonging
> > > to the same group enqueues a callback on an empty queue.
> > > 
> > > Here is a possible bad scenario:
> > > 
> > > 1) CPU 0 is NOCB unlike all other CPUs.
> > > 2) CPU 0 queues a callback
> > 
> > Call it CB1.
> > 
> > > 3) The grace period related to that callback elapses
> > > 4) The callback is moved to the done list (but is not invoked yet),
> > >    there are no more pending callbacks for CPU 0
> > 
> > So CB1 is on ->cblist waiting to be invoked, correct?
> > 
> > > 5) CPU 1 calls rcu_barrier() and sends an IPI to CPU 0
> > > 6) CPU 0 entrains the callback but doesn't wake up nocb_gp
> > 
> > And CB1 must still be there because otherwise the IPI handler would not
> > have entrained the callback, correct?  If so, we have both CB1 and the
> > rcu_barrier() callback (call it CB2) in ->cblist, but on the done list.
> > 
> > > 7) CPU 1 blocks forever, unless CPU 0 ever queues enough further
> > >    callbacks to arm an RCU_NOCB_WAKE_FORCE timer.
> > 
> > Except that -something- must have already been prepared to wake up in
> > order to invoke CB1.  And that something would invoke CB2 along with CB1,
> > given that they are both on the done list.  If there is no such wakeup
> > already, then the hang could occur with just CB1, without the help of CB2.
> 
> Heh good point. I was confused with CB1 on RCU_DONE_TAIL and the possibility
> for CB2 to be entrained on RCU_WAIT_TAIL. But that's indeed not supposed to
> happen. Ok so this patch indeed doesn't make sense outside lazy.

Whew!!!  ;-)

> > > This is also required to make sure lazy callbacks in future patches
> > > don't end up making rcu_barrier() wait for multiple seconds.
> > 
> > But I do see that the wakeup is needed in the lazy case, and if I remember
> > correctly, the ten-second rcu_barrier() delay really did happen.  If I
> > understand correctly, for this to happen, all of the callbacks must be
> > in the bypass list, that is, ->cblist must be empty.
> > 
> > So has the scenario steps 1-7 called out above actually happened in the
> > absence of lazy callbacks?
> 
> Nope, so I guess we can have the pending check around rcu_nocb_flush_bypass()
> only...

OK, sounds good.

I have put this series on branch lazy.2022.10.14a and am testing it.

							Thanx, Paul
Joel Fernandes Oct. 14, 2022, 3:19 p.m. UTC | #4
On Fri, Oct 14, 2022 at 11:03 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Oct 14, 2022 at 04:40:19PM +0200, Frederic Weisbecker wrote:
> > On Fri, Oct 14, 2022 at 07:21:27AM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 11, 2022 at 06:01:30PM +0000, Joel Fernandes (Google) wrote:
> > > > From: Frederic Weisbecker <frederic@kernel.org>
> > > >
> > > > Upon entraining a callback to a NOCB CPU, no further wake up is
> > > > issued on the corresponding nocb_gp kthread. As a result, the callback
> > > > and all the subsequent ones on that CPU may be ignored, at least until
> > > > an RCU_NOCB_WAKE_FORCE timer is eventually armed or another NOCB CPU belonging
> > > > to the same group enqueues a callback on an empty queue.
> > > >
> > > > Here is a possible bad scenario:
> > > >
> > > > 1) CPU 0 is NOCB unlike all other CPUs.
> > > > 2) CPU 0 queues a callback
> > >
> > > Call it CB1.
> > >
> > > > 3) The grace period related to that callback elapses
> > > > 4) The callback is moved to the done list (but is not invoked yet),
> > > >    there are no more pending callbacks for CPU 0
> > >
> > > So CB1 is on ->cblist waiting to be invoked, correct?
> > >
> > > > 5) CPU 1 calls rcu_barrier() and sends an IPI to CPU 0
> > > > 6) CPU 0 entrains the callback but doesn't wake up nocb_gp
> > >
> > > And CB1 must still be there because otherwise the IPI handler would not
> > > have entrained the callback, correct?  If so, we have both CB1 and the
> > > rcu_barrier() callback (call it CB2) in ->cblist, but on the done list.
> > >
> > > > 7) CPU 1 blocks forever, unless CPU 0 ever queues enough further
> > > >    callbacks to arm an RCU_NOCB_WAKE_FORCE timer.
> > >
> > > Except that -something- must have already been prepared to wake up in
> > > order to invoke CB1.  And that something would invoke CB2 along with CB1,
> > > given that they are both on the done list.  If there is no such wakeup
> > > already, then the hang could occur with just CB1, without the help of CB2.
> >
> > Heh good point. I was confused with CB1 on RCU_DONE_TAIL and the possibility
> > for CB2 to be entrained on RCU_WAIT_TAIL. But that's indeed not supposed to
> > happen. Ok so this patch indeed doesn't make sense outside lazy.
>
> Whew!!!  ;-)
>
> > > > This is also required to make sure lazy callbacks in future patches
> > > > don't end up making rcu_barrier() wait for multiple seconds.
> > >
> > > But I do see that the wakeup is needed in the lazy case, and if I remember
> > > correctly, the ten-second rcu_barrier() delay really did happen.  If I

Yes, it did happen. Real-world device testing confirmed it.

> > > understand correctly, for this to happen, all of the callbacks must be
> > > in the bypass list, that is, ->cblist must be empty.
> > >
> > > So has the scenario steps 1-7 called out above actually happened in the
> > > absence of lazy callbacks?
> >
> > Nope, so I guess we can have the pending check around rcu_nocb_flush_bypass()
> > only...
>
> OK, sounds good.
>
> I have put this series on branch lazy.2022.10.14a and am testing it.

I agree with the discussion, though if all CBs are in the bypass list,
the patch will also save 2 jiffies.
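
(For reference, the two jiffies come from the deferred bypass wakeup;
as I recall, wake_nocb_gp_defer() arms the timer roughly as below --
treat the exact code as an assumption rather than a quote:

	if (waketype == RCU_NOCB_WAKE_BYPASS)
		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);  /* the 2 jiffies */
	else
		mod_timer(&rdp_gp->nocb_timer, jiffies + 1);

so flushing and waking immediately skips that delay.)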

So just commit messages that need rework then? This one can be taken instead:
https://lore.kernel.org/rcu/21ECDA9F-81B1-4D22-8B03-020FB5DADA4F@joelfernandes.org/T/#m14d21fbce23539a521693a4184b28ddc55d7d2c5

Thanks!

 - Joel
Paul E. McKenney Oct. 14, 2022, 3:46 p.m. UTC | #5
On Fri, Oct 14, 2022 at 11:19:28AM -0400, Joel Fernandes wrote:
> On Fri, Oct 14, 2022 at 11:03 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Oct 14, 2022 at 04:40:19PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Oct 14, 2022 at 07:21:27AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Oct 11, 2022 at 06:01:30PM +0000, Joel Fernandes (Google) wrote:
> > > > > From: Frederic Weisbecker <frederic@kernel.org>
> > > > >
> > > > > Upon entraining a callback to a NOCB CPU, no further wake up is
> > > > > issued on the corresponding nocb_gp kthread. As a result, the callback
> > > > > and all the subsequent ones on that CPU may be ignored, at least until
> > > > > an RCU_NOCB_WAKE_FORCE timer is eventually armed or another NOCB CPU belonging
> > > > > to the same group enqueues a callback on an empty queue.
> > > > >
> > > > > Here is a possible bad scenario:
> > > > >
> > > > > 1) CPU 0 is NOCB unlike all other CPUs.
> > > > > 2) CPU 0 queues a callback
> > > >
> > > > Call it CB1.
> > > >
> > > > > 3) The grace period related to that callback elapses
> > > > > 4) The callback is moved to the done list (but is not invoked yet),
> > > > >    there are no more pending callbacks for CPU 0
> > > >
> > > > So CB1 is on ->cblist waiting to be invoked, correct?
> > > >
> > > > > 5) CPU 1 calls rcu_barrier() and sends an IPI to CPU 0
> > > > > 6) CPU 0 entrains the callback but doesn't wake up nocb_gp
> > > >
> > > > And CB1 must still be there because otherwise the IPI handler would not
> > > > have entrained the callback, correct?  If so, we have both CB1 and the
> > > > rcu_barrier() callback (call it CB2) in ->cblist, but on the done list.
> > > >
> > > > > 7) CPU 1 blocks forever, unless CPU 0 ever queues enough further
> > > > >    callbacks to arm an RCU_NOCB_WAKE_FORCE timer.
> > > >
> > > > Except that -something- must have already been prepared to wake up in
> > > > order to invoke CB1.  And that something would invoke CB2 along with CB1,
> > > > given that they are both on the done list.  If there is no such wakeup
> > > > already, then the hang could occur with just CB1, without the help of CB2.
> > >
> > > Heh good point. I was confused with CB1 on RCU_DONE_TAIL and the possibility
> > > for CB2 to be entrained on RCU_WAIT_TAIL. But that's indeed not supposed to
> > > happen. Ok so this patch indeed doesn't make sense outside lazy.
> >
> > Whew!!!  ;-)
> >
> > > > > This is also required to make sure lazy callbacks in future patches
> > > > > don't end up making rcu_barrier() wait for multiple seconds.
> > > >
> > > > But I do see that the wakeup is needed in the lazy case, and if I remember
> > > > correctly, the ten-second rcu_barrier() delay really did happen.  If I
> 
> Yes, it did happen. Real-world device testing confirmed it.

Very good, thank you!

> > > > understand correctly, for this to happen, all of the callbacks must be
> > > > in the bypass list, that is, ->cblist must be empty.
> > > >
> > > > So has the scenario steps 1-7 called out above actually happened in the
> > > > absence of lazy callbacks?
> > >
> > > Nope, so I guess we can have the pending check around rcu_nocb_flush_bypass()
> > > only...
> >
> > OK, sounds good.
> >
> > I have put this series on branch lazy.2022.10.14a and am testing it.
> 
> I agree with the discussion, though if all CBs are in the bypass list,
> the patch will also save 2 jiffies.
> 
> So just commit messages that need rework then? This one can be taken instead:
> https://lore.kernel.org/rcu/21ECDA9F-81B1-4D22-8B03-020FB5DADA4F@joelfernandes.org/T/#m14d21fbce23539a521693a4184b28ddc55d7d2c5

This one looks plausible to me.

							Thanx, Paul
Frederic Weisbecker Oct. 14, 2022, 8:47 p.m. UTC | #6
On Fri, Oct 14, 2022 at 08:46:06AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 14, 2022 at 11:19:28AM -0400, Joel Fernandes wrote:
> > I agree with the discussion, though if all CBs are in the bypass list,
> > the patch will also save 2 jiffies.
> > 
> > So just commit messages that need rework then? This one can be taken instead:
> > https://lore.kernel.org/rcu/21ECDA9F-81B1-4D22-8B03-020FB5DADA4F@joelfernandes.org/T/#m14d21fbce23539a521693a4184b28ddc55d7d2c5
> 
> This one looks plausible to me.

With the following modified diff (passed 25 hours of TREE01):

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 96d678c9cfb6..7f1f6f792240 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3914,6 +3914,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 {
 	unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
 	unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
+	bool wake_nocb = false;
+	bool was_alldone = false;
 
 	lockdep_assert_held(&rcu_state.barrier_lock);
 	if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
@@ -3922,7 +3924,14 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 	rdp->barrier_head.func = rcu_barrier_callback;
 	debug_rcu_head_queue(&rdp->barrier_head);
 	rcu_nocb_lock(rdp);
+	/*
+	 * Flush bypass and wakeup rcuog if we add callbacks to an empty regular
+	 * queue. This way we don't wait for bypass timer that can reach seconds
+	 * if it's fully lazy.
+	 */
+	was_alldone = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
 	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
 	} else {
@@ -3930,6 +3939,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 		rcu_barrier_trace(TPS("IRQNQ"), -1, rcu_state.barrier_sequence);
 	}
 	rcu_nocb_unlock(rdp);
+	if (wake_nocb)
+		wake_nocb_gp(rdp, false);
 	smp_store_release(&rdp->barrier_seq_snap, gseq);
 }
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index d4a97e40ea9c..925dd98f8b23 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -439,6 +439,7 @@ static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j);
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f77a6d7e1356..094fd454b6c3 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1558,6 +1558,11 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
 {
 }
 
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
+{
+	return false;
+}
+
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j)
 {
Paul E. McKenney Oct. 16, 2022, 3:16 p.m. UTC | #7
On Fri, Oct 14, 2022 at 10:47:50PM +0200, Frederic Weisbecker wrote:
> On Fri, Oct 14, 2022 at 08:46:06AM -0700, Paul E. McKenney wrote:
> > On Fri, Oct 14, 2022 at 11:19:28AM -0400, Joel Fernandes wrote:
> > > I agree with the discussion, though if all CBs are in the bypass list,
> > > the patch will also save 2 jiffies.
> > > 
> > > So just commit messages that need rework then? This one can be taken instead:
> > > https://lore.kernel.org/rcu/21ECDA9F-81B1-4D22-8B03-020FB5DADA4F@joelfernandes.org/T/#m14d21fbce23539a521693a4184b28ddc55d7d2c5
> > 
> > This one looks plausible to me.
> 
> With the following modified diff (passed 25 hours of TREE01):

Very good!

Could one of you (presumably Joel) please send v9?

Just to avoid me getting the wrong patch in the wrong place or similar.
Or mangling the required rebase following a pull (otherwise, as soon as I
create branches, Stephen Rothwell notes the lack of a committer signoff).

							Thanx, Paul

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 96d678c9cfb6..7f1f6f792240 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3914,6 +3914,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  {
>  	unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
>  	unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
> +	bool wake_nocb = false;
> +	bool was_alldone = false;
>  
>  	lockdep_assert_held(&rcu_state.barrier_lock);
>  	if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
> @@ -3922,7 +3924,14 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  	rdp->barrier_head.func = rcu_barrier_callback;
>  	debug_rcu_head_queue(&rdp->barrier_head);
>  	rcu_nocb_lock(rdp);
> +	/*
> +	 * Flush bypass and wakeup rcuog if we add callbacks to an empty regular
> +	 * queue. This way we don't wait for bypass timer that can reach seconds
> +	 * if it's fully lazy.
> +	 */
> +	was_alldone = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
>  	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> +	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
>  	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
>  		atomic_inc(&rcu_state.barrier_cpu_count);
>  	} else {
> @@ -3930,6 +3939,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  		rcu_barrier_trace(TPS("IRQNQ"), -1, rcu_state.barrier_sequence);
>  	}
>  	rcu_nocb_unlock(rdp);
> +	if (wake_nocb)
> +		wake_nocb_gp(rdp, false);
>  	smp_store_release(&rdp->barrier_seq_snap, gseq);
>  }
>  
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index d4a97e40ea9c..925dd98f8b23 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -439,6 +439,7 @@ static void zero_cpu_stall_ticks(struct rcu_data *rdp);
>  static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
>  static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
>  static void rcu_init_one_nocb(struct rcu_node *rnp);
> +static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  				  unsigned long j);
>  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index f77a6d7e1356..094fd454b6c3 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1558,6 +1558,11 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
>  {
>  }
>  
> +static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
> +{
> +	return false;
> +}
> +
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  				  unsigned long j)
>  {
Joel Fernandes Oct. 16, 2022, 3:56 p.m. UTC | #8
On Sun, Oct 16, 2022 at 11:16 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Oct 14, 2022 at 10:47:50PM +0200, Frederic Weisbecker wrote:
> > On Fri, Oct 14, 2022 at 08:46:06AM -0700, Paul E. McKenney wrote:
> > > On Fri, Oct 14, 2022 at 11:19:28AM -0400, Joel Fernandes wrote:
> > > > I agree with the discussion, though if all CBs are in the bypass list,
> > > > the patch will also save 2 jiffies.
> > > >
> > > > So just commit messages that need rework then? This one can be taken instead:
> > > > https://lore.kernel.org/rcu/21ECDA9F-81B1-4D22-8B03-020FB5DADA4F@joelfernandes.org/T/#m14d21fbce23539a521693a4184b28ddc55d7d2c5
> > >
> > > This one looks plausible to me.
> >
> > With the following modified diff (passed 25 hours of TREE01):
>
> Very good!
>
> Could one of you (presumably Joel) please send v9?
>
> Just to avoid me getting the wrong patch in the wrong place or similar.
> Or mangling the required rebase following a pull (otherwise, as soon as I
> create branches, Stephen Rothwell notes the lack of a committer signoff.)

Taking a break from raking leaves so doing it right now! Those leaves!

 - Joel

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5ec97e3f7468..dc1c502216c7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3894,6 +3894,8 @@  static void rcu_barrier_entrain(struct rcu_data *rdp)
 {
 	unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
 	unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
+	bool wake_nocb = false;
+	bool was_alldone = false;
 
 	lockdep_assert_held(&rcu_state.barrier_lock);
 	if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
@@ -3902,6 +3904,7 @@  static void rcu_barrier_entrain(struct rcu_data *rdp)
 	rdp->barrier_head.func = rcu_barrier_callback;
 	debug_rcu_head_queue(&rdp->barrier_head);
 	rcu_nocb_lock(rdp);
+	was_alldone = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
 	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
@@ -3909,7 +3912,10 @@  static void rcu_barrier_entrain(struct rcu_data *rdp)
 		debug_rcu_head_unqueue(&rdp->barrier_head);
 		rcu_barrier_trace(TPS("IRQNQ"), -1, rcu_state.barrier_sequence);
 	}
+	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
 	rcu_nocb_unlock(rdp);
+	if (wake_nocb)
+		wake_nocb_gp(rdp, false);
 	smp_store_release(&rdp->barrier_seq_snap, gseq);
 }
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index d4a97e40ea9c..925dd98f8b23 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -439,6 +439,7 @@  static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j);
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f77a6d7e1356..094fd454b6c3 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1558,6 +1558,11 @@  static void rcu_init_one_nocb(struct rcu_node *rnp)
 {
 }
 
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
+{
+	return false;
+}
+
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j)
 {