diff mbox series

[v5,04/18] rcu: Fix late wakeup when flush of bypass cblist happens

Message ID 20220901221720.1105021-5-joel@joelfernandes.org (mailing list archive)
State New, archived
Headers show
Series Implement call_rcu_lazy() and miscellaneous fixes | expand

Commit Message

Joel Fernandes Sept. 1, 2022, 10:17 p.m. UTC
When the bypass cblist gets too big or its timeout has occurred, it is
flushed into the main cblist. However, the bypass timer is still running
and the behavior is that it would eventually expire and wake the GP
thread.

Since we are going to use the bypass cblist for lazy CBs, do the wakeup
soon as the flush happens. Otherwise, the lazy-timer will go off much
later and the now-non-lazy cblist CBs can get stranded for the duration
of the timer.

This is a good thing to do anyway (regardless of this series), since it
makes the behavior consistent with behavior of other code paths where queueing
something into the ->cblist makes the GP kthread in a non-sleeping state
quickly.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree_nocb.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Frederic Weisbecker Sept. 2, 2022, 11:35 a.m. UTC | #1
On Thu, Sep 01, 2022 at 10:17:06PM +0000, Joel Fernandes (Google) wrote:
> When the bypass cblist gets too big or its timeout has occurred, it is
> flushed into the main cblist. However, the bypass timer is still running
> and the behavior is that it would eventually expire and wake the GP
> thread.
> 
> Since we are going to use the bypass cblist for lazy CBs, do the wakeup
> soon as the flush happens. Otherwise, the lazy-timer will go off much
> later and the now-non-lazy cblist CBs can get stranded for the duration
> of the timer.
> 
> This is a good thing to do anyway (regardless of this series), since it
> makes the behavior consistent with behavior of other code paths where queueing
> something into the ->cblist makes the GP kthread in a non-sleeping state
> quickly.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree_nocb.h | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 0a5f0ef41484..31068dd31315 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -447,7 +447,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  			rcu_advance_cbs_nowake(rdp->mynode, rdp);
>  			rdp->nocb_gp_adv_time = j;
>  		}
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +
> +		// The flush succeeded and we moved CBs into the ->cblist.
> +		// However, the bypass timer might still be running. Wakeup the
> +		// GP thread by calling a helper with was_all_done set so that
> +		// wake up happens (needed if main CB list was empty before).
> +		__call_rcu_nocb_wake(rdp, true, flags)
> +

Ok so there are two different changes here:

1) wake up nocb_gp as we just flushed the bypass list. Indeed if the regular
   callback list was empty before flushing, we rather want to immediately wake
   up nocb_gp instead of waiting for the bypass timer to process them.

2) wake up nocb_gp unconditionally (ie: even if the regular queue was not empty
   before bypass flushing) so that nocb_gp_wait() is forced through another loop
   starting with cancelling the bypass timer (I suggest you put such explanation
   in the comment btw because that process may not be obvious for mortals).

The change 1) looks like a good idea to me.

The change 2) has unclear motivation. It forces nocb_gp_wait() through another
costly loop even though the timer might have been cancelled into some near
future, eventually avoiding that extra costly loop. Also it abuses the
was_alldone stuff and we may get rcu_nocb_wake with incoherent meanings
(WakeEmpty/WakeEmptyIsDeferred) when it's actually not empty.

So you may need to clarify the purpose. And I would suggest to make two patches
here.

Thanks!


   

   

>  		return true; // Callback already enqueued.
>  	}
>  
> -- 
> 2.37.2.789.g6183377224-goog
>
Joel Fernandes Sept. 2, 2022, 11:58 p.m. UTC | #2
On 9/2/2022 7:35 AM, Frederic Weisbecker wrote:
> On Thu, Sep 01, 2022 at 10:17:06PM +0000, Joel Fernandes (Google) wrote:
>> When the bypass cblist gets too big or its timeout has occurred, it is
>> flushed into the main cblist. However, the bypass timer is still running
>> and the behavior is that it would eventually expire and wake the GP
>> thread.
>>
>> Since we are going to use the bypass cblist for lazy CBs, do the wakeup
>> soon as the flush happens. Otherwise, the lazy-timer will go off much
>> later and the now-non-lazy cblist CBs can get stranded for the duration
>> of the timer.
>>
>> This is a good thing to do anyway (regardless of this series), since it
>> makes the behavior consistent with behavior of other code paths where queueing
>> something into the ->cblist makes the GP kthread in a non-sleeping state
>> quickly.
>>
>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> ---
>>  kernel/rcu/tree_nocb.h | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>> index 0a5f0ef41484..31068dd31315 100644
>> --- a/kernel/rcu/tree_nocb.h
>> +++ b/kernel/rcu/tree_nocb.h
>> @@ -447,7 +447,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>>  			rcu_advance_cbs_nowake(rdp->mynode, rdp);
>>  			rdp->nocb_gp_adv_time = j;
>>  		}
>> -		rcu_nocb_unlock_irqrestore(rdp, flags);
>> +
>> +		// The flush succeeded and we moved CBs into the ->cblist.
>> +		// However, the bypass timer might still be running. Wakeup the
>> +		// GP thread by calling a helper with was_all_done set so that
>> +		// wake up happens (needed if main CB list was empty before).
>> +		__call_rcu_nocb_wake(rdp, true, flags)
>> +
> 
> Ok so there are two different changes here:
> 
> 1) wake up nocb_gp as we just flushed the bypass list. Indeed if the regular
>    callback list was empty before flushing, we rather want to immediately wake
>    up nocb_gp instead of waiting for the bypass timer to process them.
> 
> 2) wake up nocb_gp unconditionally (ie: even if the regular queue was not empty
>    before bypass flushing) so that nocb_gp_wait() is forced through another loop
>    starting with cancelling the bypass timer (I suggest you put such explanation
>    in the comment btw because that process may not be obvious for mortals).
> 
> The change 1) looks like a good idea to me.
> 
> The change 2) has unclear motivation. It forces nocb_gp_wait() through another
> costly loop even though the timer might have been cancelled into some near
> future, eventually avoiding that extra costly loop. Also it abuses the
> was_alldone stuff and we may get rcu_nocb_wake with incoherent meanings
> (WakeEmpty/WakeEmptyIsDeferred) when it's actually not empty.

Yes #2 can be optimized as follows I think on top of this patch, good point:
=============
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index ee5924ba2f3b..24aabd723abd 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -514,12 +514,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp,
struct rcu_head *rhp,
            ncbs >= qhimark) {
                rcu_nocb_lock(rdp);

+               *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+
                rcu_cblist_set_flush(&rdp->nocb_bypass,
                                lazy ? BIT(CB_DEBUG_BYPASS_LAZY_FLUSHED) :
BIT(CB_DEBUG_BYPASS_FLUSHED),
                                (j - READ_ONCE(cb_debug_jiffies_first)));

                if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy, false)) {
-                       *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
                        if (*was_alldone)
                                trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
                                                    TPS("FirstQ"));
@@ -537,7 +538,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct
rcu_head *rhp,
                // However, the bypass timer might still be running. Wakeup the
                // GP thread by calling a helper with was_all_done set so that
                // wake up happens (needed if main CB list was empty before).
-               __call_rcu_nocb_wake(rdp, true, flags)
+               __call_rcu_nocb_wake(rdp, *was_all_done, flags)

                return true; // Callback already enqueued.
        }
=============

> So you may need to clarify the purpose. And I would suggest to make two patches
> here.
I guess this change only #2 is no longer a concern? And splitting is not needed
then as it is only #1.

Thanks,

 - Joel
Paul E. McKenney Sept. 3, 2022, 2:04 p.m. UTC | #3
On Thu, Sep 01, 2022 at 10:17:06PM +0000, Joel Fernandes (Google) wrote:
> When the bypass cblist gets too big or its timeout has occurred, it is
> flushed into the main cblist. However, the bypass timer is still running
> and the behavior is that it would eventually expire and wake the GP
> thread.
> 
> Since we are going to use the bypass cblist for lazy CBs, do the wakeup
> soon as the flush happens. Otherwise, the lazy-timer will go off much
> later and the now-non-lazy cblist CBs can get stranded for the duration
> of the timer.
> 
> This is a good thing to do anyway (regardless of this series), since it
> makes the behavior consistent with behavior of other code paths where queueing
> something into the ->cblist makes the GP kthread in a non-sleeping state
> quickly.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree_nocb.h | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 0a5f0ef41484..31068dd31315 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -447,7 +447,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  			rcu_advance_cbs_nowake(rdp->mynode, rdp);
>  			rdp->nocb_gp_adv_time = j;
>  		}
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +
> +		// The flush succeeded and we moved CBs into the ->cblist.
> +		// However, the bypass timer might still be running. Wakeup the
> +		// GP thread by calling a helper with was_all_done set so that
> +		// wake up happens (needed if main CB list was empty before).
> +		__call_rcu_nocb_wake(rdp, true, flags)

TREE01 and TREE04 gripe about the missing ";".  I added it.

							Thanx, Paul

> +
>  		return true; // Callback already enqueued.
>  	}
>  
> -- 
> 2.37.2.789.g6183377224-goog
>
Joel Fernandes Sept. 3, 2022, 2:05 p.m. UTC | #4
On 9/3/2022 10:04 AM, Paul E. McKenney wrote:
> On Thu, Sep 01, 2022 at 10:17:06PM +0000, Joel Fernandes (Google) wrote:
>> When the bypass cblist gets too big or its timeout has occurred, it is
>> flushed into the main cblist. However, the bypass timer is still running
>> and the behavior is that it would eventually expire and wake the GP
>> thread.
>>
>> Since we are going to use the bypass cblist for lazy CBs, do the wakeup
>> soon as the flush happens. Otherwise, the lazy-timer will go off much
>> later and the now-non-lazy cblist CBs can get stranded for the duration
>> of the timer.
>>
>> This is a good thing to do anyway (regardless of this series), since it
>> makes the behavior consistent with behavior of other code paths where queueing
>> something into the ->cblist makes the GP kthread in a non-sleeping state
>> quickly.
>>
>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> ---
>>  kernel/rcu/tree_nocb.h | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>> index 0a5f0ef41484..31068dd31315 100644
>> --- a/kernel/rcu/tree_nocb.h
>> +++ b/kernel/rcu/tree_nocb.h
>> @@ -447,7 +447,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>>  			rcu_advance_cbs_nowake(rdp->mynode, rdp);
>>  			rdp->nocb_gp_adv_time = j;
>>  		}
>> -		rcu_nocb_unlock_irqrestore(rdp, flags);
>> +
>> +		// The flush succeeded and we moved CBs into the ->cblist.
>> +		// However, the bypass timer might still be running. Wakeup the
>> +		// GP thread by calling a helper with was_all_done set so that
>> +		// wake up happens (needed if main CB list was empty before).
>> +		__call_rcu_nocb_wake(rdp, true, flags)
> 
> TREE01 and TREE04 gripe about the missing ";".  I added it.

Sorry about that, I must have messed lost the semicolon during re-basing.

 - Joel
Paul E. McKenney Sept. 3, 2022, 3:10 p.m. UTC | #5
On Fri, Sep 02, 2022 at 07:58:42PM -0400, Joel Fernandes wrote:
> On 9/2/2022 7:35 AM, Frederic Weisbecker wrote:
> > On Thu, Sep 01, 2022 at 10:17:06PM +0000, Joel Fernandes (Google) wrote:
> >> When the bypass cblist gets too big or its timeout has occurred, it is
> >> flushed into the main cblist. However, the bypass timer is still running
> >> and the behavior is that it would eventually expire and wake the GP
> >> thread.
> >>
> >> Since we are going to use the bypass cblist for lazy CBs, do the wakeup
> >> soon as the flush happens. Otherwise, the lazy-timer will go off much
> >> later and the now-non-lazy cblist CBs can get stranded for the duration
> >> of the timer.
> >>
> >> This is a good thing to do anyway (regardless of this series), since it
> >> makes the behavior consistent with behavior of other code paths where queueing
> >> something into the ->cblist makes the GP kthread in a non-sleeping state
> >> quickly.
> >>
> >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >> ---
> >>  kernel/rcu/tree_nocb.h | 8 +++++++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> >> index 0a5f0ef41484..31068dd31315 100644
> >> --- a/kernel/rcu/tree_nocb.h
> >> +++ b/kernel/rcu/tree_nocb.h
> >> @@ -447,7 +447,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> >>  			rcu_advance_cbs_nowake(rdp->mynode, rdp);
> >>  			rdp->nocb_gp_adv_time = j;
> >>  		}
> >> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> >> +
> >> +		// The flush succeeded and we moved CBs into the ->cblist.
> >> +		// However, the bypass timer might still be running. Wakeup the
> >> +		// GP thread by calling a helper with was_all_done set so that
> >> +		// wake up happens (needed if main CB list was empty before).
> >> +		__call_rcu_nocb_wake(rdp, true, flags)
> >> +
> > 
> > Ok so there are two different changes here:
> > 
> > 1) wake up nocb_gp as we just flushed the bypass list. Indeed if the regular
> >    callback list was empty before flushing, we rather want to immediately wake
> >    up nocb_gp instead of waiting for the bypass timer to process them.
> > 
> > 2) wake up nocb_gp unconditionally (ie: even if the regular queue was not empty
> >    before bypass flushing) so that nocb_gp_wait() is forced through another loop
> >    starting with cancelling the bypass timer (I suggest you put such explanation
> >    in the comment btw because that process may not be obvious for mortals).
> > 
> > The change 1) looks like a good idea to me.
> > 
> > The change 2) has unclear motivation. It forces nocb_gp_wait() through another
> > costly loop even though the timer might have been cancelled into some near
> > future, eventually avoiding that extra costly loop. Also it abuses the
> > was_alldone stuff and we may get rcu_nocb_wake with incoherent meanings
> > (WakeEmpty/WakeEmptyIsDeferred) when it's actually not empty.
> 
> Yes #2 can be optimized as follows I think on top of this patch, good point:

I am holding off on this for the moment.

							Thanx, Paul

> =============
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index ee5924ba2f3b..24aabd723abd 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -514,12 +514,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp,
> struct rcu_head *rhp,
>             ncbs >= qhimark) {
>                 rcu_nocb_lock(rdp);
> 
> +               *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> +
>                 rcu_cblist_set_flush(&rdp->nocb_bypass,
>                                 lazy ? BIT(CB_DEBUG_BYPASS_LAZY_FLUSHED) :
> BIT(CB_DEBUG_BYPASS_FLUSHED),
>                                 (j - READ_ONCE(cb_debug_jiffies_first)));
> 
>                 if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy, false)) {
> -                       *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>                         if (*was_alldone)
>                                 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>                                                     TPS("FirstQ"));
> @@ -537,7 +538,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct
> rcu_head *rhp,
>                 // However, the bypass timer might still be running. Wakeup the
>                 // GP thread by calling a helper with was_all_done set so that
>                 // wake up happens (needed if main CB list was empty before).
> -               __call_rcu_nocb_wake(rdp, true, flags)
> +               __call_rcu_nocb_wake(rdp, *was_all_done, flags)
> 
>                 return true; // Callback already enqueued.
>         }
> =============
> 
> > So you may need to clarify the purpose. And I would suggest to make two patches
> > here.
> I guess this change only #2 is no longer a concern? And splitting is not needed
> then as it is only #1.
> 
> Thanks,
> 
>  - Joel
> 
> 
> 
> 
>
Frederic Weisbecker Sept. 4, 2022, 9:13 p.m. UTC | #6
On Fri, Sep 02, 2022 at 07:58:42PM -0400, Joel Fernandes wrote:
> 
> 
> On 9/2/2022 7:35 AM, Frederic Weisbecker wrote:
> > On Thu, Sep 01, 2022 at 10:17:06PM +0000, Joel Fernandes (Google) wrote:
> >> When the bypass cblist gets too big or its timeout has occurred, it is
> >> flushed into the main cblist. However, the bypass timer is still running
> >> and the behavior is that it would eventually expire and wake the GP
> >> thread.
> >>
> >> Since we are going to use the bypass cblist for lazy CBs, do the wakeup
> >> soon as the flush happens. Otherwise, the lazy-timer will go off much
> >> later and the now-non-lazy cblist CBs can get stranded for the duration
> >> of the timer.
> >>
> >> This is a good thing to do anyway (regardless of this series), since it
> >> makes the behavior consistent with behavior of other code paths where queueing
> >> something into the ->cblist makes the GP kthread in a non-sleeping state
> >> quickly.
> >>
> >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >> ---
> >>  kernel/rcu/tree_nocb.h | 8 +++++++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> >> index 0a5f0ef41484..31068dd31315 100644
> >> --- a/kernel/rcu/tree_nocb.h
> >> +++ b/kernel/rcu/tree_nocb.h
> >> @@ -447,7 +447,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> >>  			rcu_advance_cbs_nowake(rdp->mynode, rdp);
> >>  			rdp->nocb_gp_adv_time = j;
> >>  		}
> >> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> >> +
> >> +		// The flush succeeded and we moved CBs into the ->cblist.
> >> +		// However, the bypass timer might still be running. Wakeup the
> >> +		// GP thread by calling a helper with was_all_done set so that
> >> +		// wake up happens (needed if main CB list was empty before).
> >> +		__call_rcu_nocb_wake(rdp, true, flags)
> >> +
> > 
> > Ok so there are two different changes here:
> > 
> > 1) wake up nocb_gp as we just flushed the bypass list. Indeed if the regular
> >    callback list was empty before flushing, we rather want to immediately wake
> >    up nocb_gp instead of waiting for the bypass timer to process them.
> > 
> > 2) wake up nocb_gp unconditionally (ie: even if the regular queue was not empty
> >    before bypass flushing) so that nocb_gp_wait() is forced through another loop
> >    starting with cancelling the bypass timer (I suggest you put such explanation
> >    in the comment btw because that process may not be obvious for mortals).
> > 
> > The change 1) looks like a good idea to me.
> > 
> > The change 2) has unclear motivation. It forces nocb_gp_wait() through another
> > costly loop even though the timer might have been cancelled into some near
> > future, eventually avoiding that extra costly loop. Also it abuses the
> > was_alldone stuff and we may get rcu_nocb_wake with incoherent meanings
> > (WakeEmpty/WakeEmptyIsDeferred) when it's actually not empty.
> 
> Yes #2 can be optimized as follows I think on top of this patch, good point:
> =============
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index ee5924ba2f3b..24aabd723abd 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -514,12 +514,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp,
> struct rcu_head *rhp,
>             ncbs >= qhimark) {
>                 rcu_nocb_lock(rdp);
> 
> +               *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> +
>                 rcu_cblist_set_flush(&rdp->nocb_bypass,
>                                 lazy ? BIT(CB_DEBUG_BYPASS_LAZY_FLUSHED) :
> BIT(CB_DEBUG_BYPASS_FLUSHED),
>                                 (j - READ_ONCE(cb_debug_jiffies_first)));
> 
>                 if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy, false)) {
> -                       *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>                         if (*was_alldone)
>                                 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>                                                     TPS("FirstQ"));
> @@ -537,7 +538,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct
> rcu_head *rhp,
>                 // However, the bypass timer might still be running. Wakeup the
>                 // GP thread by calling a helper with was_all_done set so that
>                 // wake up happens (needed if main CB list was empty before).
> -               __call_rcu_nocb_wake(rdp, true, flags)
> +               __call_rcu_nocb_wake(rdp, *was_all_done, flags)
> 
>                 return true; // Callback already enqueued.
>

That looks right!

}
> =============
> 
> > So you may need to clarify the purpose. And I would suggest to make two patches
> > here.
> I guess this change only #2 is no longer a concern? And splitting is not needed
> then as it is only #1.

Sounds good!

Thanks!
Joel Fernandes Sept. 6, 2022, 3:07 a.m. UTC | #7
On Thu, Sep 01, 2022 at 10:17:06PM +0000, Joel Fernandes (Google) wrote:
> When the bypass cblist gets too big or its timeout has occurred, it is
> flushed into the main cblist. However, the bypass timer is still running
> and the behavior is that it would eventually expire and wake the GP
> thread.
> 
> Since we are going to use the bypass cblist for lazy CBs, do the wakeup
> soon as the flush happens. Otherwise, the lazy-timer will go off much
> later and the now-non-lazy cblist CBs can get stranded for the duration
> of the timer.
> 
> This is a good thing to do anyway (regardless of this series), since it
> makes the behavior consistent with behavior of other code paths where queueing
> something into the ->cblist makes the GP kthread in a non-sleeping state
> quickly.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---

Here is the updated version of this patch after review comments from
Frederic:

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [PATCH v6] rcu: Fix late wakeup when flush of bypass cblist happens

When the bypass cblist gets too big or its timeout has occurred, it is
flushed into the main cblist. However, the bypass timer is still running
and the behavior is that it would eventually expire and wake the GP
thread.

Since we are going to use the bypass cblist for lazy CBs, do the wakeup
soon as the flush happens. Otherwise, the lazy-timer will go off much
later and the now-non-lazy cblist CBs can get stranded for the duration
of the timer.

This is a good thing to do anyway (regardless of this series), since it
makes the behavior consistent with behavior of other code paths where queueing
something into the ->cblist makes the GP kthread in a non-sleeping state
quickly.

[ Frederic Weisbec: changes to not do wake up GP thread unless needed ].

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree_nocb.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 0a5f0ef41484..4dc86274b3e8 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -433,8 +433,9 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
 	    ncbs >= qhimark) {
 		rcu_nocb_lock(rdp);
+		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+
 		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
-			*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 			if (*was_alldone)
 				trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 						    TPS("FirstQ"));
@@ -447,7 +448,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 			rcu_advance_cbs_nowake(rdp->mynode, rdp);
 			rdp->nocb_gp_adv_time = j;
 		}
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+
+		// The flush succeeded and we moved CBs into the ->cblist.
+		// However, the bypass timer might still be running. Wakeup the
+		// GP thread by calling a helper with was_all_done set so that
+		// wake up happens (needed if main CB list was empty before).
+		__call_rcu_nocb_wake(rdp, *was_alldone, flags);
+
 		return true; // Callback already enqueued.
 	}
Frederic Weisbecker Sept. 6, 2022, 9:48 a.m. UTC | #8
On Tue, Sep 06, 2022 at 03:07:05AM +0000, Joel Fernandes wrote:
> On Thu, Sep 01, 2022 at 10:17:06PM +0000, Joel Fernandes (Google) wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> Subject: [PATCH v6] rcu: Fix late wakeup when flush of bypass cblist happens
> 
> When the bypass cblist gets too big or its timeout has occurred, it is
> flushed into the main cblist. However, the bypass timer is still running
> and the behavior is that it would eventually expire and wake the GP
> thread.
> 
> Since we are going to use the bypass cblist for lazy CBs, do the wakeup
> soon as the flush happens. Otherwise, the lazy-timer will go off much
> later and the now-non-lazy cblist CBs can get stranded for the duration
> of the timer.
> 
> This is a good thing to do anyway (regardless of this series), since it
> makes the behavior consistent with behavior of other code paths where queueing
> something into the ->cblist makes the GP kthread in a non-sleeping state
> quickly.
> 
> [ Frederic Weisbec: changes to not do wake up GP thread unless needed ].
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree_nocb.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 0a5f0ef41484..4dc86274b3e8 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -433,8 +433,9 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
>  	    ncbs >= qhimark) {
>  		rcu_nocb_lock(rdp);
> +		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> +
>  		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
> -			*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>  			if (*was_alldone)
>  				trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>  						    TPS("FirstQ"));
> @@ -447,7 +448,13 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  			rcu_advance_cbs_nowake(rdp->mynode, rdp);
>  			rdp->nocb_gp_adv_time = j;
>  		}
> -		rcu_nocb_unlock_irqrestore(rdp, flags);
> +
> +		// The flush succeeded and we moved CBs into the ->cblist.
> +		// However, the bypass timer might still be running. Wakeup the

That part of the comment mentioning the bypass timer looks strange...


> +		// GP thread by calling a helper with was_all_done set so that
> +		// wake up happens (needed if main CB list was empty before).

How about:

                // The flush succeeded and we moved CBs into the regular list.
		// Don't wait for the wake up timer as it may be too far ahead.
		// Wake up now GP thread instead if the cblist was empty

Thanks.

Other than that the patch looks good, thanks!

> +		__call_rcu_nocb_wake(rdp, *was_alldone, flags);
> +
>  		return true; // Callback already enqueued.
>  	}
>  
> -- 
> 2.37.2.789.g6183377224-goog
>
Joel Fernandes Sept. 7, 2022, 2:43 a.m. UTC | #9
On 9/6/2022 5:48 AM, Frederic Weisbecker wrote:
>>  		}
>> -		rcu_nocb_unlock_irqrestore(rdp, flags);
>> +
>> +		// The flush succeeded and we moved CBs into the ->cblist.
>> +		// However, the bypass timer might still be running. Wakeup the
> That part of the comment mentioning the bypass timer looks strange...
> 
> 
>> +		// GP thread by calling a helper with was_all_done set so that
>> +		// wake up happens (needed if main CB list was empty before).
> How about:
> 
>                 // The flush succeeded and we moved CBs into the regular list.
> 		// Don't wait for the wake up timer as it may be too far ahead.
> 		// Wake up now GP thread instead if the cblist was empty
> 
> Thanks.
> 
> Other than that the patch looks good, thanks!
> 

I updated it accordingly and will send it for next revision, thank you!

 - Joel
diff mbox series

Patch

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 0a5f0ef41484..31068dd31315 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -447,7 +447,13 @@  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 			rcu_advance_cbs_nowake(rdp->mynode, rdp);
 			rdp->nocb_gp_adv_time = j;
 		}
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+
+		// The flush succeeded and we moved CBs into the ->cblist.
+		// However, the bypass timer might still be running. Wakeup the
+		// GP thread by calling a helper with was_all_done set so that
+		// wake up happens (needed if main CB list was empty before).
+		__call_rcu_nocb_wake(rdp, true, flags)
+
 		return true; // Callback already enqueued.
 	}