
[rcu/next,3/3] rcu: Call trace_rcu_callback() also for bypass queuing (v2)

Message ID 20220917164200.511783-4-joel@joelfernandes.org (mailing list archive)
State New, archived
Series Preparatory patches borrowed from lazy rcu v5

Commit Message

Joel Fernandes Sept. 17, 2022, 4:42 p.m. UTC
If a callback is queued onto the bypass list, then
trace_rcu_callback() does not show it. This makes it unclear when the
callback was actually queued, as the resulting trace only includes an
rcu_invoke_callback event. Fix it by calling the tracing function even
when queuing onto the bypass list. This is needed for the future rcutop
tool, which monitors enqueuing of callbacks.

Note that, in case of bypass queuing, the new tracing happens without
the nocb_lock.  This should be OK since on CONFIG_RCU_NOCB_CPU systems,
the total number of callbacks is represented by an atomic counter. Also,
other paths such as rcu_barrier() sample the total number of callbacks
without the nocb_lock.

Also, while at it, optimize the tracing so that rcu_state is not
accessed when tracing is disabled, since that access is useless if we
are not tracing. A quick inspection of the generated assembly shows that
rcu_state is accessed even when the jump label for the tracepoint is
disabled.

Here is the gcc -S output of the bad asm (note that I un-inlined it
just for testing and illustration; the final __trace_rcu_callback in
the patch is marked static inline):

__trace_rcu_callback:
        movq    8(%rdi), %rcx
        movq    rcu_state+3640(%rip), %rax
        movq    %rdi, %rdx
        cmpq    $4095, %rcx
        ja      .L3100
        movq    192(%rsi), %r8
        1:jmp .L3101 # objtool NOPs this
        .pushsection __jump_table,  "aw"
         .balign 8
        .long 1b - .
        .long .L3101 - .
         .quad __tracepoint_rcu_kvfree_callback+8 + 2 - .
        .popsection

With this change, the jump label check which is NOOPed is moved to the
beginning of the function.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

Comments

Frederic Weisbecker Sept. 17, 2022, 7:58 p.m. UTC | #1
On Sat, Sep 17, 2022 at 04:42:00PM +0000, Joel Fernandes (Google) wrote:
> @@ -2809,17 +2825,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	}
>  
>  	check_cb_ovld(rdp);
> -	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> +
> +	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
> +		__trace_rcu_callback(head, rdp);
>  		return; // Enqueued onto ->nocb_bypass, so just leave.
> +	}

I think the bypass enqueues should be treated differently. Either with extending
the current trace_rcu_callback/trace_rcu_kvfree_callback (might break tools) or
with creating a new trace_rcu_callback_bypass()/trace_rcu_kvfree_callback_bypass().

Those could later be paired with a trace_rcu_bypass_flush().

Thanks.


> +
>  	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
>  	rcu_segcblist_enqueue(&rdp->cblist, head);
> -	if (__is_kvfree_rcu_offset((unsigned long)func))
> -		trace_rcu_kvfree_callback(rcu_state.name, head,
> -					 (unsigned long)func,
> -					 rcu_segcblist_n_cbs(&rdp->cblist));
> -	else
> -		trace_rcu_callback(rcu_state.name, head,
> -				   rcu_segcblist_n_cbs(&rdp->cblist));
> +	__trace_rcu_callback(head, rdp);
>  
>  	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
>  
> -- 
> 2.37.3.968.ga6b4b080e4-goog
>
Joel Fernandes Sept. 17, 2022, 9:43 p.m. UTC | #2
On Sat, Sep 17, 2022 at 3:58 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Sat, Sep 17, 2022 at 04:42:00PM +0000, Joel Fernandes (Google) wrote:
> > @@ -2809,17 +2825,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> >       }
> >
> >       check_cb_ovld(rdp);
> > -     if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> > +
> > +     if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
> > +             __trace_rcu_callback(head, rdp);
> >               return; // Enqueued onto ->nocb_bypass, so just leave.
> > +     }
>
> I think the bypass enqueues should be treated differently. Either with extending
> the current trace_rcu_callback/trace_rcu_kvfree_callback (might break tools)
>
> or
> with creating a new trace_rcu_callback_bypass()/trace_rcu_kvfree_callback_bypass().
>
> Those could later be paired with a trace_rcu_bypass_flush().

I am having a hard time seeing why it should be treated differently.
We already increment the length of the main segcblist even when
bypassing. Why not just call the tracepoint instead of omitting it?
Otherwise it becomes a bit confusing IMO (say someone does not enable
your proposed new bypass tracepoint and only enables the existing one;
then they would see weird traces where call_rcu() is called but their
traces are missing trace_rcu_callback). Not to mention, your
suggestion would also complicate writing tools that use the existing
rcu_callback tracepoint to monitor call_rcu().

Also, if you look at the definition of rcu_callback, "Tracepoint for
the registration of a single RCU callback function."  That pretty much
fits the usage here.

As for tracing of the flushing, I don’t care about tracing that at the
moment using tracepoints, but I don’t mind if it is added later.

Maybe let’s let Paul help resolve our disagreement on this one? :)

thanks,

 - Joel

>
>
> Thanks.
>
>
> > +
> >       // If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
> >       rcu_segcblist_enqueue(&rdp->cblist, head);
> > -     if (__is_kvfree_rcu_offset((unsigned long)func))
> > -             trace_rcu_kvfree_callback(rcu_state.name, head,
> > -                                      (unsigned long)func,
> > -                                      rcu_segcblist_n_cbs(&rdp->cblist));
> > -     else
> > -             trace_rcu_callback(rcu_state.name, head,
> > -                                rcu_segcblist_n_cbs(&rdp->cblist));
> > +     __trace_rcu_callback(head, rdp);
> >
> >       trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
> >
> > --
> > 2.37.3.968.ga6b4b080e4-goog
> >
Frederic Weisbecker Sept. 17, 2022, 10:21 p.m. UTC | #3
On Sat, Sep 17, 2022 at 05:43:06PM -0400, Joel Fernandes wrote:
> On Sat, Sep 17, 2022 at 3:58 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> > On Sat, Sep 17, 2022 at 04:42:00PM +0000, Joel Fernandes (Google) wrote:
> > > @@ -2809,17 +2825,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >       }
> > >
> > >       check_cb_ovld(rdp);
> > > -     if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> > > +
> > > +     if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
> > > +             __trace_rcu_callback(head, rdp);
> > >               return; // Enqueued onto ->nocb_bypass, so just leave.
> > > +     }
> >
> > I think the bypass enqueues should be treated differently. Either with extending
> > the current trace_rcu_callback/trace_rcu_kvfree_callback (might break tools)
> >
> > or
> > with creating a new trace_rcu_callback_bypass()/trace_rcu_kvfree_callback_bypass().
> >
> > Those could later be paired with a trace_rcu_bypass_flush().
> 
> I am having a hard time seeing why it should be treated differently.
> We already increment the length of the main segcblist even when
> bypassing. Why not just call the trace point instead of omitting it?

I'm not suggesting to omit it. I'm suggesting to improve its precision.

> Otherwise it becomes a bit confusing IMO (say someone does not enable
> your proposed new bypass tracepoint and only enables the existing one,
> then they would see weird traces where call_rcu is called but their
> traces are missing trace_rcu_callback).

Well, if they decided to see only half of the information...

> Not to mention - your
> suggestion will also complicate writing tools that use the existing
> rcu_callback tracepoint to monitor call_rcu().

If we add another tracepoint, the prototype will be the same as the
existing one, so not many lines to add. If instead we extend the
existing tracepoint, it's merely a flag to check or ignore.

OTOH your suggestion doesn't provide any bypass-related information.
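For illustration, extending the existing event with a bypass flag
might look roughly like the sketch below. This is a hypothetical
modification of the rcu_callback event in include/trace/events/rcu.h,
not part of the patch; the extra `bypass` argument and field are
assumptions:

```c
TRACE_EVENT(rcu_callback,

	TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen,
		 bool bypass),

	TP_ARGS(rcuname, rhp, qlen, bypass),

	TP_STRUCT__entry(
		__field(const char *, rcuname)
		__field(void *, rhp)
		__field(void *, func)
		__field(long, qlen)
		__field(bool, bypass)	/* hypothetical new field */
	),

	TP_fast_assign(
		__entry->rcuname = rcuname;
		__entry->rhp = rhp;
		__entry->func = rhp->func;
		__entry->qlen = qlen;
		__entry->bypass = bypass;
	),

	TP_printk("%s rhp=%p func=%ps %ld bypass=%d",
		  __entry->rcuname, __entry->rhp, __entry->func,
		  __entry->qlen, __entry->bypass)
);
```

Tools that do not care about bypass could ignore the extra field,
though changing the prototype touches every caller and the event
format that existing consumers parse.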

> 
> Also if you see the definition of rcu_callback, "Tracepoint for the
> registration of a single RCU callback function.".  That pretty much
> fits the usage here.

Doesn't tell if it's a bypass or not.

> 
> As for tracing of the flushing, I don’t care about tracing that at the
> moment using tracepoints

You will soon enough ;-)

> but I don’t mind if it is added later.
> Maybe let’s let Paul help resolve our disagreement on this one? :)

FWIW, I would be personally interested in such tracepoints (or the extension
of the existing ones, whichever way you guys prefer), they would be of great help
for debugging.

Also, if rcutop is ever released, I really hope the kernel will be
ready in case we want the tool to display bypass-related information.

Please be careful when designing tracepoints that may be consumed by
released userspace tools. Such tracepoints eventually turn into ABI,
and there is no way back after that.

Thanks.
Joel Fernandes Sept. 17, 2022, 10:42 p.m. UTC | #4
On Sat, Sep 17, 2022 at 6:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Sat, Sep 17, 2022 at 05:43:06PM -0400, Joel Fernandes wrote:
> > On Sat, Sep 17, 2022 at 3:58 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> > >
> > > On Sat, Sep 17, 2022 at 04:42:00PM +0000, Joel Fernandes (Google) wrote:
> > > > @@ -2809,17 +2825,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >       }
> > > >
> > > >       check_cb_ovld(rdp);
> > > > -     if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> > > > +
> > > > +     if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
> > > > +             __trace_rcu_callback(head, rdp);
> > > >               return; // Enqueued onto ->nocb_bypass, so just leave.
> > > > +     }
> > >
> > > I think the bypass enqueues should be treated differently. Either with extending
> > > the current trace_rcu_callback/trace_rcu_kvfree_callback (might break tools)
> > >
> > > or
> > > with creating a new trace_rcu_callback_bypass()/trace_rcu_kvfree_callback_bypass().
> > >
> > > Those could later be paired with a trace_rcu_bypass_flush().
> >
> > I am having a hard time seeing why it should be treated differently.
> > We already increment the length of the main segcblist even when
> > bypassing. Why not just call the trace point instead of omitting it?
>
> I'm not suggesting to omit it. I'm suggesting to improve its precision.

That's exactly what I'm doing :-). It is imprecise the way it is; by
calling it where it needs to be called (not omitting it), I am making
it more precise.

> > Otherwise it becomes a bit confusing IMO (say someone does not enable
> > your proposed new bypass tracepoint and only enables the existing one,
> > then they would see weird traces where call_rcu is called but their
> > traces are missing trace_rcu_callback).
>
> Well, if they decided to see only half of the information...

It is not that they decide; there are lots of RCU tracepoints, and it
is common to enable just a few of them.

> > Not to mention - your
> > suggestion will also complicate writing tools that use the existing
> > rcu_callback tracepoint to monitor call_rcu().
>
> If we add another tracepoint, the prototype will be the same as the
> existing one, not many lines to add. If instead we extend the existing
> tracepoint, it's merely just a flag to check or ignore.
>
> OTOH your suggestion doesn't provide any bypass related information.

Bypass related information is not relevant to this patch. I am already
using trace_rcu_callback() in my (yet to be released) rcutop, and I
don't use it for any bypass information.

> > Also if you see the definition of rcu_callback, "Tracepoint for the
> > registration of a single RCU callback function.".  That pretty much
> > fits the usage here.
>
> Doesn't tell if it's a bypass or not.

It doesn't tell a lot of things, so what? Saying whether it is a
bypass is not the point of this patch.

> > As for tracing of the flushing, I don’t care about tracing that at the
> > moment using tracepoints
>
> You will soon enough ;-)

I respect your experience in the matter :-)

> > but I don’t mind if it is added later.
> > Maybe let’s let Paul help resolve our disagreement on this one? :)
>
> FWIW, I would be personally interested in such tracepoints (or the extension

Understood.

> of the existing ones, whichever way you guys prefer), they would be of great help
> for debugging.
>
> Also if rcu_top is ever released, I really hope the kernel will be ready in
> case we want the tool to display bypass related informations.

I feel the main issue you have with my patch is that it does not add
the information you want; however, the information you mention is
beyond the scope of this patch and can be added in future/different
patches. This patch only fixes an *existing* broken tracepoint.

I can certainly add a new bypass-related tracepoint in the future, but
I don't see how that's relevant to my patch.

> Please be careful while designing tracepoints that may be consumed by userspace
> released tools. Such tracepoints eventually turn into ABI and there is no way
> back after that.

Sure thing, that's why I'm fixing the broken tracepoint; as it
stands, some registered callbacks can be invisible to the user.

 - Joel

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5ec97e3f7468..18f07e167d5e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2728,6 +2728,22 @@  static void check_cb_ovld(struct rcu_data *rdp)
 	raw_spin_unlock_rcu_node(rnp);
 }
 
+/*
+ * Trace RCU callback helper, call after enqueuing callback.
+ */
+static inline void __trace_rcu_callback(struct rcu_head *head,
+				      struct rcu_data *rdp)
+{
+	if (trace_rcu_kvfree_callback_enabled() &&
+	    __is_kvfree_rcu_offset((unsigned long)head->func))
+		trace_rcu_kvfree_callback(rcu_state.name, head,
+					 (unsigned long)head->func,
+					 rcu_segcblist_n_cbs(&rdp->cblist));
+	else if (trace_rcu_callback_enabled())
+		trace_rcu_callback(rcu_state.name, head,
+				   rcu_segcblist_n_cbs(&rdp->cblist));
+}
+
 /**
  * call_rcu() - Queue an RCU callback for invocation after a grace period.
  * @head: structure to be used for queueing the RCU updates.
@@ -2809,17 +2825,15 @@  void call_rcu(struct rcu_head *head, rcu_callback_t func)
 	}
 
 	check_cb_ovld(rdp);
-	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
+
+	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
+		__trace_rcu_callback(head, rdp);
 		return; // Enqueued onto ->nocb_bypass, so just leave.
+	}
+
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
-	if (__is_kvfree_rcu_offset((unsigned long)func))
-		trace_rcu_kvfree_callback(rcu_state.name, head,
-					 (unsigned long)func,
-					 rcu_segcblist_n_cbs(&rdp->cblist));
-	else
-		trace_rcu_callback(rcu_state.name, head,
-				   rcu_segcblist_n_cbs(&rdp->cblist));
+	__trace_rcu_callback(head, rdp);
 
 	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));