Message ID | 20220917164200.511783-4-joel@joelfernandes.org (mailing list archive) |
---|---|
State | New, archived |
Series | Preparatory patches borrowed from lazy rcu v5 |
On Sat, Sep 17, 2022 at 04:42:00PM +0000, Joel Fernandes (Google) wrote:
> @@ -2809,17 +2825,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	}
>
>  	check_cb_ovld(rdp);
> -	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> +
> +	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
> +		__trace_rcu_callback(head, rdp);
>  		return; // Enqueued onto ->nocb_bypass, so just leave.
> +	}

I think the bypass enqueues should be treated differently. Either with extending
the current trace_rcu_callback/trace_rcu_kvfree_callback (might break tools)

or

with creating a new trace_rcu_callback_bypass()/trace_rcu_kvfree_callback_bypass().

Those could later be paired with a trace_rcu_bypass_flush().

Thanks.

> +
>  	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
>  	rcu_segcblist_enqueue(&rdp->cblist, head);
> -	if (__is_kvfree_rcu_offset((unsigned long)func))
> -		trace_rcu_kvfree_callback(rcu_state.name, head,
> -					 (unsigned long)func,
> -					 rcu_segcblist_n_cbs(&rdp->cblist));
> -	else
> -		trace_rcu_callback(rcu_state.name, head,
> -				   rcu_segcblist_n_cbs(&rdp->cblist));
> +	__trace_rcu_callback(head, rdp);
>
>  	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
>
> --
> 2.37.3.968.ga6b4b080e4-goog
>
On Sat, Sep 17, 2022 at 3:58 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Sat, Sep 17, 2022 at 04:42:00PM +0000, Joel Fernandes (Google) wrote:
> > @@ -2809,17 +2825,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  	}
> >
> >  	check_cb_ovld(rdp);
> > -	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> > +
> > +	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
> > +		__trace_rcu_callback(head, rdp);
> >  		return; // Enqueued onto ->nocb_bypass, so just leave.
> > +	}
>
> I think the bypass enqueues should be treated differently. Either with extending
> the current trace_rcu_callback/trace_rcu_kvfree_callback (might break tools)
>
> or
>
> with creating a new trace_rcu_callback_bypass()/trace_rcu_kvfree_callback_bypass().
>
> Those could later be paired with a trace_rcu_bypass_flush().

I am having a hard time seeing why it should be treated differently.
We already increment the length of the main segcblist even when
bypassing. Why not just call the trace point instead of omitting it?
Otherwise it becomes a bit confusing IMO (say someone does not enable
your proposed new bypass tracepoint and only enables the existing one,
then they would see weird traces where call_rcu is called but their
traces are missing trace_rcu_callback). Not to mention - your
suggestion will also complicate writing tools that use the existing
rcu_callback tracepoint to monitor call_rcu().

Also if you see the definition of rcu_callback, "Tracepoint for the
registration of a single RCU callback function.". That pretty much
fits the usage here.

As for tracing of the flushing, I don’t care about tracing that at the
moment using tracepoints, but I don’t mind if it is added later.

Maybe let’s let Paul help resolve our disagreement on this one? :)

thanks,

 - Joel

>
>
> Thanks.
>
> > +
> >  	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
> >  	rcu_segcblist_enqueue(&rdp->cblist, head);
> > -	if (__is_kvfree_rcu_offset((unsigned long)func))
> > -		trace_rcu_kvfree_callback(rcu_state.name, head,
> > -					 (unsigned long)func,
> > -					 rcu_segcblist_n_cbs(&rdp->cblist));
> > -	else
> > -		trace_rcu_callback(rcu_state.name, head,
> > -				   rcu_segcblist_n_cbs(&rdp->cblist));
> > +	__trace_rcu_callback(head, rdp);
> >
> >  	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
> >
> > --
> > 2.37.3.968.ga6b4b080e4-goog
> >
On Sat, Sep 17, 2022 at 05:43:06PM -0400, Joel Fernandes wrote:
> On Sat, Sep 17, 2022 at 3:58 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> > On Sat, Sep 17, 2022 at 04:42:00PM +0000, Joel Fernandes (Google) wrote:
> > > @@ -2809,17 +2825,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  	}
> > >
> > >  	check_cb_ovld(rdp);
> > > -	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> > > +
> > > +	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
> > > +		__trace_rcu_callback(head, rdp);
> > >  		return; // Enqueued onto ->nocb_bypass, so just leave.
> > > +	}
> >
> > I think the bypass enqueues should be treated differently. Either with extending
> > the current trace_rcu_callback/trace_rcu_kvfree_callback (might break tools)
> >
> > or
> >
> > with creating a new trace_rcu_callback_bypass()/trace_rcu_kvfree_callback_bypass().
> >
> > Those could later be paired with a trace_rcu_bypass_flush().
>
> I am having a hard time seeing why it should be treated differently.
> We already increment the length of the main segcblist even when
> bypassing. Why not just call the trace point instead of omitting it?

I'm not suggesting to omit it. I'm suggesting to improve its precision.

> Otherwise it becomes a bit confusing IMO (say someone does not enable
> your proposed new bypass tracepoint and only enables the existing one,
> then they would see weird traces where call_rcu is called but their
> traces are missing trace_rcu_callback).

Well, if they decided to see only half of the information...

> Not to mention - your
> suggestion will also complicate writing tools that use the existing
> rcu_callback tracepoint to monitor call_rcu().

If we add another tracepoint, the prototype will be the same as the
existing one, not many lines to add. If instead we extend the existing
tracepoint, it's merely just a flag to check or ignore.

OTOH your suggestion doesn't provide any bypass related information.

>
> Also if you see the definition of rcu_callback, "Tracepoint for the
> registration of a single RCU callback function.". That pretty much
> fits the usage here.

Doesn't tell if it's a bypass or not.

>
> As for tracing of the flushing, I don’t care about tracing that at the
> moment using tracepoints

You will soon enough ;-)

> but I don’t mind if it is added later.
> Maybe let’s let Paul help resolve our disagreement on this one? :)

FWIW, I would be personally interested in such tracepoints (or the extension
of the existing ones, whichever way you guys prefer), they would be of great help
for debugging.

Also if rcu_top is ever released, I really hope the kernel will be ready in
case we want the tool to display bypass related information.

Please be careful while designing tracepoints that may be consumed by userspace
released tools. Such tracepoints eventually turn into ABI and there is no way
back after that.

Thanks.
On Sat, Sep 17, 2022 at 6:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Sat, Sep 17, 2022 at 05:43:06PM -0400, Joel Fernandes wrote:
> > On Sat, Sep 17, 2022 at 3:58 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> > >
> > > On Sat, Sep 17, 2022 at 04:42:00PM +0000, Joel Fernandes (Google) wrote:
> > > > @@ -2809,17 +2825,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >  	}
> > > >
> > > >  	check_cb_ovld(rdp);
> > > > -	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> > > > +
> > > > +	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
> > > > +		__trace_rcu_callback(head, rdp);
> > > >  		return; // Enqueued onto ->nocb_bypass, so just leave.
> > > > +	}
> > >
> > > I think the bypass enqueues should be treated differently. Either with extending
> > > the current trace_rcu_callback/trace_rcu_kvfree_callback (might break tools)
> > >
> > > or
> > >
> > > with creating a new trace_rcu_callback_bypass()/trace_rcu_kvfree_callback_bypass().
> > >
> > > Those could later be paired with a trace_rcu_bypass_flush().
> >
> > I am having a hard time seeing why it should be treated differently.
> > We already increment the length of the main segcblist even when
> > bypassing. Why not just call the trace point instead of omitting it?
>
> I'm not suggesting to omit it. I'm suggesting to improve its precision.

That's exactly what I'm doing :-). It is imprecise the way it is, by
calling it where it needs to be (not omitting it), I am making it more
precise.

> > Otherwise it becomes a bit confusing IMO (say someone does not enable
> > your proposed new bypass tracepoint and only enables the existing one,
> > then they would see weird traces where call_rcu is called but their
> > traces are missing trace_rcu_callback).
>
> Well, if they decided to see only half of the information...

It is not that they decide, there are lots of RCU tracepoints and it is
likely common to enable just a few of them.

> > Not to mention - your
> > suggestion will also complicate writing tools that use the existing
> > rcu_callback tracepoint to monitor call_rcu().
>
> If we add another tracepoint, the prototype will be the same as the
> existing one, not many lines to add. If instead we extend the existing
> tracepoint, it's merely just a flag to check or ignore.
>
> OTOH your suggestion doesn't provide any bypass related information.

Bypass related information is not relevant to this patch. I am already
using trace_rcu_callback() in my (yet to be released) rcutop, and I
don't use it for any bypass information.

> > Also if you see the definition of rcu_callback, "Tracepoint for the
> > registration of a single RCU callback function.". That pretty much
> > fits the usage here.
>
> Doesn't tell if it's a bypass or not.

It doesn't tell a lot of things, so what? Saying that it is bypass is
not the point of this patch.

> > As for tracing of the flushing, I don’t care about tracing that at the
> > moment using tracepoints
>
> You will soon enough ;-)

I respect your experience in the matter :-)

> > but I don’t mind if it is added later.
> > Maybe let’s let Paul help resolve our disagreement on this one? :)
>
> FWIW, I would be personally interested in such tracepoints (or the extension

Understood.

> of the existing ones, whichever way you guys prefer), they would be of great help
> for debugging.
>
> Also if rcu_top is ever released, I really hope the kernel will be ready in
> case we want the tool to display bypass related information.

I feel the main issue you have with my patch is that it does not add the
information you want, however the information you mention is beyond the
scope of the patch and can come in future/different patches. This patch
only fixes an *existing* broken tracepoint. I can certainly add a new
bypass-related tracepoint in the future, but I don't see how that's
relevant to my patch.

> Please be careful while designing tracepoints that may be consumed by userspace
> released tools. Such tracepoints eventually turn into ABI and there is no way
> back after that.

Sure thing, that's why I'm fixing the broken tracepoint. Some registered
callbacks can be invisible to the user.

 - Joel
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5ec97e3f7468..18f07e167d5e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2728,6 +2728,22 @@ static void check_cb_ovld(struct rcu_data *rdp)
 	raw_spin_unlock_rcu_node(rnp);
 }
 
+/*
+ * Trace RCU callback helper, call after enqueuing callback.
+ */
+static inline void __trace_rcu_callback(struct rcu_head *head,
+				      struct rcu_data *rdp)
+{
+	if (trace_rcu_kvfree_callback_enabled() &&
+	    __is_kvfree_rcu_offset((unsigned long)head->func))
+		trace_rcu_kvfree_callback(rcu_state.name, head,
+					 (unsigned long)head->func,
+					 rcu_segcblist_n_cbs(&rdp->cblist));
+	else if (trace_rcu_callback_enabled())
+		trace_rcu_callback(rcu_state.name, head,
+				   rcu_segcblist_n_cbs(&rdp->cblist));
+}
+
 /**
  * call_rcu() - Queue an RCU callback for invocation after a grace period.
  * @head: structure to be used for queueing the RCU updates.
@@ -2809,17 +2825,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 	}
 
 	check_cb_ovld(rdp);
-	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
+
+	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) {
+		__trace_rcu_callback(head, rdp);
 		return; // Enqueued onto ->nocb_bypass, so just leave.
+	}
+
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
-	if (__is_kvfree_rcu_offset((unsigned long)func))
-		trace_rcu_kvfree_callback(rcu_state.name, head,
-					 (unsigned long)func,
-					 rcu_segcblist_n_cbs(&rdp->cblist));
-	else
-		trace_rcu_callback(rcu_state.name, head,
-				   rcu_segcblist_n_cbs(&rdp->cblist));
+	__trace_rcu_callback(head, rdp);
 
 	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
 
If any callback is queued into the bypass list, then
trace_rcu_callback() does not show it. This makes it unclear when a
callback was actually queued, as the resulting trace only includes an
rcu_invoke_callback event. Fix it by calling the tracing function even
if queuing into bypass. This is needed for the future rcutop tool which
monitors enqueuing of callbacks.

Note that, in case of bypass queuing, the new tracing happens without
the nocb_lock. This should be OK since on CONFIG_RCU_NOCB_CPU systems,
the total number of callbacks is represented by an atomic counter. Other
paths, such as rcu_barrier(), also sample the total number of callbacks
without the nocb_lock.

Also, while at it, optimize the tracing so that rcu_state is not
accessed if tracing is disabled, because that's useless if we are not
tracing. A quick inspection of the generated assembler shows that
rcu_state is accessed even if the jump label for the tracepoint is
disabled.

Here is gcc -S output of the bad asm (note that I un-inlined it just for
testing and illustration, however the final __trace_rcu_callback in the
patch is marked static inline):

__trace_rcu_callback:
        movq    8(%rdi), %rcx
        movq    rcu_state+3640(%rip), %rax
        movq    %rdi, %rdx
        cmpq    $4095, %rcx
        ja      .L3100
        movq    192(%rsi), %r8
        1:jmp .L3101 # objtool NOPs this
        .pushsection __jump_table,  "aw"
         .balign 8
        .long 1b - .
        .long .L3101 - .
        .quad __tracepoint_rcu_kvfree_callback+8 + 2 - .
        .popsection

With this change, the jump label check which is NOPed is moved to the
beginning of the function.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)
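
The "check enabled first" optimization described in the changelog can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: every `demo_*` name is hypothetical and invented for this illustration, a plain `bool` stands in for the static-key test behind `trace_rcu_callback_enabled()`, and a counter stands in for the load from `rcu_state`. It is not the kernel's tracepoint machinery and says nothing about the code the compiler actually generates for the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static bool demo_trace_enabled;	/* models trace_rcu_callback_enabled() */
static int demo_state_reads;	/* how many times the "rcu_state" model was read */
static int demo_events;		/* how many trace events were emitted */

struct demo_state {
	const char *name;
};

static struct demo_state demo_rcu_state = { "rcu_demo" };

/* Models reading rcu_state.name; counted so the cost is observable. */
static const char *demo_state_name(void)
{
	demo_state_reads++;
	return demo_rcu_state.name;
}

/* Models the tracepoint body that records the event. */
static void demo_emit(const char *name, long qlen)
{
	demo_events++;
	printf("%s: callback queued, qlen=%ld\n", name, qlen);
}

/* Unguarded: the argument is loaded before the enabled check. */
static void demo_trace_unguarded(long qlen)
{
	const char *name = demo_state_name();

	if (demo_trace_enabled)
		demo_emit(name, qlen);
}

/* Guarded: the enabled check comes first, so a disabled tracepoint reads nothing. */
static void demo_trace_guarded(long qlen)
{
	assert(qlen >= 0);

	if (!demo_trace_enabled)
		return;

	demo_emit(demo_state_name(), qlen);
}
```

With `demo_trace_enabled` left false, calling `demo_trace_unguarded()` still bumps `demo_state_reads`, while `demo_trace_guarded()` leaves it untouched; that difference is the effect the changelog attributes to moving the jump label check to the beginning of the function.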