Message ID | 20221012215717.10492-1-beaub@linux.microsoft.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
Series | tracing: Track event ref in tracefs enable/disable | expand |
On Wed, 12 Oct 2022 14:57:17 -0700 Beau Belgrave <beaub@linux.microsoft.com> wrote: > When events are enabled via the "enable" file within tracefs there is no > get or put ref. Add these to ensure modules and dynamic events do not > unload while the event is enabled via tracefs. Why is this an issue? The events are only called from the module code, and when the module is unloaded, they are no longer called. Why keep the module from unloading when enabled? -- Steve
On Wed, Oct 12, 2022 at 06:26:39PM -0400, Steven Rostedt wrote: > On Wed, 12 Oct 2022 14:57:17 -0700 > Beau Belgrave <beaub@linux.microsoft.com> wrote: > > > When events are enabled via the "enable" file within tracefs there is no > > get or put ref. Add these to ensure modules and dynamic events do not > > unload while the event is enabled via tracefs. > > Why is this an issue? It depends on the scenario, it may or may not cause a issue. > > The events are only called from the module code, and when the module is > unloaded, they are no longer called. Why keep the module from unloading > when enabled? Won't the modules remove the event calls? At the very least the event call structure in memory goes away during module unload. If it gets reused odd things will happen, right? IE: trace_module_remove_events(). Maybe I have a bad assumption: I thought the point of trace_event_try_get_ref()/put_ref() was to tell the system the call cannot go away. However, if ftrace enable doesn't use these the lifetime is ambigious in this case. If this was intentional, how are event call lifetimes described if not within the ref? In my namespace patches I hit this case when user_events try to go away during namespace teardown. Since there is no reference to the event being used I removed the call. However, it was clearly being used within tracefs at that point. When I cat "enable" in this case instead of "0" or "1" I get "?". I suppose worse things could happen when the memory of the call gets reused? > > -- Steve Thanks, -Beau
On Wed, 12 Oct 2022 17:19:38 -0700 Beau Belgrave <beaub@linux.microsoft.com> wrote: > > > > The events are only called from the module code, and when the module is > > unloaded, they are no longer called. Why keep the module from unloading > > when enabled? > > Won't the modules remove the event calls? At the very least the event > call structure in memory goes away during module unload. If it gets > reused odd things will happen, right? IE: trace_module_remove_events(). Yes it gets removed along with the module. But everything about the static event is part of the module. > > Maybe I have a bad assumption: > I thought the point of trace_event_try_get_ref()/put_ref() was to tell > the system the call cannot go away. However, if ftrace enable doesn't > use these the lifetime is ambigious in this case. If this was > intentional, how are event call lifetimes described if not within the > ref? > The purpose of trace_event_try_get_ref and friends is for the case of dynamic events (eprobes and synthetic events) that can attach to any event (including module events), and the module removal does not remove the dynamic portion that was attached to them. And in this case, the removal would have dire results. > In my namespace patches I hit this case when user_events try to go away > during namespace teardown. Since there is no reference to the event > being used I removed the call. However, it was clearly being used within > tracefs at that point. When I cat "enable" in this case instead of "0" > or "1" I get "?". I suppose worse things could happen when the memory of > the call gets reused? > Could you have a callback telling tracefs that it is going away (like the module notifier does)? -- Steve
On Thu, Oct 13, 2022 at 03:58:27AM -0400, Steven Rostedt wrote: > On Wed, 12 Oct 2022 17:19:38 -0700 > Beau Belgrave <beaub@linux.microsoft.com> wrote: > > > > > > > The events are only called from the module code, and when the module is > > > unloaded, they are no longer called. Why keep the module from unloading > > > when enabled? > > > > Won't the modules remove the event calls? At the very least the event > > call structure in memory goes away during module unload. If it gets > > reused odd things will happen, right? IE: trace_module_remove_events(). > > Yes it gets removed along with the module. But everything about the > static event is part of the module. > I see, trace_module_remove_events() does remove the trace call for module cases only. > > > > Maybe I have a bad assumption: > > I thought the point of trace_event_try_get_ref()/put_ref() was to tell > > the system the call cannot go away. However, if ftrace enable doesn't > > use these the lifetime is ambigious in this case. If this was > > intentional, how are event call lifetimes described if not within the > > ref? > > > > The purpose of trace_event_try_get_ref and friends is for the case of > dynamic events (eprobes and synthetic events) that can attach to any > event (including module events), and the module removal does not remove > the dynamic portion that was attached to them. And in this case, the > removal would have dire results. > Got it, uprobes and user_events are also dynamic events. Here are things I found and am thinking about and questioning: Uprobe specifically calls trace_event_dyn_busy() during unregister_trace_uprobe(). This will return false in this case, since there is no reference. Fortunately trace_uprobe_is_busy() used by dyn_events checks the event flags to see if something really hooked there. User_events also tracks usage via the event.class.reg() REG/UNREG callback. It's dyn_events is_busy() op also checks internal flags. So normally these are not issues for either above system. However, once we have a tracing_namespace with put/get ref semantics for lifetime, those corner case checks won't keep us from removing the calls. It appears if perf attaches to any event it will take a reference. This would imply the module won't be allowed to go away while perf is running. I'm not sure why we wouldn't want consistency, perhaps it might break some scenarios I'm unaware of around module unload. > > In my namespace patches I hit this case when user_events try to go away > > during namespace teardown. Since there is no reference to the event > > being used I removed the call. However, it was clearly being used within > > tracefs at that point. When I cat "enable" in this case instead of "0" > > or "1" I get "?". I suppose worse things could happen when the memory of > > the call gets reused? > > > > Could you have a callback telling tracefs that it is going away (like > the module notifier does)? > I could, would it be limited to just tracefs? I don't believe perf or eBPF currently hooks into those notifiers? If we had a consistent get/put ref model I don't think we would need the notifiers (unless we cannot trust all legacy code doing a get/put ref). It appears in my testing that perf is always doing get/put refs in these cases to ensure the call cannot go away underneath it. Thanks, -Beau
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 0356cae0cf74..f0b17adba1bc 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -634,6 +634,8 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file, } call->class->reg(call, TRACE_REG_UNREGISTER, file); + + trace_event_put_ref(call); } /* If in SOFT_MODE, just set the SOFT_DISABLE_BIT, else clear it */ if (file->flags & EVENT_FILE_FL_SOFT_MODE) @@ -659,7 +661,7 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file, } if (!(file->flags & EVENT_FILE_FL_ENABLED)) { - bool cmd = false, tgid = false; + bool cmd = false, tgid = false, put = false; /* Keep the event disabled, when going to SOFT_MODE. */ if (soft_disable) @@ -677,14 +679,22 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file, set_bit(EVENT_FILE_FL_RECORDED_TGID_BIT, &file->flags); } + if (!trace_event_try_get_ref(call)) + goto error_enabling; + + put = true; + ret = call->class->reg(call, TRACE_REG_REGISTER, file); if (ret) { +error_enabling: + pr_info("event trace: Could not enable event " + "%s\n", trace_event_name(call)); if (cmd) tracing_stop_cmdline_record(); if (tgid) tracing_stop_tgid_record(); - pr_info("event trace: Could not enable event " - "%s\n", trace_event_name(call)); + if (put) + trace_event_put_ref(call); break; } set_bit(EVENT_FILE_FL_ENABLED_BIT, &file->flags);
When events are enabled via the "enable" file within tracefs there is no get or put ref. Add these to ensure modules and dynamic events do not unload while the event is enabled via tracefs. Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> --- kernel/trace/trace_events.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) base-commit: 6c0f39e87b6ab1a3009e3a49d3e6f6db8dc756a8