Message ID | 20231002144149.1325-1-Nicholas.Lowell@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [v2] trace: tracing_event_filter: fast path when no subsystem filters | expand |
On Mon, 2 Oct 2023 10:41:48 -0400 Nicholas Lowell <nicholas.lowell@gmail.com> wrote: > @@ -2411,7 +2418,12 @@ int apply_subsystem_event_filter(struct trace_subsystem_dir *dir, > } > > if (!strcmp(strstrip(filter_string), "0")) { > - filter_free_subsystem_preds(dir, tr); > + /* If nothing was freed, we do not need to sync */ > + if (!filter_free_subsystem_preds(dir, tr)) { > + if(!(WARN_ON_ONCE(system->filter))) > + goto out_unlock; > + } > + > remove_filter_string(system->filter); > filter = system->filter; > system->filter = NULL; > -- This is why I asked for the warning: trace-cmd record -o /tmp/trace.dat -e sched -f "(common_pid == $$) || ((common_pid > 10) && common_pid < 100) || (common_pid >= 1000 && common_pid <= 1050) || (common_pid > 10000 && common_pid < 20000)" sleep 5 Causes: ------------[ cut here ]------------ WARNING: CPU: 5 PID: 944 at kernel/trace/trace_events_filter.c:2423 apply_subsystem_event_filter+0x18c/0x5e0 Modules linked in: CPU: 5 PID: 944 Comm: trace-cmd Not tainted 6.6.0-rc4-test-00009-gff7cd7446fe5 #102 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:apply_subsystem_event_filter+0x18c/0x5e0 Code: 44 24 08 00 00 00 00 48 8b 6d 00 4c 39 f5 75 bc 48 8b 44 24 18 4c 8b 60 18 4c 89 e5 45 84 ff 75 14 48 85 ed 0f 84 37 ff ff ff <0f> 0b eb 10 e8 4b be fd ff eb b0 4d 85 e4 0f 84 a3 02 00 00 48 8b RSP: 0018:ffff9b4941607db8 EFLAGS: 00010286 RAX: ffff8b2780a77280 RBX: ffff8b2780a77400 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff8b2781c11c38 RDI: ffff8b2781c11c38 RBP: ffff8b28df449030 R08: ffff8b2781c11c38 R09: 0000000000000000 R10: ffff8b2781c11c38 R11: 0000000000000000 R12: ffff8b28df449030 R13: ffffffffaaf64de0 R14: ffffffffaaf66bb8 R15: 0000000000000000 FS: 00007fd221def3c0(0000) GS:ffff8b28f7d40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056117c93e160 CR3: 000000010173a003 CR4: 0000000000170ee0 Call Trace: <TASK> ? apply_subsystem_event_filter+0x18c/0x5e0 ? __warn+0x81/0x130 ? apply_subsystem_event_filter+0x18c/0x5e0 ? report_bug+0x191/0x1c0 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? apply_subsystem_event_filter+0x18c/0x5e0 ? apply_subsystem_event_filter+0x5b/0x5e0 ? __check_object_size+0x25b/0x2c0 subsystem_filter_write+0x41/0x70 vfs_write+0xf2/0x440 ? kmem_cache_free+0x22/0x350 ksys_write+0x6f/0xf0 do_syscall_64+0x3f/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7fd221ee7ae0 -- Steve
On Tue, Oct 3, 2023 at 10:29 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Mon, 2 Oct 2023 10:41:48 -0400 > Nicholas Lowell <nicholas.lowell@gmail.com> wrote: > > > @@ -2411,7 +2418,12 @@ int apply_subsystem_event_filter(struct trace_subsystem_dir *dir, > > } > > > > if (!strcmp(strstrip(filter_string), "0")) { > > - filter_free_subsystem_preds(dir, tr); > > + /* If nothing was freed, we do not need to sync */ > > + if (!filter_free_subsystem_preds(dir, tr)) { > > + if(!(WARN_ON_ONCE(system->filter))) > > + goto out_unlock; > > + } > > + > > remove_filter_string(system->filter); > > filter = system->filter; > > system->filter = NULL; > > -- > > This is why I asked for the warning: > > trace-cmd record -o /tmp/trace.dat -e sched -f "(common_pid == $$) || ((common_pid > 10) && common_pid < 100) || (common_pid >= 1000 && common_pid <= 1050) || (common_pid > 10000 && common_pid < 20000)" sleep 5 > > > Causes: > > ------------[ cut here ]------------ > WARNING: CPU: 5 PID: 944 at kernel/trace/trace_events_filter.c:2423 apply_subsystem_event_filter+0x18c/0x5e0 > Modules linked in: > CPU: 5 PID: 944 Comm: trace-cmd Not tainted 6.6.0-rc4-test-00009-gff7cd7446fe5 #102 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > RIP: 0010:apply_subsystem_event_filter+0x18c/0x5e0 > Code: 44 24 08 00 00 00 00 48 8b 6d 00 4c 39 f5 75 bc 48 8b 44 24 18 4c 8b 60 18 4c 89 e5 45 84 ff 75 14 48 85 ed 0f 84 37 ff ff ff <0f> 0b eb 10 e8 4b be fd ff eb b0 4d 85 e4 0f 84 a3 02 00 00 48 8b > RSP: 0018:ffff9b4941607db8 EFLAGS: 00010286 > RAX: ffff8b2780a77280 RBX: ffff8b2780a77400 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: ffff8b2781c11c38 RDI: ffff8b2781c11c38 > RBP: ffff8b28df449030 R08: ffff8b2781c11c38 R09: 0000000000000000 > R10: ffff8b2781c11c38 R11: 0000000000000000 R12: ffff8b28df449030 > R13: ffffffffaaf64de0 R14: ffffffffaaf66bb8 R15: 0000000000000000 > FS: 00007fd221def3c0(0000) GS:ffff8b28f7d40000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000056117c93e160 CR3: 000000010173a003 CR4: 0000000000170ee0 > Call Trace: > <TASK> > ? apply_subsystem_event_filter+0x18c/0x5e0 > ? __warn+0x81/0x130 > ? apply_subsystem_event_filter+0x18c/0x5e0 > ? report_bug+0x191/0x1c0 > ? handle_bug+0x3c/0x80 > ? exc_invalid_op+0x17/0x70 > ? asm_exc_invalid_op+0x1a/0x20 > ? apply_subsystem_event_filter+0x18c/0x5e0 > ? apply_subsystem_event_filter+0x5b/0x5e0 > ? __check_object_size+0x25b/0x2c0 > subsystem_filter_write+0x41/0x70 > vfs_write+0xf2/0x440 > ? kmem_cache_free+0x22/0x350 > ksys_write+0x6f/0xf0 > do_syscall_64+0x3f/0xc0 > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > RIP: 0033:0x7fd221ee7ae0 > > -- Steve Is this just informative indicating that there are issues with how filters are being used or are you saying there is something else I need to do before this patch is approved? What version of trace-cmd is that using?
On Wed, 4 Oct 2023 10:39:33 -0400 Nick Lowell <nicholas.lowell@gmail.com> wrote: > > ------------[ cut here ]------------ > > WARNING: CPU: 5 PID: 944 at kernel/trace/trace_events_filter.c:2423 apply_subsystem_event_filter+0x18c/0x5e0 > > Modules linked in: > > CPU: 5 PID: 944 Comm: trace-cmd Not tainted 6.6.0-rc4-test-00009-gff7cd7446fe5 #102 > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > > RIP: 0010:apply_subsystem_event_filter+0x18c/0x5e0 > > Code: 44 24 08 00 00 00 00 48 8b 6d 00 4c 39 f5 75 bc 48 8b 44 24 18 4c 8b 60 18 4c 89 e5 45 84 ff 75 14 48 85 ed 0f 84 37 ff ff ff <0f> 0b eb 10 e8 4b be fd ff eb b0 4d 85 e4 0f 84 a3 02 00 00 48 8b > > RSP: 0018:ffff9b4941607db8 EFLAGS: 00010286 > > RAX: ffff8b2780a77280 RBX: ffff8b2780a77400 RCX: 0000000000000000 > > RDX: 0000000000000000 RSI: ffff8b2781c11c38 RDI: ffff8b2781c11c38 > > RBP: ffff8b28df449030 R08: ffff8b2781c11c38 R09: 0000000000000000 > > R10: ffff8b2781c11c38 R11: 0000000000000000 R12: ffff8b28df449030 > > R13: ffffffffaaf64de0 R14: ffffffffaaf66bb8 R15: 0000000000000000 > > FS: 00007fd221def3c0(0000) GS:ffff8b28f7d40000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000056117c93e160 CR3: 000000010173a003 CR4: 0000000000170ee0 > > Call Trace: > > <TASK> > > ? apply_subsystem_event_filter+0x18c/0x5e0 > > ? __warn+0x81/0x130 > > ? apply_subsystem_event_filter+0x18c/0x5e0 > > ? report_bug+0x191/0x1c0 > > ? handle_bug+0x3c/0x80 > > ? exc_invalid_op+0x17/0x70 > > ? asm_exc_invalid_op+0x1a/0x20 > > ? apply_subsystem_event_filter+0x18c/0x5e0 > > ? apply_subsystem_event_filter+0x5b/0x5e0 > > ? __check_object_size+0x25b/0x2c0 > > subsystem_filter_write+0x41/0x70 > > vfs_write+0xf2/0x440 > > ? kmem_cache_free+0x22/0x350 > > ksys_write+0x6f/0xf0 > > do_syscall_64+0x3f/0xc0 > > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > RIP: 0033:0x7fd221ee7ae0 > > > > -- Steve > > Is this just informative indicating that there are issues with how > filters are being used or are you saying there is something else I > need to do before this patch is approved? > What version of trace-cmd is that using? Not sure if it matters, but the above was with trace-cmd v3.2. So, I guess we need to look a bit deeper at the change. > @@ -2411,7 +2418,12 @@ int apply_subsystem_event_filter(struct trace_subsystem_dir *dir, > } > > if (!strcmp(strstrip(filter_string), "0")) { > - filter_free_subsystem_preds(dir, tr); > + /* If nothing was freed, we do not need to sync */ > + if (!filter_free_subsystem_preds(dir, tr)) { > + if(!(WARN_ON_ONCE(system->filter))) > + goto out_unlock; When do we want to skip the below? The original version just did the "goto out_unlock" before the "system->filter" check, and that would have caused a memory leak, or just left the "system->filter" around when unneeded. In other words, the patch is incorrect in general then. > + } > + > remove_filter_string(system->filter); > filter = system->filter; > system->filter = NULL; I believe, what you really want here is simply: bool sync; [..] if (!strcmp(strstrip(filter_string), "0")) { + sync = filter_free_subsystem_preds(dir, tr); remove_filter_string(system->filter); filter = system->filter; system->filter = NULL; - /* Ensure all filters are no longer used */ - tracepoint_synchronize_unregister(); + if (sync) { + /* Ensure all filters are no longer used */ + tracepoint_synchronize_unregister(); + } filter_free_subsystem_filters(dir, tr); Maybe even pass in "sync" to the filter_free_subsystem_filters() to make sure there were nothing to be freed, and do the WARN_ON_ONCE() then. __free_filter(filter); goto out_unlock; } -- Steve
On Wed, Oct 4, 2023 at 10:52 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Wed, 4 Oct 2023 10:39:33 -0400 > Nick Lowell <nicholas.lowell@gmail.com> wrote: > > > > ------------[ cut here ]------------ > > > WARNING: CPU: 5 PID: 944 at kernel/trace/trace_events_filter.c:2423 apply_subsystem_event_filter+0x18c/0x5e0 > > > Modules linked in: > > > CPU: 5 PID: 944 Comm: trace-cmd Not tainted 6.6.0-rc4-test-00009-gff7cd7446fe5 #102 > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > > > RIP: 0010:apply_subsystem_event_filter+0x18c/0x5e0 > > > Code: 44 24 08 00 00 00 00 48 8b 6d 00 4c 39 f5 75 bc 48 8b 44 24 18 4c 8b 60 18 4c 89 e5 45 84 ff 75 14 48 85 ed 0f 84 37 ff ff ff <0f> 0b eb 10 e8 4b be fd ff eb b0 4d 85 e4 0f 84 a3 02 00 00 48 8b > > > RSP: 0018:ffff9b4941607db8 EFLAGS: 00010286 > > > RAX: ffff8b2780a77280 RBX: ffff8b2780a77400 RCX: 0000000000000000 > > > RDX: 0000000000000000 RSI: ffff8b2781c11c38 RDI: ffff8b2781c11c38 > > > RBP: ffff8b28df449030 R08: ffff8b2781c11c38 R09: 0000000000000000 > > > R10: ffff8b2781c11c38 R11: 0000000000000000 R12: ffff8b28df449030 > > > R13: ffffffffaaf64de0 R14: ffffffffaaf66bb8 R15: 0000000000000000 > > > FS: 00007fd221def3c0(0000) GS:ffff8b28f7d40000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 000056117c93e160 CR3: 000000010173a003 CR4: 0000000000170ee0 > > > Call Trace: > > > <TASK> > > > ? apply_subsystem_event_filter+0x18c/0x5e0 > > > ? __warn+0x81/0x130 > > > ? apply_subsystem_event_filter+0x18c/0x5e0 > > > ? report_bug+0x191/0x1c0 > > > ? handle_bug+0x3c/0x80 > > > ? exc_invalid_op+0x17/0x70 > > > ? asm_exc_invalid_op+0x1a/0x20 > > > ? apply_subsystem_event_filter+0x18c/0x5e0 > > > ? apply_subsystem_event_filter+0x5b/0x5e0 > > > ? __check_object_size+0x25b/0x2c0 > > > subsystem_filter_write+0x41/0x70 > > > vfs_write+0xf2/0x440 > > > ? kmem_cache_free+0x22/0x350 > > > ksys_write+0x6f/0xf0 > > > do_syscall_64+0x3f/0xc0 > > > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > > RIP: 0033:0x7fd221ee7ae0 > > > > > > -- Steve > > > > Is this just informative indicating that there are issues with how > > filters are being used or are you saying there is something else I > > need to do before this patch is approved? > > What version of trace-cmd is that using? > > Not sure if it matters, but the above was with trace-cmd v3.2. > > So, I guess we need to look a bit deeper at the change. > > > @@ -2411,7 +2418,12 @@ int apply_subsystem_event_filter(struct trace_subsystem_dir *dir, > > } > > > > if (!strcmp(strstrip(filter_string), "0")) { > > - filter_free_subsystem_preds(dir, tr); > > + /* If nothing was freed, we do not need to sync */ > > + if (!filter_free_subsystem_preds(dir, tr)) { > > + if(!(WARN_ON_ONCE(system->filter))) > > + goto out_unlock; > > When do we want to skip the below? > > The original version just did the "goto out_unlock" before the > "system->filter" check, and that would have caused a memory leak, or just > left the "system->filter" around when unneeded. > > In other words, the patch is incorrect in general then. > > > + } > > + > > remove_filter_string(system->filter); > > filter = system->filter; > > system->filter = NULL; > > I believe, what you really want here is simply: > > bool sync; > > [..] > > if (!strcmp(strstrip(filter_string), "0")) { > + sync = filter_free_subsystem_preds(dir, tr); > remove_filter_string(system->filter); > filter = system->filter; > system->filter = NULL; > - /* Ensure all filters are no longer used */ > - tracepoint_synchronize_unregister(); > + if (sync) { > + /* Ensure all filters are no longer used */ > + tracepoint_synchronize_unregister(); > + } > filter_free_subsystem_filters(dir, tr); > I really appreciate the continued feedback. I was able to reproduce. I think I'm understanding better but still need some help. I am actually wondering if remove_filter_string(system->filter) should also return bool as an OR'd input for sync. Should it be something like this? if (!strcmp(strstrip(filter_string), "0")) { - filter_free_subsystem_preds(dir, tr); - remove_filter_string(system->filter); + bool sync; + + sync = filter_free_subsystem_preds(dir, tr); + sync = sync || remove_filter_string(system->filter); filter = system->filter; system->filter = NULL; - /* Ensure all filters are no longer used */ - tracepoint_synchronize_unregister(); + /* If nothing was freed, we do not need to sync */ + if(sync) { + /* Ensure all filters are no longer used */ + tracepoint_synchronize_unregister(); + } filter_free_subsystem_filters(dir, tr); __free_filter(filter); goto out_unlock; > Maybe even pass in "sync" to the filter_free_subsystem_filters() to make > sure there were nothing to be freed, and do the WARN_ON_ONCE() then. > > __free_filter(filter); > goto out_unlock; > } > > -- Steve I'm not sure if I see the reasoning for the WARN_ON_ONCE() in filter_free_subsystem_filters() because it ends up checking the same if(!filter) just like filter_free_subsystem_preds() did earlier. It doesn't seem to do anything with system->filter. I actually wonder if !sync, could filter_free_subsystem_filters() be skipped altogether. Help me if I'm missing something. Thanks again, Nick
On Thu, 12 Oct 2023 16:37:35 -0400 Nick Lowell <nicholas.lowell@gmail.com> wrote: Sorry, I was traveling when this was sent, and I missed it. > I really appreciate the continued feedback. I was able to reproduce. > I think I'm understanding better but still need some help. > I am actually wondering if remove_filter_string(system->filter) should You mean to return true if filter->filter_string was not NULL? > also return bool as an OR'd input for sync. > Should it be something like this? > > if (!strcmp(strstrip(filter_string), "0")) { > - filter_free_subsystem_preds(dir, tr); > - remove_filter_string(system->filter); > + bool sync; I would just make this an int; > + > + sync = filter_free_subsystem_preds(dir, tr); > + sync = sync || remove_filter_string(system->filter); And then just have: sync |= remove_filter_string(system->filter); > filter = system->filter; > system->filter = NULL; > - /* Ensure all filters are no longer used */ > - tracepoint_synchronize_unregister(); > + /* If nothing was freed, we do not need to sync */ > + if(sync) { > + /* Ensure all filters are no longer used */ > + tracepoint_synchronize_unregister(); > + } > filter_free_subsystem_filters(dir, tr); > __free_filter(filter); > goto out_unlock; > > > Maybe even pass in "sync" to the filter_free_subsystem_filters() to make > > sure there were nothing to be freed, and do the WARN_ON_ONCE() then. > > > > __free_filter(filter); > > goto out_unlock; > > } > > > > -- Steve > > I'm not sure if I see the reasoning for the WARN_ON_ONCE() in > filter_free_subsystem_filters() > because it ends up checking the same if(!filter) just like > filter_free_subsystem_preds() did earlier. It doesn't > seem to do anything with system->filter. I actually wonder if !sync, > could filter_free_subsystem_filters() > be skipped altogether. Help me if I'm missing something. The point is, code always changes. It's a bug if one of the filters had content in filter_free_subsystem_filters() and sync is 0, hence the WARN_ON_ONCE() if it does. WARN_ON*()s are added to make sure the code is acting the way it is expected to act. Yes, it should never trigger, but if it does, we know there's a bug somewhere. -- Steve
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 33264e510d16..baa108f88032 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -1317,22 +1317,29 @@ void free_event_filter(struct event_filter *filter) __free_filter(filter); } -static inline void __remove_filter(struct trace_event_file *file) +static inline bool __remove_filter(struct trace_event_file *file) { filter_disable(file); + if (!file->filter) + return false; + remove_filter_string(file->filter); + return true; } -static void filter_free_subsystem_preds(struct trace_subsystem_dir *dir, +static bool filter_free_subsystem_preds(struct trace_subsystem_dir *dir, struct trace_array *tr) { struct trace_event_file *file; + bool do_sync = false; list_for_each_entry(file, &tr->events, list) { if (file->system != dir) continue; - __remove_filter(file); + if (__remove_filter(file)) + do_sync = true; } + return do_sync; } static inline void __free_subsystem_filter(struct trace_event_file *file) @@ -2411,7 +2418,12 @@ int apply_subsystem_event_filter(struct trace_subsystem_dir *dir, } if (!strcmp(strstrip(filter_string), "0")) { - filter_free_subsystem_preds(dir, tr); + /* If nothing was freed, we do not need to sync */ + if (!filter_free_subsystem_preds(dir, tr)) { + if(!(WARN_ON_ONCE(system->filter))) + goto out_unlock; + } + remove_filter_string(system->filter); filter = system->filter; system->filter = NULL;