| Message ID | 20230929191637.416931-1-cleger@rivosinc.com (mailing list archive) |
|---|---|
| State | Accepted |
| Commit | 23cce5f25491968b23fb9c399bbfb25f13870cd9 |
| Series | [v2] tracing: relax trace_event_eval_update() execution with cond_resched() |
On Fri, Sep 29, 2023 at 12:17 PM Clément Léger <cleger@rivosinc.com> wrote:
>
> When the kernel is compiled without preemption, eval_map_work_func()
> (which calls trace_event_eval_update()) will not be preempted until it
> completes. This can cause a problem: if another CPU calls
> stop_machine(), that call has to wait for eval_map_work_func() to
> finish executing in the workqueue before the stop_machine() task can
> be scheduled. This problem was observed on an SMP system at boot time,
> when the CPU running the initcalls executed clocksource_done_booting(),
> which in turn calls stop_machine(). We observed a 1 second delay
> because one CPU was executing eval_map_work_func() and was not
> preempted by the stop_machine() task.
>
> Adding a call to cond_resched() in trace_event_eval_update() allows
> other tasks to run, so the function keeps working asynchronously as
> before without blocking any pending task at boot time.
>
> Signed-off-by: Clément Léger <cleger@rivosinc.com>
> ---
>
> Changes in v2:
> - Use cond_resched() instead of schedule()
>
>  kernel/trace/trace_events.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 91951d038ba4..f49d6ddb6342 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
>  				update_event_fields(call, map[i]);
>  			}
>  		}
> +		cond_resched();
>  	}
>  	up_write(&trace_event_sem);
>  }

(First email got rejected as gmail decided to switch the format back to
html :(. Sorry for the noise.)

Thanks for the fix. The boot time improvement is evident on both
spike and QEMU.

FWIW,

Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>

> --
> 2.40.1