
tracing: relax trace_event_eval_update() execution with schedule()

Message ID 20230929141348.248761-1-cleger@rivosinc.com (mailing list archive)
State Superseded
Series tracing: relax trace_event_eval_update() execution with schedule()

Commit Message

Clément Léger Sept. 29, 2023, 2:13 p.m. UTC
When the kernel is compiled without preemption, eval_map_work_func()
(which calls trace_event_eval_update()) will not be preempted until it
completes. This can cause a problem: if another CPU calls
stop_machine(), that call has to wait for eval_map_work_func() to
finish executing in the workqueue before the stop_machine() threads
can be scheduled. This problem was observed on an SMP system at boot
time, when the CPU running the initcalls executed
clocksource_done_booting(), which in turn calls stop_machine(). We
observed a 1 second delay because one CPU was executing
eval_map_work_func() and was not preempted by the stop_machine() task.

Adding a call to schedule() in trace_event_eval_update() lets other
tasks run, so the update keeps working asynchronously as before
without blocking any pending task at boot time.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
---
 kernel/trace/trace_events.c | 1 +
 1 file changed, 1 insertion(+)
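
For illustration, a minimal kernel-style sketch of the pattern the
message above describes, with hypothetical names (this is not the
actual trace_events.c code): on a kernel built without preemption, a
work item that loops for a long time without sleeping or yielding
occupies its CPU until the loop ends, and a voluntary scheduling point
in the loop body is what unblocks a concurrent stop_machine().

#include <linux/workqueue.h>
#include <linux/sched.h>

/*
 * Hypothetical illustration of the problematic pattern; this is not
 * the real eval_map_work_func() from kernel/trace/trace_events.c.
 */
static void long_running_work_func(struct work_struct *work)
{
	int i;

	for (i = 0; i < 1000000; i++) {
		update_one_entry(i);	/* hypothetical helper; never sleeps */

		/*
		 * Without a scheduling point here, a !CONFIG_PREEMPT
		 * kernel keeps this CPU busy until the loop finishes,
		 * so a concurrent stop_machine() caller (such as
		 * clocksource_done_booting() at boot) has to wait.
		 * schedule() (as in this patch) or cond_resched() (as
		 * suggested in review below) both provide one.
		 */
		cond_resched();
	}
}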

Comments

Steven Rostedt Sept. 29, 2023, 3:06 p.m. UTC | #1
On Fri, 29 Sep 2023 16:13:48 +0200
Clément Léger <cleger@rivosinc.com> wrote:

> When the kernel is compiled without preemption, eval_map_work_func()
> (which calls trace_event_eval_update()) will not be preempted until it
> completes. This can cause a problem: if another CPU calls
> stop_machine(), that call has to wait for eval_map_work_func() to
> finish executing in the workqueue before the stop_machine() threads
> can be scheduled. This problem was observed on an SMP system at boot
> time, when the CPU running the initcalls executed
> clocksource_done_booting(), which in turn calls stop_machine(). We
> observed a 1 second delay because one CPU was executing
> eval_map_work_func() and was not preempted by the stop_machine() task.
> 
> Adding a call to schedule() in trace_event_eval_update() lets other
> tasks run, so the update keeps working asynchronously as before
> without blocking any pending task at boot time.
> 
> Signed-off-by: Clément Léger <cleger@rivosinc.com>
> ---
>  kernel/trace/trace_events.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 91951d038ba4..dbdf57a081c0 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
>  				update_event_fields(call, map[i]);
>  			}
>  		}
> +		schedule();

The proper answer to this is "cond_resched()" but still, there's going
to be work to get rid of all that soon [1]. But I'll take a cond_resched()
now until that is implemented.

-- Steve

>  	}
>  	up_write(&trace_event_sem);
>  }

[1] https://lore.kernel.org/all/87cyyfxd4k.ffs@tglx/
Clément Léger Sept. 29, 2023, 3:10 p.m. UTC | #2
On 29/09/2023 17:06, Steven Rostedt wrote:
> On Fri, 29 Sep 2023 16:13:48 +0200
> Clément Léger <cleger@rivosinc.com> wrote:
> 
>> When the kernel is compiled without preemption, eval_map_work_func()
>> (which calls trace_event_eval_update()) will not be preempted until it
>> completes. This can cause a problem: if another CPU calls
>> stop_machine(), that call has to wait for eval_map_work_func() to
>> finish executing in the workqueue before the stop_machine() threads
>> can be scheduled. This problem was observed on an SMP system at boot
>> time, when the CPU running the initcalls executed
>> clocksource_done_booting(), which in turn calls stop_machine(). We
>> observed a 1 second delay because one CPU was executing
>> eval_map_work_func() and was not preempted by the stop_machine() task.
>>
>> Adding a call to schedule() in trace_event_eval_update() lets other
>> tasks run, so the update keeps working asynchronously as before
>> without blocking any pending task at boot time.
>>
>> Signed-off-by: Clément Léger <cleger@rivosinc.com>
>> ---
>>  kernel/trace/trace_events.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
>> index 91951d038ba4..dbdf57a081c0 100644
>> --- a/kernel/trace/trace_events.c
>> +++ b/kernel/trace/trace_events.c
>> @@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
>>  				update_event_fields(call, map[i]);
>>  			}
>>  		}
>> +		schedule();
> 
> The proper answer to this is "cond_resched()" but still, there's going
> to be work to get rid of all that soon [1]. But I'll take a cond_resched()
> now until that is implemented.

Hi Steven,

Thanks for the information, I'll update the patch and send a V2.

Clément

> 
> -- Steve
> 
>>  	}
>>  	up_write(&trace_event_sem);
>>  }
> 
> [1] https://lore.kernel.org/all/87cyyfxd4k.ffs@tglx/

Patch

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 91951d038ba4..dbdf57a081c0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
 				update_event_fields(call, map[i]);
 			}
 		}
+		schedule();
 	}
 	up_write(&trace_event_sem);
 }
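
As a follow-up to the review above: schedule() enters the scheduler on
every iteration of the outer loop, whereas cond_resched() is the
canonical voluntary preemption point and only reschedules when the
scheduler has actually requested it, which is presumably why it was
requested for V2. A sketch of what the superseding V2 hunk might look
like, assuming only the added line changes:

@@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
 				update_event_fields(call, map[i]);
 			}
 		}
+		cond_resched();
 	}
 	up_write(&trace_event_sem);
 }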