diff mbox series

[resend,6/8] tracing/ftrace: Add might_fault check to syscall probes

Message ID 20240930192357.1154417-7-mathieu.desnoyers@efficios.com (mailing list archive)
State Superseded
Headers show
Series tracing: Allow system call tracepoints to handle page faults | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Mathieu Desnoyers Sept. 30, 2024, 7:23 p.m. UTC
Add a might_fault() check to validate that the ftrace sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 include/trace/trace_events.h  | 1 +
 kernel/trace/trace_syscalls.c | 2 ++
 2 files changed, 3 insertions(+)

Comments

Thomas Gleixner Oct. 28, 2024, 5:42 p.m. UTC | #1
On Mon, Sep 30 2024 at 15:23, Mathieu Desnoyers wrote:
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index a3d8ac00793e..0430890cbb42 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -303,6 +303,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
>  	 * Syscall probe called with preemption enabled, but the ring
>  	 * buffer and per-cpu data require preemption to be disabled.
>  	 */
> +	might_fault();
>  	guard(preempt_notrace)();

I find it odd that the might_fault() check is in all the implementations
and not in the tracepoint itself:

    if (syscall) {
        might_fault();
 	rcu_read_unlock_trace();
   } else ...     

That's where I would have expected it to be.

Thanks,

        tglx
Mathieu Desnoyers Oct. 28, 2024, 7:02 p.m. UTC | #2
On 2024-10-28 13:42, Thomas Gleixner wrote:
> On Mon, Sep 30 2024 at 15:23, Mathieu Desnoyers wrote:
>> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
>> index a3d8ac00793e..0430890cbb42 100644
>> --- a/kernel/trace/trace_syscalls.c
>> +++ b/kernel/trace/trace_syscalls.c
>> @@ -303,6 +303,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
>>   	 * Syscall probe called with preemption enabled, but the ring
>>   	 * buffer and per-cpu data require preemption to be disabled.
>>   	 */
>> +	might_fault();
>>   	guard(preempt_notrace)();
> 
> I find it odd that the might_fault() check is in all the implementations
> and not in the tracepoint itself:
> 
>      if (syscall) {
>          might_fault();
>   	rcu_read_unlock_trace();
>     } else ...
> 
> That's where I would have expected it to be.

You raise a good point: we should also add a might_fault() check in
__DO_TRACE() in the syscall case, so we can catch incorrect use of the
syscall tracepoint even if no probes are registered to it.

I've added the might_fault() in each tracer syscall probe to make sure
a tracer don't end up registering a faultable probe on a tracepoint
protected with preempt_disable by mistake. It validates that the tracers
are using the tracepoint registration as expected.

I'll prepare separate a patch adding this and will add it to this
series.

Thanks,

Mathieu

> 
> Thanks,
> 
>          tglx
diff mbox series

Patch

diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 0228d9ed94a3..e0d4850b0d77 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -446,6 +446,7 @@  __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
 static notrace void							\
 trace_event_raw_event_##call(void *__data, proto)			\
 {									\
+	might_fault();							\
 	guard(preempt_notrace)();					\
 	do_trace_event_raw_event_##call(__data, args);			\
 }
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index a3d8ac00793e..0430890cbb42 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -303,6 +303,7 @@  static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
+	might_fault();
 	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
@@ -348,6 +349,7 @@  static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
+	might_fault();
 	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);