diff mbox series

[RFC,v4,4/4] tracing: Add might_fault() check in __DO_TRACE() for syscall

Message ID 20241028190927.648953-5-mathieu.desnoyers@efficios.com (mailing list archive)
State Superseded
Series Faultable syscall tracepoints updates

Checks

netdev/tree_selection: success (Not a local patch)

Commit Message

Mathieu Desnoyers Oct. 28, 2024, 7:09 p.m. UTC
Catch incorrect use of syscall tracepoints even if no probes are
registered by adding a might_fault() check in __DO_TRACE() when
syscall=1.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Jordan Rife <jrife@google.com>
---
 include/linux/tracepoint.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Mathieu Desnoyers Oct. 28, 2024, 7:58 p.m. UTC | #1
On 2024-10-28 15:09, Mathieu Desnoyers wrote:
> Catch incorrect use of syscall tracepoints even if no probes are
> registered by adding a might_fault() check in __DO_TRACE() when
> syscall=1.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> [...]
> ---
>   include/linux/tracepoint.h | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 259f0ab4ece6..7bed499b7055 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -226,10 +226,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>   		if (!(cond))						\
>   			return;						\
>   									\
> -		if (syscall)						\
> +		if (syscall) {						\
>   			rcu_read_lock_trace();				\
> -		else							\
> +			might_fault();					\

Actually, __DO_TRACE() is not the best place to put this, because it's
only executed when the tracepoint is enabled.

I'll move this to __DECLARE_TRACE_SYSCALL()

#define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto)    \
         __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \
         static inline void trace_##name(proto)                          \
         {                                                               \
                 might_fault();                                          \
                 if (static_branch_unlikely(&__tracepoint_##name.key))   \
                         __DO_TRACE(name,                                \
                                 TP_ARGS(args),                          \
                                 TP_CONDITION(cond), 1);                 \
[...]

instead in v5.

Thanks,

Mathieu

> +		} else {						\
>   			preempt_disable_notrace();			\
> +		}							\
>   									\
>   		__DO_TRACE_CALL(name, TP_ARGS(args));			\
>   									\
Andrii Nakryiko Oct. 28, 2024, 8:20 p.m. UTC | #2
On Mon, Oct 28, 2024 at 12:59 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2024-10-28 15:09, Mathieu Desnoyers wrote:
> > Catch incorrect use of syscall tracepoints even if no probes are
> > registered by adding a might_fault() check in __DO_TRACE() when
> > syscall=1.
> >
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > [...]
> > ---
> >   include/linux/tracepoint.h | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > index 259f0ab4ece6..7bed499b7055 100644
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -226,10 +226,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> >               if (!(cond))                                            \
> >                       return;                                         \
> >                                                                       \
> > -             if (syscall)                                            \
> > +             if (syscall) {                                          \
> >                       rcu_read_lock_trace();                          \
> > -             else                                                    \
> > +                     might_fault();                                  \
>
> Actually, __DO_TRACE() is not the best place to put this, because it's
> only executed when the tracepoint is enabled.
>
> I'll move this to __DECLARE_TRACE_SYSCALL()
>
> #define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto)    \
>          __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \
>          static inline void trace_##name(proto)                          \
>          {                                                               \
>                  might_fault();                                          \
>                  if (static_branch_unlikely(&__tracepoint_##name.key))   \
>                          __DO_TRACE(name,                                \
>                                  TP_ARGS(args),                          \
>                                  TP_CONDITION(cond), 1);                 \
> [...]
>
> instead in v5.

please drop the RFC tag while at it

>
> Thanks,
>
> Mathieu
>
> > +             } else {                                                \
> >                       preempt_disable_notrace();                      \
> > +             }                                                       \
> >                                                                       \
> >               __DO_TRACE_CALL(name, TP_ARGS(args));                   \
> >                                                                       \
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
Mathieu Desnoyers Oct. 28, 2024, 8:25 p.m. UTC | #3
On 2024-10-28 16:20, Andrii Nakryiko wrote:
> On Mon, Oct 28, 2024 at 12:59 PM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2024-10-28 15:09, Mathieu Desnoyers wrote:
>>> Catch incorrect use of syscall tracepoints even if no probes are
>>> registered by adding a might_fault() check in __DO_TRACE() when
>>> syscall=1.
>>>
>>> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
>>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>> [...]
>>> ---
>>>    include/linux/tracepoint.h | 6 ++++--
>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
>>> index 259f0ab4ece6..7bed499b7055 100644
>>> --- a/include/linux/tracepoint.h
>>> +++ b/include/linux/tracepoint.h
>>> @@ -226,10 +226,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>>                if (!(cond))                                            \
>>>                        return;                                         \
>>>                                                                        \
>>> -             if (syscall)                                            \
>>> +             if (syscall) {                                          \
>>>                        rcu_read_lock_trace();                          \
>>> -             else                                                    \
>>> +                     might_fault();                                  \
>>
>> Actually, __DO_TRACE() is not the best place to put this, because it's
>> only executed when the tracepoint is enabled.
>>
>> I'll move this to __DECLARE_TRACE_SYSCALL()
>>
>> #define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto)    \
>>           __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \
>>           static inline void trace_##name(proto)                          \
>>           {                                                               \
>>                   might_fault();                                          \
>>                   if (static_branch_unlikely(&__tracepoint_##name.key))   \
>>                           __DO_TRACE(name,                                \
>>                                   TP_ARGS(args),                          \
>>                                   TP_CONDITION(cond), 1);                 \
>> [...]
>>
>> instead in v5.
> 
> please drop the RFC tag while at it

I'm still waiting for Jordan (or someone else) to come back with
testing results before I feel confident dropping the RFC tag.

Thanks,

Mathieu

> 
>>
>> Thanks,
>>
>> Mathieu
>>
>>> +             } else {                                                \
>>>                        preempt_disable_notrace();                      \
>>> +             }                                                       \
>>>                                                                        \
>>>                __DO_TRACE_CALL(name, TP_ARGS(args));                   \
>>>                                                                        \
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
>>
Jordan Rife Oct. 28, 2024, 8:40 p.m. UTC | #4
> I'm still waiting for Jordan (or someone else) to come back with
> testing results before I feel confident dropping the RFC tag.

I can test this later today. Since there still needs to be a fix on
the BPF side to fully resolve the use-after-free issue reported by
syzbot, I may combine your v4 patch with the band-aid fix I made
earlier, which chains call_rcu->call_rcu_tasks_trace, while running
the reproducer locally. This should give some confidence that your patch
addresses issues on the tracepoint side until we can combine it with
Andrii's fix for BPF that consumes tracepoint_is_faultable(). I'm open
to other suggestions as well if you would like me to test something
else.

-Jordan
Jordan Rife Oct. 29, 2024, 12:28 a.m. UTC | #5
> I can test this later today. Considering there needs to be a fix on
> the BPF side to fully resolve the use-after-free issue reported by
> syzbot, I may combine your v4 patch with the bandaid fix which chains
> call_rcu->call_rcu_tasks_trace I made earlier while running the
> reproducer locally.

Testing this way, the series LGTM. Here's what I did starting from
linux-next tag next-20241028.

1. Applied my patch from [1] to prevent any failures resulting from the
   as-yet unpatched BPF code that uses call_rcu(). This lets us
   focus on the effects of Mathieu's patch series.
2. Ran the reproducer [3] from the original syzbot report [2] on a
   kernel build /without/ Mathieu's v4 patch to confirm that we hit
   a use-after-free bug resulting from the use of call_rcu() inside
   release_probe().
3. Applied the patch series and rebuilt the kernel.
4. Reran the reproducer on the new kernel build to confirm that we no
   longer hit the same use-after-free bug.

[1]: https://lore.kernel.org/bpf/20241023145640.1499722-1-jrife@google.com/
[2]: https://lore.kernel.org/bpf/67121037.050a0220.10f4f4.000f.GAE@google.com/
[3]: https://syzkaller.appspot.com/x/repro.syz?x=153ef887980000

Without Mathieu's Patch
=======================
A crash occurs after a few minutes.

jordan@t14:~/contexts/use-after-free$ ssh \
-p 10022 \
-o UserKnownHostsFile=/dev/null  \
-o StrictHostKeyChecking=no \
-o IdentitiesOnly=yes \
root@127.0.0.1 './syz-execprog  -repeat=0 -procs=5 ./repro.syz.txt'
Warning: Permanently added '[127.0.0.1]:10022' (ED25519) to the list of known hosts.
2024/10/28 23:15:39 parsed 1 programs
2024/10/28 23:15:52 executed programs: 0
2024/10/28 23:15:57 executed programs: 34
2024/10/28 23:16:02 executed programs: 90
2024/10/28 23:16:07 executed programs: 121
2024/10/28 23:16:12 executed programs: 152
2024/10/28 23:16:17 executed programs: 165
2024/10/28 23:16:23 executed programs: 177
2024/10/28 23:16:28 executed programs: 209
2024/10/28 23:16:33 executed programs: 228
2024/10/28 23:16:38 executed programs: 251
2024/10/28 23:16:44 executed programs: 273
2024/10/28 23:16:49 executed programs: 316
2024/10/28 23:16:54 executed programs: 338
2024/10/28 23:16:59 executed programs: 352
2024/10/28 23:17:04 executed programs: 376
2024/10/28 23:17:10 executed programs: 404
2024/10/28 23:17:16 executed programs: 419
2024/10/28 23:17:21 executed programs: 433
2024/10/28 23:17:26 executed programs: 460

[  687.323615][T16276] ==================================================================
[  687.325235][T16276] BUG: KFENCE: use-after-free read in __traceiter_sys_enter+0x30/0x50
[  687.325235][T16276] 
[  687.327193][T16276] Use-after-free read at 0xffff88807ec60028 (in kfence-#47):
[  687.328404][T16276]  __traceiter_sys_enter+0x30/0x50
[  687.329338][T16276]  syscall_trace_enter+0x1ea/0x2b0
[  687.330021][T16276]  do_syscall_64+0x1ec/0x250
[  687.330816][T16276]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  687.331826][T16276] 
[  687.332291][T16276] kfence-#47: 0xffff88807ec60000-0xffff88807ec60057, size=88, cache=kmalloc-96
[  687.332291][T16276] 
[  687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago):
[  687.335615][T16276]  tracepoint_add_func+0x28a/0xd90
[  687.336424][T16276]  tracepoint_probe_register_prio_may_exist+0xa2/0xf0
[  687.337416][T16276]  bpf_probe_register+0x186/0x200
[  687.338174][T16276]  bpf_raw_tp_link_attach+0x21f/0x540
[  687.339233][T16276]  __sys_bpf+0x393/0x4fa0
[  687.340042][T16276]  __x64_sys_bpf+0x78/0xc0
[  687.340801][T16276]  do_syscall_64+0xcb/0x250
[  687.341623][T16276]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  687.342697][T16276] 
[  687.343147][T16276] freed by task 14317 on cpu 1 at 687.273223s (0.069923s ago):
[  687.344352][T16276]  rcu_core+0x7a2/0x14f0
[  687.344996][T16276]  handle_softirqs+0x1d4/0x870
[  687.345797][T16276]  irq_exit_rcu+0xbb/0x120
[  687.346519][T16276]  sysvec_apic_timer_interrupt+0xa8/0xc0
[  687.347432][T16276]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  687.348574][T16276]  check_preemption_disabled+0x22/0x170
[  687.349904][T16276]  rcu_is_watching+0x12/0xc0
[  687.350849][T16276]  lock_release+0x51e/0x6f0
[  687.351758][T16276]  bpf_trace_run2+0x25a/0x580
[  687.352675][T16276]  __bpf_trace_sys_enter+0x6e/0xa0
[  687.353625][T16276]  syscall_trace_enter+0x1ea/0x2b0
[  687.354570][T16276]  do_syscall_64+0x1ec/0x250
[  687.355447][T16276]  entry_SYSCALL_64_after_hwframe+0x77/0x7f


With Mathieu's Patch
====================
No crash after 10m.

jordan@t14:~/contexts/use-after-free$ ssh \
-p 10022 \
-o UserKnownHostsFile=/dev/null \
-o StrictHostKeyChecking=no \
-o IdentitiesOnly=yes root@127.0.0.1 \
'./syz-execprog  -repeat=0 -procs=5 ./repro.syz.txt'
Warning: Permanently added '[127.0.0.1]:10022' (ED25519) to the list of known hosts.
2024/10/29 00:09:00 parsed 1 programs
2024/10/29 00:09:07 executed programs: 0
2024/10/29 00:09:12 executed programs: 10
2024/10/29 00:09:17 executed programs: 51
2024/10/29 00:09:23 executed programs: 82
2024/10/29 00:09:28 executed programs: 103
2024/10/29 00:09:33 executed programs: 125
2024/10/29 00:09:39 executed programs: 134
2024/10/29 00:09:44 executed programs: 147
...
2024/10/29 00:18:06 executed programs: 2671
2024/10/29 00:18:11 executed programs: 2725
2024/10/29 00:18:17 executed programs: 2743
2024/10/29 00:18:22 executed programs: 2772
2024/10/29 00:18:30 executed programs: 2784
2024/10/29 00:18:35 executed programs: 2816
2024/10/29 00:18:40 executed programs: 2842
2024/10/29 00:18:46 executed programs: 2881
2024/10/29 00:18:52 executed programs: 2923
2024/10/29 00:18:57 executed programs: 2947
2024/10/29 00:19:03 executed programs: 2991
2024/10/29 00:19:09 executed programs: 3013
2024/10/29 00:19:16 executed programs: 3052


Tested-by: Jordan Rife <jrife@google.com>
Mathieu Desnoyers Oct. 30, 2024, 2:42 p.m. UTC | #6
On 2024-10-28 20:28, Jordan Rife wrote:
>> I can test this later today. Considering there needs to be a fix on
>> the BPF side to fully resolve the use-after-free issue reported by
>> syzbot, I may combine your v4 patch with the bandaid fix which chains
>> call_rcu->call_rcu_tasks_trace I made earlier while running the
>> reproducer locally.
> 
> Testing this way, the series LGTM. Here's what I did starting from
> linux-next tag next-20241028.
[...]
> Tested-by: Jordan Rife <jrife@google.com>
> 

Thanks! I'll add your tested-by tags and send a v5 non-RFC.

Mathieu
Alexei Starovoitov Oct. 30, 2024, 2:59 p.m. UTC | #7
On Mon, Oct 28, 2024 at 5:28 PM Jordan Rife <jrife@google.com> wrote:
>
>
> 1. Applied my patch from [1] to prevent any failures resulting from the
>    as-of-yet unpatched BPF code that uses call_rcu(). This lets us

...

> [1]: https://lore.kernel.org/bpf/20241023145640.1499722-1-jrife@google.com/
> [2]: https://lore.kernel.org/bpf/67121037.050a0220.10f4f4.000f.GAE@google.com/
> [3]: https://syzkaller.appspot.com/x/repro.syz?x=153ef887980000
>
>
> [  687.323615][T16276] ==================================================================
> [  687.325235][T16276] BUG: KFENCE: use-after-free read in __traceiter_sys_enter+0x30/0x50
> [  687.325235][T16276]
> [  687.327193][T16276] Use-after-free read at 0xffff88807ec60028 (in kfence-#47):
> [  687.328404][T16276]  __traceiter_sys_enter+0x30/0x50
> [  687.329338][T16276]  syscall_trace_enter+0x1ea/0x2b0
> [  687.330021][T16276]  do_syscall_64+0x1ec/0x250
> [  687.330816][T16276]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  687.331826][T16276]
> [  687.332291][T16276] kfence-#47: 0xffff88807ec60000-0xffff88807ec60057, size=88, cache=kmalloc-96
> [  687.332291][T16276]
> [  687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago):
> [  687.335615][T16276]  tracepoint_add_func+0x28a/0xd90
> [  687.336424][T16276]  tracepoint_probe_register_prio_may_exist+0xa2/0xf0
> [  687.337416][T16276]  bpf_probe_register+0x186/0x200
> [  687.338174][T16276]  bpf_raw_tp_link_attach+0x21f/0x540
> [  687.339233][T16276]  __sys_bpf+0x393/0x4fa0
> [  687.340042][T16276]  __x64_sys_bpf+0x78/0xc0
> [  687.340801][T16276]  do_syscall_64+0xcb/0x250
> [  687.341623][T16276]  entry_SYSCALL_64_after_hwframe+0x77/0x7f

I think the stack trace shows that the patch [1] isn't really fixing it.
The UAF is on an access to bpf_link in __traceiter_sys_enter,
while your patch [1] and all the attempts to "fix" it were delaying bpf_prog.
The issue is no longer reproducing due to luck.
Jordan Rife Oct. 30, 2024, 5:34 p.m. UTC | #8
> > [  687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago):
> > [  687.335615][T16276]  tracepoint_add_func+0x28a/0xd90
> > [  687.336424][T16276]  tracepoint_probe_register_prio_may_exist+0xa2/0xf0
> > [  687.337416][T16276]  bpf_probe_register+0x186/0x200
> > [  687.338174][T16276]  bpf_raw_tp_link_attach+0x21f/0x540
> > [  687.339233][T16276]  __sys_bpf+0x393/0x4fa0
> > [  687.340042][T16276]  __x64_sys_bpf+0x78/0xc0
> > [  687.340801][T16276]  do_syscall_64+0xcb/0x250
> > [  687.341623][T16276]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> I think the stack trace points out that the patch [1] isn't really fixing it.
> UAF is on access to bpf_link in __traceiter_sys_enter

The stack trace points to the memory in question being allocated in
tracepoint_add_func, where allocation and assignment to
__tracepoint_sys_enter->funcs happen. Mathieu's patch addresses
use-after-free on this structure by using call_rcu_tasks_trace inside
release_probes. In contrast, here is what the "allocated by" trace
looks like for UAF on access to bpf_link (copied from the original
KASAN crash report [4]).

> Allocated by task 5681:
>  kasan_save_stack mm/kasan/common.c:47 [inline]
>  kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
>  poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
>  __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
>  kasan_kmalloc include/linux/kasan.h:260 [inline]
>  __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4304
>  kmalloc_noprof include/linux/slab.h:901 [inline]
>  kzalloc_noprof include/linux/slab.h:1037 [inline]
>  bpf_raw_tp_link_attach+0x2a0/0x6e0 kernel/bpf/syscall.c:3829
>  bpf_raw_tracepoint_open+0x177/0x1f0 kernel/bpf/syscall.c:3876
>  __sys_bpf+0x3c0/0x810 kernel/bpf/syscall.c:5691
>  __do_sys_bpf kernel/bpf/syscall.c:5756 [inline]
>  __se_sys_bpf kernel/bpf/syscall.c:5754 [inline]
>  __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5754
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f

This clearly points to where memory for a bpf_link is allocated.

> link = kzalloc(sizeof(*link), GFP_USER);
> if (!link) {
>     err = -ENOMEM;
>     goto out_put_btp;
> }

To add some context: if I apply Mathieu's patch alone, I get no
meaningful test signal when running the reproducer, because the UAF
crash almost always occurs first on accesses to bpf_link or bpf_prog,
showing a trace like the second one above. My intent in applying patch
[1] is to mask out these sources of UAF-related crashes on the BPF
side to just focus on what this series addresses. This series should
eventually be tested end-to-end with Andrii's fix for the BPF stuff
that he mentioned last week, but that would rely on this patch series,
tracepoint_is_faultable() in particular, so it's kind of chicken and
egg in terms of testing. In the meantime, [1] provides a bandaid to
allow some degree of test coverage on this patch.

> while your patch [1] and all attempts to "fix" were delaying bpf_prog.
> The issue is not reproducing anymore due to luck.

[1] chains call_rcu_tasks_trace and call_rcu to free both bpf_prog and
bpf_link after unregistering the tracepoint. This grace period should
be sufficient to prevent UAF on these structures from the syscall TP
handlers, which are protected with rcu_read_lock_trace. I've run the
reproducer many times. Without /some/ fix on the BPF side it crashes
reliably within seconds here. Using call_rcu_tasks_trace, or chaining
it with call_rcu, eliminates UAF on the BPF structures and removes a
couple of variables from local testing.

If you are not convinced, I'm happy to run through other test
scenarios or run the reproducer for much longer.

-Jordan
Andrii Nakryiko Oct. 30, 2024, 5:39 p.m. UTC | #9
On Wed, Oct 30, 2024 at 10:34 AM Jordan Rife <jrife@google.com> wrote:
>
> > > [  687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago):
> > > [  687.335615][T16276]  tracepoint_add_func+0x28a/0xd90
> > > [  687.336424][T16276]  tracepoint_probe_register_prio_may_exist+0xa2/0xf0
> > > [  687.337416][T16276]  bpf_probe_register+0x186/0x200
> > > [  687.338174][T16276]  bpf_raw_tp_link_attach+0x21f/0x540
> > > [  687.339233][T16276]  __sys_bpf+0x393/0x4fa0
> > > [  687.340042][T16276]  __x64_sys_bpf+0x78/0xc0
> > > [  687.340801][T16276]  do_syscall_64+0xcb/0x250
> > > [  687.341623][T16276]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > I think the stack trace points out that the patch [1] isn't really fixing it.
> > UAF is on access to bpf_link in __traceiter_sys_enter
>
> The stack trace points to the memory in question being allocated by
> tracepoint_add_func where allocation and assignment to
> __tracepoint_sys_enter->funcs happens. Mathieu's patch addresses
> use-after-free on this structure by using call_rcu_tasks_trace inside
> release_probes. In contrast, here is what the "allocated by" trace
> looks like for UAF on access to bpf_link (copied from the original
> KASAN crash report [4]).
>
> > Allocated by task 5681:
> >  kasan_save_stack mm/kasan/common.c:47 [inline]
> >  kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
> >  poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
> >  __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
> >  kasan_kmalloc include/linux/kasan.h:260 [inline]
> >  __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4304
> >  kmalloc_noprof include/linux/slab.h:901 [inline]
> >  kzalloc_noprof include/linux/slab.h:1037 [inline]
> >  bpf_raw_tp_link_attach+0x2a0/0x6e0 kernel/bpf/syscall.c:3829
> >  bpf_raw_tracepoint_open+0x177/0x1f0 kernel/bpf/syscall.c:3876
> >  __sys_bpf+0x3c0/0x810 kernel/bpf/syscall.c:5691
> >  __do_sys_bpf kernel/bpf/syscall.c:5756 [inline]
> >  __se_sys_bpf kernel/bpf/syscall.c:5754 [inline]
> >  __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5754
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> This clearly points to where memory for a bpf_link is allocated.
>
> > link = kzalloc(sizeof(*link), GFP_USER);
> > if (!link) {
> >     err = -ENOMEM;
> >     goto out_put_btp;
> > }
>
> To add some context, if I apply Mathieu's patch alone then I get no
> meaningful test signal when running the reproducer, because the UAF
> crash almost always occurs first on accesses to bpf_link or bpf_prog
> showing a trace like the second one above. My intent in applying patch
> [1] is to mask out these sources of UAF-related crashes on the BPF
> side to just focus on what this series addresses. This series should
> eventually be tested end-to-end with Andrii's fix for the BPF stuff
> that he mentioned last week, but that would rely on this patch series,
> tracepoint_is_faultable() in particular, so it's kind of chicken and

Yep, agreed. I'll need this patch set landed before landing my fixes.
Mathieu's patch set fixes one set of issues, so I'd say we should land
it and unblock BPF link-specific fixes.

> egg in terms of testing. In the meantime, [1] provides a bandaid to
> allow some degree of test coverage on this patch.
>
> > while your patch [1] and all attempts to "fix" were delaying bpf_prog.
> > The issue is not reproducing anymore due to luck.
>
> [1] chains call_rcu_tasks_trace and call_rcu to free both bpf_prog and
> bpf_link after unregistering the trace point. This grace period should
> be sufficient to prevent UAF on these structures from the syscall TP
> handlers which are protected with rcu_read_lock_trace. I've run the
> reproducer many times. Without /some/ fix on the BPF side it crashes
> reliably within seconds here. Using call_rcu_tasks_trace or chaining
> it with call_rcu eliminates UAF on the BPF stuff which eliminates a
> couple of variables for local testing.
>
> If you are not convinced, I'm happy to run through other test
> scenarios or run the reproducer for much longer.
>
> -Jordan

Patch

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 259f0ab4ece6..7bed499b7055 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -226,10 +226,12 @@  static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		if (!(cond))						\
 			return;						\
 									\
-		if (syscall)						\
+		if (syscall) {						\
 			rcu_read_lock_trace();				\
-		else							\
+			might_fault();					\
+		} else {						\
 			preempt_disable_notrace();			\
+		}							\
 									\
 		__DO_TRACE_CALL(name, TP_ARGS(args));			\
 									\