Message ID | 20241028190927.648953-5-mathieu.desnoyers@efficios.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Faultable syscall tracepoints updates | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On 2024-10-28 15:09, Mathieu Desnoyers wrote: > Catch incorrect use of syscall tracepoints even if no probes are > registered by adding a might_fault() check in __DO_TRACE() when > syscall=1. > > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Michael Jeanson <mjeanson@efficios.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Masami Hiramatsu <mhiramat@kernel.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Alexei Starovoitov <ast@kernel.org> > Cc: Yonghong Song <yhs@fb.com> > Cc: Paul E. McKenney <paulmck@kernel.org> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> > Cc: Namhyung Kim <namhyung@kernel.org> > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> > Cc: bpf@vger.kernel.org > Cc: Joel Fernandes <joel@joelfernandes.org> > Cc: Jordan Rife <jrife@google.com> > --- > include/linux/tracepoint.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > index 259f0ab4ece6..7bed499b7055 100644 > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -226,10 +226,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > if (!(cond)) \ > return; \ > \ > - if (syscall) \ > + if (syscall) { \ > rcu_read_lock_trace(); \ > - else \ > + might_fault(); \ Actually, __DO_TRACE() is not the best place to put this, because it's only executed when the tracepoint is enabled. I'll move this to __DECLARE_TRACE_SYSCALL() #define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto) \ __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \ static inline void trace_##name(proto) \ { \ might_fault(); \ if (static_branch_unlikely(&__tracepoint_##name.key)) \ __DO_TRACE(name, \ TP_ARGS(args), \ TP_CONDITION(cond), 1); \ [...] instead in v5. Thanks, Mathieu > + } else { \ > preempt_disable_notrace(); \ > + } \ > \ > __DO_TRACE_CALL(name, TP_ARGS(args)); \ > \
On Mon, Oct 28, 2024 at 12:59 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > On 2024-10-28 15:09, Mathieu Desnoyers wrote: > > Catch incorrect use of syscall tracepoints even if no probes are > > registered by adding a might_fault() check in __DO_TRACE() when > > syscall=1. > > > > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Michael Jeanson <mjeanson@efficios.com> > > Cc: Steven Rostedt <rostedt@goodmis.org> > > Cc: Masami Hiramatsu <mhiramat@kernel.org> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Alexei Starovoitov <ast@kernel.org> > > Cc: Yonghong Song <yhs@fb.com> > > Cc: Paul E. McKenney <paulmck@kernel.org> > > Cc: Ingo Molnar <mingo@redhat.com> > > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > > Cc: Mark Rutland <mark.rutland@arm.com> > > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> > > Cc: Namhyung Kim <namhyung@kernel.org> > > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> > > Cc: bpf@vger.kernel.org > > Cc: Joel Fernandes <joel@joelfernandes.org> > > Cc: Jordan Rife <jrife@google.com> > > --- > > include/linux/tracepoint.h | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > > index 259f0ab4ece6..7bed499b7055 100644 > > --- a/include/linux/tracepoint.h > > +++ b/include/linux/tracepoint.h > > @@ -226,10 +226,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > > if (!(cond)) \ > > return; \ > > \ > > - if (syscall) \ > > + if (syscall) { \ > > rcu_read_lock_trace(); \ > > - else \ > > + might_fault(); \ > > Actually, __DO_TRACE() is not the best place to put this, because it's > only executed when the tracepoint is enabled. > > I'll move this to __DECLARE_TRACE_SYSCALL() > > #define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto) \ > __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \ > static inline void trace_##name(proto) \ > { \ > might_fault(); \ > if (static_branch_unlikely(&__tracepoint_##name.key)) \ > __DO_TRACE(name, \ > TP_ARGS(args), \ > TP_CONDITION(cond), 1); \ > [...] > > instead in v5. please drop the RFC tag while at it > > Thanks, > > Mathieu > > > + } else { \ > > preempt_disable_notrace(); \ > > + } \ > > \ > > __DO_TRACE_CALL(name, TP_ARGS(args)); \ > > \ > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com >
On 2024-10-28 16:20, Andrii Nakryiko wrote: > On Mon, Oct 28, 2024 at 12:59 PM Mathieu Desnoyers > <mathieu.desnoyers@efficios.com> wrote: >> >> On 2024-10-28 15:09, Mathieu Desnoyers wrote: >>> Catch incorrect use of syscall tracepoints even if no probes are >>> registered by adding a might_fault() check in __DO_TRACE() when >>> syscall=1. >>> >>> Suggested-by: Thomas Gleixner <tglx@linutronix.de> >>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> >>> Cc: Thomas Gleixner <tglx@linutronix.de> >>> Cc: Michael Jeanson <mjeanson@efficios.com> >>> Cc: Steven Rostedt <rostedt@goodmis.org> >>> Cc: Masami Hiramatsu <mhiramat@kernel.org> >>> Cc: Peter Zijlstra <peterz@infradead.org> >>> Cc: Alexei Starovoitov <ast@kernel.org> >>> Cc: Yonghong Song <yhs@fb.com> >>> Cc: Paul E. McKenney <paulmck@kernel.org> >>> Cc: Ingo Molnar <mingo@redhat.com> >>> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> >>> Cc: Mark Rutland <mark.rutland@arm.com> >>> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> >>> Cc: Namhyung Kim <namhyung@kernel.org> >>> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> >>> Cc: bpf@vger.kernel.org >>> Cc: Joel Fernandes <joel@joelfernandes.org> >>> Cc: Jordan Rife <jrife@google.com> >>> --- >>> include/linux/tracepoint.h | 6 ++++-- >>> 1 file changed, 4 insertions(+), 2 deletions(-) >>> >>> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h >>> index 259f0ab4ece6..7bed499b7055 100644 >>> --- a/include/linux/tracepoint.h >>> +++ b/include/linux/tracepoint.h >>> @@ -226,10 +226,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) >>> if (!(cond)) \ >>> return; \ >>> \ >>> - if (syscall) \ >>> + if (syscall) { \ >>> rcu_read_lock_trace(); \ >>> - else \ >>> + might_fault(); \ >> >> Actually, __DO_TRACE() is not the best place to put this, because it's >> only executed when the tracepoint is enabled. >> >> I'll move this to __DECLARE_TRACE_SYSCALL() >> >> #define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto) \ >> __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \ >> static inline void trace_##name(proto) \ >> { \ >> might_fault(); \ >> if (static_branch_unlikely(&__tracepoint_##name.key)) \ >> __DO_TRACE(name, \ >> TP_ARGS(args), \ >> TP_CONDITION(cond), 1); \ >> [...] >> >> instead in v5. > > please drop the RFC tag while at it I'm still awaiting for Jordan (or someone else) to come back with testing results before I feel confident dropping the RFC tag. Thanks, Mathieu > >> >> Thanks, >> >> Mathieu >> >>> + } else { \ >>> preempt_disable_notrace(); \ >>> + } \ >>> \ >>> __DO_TRACE_CALL(name, TP_ARGS(args)); \ >>> \ >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> https://www.efficios.com >>
> I'm still awaiting for Jordan (or someone else) to come back with
testing results before I feel confident dropping the RFC tag.
I can test this later today. Considering there needs to be a fix on
the BPF side to fully resolve the use-after-free issue reported by
syzbot, I may combine your v4 patch with the bandaid fix which chains
call_rcu->call_rcu_tasks_trace I made earlier while running the
reproducer locally. This should give some confidence that your patch
addresses issues on the tracepoint side until we can combine it with
Andrii's fix for BPF that consumes tracepoint_is_faultable(). I'm open
to other suggestions as well if you would like me to test something
else.
-Jordan
> I can test this later today. Considering there needs to be a fix on > the BPF side to fully resolve the use-after-free issue reported by > syzbot, I may combine your v4 patch with the bandaid fix which chains > call_rcu->call_rcu_tasks_trace I made earlier while running the > reproducer locally. Testing this way, the series LGTM. Here's what I did starting from linux-next tag next-20241028. 1. Applied my patch from [1] to prevent any failures resulting from the as-of-yet unpatched BPF code that uses call_rcu(). This lets us focus on the effect's of Mathieu's patch series. 2. Ran the reproducer [3] from the original syzbot report [2] on a kernel build /without/ Mathieu's v4 patch to confirm that we hit a use-after-free bug resulting from the use of call_rcu() inside release_probe(). 3. Applied the patch series and rebuilt the kernel. 4. Reran the reproducer on the new kernel build to ensure that we don't hit the same use-after-free bug anymore. [1]: https://lore.kernel.org/bpf/20241023145640.1499722-1-jrife@google.com/ [2]: https://lore.kernel.org/bpf/67121037.050a0220.10f4f4.000f.GAE@google.com/ [3]: https://syzkaller.appspot.com/x/repro.syz?x=153ef887980000 Without Mathieu's Patch ======================= A crash occurs after a few minutes. jordan@t14:~/contexts/use-after-free$ ssh \ -p 10022 \ -o UserKnownHostsFile=/dev/null \ -o StrictHostKeyChecking=no \ -o IdentitiesOnly=yes \ root@127.0.0.1 './syz-execprog -repeat=0 -procs=5 ./repro.syz.txt' Warning: Permanently added '[127.0.0.1]:10022' (ED25519) to the list of known hosts. 2024/10/28 23:15:39 parsed 1 programs 2024/10/28 23:15:52 executed programs: 0 2024/10/28 23:15:57 executed programs: 34 2024/10/28 23:16:02 executed programs: 90 2024/10/28 23:16:07 executed programs: 121 2024/10/28 23:16:12 executed programs: 152 2024/10/28 23:16:17 executed programs: 165 2024/10/28 23:16:23 executed programs: 177 2024/10/28 23:16:28 executed programs: 209 2024/10/28 23:16:33 executed programs: 228 2024/10/28 23:16:38 executed programs: 251 2024/10/28 23:16:44 executed programs: 273 2024/10/28 23:16:49 executed programs: 316 2024/10/28 23:16:54 executed programs: 338 2024/10/28 23:16:59 executed programs: 352 2024/10/28 23:17:04 executed programs: 376 2024/10/28 23:17:10 executed programs: 404 2024/10/28 23:17:16 executed programs: 419 2024/10/28 23:17:21 executed programs: 433 2024/10/28 23:17:26 executed programs: 460 [ 687.323615][T16276] ================================================================== [ 687.325235][T16276] BUG: KFENCE: use-after-free read in __traceiter_sys_enter+0x30/0x50 [ 687.325235][T16276] [ 687.327193][T16276] Use-after-free read at 0xffff88807ec60028 (in kfence-#47): [ 687.328404][T16276] __traceiter_sys_enter+0x30/0x50 [ 687.329338][T16276] syscall_trace_enter+0x1ea/0x2b0 [ 687.330021][T16276] do_syscall_64+0x1ec/0x250 [ 687.330816][T16276] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 687.331826][T16276] [ 687.332291][T16276] kfence-#47: 0xffff88807ec60000-0xffff88807ec60057, size=88, cache=kmalloc-96 [ 687.332291][T16276] [ 687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago): [ 687.335615][T16276] tracepoint_add_func+0x28a/0xd90 [ 687.336424][T16276] tracepoint_probe_register_prio_may_exist+0xa2/0xf0 [ 687.337416][T16276] bpf_probe_register+0x186/0x200 [ 687.338174][T16276] bpf_raw_tp_link_attach+0x21f/0x540 [ 687.339233][T16276] __sys_bpf+0x393/0x4fa0 [ 687.340042][T16276] __x64_sys_bpf+0x78/0xc0 [ 687.340801][T16276] do_syscall_64+0xcb/0x250 [ 687.341623][T16276] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 687.342697][T16276] [ 687.343147][T16276] freed by task 14317 on cpu 1 at 687.273223s (0.069923s ago): [ 687.344352][T16276] rcu_core+0x7a2/0x14f0 [ 687.344996][T16276] handle_softirqs+0x1d4/0x870 [ 687.345797][T16276] irq_exit_rcu+0xbb/0x120 [ 687.346519][T16276] sysvec_apic_timer_interrupt+0xa8/0xc0 [ 687.347432][T16276] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 687.348574][T16276] check_preemption_disabled+0x22/0x170 [ 687.349904][T16276] rcu_is_watching+0x12/0xc0 [ 687.350849][T16276] lock_release+0x51e/0x6f0 [ 687.351758][T16276] bpf_trace_run2+0x25a/0x580 [ 687.352675][T16276] __bpf_trace_sys_enter+0x6e/0xa0 [ 687.353625][T16276] syscall_trace_enter+0x1ea/0x2b0 [ 687.354570][T16276] do_syscall_64+0x1ec/0x250 [ 687.355447][T16276] entry_SYSCALL_64_after_hwframe+0x77/0x7f With Mathieu's Patch ==================== No crash after 10m. jordan@t14:~/contexts/use-after-free$ ssh \ -p 10022 \ -o UserKnownHostsFile=/dev/null \ -o StrictHostKeyChecking=no \ -o IdentitiesOnly=yes root@127.0.0.1 \ './syz-execprog -repeat=0 -procs=5 ./repro.syz.txt' Warning: Permanently added '[127.0.0.1]:10022' (ED25519) to the list of known hosts. 2024/10/29 00:09:00 parsed 1 programs 2024/10/29 00:09:07 executed programs: 0 2024/10/29 00:09:12 executed programs: 10 2024/10/29 00:09:17 executed programs: 51 2024/10/29 00:09:23 executed programs: 82 2024/10/29 00:09:28 executed programs: 103 2024/10/29 00:09:33 executed programs: 125 2024/10/29 00:09:39 executed programs: 134 2024/10/29 00:09:44 executed programs: 147 ... 2024/10/29 00:18:06 executed programs: 2671 2024/10/29 00:18:11 executed programs: 2725 2024/10/29 00:18:17 executed programs: 2743 2024/10/29 00:18:22 executed programs: 2772 2024/10/29 00:18:30 executed programs: 2784 2024/10/29 00:18:35 executed programs: 2816 2024/10/29 00:18:40 executed programs: 2842 2024/10/29 00:18:46 executed programs: 2881 2024/10/29 00:18:52 executed programs: 2923 2024/10/29 00:18:57 executed programs: 2947 2024/10/29 00:19:03 executed programs: 2991 2024/10/29 00:19:09 executed programs: 3013 2024/10/29 00:19:16 executed programs: 3052 Tested-by: Jordan Rife <jrife@google.com>
On 2024-10-28 20:28, Jordan Rife wrote: >> I can test this later today. Considering there needs to be a fix on >> the BPF side to fully resolve the use-after-free issue reported by >> syzbot, I may combine your v4 patch with the bandaid fix which chains >> call_rcu->call_rcu_tasks_trace I made earlier while running the >> reproducer locally. > > Testing this way, the series LGTM. Here's what I did starting from > linux-next tag next-20241028. [...] > Tested-by: Jordan Rife <jrife@google.com> > Thanks! I'll add your tested-by tags and send a v5 non-RFC. Mathieu
On Mon, Oct 28, 2024 at 5:28 PM Jordan Rife <jrife@google.com> wrote: > > > 1. Applied my patch from [1] to prevent any failures resulting from the > as-of-yet unpatched BPF code that uses call_rcu(). This lets us ... > [1]: https://lore.kernel.org/bpf/20241023145640.1499722-1-jrife@google.com/ > [2]: https://lore.kernel.org/bpf/67121037.050a0220.10f4f4.000f.GAE@google.com/ > [3]: https://syzkaller.appspot.com/x/repro.syz?x=153ef887980000 > > > [ 687.323615][T16276] ================================================================== > [ 687.325235][T16276] BUG: KFENCE: use-after-free read in __traceiter_sys_enter+0x30/0x50 > [ 687.325235][T16276] > [ 687.327193][T16276] Use-after-free read at 0xffff88807ec60028 (in kfence-#47): > [ 687.328404][T16276] __traceiter_sys_enter+0x30/0x50 > [ 687.329338][T16276] syscall_trace_enter+0x1ea/0x2b0 > [ 687.330021][T16276] do_syscall_64+0x1ec/0x250 > [ 687.330816][T16276] entry_SYSCALL_64_after_hwframe+0x77/0x7f > [ 687.331826][T16276] > [ 687.332291][T16276] kfence-#47: 0xffff88807ec60000-0xffff88807ec60057, size=88, cache=kmalloc-96 > [ 687.332291][T16276] > [ 687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago): > [ 687.335615][T16276] tracepoint_add_func+0x28a/0xd90 > [ 687.336424][T16276] tracepoint_probe_register_prio_may_exist+0xa2/0xf0 > [ 687.337416][T16276] bpf_probe_register+0x186/0x200 > [ 687.338174][T16276] bpf_raw_tp_link_attach+0x21f/0x540 > [ 687.339233][T16276] __sys_bpf+0x393/0x4fa0 > [ 687.340042][T16276] __x64_sys_bpf+0x78/0xc0 > [ 687.340801][T16276] do_syscall_64+0xcb/0x250 > [ 687.341623][T16276] entry_SYSCALL_64_after_hwframe+0x77/0x7f I think the stack trace points out that the patch [1] isn't really fixing it. UAF is on access to bpf_link in __traceiter_sys_enter while your patch [1] and all attempts to "fix" were delaying bpf_prog. The issue is not reproducing anymore due to luck.
> > [ 687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago): > > [ 687.335615][T16276] tracepoint_add_func+0x28a/0xd90 > > [ 687.336424][T16276] tracepoint_probe_register_prio_may_exist+0xa2/0xf0 > > [ 687.337416][T16276] bpf_probe_register+0x186/0x200 > > [ 687.338174][T16276] bpf_raw_tp_link_attach+0x21f/0x540 > > [ 687.339233][T16276] __sys_bpf+0x393/0x4fa0 > > [ 687.340042][T16276] __x64_sys_bpf+0x78/0xc0 > > [ 687.340801][T16276] do_syscall_64+0xcb/0x250 > > [ 687.341623][T16276] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > I think the stack trace points out that the patch [1] isn't really fixing it. > UAF is on access to bpf_link in __traceiter_sys_enter The stack trace points to the memory in question being allocated by tracepoint_add_func where allocation and assignment to __tracepoint_sys_enter->funcs happens. Mathieu's patch addresses use-after-free on this structure by using call_rcu_tasks_trace inside release_probes. In contrast, here is what the "allocated by" trace looks like for UAF on access to bpf_link (copied from the original KASAN crash report [4]). > Allocated by task 5681: > kasan_save_stack mm/kasan/common.c:47 [inline] > kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 > poison_kmalloc_redzone mm/kasan/common.c:377 [inline] > __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394 > kasan_kmalloc include/linux/kasan.h:260 [inline] > __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4304 > kmalloc_noprof include/linux/slab.h:901 [inline] > kzalloc_noprof include/linux/slab.h:1037 [inline] > bpf_raw_tp_link_attach+0x2a0/0x6e0 kernel/bpf/syscall.c:3829 > bpf_raw_tracepoint_open+0x177/0x1f0 kernel/bpf/syscall.c:3876 > __sys_bpf+0x3c0/0x810 kernel/bpf/syscall.c:5691 > __do_sys_bpf kernel/bpf/syscall.c:5756 [inline] > __se_sys_bpf kernel/bpf/syscall.c:5754 [inline] > __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5754 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x77/0x7f This clearly points to where memory for a bpf_link is allocated. > link = kzalloc(sizeof(*link), GFP_USER); > if (!link) { > err = -ENOMEM; > goto out_put_btp; > } To add some context, if I apply Mathieu's patch alone then I get no meaningful test signal when running the reproducer, because the UAF crash almost always occurs first on accesses to bpf_link or bpf_prog showing a trace like the second one above. My intent in applying patch [1] is to mask out these sources of UAF-related crashes on the BPF side to just focus on what this series addresses. This series should eventually be tested end-to-end with Andrii's fix for the BPF stuff that he mentioned last week, but that would rely on this patch series, tracepoint_is_faultable() in particular, so it's kind of chicken and egg in terms of testing. In the meantime, [1] provides a bandaid to allow some degree of test coverage on this patch. > while your patch [1] and all attempts to "fix" were delaying bpf_prog. > The issue is not reproducing anymore due to luck. [1] chains call_rcu_tasks_trace and call_rcu to free both bpf_prog and bpf_link after unregistering the trace point. This grace period should be sufficient to prevent UAF on these structures from the syscall TP handlers which are protected with rcu_read_lock_trace. I've run the reproducer many times. Without /some/ fix on the BPF side it crashes reliably within seconds here. Using call_rcu_tasks_trace or chaining it with call_rcu eliminates UAF on the BPF stuff which eliminates a couple of variables for local testing. If you are not convinced, I'm happy to run through other test scenarios or run the reproducer for much longer. -Jordan
On Wed, Oct 30, 2024 at 10:34 AM Jordan Rife <jrife@google.com> wrote: > > > > [ 687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago): > > > [ 687.335615][T16276] tracepoint_add_func+0x28a/0xd90 > > > [ 687.336424][T16276] tracepoint_probe_register_prio_may_exist+0xa2/0xf0 > > > [ 687.337416][T16276] bpf_probe_register+0x186/0x200 > > > [ 687.338174][T16276] bpf_raw_tp_link_attach+0x21f/0x540 > > > [ 687.339233][T16276] __sys_bpf+0x393/0x4fa0 > > > [ 687.340042][T16276] __x64_sys_bpf+0x78/0xc0 > > > [ 687.340801][T16276] do_syscall_64+0xcb/0x250 > > > [ 687.341623][T16276] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > I think the stack trace points out that the patch [1] isn't really fixing it. > > UAF is on access to bpf_link in __traceiter_sys_enter > > The stack trace points to the memory in question being allocated by > tracepoint_add_func where allocation and assignment to > __tracepoint_sys_enter->funcs happens. Mathieu's patch addresses > use-after-free on this structure by using call_rcu_tasks_trace inside > release_probes. In contrast, here is what the "allocated by" trace > looks like for UAF on access to bpf_link (copied from the original > KASAN crash report [4]). > > > Allocated by task 5681: > > kasan_save_stack mm/kasan/common.c:47 [inline] > > kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 > > poison_kmalloc_redzone mm/kasan/common.c:377 [inline] > > __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394 > > kasan_kmalloc include/linux/kasan.h:260 [inline] > > __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4304 > > kmalloc_noprof include/linux/slab.h:901 [inline] > > kzalloc_noprof include/linux/slab.h:1037 [inline] > > bpf_raw_tp_link_attach+0x2a0/0x6e0 kernel/bpf/syscall.c:3829 > > bpf_raw_tracepoint_open+0x177/0x1f0 kernel/bpf/syscall.c:3876 > > __sys_bpf+0x3c0/0x810 kernel/bpf/syscall.c:5691 > > __do_sys_bpf kernel/bpf/syscall.c:5756 [inline] > > __se_sys_bpf kernel/bpf/syscall.c:5754 [inline] > > __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5754 > > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > > do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > This clearly points to where memory for a bpf_link is allocated. > > > link = kzalloc(sizeof(*link), GFP_USER); > > if (!link) { > > err = -ENOMEM; > > goto out_put_btp; > > } > > To add some context, if I apply Mathieu's patch alone then I get no > meaningful test signal when running the reproducer, because the UAF > crash almost always occurs first on accesses to bpf_link or bpf_prog > showing a trace like the second one above. My intent in applying patch > [1] is to mask out these sources of UAF-related crashes on the BPF > side to just focus on what this series addresses. This series should > eventually be tested end-to-end with Andrii's fix for the BPF stuff > that he mentioned last week, but that would rely on this patch series, > tracepoint_is_faultable() in particular, so it's kind of chicken and Yep, agreed. I'll need this patch set landed before landing my fixes. Mathieu's patch set fixes one set of issues, so I'd say we should land it and unblock BPF link-specific fixes. > egg in terms of testing. In the meantime, [1] provides a bandaid to > allow some degree of test coverage on this patch. > > > while your patch [1] and all attempts to "fix" were delaying bpf_prog. > > The issue is not reproducing anymore due to luck. > > [1] chains call_rcu_tasks_trace and call_rcu to free both bpf_prog and > bpf_link after unregistering the trace point. This grace period should > be sufficient to prevent UAF on these structures from the syscall TP > handlers which are protected with rcu_read_lock_trace. I've run the > reproducer many times. Without /some/ fix on the BPF side it crashes > reliably within seconds here. Using call_rcu_tasks_trace or chaining > it with call_rcu eliminates UAF on the BPF stuff which eliminates a > couple of variables for local testing. > > If you are not convinced, I'm happy to run through other test > scenarios or run the reproducer for much longer. > > -Jordan
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 259f0ab4ece6..7bed499b7055 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -226,10 +226,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) if (!(cond)) \ return; \ \ - if (syscall) \ + if (syscall) { \ rcu_read_lock_trace(); \ - else \ + might_fault(); \ + } else { \ preempt_disable_notrace(); \ + } \ \ __DO_TRACE_CALL(name, TP_ARGS(args)); \ \
Catch incorrect use of syscall tracepoints even if no probes are registered by adding a might_fault() check in __DO_TRACE() when syscall=1. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> --- include/linux/tracepoint.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)