[2/4] perf,uprobes: fix user stack traces in the presence of pending uretprobes

Message ID 20240508212605.4012172-3-andrii@kernel.org (mailing list archive)
State Handled Elsewhere
Delegated to: BPF
Series: Fix user stack traces captured from uprobes

Checks

Context Check Description
netdev/tree_selection success Not a local patch
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 fail Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
(all other checks passed: Lint, ShellCheck, Unittests, matrix validation, and the remaining builds, veristat runs, and test runs for aarch64-gcc, s390x-gcc, x86_64-gcc, x86_64-llvm-17, and x86_64-llvm-18)

Commit Message

Andrii Nakryiko May 8, 2024, 9:26 p.m. UTC
When the kernel has pending uretprobes installed, it hijacks the original
user function's return address on the stack with the address of a
uretprobe trampoline. There can be multiple such pending uretprobes
(either on different user functions or on the same recursive one) at any
given time within the same task.
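
For illustration (not part of the patch): a uretprobe attached to a
hypothetical recursive function like the fact() below leaves one pending
return instance per active call level, so several return addresses on the
stack are hijacked at once.

	/* hypothetical traced function; a uretprobe on fact() hijacks
	 * the return address of every active invocation on the stack
	 */
	int fact(int n)
	{
		return n <= 1 ? 1 : n * fact(n - 1);
	}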

This approach interferes with the user stack trace capture logic, which
would report surprising addresses (like 0x7fffffffe000) that correspond
to a special "[uprobes]" section that the kernel installs in the target
process address space for the uretprobe trampoline code, while logically
the entry should be an address somewhere within the caller of the traced
user function.

This is easy to correct for, though. The uprobes subsystem keeps track of
pending uretprobes and records the original return addresses. This patch
uses that information in a post-processing step to restore each trampoline
address entry to the correct original return address. This is done only if
there are pending uretprobes for the current task.

Reported-by: Riham Selim <rihams@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/events/callchain.c | 42 ++++++++++++++++++++++++++++++++++++++-
 kernel/events/uprobes.c   |  9 +++++++++
 2 files changed, 50 insertions(+), 1 deletion(-)
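
For context, the per-task bookkeeping the fix relies on looks roughly like
this (a simplified sketch showing only the fields this patch uses; the
real struct return_instance in the uprobes code has more):

	struct return_instance {
		unsigned long orig_ret_vaddr;	/* original return address,
						 * saved when the uretprobe
						 * hijacked the stack slot */
		struct return_instance *next;	/* pending instances kept as
						 * a stack, innermost first */
		/* ... other fields omitted ... */
	};

	/* current->utask->return_instances points at the innermost
	 * pending uretprobe of the current task (NULL if none).
	 */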

Comments

Peter Zijlstra May 15, 2024, 9:30 a.m. UTC | #1
On Wed, May 08, 2024 at 02:26:03PM -0700, Andrii Nakryiko wrote:

> +static void fixup_uretprobe_trampoline_entries(struct perf_callchain_entry *entry,
> +					       int start_entry_idx)
> +{
> +#ifdef CONFIG_UPROBES
> +	struct uprobe_task *utask = current->utask;
> +	struct return_instance *ri;
> +	__u64 *cur_ip, *last_ip, tramp_addr;
> +
> +	if (likely(!utask || !utask->return_instances))
> +		return;
> +
> +	cur_ip = &entry->ip[start_entry_idx];
> +	last_ip = &entry->ip[entry->nr - 1];
> +	ri = utask->return_instances;
> +	tramp_addr = uprobe_get_trampoline_vaddr();
> +
> +	/* If there are pending uretprobes for current thread, they are

Comment style fail. Also 'for *the* current thread'.

> +	 * recorded in a list inside utask->return_instances; each such
> +	 * pending uretprobe replaces traced user function's return address on
> +	 * the stack, so when stack trace is captured, instead of seeing
> +	 * actual function's return address, we'll have one or many uretprobe
> +	 * trampoline addresses in the stack trace, which are not helpful and
> +	 * misleading to users.

I would beg to differ: what if the uprobe is causing the performance
issue?

While I do think it makes sense to fix the unwind in the sense that we
should be able to continue the unwind, I don't think it makes sense to
completely hide the presence of uprobes.

> +	 * So here we go over the pending list of uretprobes, and each
> +	 * encountered trampoline address is replaced with actual return
> +	 * address.
> +	 */
> +	while (ri && cur_ip <= last_ip) {
> +		if (*cur_ip == tramp_addr) {
> +			*cur_ip = ri->orig_ret_vaddr;
> +			ri = ri->next;
> +		}
> +		cur_ip++;
> +	}
> +#endif
> +}
Andrii Nakryiko May 15, 2024, 2:32 p.m. UTC | #2
On Wed, May 15, 2024 at 3:30 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, May 08, 2024 at 02:26:03PM -0700, Andrii Nakryiko wrote:
>
> > +static void fixup_uretprobe_trampoline_entries(struct perf_callchain_entry *entry,
> > +                                            int start_entry_idx)
> > +{
> > +#ifdef CONFIG_UPROBES
> > +     struct uprobe_task *utask = current->utask;
> > +     struct return_instance *ri;
> > +     __u64 *cur_ip, *last_ip, tramp_addr;
> > +
> > +     if (likely(!utask || !utask->return_instances))
> > +             return;
> > +
> > +     cur_ip = &entry->ip[start_entry_idx];
> > +     last_ip = &entry->ip[entry->nr - 1];
> > +     ri = utask->return_instances;
> > +     tramp_addr = uprobe_get_trampoline_vaddr();
> > +
> > +     /* If there are pending uretprobes for current thread, they are
>
> Comment style fail. Also 'for *the* current thread'.
>

ack, will fix

> > +      * recorded in a list inside utask->return_instances; each such
> > +      * pending uretprobe replaces traced user function's return address on
> > +      * the stack, so when stack trace is captured, instead of seeing
> > +      * actual function's return address, we'll have one or many uretprobe
> > +      * trampoline addresses in the stack trace, which are not helpful and
> > +      * misleading to users.
>
> I would beg to differ, what if the uprobe is causing the performance
> issue?

If the uprobe/uretprobe code itself is causing performance issues, you'll
see that in other stack traces, where this code will be actively running
on the CPU. I don't think we make anything worse here.

Here we are talking about the case where the uprobe part is done and has
hijacked the return address on the stack, but the uretprobe is not yet
running (and so not causing any performance issues). The presence of this
"snooping" (pending) uretprobe is irrelevant to the user capturing the
stack trace. Right now the address in the [uprobes] VMA section installed
by the uretprobe infra code directly replaces the correct and actual
calling function's address.

Worst case, one can argue that both the [uprobes] address and the original
caller address should be in the stack trace, but I think that would still
be confusing to users. It would also make the implementation less
efficient, because we would then need to insert entries into the array and
shift everything around, as sketched below.
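
(A hypothetical sketch of that alternative, for illustration only; this is
not what the patch does. It reuses the cur_ip/last_ip/ri/entry names from
the patch's fixup loop:)

	/* Keeping both entries would mean shifting the tail of the
	 * callchain array for every trampoline hit, roughly:
	 */
	memmove(cur_ip + 1, cur_ip,
		(last_ip - cur_ip + 1) * sizeof(*cur_ip));
	*(cur_ip + 1) = ri->orig_ret_vaddr;	/* caller after [uprobes] */
	entry->nr++;				/* must also respect max_stack */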

So, as I mentioned above, if the concern is seeing uprobe/uretprobe code
using CPU, that doesn't change: we'll still see that in the overall set of
captured stack traces (be it custom uprobe handler code or a BPF program).

>
> While I do think it makes sense to fix the unwind in the sense that we
> should be able to continue the unwind, I don't think it makes sense to
> completely hide the presence of uprobes.

Unwinding isn't broken in this sense; we do unwind the entire stack trace
(see examples in the later patch). We just don't capture the actual
callers if they have a uretprobe pending.
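
(An illustrative example with made-up function names: with uretprobes
pending on funcC and funcB, a user stack captured at this point would look
roughly like this without and with the fix.)

	without the fix:                    with the fix:
	  funcC                               funcC
	  0x7fffffffe000 ([uprobes])          funcB
	  0x7fffffffe000 ([uprobes])          funcA
	  main                                main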

Patch

diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 1273be84392c..2f7ceca7ae3f 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -11,6 +11,7 @@ 
 #include <linux/perf_event.h>
 #include <linux/slab.h>
 #include <linux/sched/task_stack.h>
+#include <linux/uprobes.h>
 
 #include "internal.h"
 
@@ -176,13 +177,50 @@  put_callchain_entry(int rctx)
 	put_recursion_context(this_cpu_ptr(callchain_recursion), rctx);
 }
 
+static void fixup_uretprobe_trampoline_entries(struct perf_callchain_entry *entry,
+					       int start_entry_idx)
+{
+#ifdef CONFIG_UPROBES
+	struct uprobe_task *utask = current->utask;
+	struct return_instance *ri;
+	__u64 *cur_ip, *last_ip, tramp_addr;
+
+	if (likely(!utask || !utask->return_instances))
+		return;
+
+	cur_ip = &entry->ip[start_entry_idx];
+	last_ip = &entry->ip[entry->nr - 1];
+	ri = utask->return_instances;
+	tramp_addr = uprobe_get_trampoline_vaddr();
+
+	/* If there are pending uretprobes for current thread, they are
+	 * recorded in a list inside utask->return_instances; each such
+	 * pending uretprobe replaces traced user function's return address on
+	 * the stack, so when stack trace is captured, instead of seeing
+	 * actual function's return address, we'll have one or many uretprobe
+	 * trampoline addresses in the stack trace, which are not helpful and
+	 * misleading to users.
+	 * So here we go over the pending list of uretprobes, and each
+	 * encountered trampoline address is replaced with actual return
+	 * address.
+	 */
+	while (ri && cur_ip <= last_ip) {
+		if (*cur_ip == tramp_addr) {
+			*cur_ip = ri->orig_ret_vaddr;
+			ri = ri->next;
+		}
+		cur_ip++;
+	}
+#endif
+}
+
 struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
 		   u32 max_stack, bool crosstask, bool add_mark)
 {
 	struct perf_callchain_entry *entry;
 	struct perf_callchain_entry_ctx ctx;
-	int rctx;
+	int rctx, start_entry_idx;
 
 	entry = get_callchain_entry(&rctx);
 	if (!entry)
@@ -215,7 +253,9 @@  get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
 			if (add_mark)
 				perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
 
+			start_entry_idx = entry->nr;
 			perf_callchain_user(&ctx, regs);
+			fixup_uretprobe_trampoline_entries(entry, start_entry_idx);
 		}
 	}
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d60d24f0f2f4..1c99380dc89d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2149,6 +2149,15 @@  static void handle_trampoline(struct pt_regs *regs)
 
 		instruction_pointer_set(regs, ri->orig_ret_vaddr);
 		do {
+			/* pop current instance from the stack of pending return instances,
+			 * as it's not pending anymore: we just fixed up original
+			 * instruction pointer in regs and are about to call handlers;
+			 * this allows fixup_uretprobe_trampoline_entries() to properly fix up
+			 * captured stack traces from uretprobe handlers, in which pending
+			 * trampoline addresses on the stack are replaced with correct
+			 * original return addresses
+			 */
+			utask->return_instances = ri->next;
 			if (valid)
 				handle_uretprobe_chain(ri, regs);
 			ri = free_ret_instance(ri);
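
To observe the problem this patch fixes, one approach (a sketch; the
binary name ./mybin, function myfunc, and the generated event name are
assumptions and may differ on your system) is to attach a uretprobe to a
user function and capture call graphs while it is pending:

	# attach a uretprobe to myfunc in a hypothetical binary ./mybin
	perf probe -x ./mybin 'myfunc%return'

	# capture user stack traces at the uretprobe; without this fix,
	# pending uretprobes in outer frames show up as [uprobes] addresses
	perf record -g -e probe_mybin:myfunc__return -- ./mybin
	perf script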