diff mbox series

[v4] perf,x86: avoid missing caller address in stack traces captured in uprobe

Message ID 20240708231127.1055083-1-andrii@kernel.org (mailing list archive)
State Superseded
Headers show
Series [v4] perf,x86: avoid missing caller address in stack traces captured in uprobe | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Andrii Nakryiko July 8, 2024, 11:11 p.m. UTC
When tracing user functions with uprobe functionality, it's common to
install the probe (e.g., a BPF program) at the first instruction of the
function. This is often going to be `push %rbp` instruction in function
preamble, which means that within that function frame pointer hasn't
been established yet. This leads to consistently missing an actual
caller of the traced function, because perf_callchain_user() only
records current IP (capturing traced function) and then following frame
pointer chain (which would be caller's frame, containing the address of
caller's caller).

So when we have target_1 -> target_2 -> target_3 call chain and we are
tracing an entry to target_3, captured stack trace will report
target_1 -> target_3 call chain, which is wrong and confusing.

This patch proposes a x86-64-specific heuristic to detect `push %rbp`
(`push %ebp` on 32-bit architecture) instruction being traced. Given
entire kernel implementation of user space stack trace capturing works
under assumption that user space code was compiled with frame pointer
register (%rbp/%ebp) preservation, it seems pretty reasonable to use
this instruction as a strong indicator that this is the entry to the
function. In that case, return address is still pointed to by %rsp/%esp,
so we fetch it and add to stack trace before proceeding to unwind the
rest using frame pointer-based logic.

We also check for `endbr64` (for 64-bit modes) as another common pattern
for function entry, as suggested by Josh Poimboeuf. Even if we get this
wrong sometimes for uprobes attached not at the function entry, it's OK
because stack trace will still be overall meaningful, just with one
extra bogus entry. If we don't detect this, we end up with guaranteed to
be missing caller function entry in the stack trace, which is worse
overall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
v3->v4:
  - use get_user() instead of __get_user(), given untrusted input (Josh);
  - reduced #ifdef-ery (Josh);
v2->v3:
  - added endr64 detection and extracted heuristics into a function (Josh);
v1->v2:
  - use native unsigned long for ret_addr (Peter);
  - add same logic for compat logic in perf_callchain_user32 (Peter).

 arch/x86/events/core.c  | 62 +++++++++++++++++++++++++++++++++++++++++
 include/linux/uprobes.h |  2 ++
 kernel/events/uprobes.c |  2 ++
 3 files changed, 66 insertions(+)

Comments

Peter Zijlstra July 9, 2024, 10:11 a.m. UTC | #1
On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote:
> +#ifdef CONFIG_UPROBES
> +/*
> + * Heuristic-based check if uprobe is installed at the function entry.
> + *
> + * Under assumption of user code being compiled with frame pointers,
> + * `push %rbp/%ebp` is a good indicator that we indeed are.
> + *
> + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> + * If we get this wrong, captured stack trace might have one extra bogus
> + * entry, but the rest of stack trace will still be meaningful.
> + */
> +static bool is_uprobe_at_func_entry(struct pt_regs *regs)
> +{
> +	struct arch_uprobe *auprobe;
> +
> +	if (!current->utask)
> +		return false;
> +
> +	auprobe = current->utask->auprobe;
> +	if (!auprobe)
> +		return false;
> +
> +	/* push %rbp/%ebp */
> +	if (auprobe->insn[0] == 0x55)
> +		return true;
> +
> +	/* endbr64 (64-bit only) */
> +	if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
> +		return true;

I meant to reply to Josh suggesting this, but... how can this be? If you
scribble the ENDBR with an INT3 things will #CP and we'll never get to
the #BP.

Also, we tried very hard to not have a literal encode ENDBR (I really
should teach objtool about this one :/). If it somehow makes sense to
keep this clause, please use: gen_endbr()
Masami Hiramatsu (Google) July 9, 2024, 2:10 p.m. UTC | #2
On Tue, 9 Jul 2024 12:11:33 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote:
> > +#ifdef CONFIG_UPROBES
> > +/*
> > + * Heuristic-based check if uprobe is installed at the function entry.
> > + *
> > + * Under assumption of user code being compiled with frame pointers,
> > + * `push %rbp/%ebp` is a good indicator that we indeed are.
> > + *
> > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> > + * If we get this wrong, captured stack trace might have one extra bogus
> > + * entry, but the rest of stack trace will still be meaningful.
> > + */
> > +static bool is_uprobe_at_func_entry(struct pt_regs *regs)
> > +{
> > +	struct arch_uprobe *auprobe;
> > +
> > +	if (!current->utask)
> > +		return false;
> > +
> > +	auprobe = current->utask->auprobe;
> > +	if (!auprobe)
> > +		return false;
> > +
> > +	/* push %rbp/%ebp */
> > +	if (auprobe->insn[0] == 0x55)
> > +		return true;
> > +
> > +	/* endbr64 (64-bit only) */
> > +	if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
> > +		return true;
> 
> I meant to reply to Josh suggesting this, but... how can this be? If you
> scribble the ENDBR with an INT3 things will #CP and we'll never get to
> the #BP.

Hmm, kprobes checks the instruction and reject if it is ENDBR.
Shouldn't uprobe also skip the ENDBR too?

Thank you,

> 
> Also, we tried very hard to not have a literal encode ENDBR (I really
> should teach objtool about this one :/). If it somehow makes sense to
> keep this clause, please use: gen_endbr()
Peter Zijlstra July 9, 2024, 3:24 p.m. UTC | #3
On Tue, Jul 09, 2024 at 11:10:17PM +0900, Masami Hiramatsu wrote:
> On Tue, 9 Jul 2024 12:11:33 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote:
> > > +#ifdef CONFIG_UPROBES
> > > +/*
> > > + * Heuristic-based check if uprobe is installed at the function entry.
> > > + *
> > > + * Under assumption of user code being compiled with frame pointers,
> > > + * `push %rbp/%ebp` is a good indicator that we indeed are.
> > > + *
> > > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> > > + * If we get this wrong, captured stack trace might have one extra bogus
> > > + * entry, but the rest of stack trace will still be meaningful.
> > > + */
> > > +static bool is_uprobe_at_func_entry(struct pt_regs *regs)
> > > +{
> > > +	struct arch_uprobe *auprobe;
> > > +
> > > +	if (!current->utask)
> > > +		return false;
> > > +
> > > +	auprobe = current->utask->auprobe;
> > > +	if (!auprobe)
> > > +		return false;
> > > +
> > > +	/* push %rbp/%ebp */
> > > +	if (auprobe->insn[0] == 0x55)
> > > +		return true;
> > > +
> > > +	/* endbr64 (64-bit only) */
> > > +	if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
> > > +		return true;
> > 
> > I meant to reply to Josh suggesting this, but... how can this be? If you
> > scribble the ENDBR with an INT3 things will #CP and we'll never get to
> > the #BP.
> 
> Hmm, kprobes checks the instruction and reject if it is ENDBR.
> Shouldn't uprobe also skip the ENDBR too?

Should, yes, but I can't find in a hurry if we actually do.
Andrii Nakryiko July 9, 2024, 5:50 p.m. UTC | #4
On Tue, Jul 9, 2024 at 3:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote:
> > +#ifdef CONFIG_UPROBES
> > +/*
> > + * Heuristic-based check if uprobe is installed at the function entry.
> > + *
> > + * Under assumption of user code being compiled with frame pointers,
> > + * `push %rbp/%ebp` is a good indicator that we indeed are.
> > + *
> > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> > + * If we get this wrong, captured stack trace might have one extra bogus
> > + * entry, but the rest of stack trace will still be meaningful.
> > + */
> > +static bool is_uprobe_at_func_entry(struct pt_regs *regs)
> > +{
> > +     struct arch_uprobe *auprobe;
> > +
> > +     if (!current->utask)
> > +             return false;
> > +
> > +     auprobe = current->utask->auprobe;
> > +     if (!auprobe)
> > +             return false;
> > +
> > +     /* push %rbp/%ebp */
> > +     if (auprobe->insn[0] == 0x55)
> > +             return true;
> > +
> > +     /* endbr64 (64-bit only) */
> > +     if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
> > +             return true;
>
> I meant to reply to Josh suggesting this, but... how can this be? If you
> scribble the ENDBR with an INT3 things will #CP and we'll never get to
> the #BP.

Well, it seems like it works in practice, I just tried. Here's the
disassembly of the function:

00000000000019d0 <urandlib_api_v1>:
    19d0: f3 0f 1e fa                   endbr64
    19d4: 55                            pushq   %rbp
    19d5: 48 89 e5                      movq    %rsp, %rbp
    19d8: 48 83 ec 10                   subq    $0x10, %rsp
    19dc: 48 8d 3d fe ed ff ff          leaq    -0x1202(%rip), %rdi
 # 0x7e1 <__isoc99_scanf+0x7e1>
    19e3: 48 8d 75 fc                   leaq    -0x4(%rbp), %rsi
    19e7: b0 00                         movb    $0x0, %al
    19e9: e8 f2 00 00 00                callq   0x1ae0 <__isoc99_scanf+0x1ae0>
    19ee: b8 01 00 00 00                movl    $0x1, %eax
    19f3: 48 83 c4 10                   addq    $0x10, %rsp
    19f7: 5d                            popq    %rbp
    19f8: c3                            retq
    19f9: 0f 1f 80 00 00 00 00          nopl    (%rax)

And here's the state when uprobe is attached:

(gdb) disass/r urandlib_api_v1
Dump of assembler code for function urandlib_api_v1:
   0x00007ffb734e39d0 <+0>:     cc                      int3
   0x00007ffb734e39d1 <+1>:     0f 1e fa                nop    %edx
   0x00007ffb734e39d4 <+4>:     55                      push   %rbp
   0x00007ffb734e39d5 <+5>:     48 89 e5                mov    %rsp,%rbp
   0x00007ffb734e39d8 <+8>:     48 83 ec 10             sub    $0x10,%rsp
   0x00007ffb734e39dc <+12>:    48 8d 3d fe ed ff ff    lea
-0x1202(%rip),%rdi        # 0x7ffb734e27e1
   0x00007ffb734e39e3 <+19>:    48 8d 75 fc             lea    -0x4(%rbp),%rsi
=> 0x00007ffb734e39e7 <+23>:    b0 00                   mov    $0x0,%al
   0x00007ffb734e39e9 <+25>:    e8 f2 00 00 00          call
0x7ffb734e3ae0 <__isoc99_scanf@plt>
   0x00007ffb734e39ee <+30>:    b8 01 00 00 00          mov    $0x1,%eax
   0x00007ffb734e39f3 <+35>:    48 83 c4 10             add    $0x10,%rsp
   0x00007ffb734e39f7 <+39>:    5d                      pop    %rbp
   0x00007ffb734e39f8 <+40>:    c3                      ret


You can see it replaced the first byte, the following 3 bytes are
remnants of endb64 (gdb says it's a nop? :)), and then we proceeded,
you can see I stepped through a few more instructions.

Works by accident?

But either way, if we prevent uprobe to be placed on end64 that will
essentially break any code that does compile with endbr64
(-fcf-protection=branch), which is very not great (I suspect most
people that care would just disable that option in such a case).

>
> Also, we tried very hard to not have a literal encode ENDBR (I really
> should teach objtool about this one :/). If it somehow makes sense to
> keep this clause, please use: gen_endbr()

I'll just use is_endbr(), no problem.
Peter Zijlstra July 10, 2024, 11:38 a.m. UTC | #5
On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote:
> On Tue, Jul 9, 2024 at 3:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote:
> > > +#ifdef CONFIG_UPROBES
> > > +/*
> > > + * Heuristic-based check if uprobe is installed at the function entry.
> > > + *
> > > + * Under assumption of user code being compiled with frame pointers,
> > > + * `push %rbp/%ebp` is a good indicator that we indeed are.
> > > + *
> > > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> > > + * If we get this wrong, captured stack trace might have one extra bogus
> > > + * entry, but the rest of stack trace will still be meaningful.
> > > + */
> > > +static bool is_uprobe_at_func_entry(struct pt_regs *regs)
> > > +{
> > > +     struct arch_uprobe *auprobe;
> > > +
> > > +     if (!current->utask)
> > > +             return false;
> > > +
> > > +     auprobe = current->utask->auprobe;
> > > +     if (!auprobe)
> > > +             return false;
> > > +
> > > +     /* push %rbp/%ebp */
> > > +     if (auprobe->insn[0] == 0x55)
> > > +             return true;
> > > +
> > > +     /* endbr64 (64-bit only) */
> > > +     if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
> > > +             return true;
> >
> > I meant to reply to Josh suggesting this, but... how can this be? If you
> > scribble the ENDBR with an INT3 things will #CP and we'll never get to
> > the #BP.
> 
> Well, it seems like it works in practice, I just tried. Here's the
> disassembly of the function:
> 
> 00000000000019d0 <urandlib_api_v1>:
>     19d0: f3 0f 1e fa                   endbr64
>     19d4: 55                            pushq   %rbp
>     19d5: 48 89 e5                      movq    %rsp, %rbp
>     19d8: 48 83 ec 10                   subq    $0x10, %rsp
>     19dc: 48 8d 3d fe ed ff ff          leaq    -0x1202(%rip), %rdi
>  # 0x7e1 <__isoc99_scanf+0x7e1>
>     19e3: 48 8d 75 fc                   leaq    -0x4(%rbp), %rsi
>     19e7: b0 00                         movb    $0x0, %al
>     19e9: e8 f2 00 00 00                callq   0x1ae0 <__isoc99_scanf+0x1ae0>
>     19ee: b8 01 00 00 00                movl    $0x1, %eax
>     19f3: 48 83 c4 10                   addq    $0x10, %rsp
>     19f7: 5d                            popq    %rbp
>     19f8: c3                            retq
>     19f9: 0f 1f 80 00 00 00 00          nopl    (%rax)
> 
> And here's the state when uprobe is attached:
> 
> (gdb) disass/r urandlib_api_v1
> Dump of assembler code for function urandlib_api_v1:
>    0x00007ffb734e39d0 <+0>:     cc                      int3
>    0x00007ffb734e39d1 <+1>:     0f 1e fa                nop    %edx
>    0x00007ffb734e39d4 <+4>:     55                      push   %rbp
>    0x00007ffb734e39d5 <+5>:     48 89 e5                mov    %rsp,%rbp
>    0x00007ffb734e39d8 <+8>:     48 83 ec 10             sub    $0x10,%rsp
>    0x00007ffb734e39dc <+12>:    48 8d 3d fe ed ff ff    lea
> -0x1202(%rip),%rdi        # 0x7ffb734e27e1
>    0x00007ffb734e39e3 <+19>:    48 8d 75 fc             lea    -0x4(%rbp),%rsi
> => 0x00007ffb734e39e7 <+23>:    b0 00                   mov    $0x0,%al
>    0x00007ffb734e39e9 <+25>:    e8 f2 00 00 00          call
> 0x7ffb734e3ae0 <__isoc99_scanf@plt>
>    0x00007ffb734e39ee <+30>:    b8 01 00 00 00          mov    $0x1,%eax
>    0x00007ffb734e39f3 <+35>:    48 83 c4 10             add    $0x10,%rsp
>    0x00007ffb734e39f7 <+39>:    5d                      pop    %rbp
>    0x00007ffb734e39f8 <+40>:    c3                      ret
> 
> 
> You can see it replaced the first byte, the following 3 bytes are
> remnants of endb64 (gdb says it's a nop? :)), and then we proceeded,
> you can see I stepped through a few more instructions.
> 
> Works by accident?

Yeah, we don't actually have Userspace IBT enabled yet, even on hardware
that supports it.
Andrii Nakryiko July 10, 2024, 3:11 p.m. UTC | #6
On Wed, Jul 10, 2024 at 4:39 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote:
> > On Tue, Jul 9, 2024 at 3:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote:
> > > > +#ifdef CONFIG_UPROBES
> > > > +/*
> > > > + * Heuristic-based check if uprobe is installed at the function entry.
> > > > + *
> > > > + * Under assumption of user code being compiled with frame pointers,
> > > > + * `push %rbp/%ebp` is a good indicator that we indeed are.
> > > > + *
> > > > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> > > > + * If we get this wrong, captured stack trace might have one extra bogus
> > > > + * entry, but the rest of stack trace will still be meaningful.
> > > > + */
> > > > +static bool is_uprobe_at_func_entry(struct pt_regs *regs)
> > > > +{
> > > > +     struct arch_uprobe *auprobe;
> > > > +
> > > > +     if (!current->utask)
> > > > +             return false;
> > > > +
> > > > +     auprobe = current->utask->auprobe;
> > > > +     if (!auprobe)
> > > > +             return false;
> > > > +
> > > > +     /* push %rbp/%ebp */
> > > > +     if (auprobe->insn[0] == 0x55)
> > > > +             return true;
> > > > +
> > > > +     /* endbr64 (64-bit only) */
> > > > +     if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
> > > > +             return true;
> > >
> > > I meant to reply to Josh suggesting this, but... how can this be? If you
> > > scribble the ENDBR with an INT3 things will #CP and we'll never get to
> > > the #BP.
> >
> > Well, it seems like it works in practice, I just tried. Here's the
> > disassembly of the function:
> >
> > 00000000000019d0 <urandlib_api_v1>:
> >     19d0: f3 0f 1e fa                   endbr64
> >     19d4: 55                            pushq   %rbp
> >     19d5: 48 89 e5                      movq    %rsp, %rbp
> >     19d8: 48 83 ec 10                   subq    $0x10, %rsp
> >     19dc: 48 8d 3d fe ed ff ff          leaq    -0x1202(%rip), %rdi
> >  # 0x7e1 <__isoc99_scanf+0x7e1>
> >     19e3: 48 8d 75 fc                   leaq    -0x4(%rbp), %rsi
> >     19e7: b0 00                         movb    $0x0, %al
> >     19e9: e8 f2 00 00 00                callq   0x1ae0 <__isoc99_scanf+0x1ae0>
> >     19ee: b8 01 00 00 00                movl    $0x1, %eax
> >     19f3: 48 83 c4 10                   addq    $0x10, %rsp
> >     19f7: 5d                            popq    %rbp
> >     19f8: c3                            retq
> >     19f9: 0f 1f 80 00 00 00 00          nopl    (%rax)
> >
> > And here's the state when uprobe is attached:
> >
> > (gdb) disass/r urandlib_api_v1
> > Dump of assembler code for function urandlib_api_v1:
> >    0x00007ffb734e39d0 <+0>:     cc                      int3
> >    0x00007ffb734e39d1 <+1>:     0f 1e fa                nop    %edx
> >    0x00007ffb734e39d4 <+4>:     55                      push   %rbp
> >    0x00007ffb734e39d5 <+5>:     48 89 e5                mov    %rsp,%rbp
> >    0x00007ffb734e39d8 <+8>:     48 83 ec 10             sub    $0x10,%rsp
> >    0x00007ffb734e39dc <+12>:    48 8d 3d fe ed ff ff    lea
> > -0x1202(%rip),%rdi        # 0x7ffb734e27e1
> >    0x00007ffb734e39e3 <+19>:    48 8d 75 fc             lea    -0x4(%rbp),%rsi
> > => 0x00007ffb734e39e7 <+23>:    b0 00                   mov    $0x0,%al
> >    0x00007ffb734e39e9 <+25>:    e8 f2 00 00 00          call
> > 0x7ffb734e3ae0 <__isoc99_scanf@plt>
> >    0x00007ffb734e39ee <+30>:    b8 01 00 00 00          mov    $0x1,%eax
> >    0x00007ffb734e39f3 <+35>:    48 83 c4 10             add    $0x10,%rsp
> >    0x00007ffb734e39f7 <+39>:    5d                      pop    %rbp
> >    0x00007ffb734e39f8 <+40>:    c3                      ret
> >
> >
> > You can see it replaced the first byte, the following 3 bytes are
> > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded,
> > you can see I stepped through a few more instructions.
> >
> > Works by accident?
>
> Yeah, we don't actually have Userspace IBT enabled yet, even on hardware
> that supports it.

OK, I don't know what the implications are, but it's a good accident :)

Anyways, what should I do for v4? Drop is_endbr6() check or keep it?
Josh Poimboeuf July 10, 2024, 4:24 p.m. UTC | #7
On Wed, Jul 10, 2024 at 08:11:57AM -0700, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2024 at 4:39 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote:
> > > You can see it replaced the first byte, the following 3 bytes are
> > > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded,
> > > you can see I stepped through a few more instructions.
> > >
> > > Works by accident?
> >
> > Yeah, we don't actually have Userspace IBT enabled yet, even on hardware
> > that supports it.
> 
> OK, I don't know what the implications are, but it's a good accident :)
> 
> Anyways, what should I do for v4? Drop is_endbr6() check or keep it?

Given the current behavior of uprobe overwriting ENDBR64 with INT3, the
is_endbr6() check still makes sense, otherwise is_uprobe_at_func_entry()
would never return true on OSes which have the ENDBR64 compiled in.

However, once userspace IBT actually gets enabled, uprobe should skip
the ENDBR64 and patch the subsequent instruction.  Then the is_endbr6()
check would no longer be needed.
Andrii Nakryiko July 10, 2024, 7:24 p.m. UTC | #8
On Wed, Jul 10, 2024 at 9:24 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, Jul 10, 2024 at 08:11:57AM -0700, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2024 at 4:39 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote:
> > > > You can see it replaced the first byte, the following 3 bytes are
> > > > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded,
> > > > you can see I stepped through a few more instructions.
> > > >
> > > > Works by accident?
> > >
> > > Yeah, we don't actually have Userspace IBT enabled yet, even on hardware
> > > that supports it.
> >
> > OK, I don't know what the implications are, but it's a good accident :)
> >
> > Anyways, what should I do for v4? Drop is_endbr6() check or keep it?
>
> Given the current behavior of uprobe overwriting ENDBR64 with INT3, the
> is_endbr6() check still makes sense, otherwise is_uprobe_at_func_entry()
> would never return true on OSes which have the ENDBR64 compiled in.
>
> However, once userspace IBT actually gets enabled, uprobe should skip
> the ENDBR64 and patch the subsequent instruction.  Then the is_endbr6()
> check would no longer be needed.
>

Ok, I'll keep it then, thanks.

> --
> Josh
diff mbox series

Patch

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07b1ef1..c09603d769a2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -41,6 +41,7 @@ 
 #include <asm/desc.h>
 #include <asm/ldt.h>
 #include <asm/unwind.h>
+#include <asm/uprobes.h>
 
 #include "perf_event.h"
 
@@ -2813,6 +2814,46 @@  static unsigned long get_segment_base(unsigned int segment)
 	return get_desc_base(desc);
 }
 
+#ifdef CONFIG_UPROBES
+/*
+ * Heuristic-based check if uprobe is installed at the function entry.
+ *
+ * Under assumption of user code being compiled with frame pointers,
+ * `push %rbp/%ebp` is a good indicator that we indeed are.
+ *
+ * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
+ * If we get this wrong, captured stack trace might have one extra bogus
+ * entry, but the rest of stack trace will still be meaningful.
+ */
+static bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+	struct arch_uprobe *auprobe;
+
+	if (!current->utask)
+		return false;
+
+	auprobe = current->utask->auprobe;
+	if (!auprobe)
+		return false;
+
+	/* push %rbp/%ebp */
+	if (auprobe->insn[0] == 0x55)
+		return true;
+
+	/* endbr64 (64-bit only) */
+	if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
+		return true;
+
+	return false;
+}
+
+#else
+static bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+	return false;
+}
+#endif /* CONFIG_UPROBES */
+
 #ifdef CONFIG_IA32_EMULATION
 
 #include <linux/compat.h>
@@ -2824,6 +2865,7 @@  perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
 	unsigned long ss_base, cs_base;
 	struct stack_frame_ia32 frame;
 	const struct stack_frame_ia32 __user *fp;
+	u32 ret_addr;
 
 	if (user_64bit_mode(regs))
 		return 0;
@@ -2833,6 +2875,12 @@  perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
 
 	fp = compat_ptr(ss_base + regs->bp);
 	pagefault_disable();
+
+	/* see perf_callchain_user() below for why we do this */
+	if (is_uprobe_at_func_entry(regs) &&
+	    !get_user(ret_addr, (const u32 __user *)regs->sp))
+		perf_callchain_store(entry, ret_addr);
+
 	while (entry->nr < entry->max_stack) {
 		if (!valid_user_frame(fp, sizeof(frame)))
 			break;
@@ -2861,6 +2909,7 @@  perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
 {
 	struct stack_frame frame;
 	const struct stack_frame __user *fp;
+	unsigned long ret_addr;
 
 	if (perf_guest_state()) {
 		/* TODO: We don't support guest os callchain now */
@@ -2884,6 +2933,19 @@  perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
 		return;
 
 	pagefault_disable();
+
+	/*
+	 * If we are called from uprobe handler, and we are indeed at the very
+	 * entry to user function (which is normally a `push %rbp` instruction,
+	 * under assumption of application being compiled with frame pointers),
+	 * we should read return address from *regs->sp before proceeding
+	 * to follow frame pointers, otherwise we'll skip immediate caller
+	 * as %rbp is not yet setup.
+	 */
+	if (is_uprobe_at_func_entry(regs) &&
+	    !get_user(ret_addr, (const unsigned long __user *)regs->sp))
+		perf_callchain_store(entry, ret_addr);
+
 	while (entry->nr < entry->max_stack) {
 		if (!valid_user_frame(fp, sizeof(frame)))
 			break;
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index b503fafb7fb3..a270a5892ab4 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -76,6 +76,8 @@  struct uprobe_task {
 	struct uprobe			*active_uprobe;
 	unsigned long			xol_vaddr;
 
+	struct arch_uprobe              *auprobe;
+
 	struct return_instance		*return_instances;
 	unsigned int			depth;
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 99be2adedbc0..6e22e4d80f1e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2082,6 +2082,7 @@  static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 	bool need_prep = false; /* prepare return uprobe, when needed */
 
 	down_read(&uprobe->register_rwsem);
+	current->utask->auprobe = &uprobe->arch;
 	for (uc = uprobe->consumers; uc; uc = uc->next) {
 		int rc = 0;
 
@@ -2096,6 +2097,7 @@  static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 
 		remove &= rc;
 	}
+	current->utask->auprobe = NULL;
 
 	if (need_prep && !remove)
 		prepare_uretprobe(uprobe, regs); /* put bp at return */