mbox series

[-tip,v2,00/10] kprobes: Fix stacktrace with kretprobes

Message ID 161553130371.1038734.7661319550287837734.stgit@devnote2 (mailing list archive)
Headers show
Series kprobes: Fix stacktrace with kretprobes | expand

Message

Masami Hiramatsu (Google) March 12, 2021, 6:41 a.m. UTC
Hello,

Here is the 2nd version of the series to fix the stacktrace with kretprobe.

The 1st series is here;

https://lore.kernel.org/bpf/161495873696.346821.10161501768906432924.stgit@devnote2/

In this version I merged the ORC unwinder fix for kretprobe which discussed in the
previous thread. [3/10] is updated according to the Miroslav's comment. [4/10] is
updated for simplify the code. [5/10]-[9/10] are discussed in the previsous tread
and are introduced to the series.

Daniel, can you also test this again? I and Josh discussed a bit different
method and I've implemented it on this version.

This actually changes the kretprobe behavisor a bit, now the instraction pointer in
the pt_regs passed to kretprobe user handler is correctly set the real return
address. So user handlers can get it via instruction_pointer() API.

Thank you,

---

Josh Poimboeuf (1):
      x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code

Masami Hiramatsu (9):
      ia64: kprobes: Fix to pass correct trampoline address to the handler
      kprobes: treewide: Replace arch_deref_entry_point() with dereference_function_descriptor()
      kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
      kprobes: stacktrace: Recover the address changed by kretprobe
      ARC: Add instruction_pointer_set() API
      ia64: Add instruction_pointer_set() API
      kprobes: Setup instruction pointer in __kretprobe_trampoline_handler
      x86/unwind/orc: Fixup kretprobe trampoline entry
      tracing: Remove kretprobe unknown indicator from stacktrace


 arch/arc/include/asm/ptrace.h       |    5 ++
 arch/arc/kernel/kprobes.c           |    2 -
 arch/arm/probes/kprobes/core.c      |    3 -
 arch/arm64/kernel/probes/kprobes.c  |    3 -
 arch/csky/kernel/probes/kprobes.c   |    2 -
 arch/ia64/include/asm/ptrace.h      |    6 +++
 arch/ia64/kernel/kprobes.c          |   15 ++----
 arch/mips/kernel/kprobes.c          |    3 -
 arch/parisc/kernel/kprobes.c        |    4 +-
 arch/powerpc/kernel/kprobes.c       |   13 -----
 arch/riscv/kernel/probes/kprobes.c  |    2 -
 arch/s390/kernel/kprobes.c          |    2 -
 arch/sh/kernel/kprobes.c            |    2 -
 arch/sparc/kernel/kprobes.c         |    2 -
 arch/x86/include/asm/unwind.h       |    4 ++
 arch/x86/include/asm/unwind_hints.h |    5 ++
 arch/x86/kernel/kprobes/core.c      |    5 +-
 arch/x86/kernel/unwind_orc.c        |   16 +++++++
 include/linux/kprobes.h             |   41 +++++++++++++++--
 kernel/kprobes.c                    |   84 +++++++++++++++++++++--------------
 kernel/stacktrace.c                 |   22 +++++++++
 kernel/trace/trace_output.c         |   27 ++---------
 lib/error-inject.c                  |    3 +
 23 files changed, 170 insertions(+), 101 deletions(-)

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

Comments

Daniel Xu March 12, 2021, 6:56 p.m. UTC | #1
On Fri, Mar 12, 2021 at 03:41:44PM +0900, Masami Hiramatsu wrote:
> Hello,
> 
> Here is the 2nd version of the series to fix the stacktrace with kretprobe.
> 
> The 1st series is here;
> 
> https://lore.kernel.org/bpf/161495873696.346821.10161501768906432924.stgit@devnote2/
> 
> In this version I merged the ORC unwinder fix for kretprobe which discussed in the
> previous thread. [3/10] is updated according to the Miroslav's comment. [4/10] is
> updated for simplify the code. [5/10]-[9/10] are discussed in the previsous tread
> and are introduced to the series.
> 
> Daniel, can you also test this again? I and Josh discussed a bit different
> method and I've implemented it on this version.

Works great, thanks!

Tested-by: Daniel Xu <dxu@dxuuu.xyz>

> 
> This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> the pt_regs passed to kretprobe user handler is correctly set the real return
> address. So user handlers can get it via instruction_pointer() API.

While this changes behavior a little bit, I don't think anyone will
mind. I think it's more accurate now.

<...>

Thanks,
Daniel
Josh Poimboeuf March 16, 2021, 2:30 a.m. UTC | #2
On Fri, Mar 12, 2021 at 03:41:44PM +0900, Masami Hiramatsu wrote:
> Hello,
> 
> Here is the 2nd version of the series to fix the stacktrace with kretprobe.
> 
> The 1st series is here;
> 
> https://lore.kernel.org/bpf/161495873696.346821.10161501768906432924.stgit@devnote2/
> 
> In this version I merged the ORC unwinder fix for kretprobe which discussed in the
> previous thread. [3/10] is updated according to the Miroslav's comment. [4/10] is
> updated for simplify the code. [5/10]-[9/10] are discussed in the previsous tread
> and are introduced to the series.
> 
> Daniel, can you also test this again? I and Josh discussed a bit different
> method and I've implemented it on this version.
> 
> This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> the pt_regs passed to kretprobe user handler is correctly set the real return
> address. So user handlers can get it via instruction_pointer() API.

When I add WARN_ON(1) to a test kretprobe, it doesn't unwind properly.

show_trace_log_lvl() reads the entire stack in lockstep with calls to
the unwinder so that it can decide which addresses get prefixed with
'?'.  So it expects to find an actual return address on the stack.
Instead it finds %rsp.  So it never syncs up with unwind_next_frame()
and shows all remaining addresses as unreliable.

  Call Trace:
   __kretprobe_trampoline_handler+0xca/0x1a0
   trampoline_handler+0x3d/0x50
   kretprobe_trampoline+0x25/0x50
   ? init_test_probes.cold+0x323/0x387
   ? init_kprobes+0x144/0x14c
   ? init_optprobes+0x15/0x15
   ? do_one_initcall+0x5b/0x300
   ? lock_is_held_type+0xe8/0x140
   ? kernel_init_freeable+0x174/0x2cd
   ? rest_init+0x233/0x233
   ? kernel_init+0xa/0x11d
   ? ret_from_fork+0x22/0x30

How about pushing 'kretprobe_trampoline' instead of %rsp for the return
address placeholder.  That fixes the above test, and removes the need
for the weird 'state->ip == sp' check:

  Call Trace:
   __kretprobe_trampoline_handler+0xca/0x150
   trampoline_handler+0x3d/0x50
   kretprobe_trampoline+0x29/0x50
   ? init_test_probes.cold+0x323/0x387
   elfcorehdr_read+0x10/0x10
   init_kprobes+0x144/0x14c
   ? init_optprobes+0x15/0x15
   do_one_initcall+0x72/0x280
   kernel_init_freeable+0x174/0x2cd
   ? rest_init+0x122/0x122
   kernel_init+0xa/0x10e
   ret_from_fork+0x22/0x30

Though, init_test_probes.cold() (the real return address) is still
listed as unreliable.  Maybe we need a call to kretprobe_find_ret_addr()
in show_trace_log_lvl(), similar to the ftrace_graph_ret_addr() call
there.



diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 06f33bfebc50..70188fdd288e 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -766,19 +766,19 @@ asm(
 	"kretprobe_trampoline:\n"
 	/* We don't bother saving the ss register */
 #ifdef CONFIG_X86_64
-	"	pushq %rsp\n"
+	/* Push fake return address to tell the unwinder it's a kretprobe */
+	"	pushq $kretprobe_trampoline\n"
 	UNWIND_HINT_FUNC
 	"	pushfq\n"
 	SAVE_REGS_STRING
 	"	movq %rsp, %rdi\n"
 	"	call trampoline_handler\n"
-	/* Replace saved sp with true return address. */
+	/* Replace the fake return address with the real one. */
 	"	movq %rax, 19*8(%rsp)\n"
 	RESTORE_REGS_STRING
 	"	popfq\n"
 #else
 	"	pushl %esp\n"
-	UNWIND_HINT_FUNC
 	"	pushfl\n"
 	SAVE_REGS_STRING
 	"	movl %esp, %eax\n"
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 1d1b9388a1b1..1d3de84d2410 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -548,8 +548,7 @@ bool unwind_next_frame(struct unwind_state *state)
 		 * In those cases, find the correct return address from
 		 * task->kretprobe_instances list.
 		 */
-		if (state->ip == sp ||
-		    is_kretprobe_trampoline(state->ip))
+		if (is_kretprobe_trampoline(state->ip))
 			state->ip = kretprobe_find_ret_addr(state->task,
 							    &state->kr_iter);
Masami Hiramatsu (Google) March 16, 2021, 6:30 a.m. UTC | #3
On Mon, 15 Mar 2021 21:30:03 -0500
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Fri, Mar 12, 2021 at 03:41:44PM +0900, Masami Hiramatsu wrote:
> > Hello,
> > 
> > Here is the 2nd version of the series to fix the stacktrace with kretprobe.
> > 
> > The 1st series is here;
> > 
> > https://lore.kernel.org/bpf/161495873696.346821.10161501768906432924.stgit@devnote2/
> > 
> > In this version I merged the ORC unwinder fix for kretprobe which discussed in the
> > previous thread. [3/10] is updated according to the Miroslav's comment. [4/10] is
> > updated for simplify the code. [5/10]-[9/10] are discussed in the previsous tread
> > and are introduced to the series.
> > 
> > Daniel, can you also test this again? I and Josh discussed a bit different
> > method and I've implemented it on this version.
> > 
> > This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> > the pt_regs passed to kretprobe user handler is correctly set the real return
> > address. So user handlers can get it via instruction_pointer() API.
> 
> When I add WARN_ON(1) to a test kretprobe, it doesn't unwind properly.
> 
> show_trace_log_lvl() reads the entire stack in lockstep with calls to
> the unwinder so that it can decide which addresses get prefixed with
> '?'.  So it expects to find an actual return address on the stack.
> Instead it finds %rsp.  So it never syncs up with unwind_next_frame()
> and shows all remaining addresses as unreliable.
> 
>   Call Trace:
>    __kretprobe_trampoline_handler+0xca/0x1a0
>    trampoline_handler+0x3d/0x50
>    kretprobe_trampoline+0x25/0x50
>    ? init_test_probes.cold+0x323/0x387
>    ? init_kprobes+0x144/0x14c
>    ? init_optprobes+0x15/0x15
>    ? do_one_initcall+0x5b/0x300
>    ? lock_is_held_type+0xe8/0x140
>    ? kernel_init_freeable+0x174/0x2cd
>    ? rest_init+0x233/0x233
>    ? kernel_init+0xa/0x11d
>    ? ret_from_fork+0x22/0x30
> 
> How about pushing 'kretprobe_trampoline' instead of %rsp for the return
> address placeholder.  That fixes the above test, and removes the need
> for the weird 'state->ip == sp' check:
> 
>   Call Trace:
>    __kretprobe_trampoline_handler+0xca/0x150
>    trampoline_handler+0x3d/0x50
>    kretprobe_trampoline+0x29/0x50
>    ? init_test_probes.cold+0x323/0x387
>    elfcorehdr_read+0x10/0x10
>    init_kprobes+0x144/0x14c
>    ? init_optprobes+0x15/0x15
>    do_one_initcall+0x72/0x280
>    kernel_init_freeable+0x174/0x2cd
>    ? rest_init+0x122/0x122
>    kernel_init+0xa/0x10e
>    ret_from_fork+0x22/0x30
> 
> Though, init_test_probes.cold() (the real return address) is still
> listed as unreliable.  Maybe we need a call to kretprobe_find_ret_addr()
> in show_trace_log_lvl(), similar to the ftrace_graph_ret_addr() call
> there.

Thanks for the test!
OK, that could be acceptable. However, push %sp still needed for accessing
stack address from kretprobe handler via pt_regs. (regs->sp must point
the stack address)
Anyway, with int3, it pushes one more entry for emulating call, so I think
it is OK.
Let me update the series!

Thank you!

> 
> 
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 06f33bfebc50..70188fdd288e 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -766,19 +766,19 @@ asm(
>  	"kretprobe_trampoline:\n"
>  	/* We don't bother saving the ss register */
>  #ifdef CONFIG_X86_64
> -	"	pushq %rsp\n"
> +	/* Push fake return address to tell the unwinder it's a kretprobe */
> +	"	pushq $kretprobe_trampoline\n"
>  	UNWIND_HINT_FUNC
>  	"	pushfq\n"
>  	SAVE_REGS_STRING
>  	"	movq %rsp, %rdi\n"
>  	"	call trampoline_handler\n"
> -	/* Replace saved sp with true return address. */
> +	/* Replace the fake return address with the real one. */
>  	"	movq %rax, 19*8(%rsp)\n"
>  	RESTORE_REGS_STRING
>  	"	popfq\n"
>  #else
>  	"	pushl %esp\n"
> -	UNWIND_HINT_FUNC
>  	"	pushfl\n"
>  	SAVE_REGS_STRING
>  	"	movl %esp, %eax\n"
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 1d1b9388a1b1..1d3de84d2410 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -548,8 +548,7 @@ bool unwind_next_frame(struct unwind_state *state)
>  		 * In those cases, find the correct return address from
>  		 * task->kretprobe_instances list.
>  		 */
> -		if (state->ip == sp ||
> -		    is_kretprobe_trampoline(state->ip))
> +		if (is_kretprobe_trampoline(state->ip))
>  			state->ip = kretprobe_find_ret_addr(state->task,
>  							    &state->kr_iter);
>  
> 
>
Andrii Nakryiko May 17, 2021, 9:06 p.m. UTC | #4
On Tue, Mar 16, 2021 at 1:45 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Mon, 15 Mar 2021 21:30:03 -0500
> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> > On Fri, Mar 12, 2021 at 03:41:44PM +0900, Masami Hiramatsu wrote:
> > > Hello,
> > >
> > > Here is the 2nd version of the series to fix the stacktrace with kretprobe.
> > >
> > > The 1st series is here;
> > >
> > > https://lore.kernel.org/bpf/161495873696.346821.10161501768906432924.stgit@devnote2/
> > >
> > > In this version I merged the ORC unwinder fix for kretprobe which discussed in the
> > > previous thread. [3/10] is updated according to the Miroslav's comment. [4/10] is
> > > updated for simplify the code. [5/10]-[9/10] are discussed in the previsous tread
> > > and are introduced to the series.
> > >
> > > Daniel, can you also test this again? I and Josh discussed a bit different
> > > method and I've implemented it on this version.
> > >
> > > This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> > > the pt_regs passed to kretprobe user handler is correctly set the real return
> > > address. So user handlers can get it via instruction_pointer() API.
> >
> > When I add WARN_ON(1) to a test kretprobe, it doesn't unwind properly.
> >
> > show_trace_log_lvl() reads the entire stack in lockstep with calls to
> > the unwinder so that it can decide which addresses get prefixed with
> > '?'.  So it expects to find an actual return address on the stack.
> > Instead it finds %rsp.  So it never syncs up with unwind_next_frame()
> > and shows all remaining addresses as unreliable.
> >
> >   Call Trace:
> >    __kretprobe_trampoline_handler+0xca/0x1a0
> >    trampoline_handler+0x3d/0x50
> >    kretprobe_trampoline+0x25/0x50
> >    ? init_test_probes.cold+0x323/0x387
> >    ? init_kprobes+0x144/0x14c
> >    ? init_optprobes+0x15/0x15
> >    ? do_one_initcall+0x5b/0x300
> >    ? lock_is_held_type+0xe8/0x140
> >    ? kernel_init_freeable+0x174/0x2cd
> >    ? rest_init+0x233/0x233
> >    ? kernel_init+0xa/0x11d
> >    ? ret_from_fork+0x22/0x30
> >
> > How about pushing 'kretprobe_trampoline' instead of %rsp for the return
> > address placeholder.  That fixes the above test, and removes the need
> > for the weird 'state->ip == sp' check:
> >
> >   Call Trace:
> >    __kretprobe_trampoline_handler+0xca/0x150
> >    trampoline_handler+0x3d/0x50
> >    kretprobe_trampoline+0x29/0x50
> >    ? init_test_probes.cold+0x323/0x387
> >    elfcorehdr_read+0x10/0x10
> >    init_kprobes+0x144/0x14c
> >    ? init_optprobes+0x15/0x15
> >    do_one_initcall+0x72/0x280
> >    kernel_init_freeable+0x174/0x2cd
> >    ? rest_init+0x122/0x122
> >    kernel_init+0xa/0x10e
> >    ret_from_fork+0x22/0x30
> >
> > Though, init_test_probes.cold() (the real return address) is still
> > listed as unreliable.  Maybe we need a call to kretprobe_find_ret_addr()
> > in show_trace_log_lvl(), similar to the ftrace_graph_ret_addr() call
> > there.
>
> Thanks for the test!
> OK, that could be acceptable. However, push %sp still needed for accessing
> stack address from kretprobe handler via pt_regs. (regs->sp must point
> the stack address)
> Anyway, with int3, it pushes one more entry for emulating call, so I think
> it is OK.
> Let me update the series!
>

Hi Misami,

Did you get a chance to follow up on this? I checked v5.13-rc1 and it
still has this issue. Inability to capture a stack trace from BPF
kretprobes is a pretty big problem for some applications, it would be
great to get this fixed. Thanks!


> Thank you!
>
> >
> >
> >
> > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> > index 06f33bfebc50..70188fdd288e 100644
> > --- a/arch/x86/kernel/kprobes/core.c
> > +++ b/arch/x86/kernel/kprobes/core.c
> > @@ -766,19 +766,19 @@ asm(
> >       "kretprobe_trampoline:\n"
> >       /* We don't bother saving the ss register */
> >  #ifdef CONFIG_X86_64
> > -     "       pushq %rsp\n"
> > +     /* Push fake return address to tell the unwinder it's a kretprobe */
> > +     "       pushq $kretprobe_trampoline\n"
> >       UNWIND_HINT_FUNC
> >       "       pushfq\n"
> >       SAVE_REGS_STRING
> >       "       movq %rsp, %rdi\n"
> >       "       call trampoline_handler\n"
> > -     /* Replace saved sp with true return address. */
> > +     /* Replace the fake return address with the real one. */
> >       "       movq %rax, 19*8(%rsp)\n"
> >       RESTORE_REGS_STRING
> >       "       popfq\n"
> >  #else
> >       "       pushl %esp\n"
> > -     UNWIND_HINT_FUNC
> >       "       pushfl\n"
> >       SAVE_REGS_STRING
> >       "       movl %esp, %eax\n"
> > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> > index 1d1b9388a1b1..1d3de84d2410 100644
> > --- a/arch/x86/kernel/unwind_orc.c
> > +++ b/arch/x86/kernel/unwind_orc.c
> > @@ -548,8 +548,7 @@ bool unwind_next_frame(struct unwind_state *state)
> >                * In those cases, find the correct return address from
> >                * task->kretprobe_instances list.
> >                */
> > -             if (state->ip == sp ||
> > -                 is_kretprobe_trampoline(state->ip))
> > +             if (is_kretprobe_trampoline(state->ip))
> >                       state->ip = kretprobe_find_ret_addr(state->task,
> >                                                           &state->kr_iter);
> >
> >
> >
>
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>
Masami Hiramatsu (Google) May 23, 2021, 2:22 p.m. UTC | #5
On Mon, 17 May 2021 14:06:24 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Tue, Mar 16, 2021 at 1:45 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Mon, 15 Mar 2021 21:30:03 -0500
> > Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > > On Fri, Mar 12, 2021 at 03:41:44PM +0900, Masami Hiramatsu wrote:
> > > > Hello,
> > > >
> > > > Here is the 2nd version of the series to fix the stacktrace with kretprobe.
> > > >
> > > > The 1st series is here;
> > > >
> > > > https://lore.kernel.org/bpf/161495873696.346821.10161501768906432924.stgit@devnote2/
> > > >
> > > > In this version I merged the ORC unwinder fix for kretprobe which discussed in the
> > > > previous thread. [3/10] is updated according to the Miroslav's comment. [4/10] is
> > > > updated for simplify the code. [5/10]-[9/10] are discussed in the previsous tread
> > > > and are introduced to the series.
> > > >
> > > > Daniel, can you also test this again? I and Josh discussed a bit different
> > > > method and I've implemented it on this version.
> > > >
> > > > This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> > > > the pt_regs passed to kretprobe user handler is correctly set the real return
> > > > address. So user handlers can get it via instruction_pointer() API.
> > >
> > > When I add WARN_ON(1) to a test kretprobe, it doesn't unwind properly.
> > >
> > > show_trace_log_lvl() reads the entire stack in lockstep with calls to
> > > the unwinder so that it can decide which addresses get prefixed with
> > > '?'.  So it expects to find an actual return address on the stack.
> > > Instead it finds %rsp.  So it never syncs up with unwind_next_frame()
> > > and shows all remaining addresses as unreliable.
> > >
> > >   Call Trace:
> > >    __kretprobe_trampoline_handler+0xca/0x1a0
> > >    trampoline_handler+0x3d/0x50
> > >    kretprobe_trampoline+0x25/0x50
> > >    ? init_test_probes.cold+0x323/0x387
> > >    ? init_kprobes+0x144/0x14c
> > >    ? init_optprobes+0x15/0x15
> > >    ? do_one_initcall+0x5b/0x300
> > >    ? lock_is_held_type+0xe8/0x140
> > >    ? kernel_init_freeable+0x174/0x2cd
> > >    ? rest_init+0x233/0x233
> > >    ? kernel_init+0xa/0x11d
> > >    ? ret_from_fork+0x22/0x30
> > >
> > > How about pushing 'kretprobe_trampoline' instead of %rsp for the return
> > > address placeholder.  That fixes the above test, and removes the need
> > > for the weird 'state->ip == sp' check:
> > >
> > >   Call Trace:
> > >    __kretprobe_trampoline_handler+0xca/0x150
> > >    trampoline_handler+0x3d/0x50
> > >    kretprobe_trampoline+0x29/0x50
> > >    ? init_test_probes.cold+0x323/0x387
> > >    elfcorehdr_read+0x10/0x10
> > >    init_kprobes+0x144/0x14c
> > >    ? init_optprobes+0x15/0x15
> > >    do_one_initcall+0x72/0x280
> > >    kernel_init_freeable+0x174/0x2cd
> > >    ? rest_init+0x122/0x122
> > >    kernel_init+0xa/0x10e
> > >    ret_from_fork+0x22/0x30
> > >
> > > Though, init_test_probes.cold() (the real return address) is still
> > > listed as unreliable.  Maybe we need a call to kretprobe_find_ret_addr()
> > > in show_trace_log_lvl(), similar to the ftrace_graph_ret_addr() call
> > > there.
> >
> > Thanks for the test!
> > OK, that could be acceptable. However, push %sp still needed for accessing
> > stack address from kretprobe handler via pt_regs. (regs->sp must point
> > the stack address)
> > Anyway, with int3, it pushes one more entry for emulating call, so I think
> > it is OK.
> > Let me update the series!
> >
> 
> Hi Misami,
> 
> Did you get a chance to follow up on this? I checked v5.13-rc1 and it
> still has this issue. Inability to capture a stack trace from BPF
> kretprobes is a pretty big problem for some applications, it would be
> great to get this fixed. Thanks!

OK, let me rework this series.

Thank you,


> 
> 
> > Thank you!
> >
> > >
> > >
> > >
> > > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> > > index 06f33bfebc50..70188fdd288e 100644
> > > --- a/arch/x86/kernel/kprobes/core.c
> > > +++ b/arch/x86/kernel/kprobes/core.c
> > > @@ -766,19 +766,19 @@ asm(
> > >       "kretprobe_trampoline:\n"
> > >       /* We don't bother saving the ss register */
> > >  #ifdef CONFIG_X86_64
> > > -     "       pushq %rsp\n"
> > > +     /* Push fake return address to tell the unwinder it's a kretprobe */
> > > +     "       pushq $kretprobe_trampoline\n"
> > >       UNWIND_HINT_FUNC
> > >       "       pushfq\n"
> > >       SAVE_REGS_STRING
> > >       "       movq %rsp, %rdi\n"
> > >       "       call trampoline_handler\n"
> > > -     /* Replace saved sp with true return address. */
> > > +     /* Replace the fake return address with the real one. */
> > >       "       movq %rax, 19*8(%rsp)\n"
> > >       RESTORE_REGS_STRING
> > >       "       popfq\n"
> > >  #else
> > >       "       pushl %esp\n"
> > > -     UNWIND_HINT_FUNC
> > >       "       pushfl\n"
> > >       SAVE_REGS_STRING
> > >       "       movl %esp, %eax\n"
> > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> > > index 1d1b9388a1b1..1d3de84d2410 100644
> > > --- a/arch/x86/kernel/unwind_orc.c
> > > +++ b/arch/x86/kernel/unwind_orc.c
> > > @@ -548,8 +548,7 @@ bool unwind_next_frame(struct unwind_state *state)
> > >                * In those cases, find the correct return address from
> > >                * task->kretprobe_instances list.
> > >                */
> > > -             if (state->ip == sp ||
> > > -                 is_kretprobe_trampoline(state->ip))
> > > +             if (is_kretprobe_trampoline(state->ip))
> > >                       state->ip = kretprobe_find_ret_addr(state->task,
> > >                                                           &state->kr_iter);
> > >
> > >
> > >
> >
> >
> > --
> > Masami Hiramatsu <mhiramat@kernel.org>
Andrii Nakryiko May 24, 2021, 5:49 p.m. UTC | #6
On Sun, May 23, 2021 at 7:22 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Mon, 17 May 2021 14:06:24 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Tue, Mar 16, 2021 at 1:45 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >
> > > On Mon, 15 Mar 2021 21:30:03 -0500
> > > Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >
> > > > On Fri, Mar 12, 2021 at 03:41:44PM +0900, Masami Hiramatsu wrote:
> > > > > Hello,
> > > > >
> > > > > Here is the 2nd version of the series to fix the stacktrace with kretprobe.
> > > > >
> > > > > The 1st series is here;
> > > > >
> > > > > https://lore.kernel.org/bpf/161495873696.346821.10161501768906432924.stgit@devnote2/
> > > > >
> > > > > In this version I merged the ORC unwinder fix for kretprobe which discussed in the
> > > > > previous thread. [3/10] is updated according to the Miroslav's comment. [4/10] is
> > > > > updated for simplify the code. [5/10]-[9/10] are discussed in the previsous tread
> > > > > and are introduced to the series.
> > > > >
> > > > > Daniel, can you also test this again? I and Josh discussed a bit different
> > > > > method and I've implemented it on this version.
> > > > >
> > > > > This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> > > > > the pt_regs passed to kretprobe user handler is correctly set the real return
> > > > > address. So user handlers can get it via instruction_pointer() API.
> > > >
> > > > When I add WARN_ON(1) to a test kretprobe, it doesn't unwind properly.
> > > >
> > > > show_trace_log_lvl() reads the entire stack in lockstep with calls to
> > > > the unwinder so that it can decide which addresses get prefixed with
> > > > '?'.  So it expects to find an actual return address on the stack.
> > > > Instead it finds %rsp.  So it never syncs up with unwind_next_frame()
> > > > and shows all remaining addresses as unreliable.
> > > >
> > > >   Call Trace:
> > > >    __kretprobe_trampoline_handler+0xca/0x1a0
> > > >    trampoline_handler+0x3d/0x50
> > > >    kretprobe_trampoline+0x25/0x50
> > > >    ? init_test_probes.cold+0x323/0x387
> > > >    ? init_kprobes+0x144/0x14c
> > > >    ? init_optprobes+0x15/0x15
> > > >    ? do_one_initcall+0x5b/0x300
> > > >    ? lock_is_held_type+0xe8/0x140
> > > >    ? kernel_init_freeable+0x174/0x2cd
> > > >    ? rest_init+0x233/0x233
> > > >    ? kernel_init+0xa/0x11d
> > > >    ? ret_from_fork+0x22/0x30
> > > >
> > > > How about pushing 'kretprobe_trampoline' instead of %rsp for the return
> > > > address placeholder.  That fixes the above test, and removes the need
> > > > for the weird 'state->ip == sp' check:
> > > >
> > > >   Call Trace:
> > > >    __kretprobe_trampoline_handler+0xca/0x150
> > > >    trampoline_handler+0x3d/0x50
> > > >    kretprobe_trampoline+0x29/0x50
> > > >    ? init_test_probes.cold+0x323/0x387
> > > >    elfcorehdr_read+0x10/0x10
> > > >    init_kprobes+0x144/0x14c
> > > >    ? init_optprobes+0x15/0x15
> > > >    do_one_initcall+0x72/0x280
> > > >    kernel_init_freeable+0x174/0x2cd
> > > >    ? rest_init+0x122/0x122
> > > >    kernel_init+0xa/0x10e
> > > >    ret_from_fork+0x22/0x30
> > > >
> > > > Though, init_test_probes.cold() (the real return address) is still
> > > > listed as unreliable.  Maybe we need a call to kretprobe_find_ret_addr()
> > > > in show_trace_log_lvl(), similar to the ftrace_graph_ret_addr() call
> > > > there.
> > >
> > > Thanks for the test!
> > > OK, that could be acceptable. However, push %sp still needed for accessing
> > > stack address from kretprobe handler via pt_regs. (regs->sp must point
> > > the stack address)
> > > Anyway, with int3, it pushes one more entry for emulating call, so I think
> > > it is OK.
> > > Let me update the series!
> > >
> >
> > Hi Misami,
> >
> > Did you get a chance to follow up on this? I checked v5.13-rc1 and it
> > still has this issue. Inability to capture a stack trace from BPF
> > kretprobes is a pretty big problem for some applications, it would be
> > great to get this fixed. Thanks!
>
> OK, let me rework this series.

Great, thank you! Looking forward.

>
> Thank you,
>
>
> >
> >
> > > Thank you!
> > >
> > > >
> > > >
> > > >
> > > > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> > > > index 06f33bfebc50..70188fdd288e 100644
> > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > @@ -766,19 +766,19 @@ asm(
> > > >       "kretprobe_trampoline:\n"
> > > >       /* We don't bother saving the ss register */
> > > >  #ifdef CONFIG_X86_64
> > > > -     "       pushq %rsp\n"
> > > > +     /* Push fake return address to tell the unwinder it's a kretprobe */
> > > > +     "       pushq $kretprobe_trampoline\n"
> > > >       UNWIND_HINT_FUNC
> > > >       "       pushfq\n"
> > > >       SAVE_REGS_STRING
> > > >       "       movq %rsp, %rdi\n"
> > > >       "       call trampoline_handler\n"
> > > > -     /* Replace saved sp with true return address. */
> > > > +     /* Replace the fake return address with the real one. */
> > > >       "       movq %rax, 19*8(%rsp)\n"
> > > >       RESTORE_REGS_STRING
> > > >       "       popfq\n"
> > > >  #else
> > > >       "       pushl %esp\n"
> > > > -     UNWIND_HINT_FUNC
> > > >       "       pushfl\n"
> > > >       SAVE_REGS_STRING
> > > >       "       movl %esp, %eax\n"
> > > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> > > > index 1d1b9388a1b1..1d3de84d2410 100644
> > > > --- a/arch/x86/kernel/unwind_orc.c
> > > > +++ b/arch/x86/kernel/unwind_orc.c
> > > > @@ -548,8 +548,7 @@ bool unwind_next_frame(struct unwind_state *state)
> > > >                * In those cases, find the correct return address from
> > > >                * task->kretprobe_instances list.
> > > >                */
> > > > -             if (state->ip == sp ||
> > > > -                 is_kretprobe_trampoline(state->ip))
> > > > +             if (is_kretprobe_trampoline(state->ip))
> > > >                       state->ip = kretprobe_find_ret_addr(state->task,
> > > >                                                           &state->kr_iter);
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Masami Hiramatsu <mhiramat@kernel.org>
>
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>