mbox series

[v5,00/12] tracing: fprobe: rethook: Use ftrace_regs instead of pt_regs

Message ID 169556254640.146934.5654329452696494756.stgit@devnote2 (mailing list archive)
Headers show
Series tracing: fprobe: rethook: Use ftrace_regs instead of pt_regs | expand

Message

Masami Hiramatsu (Google) Sept. 24, 2023, 1:35 p.m. UTC
Hi,

Here is the 5th version of the series to use ftrace_regs instead of pt_regs
in fprobe.
The previous version is here;

https://lore.kernel.org/all/169280372795.282662.9784422934484459769.stgit@devnote2/

In this version, I decided to use perf's own per-cpu pt_regs array to
copy the required registers[8/12]. Thus this version adds a patch which
adds a new ftrace_fill_perf_regs() API. So the ftrace_partial_regs() will
be used for BPF and ftrace_fill_perf_regs() is used for perf events.

This also adds a fix for RISCV ftrace[1/12]. When kernel is built with
disabling CONFIG_DYNAMIC_FTRACE_WITH_REGS on RISCV, it stores partial
registers on the stack, but it doesn't make it fit to struct ftrace_regs.
But since the 4th argument of ftrace_func_t is ftrace_regs *, it breaks
the ABI. So fixing it to save registers on ftrace_regs (== pt_regs on RISCV).

Another new patch [3/12] is adding a comment about the requirements for
the ftrace_regs.

 - RISCV ftrace fix to save registers on struct ftrace_regs correctly.
 - Document fix for the current fprobe callback prototype.
 - Add a comment of requirement for the ftrace_regs.
 - Simply replace pt_regs in fprobe_entry_handler with ftrace_regs.
 - Expose ftrace_regs even if CONFIG_FUNCTION_TRACER=n.
 - Introduce ftrace_partial_regs(). (This changes ARM64 which needs a custom
   implementation)
 - Introduce ftrace_fill_perf_regs() for perf pt_regs.
 - Replace pt_regs in rethook and fprobe_exit_handler with ftrace_regs. This
   introduce a new HAVE_PT_REGS_TO_FTRACE_REGS_CAST which means ftrace_regs is
   just a wrapper of pt_regs (except for arm64, other architectures do this)
 - Update fprobe-events to use ftrace_regs natively.
 - Update bpf multi-kprobe handler use ftrace_partial_regs().
 - Update document for new fprobe callbacks.
 - Add notes for the $argN and $retval.

This series can be applied against the trace-v6.6-rc2 on linux-trace tree.

This series can also be found below branch.

https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-ftrace-regs

Thank you,

---

Masami Hiramatsu (Google) (12):
      riscv: ftrace: Fix to pass correct ftrace_regs to ftrace_func_t functions
      Documentation: probes: Add a new ret_ip callback parameter
      tracing: Add a comment about the requirements of the ftrace_regs
      fprobe: Use ftrace_regs in fprobe entry handler
      tracing: Expose ftrace_regs regardless of CONFIG_FUNCTION_TRACER
      fprobe: rethook: Use ftrace_regs in fprobe exit handler and rethook
      tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
      tracing: Add ftrace_fill_perf_regs() for perf event
      tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
      bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
      Documentation: probes: Update fprobe document to use ftrace_regs
      Documentation: tracing: Add a note about argument and retval access


 Documentation/trace/fprobe.rst      |   18 +++--
 Documentation/trace/fprobetrace.rst |    8 ++
 Documentation/trace/kprobetrace.rst |    8 ++
 arch/Kconfig                        |    1 
 arch/arm64/include/asm/ftrace.h     |   18 +++++
 arch/loongarch/Kconfig              |    1 
 arch/loongarch/kernel/rethook.c     |   10 +--
 arch/loongarch/kernel/rethook.h     |    4 +
 arch/powerpc/include/asm/ftrace.h   |    7 ++
 arch/riscv/kernel/mcount-dyn.S      |   67 ++++++++----------
 arch/riscv/kernel/probes/rethook.c  |   12 ++-
 arch/riscv/kernel/probes/rethook.h  |    6 +-
 arch/s390/Kconfig                   |    1 
 arch/s390/include/asm/ftrace.h      |    9 ++
 arch/s390/kernel/rethook.c          |   10 ++-
 arch/s390/kernel/rethook.h          |    2 -
 arch/x86/Kconfig                    |    1 
 arch/x86/include/asm/ftrace.h       |    7 ++
 arch/x86/kernel/rethook.c           |   13 ++--
 include/linux/fprobe.h              |    4 +
 include/linux/ftrace.h              |  128 +++++++++++++++++++++++++++++------
 include/linux/rethook.h             |   11 ++-
 kernel/kprobes.c                    |   10 ++-
 kernel/trace/Kconfig                |    9 ++
 kernel/trace/bpf_trace.c            |   14 ++--
 kernel/trace/fprobe.c               |   10 +--
 kernel/trace/rethook.c              |   16 ++--
 kernel/trace/trace_fprobe.c         |   70 +++++++++++--------
 kernel/trace/trace_probe_tmpl.h     |    2 -
 lib/test_fprobe.c                   |   10 +--
 samples/fprobe/fprobe_example.c     |    4 +
 31 files changed, 327 insertions(+), 164 deletions(-)

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>

Comments

Masami Hiramatsu (Google) Sept. 29, 2023, 1:21 a.m. UTC | #1
Hi,

While revising the LPC slides, I realized that this series is actually
slightly going in the wrong direction.

My goal is to unify "the shadow stack and the trampoline" for function exit
tracing (function graph tracer and function return probe event), but not
only unifying the internal interface.

My original plan was to introduce an independent "interface" for the shadow
stack and trampoline, which can switch the backend implementation. This was
important because when I started that, there were kretprobe or function-
graph tracer which hook the function exit.

If kprobe depends on the function-graph tracer only for using the same shadow
stack, that makes kprobe usability down. So I introduced "rethook" for the
interface, which could be a wrapper interface of the shadow stacks and the
trampolines. 
One my misread was the "pt_regs" issue. So this series is for fixing it.

However, when I introduced "fprobe" for function entry and exit probe, this
assumption has changed. If we move from kprobe/kretprobe to fprobe for
function entry/exit probing (= function entry/exit probe event on ftrace),
we don't need to care about the dependency of kprobes, because if "fprobe"
already depends on function tracer. Maybe it can depends on function-graph
tracer too.

Thus, what I need is to make fprobe to use function-graph tracer's shadow
stack and trampoline instead of rethook. This may need to generalize its
interface so that we can share it between fprobe and function-graph tracer,
but we don't need to involve rethook and kretprobes anymore.

Note that this plan still requires changing the fprobe interface to
use ftrace_regs, because some architecture doesn't support pt_regs on
ftrace.

Thus, I will keep the following patches from this series.
(first 3 patches are fixes so to be sent independently)

>  - RISCV ftrace fix to save registers on struct ftrace_regs correctly.
>  - Document fix for the current fprobe callback prototype.
>  - Add a comment of requirement for the ftrace_regs.
>  - Simply replace pt_regs in fprobe_entry_handler with ftrace_regs.
      (this needs to be fixed)

>  - Expose ftrace_regs even if CONFIG_FUNCTION_TRACER=n.
>  - Introduce ftrace_partial_regs(). (This changes ARM64 which needs a custom
>    implementation)
>  - Introduce ftrace_fill_perf_regs() for perf pt_regs.

>  - Update fprobe-events to use ftrace_regs natively.
>  - Update bpf multi-kprobe handler use ftrace_partial_regs().

And need to add patches

 - Introduce a generized function exit hook interface for ftrace.
 - Replace rethook in fprobe with the function exit hook interface.


Thank you,

On Sun, 24 Sep 2023 22:35:47 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> Hi,
> 
> Here is the 5th version of the series to use ftrace_regs instead of pt_regs
> in fprobe.
> The previous version is here;
> 
> https://lore.kernel.org/all/169280372795.282662.9784422934484459769.stgit@devnote2/
> 
> In this version, I decided to use perf's own per-cpu pt_regs array to
> copy the required registers[8/12]. Thus this version adds a patch which
> adds a new ftrace_fill_perf_regs() API. So the ftrace_partial_regs() will
> be used for BPF and ftrace_fill_perf_regs() is used for perf events.
> 
> This also adds a fix for RISCV ftrace[1/12]. When kernel is built with
> disabling CONFIG_DYNAMIC_FTRACE_WITH_REGS on RISCV, it stores partial
> registers on the stack, but it doesn't make it fit to struct ftrace_regs.
> But since the 4th argument of ftrace_func_t is ftrace_regs *, it breaks
> the ABI. So fixing it to save registers on ftrace_regs (== pt_regs on RISCV).
> 
> Another new patch [3/12] is adding a comment about the requirements for
> the ftrace_regs.
> 
>  - RISCV ftrace fix to save registers on struct ftrace_regs correctly.
>  - Document fix for the current fprobe callback prototype.
>  - Add a comment of requirement for the ftrace_regs.
>  - Simply replace pt_regs in fprobe_entry_handler with ftrace_regs.
>  - Expose ftrace_regs even if CONFIG_FUNCTION_TRACER=n.
>  - Introduce ftrace_partial_regs(). (This changes ARM64 which needs a custom
>    implementation)
>  - Introduce ftrace_fill_perf_regs() for perf pt_regs.
>  - Replace pt_regs in rethook and fprobe_exit_handler with ftrace_regs. This
>    introduce a new HAVE_PT_REGS_TO_FTRACE_REGS_CAST which means ftrace_regs is
>    just a wrapper of pt_regs (except for arm64, other architectures do this)
>  - Update fprobe-events to use ftrace_regs natively.
>  - Update bpf multi-kprobe handler use ftrace_partial_regs().
>  - Update document for new fprobe callbacks.
>  - Add notes for the $argN and $retval.
> 
> This series can be applied against the trace-v6.6-rc2 on linux-trace tree.
> 
> This series can also be found below branch.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-ftrace-regs
> 
> Thank you,
> 
> ---
> 
> Masami Hiramatsu (Google) (12):
>       riscv: ftrace: Fix to pass correct ftrace_regs to ftrace_func_t functions
>       Documentation: probes: Add a new ret_ip callback parameter
>       tracing: Add a comment about the requirements of the ftrace_regs
>       fprobe: Use ftrace_regs in fprobe entry handler
>       tracing: Expose ftrace_regs regardless of CONFIG_FUNCTION_TRACER
>       fprobe: rethook: Use ftrace_regs in fprobe exit handler and rethook
>       tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
>       tracing: Add ftrace_fill_perf_regs() for perf event
>       tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
>       bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
>       Documentation: probes: Update fprobe document to use ftrace_regs
>       Documentation: tracing: Add a note about argument and retval access
> 
> 
>  Documentation/trace/fprobe.rst      |   18 +++--
>  Documentation/trace/fprobetrace.rst |    8 ++
>  Documentation/trace/kprobetrace.rst |    8 ++
>  arch/Kconfig                        |    1 
>  arch/arm64/include/asm/ftrace.h     |   18 +++++
>  arch/loongarch/Kconfig              |    1 
>  arch/loongarch/kernel/rethook.c     |   10 +--
>  arch/loongarch/kernel/rethook.h     |    4 +
>  arch/powerpc/include/asm/ftrace.h   |    7 ++
>  arch/riscv/kernel/mcount-dyn.S      |   67 ++++++++----------
>  arch/riscv/kernel/probes/rethook.c  |   12 ++-
>  arch/riscv/kernel/probes/rethook.h  |    6 +-
>  arch/s390/Kconfig                   |    1 
>  arch/s390/include/asm/ftrace.h      |    9 ++
>  arch/s390/kernel/rethook.c          |   10 ++-
>  arch/s390/kernel/rethook.h          |    2 -
>  arch/x86/Kconfig                    |    1 
>  arch/x86/include/asm/ftrace.h       |    7 ++
>  arch/x86/kernel/rethook.c           |   13 ++--
>  include/linux/fprobe.h              |    4 +
>  include/linux/ftrace.h              |  128 +++++++++++++++++++++++++++++------
>  include/linux/rethook.h             |   11 ++-
>  kernel/kprobes.c                    |   10 ++-
>  kernel/trace/Kconfig                |    9 ++
>  kernel/trace/bpf_trace.c            |   14 ++--
>  kernel/trace/fprobe.c               |   10 +--
>  kernel/trace/rethook.c              |   16 ++--
>  kernel/trace/trace_fprobe.c         |   70 +++++++++++--------
>  kernel/trace/trace_probe_tmpl.h     |    2 -
>  lib/test_fprobe.c                   |   10 +--
>  samples/fprobe/fprobe_example.c     |    4 +
>  31 files changed, 327 insertions(+), 164 deletions(-)
> 
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
Alexei Starovoitov Sept. 30, 2023, 12:12 a.m. UTC | #2
On Thu, Sep 28, 2023 at 6:21 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
>
> Thus, what I need is to make fprobe to use function-graph tracer's shadow
> stack and trampoline instead of rethook. This may need to generalize its
> interface so that we can share it between fprobe and function-graph tracer,
> but we don't need to involve rethook and kretprobes anymore.

...

> And need to add patches
>
>  - Introduce a generized function exit hook interface for ftrace.
>  - Replace rethook in fprobe with the function exit hook interface.

you mean that rethook will be removed after that?
Masami Hiramatsu (Google) Sept. 30, 2023, 9:14 a.m. UTC | #3
On Fri, 29 Sep 2023 17:12:07 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Sep 28, 2023 at 6:21 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> >
> > Thus, what I need is to make fprobe to use function-graph tracer's shadow
> > stack and trampoline instead of rethook. This may need to generalize its
> > interface so that we can share it between fprobe and function-graph tracer,
> > but we don't need to involve rethook and kretprobes anymore.
> 
> ...
> 
> > And need to add patches
> >
> >  - Introduce a generized function exit hook interface for ftrace.
> >  - Replace rethook in fprobe with the function exit hook interface.
> 
> you mean that rethook will be removed after that?

No, it is too late. rethook is deeply integrated with kretprobe.
So when we remove the kretprobe, rethook will be removed too.
(fprobe and kretprobe provides similar functionality, so we can
move to fprobe)

Even though, objpool(*) itself might be kept for some other use
cases. As far as I can see, ftrace_ret_stack can not provide a context
local storage between entry -> exit callbacks. (so this feature must
be dropped from fprobe)

(*) https://lore.kernel.org/all/20230905015255.81545-1-wuqiang.matt@bytedance.com/

Thank you,
Masami Hiramatsu (Google) Oct. 2, 2023, 3:39 p.m. UTC | #4
On Sat, 30 Sep 2023 18:14:35 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Fri, 29 Sep 2023 17:12:07 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > On Thu, Sep 28, 2023 at 6:21 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >
> > >
> > > Thus, what I need is to make fprobe to use function-graph tracer's shadow
> > > stack and trampoline instead of rethook. This may need to generalize its
> > > interface so that we can share it between fprobe and function-graph tracer,
> > > but we don't need to involve rethook and kretprobes anymore.
> > 
> > ...
> > 
> > > And need to add patches
> > >
> > >  - Introduce a generized function exit hook interface for ftrace.
> > >  - Replace rethook in fprobe with the function exit hook interface.
> > 
> > you mean that rethook will be removed after that?
> 
> No, it is too late. rethook is deeply integrated with kretprobe.
> So when we remove the kretprobe, rethook will be removed too.
> (fprobe and kretprobe provides similar functionality, so we can
> move to fprobe)
> 
> Even though, objpool(*) itself might be kept for some other use
> cases. As far as I can see, ftrace_ret_stack can not provide a context
> local storage between entry -> exit callbacks. (so this feature must
> be dropped from fprobe)
> 
> (*) https://lore.kernel.org/all/20230905015255.81545-1-wuqiang.matt@bytedance.com/

Oops, I rechecked the performance of objpool with prctl loop by
perf stat -a -I 10000 --interval-count 4 -e syscalls:sys_enter_prctl

And I found that with objpool, fprobe performance is the same
as function-graph!

	noprobe	kretprobe	fprobe	function-graph
T1	10706762	8506402	10475887	11249096
T2	28698960	20972543	22567923	22586848
T4	56634397	41500675	45042714	44932685
T8	114910972	85211522	91560078	90068034
T16	228519966	169212249	181582171	181181211
T32	448049923	330408645	361074536	356221873
T64	623779515	450932779	499909030	495516920

objpool consumes more memory than current freelist (because it is
just a list with counter) but that is limited. Usual use-case
(per-probe node size is 128, # of cpu: 8) one probe will allocate
22KB. (100 probes will need 2.2MB)
This is comparable to function graph ret-stack.

So now I'm reconsidering the strategy. I might better to keep
using rethook, but without ugly pt_regs casts. (e.g. use different
trampoline if rethook user requires ftrace_regs)

Sorry for confusing the direction.

Thank you,