
[v6] arm64: implement KPROBES_ON_FTRACE

Message ID 20191218140622.57bbaca5@xhacker.debian (mailing list archive)
State New, archived

Commit Message

Jisheng Zhang Dec. 18, 2019, 6:21 a.m. UTC
KPROBES_ON_FTRACE avoids much of the overhead with regular kprobes as it
eliminates the need for a trap, as well as the need to emulate or
single-step instructions.

Tested on berlin arm64 platform.

~ # mount -t debugfs debugfs /sys/kernel/debug/
~ # cd /sys/kernel/debug/
/sys/kernel/debug # echo 'p _do_fork' > tracing/kprobe_events

before the patch:

/sys/kernel/debug # cat kprobes/list
ffffff801009fe28  k  _do_fork+0x0    [DISABLED]

after the patch:

/sys/kernel/debug # cat kprobes/list
ffffff801009ff54  k  _do_fork+0x4    [DISABLED][FTRACE]

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---

Changes since v5:
  - rebase v5.5-rc1
  - collect Acked-by and Reviewed-by tags

Changes since v4:
  - correct regs->pc handling: when probing foo, the pre_handler saw foo+0x4
    while the post_handler saw foo+0x8

Changes since v3:
  - move kprobe_lookup_name() and arch_kprobe_on_func_entry to ftrace.c since
    we only want to choose the ftrace entry for KPROBES_ON_FTRACE.
  - only choose ftrace entry if (addr && !offset)

Changes since v2:
  - remove patch1, make it a single cleanup patch
  - remove "This patch" in the change log
  - implement arm64's kprobe_lookup_name() and arch_kprobe_on_func_entry instead
    of patching the common kprobes code

Changes since v1:
  - make "kprobes/x86: use instruction_pointer and instruction_pointer_set"
    patch1 of the series
  - add Masami's ACK to patch1
  - add some description about KPROBES_ON_FTRACE and why we need it on
    arm64
  - correct the log before the patch
  - remove the consolidation patch, make it as TODO
  - only adjust kprobe's addr when KPROBE_FLAG_FTRACE is set
  - if KPROBES_ON_FTRACE, ftrace_call_adjust() the kprobe's addr before
    calling ftrace_location()
  - update the kprobes-on-ftrace/arch-support.txt in doc
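
For reference, a minimal kernel-module equivalent of the kprobe_events test
above -- a sketch only, not part of this patch; with KPROBES_ON_FTRACE the
resolved probe address lands on the ftrace entry (_do_fork+0x4):

#include <linux/module.h>
#include <linux/kprobes.h>

static int pre(struct kprobe *p, struct pt_regs *regs)
{
	/* pc seen here is the probed address, i.e. the ftrace entry */
	pr_info("_do_fork hit, pc=%pS\n", (void *)instruction_pointer(regs));
	return 0;
}

static struct kprobe kp = {
	.symbol_name	= "_do_fork",
	.pre_handler	= pre,
};

static int __init kpf_test_init(void)
{
	int ret = register_kprobe(&kp);

	if (!ret)
		pr_info("probe placed at %pS\n", kp.addr);
	return ret;
}

static void __exit kpf_test_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(kpf_test_init);
module_exit(kpf_test_exit);
MODULE_LICENSE("GPL");
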
 .../debug/kprobes-on-ftrace/arch-support.txt  |  2 +-
 arch/arm64/Kconfig                            |  1 +
 arch/arm64/kernel/probes/Makefile             |  1 +
 arch/arm64/kernel/probes/ftrace.c             | 83 +++++++++++++++++++
 4 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/probes/ftrace.c

Comments

Masami Hiramatsu (Google) Dec. 18, 2019, 1:25 p.m. UTC | #1
On Wed, 18 Dec 2019 06:21:35 +0000
Jisheng Zhang <Jisheng.Zhang@synaptics.com> wrote:

> KPROBES_ON_FTRACE avoids much of the overhead with regular kprobes as it
> eliminates the need for a trap, as well as the need to emulate or
> single-step instructions.
> 
> Tested on berlin arm64 platform.
> 
> ~ # mount -t debugfs debugfs /sys/kernel/debug/
> ~ # cd /sys/kernel/debug/
> /sys/kernel/debug # echo 'p _do_fork' > tracing/kprobe_events
> 
> before the patch:
> 
> /sys/kernel/debug # cat kprobes/list
> ffffff801009fe28  k  _do_fork+0x0    [DISABLED]
> 
> after the patch:
> 
> /sys/kernel/debug # cat kprobes/list
> ffffff801009ff54  k  _do_fork+0x4    [DISABLED][FTRACE]

BTW, it seems this automatically changes the offset without the user's
intention or any warning. How would you manage it if the user passes a
new probe on _do_fork+0x4?

IOW, the question remains: who really wants to probe on _do_fork+"0"?
If kprobes modifies it automatically, no one can do that anymore.
This can happen if the user wants to record the LR or SP value
at the function call for debugging. If kprobes always modifies it,
we lose the way to do that.
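
For example (an illustrative sketch only, not something proposed in the
patch), such a debugging pre_handler on arm64 could record them directly
from pt_regs:

	static int pre(struct kprobe *p, struct pt_regs *regs)
	{
		/* x30 is the link register on arm64 */
		pr_info("LR=%pS SP=0x%llx\n",
			(void *)regs->regs[30], regs->sp);
		return 0;
	}

and whether those values still reflect the call site depends on where
the probe actually lands.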

Could you remove the functions below for the moment?

> +kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
> +{
> +	unsigned long addr = kallsyms_lookup_name(name);
> +
> +	if (addr && !offset) {
> +		unsigned long faddr;
> +		/*
> +		 * with -fpatchable-function-entry=2, the first 4 bytes is the
> +		 * LR saver, then the actual call insn. So ftrace location is
> +		 * always on the first 4 bytes offset.
> +		 */
> +		faddr = ftrace_location_range(addr,
> +					      addr + AARCH64_INSN_SIZE);
> +		if (faddr)
> +			return (kprobe_opcode_t *)faddr;
> +	}
> +	return (kprobe_opcode_t *)addr;
> +}
> +
> +bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> +	return offset <= AARCH64_INSN_SIZE;
> +}


Without this automatic change, we can still change the offset
in the upper layer.

Thank you,
Jisheng Zhang Dec. 23, 2019, 7:47 a.m. UTC | #2
Hi Masami,

On Wed, 18 Dec 2019 22:25:50 +0900 Masami Hiramatsu wrote:


> 
> 
> On Wed, 18 Dec 2019 06:21:35 +0000
> Jisheng Zhang <Jisheng.Zhang@synaptics.com> wrote:
> 
> > KPROBES_ON_FTRACE avoids much of the overhead with regular kprobes as it
> > eliminates the need for a trap, as well as the need to emulate or
> > single-step instructions.
> >
> > Tested on berlin arm64 platform.
> >
> > ~ # mount -t debugfs debugfs /sys/kernel/debug/
> > ~ # cd /sys/kernel/debug/
> > /sys/kernel/debug # echo 'p _do_fork' > tracing/kprobe_events
> >
> > before the patch:
> >
> > /sys/kernel/debug # cat kprobes/list
> > ffffff801009fe28  k  _do_fork+0x0    [DISABLED]
> >
> > after the patch:
> >
> > /sys/kernel/debug # cat kprobes/list
> > ffffff801009ff54  k  _do_fork+0x4    [DISABLED][FTRACE]  
> 
> BTW, it seems this automatically changes the offset without
> user's intention or any warnings. How would you manage if the user
> pass a new probe on _do_fork+0x4?

In the current implementation, the two probes would end up at the same
address, _do_fork+0x4

> 
> IOW, it is still the question who really wants to probe on
> the _do_fork+"0", if kprobes modifies it automatically,
> no one can do that anymore.
> This can be happen if the user want to record LR or SP value
> at the function call for debug. If kprobe always modifies it,
> we will lose the way to do it.

arm64's DYNAMIC_FTRACE_WITH_REGS implementation makes use of the GCC
-fpatchable-function-entry=2 option to insert two nops. When the function
is traced, the first nop is rewritten to the LR saver and the second nop
to "bl <ftrace-entry>"; commit 3b23e4991fb6 ("arm64: implement ftrace
with regs") explains the mechanism.
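
Roughly, as an illustration (not taken from the patch or that commit), a
function built with -fpatchable-function-entry=2 starts out as:

	foo:
		nop			// patchable slot 1
		nop			// patchable slot 2
		...			// function body

and once ftrace is enabled for foo, the two nops are rewritten to:

	foo:
		mov	x9, x30		// LR saver
		bl	<ftrace trampoline>	// the ftrace location, foo+4
		...			// function body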

So on arm64 (in fact, any arch that makes use of -fpatchable-function-entry
will behave similarly), when DYNAMIC_FTRACE_WITH_REGS is enabled, the ftrace
location is always at the first 4-byte offset from the function entry.

I think when a user wants to register a kprobe on _do_fork+0, what they
really want is a kprobe on the patched (by -fpatchable-function-entry)
_do_fork+4.

PS: per my understanding, powerpc's kprobes_on_ftrace also introduces an
extra automatic offset due to its implementation.

> 
> Could you remove below function at this moment?
> 
> > +kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
> > +{
> > +     unsigned long addr = kallsyms_lookup_name(name);
> > +
> > +     if (addr && !offset) {
> > +             unsigned long faddr;
> > +             /*
> > +              * with -fpatchable-function-entry=2, the first 4 bytes is the
> > +              * LR saver, then the actual call insn. So ftrace location is
> > +              * always on the first 4 bytes offset.
> > +              */
> > +             faddr = ftrace_location_range(addr,
> > +                                           addr + AARCH64_INSN_SIZE);
> > +             if (faddr)
> > +                     return (kprobe_opcode_t *)faddr;
> > +     }
> > +     return (kprobe_opcode_t *)addr;
> > +}
> > +
> > +bool arch_kprobe_on_func_entry(unsigned long offset)
> > +{
> > +     return offset <= AARCH64_INSN_SIZE;
> > +}  
> 
> 
> Without this automatic change, we still can change the offset
> in upper layer.

If we remove the two functions, a kprobe on _do_fork can't ride on the
ftrace infrastructure, but a kprobe on _do_fork+4 can. I'm not sure
whether this is reasonable. Every kprobe user on arm64 would need to
remember to pass an extra +4 offset to make use of KPROBES_ON_FTRACE;
could we hide the "+4"?

Thanks
Masami Hiramatsu (Google) Dec. 24, 2019, 10:09 a.m. UTC | #3
Hi Jisheng,

On Mon, 23 Dec 2019 07:47:24 +0000
Jisheng Zhang <Jisheng.Zhang@synaptics.com> wrote:

> Hi Masami,
> 
> On Wed, 18 Dec 2019 22:25:50 +0900 Masami Hiramatsu wrote:
> 
> 
> > 
> > 
> > On Wed, 18 Dec 2019 06:21:35 +0000
> > Jisheng Zhang <Jisheng.Zhang@synaptics.com> wrote:
> > 
> > > KPROBES_ON_FTRACE avoids much of the overhead with regular kprobes as it
> > > eliminates the need for a trap, as well as the need to emulate or
> > > single-step instructions.
> > >
> > > Tested on berlin arm64 platform.
> > >
> > > ~ # mount -t debugfs debugfs /sys/kernel/debug/
> > > ~ # cd /sys/kernel/debug/
> > > /sys/kernel/debug # echo 'p _do_fork' > tracing/kprobe_events
> > >
> > > before the patch:
> > >
> > > /sys/kernel/debug # cat kprobes/list
> > > ffffff801009fe28  k  _do_fork+0x0    [DISABLED]
> > >
> > > after the patch:
> > >
> > > /sys/kernel/debug # cat kprobes/list
> > > ffffff801009ff54  k  _do_fork+0x4    [DISABLED][FTRACE]  
> > 
> > BTW, it seems this automatically changes the offset without
> > user's intention or any warnings. How would you manage if the user
> > pass a new probe on _do_fork+0x4?
> 
> In current implementation, two probes at the same address _do_fork+0x4

OK, that is my point.

> > IOW, it is still the question who really wants to probe on
> > the _do_fork+"0", if kprobes modifies it automatically,
> > no one can do that anymore.
> > This can be happen if the user want to record LR or SP value
> > at the function call for debug. If kprobe always modifies it,
> > we will lose the way to do it.
> 
> arm64's DYNAMIC_FTRACE_WITH_REGS implementation makes use of GCC
> -fpatchable-function-entry=2 option to insert two nops. When the function
> is traced, the first nop will be modified to the LR saver, then the
> second nop to "bl <ftrace-entry>", commit 3b23e4991fb6("
> arm64: implement ftrace with regs") explains the mechanism.

So both of the instructions at func+0 and func+4 are replaced.

> So on arm64(in fact any arch makes use of -fpatchable-function-entry will
> behave similarly), when DYNAMIC_FTRACE_WITH_REGS is enabled, the ftrace location
> is always on the first 4 bytes offset.
> 
> I think when users want to register a kprobe on _do_fork+0, what he really want
> is to kprobe on the patched(by -fpatchable-function-entry)_do_fork+4

OK, in this case, kprobes should treat the first 2 instructions as a
single virtual instruction. This means:

 - kprobes can probe func+0, but not func+4, if the function is ftraced
   (-EILSEQ must be returned)
 - both debugfs/kprobes/list and tracefs/kprobe_events should show the
   probed address as func+0, not func+4.

Then, users can use kprobes as if there were one big (8-byte) instruction
at the entry of the function. Since probing on func+4 is rejected and
the call-site LR/SP is restored in ftrace, there should be no
contradiction. It should work as if we put a breakpoint on func+0.
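
A rough sketch of the rejection half of that model (illustrative only --
the helper name and placement are not from this patch, and redirecting
func+0 onto the ftrace site while still reporting +0 needs extra plumbing
in the generic kprobes code that is not shown here):

	#include <linux/ftrace.h>
	#include <linux/kprobes.h>

	static int reject_probe_inside_patched_entry(struct kprobe *p)
	{
		unsigned long addr = (unsigned long)p->addr;

		/* func+4 is the middle of the virtual 8-byte entry insn */
		if (p->offset == AARCH64_INSN_SIZE && ftrace_location(addr))
			return -EILSEQ;

		return 0;
	}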

> 
> PS: per my understanding, powerpc's kprobes_on_ftrace also introduces an
> extra automatic offset due to its implementation.

Uh, that is also ugly.... must be fixed.


> > 
> > Could you remove below function at this moment?
> > 
> > > +kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
> > > +{
> > > +     unsigned long addr = kallsyms_lookup_name(name);
> > > +
> > > +     if (addr && !offset) {
> > > +             unsigned long faddr;
> > > +             /*
> > > +              * with -fpatchable-function-entry=2, the first 4 bytes is the
> > > +              * LR saver, then the actual call insn. So ftrace location is
> > > +              * always on the first 4 bytes offset.
> > > +              */
> > > +             faddr = ftrace_location_range(addr,
> > > +                                           addr + AARCH64_INSN_SIZE);
> > > +             if (faddr)
> > > +                     return (kprobe_opcode_t *)faddr;
> > > +     }
> > > +     return (kprobe_opcode_t *)addr;
> > > +}
> > > +
> > > +bool arch_kprobe_on_func_entry(unsigned long offset)
> > > +{
> > > +     return offset <= AARCH64_INSN_SIZE;
> > > +}  
> > 
> > 
> > Without this automatic change, we still can change the offset
> > in upper layer.
> 
> If remove the two functions, kprobe on  _do_fork can't ride on
> ftrace infrastructure, but kprobe on _do_fork+4 can. I'm not sure
> whether this is reasonable. Every kprobe users on arm64 will need to
> remember to pass an extra offset +4 to make use of kprobe_on_ftrace, could
> we hide the "+4"?

Yes, OK, as I said above, please hide the +4. We will see the virtual
"call" instruction (= "mov x9, lr; bl <addr>") at the entry of an ftraced
function. :)

Thank you,
Jisheng Zhang Dec. 25, 2019, 9:01 a.m. UTC | #4
Hi,

On Tue, 24 Dec 2019 19:09:16 +0900 Masami Hiramatsu wrote:

> 
> Hi Jisheng,
> 
> On Mon, 23 Dec 2019 07:47:24 +0000
> Jisheng Zhang <Jisheng.Zhang@synaptics.com> wrote:
> 
> > Hi Masami,
> >
> > On Wed, 18 Dec 2019 22:25:50 +0900 Masami Hiramatsu wrote:
> >
> >  
> > >
> > >
> > > On Wed, 18 Dec 2019 06:21:35 +0000
> > > Jisheng Zhang <Jisheng.Zhang@synaptics.com> wrote:
> > >  
> > > > KPROBES_ON_FTRACE avoids much of the overhead with regular kprobes as it
> > > > eliminates the need for a trap, as well as the need to emulate or
> > > > single-step instructions.
> > > >
> > > > Tested on berlin arm64 platform.
> > > >
> > > > ~ # mount -t debugfs debugfs /sys/kernel/debug/
> > > > ~ # cd /sys/kernel/debug/
> > > > /sys/kernel/debug # echo 'p _do_fork' > tracing/kprobe_events
> > > >
> > > > before the patch:
> > > >
> > > > /sys/kernel/debug # cat kprobes/list
> > > > ffffff801009fe28  k  _do_fork+0x0    [DISABLED]
> > > >
> > > > after the patch:
> > > >
> > > > /sys/kernel/debug # cat kprobes/list
> > > > ffffff801009ff54  k  _do_fork+0x4    [DISABLED][FTRACE]  
> > >
> > > BTW, it seems this automatically changes the offset without
> > > user's intention or any warnings. How would you manage if the user
> > > pass a new probe on _do_fork+0x4?  
> >
> > In current implementation, two probes at the same address _do_fork+0x4  
> 
> OK, that is my point.
> 
> > > IOW, it is still the question who really wants to probe on
> > > the _do_fork+"0", if kprobes modifies it automatically,
> > > no one can do that anymore.
> > > This can be happen if the user want to record LR or SP value
> > > at the function call for debug. If kprobe always modifies it,
> > > we will lose the way to do it.  
> >
> > arm64's DYNAMIC_FTRACE_WITH_REGS implementation makes use of GCC
> > -fpatchable-function-entry=2 option to insert two nops. When the function
> > is traced, the first nop will be modified to the LR saver, then the
> > second nop to "bl <ftrace-entry>", commit 3b23e4991fb6("
> > arm64: implement ftrace with regs") explains the mechanism.  
> 
> So both of the instruction at func+0 and func+4 are replaced.
> 
> > So on arm64(in fact any arch makes use of -fpatchable-function-entry will
> > behave similarly), when DYNAMIC_FTRACE_WITH_REGS is enabled, the ftrace location
> > is always on the first 4 bytes offset.
> >
> > I think when users want to register a kprobe on _do_fork+0, what he really want
> > is to kprobe on the patched(by -fpatchable-function-entry)_do_fork+4  
> 
> OK, in this case, kprobe should treat the first 2 instructions as a
> single virtual instruction. This means,
> 
>  - kprobes can probe func+0, but not func+4 if the function is ftraced.
>     (-EILSEQ must be returned)
>  - both debugfs/kprobes/list and tracefs/kprobe_events should show the
>    probed address as func+0. Not func+4.
> 
> Then, user can use kprobes as if there is one big (8-byte) instruction
> at the entry of the function. Since probing on func+4 is rejected and
> the call-site LR/SP is restored in ftrace, there should be no
> contradiction. It should work as if we put a breakpoint on the func + 0.

In https://lkml.org/lkml/2019/6/18/648, Naveen tried to allow probing on
any ftrace address. Is that acceptable on arm64 as well? I will post patches
for this purpose.

Thanks for your review

Patch

diff --git a/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt b/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt
index 4fae0464ddff..f9dd9dd91e0c 100644
--- a/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt
+++ b/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt
@@ -9,7 +9,7 @@ 
     |       alpha: | TODO |
     |         arc: | TODO |
     |         arm: | TODO |
-    |       arm64: | TODO |
+    |       arm64: |  ok  |
     |         c6x: | TODO |
     |        csky: | TODO |
     |       h8300: | TODO |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1b4476ddb83..92b9882889ac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -166,6 +166,7 @@  config ARM64
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_KPROBES
+	select HAVE_KPROBES_ON_FTRACE
 	select HAVE_KRETPROBES
 	select HAVE_GENERIC_VDSO
 	select IOMMU_DMA if IOMMU_SUPPORT
diff --git a/arch/arm64/kernel/probes/Makefile b/arch/arm64/kernel/probes/Makefile
index 8e4be92e25b1..4020cfc66564 100644
--- a/arch/arm64/kernel/probes/Makefile
+++ b/arch/arm64/kernel/probes/Makefile
@@ -4,3 +4,4 @@  obj-$(CONFIG_KPROBES)		+= kprobes.o decode-insn.o	\
 				   simulate-insn.o
 obj-$(CONFIG_UPROBES)		+= uprobes.o decode-insn.o	\
 				   simulate-insn.o
+obj-$(CONFIG_KPROBES_ON_FTRACE)	+= ftrace.o
diff --git a/arch/arm64/kernel/probes/ftrace.c b/arch/arm64/kernel/probes/ftrace.c
new file mode 100644
index 000000000000..9f80905f02fa
--- /dev/null
+++ b/arch/arm64/kernel/probes/ftrace.c
@@ -0,0 +1,83 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Dynamic Ftrace based Kprobes Optimization
+ *
+ * Copyright (C) Hitachi Ltd., 2012
+ * Copyright (C) 2019 Jisheng Zhang <jszhang@kernel.org>
+ *		      Synaptics Incorporated
+ */
+
+#include <linux/kprobes.h>
+
+/* Ftrace callback handler for kprobes -- called under preempt disabled */
+void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
+			   struct ftrace_ops *ops, struct pt_regs *regs)
+{
+	struct kprobe *p;
+	struct kprobe_ctlblk *kcb;
+
+	/* Preempt is disabled by ftrace */
+	p = get_kprobe((kprobe_opcode_t *)ip);
+	if (unlikely(!p) || kprobe_disabled(p))
+		return;
+
+	kcb = get_kprobe_ctlblk();
+	if (kprobe_running()) {
+		kprobes_inc_nmissed_count(p);
+	} else {
+		unsigned long orig_ip = instruction_pointer(regs);
+
+		instruction_pointer_set(regs, ip);
+		__this_cpu_write(current_kprobe, p);
+		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+		if (!p->pre_handler || !p->pre_handler(p, regs)) {
+			/*
+			 * Emulate singlestep (and also recover regs->pc)
+			 * as if there is a nop
+			 */
+			instruction_pointer_set(regs,
+				(unsigned long)p->addr + MCOUNT_INSN_SIZE);
+			if (unlikely(p->post_handler)) {
+				kcb->kprobe_status = KPROBE_HIT_SSDONE;
+				p->post_handler(p, regs, 0);
+			}
+			instruction_pointer_set(regs, orig_ip);
+		}
+		/*
+		 * If pre_handler returns !0, it changes regs->pc. We have to
+		 * skip emulating post_handler.
+		 */
+		__this_cpu_write(current_kprobe, NULL);
+	}
+}
+NOKPROBE_SYMBOL(kprobe_ftrace_handler);
+
+kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
+{
+	unsigned long addr = kallsyms_lookup_name(name);
+
+	if (addr && !offset) {
+		unsigned long faddr;
+		/*
+		 * with -fpatchable-function-entry=2, the first 4 bytes is the
+		 * LR saver, then the actual call insn. So ftrace location is
+		 * always on the first 4 bytes offset.
+		 */
+		faddr = ftrace_location_range(addr,
+					      addr + AARCH64_INSN_SIZE);
+		if (faddr)
+			return (kprobe_opcode_t *)faddr;
+	}
+	return (kprobe_opcode_t *)addr;
+}
+
+bool arch_kprobe_on_func_entry(unsigned long offset)
+{
+	return offset <= AARCH64_INSN_SIZE;
+}
+
+int arch_prepare_kprobe_ftrace(struct kprobe *p)
+{
+	p->ainsn.api.insn = NULL;
+	return 0;
+}