
[RFC] ftrace/x86: Emulate call function while updating in breakpoint handler

Message ID 20190430134913.4e29ce72@gandalf.local.home (mailing list archive)
State New
Series [RFC] ftrace/x86: Emulate call function while updating in breakpoint handler

Commit Message

Steven Rostedt April 30, 2019, 5:49 p.m. UTC
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>


Nicolai Stange discovered[1] that if live kernel patching is enabled, and the
function tracer starts tracing a function that is also patched, then the
conversion of the fentry call site, from calling the live kernel patch
trampoline to calling the iterator trampoline, opens a slight window where
the call site calls nothing at all.

Live kernel patching depends on ftrace always calling its code, as that is
what redirects execution away from the function being patched. This small
window allows the old buggy function to be called, which can cause
undesirable results.

Nicolai submitted new patches[2] but these were controversial. As this is
similar to the static call emulation issues that came up a while ago[3],
Linus suggested using per CPU data along with special trampolines[4] to emulate
the calls.

Linus's solution was aimed at text poke (which is mostly what the static_call
code did), but as ftrace has its own update mechanism, it needs to do its
own thing.

Having ftrace use its own per CPU data and its own set of specialized
trampolines solves the issue of missed calls that live kernel patching
suffers from.
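
To illustrate the window, here is a rough sketch of the breakpoint-based
call-site rewrite (write_insn() and sync_cores() are hypothetical stand-ins
for the kernel's text-patching and IPI machinery, not real functions):

/*
 * Rough sketch, not the kernel's exact code. A 5-byte call is rewritten
 * in three steps with a breakpoint armed over it the whole time.
 */
extern void write_insn(unsigned char *addr, const unsigned char *insn, int len);
extern void sync_cores(void);	/* serialize all CPUs, e.g. via IPI */

static void rewrite_call_site(unsigned char *site, const unsigned char *new_call)
{
	const unsigned char int3 = 0xcc;

	write_insn(site, &int3, 1);		/* 1) arm the breakpoint      */
	sync_cores();
	write_insn(site + 1, new_call + 1, 4);	/* 2) patch the 4-byte target */
	sync_cores();
	write_insn(site, new_call, 1);		/* 3) restore the first byte  */
	sync_cores();

	/*
	 * Any CPU executing the site during steps 1-3 traps into
	 * ftrace_int3_handler(), which used to simply skip the call.
	 * That skipped call is the window live patching cannot tolerate.
	 */
}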

[1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de
[2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de
[3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.1541711457.git.jpoimboe@redhat.com
[4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=A@mail.gmail.com

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 arch/x86/kernel/ftrace.c | 146 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 143 insertions(+), 3 deletions(-)

Comments

Linus Torvalds April 30, 2019, 6:33 p.m. UTC | #1
On Tue, Apr 30, 2019 at 10:49 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> +
> +asm(
> +       ".text\n"
> +
> +       /* Trampoline for function update with interrupts enabled */
> +       ".global ftrace_emulate_call_irqoff\n"
> +       ".type ftrace_emulate_call_irqoff, @function\n"
> +       "ftrace_emulate_call_irqoff:\n\t"
> +               "push %gs:ftrace_bp_call_return\n\t"

Well, as mentioned in my original suggestion, this won't work on
32-bit, or on UP. They have different models for per-cpu data (32-bit
uses %fs, and UP doesn't use a segment override at all).

Maybe we just don't care about UP at all for this code, of course.

And maybe we can make the decision to also make 32-bit just not use
this either - so maybe the code is ok per se, just needs to make sure
it never triggers for the cases that it's not written for..

> +       "ftrace_emulate_call_update_irqoff:\n\t"
> +               "push %gs:ftrace_bp_call_return\n\t"
> +               "sti\n\t"
> +               "jmp *ftrace_update_func_call\n"

.. and this should then use the "push push sti ret" model instead.

Plus get updated for objtool complaints.
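
(For reference, a minimal sketch of that "push push sti ret" model, reusing
the RFC's %gs: per-cpu slot; the indirect target is captured by the second
push, and the STI shadow keeps interrupts held off until the ret lands:)

ftrace_emulate_call_update_irqoff:
	push	%gs:ftrace_bp_call_return	# emulated return address
	push	ftrace_update_func_call		# capture the call target now
	sti					# STI shadow covers the ret
	ret					# jump to the captured target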

Anyway, since Andy really likes the entry code change, can we have
that patch in parallel and judge the difference that way? Iirc, that
was x86-64 specific too.

                           Linus
Steven Rostedt April 30, 2019, 7 p.m. UTC | #2
On Tue, 30 Apr 2019 11:33:21 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Apr 30, 2019 at 10:49 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > +
> > +asm(
> > +       ".text\n"
> > +
> > +       /* Trampoline for function update with interrupts enabled */
> > +       ".global ftrace_emulate_call_irqoff\n"
> > +       ".type ftrace_emulate_call_irqoff, @function\n"
> > +       "ftrace_emulate_call_irqoff:\n\t"
> > +               "push %gs:ftrace_bp_call_return\n\t"  
> 
> Well, as mentioned in my original suggestion, this won't work on
> 32-bit, or on UP. They have different models for per-cpu data (32-bit
> uses %fs, and UP doesn't use a segment override at all).

Ah, yeah, I forgot about 32-bit. I could easily make this use fs as
well, and for UP, just use a static variable.
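
Roughly like this (a hypothetical sketch, not part of the posted patch;
BP_PERCPU is a made-up helper name):

/* Pick the per-cpu addressing model for the trampoline asm at build time. */
#ifdef CONFIG_X86_64
# define BP_PERCPU(sym)	"%gs:" sym	/* 64-bit SMP: %gs-based per-cpu */
#elif defined(CONFIG_SMP)
# define BP_PERCPU(sym)	"%fs:" sym	/* 32-bit SMP: %fs-based per-cpu */
#else
# define BP_PERCPU(sym)	sym		/* UP: plain static variable     */
#endif

	/* e.g. in the trampoline strings: */
	"push " BP_PERCPU("ftrace_bp_call_return") "\n\t"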

> 
> Maybe we just don't care about UP at all for this code, of course.
> 
> And maybe we can make the decision to also make 32-bit just not use
> this either - so maybe the code is ok per se, just needs to make sure
> it never triggers for the cases that it's not written for..
> 
> > +       "ftrace_emulate_call_update_irqoff:\n\t"
> > +               "push %gs:ftrace_bp_call_return\n\t"
> > +               "sti\n\t"
> > +               "jmp *ftrace_update_func_call\n"  
> 
> .. and this should then use the "push push sti ret" model instead.
> 
> Plus get updated for objtool complaints.

Yeah, I see that now. Somehow it disappeared when I looked for it after
making some other changes. I can update it.

> 
> Anyway, since Andy really likes the entry code change, can we have
> that patch in parallel and judge the difference that way? Iirc, that
> was x86-64 specific too.

Note, I don't think live kernel patching supports 32 bit anyway, so
that may not be an issue.

Josh,

When you come back to the office, can you look into that method?

-- Steve
Steven Rostedt April 30, 2019, 9:08 p.m. UTC | #3
On Tue, 30 Apr 2019 11:33:21 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > +       "ftrace_emulate_call_update_irqoff:\n\t"
> > +               "push %gs:ftrace_bp_call_return\n\t"
> > +               "sti\n\t"
> > +               "jmp *ftrace_update_func_call\n"  
> 
> .. and this should then use the "push push sti ret" model instead.
> 
> Plus get updated for objtool complaints.

And unfortunately, this blows up on lockdep. Lockdep notices that the
return from the breakpoint handler has interrupts enabled, but does not
mark them enabled in its shadow irqs-disabled state. We then enable
them in the trampoline without telling lockdep, and trigger something
like this:

------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 2 PID: 0 at kernel/time/tick-sched.c:979 tick_nohz_idle_enter+0x44/0x8c
Modules linked in:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.1.0-rc3-test+ #123
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: f0 05 00 00 00 75 26 83 b8 c4 05 00 00 00 75 1d 80 3d 5f 0f 43 c1 00 75 14 68 72 74 16 c1 c6 05 5f 0f 43 c1 01 e8 33 d7 f8 ff <0f> 0b 58 fa e8 4e 2c 04 00 bb e0 36 6b c1 64 03 1d 28 81 56 c1 8b
EAX: 0000001c EBX: ee769f84 ECX: 00000000 EDX: 00000006
ESI: 00000000 EDI: 00000002 EBP: ee769f50 ESP: ee769f48
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210292
CR0: 80050033 CR2: 00000000 CR3: 016c4000 CR4: 001406f0
Call Trace:
 do_idle+0x2a/0x1fc
 cpu_startup_entry+0x1e/0x20
 start_secondary+0x1d3/0x1ec
 startup_32_smp+0x164/0x168


I have to fool lockdep with the following:

		if (regs->flags & X86_EFLAGS_IF) {
			regs->flags &= ~X86_EFLAGS_IF;
			regs->ip = (unsigned long) ftrace_emulate_call_irqoff;
			/* Tell lockdep here we are enabling interrupts */
			trace_hardirqs_on();
		} else {
			regs->ip = (unsigned long) ftrace_emulate_call_irqon;
		}

-- Steve
Peter Zijlstra May 1, 2019, 1:11 p.m. UTC | #4
On Tue, Apr 30, 2019 at 11:33:21AM -0700, Linus Torvalds wrote:
> Anyway, since Andy really likes the entry code change, can we have
> that patch in parallel and judge the difference that way? Iirc, that
> was x86-64 specific too.

Here goes, compile tested only...

It obviously needs a self-test, but that shouldn't be too hard to
arrange.

---
 arch/x86/entry/entry_32.S            |  7 +++++++
 arch/x86/entry/entry_64.S            | 14 ++++++++++++--
 arch/x86/include/asm/text-patching.h | 20 ++++++++++++++++++++
 arch/x86/kernel/ftrace.c             | 24 +++++++++++++++++++-----
 4 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7b23431be5cb..d246302085a3 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1479,6 +1479,13 @@ ENTRY(int3)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
 
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jnz	.Lfrom_usermode_no_gap
+	.rept 6
+	pushl	5*4(%esp)
+	.endr
+.Lfrom_usermode_no_gap:
+
 	SAVE_ALL switch_stacks=1
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 20e45d9b4e15..268cd9affe04 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -878,7 +878,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
  * @paranoid == 2 is special: the stub will never switch stacks.  This is for
  * #DF: if the thread stack is somehow unusable, we'll still get a useful OOPS.
  */
-.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0
+.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0 create_gap=0
 ENTRY(\sym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 
@@ -898,6 +898,16 @@ ENTRY(\sym)
 	jnz	.Lfrom_usermode_switch_stack_\@
 	.endif
 
+	.if \create_gap == 1
+	testb	$3, CS-ORIG_RAX(%rsp)
+	jnz	.Lfrom_usermode_no_gap_\@
+	.rept 6
+	pushq	5*8(%rsp)
+	.endr
+	UNWIND_HINT_IRET_REGS offset=8
+.Lfrom_usermode_no_gap_\@:
+	.endif
+
 	.if \paranoid
 	call	paranoid_entry
 	.else
@@ -1129,7 +1139,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 #endif /* CONFIG_HYPERV */
 
 idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
-idtentry int3			do_int3			has_error_code=0
+idtentry int3			do_int3			has_error_code=0	create_gap=1
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index e85ff65c43c3..ba275b6292db 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -39,4 +39,24 @@ extern int poke_int3_handler(struct pt_regs *regs);
 extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
 extern int after_bootmem;
 
+static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
+{
+	regs->sp -= sizeof(unsigned long);
+	*(unsigned long *)regs->sp = val;
+}
+
+static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
+{
+	regs->ip = ip;
+}
+
+#define INT3_INSN_SIZE 1
+#define CALL_INSN_SIZE 5
+
+static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func)
+{
+	int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
+	int3_emulate_jmp(regs, func);
+}
+
 #endif /* _ASM_X86_TEXT_PATCHING_H */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ef49517f6bb2..90d319687d7e 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -29,6 +29,7 @@
 #include <asm/kprobes.h>
 #include <asm/ftrace.h>
 #include <asm/nops.h>
+#include <asm/text-patching.h>
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
@@ -231,6 +232,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 }
 
 static unsigned long ftrace_update_func;
+static unsigned long ftrace_update_func_call;
 
 static int update_ftrace_func(unsigned long ip, void *new)
 {
@@ -259,6 +261,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	unsigned char *new;
 	int ret;
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
 
@@ -295,12 +299,19 @@ int ftrace_int3_handler(struct pt_regs *regs)
 		return 0;
 
 	ip = regs->ip - 1;
-	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
-		return 0;
-
-	regs->ip += MCOUNT_INSN_SIZE - 1;
+	if (ftrace_location(ip)) {
+		int3_emulate_call(regs, ftrace_update_func_call);
+		return 1;
+	} else if (is_ftrace_caller(ip)) {
+		if (!ftrace_update_func_call) {
+			int3_emulate_jmp(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
+			return 1;
+		}
+		int3_emulate_call(regs, ftrace_update_func_call);
+		return 1;
+	}
 
-	return 1;
+	return 0;
 }
 NOKPROBE_SYMBOL(ftrace_int3_handler);
 
@@ -859,6 +870,8 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 
 	func = ftrace_ops_get_func(ops);
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	/* Do a safe modify in case the trampoline is executing */
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
@@ -960,6 +973,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func)
 {
 	unsigned char *new;
 
+	ftrace_update_func_call = 0UL;
 	new = ftrace_jmp_replace(ip, (unsigned long)func);
 
 	return update_ftrace_func(ip, new);
Steven Rostedt May 1, 2019, 6:58 p.m. UTC | #5
On Wed, 1 May 2019 15:11:17 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Apr 30, 2019 at 11:33:21AM -0700, Linus Torvalds wrote:
> > Anyway, since Andy really likes the entry code change, can we have
> > that patch in parallel and judge the difference that way? Iirc, that
> > was x86-64 specific too.  
> 
> Here goes, compile tested only...
> 
> It obviously needs a self-test, but that shouldn't be too hard to
> arrange.
> 

I was able to get it applied (with slight tweaking), but it then
crashed. That was due to incorrect updates in the
ftrace_int3_handler().

> ---
>  arch/x86/entry/entry_32.S            |  7 +++++++
>  arch/x86/entry/entry_64.S            | 14 ++++++++++++--
>  arch/x86/include/asm/text-patching.h | 20 ++++++++++++++++++++
>  arch/x86/kernel/ftrace.c             | 24 +++++++++++++++++++-----
>  4 files changed, 58 insertions(+), 7 deletions(-)


>  #endif /* _ASM_X86_TEXT_PATCHING_H */
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index ef49517f6bb2..90d319687d7e 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -29,6 +29,7 @@
>  #include <asm/kprobes.h>
>  #include <asm/ftrace.h>
>  #include <asm/nops.h>
> +#include <asm/text-patching.h>
>  
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  
> @@ -231,6 +232,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>  }
>  
>  static unsigned long ftrace_update_func;
> +static unsigned long ftrace_update_func_call;
>  
>  static int update_ftrace_func(unsigned long ip, void *new)
>  {
> @@ -259,6 +261,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
>  	unsigned char *new;
>  	int ret;
>  
> +	ftrace_update_func_call = (unsigned long)func;
> +
>  	new = ftrace_call_replace(ip, (unsigned long)func);
>  	ret = update_ftrace_func(ip, new);
>  
> @@ -295,12 +299,19 @@ int ftrace_int3_handler(struct pt_regs *regs)
>  		return 0;
>  
>  	ip = regs->ip - 1;
> -	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
> -		return 0;
> -
> -	regs->ip += MCOUNT_INSN_SIZE - 1;
> +	if (ftrace_location(ip)) {
> +		int3_emulate_call(regs, ftrace_update_func_call);

Should be:

		int3_emulate_call(regs, (unsigned long)ftrace_regs_caller);

> +		return 1;
> +	} else if (is_ftrace_caller(ip)) {
> +		if (!ftrace_update_func_call) {
> +			int3_emulate_jmp(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);

I see what you did here, but I think:

			int3_emulate_jmp(regs, ip + CALL_INSN_SIZE);

looks better. But that said, we could in the beginning do:

	ip = regs->ip - INT3_INSN_SIZE;

instead of

	ip = regs->ip - 1;

I made these updates and posted them to Linus.
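
With those changes folded in, the handler comes out roughly like this (a
sketch of the reposted version, not a verbatim copy):

int ftrace_int3_handler(struct pt_regs *regs)
{
	unsigned long ip;

	if (WARN_ON_ONCE(!regs || user_mode(regs)))
		return 0;

	ip = regs->ip - INT3_INSN_SIZE;

	if (ftrace_location(ip)) {
		/* A call site is being converted: emulate the call */
		int3_emulate_call(regs, (unsigned long)ftrace_regs_caller);
		return 1;
	} else if (is_ftrace_caller(ip)) {
		if (!ftrace_update_func_call) {
			/* Converting to/from a jmp: just skip over it */
			int3_emulate_jmp(regs, ip + CALL_INSN_SIZE);
			return 1;
		}
		int3_emulate_call(regs, ftrace_update_func_call);
		return 1;
	}

	return 0;
}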

-- Steve


> +			return 1;
> +		}
> +		int3_emulate_call(regs, ftrace_update_func_call);
> +		return 1;
> +	}
>  
> -	return 1;
> +	return 0;
>  }
>  NOKPROBE_SYMBOL(ftrace_int3_handler);
>  
> @@ -859,6 +870,8 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
> 
>  	func = ftrace_ops_get_func(ops);
>  
> +	ftrace_update_func_call = (unsigned long)func;
> +
>  	/* Do a safe modify in case the trampoline is executing */
>  	new = ftrace_call_replace(ip, (unsigned long)func);
>  	ret = update_ftrace_func(ip, new);
> @@ -960,6 +973,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func)
>  {
>  	unsigned char *new;
>  
> +	ftrace_update_func_call = 0UL;
>  	new = ftrace_jmp_replace(ip, (unsigned long)func);
>  
>  	return update_ftrace_func(ip, new);
Peter Zijlstra May 1, 2019, 7:03 p.m. UTC | #6
On Wed, May 01, 2019 at 02:58:24PM -0400, Steven Rostedt wrote:
> > +	if (ftrace_location(ip)) {
> > +		int3_emulate_call(regs, ftrace_update_func_call);
> 
> Should be:
> 
> 		int3_emulate_call(regs, (unsigned long)ftrace_regs_caller);

Ah, I lost the plot a little there.

> > +		return 1;
> > +	} else if (is_ftrace_caller(ip)) {
> > +		if (!ftrace_update_func_call) {
> > +			int3_emulate_jmp(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
> 
> I see what you did here, but I think:
> 
> 			int3_emulate_jmp(regs, ip + CALL_INSN_SIZE);
> 
> looks better. But that said, we could in the beginning do:
> 
> 	ip = regs->ip - INT3_INSN_SIZE;
> 
> instead of
> 
> 	ip = regs->ip - 1;
> 
> I made these updates and posted them to Linus.

I was actually considering:

static inline void int3_emulate_nop(struct pt_regs *regs, unsigned long size)
{
	int3_emulate_jmp(regs, regs->ip - INT3_INSN_SIZE + size);
}

And then the above becomes:

	int3_emulate_nop(regs, CALL_INSN_SIZE);

Hmm?
Linus Torvalds May 1, 2019, 7:03 p.m. UTC | #7
On Wed, May 1, 2019 at 6:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Here goes, compile tested only...

Ugh, two different threads. This has the same bug (same source) as the
one Steven posted:

> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1479,6 +1479,13 @@ ENTRY(int3)
>         ASM_CLAC
>         pushl   $-1                             # mark this as an int
>
> +       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> +       jnz     .Lfrom_usermode_no_gap
> +       .rept 6
> +       pushl   5*4(%esp)
> +       .endr
> +.Lfrom_usermode_no_gap:

This will corrupt things horribly if you still use vm86 mode. Checking
CS RPL is simply not correct.
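
(For reference, the 32-bit user_mode() in arch/x86/include/asm/ptrace.h,
paraphrased below, shows what a correct check has to account for:)

/* A "from usermode?" test must also treat VM86 as usermode,
 * which a bare CS RPL check misses: */
static inline int user_mode(struct pt_regs *regs)
{
	return ((regs->cs & SEGMENT_RPL_MASK) |
		(regs->flags & X86_VM_MASK)) >= USER_RPL;
}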

                 Linus
Peter Zijlstra May 1, 2019, 7:13 p.m. UTC | #8
On Wed, May 01, 2019 at 12:03:52PM -0700, Linus Torvalds wrote:
> On Wed, May 1, 2019 at 6:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Here goes, compile tested only...
> 
> Ugh, two different threads. This has the same bug (same source) as the
> one Steven posted:

This is what Steve started from; let's continue in the other thread.

> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -1479,6 +1479,13 @@ ENTRY(int3)
> >         ASM_CLAC
> >         pushl   $-1                             # mark this as an int
> >
> > +       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> > +       jnz     .Lfrom_usermode_no_gap
> > +       .rept 6
> > +       pushl   5*4(%esp)
> > +       .endr
> > +.Lfrom_usermode_no_gap:
> 
> This will corrupt things horribly if you still use vm86 mode. Checking
> CS RPL is simply not correct.

I'll go fix; I never really understood that vm86 crud and I cobbled this
32bit thing together based on the 64bit version (that Josh did a while
ago).
Steven Rostedt May 1, 2019, 7:13 p.m. UTC | #9
On Wed, 1 May 2019 12:03:52 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, May 1, 2019 at 6:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Here goes, compile tested only...  
> 
> Ugh, two different threads. This has the same bug (same source) as the
> one Steven posted:
> 
> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -1479,6 +1479,13 @@ ENTRY(int3)
> >         ASM_CLAC
> >         pushl   $-1                             # mark this as an int
> >
> > +       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> > +       jnz     .Lfrom_usermode_no_gap
> > +       .rept 6
> > +       pushl   5*4(%esp)
> > +       .endr
> > +.Lfrom_usermode_no_gap:  
> 
> This will corrupt things horribly if you still use vm86 mode. Checking
> CS RPL is simply not correct.

I never tested the 32 bit version of this. And we could just not
implement it (I don't think there's live kernel patching for it
either).

But this doesn't make it any worse than my version, because under the
full testing of my patch with the trampolines, I would easily crash the
32 bit version. That was one reason I made my last patch only support 64
bit.

Under light load, 32 bit works, but when I stress it (running perf and
ftrace together) it blows up. Could be an NMI issue.

-- Steve
Jiri Kosina May 1, 2019, 7:33 p.m. UTC | #10
On Wed, 1 May 2019, Steven Rostedt wrote:

> I never tested the 32 bit version of this. And we could just not
> implement it (I don't think there's live kernel patching for it
> either).

That's correct, there is no livepatching on x86_32 (and no plans for 
it). CONFIG_LIVEPATCH is not available for 32bit builds at all.
Peter Zijlstra May 1, 2019, 7:41 p.m. UTC | #11
On Wed, May 01, 2019 at 09:33:28PM +0200, Jiri Kosina wrote:
> On Wed, 1 May 2019, Steven Rostedt wrote:
> 
> > I never tested the 32 bit version of this. And we could just not
> > implement it (I don't think there's live kernel patching for it
> > either).
> 
> That's correct, there is no livepatching on x86_32 (and no plans for 
> it). CONFIG_LIVEPATCH is not available for 32bit builds at all.

We still want this for static_call(), even on 32bit.
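
(Purely illustrative of why it matters there too: a static_call int3 handler
branch could reuse the same emulation helpers. is_static_call_site() and
static_call_target() below are hypothetical names, not actual static_call
code:)

	if (is_static_call_site(ip)) {
		unsigned long target = static_call_target(ip);

		if (target)
			int3_emulate_call(regs, target);	/* new destination */
		else
			int3_emulate_jmp(regs, ip + CALL_INSN_SIZE); /* now a NOP */
		return 1;
	}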

Patch

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ef49517f6bb2..835277043348 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -17,6 +17,7 @@ 
 #include <linux/uaccess.h>
 #include <linux/ftrace.h>
 #include <linux/percpu.h>
+#include <linux/frame.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/init.h>
@@ -232,6 +233,9 @@  int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 
 static unsigned long ftrace_update_func;
 
+/* Used within inline asm below */
+unsigned long ftrace_update_func_call;
+
 static int update_ftrace_func(unsigned long ip, void *new)
 {
 	unsigned char old[MCOUNT_INSN_SIZE];
@@ -259,6 +263,8 @@  int ftrace_update_ftrace_func(ftrace_func_t func)
 	unsigned char *new;
 	int ret;
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
 
@@ -280,6 +286,103 @@  static nokprobe_inline int is_ftrace_caller(unsigned long ip)
 	return 0;
 }
 
+/*
+ * We need to handle the "call func1" -> "call func2" case.
+ * Just skipping the call is not sufficient as it will be like
+ * changing to "nop" first and then updating the call. But some
+ * users of ftrace require calls never to be missed.
+ *
+ * To emulate the call while converting the call site with a breakpoint,
+ * some trampolines are used along with per CPU buffers.
+ * There are three trampolines for the call sites and three trampolines
+ * for updating the call in the ftrace trampoline. The three cases
+ * handled are:
+ *
+ * 1) Interrupts are enabled when the breakpoint is hit
+ * 2) Interrupts are disabled when the breakpoint is hit
+ * 3) The breakpoint was hit in an NMI
+ *
+ * As per CPU data is used, interrupts must be disabled to prevent them
+ * from corrupting the data. A separate NMI trampoline is used for the
+ * NMI case. If interrupts are already disabled, then the return path
+ * of where the breakpoint was hit (saved in the per CPU data) is pushed
+ * on the stack, followed by a jump either to ftrace_caller (which will
+ * loop through all registered ftrace_ops handlers depending on the ip
+ * address), or, if it's an ftrace trampoline call update, to the
+ * function stored in ftrace_update_func_call.
+ */
+extern asmlinkage void ftrace_emulate_call_irqon(void);
+extern asmlinkage void ftrace_emulate_call_irqoff(void);
+extern asmlinkage void ftrace_emulate_call_nmi(void);
+extern asmlinkage void ftrace_emulate_call_update_irqoff(void);
+extern asmlinkage void ftrace_emulate_call_update_irqon(void);
+extern asmlinkage void ftrace_emulate_call_update_nmi(void);
+
+static DEFINE_PER_CPU(void *, ftrace_bp_call_return);
+static DEFINE_PER_CPU(void *, ftrace_bp_call_nmi_return);
+
+asm(
+	".text\n"
+
+	/* Trampoline for function update with interrupts enabled */
+	".global ftrace_emulate_call_irqoff\n"
+	".type ftrace_emulate_call_irqoff, @function\n"
+	"ftrace_emulate_call_irqoff:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"sti\n\t"
+		"jmp ftrace_caller\n"
+	".size ftrace_emulate_call_irqoff, .-ftrace_emulate_call_irqoff\n"
+
+	/* Trampoline for function update with interrupts disabled */
+	".global ftrace_emulate_call_irqon\n"
+	".type ftrace_emulate_call_irqon, @function\n"
+	"ftrace_emulate_call_irqon:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"jmp ftrace_caller\n"
+	".size ftrace_emulate_call_irqon, .-ftrace_emulate_call_irqon\n"
+
+	/* Trampoline for function update in an NMI */
+	".global ftrace_emulate_call_nmi\n"
+	".type ftrace_emulate_call_nmi, @function\n"
+	"ftrace_emulate_call_nmi:\n\t"
+		"push %gs:ftrace_bp_call_nmi_return\n\t"
+		"jmp ftrace_caller\n"
+	".size ftrace_emulate_call_nmi, .-ftrace_emulate_call_nmi\n"
+
+	/* Trampoline for ftrace trampoline call update with interrupts enabled */
+	".global ftrace_emulate_call_update_irqoff\n"
+	".type ftrace_emulate_call_update_irqoff, @function\n"
+	"ftrace_emulate_call_update_irqoff:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"sti\n\t"
+		"jmp *ftrace_update_func_call\n"
+	".size ftrace_emulate_call_update_irqoff, .-ftrace_emulate_call_update_irqoff\n"
+
+	/* Trampoline for ftrace trampoline call update with interrupts disabled */
+	".global ftrace_emulate_call_update_irqon\n"
+	".type ftrace_emulate_call_update_irqon, @function\n"
+	"ftrace_emulate_call_update_irqon:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"jmp *ftrace_update_func_call\n"
+	".size ftrace_emulate_call_update_irqon, .-ftrace_emulate_call_update_irqon\n"
+
+	/* Trampoline for ftrace trampoline call update in an NMI */
+	".global ftrace_emulate_call_update_nmi\n"
+	".type ftrace_emulate_call_update_nmi, @function\n"
+	"ftrace_emulate_call_update_nmi:\n\t"
+		"push %gs:ftrace_bp_call_nmi_return\n\t"
+		"jmp *ftrace_update_func_call\n"
+	".size ftrace_emulate_call_update_nmi, .-ftrace_emulate_call_update_nmi\n"
+	".previous\n");
+
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_irqoff);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_irqon);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_nmi);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_irqoff);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_irqon);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_nmi);
+
 /*
  * A breakpoint was added to the code address we are about to
  * modify, and this is the handle that will just skip over it.
@@ -295,10 +398,44 @@  int ftrace_int3_handler(struct pt_regs *regs)
 		return 0;
 
 	ip = regs->ip - 1;
-	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
+	if (ftrace_location(ip)) {
+		/* A breakpoint at the beginning of the function was hit */
+		if (in_nmi()) {
+			/* NMIs have their own trampoline */
+			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
+			regs->ip = (unsigned long) ftrace_emulate_call_nmi;
+			return 1;
+		}
+		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) ftrace_emulate_call_irqoff;
+		} else {
+			regs->ip = (unsigned long) ftrace_emulate_call_irqon;
+		}
+	} else if (is_ftrace_caller(ip)) {
+		/* An ftrace trampoline is being updated */
+		if (!ftrace_update_func_call) {
+			/* If it's a jump, just need to skip it */
+			regs->ip += MCOUNT_INSN_SIZE - 1;
+			return 1;
+		}
+		if (in_nmi()) {
+			/* NMIs have their own trampoline */
+			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
+			regs->ip = (unsigned long) ftrace_emulate_call_update_nmi;
+			return 1;
+		}
+		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) ftrace_emulate_call_update_irqoff;
+		} else {
+			regs->ip = (unsigned long) ftrace_emulate_call_update_irqon;
+		}
+	} else {
 		return 0;
-
-	regs->ip += MCOUNT_INSN_SIZE - 1;
+	}
 
 	return 1;
 }
@@ -859,6 +996,8 @@  void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 
 	func = ftrace_ops_get_func(ops);
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	/* Do a safe modify in case the trampoline is executing */
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
@@ -960,6 +1099,7 @@  static int ftrace_mod_jmp(unsigned long ip, void *func)
 {
 	unsigned char *new;
 
+	ftrace_update_func_call = 0;
 	new = ftrace_jmp_replace(ip, (unsigned long)func);
 
 	return update_ftrace_func(ip, new);