diff mbox series

[1/4] x86/thread_info: introduce ->ftrace_int3_stack member

Message ID 20190427100639.15074-2-nstange@suse.de (mailing list archive)
State New
Headers show
Series x86/ftrace: make ftrace_int3_handler() not to skip fops invocation | expand

Commit Message

Nicolai Stange April 27, 2019, 10:06 a.m. UTC
Before actually rewriting an insn, x86' DYNAMIC_FTRACE  implementation
places an int3 breakpoint on it. Currently, ftrace_int3_handler() simply
treats the insn in question as nop and advances %rip past it. An upcoming
patch will improve this by making the int3 trap handler emulate the call
insn. To this end, ftrace_int3_handler() will be made to change its iret
frame's ->ip to some stub which will then mimic the function call in the
original context.

Somehow the trapping ->ip address will have to get communicated from
ftrace_int3_handler() to these stubs though. Note that at any given point
in time, there can be at most four such call insn emulations pending:
namely at most one per "process", "irq", "softirq" and "nmi" context.

Introduce struct ftrace_int3_stack providing four entries for storing
the instruction pointer.

In principle, it could be made per-cpu, but this would require making
ftrace_int3_handler() to return with preemption disabled and to enable it
from those emulation stubs again only after the stack's top entry has been
consumed. I've been told that this would "break a lot of norms" and that
making this stack part of struct thread_info instead would be less fragile.
Follow this advice and add a struct ftrace_int3_stack instance to x86's
struct thread_info. Note that these stacks will get only rarely accessed
(only during ftrace's code modifications) and thus, cache line dirtying
won't have any significant impact on the neighbouring fields.

Initialization will take place implicitly through INIT_THREAD_INFO as per
the rules for missing elements in initializers. The memcpy() in
arch_dup_task_struct() will propagate the initial state properly, because
it's always run in process context and won't ever see a non-zero ->depth
value.

Finally, add the necessary bits to asm-offsets for making struct
ftrace_int3_stack accessible from assembly.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 arch/x86/include/asm/thread_info.h | 11 +++++++++++
 arch/x86/kernel/asm-offsets.c      |  8 ++++++++
 2 files changed, 19 insertions(+)

Comments

Andy Lutomirski April 28, 2019, 5:41 p.m. UTC | #1
> On Apr 27, 2019, at 3:06 AM, Nicolai Stange <nstange@suse.de> wrote:
> 
> Before actually rewriting an insn, x86' DYNAMIC_FTRACE  implementation
> places an int3 breakpoint on it. Currently, ftrace_int3_handler() simply
> treats the insn in question as nop and advances %rip past it.

How does this not crash all the time?

> An upcoming
> patch will improve this by making the int3 trap handler emulate the call
> insn. To this end, ftrace_int3_handler() will be made to change its iret
> frame's ->ip to some stub which will then mimic the function call in the
> original context.
> 
> Somehow the trapping ->ip address will have to get communicated from
> ftrace_int3_handler() to these stubs though.

Cute.  But this should get “ftrace” removed from the name, since it’s potentially more generally useful.  Josh wanted something like this for static_call.

> Note that at any given point
> in time, there can be at most four such call insn emulations pending:
> namely at most one per "process", "irq", "softirq" and "nmi" context.
> 

That’s quite an assumption. I think your list should also contain exception, exceptions nested inside that exception, and machine check, at the very least.  I’m also wondering why irq and softirq are treated separately.

All this makes me think that one of the other solutions we came up with last time we discussed this might be better.
Steven Rostedt April 28, 2019, 5:51 p.m. UTC | #2
On Sun, 28 Apr 2019 10:41:10 -0700
Andy Lutomirski <luto@amacapital.net> wrote:


> > Note that at any given point
> > in time, there can be at most four such call insn emulations pending:
> > namely at most one per "process", "irq", "softirq" and "nmi" context.
> >   
> 
> That’s quite an assumption. I think your list should also contain
> exception, exceptions nested inside that exception, and machine
> check, at the very least.  I’m also wondering why irq and softirq are
> treated separately.

4 has usually been the context count we choose. But I guess in theory,
if we get exceptions then I could potentially be more.

As for irq vs softirq, an interrupt can preempt a softirq. Interrupts
are enabled while softirqs are running. When sofirqs run, softirqs are
disabled to prevent nested softirqs. But interrupts are enabled again,
and another interrupt may come in while a softirq is executing.

> 
> All this makes me think that one of the other solutions we came up
> with last time we discussed this might be better.

+100

Perhaps adding another slot into pt_regs that gets used by int3 to
store a slot to emulate a call on return?

-- Steve
Andy Lutomirski April 28, 2019, 6:08 p.m. UTC | #3
> On Apr 28, 2019, at 10:51 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Sun, 28 Apr 2019 10:41:10 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
> 
> 
>>> Note that at any given point
>>> in time, there can be at most four such call insn emulations pending:
>>> namely at most one per "process", "irq", "softirq" and "nmi" context.
>>> 
>> 
>> That’s quite an assumption. I think your list should also contain
>> exception, exceptions nested inside that exception, and machine
>> check, at the very least.  I’m also wondering why irq and softirq are
>> treated separately.
> 
> 4 has usually been the context count we choose. But I guess in theory,
> if we get exceptions then I could potentially be more.
> 
> As for irq vs softirq, an interrupt can preempt a softirq. Interrupts
> are enabled while softirqs are running. When sofirqs run, softirqs are
> disabled to prevent nested softirqs. But interrupts are enabled again,
> and another interrupt may come in while a softirq is executing.
> 
>> 
>> All this makes me think that one of the other solutions we came up
>> with last time we discussed this might be better.
> 
> +100
> 
> Perhaps adding another slot into pt_regs that gets used by int3 to
> store a slot to emulate a call on return?
> 
> 

That’s not totally nuts, although finding pt_regs isn’t entirely trivial.

I still think I prefer an approach where we just emulate the call directly.
Steven Rostedt April 28, 2019, 7:43 p.m. UTC | #4
On Sun, 28 Apr 2019 11:08:34 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> > 
> > Perhaps adding another slot into pt_regs that gets used by int3 to
> > store a slot to emulate a call on return?
> > 
> >   
> 
> That’s not totally nuts, although finding pt_regs isn’t entirely trivial.

I meant on the int3 handler (which stores the pt_regs).

> 
> I still think I prefer an approach where we just emulate the call directly.

Then, on the return of int3, if there's anything in that slot, then we
could possibly shift the exception handler frame (that was added by the
hardware), insert the slot data into the top of the stack, and then
call iret (which the int3 handler, would add the return ip to be the
function being called), which would in essence emulate the call directly.

I believe the complexity comes from the exception frame added by the
hardware is where we need to put the return of the call for the
emulation.

-- Steve
Andy Lutomirski April 28, 2019, 8:56 p.m. UTC | #5
> On Apr 28, 2019, at 12:43 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Sun, 28 Apr 2019 11:08:34 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
> 
>>> 
>>> Perhaps adding another slot into pt_regs that gets used by int3 to
>>> store a slot to emulate a call on return?
>>> 
>>> 
>> 
>> That’s not totally nuts, although finding pt_regs isn’t entirely trivial.
> 
> I meant on the int3 handler (which stores the pt_regs).

But that’s below the stub’s RSP, so it’s toast if another interrupt happens. Or am I misunderstanding you?

> 
>> 
>> I still think I prefer an approach where we just emulate the call directly.
> 
> Then, on the return of int3, if there's anything in that slot, then we
> could possibly shift the exception handler frame (that was added by the
> hardware), insert the slot data into the top of the stack, and then
> call iret (which the int3 handler, would add the return ip to be the
> function being called), which would in essence emulate the call directly.

Oh, I get it.

I liked Josh’s old proposal of unconditionally shifting the #BP frame 8 bytes better. It will be interesting when kernel shadow stacks are thrown in the mix, but that’s a problem for another day.
Nicolai Stange April 28, 2019, 9:22 p.m. UTC | #6
Steven Rostedt <rostedt@goodmis.org> writes:

> On Sun, 28 Apr 2019 10:41:10 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>
>> > Note that at any given point
>> > in time, there can be at most four such call insn emulations pending:
>> > namely at most one per "process", "irq", "softirq" and "nmi" context.
>> >   
>> 
>> That’s quite an assumption. I think your list should also contain
>> exception, exceptions nested inside that exception, and machine
>> check, at the very least.  I’m also wondering why irq and softirq are
>> treated separately.

You're right, I missed the machine check case.


> 4 has usually been the context count we choose. But I guess in theory,
> if we get exceptions then I could potentially be more.

After having seen the static_call discussion, I'm in no way defending
this limited approach here, but out of curiosity: can the code between
the push onto the stack from ftrace_int3_handler() and the subsequent
pop from the stub actually trigger an (non-#MC) exception? There's an
iret inbetween, but that can fault only when returning to user space,
correct?


> As for irq vs softirq, an interrupt can preempt a softirq. Interrupts
> are enabled while softirqs are running. When sofirqs run, softirqs are
> disabled to prevent nested softirqs. But interrupts are enabled again,
> and another interrupt may come in while a softirq is executing.
>

Thanks,

Nicolai
Andy Lutomirski April 28, 2019, 11:27 p.m. UTC | #7
> On Apr 28, 2019, at 2:22 PM, Nicolai Stange <nstange@suse.de> wrote:
> 
> Steven Rostedt <rostedt@goodmis.org> writes:
> 
>> On Sun, 28 Apr 2019 10:41:10 -0700
>> Andy Lutomirski <luto@amacapital.net> wrote:
>> 
>> 
>>>> Note that at any given point
>>>> in time, there can be at most four such call insn emulations pending:
>>>> namely at most one per "process", "irq", "softirq" and "nmi" context.
>>>> 
>>> 
>>> That’s quite an assumption. I think your list should also contain
>>> exception, exceptions nested inside that exception, and machine
>>> check, at the very least.  I’m also wondering why irq and softirq are
>>> treated separately.
> 
> You're right, I missed the machine check case.
> 
> 
>> 4 has usually been the context count we choose. But I guess in theory,
>> if we get exceptions then I could potentially be more.
> 
> After having seen the static_call discussion, I'm in no way defending
> this limited approach here, but out of curiosity: can the code between
> the push onto the stack from ftrace_int3_handler() and the subsequent
> pop from the stub actually trigger an (non-#MC) exception? There's an
> iret inbetween, but that can fault only when returning to user space,
> correct?

IRET doesn’t do any fancy masking, so #DB, NMI and regular IRQs should all be possible.

> 
> 
>> As for irq vs softirq, an interrupt can preempt a softirq. Interrupts
>> are enabled while softirqs are running. When sofirqs run, softirqs are
>> disabled to prevent nested softirqs. But interrupts are enabled again,
>> and another interrupt may come in while a softirq is executing.
>> 
> 
> Thanks,
> 
> Nicolai
> 
> 
> -- 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
diff mbox series

Patch

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index e0eccbcb8447..83434a88cfbb 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -56,6 +56,17 @@  struct task_struct;
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
 	u32			status;		/* thread synchronous flags */
+#ifdef CONFIG_DYNAMIC_FTRACE
+	struct ftrace_int3_stack {
+		int depth;
+		/*
+		 * There can be at most one slot in use per context,
+		 * i.e. at most one for "normal", "irq", "softirq" and
+		 * "nmi" each.
+		 */
+		unsigned long slots[4];
+	} ftrace_int3_stack;
+#endif
 };
 
 #define INIT_THREAD_INFO(tsk)			\
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 168543d077d7..ca6ee24a0c6e 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -105,4 +105,12 @@  static void __used common(void)
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
 	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 	OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+	BLANK();
+	OFFSET(TASK_TI_ftrace_int3_depth, task_struct,
+	       thread_info.ftrace_int3_stack.depth);
+	OFFSET(TASK_TI_ftrace_int3_slots, task_struct,
+	       thread_info.ftrace_int3_stack.slots);
+#endif
 }