diff mbox series

[V2,08/17] x86/entry: Move syscall irq tracing to C code

Message ID 20191023123118.386844979@linutronix.de (mailing list archive)
State New, archived
Headers show
Series entry: Provide generic implementation for host and guest entry/exit work | expand

Commit Message

Thomas Gleixner Oct. 23, 2019, 12:27 p.m. UTC
Interrupt state tracing can be safely done in C code. The few stack
operations in assembly do not need to be covered.

Remove the now pointless indirection via .Lsyscall_32_done and jump to
swapgs_restore_regs_and_return_to_usermode directly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/common.c          |   10 ++++++++++
 arch/x86/entry/entry_32.S        |   17 -----------------
 arch/x86/entry/entry_64.S        |    6 ------
 arch/x86/entry/entry_64_compat.S |   30 ++++--------------------------
 4 files changed, 14 insertions(+), 49 deletions(-)

Comments

Andy Lutomirski Oct. 23, 2019, 9:30 p.m. UTC | #1
On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Interrupt state tracing can be safely done in C code. The few stack
> operations in assembly do not need to be covered.
>
> Remove the now pointless indirection via .Lsyscall_32_done and jump to
> swapgs_restore_regs_and_return_to_usermode directly.

This doesn't look right.

>  #define SYSCALL_EXIT_WORK_FLAGS                                \
> @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc
>  {
>         struct thread_info *ti;
>
> +       /* User to kernel transition disabled interrupts. */
> +       trace_hardirqs_off();
> +

So you just traced IRQs off, but...

>         enter_from_user_mode();
>         local_irq_enable();

Now they're on and traced on again?

I also don't see how your patch handles the fastpath case.

How about the attached patch instead?
Andy Lutomirski Oct. 23, 2019, 9:35 p.m. UTC | #2
On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Interrupt state tracing can be safely done in C code. The few stack
> > operations in assembly do not need to be covered.
> >
> > Remove the now pointless indirection via .Lsyscall_32_done and jump to
> > swapgs_restore_regs_and_return_to_usermode directly.
>
> This doesn't look right.
>
> >  #define SYSCALL_EXIT_WORK_FLAGS                                \
> > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc
> >  {
> >         struct thread_info *ti;
> >
> > +       /* User to kernel transition disabled interrupts. */
> > +       trace_hardirqs_off();
> > +
>
> So you just traced IRQs off, but...
>
> >         enter_from_user_mode();
> >         local_irq_enable();
>
> Now they're on and traced on again?
>
> I also don't see how your patch handles the fastpath case.
>
> How about the attached patch instead?

Ignore the attached patch.  You have this in your
do_exit_to_usermode() later in the series.  But I'm still quite
confused by this patch.
Thomas Gleixner Oct. 23, 2019, 11:16 p.m. UTC | #3
On Wed, 23 Oct 2019, Andy Lutomirski wrote:

> On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Interrupt state tracing can be safely done in C code. The few stack
> > operations in assembly do not need to be covered.
> >
> > Remove the now pointless indirection via .Lsyscall_32_done and jump to
> > swapgs_restore_regs_and_return_to_usermode directly.
> 
> This doesn't look right.
> 
> >  #define SYSCALL_EXIT_WORK_FLAGS                                \
> > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc
> >  {
> >         struct thread_info *ti;
> >
> > +       /* User to kernel transition disabled interrupts. */
> > +       trace_hardirqs_off();
> > +
> 
> So you just traced IRQs off, but...
> 
> >         enter_from_user_mode();
> >         local_irq_enable();
> 
> Now they're on and traced on again?

Yes, because that's what actually happens.

usermode
 syscall		<- Disables interrupts, but tracing thinks they are on
   entry_SYSCALL_64
   ....
   call do_syscall_64

     trace_hardirqs_off() <- So before calling anything else, we have to tell
			     the tracer that interrupts are on, which we did
			     so far in the ASM code between entry_SYSCALL_64
			     and 'call do_syscall_64'. I'm merily lifting this
			     to C-code.

     enter_from_user_mode()
     local_irq_enable()
 
> I also don't see how your patch handles the fastpath case.

Hmm?

All syscalls return through:

    syscall_return_slowpath()
        local_irq_disable()
	prepare_exit_to_usermode()
	  user_enter_irqoff()
	  mds_user_clear_cpu_buffers()
	  trace_hardirqs_on()

What am I missing?
 
> How about the attached patch instead?

      	    	^^^^^^ Groan.

>
>  	user_enter_irqoff();
>  
> +	/*
> +	 * The actual return to usermode will almost certainly turn IRQs on.
> +	 * Trace it here to simplify the asm code.

Why would we return to user from a syscall or interrupt with interrupts
traced as disabled? Also the existing ASM is inconsistent vs. that:

ENTRY(entry_SYSENTER_32)        TRACE_IRQS_ON

ENTRY(entry_INT80_32)		TRACE_IRQS_IRET

ENTRY(entry_SYSCALL_64)		TRACE_IRQS_IRET

ENTRY(ret_from_fork)		TRACE_IRQS_ON

GLOBAL(retint_user)		TRACE_IRQS_IRETQ

ENTRY(entry_SYSCALL_compat)	TRACE_IRQS_ON

ENTRY(entry_INT80_compat)	TRACE_IRQS_ON

> +	 */
> +	if (likely(regs->flags & X86_EFLAGS_IF))
> +		trace_hardirqs_on();

My variant does this unconditionally and after
mds_user_clear_cpu_buffers().

> 	mds_user_clear_cpu_buffers();
> }
 
And your ASM changes keep still all the TRACE_IRQS_OFF invocations in the
various syscall entry pathes, which is what I removed and put as the first
thing into the C functions.

Confused.

Thanks,

	tglx
Thomas Gleixner Oct. 23, 2019, 11:31 p.m. UTC | #4
On Wed, 23 Oct 2019, Andy Lutomirski wrote:
> On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Interrupt state tracing can be safely done in C code. The few stack
> > > operations in assembly do not need to be covered.
> > >
> > > Remove the now pointless indirection via .Lsyscall_32_done and jump to
> > > swapgs_restore_regs_and_return_to_usermode directly.
> >
> > This doesn't look right.
> >
> > >  #define SYSCALL_EXIT_WORK_FLAGS                                \
> > > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc
> > >  {
> > >         struct thread_info *ti;
> > >
> > > +       /* User to kernel transition disabled interrupts. */
> > > +       trace_hardirqs_off();
> > > +
> >
> > So you just traced IRQs off, but...
> >
> > >         enter_from_user_mode();
> > >         local_irq_enable();
> >
> > Now they're on and traced on again?
> >
> > I also don't see how your patch handles the fastpath case.
> >
> > How about the attached patch instead?
> 
> Ignore the attached patch.  You have this in your
> do_exit_to_usermode() later in the series.  But I'm still quite
> confused by this patch.

What's confusing you? It basically does:

  ENTRY(syscall/int80)

-	TRACE_IRQS_OFF
	call C-syscall*()
-	TRACE_IRQS_ON/IRET

and

C-syscall*()

+       trace_hardirqs_off()		<- first action
	....
	prepare_exit_to_usermode()	<- last action
	return

and

prepare_exit_to_usermode()
	....
+       trace_hardirqs_on()		<- last action
	return

So this is exactly the same as the ASM today.

The only change is that I made it do unconditionally trace_hardirqs_on()
for consistency reasons.

I tried to split it into bits and pieces, but failed to come up with
something sensible. Let me try again tomorrow.

Thanks,

	tglx
Andy Lutomirski Oct. 24, 2019, 4:24 p.m. UTC | #5
On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Interrupt state tracing can be safely done in C code. The few stack
> > operations in assembly do not need to be covered.
> >
> > Remove the now pointless indirection via .Lsyscall_32_done and jump to
> > swapgs_restore_regs_and_return_to_usermode directly.
>
> This doesn't look right.

Well, I feel a bit silly.  I read this:

>
> >  #define SYSCALL_EXIT_WORK_FLAGS                                \
> > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

and I applied the diff in my head to the wrong function, and I didn't
notice that it didn't really apply there.  Oddly, gitweb gets this
right:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=WIP.core/entry&id=e3158f93138ded84eb44fa97606197f6adcf9366

Looking at the actual code:

Acked-by: Andy Lutomirski <luto@kernel.org>

with one minor caveat: you are making a subtle and mostly irrelevant
semantic change: with your patch, user mode will be traced as IRQs on
even if a nasty user has used iopl() to turn off interrupts.  This is
probably a good thing, but I think you should mention it in the
changelog.

FWIW, the rest of the series looks pretty good, too.
Peter Zijlstra Oct. 24, 2019, 5:40 p.m. UTC | #6
On Thu, Oct 24, 2019 at 09:24:13AM -0700, Andy Lutomirski wrote:
> On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Interrupt state tracing can be safely done in C code. The few stack
> > > operations in assembly do not need to be covered.
> > >
> > > Remove the now pointless indirection via .Lsyscall_32_done and jump to
> > > swapgs_restore_regs_and_return_to_usermode directly.
> >
> > This doesn't look right.
> 
> Well, I feel a bit silly.  I read this:
> 
> >
> > >  #define SYSCALL_EXIT_WORK_FLAGS                                \
> > > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc
> 
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> and I applied the diff in my head to the wrong function, and I didn't
> notice that it didn't really apply there.  Oddly, gitweb gets this

I had the same when reviewing these patches; I was almost going to ask
tglx about it on IRC when the penny dropped.
Thomas Gleixner Oct. 24, 2019, 8:54 p.m. UTC | #7
On Thu, 24 Oct 2019, Peter Zijlstra wrote:
> On Thu, Oct 24, 2019 at 09:24:13AM -0700, Andy Lutomirski wrote:
> > On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >
> > > > Interrupt state tracing can be safely done in C code. The few stack
> > > > operations in assembly do not need to be covered.
> > > >
> > > > Remove the now pointless indirection via .Lsyscall_32_done and jump to
> > > > swapgs_restore_regs_and_return_to_usermode directly.
> > >
> > > This doesn't look right.
> > 
> > Well, I feel a bit silly.  I read this:

Happened to me before. Don't worry.

> > >
> > > >  #define SYSCALL_EXIT_WORK_FLAGS                                \
> > > > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc
> > 
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > and I applied the diff in my head to the wrong function, and I didn't
> > notice that it didn't really apply there.  Oddly, gitweb gets this
> 
> I had the same when reviewing these patches; I was almost going to ask
> tglx about it on IRC when the penny dropped.

diff is weird at times.

Thanks,

	tglx
diff mbox series

Patch

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -218,6 +218,9 @@  static void exit_to_usermode_loop(struct
 	user_enter_irqoff();
 
 	mds_user_clear_cpu_buffers();
+
+	/* The return to usermode reenables interrupts. Tell the tracer */
+	trace_hardirqs_on();
 }
 
 #define SYSCALL_EXIT_WORK_FLAGS				\
@@ -279,6 +282,9 @@  static void syscall_slow_exit_work(struc
 {
 	struct thread_info *ti;
 
+	/* User to kernel transition disabled interrupts. */
+	trace_hardirqs_off();
+
 	enter_from_user_mode();
 	local_irq_enable();
 	ti = current_thread_info();
@@ -351,6 +357,7 @@  static __always_inline void do_syscall_3
 /* Handles int $0x80 */
 __visible void do_int80_syscall_32(struct pt_regs *regs)
 {
+	trace_hardirqs_off();
 	enter_from_user_mode();
 	local_irq_enable();
 	do_syscall_32_irqs_on(regs);
@@ -367,6 +374,9 @@  static __always_inline void do_syscall_3
 	unsigned long landing_pad = (unsigned long)current->mm->context.vdso +
 		vdso_image_32.sym_int80_landing_pad;
 
+	/* User to kernel transition disabled interrupts. */
+	trace_hardirqs_off();
+
 	/*
 	 * SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward
 	 * so that 'regs->ip -= 2' lands back on an int $0x80 instruction.
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -924,12 +924,6 @@  ENTRY(entry_SYSENTER_32)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
-	/*
-	 * User mode is traced as though IRQs are on, and SYSENTER
-	 * turned them off.
-	 */
-	TRACE_IRQS_OFF
-
 	movl	%esp, %eax
 	call	do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
@@ -939,8 +933,6 @@  ENTRY(entry_SYSENTER_32)
 	STACKLEAK_ERASE
 
 /* Opportunistic SYSEXIT */
-	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
-
 	/*
 	 * Setup entry stack - we keep the pointer in %eax and do the
 	 * switch after almost all user-state is restored.
@@ -1039,12 +1031,6 @@  ENTRY(entry_INT80_32)
 
 	SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1	/* save rest */
 
-	/*
-	 * User mode is traced as though IRQs are on, and the interrupt gate
-	 * turned them off.
-	 */
-	TRACE_IRQS_OFF
-
 	movl	%esp, %eax
 	call	do_int80_syscall_32
 .Lsyscall_32_done:
@@ -1052,11 +1038,8 @@  ENTRY(entry_INT80_32)
 	STACKLEAK_ERASE
 
 restore_all:
-	TRACE_IRQS_IRET
 	SWITCH_TO_ENTRY_STACK
-.Lrestore_all_notrace:
 	CHECK_AND_APPLY_ESPFIX
-.Lrestore_nocheck:
 	/* Switch back to user CR3 */
 	SWITCH_TO_USER_CR3 scratch_reg=%eax
 
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -167,15 +167,11 @@  GLOBAL(entry_SYSCALL_64_after_hwframe)
 
 	PUSH_AND_CLEAR_REGS rax=$-ENOSYS
 
-	TRACE_IRQS_OFF
-
 	/* IRQs are off. */
 	movq	%rax, %rdi
 	movq	%rsp, %rsi
 	call	do_syscall_64		/* returns with IRQs disabled */
 
-	TRACE_IRQS_IRETQ		/* we're about to change IF */
-
 	/*
 	 * Try to use SYSRET instead of IRET if we're returning to
 	 * a completely clean 64-bit userspace context.  If we're not,
@@ -342,7 +338,6 @@  ENTRY(ret_from_fork)
 	UNWIND_HINT_REGS
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
-	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	jmp	swapgs_restore_regs_and_return_to_usermode
 
 1:
@@ -606,7 +601,6 @@  END(common_spurious)
 GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
-	TRACE_IRQS_IRETQ
 
 GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 #ifdef CONFIG_DEBUG_ENTRY
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -129,17 +129,11 @@  ENTRY(entry_SYSENTER_compat)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
-	/*
-	 * User mode is traced as though IRQs are on, and SYSENTER
-	 * turned them off.
-	 */
-	TRACE_IRQS_OFF
-
 	movq	%rsp, %rdi
 	call	do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
-	ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \
-		    "jmp .Lsyscall_32_done", X86_FEATURE_XENPV
+	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
+		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 	jmp	sysret32_from_system_call
 
 .Lsysenter_fix_flags:
@@ -247,17 +241,11 @@  GLOBAL(entry_SYSCALL_compat_after_hwfram
 	pushq   $0			/* pt_regs->r15 = 0 */
 	xorl	%r15d, %r15d		/* nospec   r15 */
 
-	/*
-	 * User mode is traced as though IRQs are on, and SYSENTER
-	 * turned them off.
-	 */
-	TRACE_IRQS_OFF
-
 	movq	%rsp, %rdi
 	call	do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
-	ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \
-		    "jmp .Lsyscall_32_done", X86_FEATURE_XENPV
+	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
+		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 
 	/* Opportunistic SYSRET */
 sysret32_from_system_call:
@@ -266,7 +254,6 @@  GLOBAL(entry_SYSCALL_compat_after_hwfram
 	 * stack. So let's erase the thread stack right now.
 	 */
 	STACKLEAK_ERASE
-	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
 	movq	RBX(%rsp), %rbx		/* pt_regs->rbx */
 	movq	RBP(%rsp), %rbp		/* pt_regs->rbp */
 	movq	EFLAGS(%rsp), %r11	/* pt_regs->flags (in r11) */
@@ -403,17 +390,8 @@  ENTRY(entry_INT80_compat)
 	xorl	%r15d, %r15d		/* nospec   r15 */
 	cld
 
-	/*
-	 * User mode is traced as though IRQs are on, and the interrupt
-	 * gate turned them off.
-	 */
-	TRACE_IRQS_OFF
-
 	movq	%rsp, %rdi
 	call	do_int80_syscall_32
-.Lsyscall_32_done:
 
-	/* Go back to user mode. */
-	TRACE_IRQS_ON
 	jmp	swapgs_restore_regs_and_return_to_usermode
 END(entry_INT80_compat)