diff mbox

[RFC,10/10] x86/enter: Use IBRS on syscall and interrupts

Message ID 1516476182-5153-11-git-send-email-karahmed@amazon.de (mailing list archive)
State New, archived
Headers show

Commit Message

KarimAllah Ahmed Jan. 20, 2018, 7:23 p.m. UTC
From: Tim Chen <tim.c.chen@linux.intel.com>

Stop Indirect Branch Speculation on every user space to kernel space
transition and reenable it when returning to user space./

The NMI interrupt save/restore of IBRS state was based on Andrea
Arcangeli's implementation.  Here's an explanation by Dave Hansen on why we
save IBRS state for NMI.

The normal interrupt code uses the 'error_entry' path which uses the
Code Segment (CS) of the instruction that was interrupted to tell
whether it interrupted the kernel or userspace and thus has to switch
IBRS, or leave it alone.

The NMI code is different.  It uses 'paranoid_entry' because it can
interrupt the kernel while it is running with a userspace IBRS (and %GS
and CR3) value, but has a kernel CS.  If we used the same approach as
the normal interrupt code, we might do the following;

	SYSENTER_entry
<-------------- NMI HERE
	IBRS=1
		do_something()
	IBRS=0
	SYSRET

The NMI code might notice that we are running in the kernel and decide
that it is OK to skip the IBRS=1.  This would leave it running
unprotected with IBRS=0, which is bad.

However, if we unconditionally set IBRS=1, in the NMI, we might get the
following case:

	SYSENTER_entry
	IBRS=1
		do_something()
	IBRS=0
<-------------- NMI HERE (set IBRS=1)
	SYSRET

and we would return to userspace with IBRS=1.  Userspace would run
slowly until we entered and exited the kernel again.

Instead of those two approaches, we chose a third one where we simply
save the IBRS value in a scratch register (%r13) and then restore that
value, verbatim.

[karahmed use the new SPEC_CTRL_IBRS defines]

Co-developed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/d5e4c03ec290c61dfbe5a769f7287817283fa6b7.1515542293.git.tim.c.chen@linux.intel.com
---
 arch/x86/entry/entry_64.S        | 35 ++++++++++++++++++++++++++++++++++-
 arch/x86/entry/entry_64_compat.S | 21 +++++++++++++++++++--
 2 files changed, 53 insertions(+), 3 deletions(-)

Comments

Konrad Rzeszutek Wilk Jan. 21, 2018, 1:50 p.m. UTC | #1
On Sat, Jan 20, 2018 at 08:23:01PM +0100, KarimAllah Ahmed wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> Stop Indirect Branch Speculation on every user space to kernel space
> transition and reenable it when returning to user space./

How about interrupts?

That is should .macro interrupt have the same treatment?
KarimAllah Ahmed Jan. 21, 2018, 2:40 p.m. UTC | #2
On 01/21/2018 02:50 PM, Konrad Rzeszutek Wilk wrote:

> On Sat, Jan 20, 2018 at 08:23:01PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>
>> Stop Indirect Branch Speculation on every user space to kernel space
>> transition and reenable it when returning to user space./
> How about interrupts?
>
> That is should .macro interrupt have the same treatment?

RESTRICT_IB_SPEC is called in switch_to_thread_stack which is almost the 
first thing called from ".macro interrupt".

>

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Dave Hansen Jan. 21, 2018, 5:22 p.m. UTC | #3
On 01/21/2018 05:50 AM, Konrad Rzeszutek Wilk wrote:
> On Sat, Jan 20, 2018 at 08:23:01PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>
>> Stop Indirect Branch Speculation on every user space to kernel space
>> transition and reenable it when returning to user space./
> 
> How about interrupts?

This code covers all kernel entry/exit paths, including interrupts.
Despite its name, "error_entry" is used by the interrupt path.

> That is should .macro interrupt have the same treatment?

It already does.
diff mbox

Patch

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 63f4320..b3d90cf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -171,6 +171,8 @@  ENTRY(entry_SYSCALL_64_trampoline)
 
 	/* Load the top of the task stack into RSP */
 	movq	CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp
+	/* Restrict indirect branch speculation */
+	RESTRICT_IB_SPEC
 
 	/* Start building the simulated IRET frame. */
 	pushq	$__USER_DS			/* pt_regs->ss */
@@ -214,6 +216,8 @@  ENTRY(entry_SYSCALL_64)
 	 */
 	movq	%rsp, PER_CPU_VAR(rsp_scratch)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
 
 	TRACE_IRQS_OFF
 
@@ -409,6 +413,8 @@  syscall_return_via_sysret:
 	pushq	RSP-RDI(%rdi)	/* RSP */
 	pushq	(%rdi)		/* RDI */
 
+	/* Unrestrict Indirect Branch Speculation */
+	UNRESTRICT_IB_SPEC
 	/*
 	 * We are on the trampoline stack.  All regs except RDI are live.
 	 * We can do future final exit work right here.
@@ -757,11 +763,12 @@  GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 	/* Push user RDI on the trampoline stack. */
 	pushq	(%rdi)
 
+	/* Unrestrict Indirect Branch Speculation */
+	UNRESTRICT_IB_SPEC
 	/*
 	 * We are on the trampoline stack.  All regs except RDI are live.
 	 * We can do future final exit work right here.
 	 */
-
 	SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
 
 	/* Restore RDI. */
@@ -849,6 +856,13 @@  native_irq_return_ldt:
 	SWAPGS					/* to kernel GS */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi	/* to kernel CR3 */
 
+	/*
+	 * There is no point in disabling Indirect Branch Speculation
+	 * here as this is going to return to user space immediately
+	 * after fixing ESPFIX stack.  There is no vulnerable code
+	 * to protect so spare two MSR writes.
+	 */
+
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
 	movq	%rax, (0*8)(%rdi)		/* user RAX */
 	movq	(1*8)(%rsp), %rax		/* user RIP */
@@ -982,6 +996,8 @@  ENTRY(switch_to_thread_stack)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
 	movq	%rsp, %rdi
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
 	UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI
 
 	pushq	7*8(%rdi)		/* regs->ss */
@@ -1282,6 +1298,8 @@  ENTRY(paranoid_entry)
 
 1:
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
+	/* Restrict Indirect Branch speculation */
+	RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg=%r13d
 
 	ret
 END(paranoid_entry)
@@ -1305,6 +1323,8 @@  ENTRY(paranoid_exit)
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
+	/* Restore Indirect Branch Speculation to the previous state */
+	RESTORE_IB_SPEC_CLOBBER save_reg=%r13d
 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
 	SWAPGS_UNSAFE_STACK
 	jmp	.Lparanoid_exit_restore
@@ -1335,6 +1355,8 @@  ENTRY(error_entry)
 	SWAPGS
 	/* We have user CR3.  Change to kernel CR3. */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC_CLOBBER
 
 .Lerror_entry_from_usermode_after_swapgs:
 	/* Put us onto the real thread stack. */
@@ -1382,6 +1404,8 @@  ENTRY(error_entry)
 	 */
 	SWAPGS
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC_CLOBBER
 	jmp .Lerror_entry_done
 
 .Lbstep_iret:
@@ -1396,6 +1420,8 @@  ENTRY(error_entry)
 	 */
 	SWAPGS
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
 
 	/*
 	 * Pretend that the exception came from user mode: set up pt_regs
@@ -1497,6 +1523,10 @@  ENTRY(nmi)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
+
 	UNWIND_HINT_IRET_REGS base=%rdx offset=8
 	pushq	5*8(%rdx)	/* pt_regs->ss */
 	pushq	4*8(%rdx)	/* pt_regs->rsp */
@@ -1747,6 +1777,9 @@  end_repeat_nmi:
 	movq	$-1, %rsi
 	call	do_nmi
 
+	/* Restore Indirect Branch speculation to the previous state */
+	RESTORE_IB_SPEC_CLOBBER save_reg=%r13d
+
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
 
 	testl	%ebx, %ebx			/* swapgs needed? */
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 98d5358..5b45d93 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -54,6 +54,8 @@  ENTRY(entry_SYSENTER_compat)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
 
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+	/* Restrict Indirect Branch Speculation */
+	RESTRICT_IB_SPEC
 
 	/*
 	 * User tracing code (ptrace or signal handlers) might assume that
@@ -224,12 +226,18 @@  GLOBAL(entry_SYSCALL_compat_after_hwframe)
 	pushq   $0			/* pt_regs->r14 = 0 */
 	pushq   $0			/* pt_regs->r15 = 0 */
 
-	/*
-	 * User mode is traced as though IRQs are on, and SYSENTER
+	/* Restrict Indirect Branch Speculation. All registers are saved already */
+	RESTRICT_IB_SPEC_CLOBBER
+
+	/* User mode is traced as though IRQs are on, and SYSENTER
 	 * turned them off.
 	 */
 	TRACE_IRQS_OFF
 
+	/*
+	 * We just saved %rdi so it is safe to clobber.  It is not
+	 * preserved during the C calls inside TRACE_IRQS_OFF anyway.
+	 */
 	movq	%rsp, %rdi
 	call	do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
@@ -239,6 +247,15 @@  GLOBAL(entry_SYSCALL_compat_after_hwframe)
 	/* Opportunistic SYSRET */
 sysret32_from_system_call:
 	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
+
+	/*
+	 * Unrestrict Indirect Branch Speculation. This is safe to do here
+	 * because there are no indirect branches between here and the
+	 * return to userspace (sysretl).
+	 * Clobber of %rax, %rcx, %rdx is OK before register restoring.
+	 */
+	UNRESTRICT_IB_SPEC_CLOBBER
+
 	movq	RBX(%rsp), %rbx		/* pt_regs->rbx */
 	movq	RBP(%rsp), %rbp		/* pt_regs->rbp */
 	movq	EFLAGS(%rsp), %r11	/* pt_regs->flags (in r11) */