diff mbox series

[V4,04/50] x86/xen: Add xenpv_restore_regs_and_return_to_usermode()

Message ID 20211026141420.17138-5-jiangshanlai@gmail.com (mailing list archive)
State New, archived
Headers show
Series None | expand

Commit Message

Lai Jiangshan Oct. 26, 2021, 2:13 p.m. UTC
From: Lai Jiangshan <laijs@linux.alibaba.com>

While in the native case, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is the
trampoline stack.  But XEN pv doesn't use trampoline stack, so
PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is also the kernel stack.  Hence source
and destination stacks are identical in that case, which means reusing
swapgs_restore_regs_and_return_to_usermode() in XEN pv would cause %rsp
to move up to the top of the kernel stack and leave the IRET frame below
%rsp, which is dangerous to be corrupted if #NMI / #MC hit as either of
these events occurring in the middle of the stack pushing would clobber
data on the (original) stack.

And swapgs_restore_regs_and_return_to_usermode() pushing the IRET frame
on to the original address is useless and error-prone when there is any
future attempt to modify the code.

Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: xen-devel@lists.xenproject.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S        |  9 ++++++---
 arch/x86/entry/entry_64_compat.S |  7 ++++---
 arch/x86/xen/xen-asm.S           | 27 +++++++++++++++++++++++++++
 3 files changed, 37 insertions(+), 6 deletions(-)

Comments

Borislav Petkov Nov. 2, 2021, 8:58 a.m. UTC | #1
On Tue, Oct 26, 2021 at 10:13:34PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> While in the native case, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is the
> trampoline stack.  But XEN pv doesn't use trampoline stack, so
> PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is also the kernel stack.  Hence source
> and destination stacks are identical in that case, which means reusing
> swapgs_restore_regs_and_return_to_usermode() in XEN pv would cause %rsp
> to move up to the top of the kernel stack and leave the IRET frame below
> %rsp, which is dangerous to be corrupted if #NMI / #MC hit as either of
> these events occurring in the middle of the stack pushing would clobber
> data on the (original) stack.
> 
> And swapgs_restore_regs_and_return_to_usermode() pushing the IRET frame
> on to the original address is useless and error-prone when there is any
> future attempt to modify the code.
> 
> Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Peter Anvin <hpa@zytor.com>
> Cc: xen-devel@lists.xenproject.org>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
>  arch/x86/entry/entry_64.S        |  9 ++++++---
>  arch/x86/entry/entry_64_compat.S |  7 ++++---
>  arch/x86/xen/xen-asm.S           | 27 +++++++++++++++++++++++++++
>  3 files changed, 37 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 9d468c8877e2..0dde5a253dda 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -119,7 +119,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
>  	 * In the Xen PV case we must use iret anyway.
>  	 */
>  
> -	ALTERNATIVE "", "jmp	swapgs_restore_regs_and_return_to_usermode", \
> +	ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", \

Instead of sprinkling all those ALTERNATIVE calls everywhere,
why don't you simply jump to the xenpv-one at the
swapgs_restore_regs_and_return_to_usermode label itself and have a
single ALTERNATIVE there?
Lai Jiangshan Nov. 2, 2021, 9:19 a.m. UTC | #2
On 2021/11/2 16:58, Borislav Petkov wrote:

>>   	 */
>>   
>> -	ALTERNATIVE "", "jmp	swapgs_restore_regs_and_return_to_usermode", \
>> +	ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", \
> 
> Instead of sprinkling all those ALTERNATIVE calls everywhere,
> why don't you simply jump to the xenpv-one at the
> swapgs_restore_regs_and_return_to_usermode label itself and have a
> single ALTERNATIVE there?
> 

It will add a 5-byte NOP at the beginning of the native
swapgs_restore_regs_and_return_to_usermode.  I avoided adding unneeded
code in the native code even if it is NOPs and avoided melting xenpv-one into
the native one which will reduce the code readability.

I will follow your preference since a 5-byte NOP is so negligible in the slow
path with an iret instruction.

Or other option that adds macros to wrap the ALTERNATIVE.
RESTORE_REGS_AND_RETURN_TO_USERMODE and
COND_RESTORE_REGS_AND_RETURN_TO_USERMODE (test %eax before jmp in native case)
Borislav Petkov Nov. 2, 2021, 9:49 a.m. UTC | #3
On Tue, Nov 02, 2021 at 05:19:46PM +0800, Lai Jiangshan wrote:
> It will add a 5-byte NOP at the beginning of the native
> swapgs_restore_regs_and_return_to_usermode.

So?

> I avoided adding unneeded code in the native code even if it is NOPs
> and avoided melting xenpv-one into the native one which will reduce
> the code readability.

How does this reduce code readability?!

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e38a4cf795d9..bf1de54a1fca 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -567,6 +567,10 @@ __irqentry_text_end:
 
 SYM_CODE_START_LOCAL(common_interrupt_return)
 SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
+
+	ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", \
+                X86_FEATURE_XENPV
+
 #ifdef CONFIG_DEBUG_ENTRY
 	/* Assert that pt_regs indicates user mode. */
 	testb	$3, CS(%rsp)

> I will follow your preference since a 5-byte NOP is so negligible in the slow
> path with an iret instruction.

Yes, we do already gazillion things on those entry and exit paths.

> Or other option that adds macros to wrap the ALTERNATIVE.
> RESTORE_REGS_AND_RETURN_TO_USERMODE and
> COND_RESTORE_REGS_AND_RETURN_TO_USERMODE (test %eax before jmp in native case)

No, the main goal is to keep the asm code as readable and as simple as
possible.

If macros or whatever need to be added, there better be a good reason
for them. Saving a NOP is not one of them.

Thx.
H. Peter Anvin Nov. 2, 2021, 11:22 a.m. UTC | #4
On November 2, 2021 10:49:44 AM GMT+01:00, Borislav Petkov <bp@alien8.de> wrote:
>On Tue, Nov 02, 2021 at 05:19:46PM +0800, Lai Jiangshan wrote:
>> It will add a 5-byte NOP at the beginning of the native
>> swapgs_restore_regs_and_return_to_usermode.
>
>So?
>

It would be interesting to have an "override function with jmp" alternatives macro. It doesn't require any changes to the alternatives mechanism proper (but possibly to objtool): it would just insert an alternatives entry without adding any code including nops to the main path. It would of course only be applicable to a jmp, so a syntax like OVERRIDE_JMP feature, target rather than open-coding the instruction would probably be a good idea.

That would reduce the trade-off to zero.

>> I avoided adding unneeded code in the native code even if it is NOPs
>> and avoided melting xenpv-one into the native one which will reduce
>> the code readability.
>
>How does this reduce code readability?!
>
>diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>index e38a4cf795d9..bf1de54a1fca 100644
>--- a/arch/x86/entry/entry_64.S
>+++ b/arch/x86/entry/entry_64.S
>@@ -567,6 +567,10 @@ __irqentry_text_end:
> 
> SYM_CODE_START_LOCAL(common_interrupt_return)
> SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
>+
>+	ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", \
>+                X86_FEATURE_XENPV
>+
> #ifdef CONFIG_DEBUG_ENTRY
> 	/* Assert that pt_regs indicates user mode. */
> 	testb	$3, CS(%rsp)
>
>> I will follow your preference since a 5-byte NOP is so negligible in the slow
>> path with an iret instruction.
>
>Yes, we do already gazillion things on those entry and exit paths.
>
>> Or other option that adds macros to wrap the ALTERNATIVE.
>> RESTORE_REGS_AND_RETURN_TO_USERMODE and
>> COND_RESTORE_REGS_AND_RETURN_TO_USERMODE (test %eax before jmp in native case)
>
>No, the main goal is to keep the asm code as readable and as simple as
>possible.
>
>If macros or whatever need to be added, there better be a good reason
>for them. Saving a NOP is not one of them.
>
>Thx.
>
Borislav Petkov Nov. 2, 2021, 2:27 p.m. UTC | #5
On Tue, Nov 02, 2021 at 12:22:50PM +0100, H. Peter Anvin wrote:
> It would be interesting to have an "override function with jmp"
> alternatives macro. It doesn't require any changes to the alternatives
> mechanism proper (but possibly to objtool): it would just insert an
> alternatives entry without adding any code including nops to the main
> path. It would of course only be applicable to a jmp, so a syntax like
> OVERRIDE_JMP feature, target rather than open-coding the instruction
> would probably be a good idea.

I think you wanna say ALTERNATIVE_JMP here seeing how we have
ALTERNATIVE_CALL already :)

As to marking it properly, we can finally add that struct
alt_instr.flags thing we have been trying to add for years now.

/me adds it to his evergrowing todo.

If anyone beats /me to it, /me will gladly have a look at it.
diff mbox series

Patch

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9d468c8877e2..0dde5a253dda 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -119,7 +119,7 @@  SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
 	 * In the Xen PV case we must use iret anyway.
 	 */
 
-	ALTERNATIVE "", "jmp	swapgs_restore_regs_and_return_to_usermode", \
+	ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", \
 		X86_FEATURE_XENPV
 
 	movq	RCX(%rsp), %rcx
@@ -286,7 +286,8 @@  SYM_CODE_START(ret_from_fork)
 	UNWIND_HINT_REGS
 	movq	%rsp, %rdi
 	call	syscall_exit_to_user_mode	/* returns with IRQs disabled */
-	jmp	swapgs_restore_regs_and_return_to_usermode
+	ALTERNATIVE "jmp swapgs_restore_regs_and_return_to_usermode",	\
+		"jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 
 1:
 	/* kernel thread */
@@ -566,6 +567,7 @@  __irqentry_text_start:
 __irqentry_text_end:
 
 SYM_CODE_START_LOCAL(common_interrupt_return)
+SYM_INNER_LABEL(xenpv_restore_regs_and_return_to_usermode, SYM_L_WEAK) /* placeholder */
 SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
 #ifdef CONFIG_DEBUG_ENTRY
 	/* Assert that pt_regs indicates user mode. */
@@ -1055,7 +1057,8 @@  SYM_CODE_START_LOCAL(error_return)
 	DEBUG_ENTRY_ASSERT_IRQS_OFF
 	testb	$3, CS(%rsp)
 	jz	restore_regs_and_return_to_kernel
-	jmp	swapgs_restore_regs_and_return_to_usermode
+	ALTERNATIVE "jmp swapgs_restore_regs_and_return_to_usermode",	\
+		"jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 SYM_CODE_END(error_return)
 
 /*
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 0051cf5c792d..2a4d9532dfd5 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -139,7 +139,7 @@  SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
 	call	do_SYSENTER_32
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
-		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
+		    "jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 	jmp	sysret32_from_system_call
 
 .Lsysenter_fix_flags:
@@ -256,7 +256,7 @@  SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
 	call	do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
-		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
+		    "jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 
 	/* Opportunistic SYSRET */
 sysret32_from_system_call:
@@ -411,5 +411,6 @@  SYM_CODE_START(entry_INT80_compat)
 
 	movq	%rsp, %rdi
 	call	do_int80_syscall_32
-	jmp	swapgs_restore_regs_and_return_to_usermode
+	ALTERNATIVE "jmp swapgs_restore_regs_and_return_to_usermode",	\
+		"jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 SYM_CODE_END(entry_INT80_compat)
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 220dd9678494..032be1bab113 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -20,6 +20,7 @@ 
 
 #include <linux/init.h>
 #include <linux/linkage.h>
+#include <../entry/calling.h>
 
 .pushsection .noinstr.text, "ax"
 /*
@@ -192,6 +193,32 @@  SYM_CODE_START(xen_iret)
 	jmp hypercall_iret
 SYM_CODE_END(xen_iret)
 
+/*
+ * XEN pv doesn't use trampoline stack, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is
+ * also the kernel stack.  Reusing swapgs_restore_regs_and_return_to_usermode()
+ * in XEN pv would cause %rsp to move up to the top of the kernel stack and
+ * leave the IRET frame below %rsp, which is dangerous to be corrupted if #NMI
+ * interrupts. And swapgs_restore_regs_and_return_to_usermode() pushing the IRET
+ * frame at the same address is useless.
+ */
+SYM_CODE_START(xenpv_restore_regs_and_return_to_usermode)
+	UNWIND_HINT_REGS
+#ifdef CONFIG_DEBUG_ENTRY
+	/* Assert that pt_regs indicates user mode. */
+	testb	$3, CS(%rsp)
+	jnz	1f
+	ud2
+1:
+#endif
+	POP_REGS
+
+	/* stackleak_erase() can work safely on the kernel stack. */
+	STACKLEAK_ERASE_NOCLOBBER
+
+	addq	$8, %rsp	/* skip regs->orig_ax */
+	jmp xen_iret
+SYM_CODE_END(xenpv_restore_regs_and_return_to_usermode)
+
 /*
  * Xen handles syscall callbacks much like ordinary exceptions, which
  * means we have: