diff mbox series

[v5,21/27] x86/cet/shstk: Introduce WRUSS instruction

Message ID 20181011151523.27101-22-yu-cheng.yu@intel.com (mailing list archive)
State New, archived
Headers show
Series Control Flow Enforcement: Shadow Stack | expand

Commit Message

Yu-cheng Yu Oct. 11, 2018, 3:15 p.m. UTC
WRUSS is a new kernel-mode instruction but writes directly to user
shadow stack memory.  This is used to construct a return address on
the shadow stack for the signal handler.

This instruction can fault if the user shadow stack is invalid shadow
stack memory.  In that case, the kernel does a fixup.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++
 arch/x86/mm/fault.c                  |  9 ++++++++
 2 files changed, 41 insertions(+)

Comments

Dave Hansen Nov. 6, 2018, 6:43 p.m. UTC | #1
On 10/11/18 8:15 AM, Yu-cheng Yu wrote:
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1305,6 +1305,15 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>  		error_code |= X86_PF_USER;
>  		flags |= FAULT_FLAG_USER;
>  	} else {
> +		/*
> +		 * WRUSS is a kernel instruction and but writes
> +		 * to user shadow stack.  When a fault occurs,
> +		 * both X86_PF_USER and X86_PF_SHSTK are set.
> +		 * Clear X86_PF_USER here.
> +		 */
> +		if ((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
> +		    (X86_PF_USER | X86_PF_SHSTK))
> +			error_code &= ~X86_PF_USER;
This hunk of code basically points out that the architecture of WRUSS is
broken for Linux.  The setting of X86_PF_USER for a ring-0 instruction
really is a mis-feature of the architecture for us and we *undo* it in
software which is unfortunate.  Wish I would have caught this earlier.

Andy, note that this is another case where hw_error_code and
sw_error_code will diverge, unfortunately.

Anyway, this is going to necessitate some comment updates in the page
fault code.  Yu-cheng, you are going to collide with some recent changes
I made to the page fault code.  Please be careful with the context when
you do the merge and make sure that all the new comments stay correct.
Andy Lutomirski Nov. 6, 2018, 6:55 p.m. UTC | #2
On Tue, Nov 6, 2018 at 10:43 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 10/11/18 8:15 AM, Yu-cheng Yu wrote:
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1305,6 +1305,15 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
> >               error_code |= X86_PF_USER;
> >               flags |= FAULT_FLAG_USER;
> >       } else {
> > +             /*
> > +              * WRUSS is a kernel instruction and but writes
> > +              * to user shadow stack.  When a fault occurs,
> > +              * both X86_PF_USER and X86_PF_SHSTK are set.
> > +              * Clear X86_PF_USER here.
> > +              */
> > +             if ((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
> > +                 (X86_PF_USER | X86_PF_SHSTK))
> > +                     error_code &= ~X86_PF_USER;
> This hunk of code basically points out that the architecture of WRUSS is
> broken for Linux.  The setting of X86_PF_USER for a ring-0 instruction
> really is a mis-feature of the architecture for us and we *undo* it in
> software which is unfortunate.  Wish I would have caught this earlier.
>
> Andy, note that this is another case where hw_error_code and
> sw_error_code will diverge, unfortunately.
>
> Anyway, this is going to necessitate some comment updates in the page
> fault code.  Yu-cheng, you are going to collide with some recent changes
> I made to the page fault code.  Please be careful with the context when
> you do the merge and make sure that all the new comments stay correct.

I'm going to send a patch set in the next day or two that cleans it up
further and is probably good preparation for WRUSS.
Yu-cheng Yu Nov. 6, 2018, 8:21 p.m. UTC | #3
On Tue, 2018-11-06 at 10:43 -0800, Dave Hansen wrote:
> On 10/11/18 8:15 AM, Yu-cheng Yu wrote:
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1305,6 +1305,15 @@ __do_page_fault(struct pt_regs *regs, unsigned long
> > error_code,
> >  		error_code |= X86_PF_USER;
> >  		flags |= FAULT_FLAG_USER;
> >  	} else {
> > +		/*
> > +		 * WRUSS is a kernel instruction and but writes
> > +		 * to user shadow stack.  When a fault occurs,
> > +		 * both X86_PF_USER and X86_PF_SHSTK are set.
> > +		 * Clear X86_PF_USER here.
> > +		 */
> > +		if ((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
> > +		    (X86_PF_USER | X86_PF_SHSTK))
> > +			error_code &= ~X86_PF_USER;
> 
> This hunk of code basically points out that the architecture of WRUSS is
> broken for Linux.  The setting of X86_PF_USER for a ring-0 instruction
> really is a mis-feature of the architecture for us and we *undo* it in
> software which is unfortunate.  Wish I would have caught this earlier.
> 
> Andy, note that this is another case where hw_error_code and
> sw_error_code will diverge, unfortunately.
> 
> Anyway, this is going to necessitate some comment updates in the page
> fault code.  Yu-cheng, you are going to collide with some recent changes
> I made to the page fault code.  Please be careful with the context when
> you do the merge and make sure that all the new comments stay correct.

Ok.  Thanks!

Yu-cheng
diff mbox series

Patch

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 317fc59b512c..37f16269747d 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -237,6 +237,38 @@  static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+#ifdef CONFIG_X86_INTEL_CET
+#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32)
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	asm_volatile_goto("1: wrussd %1, (%0)\n"
+			  _ASM_EXTABLE(1b, %l[fail])
+			  :: "r" (addr), "r" (val)
+			  :: fail);
+	return 0;
+fail:
+	return -EPERM;
+}
+#else
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	WARN_ONCE(1, "%s used but not supported.\n", __func__);
+	return -EFAULT;
+}
+#endif
+
+static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
+{
+	asm_volatile_goto("1: wrussq %1, (%0)\n"
+			  _ASM_EXTABLE(1b, %l[fail])
+			  :: "r" (addr), "r" (val)
+			  :: fail);
+	return 0;
+fail:
+	return -EPERM;
+}
+#endif /* CONFIG_X86_INTEL_CET */
+
 #define nop() asm volatile ("nop")
 
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7c3877a982f4..b91fc008f33a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1305,6 +1305,15 @@  __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		error_code |= X86_PF_USER;
 		flags |= FAULT_FLAG_USER;
 	} else {
+		/*
+		 * WRUSS is a kernel instruction and but writes
+		 * to user shadow stack.  When a fault occurs,
+		 * both X86_PF_USER and X86_PF_SHSTK are set.
+		 * Clear X86_PF_USER here.
+		 */
+		if ((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
+		    (X86_PF_USER | X86_PF_SHSTK))
+			error_code &= ~X86_PF_USER;
 		if (regs->flags & X86_EFLAGS_IF)
 			local_irq_enable();
 	}