Message ID | 20181011151523.27101-22-yu-cheng.yu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Control Flow Enforcement: Shadow Stack | expand |
On 10/11/18 8:15 AM, Yu-cheng Yu wrote: > --- a/arch/x86/mm/fault.c > +++ b/arch/x86/mm/fault.c > @@ -1305,6 +1305,15 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, > error_code |= X86_PF_USER; > flags |= FAULT_FLAG_USER; > } else { > + /* > + * WRUSS is a kernel instruction and but writes > + * to user shadow stack. When a fault occurs, > + * both X86_PF_USER and X86_PF_SHSTK are set. > + * Clear X86_PF_USER here. > + */ > + if ((error_code & (X86_PF_USER | X86_PF_SHSTK)) == > + (X86_PF_USER | X86_PF_SHSTK)) > + error_code &= ~X86_PF_USER; This hunk of code basically points out that the architecture of WRUSS is broken for Linux. The setting of X86_PF_USER for a ring-0 instruction really is a mis-feature of the architecture for us and we *undo* it in software which is unfortunate. Wish I would have caught this earlier. Andy, note that this is another case where hw_error_code and sw_error_code will diverge, unfortunately. Anyway, this is going to necessitate some comment updates in the page fault code. Yu-cheng, you are going to collide with some recent changes I made to the page fault code. Please be careful with the context when you do the merge and make sure that all the new comments stay correct.
On Tue, Nov 6, 2018 at 10:43 AM Dave Hansen <dave.hansen@intel.com> wrote: > > On 10/11/18 8:15 AM, Yu-cheng Yu wrote: > > --- a/arch/x86/mm/fault.c > > +++ b/arch/x86/mm/fault.c > > @@ -1305,6 +1305,15 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, > > error_code |= X86_PF_USER; > > flags |= FAULT_FLAG_USER; > > } else { > > + /* > > + * WRUSS is a kernel instruction and but writes > > + * to user shadow stack. When a fault occurs, > > + * both X86_PF_USER and X86_PF_SHSTK are set. > > + * Clear X86_PF_USER here. > > + */ > > + if ((error_code & (X86_PF_USER | X86_PF_SHSTK)) == > > + (X86_PF_USER | X86_PF_SHSTK)) > > + error_code &= ~X86_PF_USER; > This hunk of code basically points out that the architecture of WRUSS is > broken for Linux. The setting of X86_PF_USER for a ring-0 instruction > really is a mis-feature of the architecture for us and we *undo* it in > software which is unfortunate. Wish I would have caught this earlier. > > Andy, note that this is another case where hw_error_code and > sw_error_code will diverge, unfortunately. > > Anyway, this is going to necessitate some comment updates in the page > fault code. Yu-cheng, you are going to collide with some recent changes > I made to the page fault code. Please be careful with the context when > you do the merge and make sure that all the new comments stay correct. I'm going to send a patch set in the next day or two that cleans it up further and is probably good preparation for WRUSS.
On Tue, 2018-11-06 at 10:43 -0800, Dave Hansen wrote: > On 10/11/18 8:15 AM, Yu-cheng Yu wrote: > > --- a/arch/x86/mm/fault.c > > +++ b/arch/x86/mm/fault.c > > @@ -1305,6 +1305,15 @@ __do_page_fault(struct pt_regs *regs, unsigned long > > error_code, > > error_code |= X86_PF_USER; > > flags |= FAULT_FLAG_USER; > > } else { > > + /* > > + * WRUSS is a kernel instruction and but writes > > + * to user shadow stack. When a fault occurs, > > + * both X86_PF_USER and X86_PF_SHSTK are set. > > + * Clear X86_PF_USER here. > > + */ > > + if ((error_code & (X86_PF_USER | X86_PF_SHSTK)) == > > + (X86_PF_USER | X86_PF_SHSTK)) > > + error_code &= ~X86_PF_USER; > > This hunk of code basically points out that the architecture of WRUSS is > broken for Linux. The setting of X86_PF_USER for a ring-0 instruction > really is a mis-feature of the architecture for us and we *undo* it in > software which is unfortunate. Wish I would have caught this earlier. > > Andy, note that this is another case where hw_error_code and > sw_error_code will diverge, unfortunately. > > Anyway, this is going to necessitate some comment updates in the page > fault code. Yu-cheng, you are going to collide with some recent changes > I made to the page fault code. Please be careful with the context when > you do the merge and make sure that all the new comments stay correct. Ok. Thanks! Yu-cheng
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 317fc59b512c..37f16269747d 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -237,6 +237,38 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_INTEL_CET +#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32) +static inline int write_user_shstk_32(unsigned long addr, unsigned int val) +{ + asm_volatile_goto("1: wrussd %1, (%0)\n" + _ASM_EXTABLE(1b, %l[fail]) + :: "r" (addr), "r" (val) + :: fail); + return 0; +fail: + return -EPERM; +} +#else +static inline int write_user_shstk_32(unsigned long addr, unsigned int val) +{ + WARN_ONCE(1, "%s used but not supported.\n", __func__); + return -EFAULT; +} +#endif + +static inline int write_user_shstk_64(unsigned long addr, unsigned long val) +{ + asm_volatile_goto("1: wrussq %1, (%0)\n" + _ASM_EXTABLE(1b, %l[fail]) + :: "r" (addr), "r" (val) + :: fail); + return 0; +fail: + return -EPERM; +} +#endif /* CONFIG_X86_INTEL_CET */ + #define nop() asm volatile ("nop") diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 7c3877a982f4..b91fc008f33a 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1305,6 +1305,15 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, error_code |= X86_PF_USER; flags |= FAULT_FLAG_USER; } else { + /* + * WRUSS is a kernel instruction and but writes + * to user shadow stack. When a fault occurs, + * both X86_PF_USER and X86_PF_SHSTK are set. + * Clear X86_PF_USER here. + */ + if ((error_code & (X86_PF_USER | X86_PF_SHSTK)) == + (X86_PF_USER | X86_PF_SHSTK)) + error_code &= ~X86_PF_USER; if (regs->flags & X86_EFLAGS_IF) local_irq_enable(); }
WRUSS is a new kernel-mode instruction but writes directly to user shadow stack memory. This is used to construct a return address on the shadow stack for the signal handler. This instruction can fault if the user shadow stack is invalid shadow stack memory. In that case, the kernel does a fixup. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> --- arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++ arch/x86/mm/fault.c | 9 ++++++++ 2 files changed, 41 insertions(+)