
[RFC,v6,00/26] Control-flow Enforcement: Shadow Stack

Message ID 20181119214809.6086-1-yu-cheng.yu@intel.com (mailing list archive)

Message

Yu-cheng Yu Nov. 19, 2018, 9:47 p.m. UTC
The previous version of CET Shadow Stack patches is at the following
link:

  https://lkml.org/lkml/2018/10/11/642

Summary of changes from v5:

  To support more threads, change the compat-mode thread shadow stack
  to a fixed size of RLIMIT_STACK / 4 (previously RLIMIT_STACK).  This
  change applies only to 32-bit pthreads.
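
  As a rough illustration of the arithmetic only, the sketch below
  computes a quarter of the current RLIMIT_STACK from user space.  The
  helper names are hypothetical and are not taken from the patches; how
  the kernel treats an unlimited RLIMIT_STACK is not modelled here.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: a quarter of the given stack limit, in bytes. */
static rlim_t compat_thread_shstk_size(rlim_t stack_limit)
{
    return stack_limit / 4;
}

/*
 * Hypothetical helper: read the soft RLIMIT_STACK of the calling process
 * and derive the size from it.  Returns 0 if the limit cannot be read or
 * is unlimited (an assumption made for this sketch only).
 */
static rlim_t current_compat_thread_shstk_size(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;

    assert(rl.rlim_cur / 4 <= rl.rlim_cur);
    return compat_thread_shstk_size(rl.rlim_cur);
}
```

  With the common 8 MiB stack limit, this gives 2 MiB per thread shadow
  stack instead of 8 MiB.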

  Some clean-up and small fixes in response to comments.  Thanks to
  all reviewers.

Yu-cheng Yu (26):
  Documentation/x86: Add CET description
  x86/cpufeatures: Add CET CPU feature flags for Control-flow
    Enforcement Technology (CET)
  x86/fpu/xstate: Change names to separate XSAVES system and user states
  x86/fpu/xstate: Introduce XSAVES system states
  x86/fpu/xstate: Add XSAVES system states for shadow stack
  x86/cet: Add control protection exception handler
  x86/cet/shstk: Add Kconfig option for user-mode shadow stack
  mm: Introduce VM_SHSTK for shadow stack memory
  mm/mmap: Prevent Shadow Stack VMA merges
  x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW
  x86/mm: Introduce _PAGE_DIRTY_SW
  drm/i915/gvt: Update _PAGE_DIRTY to _PAGE_DIRTY_BITS
  x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for
    _PAGE_DIRTY_SW
  x86/mm: Shadow stack page fault error checking
  mm: Handle shadow stack page fault
  mm: Handle THP/HugeTLB shadow stack page fault
  mm: Update can_follow_write_pte/pmd for shadow stack
  mm: Introduce do_mmap_locked()
  x86/cet/shstk: User-mode shadow stack support
  x86/cet/shstk: Introduce WRUSS instruction
  x86/cet/shstk: Signal handling for shadow stack
  x86/cet/shstk: ELF header parsing of Shadow Stack
  x86/cet/shstk: Handle thread shadow stack
  mm/mmap: Add Shadow stack pages to memory accounting
  x86/cet/shstk: Add arch_prctl functions for Shadow Stack
  x86/cet/shstk: Add Shadow Stack instructions to opcode map

 .../admin-guide/kernel-parameters.txt         |   6 +
 Documentation/index.rst                       |   1 +
 Documentation/x86/index.rst                   |  13 +
 Documentation/x86/intel_cet.rst               | 268 +++++++++++++
 arch/x86/Kconfig                              |  29 ++
 arch/x86/Makefile                             |   7 +
 arch/x86/entry/entry_64.S                     |   2 +-
 arch/x86/ia32/ia32_signal.c                   |  21 +
 arch/x86/include/asm/cet.h                    |  46 +++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/elf.h                    |   5 +
 arch/x86/include/asm/fpu/internal.h           |   5 +-
 arch/x86/include/asm/fpu/types.h              |  22 ++
 arch/x86/include/asm/fpu/xstate.h             |  26 +-
 arch/x86/include/asm/mmu_context.h            |   3 +
 arch/x86/include/asm/msr-index.h              |  15 +
 arch/x86/include/asm/pgtable.h                | 191 ++++++++--
 arch/x86/include/asm/pgtable_types.h          |  38 +-
 arch/x86/include/asm/processor.h              |   5 +
 arch/x86/include/asm/sighandling.h            |   5 +
 arch/x86/include/asm/special_insns.h          |  32 ++
 arch/x86/include/asm/traps.h                  |   5 +
 arch/x86/include/uapi/asm/elf_property.h      |  15 +
 arch/x86/include/uapi/asm/prctl.h             |   5 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/include/uapi/asm/sigcontext.h        |  15 +
 arch/x86/kernel/Makefile                      |   4 +
 arch/x86/kernel/cet.c                         | 304 +++++++++++++++
 arch/x86/kernel/cet_prctl.c                   |  86 +++++
 arch/x86/kernel/cpu/common.c                  |  25 ++
 arch/x86/kernel/elf.c                         | 358 ++++++++++++++++++
 arch/x86/kernel/fpu/core.c                    |  10 +-
 arch/x86/kernel/fpu/init.c                    |  10 -
 arch/x86/kernel/fpu/signal.c                  |   6 +-
 arch/x86/kernel/fpu/xstate.c                  | 158 +++++---
 arch/x86/kernel/idt.c                         |   4 +
 arch/x86/kernel/process.c                     |   7 +-
 arch/x86/kernel/process_64.c                  |   7 +
 arch/x86/kernel/relocate_kernel_64.S          |   2 +-
 arch/x86/kernel/signal.c                      |  97 +++++
 arch/x86/kernel/signal_compat.c               |   2 +-
 arch/x86/kernel/traps.c                       |  57 +++
 arch/x86/kvm/vmx.c                            |   2 +-
 arch/x86/lib/x86-opcode-map.txt               |  26 +-
 arch/x86/mm/fault.c                           |  27 ++
 arch/x86/mm/pgtable.c                         |  41 ++
 drivers/gpu/drm/i915/gvt/gtt.c                |   2 +-
 fs/binfmt_elf.c                               |  15 +
 fs/proc/task_mmu.c                            |   3 +
 include/asm-generic/pgtable.h                 |  14 +
 include/linux/mm.h                            |  26 ++
 include/uapi/asm-generic/siginfo.h            |   3 +-
 include/uapi/linux/elf.h                      |   1 +
 mm/gup.c                                      |   8 +-
 mm/huge_memory.c                              |  12 +-
 mm/memory.c                                   |   7 +-
 mm/mmap.c                                     |  11 +
 .../arch/x86/include/asm/disabled-features.h  |   8 +-
 tools/objtool/arch/x86/lib/x86-opcode-map.txt |  26 +-
 60 files changed, 2004 insertions(+), 157 deletions(-)
 create mode 100644 Documentation/x86/index.rst
 create mode 100644 Documentation/x86/intel_cet.rst
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/include/uapi/asm/elf_property.h
 create mode 100644 arch/x86/kernel/cet.c
 create mode 100644 arch/x86/kernel/cet_prctl.c
 create mode 100644 arch/x86/kernel/elf.c

Comments

Andy Lutomirski Nov. 22, 2018, 4:53 p.m. UTC | #1
[cc some more libc folks]

I have a general question about this patch set:

If I'm writing a user program, and I write a signal handler, there are
two things I want to make sure I can still do:

1. I want to be able to unwind directly from the signal handler
without involving sigreturn() -- that is, I want to make sure that
siglongjmp() works.  How does this work?  Is INCSSP involved?  How
exactly does the user program know how much to increment SSP by?  (And
why on Earth does INCSSP only consider the low 8 bits of its argument?
 That sounds like a mistake.  Can Intel still fix that?  On the other
hand, what happens if you INCSSP off the end of the shadow stack
entirely?  I assume the next access will fault as long as there's an
appropriate guard page.)

2. I want to be able to modify the signal context from a signal
handler such that, when the signal handler returns, it will return to
a frame higher up on the call stack than where the signal started and
to a different RIP value.  How can I do this?  I guess I can modify
the shadow stack with WRSS if WR_SHSTK_EN=1, but how do I tell the
kernel to kindly skip the frames I want to skip when I do sigreturn()?

The reason I'm asking #2 is that I think it's time to resurrect my old
vDSO syscall cancellation helper series here:

https://lwn.net/Articles/679434/

and it's not at all clear to me that __vdso_abort_pending_syscall()
can work without kernel assistance when CET is enabled.  I want to
make sure that it can be done, or I want to come up with some other
way to allow a signal handler to abort a syscall while CET is on.  I
could probably change __vdso_abort_pending_syscall() to instead point
RIP to __kernel_vsyscall's epilogue so that we don't change the depth
of the call stack.  But I could imagine that other user programs might
engage in similar shenanigans and want to have some way to unwind a
signal's return context without actually jumping there a la
siglongjmp().

Also, what is the intended setting of WR_SHSTK_EN with this patch set applied?

(I suppose we could just say that 32-bit processes should not use CET,
but that seems a bit sad.)

Yu-cheng Yu Nov. 26, 2018, 5:38 p.m. UTC | #2
On Thu, 2018-11-22 at 08:53 -0800, Andy Lutomirski wrote:
> [cc some more libc folks]
> 
> I have a general question about this patch set:
> 
> If I'm writing a user program, and I write a signal handler, there are
> two things I want to make sure I can still do:
> 
> 1. I want to be able to unwind directly from the signal handler
> without involving sigreturn() -- that is, I want to make sure that
> siglongjmp() works.  How does this work?  Is INCSSP involved?  How

Yes, siglongjmp() works by doing INCSSP.
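
For reference, the user-space pattern in question is the ordinary POSIX
one.  The sketch below is a minimal, non-CET-specific example with
illustrative names; with shadow stack enabled, the SSP adjustment is
expected to happen inside libc's siglongjmp(), not in the program.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf unwind_env;
static volatile sig_atomic_t handler_ran;

/* Leave the handler via siglongjmp() instead of returning (no sigreturn). */
static void usr1_handler(int sig)
{
    (void)sig;
    handler_ran = 1;
    siglongjmp(unwind_env, 1);
}

/*
 * Illustrative function: returns 1 if control came back through
 * siglongjmp(), 0 if the handler returned normally, -1 on setup failure.
 */
int unwind_from_handler(void)
{
    struct sigaction sa = {0};

    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    handler_ran = 0;
    /* savemask = 1 so the signal mask is restored after the jump. */
    if (sigsetjmp(unwind_env, 1) == 0) {
        raise(SIGUSR1);
        return 0;
    }
    return handler_ran ? 1 : -1;
}
```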

> exactly does the user program know how much to increment SSP by?  (And
> why on Earth does INCSSP only consider the low 8 bits of its argument?
>  That sounds like a mistake.  Can Intel still fix that?  On the other

glibc calculates how many frames need to be unwound and, when necessary,
issues INCSSP in batches of at most 255 frames.
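
The batching can be sketched roughly as below.  This is not the glibc
code; the function names are hypothetical, and the INCSSP instruction is
replaced by a callback so that the loop can run on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* INCSSP looks only at the low 8 bits of its operand. */
#define INCSSP_MAX_STEP 255UL

/* Stand-in for the real instruction: records how many frames were popped. */
static unsigned long total_popped;

static void fake_incssp(unsigned long nframes)
{
    total_popped += nframes;
}

/*
 * Hypothetical helper: pop `frames` shadow stack frames in steps of at
 * most 255 and return the number of steps taken.  `incssp` may be NULL,
 * in which case the steps are only counted.
 */
static unsigned long unwind_in_batches(unsigned long frames,
                                       void (*incssp)(unsigned long))
{
    unsigned long steps = 0;

    while (frames > 0) {
        unsigned long n = frames > INCSSP_MAX_STEP ? INCSSP_MAX_STEP : frames;

        assert(n >= 1 && n <= INCSSP_MAX_STEP);
        if (incssp != NULL)
            incssp(n);
        frames -= n;
        steps++;
    }
    return steps;
}
```

Unwinding 600 frames this way takes three steps: 255, 255, and 90.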

> hand, what happens if you INCSSP off the end of the shadow stack
> entirely?  I assume the next access will fault as long as there's an
> appropriate guard page.)

Yes, that is the case.

> 
> 2. I want to be able to modify the signal context from a signal
> handler such that, when the signal handler returns, it will return to
> a frame higher up on the call stack than where the signal started and
> to a different RIP value.  How can I do this?  I guess I can modify
> the shadow stack with WRSS if WR_SHSTK_EN=1, but how do I tell the
> kernel to kindly skip the frames I want to skip when I do sigreturn()?
> 
> The reason I'm asking #2 is that I think it's time to resurrect my old
> vDSO syscall cancellation helper series here:
> 
> https://lwn.net/Articles/679434/

If tools/testing/selftests/x86/unwind_vdso.c passes, can we say the kernel does
the right thing?  Or do you have other tests that I can run?

> 
> and it's not at all clear to me that __vdso_abort_pending_syscall()
> can work without kernel assistance when CET is enabled.  I want to
> make sure that it can be done, or I want to come up with some other
> way to allow a signal handler to abort a syscall while CET is on.  I
> could probably change __vdso_abort_pending_syscall() to instead point
> RIP to __kernel_vsyscall's epilogue so that we don't change the depth
> of the call stack.  But I could imagine that other user programs might
> engage in similar shenanigans and want to have some way to unwind a
> signal's return context without actually jumping there a la
> siglongjmp().
> 
> Also, what is the intended setting of WR_SHSTK_EN with this patch set applied?

This bit enables the WRSS instruction, which writes to kernel SHSTK.  This
patch set uses only WRUSS, and WR_SHSTK_EN is not set.

> 
> (I suppose we could just say that 32-bit processes should not use CET,
> but that seems a bit sad.)

They work in compat mode.  Should anything break, we can fix it.

Yu-cheng

Andy Lutomirski Nov. 26, 2018, 6:29 p.m. UTC | #3
On Mon, Nov 26, 2018 at 9:44 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> On Thu, 2018-11-22 at 08:53 -0800, Andy Lutomirski wrote:
> > [cc some more libc folks]

>
> >
> > 2. I want to be able to modify the signal context from a signal
> > handler such that, when the signal handler returns, it will return to
> > a frame higher up on the call stack than where the signal started and
> > to a different RIP value.  How can I do this?  I guess I can modify
> > the shadow stack with WRSS if WR_SHSTK_EN=1, but how do I tell the
> > kernel to kindly skip the frames I want to skip when I do sigreturn()?
> >
> > The reason I'm asking #2 is that I think it's time to resurrect my old
> > vDSO syscall cancellation helper series here:
> >
> > https://lwn.net/Articles/679434/
>
> If tools/testing/selftests/x86/unwind_vdso.c passes, can we say the kernel does
> the right thing?  Or do you have other tests that I can run?

I haven't written the relevant test yet.  Hopefully soon :)