[RFC,v2,05/27] Documentation/x86: Add CET description

Message ID 20180710222639.8241-6-yu-cheng.yu@intel.com (mailing list archive)
State New, archived

Commit Message

Yu, Yu-cheng July 10, 2018, 10:26 p.m. UTC
Explain how CET works and the no_cet_shstk/no_cet_ibt kernel
parameters.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |   6 +
 Documentation/x86/intel_cet.txt               | 250 ++++++++++++++++++
 2 files changed, 256 insertions(+)
 create mode 100644 Documentation/x86/intel_cet.txt

Comments

Pavel Machek July 11, 2018, 8:27 a.m. UTC | #1
On Tue 2018-07-10 15:26:17, Yu-cheng Yu wrote:
> Explain how CET works and the no_cet_shstk/no_cet_ibt kernel
> parameters.
> 

> --- /dev/null
> +++ b/Documentation/x86/intel_cet.txt
> @@ -0,0 +1,250 @@
> +=========================================
> +Control Flow Enforcement Technology (CET)
> +=========================================

We normally use .rst for this kind of formatted text.


> +[6] The implementation of the SHSTK
> +===================================
> +
> +SHSTK size
> +----------
> +
> +A task's SHSTK is allocated from memory to a fixed size that can
> +support 32 KB nested function calls; that is 256 KB for a 64-bit
> +application and 128 KB for a 32-bit application.  The system admin
> +can change the default size.

How does admin change that? We already have ulimit for stack size,
should those be somehow tied together?

$ ulimit -a
...
stack size              (kbytes, -s) 8192
Florian Weimer July 11, 2018, 9:57 a.m. UTC | #2
On 07/11/2018 12:26 AM, Yu-cheng Yu wrote:

> +To build a CET-enabled kernel, Binutils v2.30 and GCC v8.1 or later
> +are required.  To build a CET-enabled application, GLIBC v2.29 or
> +later is also requried.

Have you given up on getting the required changes into glibc 2.28?

Thanks,
Florian
H.J. Lu July 11, 2018, 1:47 p.m. UTC | #3
On Wed, Jul 11, 2018 at 2:57 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 07/11/2018 12:26 AM, Yu-cheng Yu wrote:
>
>> +To build a CET-enabled kernel, Binutils v2.30 and GCC v8.1 or later
>> +are required.  To build a CET-enabled application, GLIBC v2.29 or
>> +later is also requried.
>
>
> Have you given up on getting the required changes into glibc 2.28?
>

This is a typo.  We are still targeting for 2.28.  All pieces are there.
Yu, Yu-cheng July 11, 2018, 2:53 p.m. UTC | #4
On Wed, 2018-07-11 at 06:47 -0700, H.J. Lu wrote:
> On Wed, Jul 11, 2018 at 2:57 AM, Florian Weimer <fweimer@redhat.com>
> wrote:
> > 
> > On 07/11/2018 12:26 AM, Yu-cheng Yu wrote:
> > 
> > > 
> > > +To build a CET-enabled kernel, Binutils v2.30 and GCC v8.1 or
> > > later
> > > +are required.  To build a CET-enabled application, GLIBC v2.29
> > > or
> > > +later is also requried.
> > 
> > Have you given up on getting the required changes into glibc 2.28?
> > 
> This is a typo.  We are still targeting for 2.28.  All pieces are
> there.
> 

Ok, I will fix it.

Yu-cheng
Yu, Yu-cheng July 11, 2018, 3:25 p.m. UTC | #5
On Wed, 2018-07-11 at 10:27 +0200, Pavel Machek wrote:
> On Tue 2018-07-10 15:26:17, Yu-cheng Yu wrote:
> > 
> > Explain how CET works and the no_cet_shstk/no_cet_ibt kernel
> > parameters.
> > 
> > 
> > --- /dev/null
> > +++ b/Documentation/x86/intel_cet.txt
> > @@ -0,0 +1,250 @@
> > +=========================================
> > +Control Flow Enforcement Technology (CET)
> > +=========================================
> We normally use .rst for this kind of formatted text.

I will change this to a .rst file.

> 
> 
> > 
> > +[6] The implementation of the SHSTK
> > +===================================
> > +
> > +SHSTK size
> > +----------
> > +
> > +A task's SHSTK is allocated from memory to a fixed size that can
> > +support 32 KB nested function calls; that is 256 KB for a 64-bit
> > +application and 128 KB for a 32-bit application.  The system admin
> > +can change the default size.
> How does admin change that? We already have ulimit for stack size,
> should those be somehow tied together?
> 
> $ ulimit -a
> ...
> stack size              (kbytes, -s) 8192
> 

We can do that.  This makes sense to me.

Yu-cheng
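
Editorial note on the sizes discussed above: the 256 KB and 128 KB figures follow from one return address per nested call, so 32 K nested calls (the quoted text writes "32 KB") times 8 bytes for a 64-bit application, or times 4 bytes for a 32-bit one. A minimal sketch of that arithmetic, assuming nothing else is stored on the SHSTK; the helper name `shstk_bytes` is hypothetical and does not appear in the patch:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): bytes needed for a
 * SHSTK that holds one return address per nested call.
 *
 *   nested_calls - maximum call depth to support
 *   addr_bytes   - size of one return address (8 for 64-bit, 4 for 32-bit)
 */
static size_t shstk_bytes(size_t nested_calls, size_t addr_bytes)
{
    return nested_calls * addr_bytes;
}
```

For example, `shstk_bytes(32 * 1024, 8)` gives 262144 bytes (256 KB), which is far smaller than the 8192 KB program stack limit shown by `ulimit -s` above.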
Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index efc7aa7a0670..dc787facdcde 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2661,6 +2661,12 @@ 
 			noexec=on: enable non-executable mappings (default)
 			noexec=off: disable non-executable mappings
 
+	no_cet_ibt	[X86-64] Disable indirect branch tracking for user-mode
+			applications
+
+	no_cet_shstk	[X86-64] Disable shadow stack support for user-mode
+			applications
+
 	nosmap		[X86]
 			Disable SMAP (Supervisor Mode Access Prevention)
 			even if it is supported by processor.
diff --git a/Documentation/x86/intel_cet.txt b/Documentation/x86/intel_cet.txt
new file mode 100644
index 000000000000..974bb8262146
--- /dev/null
+++ b/Documentation/x86/intel_cet.txt
@@ -0,0 +1,250 @@ 
+=========================================
+Control Flow Enforcement Technology (CET)
+=========================================
+
+[1] Overview
+============
+
+Control Flow Enforcement Technology (CET) provides protection against
+return/jump-oriented programming (ROP) attacks.  It can be implemented
+to protect both the kernel and applications.  In the first phase,
+only the user-mode protection is implemented for the 64-bit kernel.
+32-bit applications are supported in compatibility
+mode.
+
+CET includes shadow stack (SHSTK) and indirect branch tracking (IBT)
+and they are enabled from two kernel configuration options:
+
+    INTEL_X86_SHADOW_STACK_USER, and
+    INTEL_X86_BRANCH_TRACKING_USER.
+
+To build a CET-enabled kernel, Binutils v2.30 and GCC v8.1 or later
+are required.  To build a CET-enabled application, GLIBC v2.29 or
+later is also required.
+
+There are two command-line options for disabling CET features:
+
+    no_cet_shstk - disables SHSTK, and
+    no_cet_ibt - disables IBT.
+
+At run time, /proc/cpuinfo shows the availability of SHSTK and IBT.
+
+[2] CET assembly instructions
+=============================
+
+RDSSP %r
+    Read the SHSTK pointer into %r.
+
+INCSSP %r
+    Unwind (increment) the SHSTK pointer (0 ~ 255) steps as indicated
+    in the operand register.  The GLIBC longjmp uses INCSSP to unwind
+    the SHSTK until that matches the program stack.  When it is
+    necessary to unwind beyond 255 steps, longjmp divides and repeats
+    the process.
+
+RSTORSSP (%r)
+    Switch to the SHSTK indicated in the 'restore token' pointed to by
+    the operand register and replace the 'restore token' with a new
+    token to be saved (with SAVEPREVSSP) for the outgoing SHSTK.
+
+                               Before RSTORSSP
+
+             Incoming SHSTK                   Current/Outgoing SHSTK
+
+        |----------------------|             |----------------------|
+ addr=x |                      |       ssp-> |                      |
+        |----------------------|             |----------------------|
+ (%r)-> | rstor_token=(x|Lg)   |    addr=y-8 |                      |
+        |----------------------|             |----------------------|
+
+                               After RSTORSSP
+
+        |----------------------|             |----------------------|
+  ssp-> |                      |             |                      |
+        |----------------------|             |----------------------|
+        | rstor_token=(y|Bz|Lg)|    addr=y-8 |                      |
+        |----------------------|             |----------------------|
+
+    note:
+        1. Only valid addresses and restore tokens can be on the
+           user-mode SHSTK.
+        2. A token is always of type u64 and must align to u64.
+        3. The incoming SHSTK pointer in a rstor_token must point to
+           immediately above the token.
+        4. 'Lg' is bit[0] of a rstor_token indicating a 64-bit SHSTK.
+        5. 'Bz' is bit[1] of a rstor_token indicating the token is to
+           be used only for the next SAVEPREVSSP and invalid for the
+           RSTORSSP.
+
+SAVEPREVSSP
+    Store the SHSTK 'restore token' pointed to by
+        (current_SHSTK_pointer + 8).
+
+                             After SAVEPREVSSP
+
+        |----------------------|             |----------------------|
+  ssp-> |                      |             |                      |
+        |----------------------|             |----------------------|
+        | rstor_token=(y|Bz|Lg)|    addr=y-8 | rstor_token(y|Lg)    |
+        |----------------------|             |----------------------|
+
+WRUSS %r0, (%r1)
+    Write the value in %r0 to the SHSTK address pointed to by (%r1).
+    This is a kernel-mode only instruction.
+
+ENDBR
+    The compiler inserts an ENDBR at all valid branch targets.  Any
+    CALL/JMP to a target without an ENDBR triggers a control
+    protection fault.
+
+[3] Application Enabling
+========================
+
+An application's CET capability is marked in its ELF header and can
+be verified from the following command output, in the
+NT_GNU_PROPERTY_TYPE_0 field:
+
+    readelf -n <application>
+
+If an application supports CET and is statically linked, it will run
+with CET protection.  If the application needs any shared libraries,
+the loader checks all dependencies and enables CET only when all
+requirements are met.
+
+[4] Legacy Libraries
+====================
+
+GLIBC provides a few tunables for backward compatibility.
+
+GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-IBT
+    Turn off SHSTK/IBT for the current shell.
+
+GLIBC_TUNABLES=glibc.tune.x86_shstk=<on, permissive>
+    This controls how dlopen() handles SHSTK legacy libraries:
+        on: continue with SHSTK enabled;
+        permissive: continue with SHSTK off.
+
+[5] CET system calls
+====================
+
+The following arch_prctl() system calls are added for CET:
+
+arch_prctl(ARCH_CET_STATUS, unsigned long *addr)
+    Return CET feature status.
+
+    The parameter 'addr' is a pointer to a user buffer.
+    On returning to the caller, the kernel fills the following
+    information:
+
+    *addr = SHSTK/IBT status
+    *(addr + 1) = SHSTK base address
+    *(addr + 2) = SHSTK size
+
+arch_prctl(ARCH_CET_DISABLE, unsigned long features)
+    Disable SHSTK and/or IBT specified in 'features'.  Return -EPERM
+    if CET is locked out.
+
+arch_prctl(ARCH_CET_LOCK)
+    Lock out CET feature.
+
+arch_prctl(ARCH_CET_ALLOC_SHSTK, unsigned long *addr)
+    Allocate a new SHSTK.
+
+    The parameter 'addr' is a pointer to a user buffer and indicates
+    the desired SHSTK size to allocate.  On returning to the caller
+    the buffer contains the address of the new SHSTK.
+
+arch_prctl(ARCH_CET_LEGACY_BITMAP, unsigned long *addr)
+    Allocate an IBT legacy code bitmap if the current task does not
+    have one.
+
+    The parameter 'addr' is a pointer to a user buffer.
+    On returning to the caller, the kernel fills the following
+    information:
+
+    *addr = IBT bitmap base address
+    *(addr + 1) = IBT bitmap size
+
+[6] The implementation of the SHSTK
+===================================
+
+SHSTK size
+----------
+
+A task's SHSTK is allocated from memory to a fixed size that can
+support 32 K nested function calls; that is 256 KB for a 64-bit
+application and 128 KB for a 32-bit application.  The system admin
+can change the default size.
+
+Signal
+------
+
+The main program and its signal handlers use the same SHSTK.  Because
+the SHSTK stores only return addresses, we can estimate a large
+enough SHSTK to cover the condition that both the program stack and
+the sigaltstack run out.
+
+The kernel creates a restore token at the SHSTK restoring address and
+verifies that token when restoring from the signal handler.
+
+Fork
+----
+
+The SHSTK's vma has the VM_SHSTK flag set; its PTEs are required to be
+read-only and dirty.  Unless a SHSTK PTE is present, RO, and dirty,
+a SHSTK access triggers a page fault with an additional SHSTK bit set
+in the page fault error code.
+
+When a task forks a child, its SHSTK PTEs are copied and both the
+parent's and the child's SHSTK PTEs are cleared of the dirty bit.
+Upon the next SHSTK access, the resulting SHSTK page fault is handled
+by page copy/re-use.
+
+When a pthread child is created, the kernel allocates a new SHSTK for
+the new thread.
+
+Setjmp/Longjmp
+--------------
+
+Longjmp unwinds SHSTK until it matches the program stack.
+
+Ucontext
+--------
+
+In GLIBC, getcontext/setcontext is implemented in a similar way to
+setjmp/longjmp.
+
+When makecontext creates a new ucontext, a new SHSTK is allocated for
+that context with the ARCH_CET_ALLOC_SHSTK syscall.  The kernel
+creates a restore token at the top of the new SHSTK and the user-mode
+code switches to the new SHSTK with the RSTORSSP instruction.
+
+[7] The management of read-only & dirty PTEs for SHSTK
+======================================================
+
+A RO and dirty PTE exists in the following cases:
+
+(a) A page is modified and then shared with a fork()'ed child;
+(b) A R/O page that has been COW'ed;
+(c) A SHSTK page.
+
+The processor only checks the dirty bit for (c).  To prevent the use
+of non-SHSTK memory as SHSTK, we use a spare bit of the 64-bit PTE as
+DIRTY_SW for (a) and (b) above.  This results in the following PTE
+settings:
+
+Modified PTE:             (R/W + DIRTY_HW)
+Modified and shared PTE:  (R/O + DIRTY_SW)
+R/O PTE, COW'ed:          (R/O + DIRTY_SW)
+SHSTK PTE:                (R/O + DIRTY_HW)
+SHSTK PTE, COW'ed:        (R/O + DIRTY_HW)
+SHSTK PTE, shared:        (R/O + DIRTY_SW)
+
+Note that DIRTY_SW is only used in R/O PTEs but not R/W PTEs.
+
+[8] The implementation of IBT
+=============================
+
+The kernel provides IBT support in mmap() of the legacy code bitmap.
+However, the management of the bitmap is done in GLIBC or the
+application.
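
Editorial note on the restore token described in section [2] of the patch: notes 2 to 5 under RSTORSSP pin down the token layout closely enough to sketch in C. The code below is an illustration only, not kernel or GLIBC code. It assumes a 64-bit SHSTK and reads "immediately above the token" as the token's own address plus 8. The names `TOKEN_LG`, `TOKEN_BZ`, `make_rstor_token` and `rstor_token_ok` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits from notes 4 and 5 of the patch text. */
#define TOKEN_LG 0x1ULL /* bit 0: token is for a 64-bit SHSTK */
#define TOKEN_BZ 0x2ULL /* bit 1: usable only by the next SAVEPREVSSP */

/*
 * Build the token that would be stored at token_addr.
 * Assumption: the SHSTK pointer recorded in the token is the slot
 * immediately above it, i.e. token_addr + 8 (note 3).
 */
static uint64_t make_rstor_token(uint64_t token_addr)
{
    return (token_addr + 8) | TOKEN_LG;
}

/*
 * Check whether a token found at token_addr would be acceptable to
 * RSTORSSP under the rules listed in the patch text.
 */
static bool rstor_token_ok(uint64_t token_addr, uint64_t token)
{
    if (token_addr & 7)         /* note 2: must be u64-aligned */
        return false;
    if (token & TOKEN_BZ)       /* note 5: Bz tokens are invalid here */
        return false;
    if (!(token & TOKEN_LG))    /* this sketch covers 64-bit only */
        return false;
    /* note 3: must point to immediately above the token */
    return (token & ~(TOKEN_LG | TOKEN_BZ)) == token_addr + 8;
}
```

With these definitions, `make_rstor_token(0x1000)` yields `0x1009`, and `rstor_token_ok(0x1000, 0x1009)` is true, while the same token with bit 1 set (`0x100b`) is rejected. The real checks are performed by the processor; this only mirrors the documented layout.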