From patchwork Wed Jul 11 20:36:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Popov X-Patchwork-Id: 10520475 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 30B9B600CA for ; Wed, 11 Jul 2018 20:37:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0EF2A206E2 for ; Wed, 11 Jul 2018 20:37:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0208F28DE3; Wed, 11 Jul 2018 20:37:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.wl.linuxfoundation.org (Postfix) with SMTP id 49700206E2 for ; Wed, 11 Jul 2018 20:37:45 +0000 (UTC) Received: (qmail 1636 invoked by uid 550); 11 Jul 2018 20:37:35 -0000 Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Delivered-To: mailing list kernel-hardening@lists.openwall.com Received: (qmail 1555 invoked from network); 11 Jul 2018 20:37:35 -0000 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=lIy3cXJUGpw7RQnqTKoxC7WQS+2zc8DiaXlIHJUVmwM=; b=a02T+vEb0rGXOxjm9Mbj+viMdk19lOyAuxL0c0Xc4lOgWqZ8kRQeVyVf1s4KneftHl guNX3SvFVdwFP7c/Q2u3rfNNlcyJ7blcrqd4npmxhu0/Bpa1ECrQVW8vHG6NpHwH9Tqs E/a03Vy/QtcVday42KwyMm2sMd+IEOqrTmD7PJdyak6DdMteI4rHLgbeuIM9wSa1nWos elHs1NgT9nkSz0weotOihblzziIu6uvTNgZnxMrJzngwAdiHPnYv/OcAGNNDQS+PJf5f QPfHN9x8I4O1VeI8apS1lryYqpFHBV55HnKu6HZTEJZNPMD3enmCXODditXEKhWNosDA OsXg== X-Gm-Message-State: AOUpUlE8OfTOcweBZSpu5NDo6q48w0z7eOHMH5MM4yFrbsMfhWpseM10 4hL1CBi6j2P4ncCfevl0ySojs8mlT6E= X-Google-Smtp-Source: AAOMgpejKvw44ZueQB7Ru8S3+eXY9K9MvgvAm/hGlaFcSAjf72E97JSTKP2saigv3lcUlnj7+rRFdg== X-Received: by 2002:a19:ec04:: with SMTP id b4-v6mr79654lfa.91.1531341443434; Wed, 11 Jul 2018 13:37:23 -0700 (PDT) From: Alexander Popov To: kernel-hardening@lists.openwall.com, Kees Cook , PaX Team , Brad Spengler , Ingo Molnar , Andy Lutomirski , Tycho Andersen , Laura Abbott , Mark Rutland , Ard Biesheuvel , Borislav Petkov , Richard Sandiford , Thomas Gleixner , "H . Peter Anvin" , Peter Zijlstra , "Dmitry V . Levin" , Emese Revfy , Jonathan Corbet , Andrey Ryabinin , "Kirill A . Shutemov" , Thomas Garnier , Andrew Morton , Alexei Starovoitov , Josef Bacik , Masami Hiramatsu , Nicholas Piggin , Al Viro , "David S . Miller" , Ding Tianhong , David Woodhouse , Josh Poimboeuf , Steven Rostedt , Dominik Brodowski , Juergen Gross , Linus Torvalds , Greg Kroah-Hartman , Dan Williams , Dave Hansen , Mathias Krause , Vikas Shivappa , Kyle Huey , Dmitry Safonov , Will Deacon , Arnd Bergmann , Florian Weimer , Boris Lukashev , Andrey Konovalov , x86@kernel.org, linux-kernel@vger.kernel.org, alex.popov@linux.com Subject: [PATCH v14 2/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls Date: Wed, 11 Jul 2018 23:36:36 +0300 Message-Id: <1531341400-12077-3-git-send-email-alex.popov@linux.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1531341400-12077-1-git-send-email-alex.popov@linux.com> References: <1531341400-12077-1-git-send-email-alex.popov@linux.com> X-Virus-Scanned: ClamAV using ClamSMTP The STACKLEAK feature erases the kernel stack before returning from syscalls. That reduces the information which kernel stack leak bugs can reveal and blocks some uninitialized stack variable attacks. Moreover, STACKLEAK blocks kernel stack depth overflow caused by alloca(), aka Stack Clash attack. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Signed-off-by: Alexander Popov Acked-by: Thomas Gleixner Reviewed-by: Dave Hansen --- Documentation/x86/x86_64/mm.txt | 2 ++ arch/Kconfig | 27 +++++++++++++++++ arch/x86/Kconfig | 1 + arch/x86/entry/calling.h | 14 +++++++++ arch/x86/entry/entry_32.S | 7 +++++ arch/x86/entry/entry_64.S | 3 ++ arch/x86/entry/entry_64_compat.S | 5 ++++ include/linux/sched.h | 4 +++ include/linux/stackleak.h | 23 +++++++++++++++ kernel/Makefile | 4 +++ kernel/fork.c | 3 ++ kernel/stackleak.c | 62 ++++++++++++++++++++++++++++++++++++++++ 12 files changed, 155 insertions(+) create mode 100644 include/linux/stackleak.h create mode 100644 kernel/stackleak.c diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index 5432a96..600bc2a 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -24,6 +24,7 @@ ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space [fixmap start] - ffffffffff5fffff kernel-internal fixmap range ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole +STACKLEAK_POISON value in this last hole: ffffffffffff4111 Virtual memory map with 5 level page tables: @@ -50,6 +51,7 @@ ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space [fixmap start] - ffffffffff5fffff kernel-internal fixmap range ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole +STACKLEAK_POISON value in this last hole: ffffffffffff4111 Architecture defines a 64-bit virtual address. Implementations can support less. Currently supported are 48- and 57-bit virtual addresses. Bits 63 diff --git a/arch/Kconfig b/arch/Kconfig index 1aa5906..07c8929 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -414,6 +414,13 @@ config PLUGIN_HOSTCC Host compiler used to build GCC plugins. This can be $(HOSTCXX), $(HOSTCC), or a null string if GCC plugin is unsupported. +config HAVE_ARCH_STACKLEAK + bool + help + An architecture should select this if it has the code which + fills the used part of the kernel stack with the STACKLEAK_POISON + value before returning from system calls. + config HAVE_GCC_PLUGINS bool help @@ -549,6 +556,26 @@ config GCC_PLUGIN_RANDSTRUCT_PERFORMANCE in structures. This reduces the performance hit of RANDSTRUCT at the cost of weakened randomization. +config GCC_PLUGIN_STACKLEAK + bool "Erase the kernel stack before returning from syscalls" + depends on GCC_PLUGINS + depends on HAVE_ARCH_STACKLEAK + help + This option makes the kernel erase the kernel stack before + returning from system calls. That reduces the information which + kernel stack leak bugs can reveal and blocks some uninitialized + stack variable attacks. This option also blocks kernel stack depth + overflow caused by alloca(), aka Stack Clash attack. + + The tradeoff is the performance impact: on a single CPU system kernel + compilation sees a 1% slowdown, other systems and workloads may vary + and you are advised to test this feature on your expected workload + before deploying it. + + This plugin was ported from grsecurity/PaX. More information at: + * https://grsecurity.net/ + * https://pax.grsecurity.net/ + config HAVE_STACKPROTECTOR bool help diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f1dbb4e..3718d70 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -125,6 +125,7 @@ config X86 select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT select HAVE_ARCH_SECCOMP_FILTER select HAVE_ARCH_THREAD_STRUCT_WHITELIST + select HAVE_ARCH_STACKLEAK select HAVE_ARCH_TRACEHOOK select HAVE_ARCH_TRANSPARENT_HUGEPAGE select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64 diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 352e70c..20d0885 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -329,8 +329,22 @@ For 32-bit we have the following conventions - kernel is built with #endif +.macro STACKLEAK_ERASE_NOCLOBBER +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK + PUSH_AND_CLEAR_REGS + call stackleak_erase + POP_REGS +#endif +.endm + #endif /* CONFIG_X86_64 */ +.macro STACKLEAK_ERASE +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK + call stackleak_erase +#endif +.endm + /* * This does 'call enter_from_user_mode' unless we can avoid it based on * kernel config or using the static jump infrastructure. diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index c371bfe..8dc1580 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -46,6 +46,8 @@ #include #include +#include "calling.h" + .section .entry.text, "ax" /* @@ -298,6 +300,7 @@ ENTRY(ret_from_fork) /* When we fork, we trace the syscall return in the child, too. */ movl %esp, %eax call syscall_return_slowpath + STACKLEAK_ERASE jmp restore_all /* kernel thread */ @@ -458,6 +461,8 @@ ENTRY(entry_SYSENTER_32) ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \ "jmp .Lsyscall_32_done", X86_FEATURE_XENPV + STACKLEAK_ERASE + /* Opportunistic SYSEXIT */ TRACE_IRQS_ON /* User mode traces as IRQs on. */ movl PT_EIP(%esp), %edx /* pt_regs->ip */ @@ -544,6 +549,8 @@ ENTRY(entry_INT80_32) call do_int80_syscall_32 .Lsyscall_32_done: + STACKLEAK_ERASE + restore_all: TRACE_IRQS_IRET .Lrestore_all_notrace: diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 73a522d..408101b 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -329,6 +329,8 @@ syscall_return_via_sysret: * We are on the trampoline stack. All regs except RDI are live. * We can do future final exit work right here. */ + STACKLEAK_ERASE_NOCLOBBER + SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi popq %rdi @@ -687,6 +689,7 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode) * We are on the trampoline stack. All regs except RDI are live. * We can do future final exit work right here. */ + STACKLEAK_ERASE_NOCLOBBER SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 7d0df78..8eaf895 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -261,6 +261,11 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe) /* Opportunistic SYSRET */ sysret32_from_system_call: + /* + * We are not going to return to userspace from the trampoline + * stack. So let's erase the thread stack right now. + */ + STACKLEAK_ERASE TRACE_IRQS_ON /* User mode traces as IRQs on. */ movq RBX(%rsp), %rbx /* pt_regs->rbx */ movq RBP(%rsp), %rbp /* pt_regs->rbp */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 43731fe..bf773d1 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1180,6 +1180,10 @@ struct task_struct { void *security; #endif +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK + unsigned long lowest_stack; +#endif + /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h new file mode 100644 index 0000000..d2560159 --- /dev/null +++ b/include/linux/stackleak.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_STACKLEAK_H +#define _LINUX_STACKLEAK_H + +#include +#include + +/* + * Check that the poison value points to the unused hole in the + * virtual memory map for your platform. + */ +#define STACKLEAK_POISON -0xBEEF + +#define STACKLEAK_SEARCH_DEPTH 128 + +static inline void stackleak_task_init(struct task_struct *t) +{ +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK + t->lowest_stack = (unsigned long)end_of_stack(t) + sizeof(unsigned long); +#endif +} + +#endif diff --git a/kernel/Makefile b/kernel/Makefile index 04bc07c..e5dc293 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -117,6 +117,10 @@ obj-$(CONFIG_HAS_IOMEM) += iomem.o obj-$(CONFIG_ZONE_DEVICE) += memremap.o obj-$(CONFIG_RSEQ) += rseq.o +obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o +KASAN_SANITIZE_stackleak.o := n +KCOV_INSTRUMENT_stackleak.o := n + $(obj)/configs.o: $(obj)/config_data.h targets += config_data.gz diff --git a/kernel/fork.c b/kernel/fork.c index 9440d61..1bc86ba 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -91,6 +91,7 @@ #include #include #include +#include #include #include @@ -1813,6 +1814,8 @@ static __latent_entropy struct task_struct *copy_process( if (retval) goto bad_fork_cleanup_io; + stackleak_task_init(p); + if (pid != &init_struct_pid) { pid = alloc_pid(p->nsproxy->pid_ns_for_children); if (IS_ERR(pid)) { diff --git a/kernel/stackleak.c b/kernel/stackleak.c new file mode 100644 index 0000000..bc3ad49 --- /dev/null +++ b/kernel/stackleak.c @@ -0,0 +1,62 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This code fills the used part of the kernel stack with a poison value + * before returning to userspace. It's part of the STACKLEAK feature + * ported from grsecurity/PaX. + * + * Author: Alexander Popov + * + * STACKLEAK reduces the information which kernel stack leak bugs can + * reveal and blocks some uninitialized stack variable attacks. Moreover, + * STACKLEAK blocks stack depth overflow caused by alloca(), aka Stack Clash + * attack. + */ + +#include + +asmlinkage void stackleak_erase(void) +{ + /* It would be nice not to have 'kstack_ptr' and 'boundary' on stack */ + unsigned long kstack_ptr = current->lowest_stack; + unsigned long boundary = kstack_ptr & ~(THREAD_SIZE - 1); + unsigned int poison_count = 0; + const unsigned int depth = STACKLEAK_SEARCH_DEPTH / sizeof(unsigned long); + + /* Search for the poison value in the kernel stack */ + while (kstack_ptr > boundary && poison_count <= depth) { + if (*(unsigned long *)kstack_ptr == STACKLEAK_POISON) + poison_count++; + else + poison_count = 0; + + kstack_ptr -= sizeof(unsigned long); + } + + /* + * One 'long int' at the bottom of the thread stack is reserved and + * should not be poisoned (see CONFIG_SCHED_STACK_END_CHECK=y). + */ + if (kstack_ptr == boundary) + kstack_ptr += sizeof(unsigned long); + + /* + * Now write the poison value to the kernel stack. Start from + * 'kstack_ptr' and move up till the new 'boundary'. We assume that + * the stack pointer doesn't change when we write poison. + */ + if (on_thread_stack()) + boundary = current_stack_pointer; + else + boundary = current_top_of_stack(); + + BUG_ON(boundary - kstack_ptr >= THREAD_SIZE); + + while (kstack_ptr < boundary) { + *(unsigned long *)kstack_ptr = STACKLEAK_POISON; + kstack_ptr += sizeof(unsigned long); + } + + /* Reset the 'lowest_stack' value for the next syscall */ + current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64; +} +