[1/9] mm: Hardened usercopy

Message ID 1467843928-29351-2-git-send-email-keescook@chromium.org
State New

Commit Message

Kees Cook July 6, 2016, 10:25 p.m. UTC
This is the start of porting PAX_USERCOPY into the mainline kernel. This
is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The
work is based on code by PaX Team and Brad Spengler, and an earlier port
from Casey Schaufler. Additional non-slab page tests are from Rik van Riel.

This patch contains the logic for validating several conditions when
performing copy_to_user() and copy_from_user() on the kernel object
being copied to/from:
- address range doesn't wrap around
- address range isn't NULL or zero-allocated (with a non-zero copy size)
- if on the slab allocator:
  - copy size must be less than or equal to object size (when the check is
    implemented in the allocator, which appears in subsequent patches)
- otherwise, object must not span page allocations
- if on the stack:
  - object must not extend before/after the current process stack
  - object must be contained by the current stack frame (when there is
    arch/build support for identifying stack frames)
- object must not overlap with kernel text

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig                |   7 ++
 include/linux/slab.h        |  12 +++
 include/linux/thread_info.h |  15 +++
 mm/Makefile                 |   4 +
 mm/usercopy.c               | 239 ++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig            |  27 +++++
 6 files changed, 304 insertions(+)
 create mode 100644 mm/usercopy.c

Comments

Baruch Siach July 7, 2016, 5:37 a.m. UTC | #1
Hi Kees,

On Wed, Jul 06, 2016 at 03:25:20PM -0700, Kees Cook wrote:
> +#ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR

Should be CONFIG_HARDENED_USERCOPY to match the slab/slub implementation 
condition.

> +const char *__check_heap_object(const void *ptr, unsigned long n,
> +				struct page *page);
> +#else
> +static inline const char *__check_heap_object(const void *ptr,
> +					      unsigned long n,
> +					      struct page *page)
> +{
> +	return NULL;
> +}
> +#endif

baruch
Thomas Gleixner July 7, 2016, 7:42 a.m. UTC | #2
On Wed, 6 Jul 2016, Kees Cook wrote:
> +
> +#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
> +	const void *frame = NULL;
> +	const void *oldframe;
> +#endif

That's ugly

> +
> +	/* Object is not on the stack at all. */
> +	if (obj + len <= stack || stackend <= obj)
> +		return 0;
> +
> +	/*
> +	 * Reject: object partially overlaps the stack (passing the
> +	 * check above means at least one end is within the stack,
> +	 * so if this check fails, the other end is outside the stack).
> +	 */
> +	if (obj < stack || stackend < obj + len)
> +		return -1;
> +
> +#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
> +	oldframe = __builtin_frame_address(1);
> +	if (oldframe)
> +		frame = __builtin_frame_address(2);
> +	/*
> +	 * low ----------------------------------------------> high
> +	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> +	 *		     ^----------------^
> +	 *             allow copies only within here
> +	 */
> +	while (stack <= frame && frame < stackend) {
> +		/*
> +		 * If obj + len extends past the last frame, this
> +		 * check won't pass and the next frame will be 0,
> +		 * causing us to bail out and correctly report
> +		 * the copy as invalid.
> +		 */
> +		if (obj + len <= frame)
> +			return obj >= oldframe + 2 * sizeof(void *) ? 2 : -1;
> +		oldframe = frame;
> +		frame = *(const void * const *)frame;
> +	}
> +	return -1;
> +#else
> +	return 1;
> +#endif

I'd rather make that a weak function returning 1 which can be replaced by
x86 for CONFIG_FRAME_POINTER=y. That also allows other architectures to
implement their specific frame checks.

Thanks,

	tglx
Arnd Bergmann July 7, 2016, 8:01 a.m. UTC | #3
On Wednesday, July 6, 2016 3:25:20 PM CEST Kees Cook wrote:
> This is the start of porting PAX_USERCOPY into the mainline kernel. This
> is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The
> work is based on code by PaX Team and Brad Spengler, and an earlier port
> from Casey Schaufler. Additional non-slab page tests are from Rik van Riel.
> 
> This patch contains the logic for validating several conditions when
> performing copy_to_user() and copy_from_user() on the kernel object
> being copied to/from:
> - address range doesn't wrap around
> - address range isn't NULL or zero-allocated (with a non-zero copy size)
> - if on the slab allocator:
>   - copy size must be less than or equal to object size (when the check is
>     implemented in the allocator, which appears in subsequent patches)
> - otherwise, object must not span page allocations
> - if on the stack:
>   - object must not extend before/after the current process stack
>   - object must be contained by the current stack frame (when there is
>     arch/build support for identifying stack frames)
> - object must not overlap with kernel text
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Nice!

I have a few further thoughts, most of which have probably been
considered before:

> +static inline const char *check_bogus_address(const void *ptr, unsigned long n)
> +{
> +	/* Reject if object wraps past end of memory. */
> +	if (ptr + n < ptr)
> +		return "<wrapped address>";
> +
> +	/* Reject if NULL or ZERO-allocation. */
> +	if (ZERO_OR_NULL_PTR(ptr))
> +		return "<null>";
> +
> +	return NULL;
> +}

This checks against address (void *)16, but I guess on most architectures the
lowest possible kernel address is much higher. While there may not be much
to exploit if the expected kernel address points to userland, forbidding
any obviously incorrect address that is outside of the kernel may be easier.

Even on architectures like s390 that start the kernel memory at (void *)0x0,
the lowest address to which we may want to do a copy_to_user would be much
higher than (void *)16.

> +
> +	/* Allow kernel rodata region (if not marked as Reserved). */
> +	if (ptr >= (const void *)__start_rodata &&
> +	    end <= (const void *)__end_rodata)
> +		return NULL;

Should we explicitly forbid writing to rodata, or is it enough to
rely on page protection here?

> +	/* Allow kernel bss region (if not marked as Reserved). */
> +	if (ptr >= (const void *)__bss_start &&
> +	    end <= (const void *)__bss_stop)
> +		return NULL;

Accesses to .data/.rodata/.bss are probably not performance critical,
so we could go further here and check the kallsyms table to ensure
that we are not spanning multiple symbols.

For stuff that is performance critical, should there be a way to
opt out of the checks, or do we assume it already uses functions
that avoid the checks? I looked at the file and network I/O path
briefly and they seem to use kmap_atomic() to get to the user pages
at least in some of the common cases (but I may well be missing
important ones).

	Arnd
Rik van Riel July 7, 2016, 4:19 p.m. UTC | #4
On Wed, 2016-07-06 at 15:25 -0700, Kees Cook wrote:
> This is the start of porting PAX_USERCOPY into the mainline kernel. This
> is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The
> work is based on code by PaX Team and Brad Spengler, and an earlier port
> from Casey Schaufler. Additional non-slab page tests are from Rik van Riel.

Feel free to add my S-O-B for the code I wrote. The rest
looks good, too.

There may be some room for optimization later on, by putting
the most likely branches first, annotating with likely/unlikely,
etc, but I suspect the less likely checks are already towards
the ends of the functions.

Signed-off-by: Rik van Riel <riel@redhat.com>

> This patch contains the logic for validating several conditions when
> performing copy_to_user() and copy_from_user() on the kernel object
> being copied to/from:
> - address range doesn't wrap around
> - address range isn't NULL or zero-allocated (with a non-zero copy size)
> - if on the slab allocator:
>   - copy size must be less than or equal to object size (when the check is
>     implemented in the allocator, which appears in subsequent patches)
> - otherwise, object must not span page allocations
> - if on the stack:
>   - object must not extend before/after the current process stack
>   - object must be contained by the current stack frame (when there is
>     arch/build support for identifying stack frames)
> - object must not overlap with kernel text
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/Kconfig                |   7 ++
>  include/linux/slab.h        |  12 +++
>  include/linux/thread_info.h |  15 +++
>  mm/Makefile                 |   4 +
>  mm/usercopy.c               | 239 ++++++++++++++++++++++++++++++++++++++++++++
>  security/Kconfig            |  27 +++++
>  6 files changed, 304 insertions(+)
>  create mode 100644 mm/usercopy.c
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d794384a0404..3ea04d8dcf62 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -424,6 +424,13 @@ config CC_STACKPROTECTOR_STRONG
>  
>  endchoice
>  
> +config HAVE_ARCH_LINEAR_KERNEL_MAPPING
> +	bool
> +	help
> +	  An architecture should select this if it has a secondary linear
> +	  mapping of the kernel text. This is used to verify that kernel
> +	  text exposures are not visible under CONFIG_HARDENED_USERCOPY.
> +
>  config HAVE_CONTEXT_TRACKING
>  	bool
>  	help
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index aeb3e6d00a66..96a16a3fb7cb 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -155,6 +155,18 @@ void kfree(const void *);
>  void kzfree(const void *);
>  size_t ksize(const void *);
>  
> +#ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
> +const char *__check_heap_object(const void *ptr, unsigned long n,
> +				struct page *page);
> +#else
> +static inline const char *__check_heap_object(const void *ptr,
> +					      unsigned long n,
> +					      struct page *page)
> +{
> +	return NULL;
> +}
> +#endif
> +
>  /*
>   * Some archs want to perform DMA into kmalloc caches and need a guaranteed
>   * alignment larger than the alignment of a 64-bit integer.
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index b4c2a485b28a..a02200db9c33 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -146,6 +146,21 @@ static inline bool test_and_clear_restore_sigmask(void)
>  #error "no set_restore_sigmask() provided and default one won't work"
>  #endif
>  
> +#ifdef CONFIG_HARDENED_USERCOPY
> +extern void __check_object_size(const void *ptr, unsigned long n,
> +					bool to_user);
> +
> +static inline void check_object_size(const void *ptr, unsigned long n,
> +				     bool to_user)
> +{
> +	__check_object_size(ptr, n, to_user);
> +}
> +#else
> +static inline void check_object_size(const void *ptr, unsigned long n,
> +				     bool to_user)
> +{ }
> +#endif /* CONFIG_HARDENED_USERCOPY */
> +
>  #endif	/* __KERNEL__ */
>  
>  #endif /* _LINUX_THREAD_INFO_H */
> diff --git a/mm/Makefile b/mm/Makefile
> index 78c6f7dedb83..32d37247c7e5 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -21,6 +21,9 @@ KCOV_INSTRUMENT_memcontrol.o := n
>  KCOV_INSTRUMENT_mmzone.o := n
>  KCOV_INSTRUMENT_vmstat.o := n
>  
> +# Since __builtin_frame_address does work as used, disable the warning.
> +CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)
> +
>  mmu-y			:= nommu.o
>  mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mincore.o \
>  			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
> @@ -99,3 +102,4 @@ obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
>  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
>  obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
> +obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> new file mode 100644
> index 000000000000..ad2765dd6dc4
> --- /dev/null
> +++ b/mm/usercopy.c
> @@ -0,0 +1,239 @@
> +/*
> + * This implements the various checks for CONFIG_HARDENED_USERCOPY*,
> + * which are designed to protect kernel memory from needless exposure
> + * and overwrite under many unintended conditions. This code is based
> + * on PAX_USERCOPY, which is:
> + *
> + * Copyright (C) 2001-2016 PaX Team, Bradley Spengler, Open Source
> + * Security Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <asm/sections.h>
> +
> +/*
> + * Checks if a given pointer and length is contained by the current
> + * stack frame (if possible).
> + *
> + *	0: not at all on the stack
> + *	1: fully on the stack (when can't do frame-checking)
> + *	2: fully inside the current stack frame
> + *	-1: error condition (invalid stack position or bad stack frame)
> + */
> +static noinline int check_stack_object(const void *obj, unsigned long len)
> +{
> +	const void * const stack = task_stack_page(current);
> +	const void * const stackend = stack + THREAD_SIZE;
> +
> +#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
> +	const void *frame = NULL;
> +	const void *oldframe;
> +#endif
> +
> +	/* Object is not on the stack at all. */
> +	if (obj + len <= stack || stackend <= obj)
> +		return 0;
> +
> +	/*
> +	 * Reject: object partially overlaps the stack (passing the
> +	 * check above means at least one end is within the stack,
> +	 * so if this check fails, the other end is outside the stack).
> +	 */
> +	if (obj < stack || stackend < obj + len)
> +		return -1;
> +
> +#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
> +	oldframe = __builtin_frame_address(1);
> +	if (oldframe)
> +		frame = __builtin_frame_address(2);
> +	/*
> +	 * low ----------------------------------------------> high
> +	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> +	 *		     ^----------------^
> +	 *             allow copies only within here
> +	 */
> +	while (stack <= frame && frame < stackend) {
> +		/*
> +		 * If obj + len extends past the last frame, this
> +		 * check won't pass and the next frame will be 0,
> +		 * causing us to bail out and correctly report
> +		 * the copy as invalid.
> +		 */
> +		if (obj + len <= frame)
> +			return obj >= oldframe + 2 * sizeof(void *) ? 2 : -1;
> +		oldframe = frame;
> +		frame = *(const void * const *)frame;
> +	}
> +	return -1;
> +#else
> +	return 1;
> +#endif
> +}
> +
> +static void report_usercopy(const void *ptr, unsigned long len,
> +			    bool to_user, const char *type)
> +{
> +	pr_emerg("kernel memory %s attempt detected %s %p (%s) (%lu bytes)\n",
> +		to_user ? "exposure" : "overwrite",
> +		to_user ? "from" : "to", ptr, type ? : "unknown", len);
> +	dump_stack();
> +	do_group_exit(SIGKILL);
> +}
> +
> +/* Returns true if any portion of [ptr,ptr+n) overlaps with [low,high). */
> +static bool overlaps(const void *ptr, unsigned long n, unsigned long low,
> +		     unsigned long high)
> +{
> +	unsigned long check_low = (uintptr_t)ptr;
> +	unsigned long check_high = check_low + n;
> +
> +	/* Does not overlap if entirely above or entirely below. */
> +	if (check_low >= high || check_high < low)
> +		return false;
> +
> +	return true;
> +}
> +
> +/* Is this address range in the kernel text area? */
> +static inline const char *check_kernel_text_object(const void *ptr,
> +						   unsigned long n)
> +{
> +	unsigned long textlow = (unsigned long)_stext;
> +	unsigned long texthigh = (unsigned long)_etext;
> +
> +	if (overlaps(ptr, n, textlow, texthigh))
> +		return "<kernel text>";
> +
> +#ifdef CONFIG_HAVE_ARCH_LINEAR_KERNEL_MAPPING
> +	/* Check against linear mapping as well. */
> +	if (overlaps(ptr, n, (unsigned long)__va(__pa(textlow)),
> +		     (unsigned long)__va(__pa(texthigh))))
> +		return "<linear kernel text>";
> +#endif
> +
> +	return NULL;
> +}
> +
> +static inline const char *check_bogus_address(const void *ptr, unsigned long n)
> +{
> +	/* Reject if object wraps past end of memory. */
> +	if (ptr + n < ptr)
> +		return "<wrapped address>";
> +
> +	/* Reject if NULL or ZERO-allocation. */
> +	if (ZERO_OR_NULL_PTR(ptr))
> +		return "<null>";
> +
> +	return NULL;
> +}
> +
> +static inline const char *check_heap_object(const void *ptr, unsigned long n)
> +{
> +	struct page *page, *endpage;
> +	const void *end = ptr + n - 1;
> +
> +	if (!virt_addr_valid(ptr))
> +		return NULL;
> +
> +	page = virt_to_head_page(ptr);
> +
> +	/* Check slab allocator for flags and size. */
> +	if (PageSlab(page))
> +		return __check_heap_object(ptr, n, page);
> +
> +	/* Is the object wholly within one base page? */
> +	if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) ==
> +		   ((unsigned long)end & (unsigned long)PAGE_MASK)))
> +		return NULL;
> +
> +	/* Allow if start and end are inside the same compound page. */
> +	endpage = virt_to_head_page(end);
> +	if (likely(endpage == page))
> +		return NULL;
> +
> +	/* Allow special areas, device memory, and sometimes kernel data. */
> +	if (PageReserved(page) && PageReserved(endpage))
> +		return NULL;
> +
> +	/*
> +	 * Sometimes the kernel data regions are not marked Reserved. And
> +	 * sometimes [_sdata,_edata) does not cover rodata and/or bss,
> +	 * so check each range explicitly.
> +	 */
> +
> +	/* Allow kernel data region (if not marked as Reserved). */
> +	if (ptr >= (const void *)_sdata && end <= (const void *)_edata)
> +		return NULL;
> +
> +	/* Allow kernel rodata region (if not marked as Reserved). */
> +	if (ptr >= (const void *)__start_rodata &&
> +	    end <= (const void *)__end_rodata)
> +		return NULL;
> +
> +	/* Allow kernel bss region (if not marked as Reserved). */
> +	if (ptr >= (const void *)__bss_start &&
> +	    end <= (const void *)__bss_stop)
> +		return NULL;
> +
> +	/* Uh oh. The "object" spans several independently allocated pages. */
> +	return "<spans multiple pages>";
> +}
> +
> +/*
> + * Validates that the given object is one of:
> + * - known safe heap object
> + * - known safe stack object
> + * - not in kernel text
> + */
> +void __check_object_size(const void *ptr, unsigned long n, bool to_user)
> +{
> +	const char *err;
> +
> +	/* Skip all tests if size is zero. */
> +	if (!n)
> +		return;
> +
> +	/* Check for invalid addresses. */
> +	err = check_bogus_address(ptr, n);
> +	if (err)
> +		goto report;
> +
> +	/* Check for bad heap object. */
> +	err = check_heap_object(ptr, n);
> +	if (err)
> +		goto report;
> +
> +	/* Check for bad stack object. */
> +	switch (check_stack_object(ptr, n)) {
> +	case 0:
> +		/* Object is not touching the current process stack. */
> +		break;
> +	case 1:
> +	case 2:
> +		/*
> +		 * Object is either in the correct frame (when it
> +		 * is possible to check) or just generally on the
> +		 * process stack (when frame checking not available).
> +		 */
> +		return;
> +	default:
> +		err = "<process stack>";
> +		goto report;
> +	}
> +
> +	/* Check for object in kernel to avoid text exposure. */
> +	err = check_kernel_text_object(ptr, n);
> +	if (!err)
> +		return;
> +
> +report:
> +	report_usercopy(ptr, n, to_user, err);
> +}
> +EXPORT_SYMBOL(__check_object_size);
> diff --git a/security/Kconfig b/security/Kconfig
> index 176758cdfa57..63340ad0b9f9 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -118,6 +118,33 @@ config LSM_MMAP_MIN_ADDR
>  	  this low address space will need the permission specific to the
>  	  systems running LSM.
>  
> +config HAVE_HARDENED_USERCOPY_ALLOCATOR
> +	bool
> +	help
> +	  The heap allocator implements __check_heap_object() for
> +	  validating memory ranges against heap object sizes in
> +	  support of CONFIG_HARDENED_USERCOPY.
> +
> +config HAVE_ARCH_HARDENED_USERCOPY
> +	bool
> +	help
> +	  The architecture supports CONFIG_HARDENED_USERCOPY by
> +	  calling check_object_size() just before performing the
> +	  userspace copies in the low level implementation of
> +	  copy_to_user() and copy_from_user().
> +
> +config HARDENED_USERCOPY
> +	bool "Harden memory copies between kernel and userspace"
> +	depends on HAVE_ARCH_HARDENED_USERCOPY
> +	help
> +	  This option checks for obviously wrong memory regions when
> +	  copying memory to/from the kernel (via copy_to_user() and
> +	  copy_from_user() functions) by rejecting memory ranges that
> +	  are larger than the specified heap object, span multiple
> +	  separately allocated pages, are not on the process stack,
> +	  or are part of the kernel text. This kills entire classes
> +	  of heap overflow exploits and similar kernel memory exposures.
> +
>  source security/selinux/Kconfig
>  source security/smack/Kconfig
>  source security/tomoyo/Kconfig
Rik van Riel July 7, 2016, 4:35 p.m. UTC | #5
On Wed, 2016-07-06 at 15:25 -0700, Kees Cook wrote:

> +	/* Allow kernel rodata region (if not marked as Reserved).
> */
> +	if (ptr >= (const void *)__start_rodata &&
> +	    end <= (const void *)__end_rodata)
> +		return NULL;
> 
One comment here.

__check_object_size gets "to_user" as an argument.

It may make sense to pass that to check_heap_object, and
only allow copy_to_user from rodata, never copy_from_user,
since that section should be read only.

> +void __check_object_size(const void *ptr, unsigned long n, bool
> to_user)
> +{
>
Kees Cook July 7, 2016, 5:25 p.m. UTC | #6
On Thu, Jul 7, 2016 at 1:37 AM, Baruch Siach <baruch@tkos.co.il> wrote:
> Hi Kees,
>
> On Wed, Jul 06, 2016 at 03:25:20PM -0700, Kees Cook wrote:
>> +#ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
>
> Should be CONFIG_HARDENED_USERCOPY to match the slab/slub implementation
> condition.
>
>> +const char *__check_heap_object(const void *ptr, unsigned long n,
>> +                             struct page *page);
>> +#else
>> +static inline const char *__check_heap_object(const void *ptr,
>> +                                           unsigned long n,
>> +                                           struct page *page)
>> +{
>> +     return NULL;
>> +}
>> +#endif

Hmm, I think what I have is correct: if the allocator supports the
heap object checking, it defines __check_heap_object as existing via
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR. Whether usercopy checking is
done at all is controlled by CONFIG_HARDENED_USERCOPY.

I.e. you can have the other usercopy checks even if your allocator
doesn't support object size checking.

-Kees
Kees Cook July 7, 2016, 5:29 p.m. UTC | #7
On Thu, Jul 7, 2016 at 3:42 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 6 Jul 2016, Kees Cook wrote:
>> +
>> +#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
>> +     const void *frame = NULL;
>> +     const void *oldframe;
>> +#endif
>
> That's ugly

Yeah, I'd like to have this be controlled by a specific CONFIG, like I
invented for the linear mapping, but I wasn't sure what was the best
approach.

>
>> +
>> +     /* Object is not on the stack at all. */
>> +     if (obj + len <= stack || stackend <= obj)
>> +             return 0;
>> +
>> +     /*
>> +      * Reject: object partially overlaps the stack (passing the
>> +      * check above means at least one end is within the stack,
>> +      * so if this check fails, the other end is outside the stack).
>> +      */
>> +     if (obj < stack || stackend < obj + len)
>> +             return -1;
>> +
>> +#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
>> +     oldframe = __builtin_frame_address(1);
>> +     if (oldframe)
>> +             frame = __builtin_frame_address(2);
>> +     /*
>> +      * low ----------------------------------------------> high
>> +      * [saved bp][saved ip][args][local vars][saved bp][saved ip]
>> +      *                   ^----------------^
>> +      *             allow copies only within here
>> +      */
>> +     while (stack <= frame && frame < stackend) {
>> +             /*
>> +              * If obj + len extends past the last frame, this
>> +              * check won't pass and the next frame will be 0,
>> +              * causing us to bail out and correctly report
>> +              * the copy as invalid.
>> +              */
>> +             if (obj + len <= frame)
>> +                     return obj >= oldframe + 2 * sizeof(void *) ? 2 : -1;
>> +             oldframe = frame;
>> +             frame = *(const void * const *)frame;
>> +     }
>> +     return -1;
>> +#else
>> +     return 1;
>> +#endif
>
> I'd rather make that a weak function returning 1 which can be replaced by
> x86 for CONFIG_FRAME_POINTER=y. That also allows other architectures to
> implement their specific frame checks.

Yeah, I prefer CONFIG-controlled stuff over weak functions, but
I agree, something like arch_check_stack_frame(...) or similar. I'll
build something for this on the next revision.

-Kees
Kees Cook July 7, 2016, 5:37 p.m. UTC | #8
On Thu, Jul 7, 2016 at 4:01 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday, July 6, 2016 3:25:20 PM CEST Kees Cook wrote:
>> This is the start of porting PAX_USERCOPY into the mainline kernel. This
>> is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The
>> work is based on code by PaX Team and Brad Spengler, and an earlier port
>> from Casey Schaufler. Additional non-slab page tests are from Rik van Riel.
>>
>> This patch contains the logic for validating several conditions when
>> performing copy_to_user() and copy_from_user() on the kernel object
>> being copied to/from:
>> - address range doesn't wrap around
>> - address range isn't NULL or zero-allocated (with a non-zero copy size)
>> - if on the slab allocator:
>>   - copy size must be less than or equal to object size (when the check is
>>     implemented in the allocator, which appears in subsequent patches)
>> - otherwise, object must not span page allocations
>> - if on the stack:
>>   - object must not extend before/after the current process stack
>>   - object must be contained by the current stack frame (when there is
>>     arch/build support for identifying stack frames)
>> - object must not overlap with kernel text
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>
> Nice!
>
> I have a few further thoughts, most of which have probably been
> considered before:
>
>> +static inline const char *check_bogus_address(const void *ptr, unsigned long n)
>> +{
>> +     /* Reject if object wraps past end of memory. */
>> +     if (ptr + n < ptr)
>> +             return "<wrapped address>";
>> +
>> +     /* Reject if NULL or ZERO-allocation. */
>> +     if (ZERO_OR_NULL_PTR(ptr))
>> +             return "<null>";
>> +
>> +     return NULL;
>> +}
>
> This checks against address (void*)16, but I guess on most architectures the
> lowest possible kernel address is much higher. While there may not be much
> that to exploit if the expected kernel address points to userland, forbidding
> any obviously incorrect address that is outside of the kernel may be easier.
>
> Even on architectures like s390 that start the kernel memory at (void *)0x0,
> the lowest address to which we may want to do a copy_to_user would be much
> higher than (void*)0x16.

Yeah, that's worth exploring, but given the shenanigans around
set_fs(), I'd like to leave this as-is, and we can add to these checks
as we remove the insane usages of set_fs().

>> +
>> +     /* Allow kernel rodata region (if not marked as Reserved). */
>> +     if (ptr >= (const void *)__start_rodata &&
>> +         end <= (const void *)__end_rodata)
>> +             return NULL;
>
> Should we explicitly forbid writing to rodata, or is it enough to
> rely on page protection here?

Hm, interesting. That's a very small check to add. My knee-jerk is to
just leave it up to page protection. I'm on the fence. :)

>
>> +     /* Allow kernel bss region (if not marked as Reserved). */
>> +     if (ptr >= (const void *)__bss_start &&
>> +         end <= (const void *)__bss_stop)
>> +             return NULL;
>
> accesses to .data/.rodata/.bss are probably not performance critical,
> so we could go further here and check the kallsyms table to ensure
> that we are not spanning multiple symbols here.

Oh, interesting! Yeah, would you be willing to put together that patch
and test it? I wonder if there are any cases where there are
legitimate usercopys across multiple symbols.

> For stuff that is performance critical, should there be a way to
> opt out of the checks, or do we assume it already uses functions
> that avoid the checks? I looked at the file and network I/O path
> briefly and they seem to use kmap_atomic() to get to the user pages
> at least in some of the common cases (but I may well be missing
> important ones).

I don't want to start with an exemption here, so until such a case is
found, I'd rather leave this as-is. That said, the primary protection
here tends to be against buggy lengths (which is why put_user() and
get_user() are untouched). For constant-sized copies, some checks could
be skipped. In the second part of this protection (what I named
CONFIG_HARDENED_USERCOPY_WHITELIST in the RFC version of this series),
there are cases where we want to skip the whitelist checking, since it
is for a constant-sized copy that the code understands is okay to pull
out of an otherwise disallowed allocator object.

-Kees
Kees Cook July 7, 2016, 5:41 p.m. UTC | #9
On Thu, Jul 7, 2016 at 12:35 PM, Rik van Riel <riel@redhat.com> wrote:
> On Wed, 2016-07-06 at 15:25 -0700, Kees Cook wrote:
>>
>> +     /* Allow kernel rodata region (if not marked as Reserved).
>> */
>> +     if (ptr >= (const void *)__start_rodata &&
>> +         end <= (const void *)__end_rodata)
>> +             return NULL;
>>
> One comment here.
>
> __check_object_size gets "to_user" as an argument.
>
> It may make sense to pass that to check_heap_object, and
> only allow copy_to_user from rodata, never copy_from_user,
> since that section should be read only.

Well, that's two votes for this extra check, but I'm still not sure,
since it may already be allowed by the Reserved check. I can reorder
things to _reject_ rodata writes before the Reserved check, etc.

I'll see what could work here...

-Kees

>
>> +void __check_object_size(const void *ptr, unsigned long n, bool
>> to_user)
>> +{
>>
>
> --
>
> All Rights Reversed.
Baruch Siach July 7, 2016, 6:35 p.m. UTC | #10
Hi Kees,

On Thu, Jul 07, 2016 at 01:25:21PM -0400, Kees Cook wrote:
> On Thu, Jul 7, 2016 at 1:37 AM, Baruch Siach <baruch@tkos.co.il> wrote:
> > On Wed, Jul 06, 2016 at 03:25:20PM -0700, Kees Cook wrote:
> >> +#ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
> >
> > Should be CONFIG_HARDENED_USERCOPY to match the slab/slub implementation
> > condition.
> >
> >> +const char *__check_heap_object(const void *ptr, unsigned long n,
> >> +                             struct page *page);
> >> +#else
> >> +static inline const char *__check_heap_object(const void *ptr,
> >> +                                           unsigned long n,
> >> +                                           struct page *page)
> >> +{
> >> +     return NULL;
> >> +}
> >> +#endif
> 
> Hmm, I think what I have is correct: if the allocator supports the
> heap object checking, it defines __check_heap_object as existing via
> CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR. Whether usercopy checking is
> done at all is controlled by CONFIG_HARDENED_USERCOPY.
> 
> I.e. you can have the other usercopy checks even if your allocator
> doesn't support object size checking.

Right. I missed the fact that the usercopy.c build also depends on
CONFIG_HARDENED_USERCOPY. Sorry for the noise.

baruch
Thomas Gleixner July 7, 2016, 7:34 p.m. UTC | #11
On Thu, 7 Jul 2016, Kees Cook wrote:
> On Thu, Jul 7, 2016 at 3:42 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > I'd rather make that a weak function returning 1 which can be replaced by
> > x86 for CONFIG_FRAME_POINTER=y. That also allows other architectures to
> > implement their specific frame checks.
> 
> Yeah, though I prefer CONFIG-controlled stuff over weak functions, but
> I agree, something like arch_check_stack_frame(...) or similar. I'll
> build something for this on the next revision.

I'm fine with CONFIG-controlled as long as the ifdeffery is limited to header
files.

Thanks,

	tglx
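[Editor's note: a minimal userspace sketch of the weak-default pattern Thomas suggests. The name `arch_check_stack_frame` is illustrative, not from the patch: generic code provides a weak definition returning 1 ("on stack, frame not verifiable"), and an architecture links in a strong override with a real frame walker.]

```c
#include <stddef.h>

/* Weak generic fallback (GCC/Clang on ELF targets): an architecture
 * that can walk frame pointers supplies a strong definition of the
 * same symbol, and this default is dropped at link time. */
__attribute__((weak)) int arch_check_stack_frame(const void *obj,
						 unsigned long len)
{
	(void)obj;
	(void)len;
	return 1; /* cannot verify the frame; treat as "on the stack" */
}
```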
Michael Ellerman July 8, 2016, 5:34 a.m. UTC | #12
Kees Cook <keescook@chromium.org> writes:

> On Thu, Jul 7, 2016 at 4:01 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Wednesday, July 6, 2016 3:25:20 PM CEST Kees Cook wrote:
>>> +
>>> +     /* Allow kernel rodata region (if not marked as Reserved). */
>>> +     if (ptr >= (const void *)__start_rodata &&
>>> +         end <= (const void *)__end_rodata)
>>> +             return NULL;
>>
>> Should we explicitly forbid writing to rodata, or is it enough to
>> rely on page protection here?
>
> Hm, interesting. That's a very small check to add. My knee-jerk is to
> just leave it up to page protection. I'm on the fence. :)

There are platforms that don't have page protection, so it would be nice
if they could at least opt-in to checking for it here.

cheers
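[Editor's note: the opt-in check Michael asks for could look roughly like this standalone sketch. The name and parameters are illustrative; the kernel would use `__start_rodata`/`__end_rodata` directly and run this before the rodata exemption in check_heap_object(). Ranges are half-open, matching the patch's overlaps() convention.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Explicitly reject copy_from_user() destinations that overlap a
 * read-only region [rodata_start, rodata_end), for platforms whose
 * MMU cannot enforce the protection. Reads (to_user) stay allowed. */
static const char *check_rodata_write(const char *ptr, unsigned long n,
				      bool to_user,
				      const char *rodata_start,
				      const char *rodata_end)
{
	if (to_user)
		return NULL; /* exposing rodata is harmless */

	if (ptr < rodata_end && ptr + n > rodata_start)
		return "<rodata overwrite>";

	return NULL;
}
```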

Patch
diff mbox

diff --git a/arch/Kconfig b/arch/Kconfig
index d794384a0404..3ea04d8dcf62 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -424,6 +424,13 @@  config CC_STACKPROTECTOR_STRONG
 
 endchoice
 
+config HAVE_ARCH_LINEAR_KERNEL_MAPPING
+	bool
+	help
+	  An architecture should select this if it has a secondary linear
+	  mapping of the kernel text. This is used to verify that kernel
+	  text exposures are not visible under CONFIG_HARDENED_USERCOPY.
+
 config HAVE_CONTEXT_TRACKING
 	bool
 	help
diff --git a/include/linux/slab.h b/include/linux/slab.h
index aeb3e6d00a66..96a16a3fb7cb 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -155,6 +155,18 @@  void kfree(const void *);
 void kzfree(const void *);
 size_t ksize(const void *);
 
+#ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
+const char *__check_heap_object(const void *ptr, unsigned long n,
+				struct page *page);
+#else
+static inline const char *__check_heap_object(const void *ptr,
+					      unsigned long n,
+					      struct page *page)
+{
+	return NULL;
+}
+#endif
+
 /*
  * Some archs want to perform DMA into kmalloc caches and need a guaranteed
  * alignment larger than the alignment of a 64-bit integer.
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index b4c2a485b28a..a02200db9c33 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -146,6 +146,21 @@  static inline bool test_and_clear_restore_sigmask(void)
 #error "no set_restore_sigmask() provided and default one won't work"
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY
+extern void __check_object_size(const void *ptr, unsigned long n,
+					bool to_user);
+
+static inline void check_object_size(const void *ptr, unsigned long n,
+				     bool to_user)
+{
+	__check_object_size(ptr, n, to_user);
+}
+#else
+static inline void check_object_size(const void *ptr, unsigned long n,
+				     bool to_user)
+{ }
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 #endif	/* __KERNEL__ */
 
 #endif /* _LINUX_THREAD_INFO_H */
diff --git a/mm/Makefile b/mm/Makefile
index 78c6f7dedb83..32d37247c7e5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -21,6 +21,9 @@  KCOV_INSTRUMENT_memcontrol.o := n
 KCOV_INSTRUMENT_mmzone.o := n
 KCOV_INSTRUMENT_vmstat.o := n
 
+# Since __builtin_frame_address does work as used, disable the warning.
+CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)
+
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
@@ -99,3 +102,4 @@  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
+obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
diff --git a/mm/usercopy.c b/mm/usercopy.c
new file mode 100644
index 000000000000..ad2765dd6dc4
--- /dev/null
+++ b/mm/usercopy.c
@@ -0,0 +1,239 @@ 
+/*
+ * This implements the various checks for CONFIG_HARDENED_USERCOPY*,
+ * which are designed to protect kernel memory from needless exposure
+ * and overwrite under many unintended conditions. This code is based
+ * on PAX_USERCOPY, which is:
+ *
+ * Copyright (C) 2001-2016 PaX Team, Bradley Spengler, Open Source
+ * Security Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <asm/sections.h>
+
+/*
+ * Checks if a given pointer and length is contained by the current
+ * stack frame (if possible).
+ *
+ *	0: not at all on the stack
+ *	1: fully on the stack (when can't do frame-checking)
+ *	2: fully inside the current stack frame
+ *	-1: error condition (invalid stack position or bad stack frame)
+ */
+static noinline int check_stack_object(const void *obj, unsigned long len)
+{
+	const void * const stack = task_stack_page(current);
+	const void * const stackend = stack + THREAD_SIZE;
+
+#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
+	const void *frame = NULL;
+	const void *oldframe;
+#endif
+
+	/* Object is not on the stack at all. */
+	if (obj + len <= stack || stackend <= obj)
+		return 0;
+
+	/*
+	 * Reject: object partially overlaps the stack (passing the
+	 * check above means at least one end is within the stack,
+	 * so if this check fails, the other end is outside the stack).
+	 */
+	if (obj < stack || stackend < obj + len)
+		return -1;
+
+#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
+	oldframe = __builtin_frame_address(1);
+	if (oldframe)
+		frame = __builtin_frame_address(2);
+	/*
+	 * low ----------------------------------------------> high
+	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
+	 *		     ^----------------^
+	 *             allow copies only within here
+	 */
+	while (stack <= frame && frame < stackend) {
+		/*
+		 * If obj + len extends past the last frame, this
+		 * check won't pass and the next frame will be 0,
+		 * causing us to bail out and correctly report
+		 * the copy as invalid.
+		 */
+		if (obj + len <= frame)
+			return obj >= oldframe + 2 * sizeof(void *) ? 2 : -1;
+		oldframe = frame;
+		frame = *(const void * const *)frame;
+	}
+	return -1;
+#else
+	return 1;
+#endif
+}
+
+static void report_usercopy(const void *ptr, unsigned long len,
+			    bool to_user, const char *type)
+{
+	pr_emerg("kernel memory %s attempt detected %s %p (%s) (%lu bytes)\n",
+		to_user ? "exposure" : "overwrite",
+		to_user ? "from" : "to", ptr, type ? : "unknown", len);
+	dump_stack();
+	do_group_exit(SIGKILL);
+}
+
+/* Returns true if any portion of [ptr,ptr+n) overlaps with [low,high). */
+static bool overlaps(const void *ptr, unsigned long n, unsigned long low,
+		     unsigned long high)
+{
+	unsigned long check_low = (uintptr_t)ptr;
+	unsigned long check_high = check_low + n;
+
+	/* Does not overlap if entirely above or entirely below. */
+	if (check_low >= high || check_high <= low)
+		return false;
+
+	return true;
+}
+
+/* Is this address range in the kernel text area? */
+static inline const char *check_kernel_text_object(const void *ptr,
+						   unsigned long n)
+{
+	unsigned long textlow = (unsigned long)_stext;
+	unsigned long texthigh = (unsigned long)_etext;
+
+	if (overlaps(ptr, n, textlow, texthigh))
+		return "<kernel text>";
+
+#ifdef CONFIG_HAVE_ARCH_LINEAR_KERNEL_MAPPING
+	/* Check against linear mapping as well. */
+	if (overlaps(ptr, n, (unsigned long)__va(__pa(textlow)),
+		     (unsigned long)__va(__pa(texthigh))))
+		return "<linear kernel text>";
+#endif
+
+	return NULL;
+}
+
+static inline const char *check_bogus_address(const void *ptr, unsigned long n)
+{
+	/* Reject if object wraps past end of memory. */
+	if (ptr + n < ptr)
+		return "<wrapped address>";
+
+	/* Reject if NULL or ZERO-allocation. */
+	if (ZERO_OR_NULL_PTR(ptr))
+		return "<null>";
+
+	return NULL;
+}
+
+static inline const char *check_heap_object(const void *ptr, unsigned long n)
+{
+	struct page *page, *endpage;
+	const void *end = ptr + n - 1;
+
+	if (!virt_addr_valid(ptr))
+		return NULL;
+
+	page = virt_to_head_page(ptr);
+
+	/* Check slab allocator for flags and size. */
+	if (PageSlab(page))
+		return __check_heap_object(ptr, n, page);
+
+	/* Is the object wholly within one base page? */
+	if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) ==
+		   ((unsigned long)end & (unsigned long)PAGE_MASK)))
+		return NULL;
+
+	/* Allow if start and end are inside the same compound page. */
+	endpage = virt_to_head_page(end);
+	if (likely(endpage == page))
+		return NULL;
+
+	/* Allow special areas, device memory, and sometimes kernel data. */
+	if (PageReserved(page) && PageReserved(endpage))
+		return NULL;
+
+	/*
+	 * Sometimes the kernel data regions are not marked Reserved. And
+	 * sometimes [_sdata,_edata) does not cover rodata and/or bss,
+	 * so check each range explicitly.
+	 */
+
+	/* Allow kernel data region (if not marked as Reserved). */
+	if (ptr >= (const void *)_sdata && end <= (const void *)_edata)
+		return NULL;
+
+	/* Allow kernel rodata region (if not marked as Reserved). */
+	if (ptr >= (const void *)__start_rodata &&
+	    end <= (const void *)__end_rodata)
+		return NULL;
+
+	/* Allow kernel bss region (if not marked as Reserved). */
+	if (ptr >= (const void *)__bss_start &&
+	    end <= (const void *)__bss_stop)
+		return NULL;
+
+	/* Uh oh. The "object" spans several independently allocated pages. */
+	return "<spans multiple pages>";
+}
+
+/*
+ * Validates that the given object is one of:
+ * - known safe heap object
+ * - known safe stack object
+ * - not in kernel text
+ */
+void __check_object_size(const void *ptr, unsigned long n, bool to_user)
+{
+	const char *err;
+
+	/* Skip all tests if size is zero. */
+	if (!n)
+		return;
+
+	/* Check for invalid addresses. */
+	err = check_bogus_address(ptr, n);
+	if (err)
+		goto report;
+
+	/* Check for bad heap object. */
+	err = check_heap_object(ptr, n);
+	if (err)
+		goto report;
+
+	/* Check for bad stack object. */
+	switch (check_stack_object(ptr, n)) {
+	case 0:
+		/* Object is not touching the current process stack. */
+		break;
+	case 1:
+	case 2:
+		/*
+		 * Object is either in the correct frame (when it
+		 * is possible to check) or just generally on the
+		 * process stack (when frame checking not available).
+		 */
+		return;
+	default:
+		err = "<process stack>";
+		goto report;
+	}
+
+	/* Check for object in kernel to avoid text exposure. */
+	err = check_kernel_text_object(ptr, n);
+	if (!err)
+		return;
+
+report:
+	report_usercopy(ptr, n, to_user, err);
+}
+EXPORT_SYMBOL(__check_object_size);
diff --git a/security/Kconfig b/security/Kconfig
index 176758cdfa57..63340ad0b9f9 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -118,6 +118,33 @@  config LSM_MMAP_MIN_ADDR
 	  this low address space will need the permission specific to the
 	  systems running LSM.
 
+config HAVE_HARDENED_USERCOPY_ALLOCATOR
+	bool
+	help
+	  The heap allocator implements __check_heap_object() for
+	  validating memory ranges against heap object sizes in
+	  support of CONFIG_HARDENED_USERCOPY.
+
+config HAVE_ARCH_HARDENED_USERCOPY
+	bool
+	help
+	  The architecture supports CONFIG_HARDENED_USERCOPY by
+	  calling check_object_size() just before performing the
+	  userspace copies in the low level implementation of
+	  copy_to_user() and copy_from_user().
+
+config HARDENED_USERCOPY
+	bool "Harden memory copies between kernel and userspace"
+	depends on HAVE_ARCH_HARDENED_USERCOPY
+	help
+	  This option checks for obviously wrong memory regions when
+	  copying memory to/from the kernel (via copy_to_user() and
+	  copy_from_user() functions) by rejecting memory ranges that
+	  are larger than the specified heap object, span multiple
+	  separately allocated pages, are not on the process stack,
+	  or are part of the kernel text. This kills entire classes
+	  of heap overflow exploits and similar kernel memory exposures.
+
 source security/selinux/Kconfig
 source security/smack/Kconfig
 source security/tomoyo/Kconfig
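[Editor's note: for context on HAVE_ARCH_HARDENED_USERCOPY, the arch-side wiring can be mocked in userspace like this. All names are stand-ins; a real architecture calls check_object_size() on the kernel-side buffer in its low-level copy_to_user()/copy_from_user() paths.]

```c
#include <string.h>

/* Stand-in for the real checker: validates the kernel object's
 * heap/stack/text bounds in the actual patch, a no-op here. */
static void check_object_size(const void *ptr, unsigned long n,
			      int to_user)
{
	(void)ptr;
	(void)n;
	(void)to_user;
}

/* Mocked hook point: validate the kernel object just before the
 * raw copy, exactly where the Kconfig help says the arch calls it. */
static unsigned long copy_to_user_sketch(void *to, const void *from,
					 unsigned long n)
{
	check_object_size(from, n, 1); /* 'from' is the kernel object */
	memcpy(to, from, n);           /* stands in for the user copy */
	return 0;                      /* 0 = all bytes copied */
}
```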