[4/6] Protectable Memory

Message ID	20180124175631.22925-5-igor.stoppa@huawei.com (mailing list archive)
State	New, archived
Headers
Return-Path: <kernel-hardening-return-11408-patchwork-kernel-hardening=patchwork.kernel.org@lists.openwall.com>
Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm
Precedence: bulk
From: Igor Stoppa <igor.stoppa@huawei.com>
To: <jglisse@redhat.com>, <keescook@chromium.org>, <mhocko@kernel.org>, <labbott@redhat.com>, <hch@infradead.org>, <willy@infradead.org>
CC: <cl@linux.com>, <linux-security-module@vger.kernel.org>, <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>, <kernel-hardening@lists.openwall.com>, Igor Stoppa <igor.stoppa@huawei.com>
Date: Wed, 24 Jan 2018 19:56:29 +0200
Message-ID: <20180124175631.22925-5-igor.stoppa@huawei.com>
In-Reply-To: <20180124175631.22925-1-igor.stoppa@huawei.com>
References: <20180124175631.22925-1-igor.stoppa@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain
Subject: [kernel-hardening] [PATCH 4/6] Protectable Memory

On Thu, Jan 25, 2018 at 6:59 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>
> Hi,
>
> thanks for the review. My reply below.
>
> On 24/01/18 21:10, Jann Horn wrote:
>
> > I'm not entirely convinced by the approach of marking small parts of
> > kernel memory as readonly for hardening.
>
> Because of the physmap you mention later?
>
> Regarding small parts vs big parts (what is big enough?) I did propose
> the use of a custom zone at the very beginning, however I met 2 objections:
>
> 1. It's not a special case and there was no will to reserve another zone
>    This might be mitigated by aliasing with a zone that is already
>    defined, but not in use. For example DMA or DMA32.
>    But it looks like a good way to replicate the confusion that is page
>    struct. Anyway, I found the next objection more convincing.
>
> 2. What would be the size of this zone? It would become something that
>    is really application specific. At the very least it should become a
>    command line parameter. A distro would have to allocate a lot of
>    memory for it, because it cannot really know upfront what its users
>    will do. But, most likely, the vast majority of users would never
>    need that much.
>
> If you have some idea of how to address these objections without using
> vmalloc, or at least without using the same page provider that vmalloc
> is using now, I'd be interested to hear it.
>
> Besides the double mapping problem, the major benefit I can see from
> having a contiguous area is that it simplifies the hardened user copy
> verification, because there is a fixed range to test for overlap.
>
> > Comments on some details are inline.
>
> thank you
>
> >> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> >> index 1e5d8c3..116d280 100644
> >> --- a/include/linux/vmalloc.h
> >> +++ b/include/linux/vmalloc.h
> >> @@ -20,6 +20,7 @@ struct notifier_block; /* in notifier.h */
> >> #define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
> >> #define VM_NO_GUARD 0x00000040 /* don't add guard page */
> >> #define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
> >> +#define VM_PMALLOC 0x00000100 /* pmalloc area - see docs */
> >
> > Is "see docs" specific enough to actually guide the reader to the
> > right documentation?
>
> The doc file is named pmalloc.txt, but I can be more explicit.
>
> >> +#define pmalloc_attr_init(data, attr_name) \
> >> +do { \
> >> + sysfs_attr_init(&data->attr_##attr_name.attr); \
> >> + data->attr_##attr_name.attr.name = #attr_name; \
> >> + data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
> >> + data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
> >> +} while (0)
> >
> > Is there a good reason for making all these files mode 0444 (as
> > opposed to setting them to 0400 and then allowing userspace to make
> > them accessible if desired)? /proc/slabinfo contains vaguely similar
> > data and is mode 0400 (or mode 0600, depending on the kernel config)
> > AFAICS.
>
> ok, you do have a point, so far I have been mostly focusing on the
> "drop-in replacement for kmalloc" aspect.
>
> >> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> >> +{
> > [...]
> >> + /* Expand pool */
> >> + chunk_size = roundup(size, PAGE_SIZE);
> >> + chunk = vmalloc(chunk_size);
> >
> > You're allocating with vmalloc(), which, as far as I know, establishes
> > a second mapping in the vmalloc area for pages that are already mapped
> > as RW through the physmap. AFAICS, later, when you're trying to make
> > pages readonly, you're only changing the protections on the second
> > mapping in the vmalloc area, therefore leaving the memory writable
> > through the physmap. Is that correct? If so, please either document
> > the reasoning why this is okay or change it.
>
> About why vmalloc as backend for pmalloc, please refer to this:
>
> http://www.openwall.com/lists/kernel-hardening/2018/01/24/11
>
> I tried to give a short summary of what took me toward vmalloc.
> vmalloc is also a convenient way of obtaining arbitrarily (within
> reason) large amounts of virtually contiguous memory.
>
> Your objection is toward the unprotected access, through the alternate
> mapping, rather than to the idea of having pools that can be protected
> individually, right?
>
> In the mail I linked, I explained that I could not use kmalloc because
> of the problem of splitting huge pages on ARM.
>
> kmalloc does require the physmap, for performance reason.
>
> However, vmalloc is already doing mapping of individual pages, because
> it must ensure that they are virtually contiguous, so would it be
> possible to have vmalloc _always_ outside of the physmap?
>
> If I have understood correctly, the actual extension of physmap is
> highly architecture and platform dependant, so it might be (but I have
> not checked) that in some cases (like some 32bit systems) vmalloc is
> typically outside of physmap, but probably that is not the case on 64bit?
>
> Also, I need to understand how physmap works against vmalloc vs how it
> works against kernel text and const/__ro_after_init sections.
>
> Can they also be accessed (and written?) through the physmap?
>
> But, to take a different angle: if an attacker knows where kernel
> symbols are and has gained capability to write at arbitrary location(s)
> in kernel data, what prevents a modification of mappings and permissions?
>
> What is considered robust enough?
>
> I have the impression that, without support from HW, to have some
> one-way mechanism that protects some page permanently, it's always
> possible to undo the various protections we are talking about, only harder.
>
> From the perspective of protecting against accidental overwrites,
> instead, the current implementation should be ok, since it's less likely
> that some stray pointer happens to assume a value that goes through the
> physmap.
>
> But I'm interested to hear, if you have some suggestion about how to
> prevent the side access through the physmap.
>
> --
> thanks, igor

DMA/physmap access coupled with a knowledge of which virtual mappings
are in the physical space should be enough for an attacker to bypass
the gating mechanism this work imposes. Not trivial, but not impossible.

Since there's no way to prevent that sort of access in current hardware
(especially something like a NIC or GPU working independently of the
CPU altogether), we have the option of checking contents of a sealed
page against a checksum/hash of the page prior to returning its contents
to the caller (since it needs to be read to be verified), or some other
mechanism within the read path to ensure that no event since the last
read affected the page/allocation. If the structure containing the list
of verifiers is separate from the page, the attacker needs to resolve
and change the contents of those signatures for the pages they're
affecting via DMA before the kernel checks one against the other in the
read path. I can't speak to overhead, but it should complicate the logic
of a successful attack chain.

Off the cuff, if the allocator sums the contents when sealing a page,
stores it in a lookup table, and forces verification on every
read/lookup, it should prevent _use_ of memory which was modified unless
our attacker is clever enough to fix that up prior to the next access.
Since it's write-once memory, race conditions on subsequent access
shouldn't be a problem.
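
To make the seal-then-verify idea above concrete, here is a minimal
userspace sketch. It is not kernel code and not part of the patch: the
names seal_region(), sealed_read() and seal_table are hypothetical, the
fixed-size table is an assumption made for brevity, and FNV-1a merely
stands in for whatever checksum or hash would actually be chosen (it is
not cryptographically strong).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEAL_TABLE_SLOTS 16	/* hypothetical fixed capacity */

/* One verifier record, kept apart from the memory it describes. */
struct seal_entry {
	const void *addr;	/* start of the sealed region */
	size_t len;		/* length of the sealed region in bytes */
	uint64_t sum;		/* checksum of the contents at seal time */
};

static struct seal_entry seal_table[SEAL_TABLE_SLOTS];
static size_t seal_count;

/* 64-bit FNV-1a; a stand-in for a real checksum or hash. */
static uint64_t fnv1a64(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return h;
}

/* Sum the contents once, when the region is sealed. */
static bool seal_region(const void *addr, size_t len)
{
	assert(addr != NULL && len > 0);
	if (seal_count == SEAL_TABLE_SLOTS)
		return false;			/* table full */
	seal_table[seal_count].addr = addr;
	seal_table[seal_count].len = len;
	seal_table[seal_count].sum = fnv1a64(addr, len);
	seal_count++;
	return true;
}

/*
 * Verified read path: returns addr only if the region was sealed and its
 * contents still match the recorded checksum, NULL otherwise.
 */
static const void *sealed_read(const void *addr)
{
	size_t i;

	for (i = 0; i < seal_count; i++) {
		const struct seal_entry *e = &seal_table[i];

		if (e->addr == addr)
			return fnv1a64(addr, e->len) == e->sum ? addr : NULL;
	}
	return NULL;				/* never sealed */
}
```

A caller would invoke seal_region() once the data is final and go
through sealed_read() on every later access. An attacker who can also
rewrite seal_table defeats the check, which is why the message above
stresses keeping the verifiers separate from the pages they cover.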

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index a8fdabf..9f2974f 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..cb18739
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,215 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool -
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handler to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..116d280 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block; /* in notifier.h */
 #define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
 #define VM_NO_GUARD 0x00000040 /* don't add guard page */
 #define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
+#define VM_PMALLOC 0x00000100 /* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 13bc8cf..8ce616fb 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool: the generic memory pool
+ * @chunk: The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP(size >> pool->min_alloc_order * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool: the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..a64ac49
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,513 @@
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool; /* Link back to the associated pool. */
+	bool protected; /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail; /* Sysfs attribute. */
+	struct kobj_attribute attr_size; /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks; /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n", gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NO_FAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk(chunk);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
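
The lifecycle that pmalloc.h asks callers to follow (allocate, write,
protect, destroy) can be tried out in userspace. The sketch below is
only an analogy, assuming a POSIX system: it uses mmap() and mprotect()
where the patch uses vmalloc() and set_memory_ro(), and the demo_*
names are hypothetical, not part of the patch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for a pool: one mmap'd region. */
struct demo_pool {
	char *base;	/* start of the mapping */
	size_t size;	/* mapping size, a multiple of the page size */
	size_t used;	/* bytes handed out so far */
	int sealed;	/* nonzero once the pool is read-only */
};

/* Create a writable pool of at least size bytes, rounded up to pages. */
static int demo_pool_create(struct demo_pool *p, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t rounded = (size + page - 1) / page * page;
	void *m;

	assert(p != NULL && size > 0);
	m = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED)
		return -1;
	p->base = m;
	p->size = rounded;
	p->used = 0;
	p->sealed = 0;
	return 0;
}

/* Bump allocation; refused once the pool has been sealed. */
static void *demo_alloc(struct demo_pool *p, size_t n)
{
	void *ret;

	if (p->sealed || n == 0 || n > p->size - p->used)
		return NULL;
	ret = p->base + p->used;
	/* Keep later allocations aligned to sizeof(long). */
	p->used += (n + sizeof(long) - 1) & ~(sizeof(long) - 1);
	return ret;
}

/* Make the whole pool read-only; later writes fault. */
static int demo_protect(struct demo_pool *p)
{
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->sealed = 1;
	return 0;
}

/* Release the mapping, protected or not. */
static int demo_destroy(struct demo_pool *p)
{
	return munmap(p->base, p->size);
}
```

After demo_protect() the data stays readable, demo_alloc() returns NULL,
and a write to the region raises SIGSEGV. Unlike the kernel case
discussed in this thread, a userspace mapping has no second writable
alias comparable to the physmap.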
