[2/5] Protectable Memory Allocator

Message ID 20170605192216.21596-3-igor.stoppa@huawei.com (mailing list archive)
State New, archived

Commit Message

Igor Stoppa June 5, 2017, 7:22 p.m. UTC
The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, this works efficiently only when said pages contain only data
that does not need to be modified.

This can work well for statically allocated variables; however, it does
not fit the case of dynamically allocated ones quite as well.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain exclusively data which can be made
read only.

The allocator provided here (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to the
pool handle it has received.

Once all the memory requested (over various iterations) is initialized,
the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to ensure that no further references to the memory from the pool are made
after the destruction is invoked).

The latter case is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
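
For illustration, the intended usage pattern with the API introduced by this
patch would look roughly as follows (a sketch: "my_cfg", "my_module_init" and
"my_module_exit" are invented for the example):

#include <linux/module.h>
#include <linux/pmalloc.h>

struct my_cfg {
	int some_value;
};

static struct pmalloc_pool *pool;
static struct my_cfg *cfg;

static int __init my_module_init(void)
{
	pool = pmalloc_create_pool("my_module");
	if (!pool)
		return -ENOMEM;
	cfg = pmalloc(sizeof(*cfg), pool);
	if (!cfg) {
		pmalloc_destroy_pool(pool);
		return -ENOMEM;
	}
	cfg->some_value = 42;		/* still writable at this point */
	pmalloc_protect_pool(pool);	/* from here on, *cfg is R/O */
	return 0;
}

static void __exit my_module_exit(void)
{
	/* no references to *cfg may be used after this call */
	pmalloc_destroy_pool(pool);
}

module_init(my_module_init);
module_exit(my_module_exit);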

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/page-flags.h     |   2 +
 include/linux/pmalloc.h        |  20 ++++
 include/trace/events/mmflags.h |   1 +
 mm/Makefile                    |   2 +-
 mm/pmalloc.c                   | 227 +++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c                  |  24 +++--
 6 files changed, 266 insertions(+), 10 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

Comments

Tetsuo Handa June 6, 2017, 4:44 a.m. UTC | #1
Igor Stoppa wrote:
> +int pmalloc_protect_pool(struct pmalloc_pool *pool)
> +{
> +	struct pmalloc_node *node;
> +
> +	if (!pool)
> +		return -EINVAL;
> +	mutex_lock(&pool->nodes_list_mutex);
> +	hlist_for_each_entry(node, &pool->nodes_list_head, nodes_list) {
> +		unsigned long size, pages;
> +
> +		size = WORD_SIZE * node->total_words + HEADER_SIZE;
> +		pages = size / PAGE_SIZE;
> +		set_memory_ro((unsigned long)node, pages);
> +	}
> +	pool->protected = true;
> +	mutex_unlock(&pool->nodes_list_mutex);
> +	return 0;
> +}

As far as I know, not all CONFIG_MMU=y architectures provide
set_memory_ro()/set_memory_rw(). You need to provide fallback for
architectures which do not provide set_memory_ro()/set_memory_rw()
or kernels built with CONFIG_MMU=n.
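
The kind of fallback meant here could be, for instance, no-op stubs keyed off
a config symbol (purely a sketch, not part of the patch; note that with such
stubs the pool would simply stay writable on those configurations):

#ifndef CONFIG_ARCH_HAS_SET_MEMORY
static inline int set_memory_ro(unsigned long addr, int numpages)
{
	return 0;	/* no page protection available here */
}

static inline int set_memory_rw(unsigned long addr, int numpages)
{
	return 0;
}
#endif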

>  mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mincore.o \
>  			   mlock.o mmap.o mprotect.o mremap.o msync.o \
>  			   page_vma_mapped.o pagewalk.o pgtable-generic.o \
> -			   rmap.o vmalloc.o
> +			   rmap.o vmalloc.o pmalloc.o



Is this __PMALLOC_ALIGNED needed? Why not use "long" and "BITS_PER_LONG" ?

> +struct pmalloc_node {
> +	struct hlist_node nodes_list;
> +	atomic_t used_words;
> +	unsigned int total_words;
> +	__PMALLOC_ALIGNED align_t data[];
> +};
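
In other words, something along these lines (a sketch of what the suggestion
would amount to):

struct pmalloc_node {
	struct hlist_node nodes_list;
	atomic_t used_words;
	unsigned int total_words;
	long data[];	/* naturally aligned flexible array */
};

#define WORD_SIZE sizeof(long)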



Please use macros for round up/down.

> +	size = ((HEADER_SIZE - 1 + PAGE_SIZE) +
> +		WORD_SIZE * (unsigned long) words) & PAGE_MASK;

> +	req_words = (((int)size) + WORD_SIZE - 1) / WORD_SIZE;
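
For instance, using the existing kernel helpers, those two computations (from
__pmalloc_create_node() and pmalloc() respectively) could be written as
follows (a sketch; PAGE_ALIGN() is in linux/mm.h, DIV_ROUND_UP() in
linux/kernel.h):

	/* round the node size up to a whole number of pages */
	size = PAGE_ALIGN(HEADER_SIZE + WORD_SIZE * (unsigned long)words);

	/* round the request up to a whole number of words */
	req_words = DIV_ROUND_UP(size, WORD_SIZE);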



You need to check for node != NULL before dereferencing it.
Also, why rcu_read_lock()/rcu_read_unlock() ?
I can't find corresponding synchronize_rcu() etc. in this patch.
pmalloc() won't be a hotpath. Enclosing the whole thing in a mutex might be OK.
If there is any reason to use RCU, rcu_read_unlock() is missing when coming from the "goto".

+void *pmalloc(unsigned long size, struct pmalloc_pool *pool)
+{
+	struct pmalloc_node *node;
+	int req_words;
+	int starting_word;
+
+	if (size > INT_MAX || size == 0)
+		return NULL;
+	req_words = (((int)size) + WORD_SIZE - 1) / WORD_SIZE;
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(node, &pool->nodes_list_head, nodes_list) {
+		starting_word = atomic_fetch_add(req_words, &node->used_words);
+		if (starting_word + req_words > node->total_words)
+			atomic_sub(req_words, &node->used_words);
+		else
+			goto found_node;
+	}
+	rcu_read_unlock();
+	node = __pmalloc_create_node(req_words);
+	starting_word = atomic_fetch_add(req_words, &node->used_words);
+	mutex_lock(&pool->nodes_list_mutex);
+	hlist_add_head_rcu(&node->nodes_list, &pool->nodes_list_head);
+	mutex_unlock(&pool->nodes_list_mutex);
+	atomic_inc(&pool->nodes_count);
+found_node:
+	return node->data + starting_word;
+}
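
One way the tail of pmalloc() could address both points (rcu_read_unlock() on
the found path and a NULL check after __pmalloc_create_node()) while keeping
RCU would be, roughly (a sketch, not the author's fix):

	rcu_read_lock();
	hlist_for_each_entry_rcu(node, &pool->nodes_list_head, nodes_list) {
		starting_word = atomic_fetch_add(req_words, &node->used_words);
		if (starting_word + req_words > node->total_words) {
			atomic_sub(req_words, &node->used_words);
		} else {
			void *retval = node->data + starting_word;

			rcu_read_unlock();
			return retval;
		}
	}
	rcu_read_unlock();
	node = __pmalloc_create_node(req_words);
	if (!node)
		return NULL;
	starting_word = atomic_fetch_add(req_words, &node->used_words);
	mutex_lock(&pool->nodes_list_mutex);
	hlist_add_head_rcu(&node->nodes_list, &pool->nodes_list_head);
	mutex_unlock(&pool->nodes_list_mutex);
	atomic_inc(&pool->nodes_count);
	return node->data + starting_word;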



I feel that n is off-by-one if (ptr + n) % PAGE_SIZE == 0
according to check_page_span().

> +const char *__pmalloc_check_object(const void *ptr, unsigned long n)
> +{
> +	unsigned long p;
> +
> +	p = (unsigned long)ptr;
> +	n += (unsigned long)ptr;
> +	for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
> +		if (is_vmalloc_addr((void *)p)) {
> +			struct page *page;
> +
> +			page = vmalloc_to_page((void *)p);
> +			if (!(page && PagePmalloc(page)))
> +				return msg;
> +		}
> +	}
> +	return NULL;
> +}



Why does pmalloc_init() need to be called from a loadable kernel module?
It has to be called only once, at a very early stage of boot.

> +int __init pmalloc_init(void)
> +{
> +	pmalloc_data = vmalloc(sizeof(struct pmalloc_data));
> +	if (!pmalloc_data)
> +		return -ENOMEM;
> +	INIT_HLIST_HEAD(&pmalloc_data->pools_list_head);
> +	mutex_init(&pmalloc_data->pools_list_mutex);
> +	atomic_set(&pmalloc_data->pools_count, 0);
> +	return 0;
> +}
> +EXPORT_SYMBOL(pmalloc_init);

Since pmalloc_data is a globally shared variable, why does it need to
be allocated dynamically? If it is for randomizing the address
of pmalloc_data, it does not make sense to continue, because a
vmalloc() failure causes a subsequent oops.
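
A static definition along the lines implied here could look like this (a
sketch; the rest of the code would then drop the pointer indirection or take
&pmalloc_data):

static struct pmalloc_data pmalloc_data = {
	.pools_list_head = HLIST_HEAD_INIT,
	.pools_list_mutex =
		__MUTEX_INITIALIZER(pmalloc_data.pools_list_mutex),
	.pools_count = ATOMIC_INIT(0),
};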
Christoph Hellwig June 6, 2017, 6:25 a.m. UTC | #2
On Tue, Jun 06, 2017 at 01:44:32PM +0900, Tetsuo Handa wrote:
> Igor Stoppa wrote:
> > +int pmalloc_protect_pool(struct pmalloc_pool *pool)
> > +{
> > +	struct pmalloc_node *node;
> > +
> > +	if (!pool)
> > +		return -EINVAL;
> > +	mutex_lock(&pool->nodes_list_mutex);
> > +	hlist_for_each_entry(node, &pool->nodes_list_head, nodes_list) {
> > +		unsigned long size, pages;
> > +
> > +		size = WORD_SIZE * node->total_words + HEADER_SIZE;
> > +		pages = size / PAGE_SIZE;
> > +		set_memory_ro((unsigned long)node, pages);
> > +	}
> > +	pool->protected = true;
> > +	mutex_unlock(&pool->nodes_list_mutex);
> > +	return 0;
> > +}
> 
> As far as I know, not all CONFIG_MMU=y architectures provide
> set_memory_ro()/set_memory_rw(). You need to provide fallback for
> architectures which do not provide set_memory_ro()/set_memory_rw()
> or kernels built with CONFIG_MMU=n.

I think we'll just need to generalize CONFIG_STRICT_MODULE_RWX and/or
ARCH_HAS_STRICT_MODULE_RWX so there is a symbol to key this off.
Igor Stoppa June 6, 2017, 11:34 a.m. UTC | #3
On 06/06/17 09:25, Christoph Hellwig wrote:
> On Tue, Jun 06, 2017 at 01:44:32PM +0900, Tetsuo Handa wrote:

[..]

>> As far as I know, not all CONFIG_MMU=y architectures provide
>> set_memory_ro()/set_memory_rw(). You need to provide fallback for
>> architectures which do not provide set_memory_ro()/set_memory_rw()
>> or kernels built with CONFIG_MMU=n.
> 
> I think we'll just need to generalize CONFIG_STRICT_MODULE_RWX and/or
> ARCH_HAS_STRICT_MODULE_RWX so there is a symbol to key this off.

Would STRICT_KERNEL_RWX work? It's already present.
If both kernel text and rodata can be protected, so can pmalloc data.

---
igor
Igor Stoppa June 6, 2017, 11:42 a.m. UTC | #4
Hi,
thanks a lot for the review. My answers are in-line below.
I have rearranged your comments because I wasn't sure how to reply to
them inline.

On 06/06/17 07:44, Tetsuo Handa wrote:
> Igor Stoppa wrote:

[...]

> As far as I know, not all CONFIG_MMU=y architectures provide
> set_memory_ro()/set_memory_rw().

I'll follow up on this in the existing separate thread.

[...]

>> +struct pmalloc_node {
>> +	struct hlist_node nodes_list;
>> +	atomic_t used_words;
>> +	unsigned int total_words;
>> +	__PMALLOC_ALIGNED align_t data[];
>> +};
>
> Is this __PMALLOC_ALIGNED needed? Why not use "long" and "BITS_PER_LONG" ?

In an earlier version I actually asked the same question.
It is currently there because I just don't know enough about various
architectures. The idea of having "align_t" was that it could be tied
into what is the most desirable alignment for each architecture.
But I'm actually looking for advice on this.


>> +	size = ((HEADER_SIZE - 1 + PAGE_SIZE) +
>> +		WORD_SIZE * (unsigned long) words) & PAGE_MASK;
> 
>> +	req_words = (((int)size) + WORD_SIZE - 1) / WORD_SIZE;
>
> Please use macros for round up/down.

ok

[...]


> +	rcu_read_lock();
> +	hlist_for_each_entry_rcu(node, &pool->nodes_list_head, nodes_list) {
> +		starting_word = atomic_fetch_add(req_words, &node->used_words);
> +		if (starting_word + req_words > node->total_words)
> +			atomic_sub(req_words, &node->used_words);
> +		else
> +			goto found_node;
> +	}
> +	rcu_read_unlock();
> 
> You need to check for node != NULL before dereferencing it.

This was intentionally left out, on the grounds that I'm using the kernel
macros for both populating and walking the list.
So, if I understood correctly, there shouldn't be a case where node is
NULL, right?
Unless it has been tampered/damaged. Is that what you mean?


> Also, why rcu_read_lock()/rcu_read_unlock() ? 
> I can't find corresponding synchronize_rcu() etc. in this patch.

oops. Thanks for spotting it.


> pmalloc() won't be a hotpath. Enclosing the whole thing in a mutex might be OK.
> If there is any reason to use RCU, rcu_read_unlock() is missing when coming from the "goto".

If there are no strong objections, I'd rather fix it and keep it as RCU.
Kees Cook was mentioning the possibility of also implementing
"write seldom" in a similar fashion.
In that case the path is likely to warm up.
It might be premature optimization, but I'd prefer to avoid knowingly
introducing performance issues.
That said, I agree on the bug you spotted.


>> +const char *__pmalloc_check_object(const void *ptr, unsigned long n)
>> +{
>> +	unsigned long p;
>> +
>> +	p = (unsigned long)ptr;
>> +	n += (unsigned long)ptr;
>> +	for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
>> +		if (is_vmalloc_addr((void *)p)) {
>> +			struct page *page;
>> +
>> +			page = vmalloc_to_page((void *)p);
>> +			if (!(page && PagePmalloc(page)))
>> +				return msg;
>> +		}
>> +	}
>> +	return NULL;
>> +}
> 
> I feel that n is off-by-one if (ptr + n) % PAGE_SIZE == 0
> according to check_page_span().

Hmm,
let's say PAGE_SIZE is 0x0100 and PAGE_MASK is 0xFF00

Here are some cases (N = number of pages found):

 ptr       ptr + n   Pages        Test                  N

0x0005     0x00FF      1     0x0000 <= 0x00FF  true     1
0x0105     0x00FF            0x0100 <= 0x00FF  false    1

0x0005     0x0100      2     0x0000 <= 0x0100  true     1
0x0100     0x0100            0x0100 <= 0x0100  true     2
0x0200     0x0100            0x0200 <= 0x0100  false    2

0x0005     0x01FF      2     0x0000 <= 0x0100  true     1
0x0105     0x01FF            0x0100 <= 0x0100  true     2
0x0205     0x01FF            0x0200 <= 0x0100  false    2

It seems to work. If I am missing your point, could you please
use the same format as the example I made to explain it to me?

I might be able to understand better.

>> +int __init pmalloc_init(void)
>> +{
>> +	pmalloc_data = vmalloc(sizeof(struct pmalloc_data));
>> +	if (!pmalloc_data)
>> +		return -ENOMEM;
>> +	INIT_HLIST_HEAD(&pmalloc_data->pools_list_head);
>> +	mutex_init(&pmalloc_data->pools_list_mutex);
>> +	atomic_set(&pmalloc_data->pools_count, 0);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(pmalloc_init);
>
> Why does pmalloc_init() need to be called from a loadable kernel module?
> It has to be called only once, at a very early stage of boot.

Yes, this is a bug.
Actually, I forgot to include in this patch the real call to pmalloc_init(),
which is in init/main.c, right before the security init.
I should see if I can move it earlier, to allow for earlier users of
pmalloc.

> Since pmalloc_data is a globally shared variable, why does it need to
> be allocated dynamically? If it is for randomizing the address
> of pmalloc_data, it does not make sense to continue, because a
> vmalloc() failure causes a subsequent oops.

My idea was to delegate the failure-handling to the caller, which might
be able to take more sensible actions than what I can do in this function.

I see you have already replied in a different thread that the value is
not checked. So I will remove it.


thanks again for the feedback,
igor
Tetsuo Handa June 6, 2017, 12:08 p.m. UTC | #5
Igor Stoppa wrote:
> >> +struct pmalloc_node {
> >> +	struct hlist_node nodes_list;
> >> +	atomic_t used_words;
> >> +	unsigned int total_words;
> >> +	__PMALLOC_ALIGNED align_t data[];
> >> +};
> >
> > Is this __PMALLOC_ALIGNED needed? Why not use "long" and "BITS_PER_LONG" ?
> 
> In an earlier version I actually asked the same question.
> It is currently there because I just don't know enough about various
> architectures. The idea of having "align_t" was that it could be tied
> into what is the most desirable alignment for each architecture.
> But I'm actually looking for advice on this.

I think that letting the compiler use natural alignment is OK.



> > You need to check for node != NULL before dereferencing it.
> 
> So, if I understood correctly, there shouldn't be a case where node is
> NULL, right?
> Unless it has been tampered/damaged. Is that what you mean?

I meant to say

+	node = __pmalloc_create_node(req_words);
// this location.
+	starting_word = atomic_fetch_add(req_words, &node->used_words);



> >> +const char *__pmalloc_check_object(const void *ptr, unsigned long n)
> >> +{
> >> +	unsigned long p;
> >> +
> >> +	p = (unsigned long)ptr;
> >> +	n += (unsigned long)ptr;
> >> +	for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
> >> +		if (is_vmalloc_addr((void *)p)) {
> >> +			struct page *page;
> >> +
> >> +			page = vmalloc_to_page((void *)p);
> >> +			if (!(page && PagePmalloc(page)))
> >> +				return msg;
> >> +		}
> >> +	}
> >> +	return NULL;
> >> +}
> > 
> > I feel that n is off-by-one if (ptr + n) % PAGE_SIZE == 0
> > according to check_page_span().
> 
> It seems to work. If I am missing your point, could you please
> use the same format as the example I made to explain it to me?

If ptr == NULL and n == PAGE_SIZE so that (ptr + n) % PAGE_SIZE == 0,
this loop will access two pages (one page containing p == 0 and another
page containing p == PAGE_SIZE) when this loop should access only one
page containing p == 0. When checking n bytes, its range is 0 to n - 1.
Igor Stoppa June 6, 2017, 12:23 p.m. UTC | #6
On 06/06/17 15:08, Tetsuo Handa wrote:
> Igor Stoppa wrote:
>>>> +struct pmalloc_node {
>>>> +	struct hlist_node nodes_list;
>>>> +	atomic_t used_words;
>>>> +	unsigned int total_words;
>>>> +	__PMALLOC_ALIGNED align_t data[];
>>>> +};
>>>
>>> Is this __PMALLOC_ALIGNED needed? Why not use "long" and "BITS_PER_LONG" ?
>>
>> In an earlier version I actually asked the same question.
>> It is currently there because I just don't know enough about various
>> architectures. The idea of having "align_t" was that it could be tied
>> into what is the most desirable alignment for each architecture.
>> But I'm actually looking for advice on this.
> 
> I think that letting the compiler use natural alignment is OK.

On a 64-bit machine the preferred alignment might be either 32 or 64 bits,
depending on the application. How can the compiler choose?


>>> You need to check for node != NULL before dereferencing it.
>>
>> So, if I understood correctly, there shouldn't be a case where node is
>> NULL, right?
>> Unless it has been tampered/damaged. Is that what you mean?
> 
> I meant to say
> 
> +	node = __pmalloc_create_node(req_words);
> // this location.
> +	starting_word = atomic_fetch_add(req_words, &node->used_words);

argh, yes


>>>> +const char *__pmalloc_check_object(const void *ptr, unsigned long n)
>>>> +{
>>>> +	unsigned long p;
>>>> +
>>>> +	p = (unsigned long)ptr;
>>>> +	n += (unsigned long)ptr;
>>>> +	for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
>>>> +		if (is_vmalloc_addr((void *)p)) {
>>>> +			struct page *page;
>>>> +
>>>> +			page = vmalloc_to_page((void *)p);
>>>> +			if (!(page && PagePmalloc(page)))
>>>> +				return msg;
>>>> +		}
>>>> +	}
>>>> +	return NULL;
>>>> +}
>>>
>>> I feel that n is off-by-one if (ptr + n) % PAGE_SIZE == 0
>>> according to check_page_span().
>>
>> It seems to work. If I am missing your point, could you please
>> use the same format as the example I made to explain it to me?
> 
> If ptr == NULL and n == PAGE_SIZE so that (ptr + n) % PAGE_SIZE == 0,
> this loop will access two pages (one page containing p == 0 and another
> page containing p == PAGE_SIZE) when this loop should access only one
> page containing p == 0. When checking n bytes, its range is 0 to n - 1.

oh, so:

p = (unsigned long) ptr;
n = p + n - 1;


--
igor
Laura Abbott June 6, 2017, 4:24 p.m. UTC | #7
On 06/06/2017 04:34 AM, Igor Stoppa wrote:
> On 06/06/17 09:25, Christoph Hellwig wrote:
>> On Tue, Jun 06, 2017 at 01:44:32PM +0900, Tetsuo Handa wrote:
> 
> [..]
> 
>>> As far as I know, not all CONFIG_MMU=y architectures provide
>>> set_memory_ro()/set_memory_rw(). You need to provide fallback for
>>> architectures which do not provide set_memory_ro()/set_memory_rw()
>>> or kernels built with CONFIG_MMU=n.
>>
>> I think we'll just need to generalize CONFIG_STRICT_MODULE_RWX and/or
>> ARCH_HAS_STRICT_MODULE_RWX so there is a symbol to key this off.
> 
> Would STRICT_KERNEL_RWX work? It's already present.
> If both kernel text and rodata can be protected, so can pmalloc data.
> 
> ---
> igor
> 


There's already ARCH_HAS_SET_MEMORY for this purpose.

Thanks,
Laura
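
Keying pmalloc off that symbol could look roughly like this (a sketch; the
PROTECTABLE_MEMORY symbol name is invented here and is not part of the patch):

	# mm/Kconfig
	config PROTECTABLE_MEMORY
		def_bool y
		depends on MMU && ARCH_HAS_SET_MEMORY

	# mm/Makefile
	obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o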

Patch

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6b5818d..acc0723 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -81,6 +81,7 @@  enum pageflags {
 	PG_active,
 	PG_waiters,		/* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */
 	PG_slab,
+	PG_pmalloc,
 	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use*/
 	PG_arch_1,
 	PG_reserved,
@@ -274,6 +275,7 @@  PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 __PAGEFLAG(Slab, slab, PF_NO_TAIL)
 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
+__PAGEFLAG(Pmalloc, pmalloc, PF_NO_TAIL)
 PAGEFLAG(Checked, checked, PF_NO_COMPOUND)	   /* Used by some filesystems */
 
 /* Xen */
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..83d3557
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,20 @@ 
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+struct pmalloc_pool *pmalloc_create_pool(const char *name);
+void *pmalloc(unsigned long size, struct pmalloc_pool *pool);
+int pmalloc_protect_pool(struct pmalloc_pool *pool);
+int pmalloc_destroy_pool(struct pmalloc_pool *pool);
+#endif
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 304ff94..41d1587 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -91,6 +91,7 @@ 
 	{1UL << PG_lru,			"lru"		},		\
 	{1UL << PG_active,		"active"	},		\
 	{1UL << PG_slab,		"slab"		},		\
+	{1UL << PG_pmalloc,		"pmalloc"	},		\
 	{1UL << PG_owner_priv_1,	"owner_priv_1"	},		\
 	{1UL << PG_arch_1,		"arch_1"	},		\
 	{1UL << PG_reserved,		"reserved"	},		\
diff --git a/mm/Makefile b/mm/Makefile
index 026f6a8..79dd99c 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -25,7 +25,7 @@  mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o \
 			   page_vma_mapped.o pagewalk.o pgtable-generic.o \
-			   rmap.o vmalloc.o
+			   rmap.o vmalloc.o pmalloc.o
 
 
 ifdef CONFIG_CROSS_MEMORY_ATTACH
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..c73d60c
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,227 @@ 
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/list.h>
+#include <linux/rculist.h>
+#include <linux/mutex.h>
+#include <linux/atomic.h>
+#include <asm/set_memory.h>
+#include <asm/page.h>
+
+typedef uint64_t align_t;
+#define WORD_SIZE sizeof(align_t)
+
+#define __PMALLOC_ALIGNED __aligned(WORD_SIZE)
+
+#define MAX_POOL_NAME_LEN 40
+
+#define PMALLOC_HASH_SIZE (PAGE_SIZE / 2)
+
+#define PMALLOC_HASH_ENTRIES ilog2(PMALLOC_HASH_SIZE)
+
+
+struct pmalloc_data {
+	struct hlist_head pools_list_head;
+	struct mutex pools_list_mutex;
+	atomic_t pools_count;
+};
+
+struct pmalloc_pool {
+	struct hlist_node pools_list;
+	struct hlist_head nodes_list_head;
+	struct mutex nodes_list_mutex;
+	atomic_t nodes_count;
+	bool protected;
+	char name[MAX_POOL_NAME_LEN];
+};
+
+struct pmalloc_node {
+	struct hlist_node nodes_list;
+	atomic_t used_words;
+	unsigned int total_words;
+	__PMALLOC_ALIGNED align_t data[];
+};
+
+#define HEADER_SIZE sizeof(struct pmalloc_node)
+
+static struct pmalloc_data *pmalloc_data;
+
+struct pmalloc_node *__pmalloc_create_node(int words)
+{
+	struct pmalloc_node *node;
+	unsigned long size, i, pages;
+	struct page *p;
+
+	size = ((HEADER_SIZE - 1 + PAGE_SIZE) +
+		WORD_SIZE * (unsigned long) words) & PAGE_MASK;
+	node = vmalloc(size);
+	if (!node)
+		return NULL;
+	atomic_set(&node->used_words, 0);
+	node->total_words = (size - HEADER_SIZE) / WORD_SIZE;
+	pages = size / PAGE_SIZE;
+	for (i = 0; i < pages; i++) {
+		p = vmalloc_to_page((void *)(i * PAGE_SIZE +
+					     (unsigned long)node));
+		__SetPagePmalloc(p);
+	}
+	return node;
+}
+
+void *pmalloc(unsigned long size, struct pmalloc_pool *pool)
+{
+	struct pmalloc_node *node;
+	int req_words;
+	int starting_word;
+
+	if (size > INT_MAX || size == 0)
+		return NULL;
+	req_words = (((int)size) + WORD_SIZE - 1) / WORD_SIZE;
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(node, &pool->nodes_list_head, nodes_list) {
+		starting_word = atomic_fetch_add(req_words, &node->used_words);
+		if (starting_word + req_words > node->total_words)
+			atomic_sub(req_words, &node->used_words);
+		else
+			goto found_node;
+	}
+	rcu_read_unlock();
+	node = __pmalloc_create_node(req_words);
+	starting_word = atomic_fetch_add(req_words, &node->used_words);
+	mutex_lock(&pool->nodes_list_mutex);
+	hlist_add_head_rcu(&node->nodes_list, &pool->nodes_list_head);
+	mutex_unlock(&pool->nodes_list_mutex);
+	atomic_inc(&pool->nodes_count);
+found_node:
+	return node->data + starting_word;
+}
+
+const char msg[] = "Not a valid Pmalloc object.";
+const char *__pmalloc_check_object(const void *ptr, unsigned long n)
+{
+	unsigned long p;
+
+	p = (unsigned long)ptr;
+	n += (unsigned long)ptr;
+	for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
+		if (is_vmalloc_addr((void *)p)) {
+			struct page *page;
+
+			page = vmalloc_to_page((void *)p);
+			if (!(page && PagePmalloc(page)))
+				return msg;
+		}
+	}
+	return NULL;
+}
+EXPORT_SYMBOL(__pmalloc_check_object);
+
+
+struct pmalloc_pool *pmalloc_create_pool(const char *name)
+{
+	struct pmalloc_pool *pool;
+	unsigned int name_len;
+
+	name_len = strnlen(name, MAX_POOL_NAME_LEN);
+	if (unlikely(name_len == MAX_POOL_NAME_LEN))
+		return NULL;
+	pool = vmalloc(sizeof(struct pmalloc_pool));
+	if (unlikely(!pool))
+		return NULL;
+	INIT_HLIST_NODE(&pool->pools_list);
+	INIT_HLIST_HEAD(&pool->nodes_list_head);
+	mutex_init(&pool->nodes_list_mutex);
+	atomic_set(&pool->nodes_count, 0);
+	pool->protected = false;
+	strcpy(pool->name, name);
+	mutex_lock(&pmalloc_data->pools_list_mutex);
+	hlist_add_head_rcu(&pool->pools_list, &pmalloc_data->pools_list_head);
+	mutex_unlock(&pmalloc_data->pools_list_mutex);
+	atomic_inc(&pmalloc_data->pools_count);
+	return pool;
+}
+
+int pmalloc_protect_pool(struct pmalloc_pool *pool)
+{
+	struct pmalloc_node *node;
+
+	if (!pool)
+		return -EINVAL;
+	mutex_lock(&pool->nodes_list_mutex);
+	hlist_for_each_entry(node, &pool->nodes_list_head, nodes_list) {
+		unsigned long size, pages;
+
+		size = WORD_SIZE * node->total_words + HEADER_SIZE;
+		pages = size / PAGE_SIZE;
+		set_memory_ro((unsigned long)node, pages);
+	}
+	pool->protected = true;
+	mutex_unlock(&pool->nodes_list_mutex);
+	return 0;
+}
+
+static __always_inline
+void __pmalloc_destroy_node(struct pmalloc_node *node)
+{
+	int pages, i;
+
+	pages = (node->total_words * WORD_SIZE + HEADER_SIZE) /	PAGE_SIZE;
+	for (i = 0; i < pages; i++)
+		__ClearPagePmalloc(vmalloc_to_page(node + i * PAGE_SIZE));
+	vfree(node);
+}
+
+int pmalloc_destroy_pool(struct pmalloc_pool *pool)
+{
+	struct pmalloc_node *node;
+
+	if (!pool)
+		return -EINVAL;
+	mutex_lock(&pool->nodes_list_mutex);
+	mutex_lock(&pmalloc_data->pools_list_mutex);
+	hlist_del_rcu(&pool->pools_list);
+	mutex_unlock(&pmalloc_data->pools_list_mutex);
+	hlist_for_each_entry_rcu(node, &pool->nodes_list_head, nodes_list) {
+		int pages;
+
+		pages = (node->total_words * WORD_SIZE + HEADER_SIZE) /
+			PAGE_SIZE;
+		set_memory_rw((unsigned long)node, pages);
+	}
+
+	while (likely(!hlist_empty(&pool->nodes_list_head))) {
+		node = hlist_entry(pool->nodes_list_head.first,
+				   struct pmalloc_node, nodes_list);
+		hlist_del(&node->nodes_list);
+		__pmalloc_destroy_node(node);
+	}
+	mutex_unlock(&pool->nodes_list_mutex);
+	atomic_dec(&pmalloc_data->pools_count);
+	vfree(pool);
+	return 0;
+}
+
+int __init pmalloc_init(void)
+{
+	pmalloc_data = vmalloc(sizeof(struct pmalloc_data));
+	if (!pmalloc_data)
+		return -ENOMEM;
+	INIT_HLIST_HEAD(&pmalloc_data->pools_list_head);
+	mutex_init(&pmalloc_data->pools_list_mutex);
+	atomic_set(&pmalloc_data->pools_count, 0);
+	return 0;
+}
+EXPORT_SYMBOL(pmalloc_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..29bb691 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -195,22 +195,28 @@  static inline const char *check_page_span(const void *ptr, unsigned long n,
 	return NULL;
 }
 
+extern const char *__pmalloc_check_object(const void *ptr, unsigned long n);
+
 static inline const char *check_heap_object(const void *ptr, unsigned long n,
 					    bool to_user)
 {
 	struct page *page;
 
-	if (!virt_addr_valid(ptr))
-		return NULL;
-
-	page = virt_to_head_page(ptr);
-
-	/* Check slab allocator for flags and size. */
-	if (PageSlab(page))
-		return __check_heap_object(ptr, n, page);
+	if (virt_addr_valid(ptr)) {
+		page = virt_to_head_page(ptr);
 
+		/* Check slab allocator for flags and size. */
+		if (PageSlab(page))
+			return __check_heap_object(ptr, n, page);
 	/* Verify object does not incorrectly span multiple pages. */
-	return check_page_span(ptr, n, page, to_user);
+		return check_page_span(ptr, n, page, to_user);
+	}
+	if (likely(is_vmalloc_addr(ptr))) {
+		page = vmalloc_to_page(ptr);
+		if (unlikely(page && PagePmalloc(page)))
+			return __pmalloc_check_object(ptr, n);
+	}
+	return NULL;
 }
 
 /*