[03/12] __wr_after_init: generic functionality

Message ID 20181221181423.20455-4-igor.stoppa@huawei.com
State New, archived
Series [01/12] x86_64: memset_user()

Commit Message

Igor Stoppa Dec. 21, 2018, 6:14 p.m. UTC
The patch provides:
- the generic part of the write rare functionality for static data,
  based on code from Matthew Wilcox
- the dummy functionality, in case an arch doesn't support write rare or
  the functionality is disabled

The basic functions are:
- wr_memset(): write rare counterpart of memset()
- wr_memcpy(): write rare counterpart of memcpy()
- wr_assign(): write rare counterpart of the assignment ('=') operator
- wr_rcu_assign_pointer(): write rare counterpart of rcu_assign_pointer()
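A minimal userspace sketch of what the API reduces to when CONFIG_PRMEM is off, modelled on the fallback branch of include/linux/prmem.h in the patch. It uses `size_t` in place of `__kernel_size_t` and leaves out `wr_rcu_assign_pointer()`, which needs RCU. The `wr_counter`, `wr_name` and `fallback_demo()` names are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fallback helpers: with write rare disabled they are plain libc calls. */
static inline void *wr_memset(void *p, int c, size_t len)
{
	return memset(p, c, len);
}

static inline void *wr_memcpy(void *p, const void *q, size_t size)
{
	return memcpy(p, q, size);
}

/* Plain assignment: like (a = 7), it evaluates to the assigned value. */
#define wr_assign(var, val)	((var) = (val))

/* Hypothetical "write rare" data; ordinary writable memory in this build. */
static int wr_counter;
static char wr_name[8];

/* Hypothetical caller exercising the three helpers. */
static int fallback_demo(void)
{
	wr_memset(wr_name, 0, sizeof(wr_name));
	wr_memcpy(wr_name, "prmem", sizeof("prmem"));
	return wr_assign(wr_counter, 7);
}
```

With protection disabled, callers pay nothing extra: the helpers are the ordinary operations, so the same source builds whether or not the arch supports write rare.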

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>

CC: Andy Lutomirski <luto@amacapital.net>
CC: Nadav Amit <nadav.amit@gmail.com>
CC: Matthew Wilcox <willy@infradead.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Kees Cook <keescook@chromium.org>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: Mimi Zohar <zohar@linux.vnet.ibm.com>
CC: Thiago Jung Bauermann <bauerman@linux.ibm.com>
CC: Ahmed Soliman <ahmedsoliman@mena.vt.edu>
CC: linux-integrity@vger.kernel.org
CC: kernel-hardening@lists.openwall.com
CC: linux-mm@kvack.org
CC: linux-kernel@vger.kernel.org
---
 include/linux/prmem.h | 106 ++++++++++++++++++++++++++++++++++++++++++
 mm/Makefile           |   1 +
 mm/prmem.c            |  97 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 204 insertions(+)
 create mode 100644 include/linux/prmem.h
 create mode 100644 mm/prmem.c

Comments

Matthew Wilcox (Oracle) Dec. 21, 2018, 6:41 p.m. UTC | #1
On Fri, Dec 21, 2018 at 08:14:14PM +0200, Igor Stoppa wrote:
> +static inline int memtst(void *p, int c, __kernel_size_t len)

I don't understand why you're verifying that writes actually happen
in production code.  Sure, write lib/test_wrmem.c or something, but
verifying every single rare write seems like a mistake to me.

> +#ifndef CONFIG_PRMEM

So is this PRMEM or wr_mem?  It's not obvious that CONFIG_PRMEM controls
wrmem.

> +#define wr_assign(var, val)	((var) = (val))

The hamming distance between 'var' and 'val' is too small.  The convention
in the line immediately below (p and v) is much more readable.

> +#define wr_rcu_assign_pointer(p, v)	rcu_assign_pointer(p, v)
> +#define wr_assign(var, val) ({			\
> +	typeof(var) tmp = (typeof(var))val;	\
> +						\
> +	wr_memcpy(&var, &tmp, sizeof(var));	\
> +	var;					\
> +})

Doesn't wr_memcpy return 'var' anyway?

> +/**
> + * wr_memcpy() - copyes size bytes from q to p

typo

> + * @p: beginning of the memory to write to
> + * @q: beginning of the memory to read from
> + * @size: amount of bytes to copy
> + *
> + * Returns pointer to the destination

> + * The architecture code must provide:
> + *   void __wr_enable(wr_state_t *state)
> + *   void *__wr_addr(void *addr)
> + *   void *__wr_memcpy(void *p, const void *q, __kernel_size_t size)
> + *   void __wr_disable(wr_state_t *state)

This section shouldn't be in the user documentation of wr_memcpy().

> + */
> +void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
> +{
> +	wr_state_t wr_state;
> +	void *wr_poking_addr = __wr_addr(p);
> +
> +	if (WARN_ONCE(!wr_ready, "No writable mapping available") ||

Surely not.  If somebody's called wr_memcpy() before wr_ready is set,
that means we can just call memcpy().

Igor Stoppa Dec. 21, 2018, 7:07 p.m. UTC | #2
On 21/12/2018 20:41, Matthew Wilcox wrote:
> On Fri, Dec 21, 2018 at 08:14:14PM +0200, Igor Stoppa wrote:
>> +static inline int memtst(void *p, int c, __kernel_size_t len)
> 
> I don't understand why you're verifying that writes actually happen
> in production code.  Sure, write lib/test_wrmem.c or something, but
> verifying every single rare write seems like a mistake to me.

This is actually something I wrote more as a stop-gap.
I have the feeling there should be already something similar available.
And probably I could not find it. Unless it's so trivial that it doesn't 
deserve to become a function?

But if there is really no existing alternative, I can put it in a 
separate file.

> 
>> +#ifndef CONFIG_PRMEM
> 
> So is this PRMEM or wr_mem?  It's not obvious that CONFIG_PRMEM controls
> wrmem.

In my mind (maybe still clinging to the old implementation), PRMEM is 
the master toggle, for protected memory.

Then there are various types and the first one being now implemented is 
write rare after init (because ro after init already exists).

However, the same levels of protection should then follow for 
dynamically allocated memory (ye old pmalloc).

PRMEM would then become the moniker for the whole shebang.

>> +#define wr_assign(var, val)	((var) = (val))
> 
> The hamming distance between 'var' and 'val' is too small.  The convention
> in the line immediately below (p and v) is much more readable.

ok, I'll fix it

>> +#define wr_rcu_assign_pointer(p, v)	rcu_assign_pointer(p, v)
>> +#define wr_assign(var, val) ({			\
>> +	typeof(var) tmp = (typeof(var))val;	\
>> +						\
>> +	wr_memcpy(&var, &tmp, sizeof(var));	\
>> +	var;					\
>> +})
> 
> Doesn't wr_memcpy return 'var' anyway?

It should return the destination, which is &var.

But I wanted to return the actual value of the assignment, val

Like if I do  (a = 7)  it evaluates to 7,

similarly wr_assign(a, 7) would also evaluate to 7

The reason why I returned var instead of val is that it would allow
detecting any error.
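A userspace sketch of that point, reusing the statement-expression `wr_assign()` from the patch but written with `__typeof__` so it builds with GCC or Clang. The `sim_drop_writes` flag and this `wr_memcpy()` body are hypothetical: they stand in for a write path that silently fails, so the macro's value (re-read from `var` after the write) shows whether the write actually landed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical knob: when true, the simulated write path drops writes. */
static bool sim_drop_writes;

/* Simulated write-rare copy; returns the destination, as in the patch. */
static void *wr_memcpy(void *p, const void *q, size_t size)
{
	if (!sim_drop_writes)
		memcpy(p, q, size);
	return p;
}

/* GNU C statement expression: evaluates to (var), not to (val). */
#define wr_assign(var, val) ({				\
	__typeof__(var) tmp = (__typeof__(var))(val);	\
							\
	wr_memcpy(&(var), &tmp, sizeof(var));		\
	(var);						\
})

static int wr_setting;
```

If the macro evaluated to `val` instead, a dropped write of 9 would still report 9 while the memory kept its old value.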

>> +/**
>> + * wr_memcpy() - copyes size bytes from q to p
> 
> typo

:-( thanks

>> + * @p: beginning of the memory to write to
>> + * @q: beginning of the memory to read from
>> + * @size: amount of bytes to copy
>> + *
>> + * Returns pointer to the destination
> 
>> + * The architecture code must provide:
>> + *   void __wr_enable(wr_state_t *state)
>> + *   void *__wr_addr(void *addr)
>> + *   void *__wr_memcpy(void *p, const void *q, __kernel_size_t size)
>> + *   void __wr_disable(wr_state_t *state)
> 
> This section shouldn't be in the user documentation of wr_memcpy().

ok

>> + */
>> +void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
>> +{
>> +	wr_state_t wr_state;
>> +	void *wr_poking_addr = __wr_addr(p);
>> +
>> +	if (WARN_ONCE(!wr_ready, "No writable mapping available") ||
> 
> Surely not.  If somebody's called wr_memcpy() before wr_ready is set,
> that means we can just call memcpy().


What I was trying to catch is the case where, after a failed init, the 
writable mapping doesn't exist. In that case wr_ready is also not set.

The problem is that I just don't know what to do in a case where there
has been such a major error, which prevents the creation of the alternate
mapping.

I understand that we still want to continue, to provide as much debug
info as possible, but I am at a loss about finding the sanest course of
action.

--
igor

Matthew Wilcox (Oracle) Dec. 21, 2018, 7:43 p.m. UTC | #3
On Fri, Dec 21, 2018 at 09:07:54PM +0200, Igor Stoppa wrote:
> On 21/12/2018 20:41, Matthew Wilcox wrote:
> > On Fri, Dec 21, 2018 at 08:14:14PM +0200, Igor Stoppa wrote:
> > > +static inline int memtst(void *p, int c, __kernel_size_t len)
> > 
> > I don't understand why you're verifying that writes actually happen
> > in production code.  Sure, write lib/test_wrmem.c or something, but
> > verifying every single rare write seems like a mistake to me.
> 
> This is actually something I wrote more as a stop-gap.
> I have the feeling there should be already something similar available.
> And probably I could not find it. Unless it's so trivial that it doesn't
> deserve to become a function?
> 
> But if there is really no existing alternative, I can put it in a separate
> file.

I'm not questioning the implementation, I'm questioning why it's ever
called.  If I type 'p = q', I don't then verify that p actually is equal
to q.  I just assume that the compiler did its job.

> > > +#ifndef CONFIG_PRMEM
> > 
> > So is this PRMEM or wr_mem?  It's not obvious that CONFIG_PRMEM controls
> > wrmem.
> 
> In my mind (maybe still clinging to the old implementation), PRMEM is the
> master toggle, for protected memory.
> 
> Then there are various types and the first one being now implemented is
> write rare after init (because ro after init already exists).
> 
> However, the same levels of protection should then follow for dynamically
> allocated memory (ye old pmalloc).
> 
> PRMEM would then become the moniker for the whole shebang.

To my mind, what we have in this patchset is support for statically
allocated protected (or write-rare) memory.  Later, we'll add dynamically
allocated protected memory.  So it's all protected memory, and we'll
use the same accessors for both ... right?

> > > +#define wr_rcu_assign_pointer(p, v)	rcu_assign_pointer(p, v)
> > > +#define wr_assign(var, val) ({			\
> > > +	typeof(var) tmp = (typeof(var))val;	\
> > > +						\
> > > +	wr_memcpy(&var, &tmp, sizeof(var));	\
> > > +	var;					\
> > > +})
> > 
> > Doesn't wr_memcpy return 'var' anyway?
> 
> It should return the destination, which is &var.
> 
> But I wanted to return the actual value of the assignment, val
> 
> Like if I do  (a = 7)  it evaluates to 7,
> 
> similarly wr_assign(a, 7) would also evaluate to 7
> 
> The reason why I returned var instead of val is that it would allow
> detecting any error.

Ah, good point; I missed the var vs &var distinction.

> > > +void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
> > > +{
> > > +	wr_state_t wr_state;
> > > +	void *wr_poking_addr = __wr_addr(p);
> > > +
> > > +	if (WARN_ONCE(!wr_ready, "No writable mapping available") ||
> > 
> > Surely not.  If somebody's called wr_memcpy() before wr_ready is set,
> > that means we can just call memcpy().
> 
> What I was trying to catch is the case where, after a failed init, the
> writable mapping doesn't exist. In that case wr_ready is also not set.
> 
> The problem is that I just don't know what to do in a case where there has
> been such a major error, which prevents the creation of the alternate mapping.
>
> I understand that we still want to continue, to provide as much debug info
> as possible, but I am at a loss about finding the sanest course of action.

I don't think there's anything to be done in that case.  Indeed,
I think the only thing to do is panic and stop the whole machine if
initialisation fails.  We'd be in a situation where nothing can update
protected memory, and the machine just won't work.

I suppose we could "fail insecure" and never protect the memory, but I
think that's asking for trouble.

Anyway, my concern was for a driver which can be built either as a
module or built-in.  Its init code will be called before write-protection
happens when it's built in, and after write-protection happens when it's
a module.  It should be able to use wr_assign() in either circumstance.
One might also have a utility function which is called from both init
and non-init code and want to use wr_assign() whether initialisation
has completed or not.
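A minimal userspace sketch of the suggested behaviour, assuming that a clear `wr_ready` only ever means "protection not set up yet" (a failed initialisation stops the machine rather than leaving `wr_ready` clear). `arch_wr_memcpy()`, `wr_slow_path_calls` and `driver_param` are hypothetical; in the patch the slow path writes through the alternate mapping obtained from `__wr_addr()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel state and the arch hook. */
static bool wr_ready;		/* set once the writable alias exists */
static unsigned int wr_slow_path_calls;

static void *arch_wr_memcpy(void *p, const void *q, size_t size)
{
	/* A real arch would write through the alternate mapping here. */
	wr_slow_path_calls++;
	return memcpy(p, q, size);
}

/*
 * Before protection is in place the target is still ordinary writable
 * memory, so plain memcpy() is enough; code shared by built-in and
 * modular init paths can call this helper in either case.
 */
static void *wr_memcpy(void *p, const void *q, size_t size)
{
	if (!wr_ready)
		return memcpy(p, q, size);
	return arch_wr_memcpy(p, q, size);
}

static int driver_param;	/* hypothetical write-rare variable */
```

The same call then works from a built-in driver's init code (before protection) and from a module loaded later (after protection).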

Igor Stoppa Dec. 21, 2018, 9:54 p.m. UTC | #4
On 21/12/2018 21:43, Matthew Wilcox wrote:
> On Fri, Dec 21, 2018 at 09:07:54PM +0200, Igor Stoppa wrote:
>> On 21/12/2018 20:41, Matthew Wilcox wrote:
>>> On Fri, Dec 21, 2018 at 08:14:14PM +0200, Igor Stoppa wrote:
>>>> +static inline int memtst(void *p, int c, __kernel_size_t len)
>>>
>>> I don't understand why you're verifying that writes actually happen
>>> in production code.  Sure, write lib/test_wrmem.c or something, but
>>> verifying every single rare write seems like a mistake to me.
>>
>> This is actually something I wrote more as a stop-gap.
>> I have the feeling there should be already something similar available.
>> And probably I could not find it. Unless it's so trivial that it doesn't
>> deserve to become a function?
>>
>> But if there is really no existing alternative, I can put it in a separate
>> file.
> 
> I'm not questioning the implementation, I'm questioning why it's ever
> called.  If I type 'p = q', I don't then verify that p actually is equal
> to q.  I just assume that the compiler did its job. 

Paranoia, probably.

My thinking is that, once the data is protected, it could still be 
attacked through the metadata. A pte, for example.
Preventing the setting of a flag, that for example enables a 
functionality, might be a nice way to thwart all this protection.

If I verify that the write was successful, through the read-only 
address, then I know that the action really completed successfully.

There are many more types of attack that one can come up with, but 
attacking the metadata is probably the most likely next level.

So what I'm trying to do is more akin to:

p = &d;
*p = q;
d == q;

But in our case there is an indefinite amount of time between the
creation of the alternate mapping and its use.

Another way could be to check that the mapping is correct before writing 
to it. Maybe safer? I went for confirming that the end result is correct.

Of course it adds overhead, but if the whole thing is already slow and 
happening not too often, how much does it matter?

An alternative approach would be that the code invoking the wr operation 
performs an explicit test.

Would it look better if I implemented this as a wr_assign_verify() 
inline function?
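One way to picture "verify through the read-only address" in userspace, as a sketch and not the kernel mechanism: a file-backed region is mapped twice with `MAP_SHARED`, once read-only and once writable, loosely mirroring protected data and its writable alias. `struct wr_region`, `wr_region_init()` and `wr_memset_verify()` are hypothetical names; `memtst()` is a simplified port of the helper in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Simplified port of memtst(): returns 0 if all len bytes equal c. */
static int memtst(const void *p, int c, size_t len)
{
	const unsigned char *b = p;
	size_t i;

	for (i = 0; i < len; i++)
		if (b[i] != (unsigned char)c)
			return 1;
	return 0;
}

/* Hypothetical pair of views onto the same pages. */
struct wr_region {
	const unsigned char *ro;	/* protected view seen by readers */
	unsigned char *rw;		/* writable alias used for rare writes */
	size_t size;
};

static int wr_region_init(struct wr_region *r, size_t size)
{
	FILE *f = tmpfile();
	void *ro, *rw;
	int ret = -1;

	if (!f)
		return -1;
	if (ftruncate(fileno(f), (off_t)size) != 0)
		goto out;
	ro = mmap(NULL, size, PROT_READ, MAP_SHARED, fileno(f), 0);
	if (ro == MAP_FAILED)
		goto out;
	rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
	if (rw == MAP_FAILED) {
		munmap(ro, size);
		goto out;
	}
	r->ro = ro;
	r->rw = rw;
	r->size = size;
	ret = 0;
out:
	fclose(f);	/* established mappings stay valid after close */
	return ret;
}

/* Hypothetical: write via the alias, then confirm via the read-only view. */
static int wr_memset_verify(struct wr_region *r, size_t off, int c, size_t len)
{
	if (off > r->size || len > r->size - off)
		return -1;
	memset(r->rw + off, c, len);
	return memtst(r->ro + off, c, len) ? -1 : 0;
}
```

Because both mappings share the same pages, a write through `rw` that did not land would show up as a `memtst()` mismatch on `ro`; the cost is one extra pass over the written bytes.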

>>>> +#ifndef CONFIG_PRMEM
>>>
>>> So is this PRMEM or wr_mem?  It's not obvious that CONFIG_PRMEM controls
>>> wrmem.
>>
>> In my mind (maybe still clinging to the old implementation), PRMEM is the
>> master toggle, for protected memory.
>>
>> Then there are various types and the first one being now implemented is
>> write rare after init (because ro after init already exists).
>>
>> However, the same levels of protection should then follow for dynamically
>> allocated memory (ye old pmalloc).
>>
>> PRMEM would then become the moniker for the whole shebang.
> 
> To my mind, what we have in this patchset is support for statically
> allocated protected (or write-rare) memory.  Later, we'll add dynamically
> allocated protected memory.  So it's all protected memory, and we'll
> use the same accessors for both ... right?

The static one is only write rare because read only after init already 
exists.

The dynamic one must introduce the same write rare, yes, but it should 
also introduce read_only (I do not count the destruction of an entire 
pool as a write rare operation). Ex: SELinux policyDB.

Write rare, whether dynamic or static, is a sub-case of protected
memory, hence the distinction between protected and write rare.

I'm not claiming to be particularly skilled at choosing names, so if 
something better sounding is available, it can be used.
This is the best I could come up with.

[...]

> I don't think there's anything to be done in that case.  Indeed,
> I think the only thing to do is panic and stop the whole machine if
> initialisation fails.  We'd be in a situation where nothing can update
> protected memory, and the machine just won't work.
> 
> I suppose we could "fail insecure" and never protect the memory, but I
> think that's asking for trouble.

ok, so init will BUG() if it fails, instead of the current WARN_ONCE() 
and return.
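A minimal userspace sketch of that fail-closed init, assuming a hypothetical `create_alias` hook that returns 0 once the writable mapping exists; `abort()` stands in for `BUG()`. Once init cannot return with `wr_ready` clear, the only remaining reason for `wr_ready` to be clear is that init has not run yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static bool wr_ready;	/* __ro_after_init in the patch */

/* Hypothetical arch hook: returns 0 once the writable alias exists. */
typedef int (*wr_alias_fn)(void);

/*
 * Fail closed: without the alias nothing can ever update write-rare
 * data, so stop here instead of warning and carrying on.
 */
static void wr_init(wr_alias_fn create_alias)
{
	if (create_alias() != 0) {
		fprintf(stderr, "wr: cannot create writable alias\n");
		abort();
	}
	wr_ready = true;
}

/* Hypothetical hook that always succeeds. */
static int alias_ok(void)
{
	return 0;
}
```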

> Anyway, my concern was for a driver which can be built either as a
> module or built-in.  Its init code will be called before write-protection
> happens when it's built in, and after write-protection happens when it's
> a module.  It should be able to use wr_assign() in either circumstance.
> One might also have a utility function which is called from both init
> and non-init code and want to use wr_assign() whether initialisation
> has completed or not.

If the writable mapping is created early enough, the only penalty for 
using the write-rare function on a writable variable is that it would be 
slower. Probably there wouldn't be so much data to deal with.

If the driver is dealing with some HW, the hardware access would most
likely make any extra write-rare delay look negligible.

--
igor

Patch

diff --git a/include/linux/prmem.h b/include/linux/prmem.h
new file mode 100644
index 000000000000..12c1d0d1cb78
--- /dev/null
+++ b/include/linux/prmem.h
@@ -0,0 +1,106 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * prmem.h: Header for memory protection library
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * Support for:
+ * - statically allocated write rare data
+ */
+
+#ifndef _LINUX_PRMEM_H
+#define _LINUX_PRMEM_H
+
+#include <linux/set_memory.h>
+#include <linux/mutex.h>
+#include <linux/compiler.h>
+
+
+/**
+ * memtst() - test len bytes starting at p to match the c value
+ * @p: beginning of the memory to test
+ * @c: byte to compare against
+ * @len: amount of bytes to test
+ *
+ * Returns 0 on success, non-zero otherwise.
+ */
+static inline int memtst(void *p, int c, __kernel_size_t len)
+{
+	__kernel_size_t i;
+
+	for (i = 0; i < len; i++) {
+		u8 d =  *(i + (u8 *)p) - (u8)c;
+
+		if (unlikely(d))
+			return d;
+	}
+	return 0;
+}
+
+
+#ifndef CONFIG_PRMEM
+
+static inline void *wr_memset(void *p, int c, __kernel_size_t len)
+{
+	return memset(p, c, len);
+}
+
+static inline void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+	return memcpy(p, q, size);
+}
+
+#define wr_assign(var, val)	((var) = (val))
+#define wr_rcu_assign_pointer(p, v)	rcu_assign_pointer(p, v)
+
+#else
+
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+
+#include <asm/prmem.h>
+
+void *wr_memset(void *p, int c, __kernel_size_t len);
+void *wr_memcpy(void *p, const void *q, __kernel_size_t size);
+
+/**
+ * wr_assign() - sets a write-rare variable to a specified value
+ * @var: the variable to set
+ * @val: the new value
+ *
+ * Returns: the variable
+ *
+ * Note: it might be possible to optimize this, to use wr_memset in some
+ * cases (maybe with NULL?).
+ */
+
+#define wr_assign(var, val) ({			\
+	typeof(var) tmp = (typeof(var))(val);	\
+						\
+	wr_memcpy(&var, &tmp, sizeof(var));	\
+	var;					\
+})
+
+/**
+ * wr_rcu_assign_pointer() - initialize a pointer in rcu mode
+ * @p: the rcu pointer - it MUST be aligned to a machine word
+ * @v: the new value
+ *
+ * Returns the value assigned to the rcu pointer.
+ *
+ * It is provided as macro, to match rcu_assign_pointer()
+ * The rcu_assign_pointer() is implemented as equivalent of:
+ *
+ * smp_mb();
+ * WRITE_ONCE();
+ */
+#define wr_rcu_assign_pointer(p, v) ({	\
+	smp_mb();			\
+	wr_assign(p, v);		\
+	p;				\
+})
+#endif
+#endif
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..ef3867c16ce0 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -58,6 +58,7 @@  obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PRMEM) += prmem.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/prmem.c b/mm/prmem.c
new file mode 100644
index 000000000000..e1c1be3a1171
--- /dev/null
+++ b/mm/prmem.c
@@ -0,0 +1,97 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * prmem.c: Memory Protection Library
+ *
+ * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/compiler.h>
+#include <linux/slab.h>
+#include <linux/mmu_context.h>
+#include <linux/rcupdate.h>
+#include <linux/prmem.h>
+
+__ro_after_init bool wr_ready;
+
+/*
+ * The following two variables are statically allocated by the linker
+ * script at the boundaries of the memory region (rounded up to
+ * multiples of PAGE_SIZE) reserved for __wr_after_init.
+ */
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+static unsigned long start = (unsigned long)&__start_wr_after_init;
+static unsigned long end = (unsigned long)&__end_wr_after_init;
+
+static inline bool is_wr_after_init(void *p, __kernel_size_t size)
+{
+	unsigned long low = (unsigned long)p;
+	unsigned long high = low + size;
+
+	return likely(start <= low && high <= end);
+}
+
+/**
+ * wr_memcpy() - copies size bytes from q to p
+ * @p: beginning of the memory to write to
+ * @q: beginning of the memory to read from
+ * @size: amount of bytes to copy
+ *
+ * Returns pointer to the destination
+ *
+ * The architecture code must provide:
+ *   void __wr_enable(wr_state_t *state)
+ *   void *__wr_addr(void *addr)
+ *   void *__wr_memcpy(void *p, const void *q, __kernel_size_t size)
+ *   void __wr_disable(wr_state_t *state)
+ */
+void *wr_memcpy(void *p, const void *q, __kernel_size_t size)
+{
+	wr_state_t wr_state;
+	void *wr_poking_addr = __wr_addr(p);
+
+	if (WARN_ONCE(!wr_ready, "No writable mapping available") ||
+	    WARN_ONCE(!is_wr_after_init(p, size), "Invalid WR range."))
+		return p;
+
+	local_irq_disable();
+	__wr_enable(&wr_state);
+	__wr_memcpy(wr_poking_addr, q, size);
+	__wr_disable(&wr_state);
+	local_irq_enable();
+	return p;
+}
+
+/**
+ * wr_memset() - sets len bytes of the destination p to the c value
+ * @p: beginning of the memory to write to
+ * @c: byte to replicate
+ * @len: amount of bytes to set
+ *
+ * Returns pointer to the destination
+ *
+ * The architecture code must provide:
+ *   void __wr_enable(wr_state_t *state)
+ *   void *__wr_addr(void *addr)
+ *   void *__wr_memset(void *p, int c, __kernel_size_t len)
+ *   void __wr_disable(wr_state_t *state)
+ */
+void *wr_memset(void *p, int c, __kernel_size_t len)
+{
+	wr_state_t wr_state;
+	void *wr_poking_addr = __wr_addr(p);
+
+	if (WARN_ONCE(!wr_ready, "No writable mapping available") ||
+	    WARN_ONCE(!is_wr_after_init(p, len), "Invalid WR range."))
+		return p;
+
+	local_irq_disable();
+	__wr_enable(&wr_state);
+	__wr_memset(wr_poking_addr, c, len);
+	__wr_disable(&wr_state);
+	local_irq_enable();
+	return p;
+}
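A userspace sketch of the range check in `is_wr_after_init()`, with a hypothetical static array standing in for the linker-provided `__start_wr_after_init`/`__end_wr_after_init` bounds and `uintptr_t` in place of `unsigned long`. As written in the patch, `high = low + size` can wrap around for a very large `size`, letting an out-of-range request pass; the second variant compares `size` against the room left after `low` instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the linker-provided section bounds. */
static long wr_section[64];
#define WR_START ((uintptr_t)&wr_section[0])
#define WR_END   ((uintptr_t)&wr_section[64])

/* Same logic as the patch: low + size may wrap around. */
static bool is_wr_after_init_naive(const void *p, size_t size)
{
	uintptr_t low = (uintptr_t)p;
	uintptr_t high = low + size;

	return WR_START <= low && high <= WR_END;
}

/* Overflow-safe variant: compare size with the space left after low. */
static bool is_wr_after_init(const void *p, size_t size)
{
	uintptr_t low = (uintptr_t)p;

	return WR_START <= low && low <= WR_END && size <= WR_END - low;
}
```

Both variants accept every in-range request; they differ only when `low + size` overflows.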