diff mbox

patches for __write_rarely section

Message ID CAAseMr6Fae=4otZZX9HmO1YgehjsVmd+44G5RN7Wu8iF2+fruA@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Gengjia Chen Nov. 16, 2016, 2:39 p.m. UTC
Hello kees and everyone :

This email introduces two patches.

Patch 1 introduce the write-rarely memory section for
those kernel objects which are read only mostly
but need to be written to sometimes.

Patch 2 introduce two helper functions (mark_wrdata_rw/
mark_wrdata_ro) to make __write_rarely memory section
writable or unwritable. They play like the pax_open_kernel/
pax_close_kernel functions in grsecurity patch.Right now
this only been implemented on arm32.

These two patches were based on the work by PaxTeam and Grsecurity.
These two patched were based on https://github.com/torvalds/linux.git :
27bcd37e0240bbe33f0efe244b5aad52104115b3

By the way, I am a new recruit for kernel coding so I hope that
someone can review these patches and tell me what to do next.
Also, I am working on the implement of mark_wrdata_rw/mark_wrdata_ro
on aarch64, any advice about that would be appreciated.

  * mapping to be mapped at.  This is particularly important for

Comments

Mark Rutland Nov. 16, 2016, 3:16 p.m. UTC | #1
On Wed, Nov 16, 2016 at 10:39:38PM +0800, Gengjia Chen wrote:
>    Hello kees and everyone :

Hi,

>    This email introduces two patches.

As a general note, it would be better to use git send-email to send
these patches as separate emails, in reply to a cover letter. That's
more in keeping with usual Linux list style, and makes review and
application of patches easier/possible.

e.g. assuming you have two patches on a branch foo:

$ git format-patch --cover-leter foo~2..foo
$ ${EDITOR} 0000-* # write your cover letter comments
$ git send-email --to=${whoever} --cc={who_else} *.patch

See $(man git-send-email) and $(man git-config) for how to set that up.

>    Patch 1 introduce the write-rarely memory section for
>    those kernel objects which are read only mostly 
>    but need to be written to sometimes.
>    Patch 2 introduce two helper functions (mark_wrdata_rw/
>    mark_wrdata_ro) to make __write_rarely memory section
>    writable or unwritable. They play like the pax_open_kernel/
>    pax_close_kernel functions in grsecurity patch.Right now 
>    this only been implemented on arm32.

I've mentioned this elsewhere, but I don't think {open,close}_kernel()
works as an interface. Specifically, I don't believe it can be safely
and efficiently implemented for arm with LPAE, nor arm64.

To cater for those it may be possible to use a temporary RW mapping
separate from the usual kernel mapping (e.g. in TTBR0 on arm64). It
should be possible to have an API that can use either approach, if
writers are suitable annotated.

That all said, from the two patches it's not even clear how this would
be used, as nothing is marked __write_rarely, nor are any writers
updated. A demonstration would be helpful.

It would also be worth getting these onto the relevant lists (per
scripts/get_maintainer.pl), so as to get the relevant core and
architecture maintainers involved as early as possible.

Thanks,
Mark.
Rik van Riel Nov. 16, 2016, 3:22 p.m. UTC | #2
On Wed, 2016-11-16 at 22:39 +0800, Gengjia Chen wrote:
> Hello kees and everyone :
> 
> This email introduces two patches.
> 
> Patch 1 introduce the write-rarely memory section for
> those kernel objects which are read only mostly 
> but need to be written to sometimes.
> 
> Patch 2 introduce two helper functions (mark_wrdata_rw/
> mark_wrdata_ro) to make __write_rarely memory section
> writable or unwritable. They play like the pax_open_kernel/
> pax_close_kernel functions in grsecurity patch.Right now 
> this only been implemented on arm32.

Is this the way we want to go with this, or could it
make more sense to have a special ELF section in the
kernel that is allowed to write to write-rarely memory,
and have the exception handler deal with writes coming
from that code?

That way it would be much harder to insert code into the
kernel that calls your mark_wrdata_rw/mark_wrdata_ro
functions.
Greg KH Nov. 16, 2016, 4:14 p.m. UTC | #3
On Wed, Nov 16, 2016 at 10:39:38PM +0800, Gengjia Chen wrote:
> From: chengengjia <chengjia4574@gmail.com>
> Date: Wed, 16 Nov 2016 21:46:49 +0800
> Subject: [PATCH] arch: introduce write-rarely memory section
> 
> This introduces __write_rarely section as a way
> to mark those memory that read only in most case,
> but also need to be written to sometimes.
> 
> Signed-off-by: chengengjia <chengjia4574@gmail.com>

Minor nit, please use your full/real name for From: and signed-off-by:
lines.

thanks,

greg k-h
Kees Cook Nov. 17, 2016, 11:47 p.m. UTC | #4
On Wed, Nov 16, 2016 at 7:16 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Nov 16, 2016 at 10:39:38PM +0800, Gengjia Chen wrote:
>>    Patch 1 introduce the write-rarely memory section for
>>    those kernel objects which are read only mostly
>>    but need to be written to sometimes.
>>    Patch 2 introduce two helper functions (mark_wrdata_rw/
>>    mark_wrdata_ro) to make __write_rarely memory section
>>    writable or unwritable. They play like the pax_open_kernel/
>>    pax_close_kernel functions in grsecurity patch.Right now
>>    this only been implemented on arm32.

Thanks Gengjia for working on this!

As part of this, can you add some lkdtm (drivers/misc/lkdtm*.c) tests
for these kinds of memory sections, so it's possible to test the
feature? And as other have mentioned, finding some simple examples of
where to use this primitive would be great. I think I saw a small one
in grsecurity in drivers/cdrom/cdrom.c where cdo->generic_packet gets
assigned. It's the only thing updated in the entire cdo object:

-       if (!cdo->generic_packet)
-               cdo->generic_packet = cdrom_dummy_generic_packet;
+       if (!cdo->generic_packet) {
+               pax_open_kernel();
+               const_cast(cdo->generic_packet) = cdrom_dummy_generic_packet;
+               pax_close_kernel();
+       }

> I've mentioned this elsewhere, but I don't think {open,close}_kernel()
> works as an interface. Specifically, I don't believe it can be safely
> and efficiently implemented for arm with LPAE, nor arm64.

I've seen similar objections from x86 maintainers.

> To cater for those it may be possible to use a temporary RW mapping
> separate from the usual kernel mapping (e.g. in TTBR0 on arm64). It
> should be possible to have an API that can use either approach, if
> writers are suitable annotated.

Do you mean something besides a new fixmap?

As for annotation, this was something that came up during Kernel
Summit too. It'd be nice to have a type-checked function of some kind
that could be used. Maybe something like this (using the cdrom example
above):

   rare_write(cdo, generic_packet, cdrom_dummy_generic_packet);

> That all said, from the two patches it's not even clear how this would
> be used, as nothing is marked __write_rarely, nor are any writers
> updated. A demonstration would be helpful.

The primary user of this would be the constify plugin, which marks all
structures that contain only function pointers as being read-only. It
also takes annotations for other things that are found manually.

> It would also be worth getting these onto the relevant lists (per
> scripts/get_maintainer.pl), so as to get the relevant core and
> architecture maintainers involved as early as possible.

Yeah, once some more examples are seen, it should be clear how to best
build the API.

-Kees
Mark Rutland Nov. 18, 2016, 11:10 a.m. UTC | #5
On Thu, Nov 17, 2016 at 03:47:31PM -0800, Kees Cook wrote:
> On Wed, Nov 16, 2016 at 7:16 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Wed, Nov 16, 2016 at 10:39:38PM +0800, Gengjia Chen wrote:

> +               pax_open_kernel();
> +               const_cast(cdo->generic_packet) = cdrom_dummy_generic_packet;
> +               pax_close_kernel();

> > I've mentioned this elsewhere, but I don't think {open,close}_kernel()
> > works as an interface. Specifically, I don't believe it can be safely
> > and efficiently implemented for arm with LPAE, nor arm64.
> 
> I've seen similar objections from x86 maintainers.
> 
> > To cater for those it may be possible to use a temporary RW mapping
> > separate from the usual kernel mapping (e.g. in TTBR0 on arm64). It
> > should be possible to have an API that can use either approach, if
> > writers are suitable annotated.
> 
> Do you mean something besides a new fixmap?

Yes.

I think i mentioned this in passing at LSS. For arm64, my idea was to
have the RW mapping in its own mm, using the processes/idmap address
area. That way it will only be mapped per-thread as required. Other
architectures without something like domains should be able to do
something similar.

I believe the only additional part we need for that is something to
convert a pointer to its RW alias. See below for an example.

We already do both of these things on arm64 in other cases, e.g. for
temporarily mapping EFI runtime services, switching TTBR1 safely.

> As for annotation, this was something that came up during Kernel
> Summit too. It'd be nice to have a type-checked function of some kind
> that could be used. Maybe something like this (using the cdrom example
> above):
> 
>    rare_write(cdo, generic_packet, cdrom_dummy_generic_packet);

Something like that should be possible. Though I'd expect just var and
val as arguments, e.g.

	rare_write(cdo->generic_packet, cdrom_dummy_generic_packet);

With something like the below (assuming all helpers are inlined):

#define rare_write(__var, __val) ({				\
	typeof(var) *__rw_var;					\
	__some_typecheck_perhaps(__var, __val);			\
								\
	__rw_var = __rare_rw_ptr(&(__var));			\
								\
	__rare_rw_map();					\
	*__rw_var = (__val);					\
	__rare_rw_unmap();					\
								\
	__clobber_var_so_gcc_knows(__var);			\
})

... which should work in the case of a separate mapping, or
finer-grained per-thread permission controls (e.g. ARM domains, x86
pkeys).

... and the trivial case is simple enough:

#if !IS_ENABLED(WHATEVER_THIS_IS_CALLED)
#define __rare_rw_ptr(__p)	__p
#define __rare_rw_map()
#define __rare_rw_unmap()
#endif

Thanks,
Mark.
Mark Rutland Nov. 18, 2016, 11:42 a.m. UTC | #6
On Fri, Nov 18, 2016 at 11:10:59AM +0000, Mark Rutland wrote:
> With something like the below (assuming all helpers are inlined):
> 
> #define rare_write(__var, __val) ({				\
> 	typeof(var) *__rw_var;					\

... iwith s/var/__var/ here...

> 	__some_typecheck_perhaps(__var, __val);			\
> 								\
> 	__rw_var = __rare_rw_ptr(&(__var));			\
> 								\
> 	__rare_rw_map();					\
> 	*__rw_var = (__val);					\
> 	__rare_rw_unmap();					\
> 								\
> 	__clobber_var_so_gcc_knows(__var);			\
> })

... and with explicit clobber gone, since __rare_rw_unmap would
presumably have to have a memory clobber anyway, just like the fixmap
code.

Thanks,
Mark.
diff mbox

Patch

==== patch 1 =====

From 7ee1ed5a723723112167ca8603f4e2945d133b33 Mon Sep 17 00:00:00 2001
From: chengengjia <chengjia4574@gmail.com>
Date: Wed, 16 Nov 2016 21:46:49 +0800
Subject: [PATCH] arch: introduce write-rarely memory section

This introduces __write_rarely section as a way
to mark those memory that read only in most case,
but also need to be written to sometimes.

Signed-off-by: chengengjia <chengjia4574@gmail.com>
---
 include/asm-generic/vmlinux.lds.h | 9 +++++++++
 include/linux/cache.h             | 9 +++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h
b/include/asm-generic/vmlinux.lds.h
index 3074796..684bcf2 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -263,6 +263,14 @@ 
 #endif

 /*
+ * Allow architectures to handle write_rarely data on their
+ * own by defining an empty WRITE_RARELY_DATA.
+ */
+#ifndef WRITE_RARELY_DATA
+#define WRITE_RARELY_DATA *(.data..write_rarely)
+#endif
+
+/*
  * Read only Data
  */
 #define RO_DATA_SECTION(align) \
@@ -271,6 +279,7 @@ 
  VMLINUX_SYMBOL(__start_rodata) = .; \
  *(.rodata) *(.rodata.*) \
  RO_AFTER_INIT_DATA /* Read only after init */ \
+ WRITE_RARELY_DATA /* write rarely*/ \
  *(__vermagic) /* Kernel version magic */ \
  . = ALIGN(8); \
  VMLINUX_SYMBOL(__start___tracepoints_ptrs) = .; \
diff --git a/include/linux/cache.h b/include/linux/cache.h
index 1be04f8..d2c30cd 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -30,6 +30,15 @@ 
 #define __ro_after_init
__attribute__((__section__(".data..ro_after_init")))
 #endif

+/*
+ * __write_rarely is used to mark things that are read-only mostly but
need to be
+ * written to sometimes. To write to __write_rarely variables,
mark_wrdata_rw()
+ * need to be called, after write, mark_wrdata_ro() should be called.
+ */
+#ifndef __write_rarely
+#define __write_rarely __attribute__((__section__(".data..write_rarely")))
+#endif
+
 #ifndef ____cacheline_aligned
 #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
 #endif
-- 
1.9.1


==== patch 2 =====

From 0472f3f14532577a1963d517da8a0edfe1d99375 Mon Sep 17 00:00:00 2001
From: chengengjia <chengjia4574@gmail.com>
Date: Wed, 16 Nov 2016 22:16:43 +0800
Subject: [PATCH] arch: arm: add mark_wrdata_rw/mark_wrdata_ro functions

This introduces mark_wrdata_rw/mark_wrdata_ro to
mark __write_rarely section writable/unwritable.
Based on work by PaxTeam and Grsecurity.

Signed-off-by: chengengjia <chengjia4574@gmail.com>
---
 arch/arm/include/asm/domain.h  | 14 ++++++++++++++
 arch/arm/include/asm/pgtable.h | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h
index 99d9f63..01f18a5 100644
--- a/arch/arm/include/asm/domain.h
+++ b/arch/arm/include/asm/domain.h
@@ -54,6 +54,10 @@ 
 #else
 #define DOMAIN_MANAGER 1
 #endif
+#ifdef  CONFIG_DEBUG_RODATA
+#define DOMAIN_OPENKERN 3
+#endif
+

 #define domain_mask(dom) ((3) << (2 * (dom)))
 #define domain_val(dom,type) ((type) << (2 * (dom)))
@@ -124,6 +128,16 @@  static inline void set_domain(unsigned val)
  set_domain(domain); \
  } while (0)

+#elif defined(CONFIG_DEBUG_RODATA)
+#define modify_domain(dom,type)                                 \
+        do {                                                    \
+                struct thread_info *thread = current_thread_info(); \
+                unsigned int domain = get_domain();             \
+                domain &= ~domain_mask(dom);                    \
+                domain = domain | domain_val(dom, type);        \
+                thread->cpu_domain = domain;                    \
+                set_domain(domain);                             \
+        } while (0)
 #else
 static inline void modify_domain(unsigned dom, unsigned type) { }
 #endif
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index a8d656d..5e6f553 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -56,6 +56,44 @@ 
 #define pmd_ERROR(pmd) __pmd_error(__FILE__, __LINE__, pmd)
 #define pgd_ERROR(pgd) __pgd_error(__FILE__, __LINE__, pgd)

+#ifdef  CONFIG_DEBUG_RODATA
+#include <asm/domain.h>
+#include <linux/thread_info.h>
+#include <linux/preempt.h>
+
+static inline int test_domain(int domain, int domaintype)
+{
+        return ((current_thread_info()->cpu_domain) & domain_val(domain,
3))
+ == domain_val(domain, domaintype);
+}
+
+static inline unsigned long mark_wrdata_rw(void) {
+#ifdef CONFIG_ARM_LPAE
+        /* TODO */
+#else
+        preempt_disable();
+        BUG_ON(test_domain(DOMAIN_KERNEL, DOMAIN_OPENKERN));
+        modify_domain(DOMAIN_KERNEL, DOMAIN_OPENKERN);
+#endif
+        return 0;
+}
+
+static inline unsigned long mark_wrdata_ro(void) {
+#ifdef CONFIG_ARM_LPAE
+        /* TODO */
+#else
+        BUG_ON(test_domain(DOMAIN_KERNEL, DOMAIN_MANAGER));
+        modify_domain(DOMAIN_KERNEL, DOMAIN_MANAGER);
+        preempt_enable_no_resched();
+#endif
+        return 0;
+}
+#else
+static inline unsigned long mark_wrdata_rw(void) { return 0; }
+static inline unsigned long mark_wrdata_ro(void) { return 0; }
+#endif
+
+
 /*
  * This is the lowest virtual address we can permit any user space