
[RFC,4/8] x86: Implement __arch_rare_write_map/unmap()

Message ID 1488228186-110679-5-git-send-email-keescook@chromium.org (mailing list archive)
State New, archived

Commit Message

Kees Cook Feb. 27, 2017, 8:43 p.m. UTC
Based on PaX's x86 pax_{open,close}_kernel() implementation, this
allows HAVE_ARCH_RARE_WRITE to work on x86.

There is missing work to sort out some header file issues where preempt.h
is missing, though it can't be included in pgtable.h unconditionally...
some other solution will be needed, perhaps an entirely separate header
file for rare_write()-related defines...

This patch is also missing paravirt support.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/Kconfig               |  1 +
 arch/x86/include/asm/pgtable.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

Comments

Andy Lutomirski Feb. 28, 2017, 7:33 p.m. UTC | #1
On Mon, Feb 27, 2017 at 12:43 PM, Kees Cook <keescook@chromium.org> wrote:
> Based on PaX's x86 pax_{open,close}_kernel() implementation, this
> allows HAVE_ARCH_RARE_WRITE to work on x86.
>
> There is missing work to sort out some header file issues where preempt.h
> is missing, though it can't be included in pgtable.h unconditionally...
> some other solution will be needed, perhaps an entirely separate header
> file for rare_write()-related defines...
>
> This patch is also missing paravirt support.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/x86/Kconfig               |  1 +
>  arch/x86/include/asm/pgtable.h | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index e487493bbd47..6d4d6f73d4b8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -102,6 +102,7 @@ config X86
>         select HAVE_ARCH_KMEMCHECK
>         select HAVE_ARCH_MMAP_RND_BITS          if MMU
>         select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if MMU && COMPAT
> +       select HAVE_ARCH_RARE_WRITE
>         select HAVE_ARCH_SECCOMP_FILTER
>         select HAVE_ARCH_TRACEHOOK
>         select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 437feb436efa..86e78e97738f 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -90,6 +90,37 @@ extern struct mm_struct *pgd_page_get_mm(struct page *page);
>
>  #endif /* CONFIG_PARAVIRT */
>
> +/* TODO: Bad hack to deal with preempt macros being missing sometimes. */
> +#ifndef preempt_disable
> +#include <linux/preempt.h>
> +#endif
> +
> +static inline unsigned long __arch_rare_write_map(void)
> +{
> +       unsigned long cr0;
> +
> +       preempt_disable();
> +       barrier();
> +       cr0 = read_cr0() ^ X86_CR0_WP;
> +       BUG_ON(cr0 & X86_CR0_WP);
> +       write_cr0(cr0);
> +       barrier();
> +       return cr0 ^ X86_CR0_WP;
> +}
> +
> +static inline unsigned long __arch_rare_write_unmap(void)
> +{
> +       unsigned long cr0;
> +
> +       barrier();
> +       cr0 = read_cr0() ^ X86_CR0_WP;
> +       BUG_ON(!(cr0 & X86_CR0_WP));
> +       write_cr0(cr0);
> +       barrier();
> +       preempt_enable_no_resched();
> +       return cr0 ^ X86_CR0_WP;
> +}

This seems like a wonderful ROP target.  Off-hand, I'd guess the
reason it's never been a problem (that I've heard of) in grsecurity is
that grsecurity isn't quite popular enough to be a big publicly
discussed target.

Can't we at least make this:

struct rare_write_mapping {
  void *addr;
  struct arch_rare_write_state arch_state;
};

static inline struct rare_write_mapping __arch_rare_write_map(void
*addr, size_t len);
static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);

or similar, thus allowing a non-terrifying implementation (e.g. switch
CR3 to a specialized pagetable) down the road?

I realize that if you get exactly the code generation you want, the
CR0.WP approach *might* be okay, but that seems quite fragile.
Kees Cook Feb. 28, 2017, 9:35 p.m. UTC | #2
On Tue, Feb 28, 2017 at 11:33 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Feb 27, 2017 at 12:43 PM, Kees Cook <keescook@chromium.org> wrote:
>> Based on PaX's x86 pax_{open,close}_kernel() implementation, this
>> allows HAVE_ARCH_RARE_WRITE to work on x86.
>>
>> There is missing work to sort out some header file issues where preempt.h
>> is missing, though it can't be included in pgtable.h unconditionally...
>> some other solution will be needed, perhaps an entirely separate header
>> file for rare_write()-related defines...
>>
>> This patch is also missing paravirt support.
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>> [...]
>> +static inline unsigned long __arch_rare_write_unmap(void)
>> +{
>> +       unsigned long cr0;
>> +
>> +       barrier();
>> +       cr0 = read_cr0() ^ X86_CR0_WP;
>> +       BUG_ON(!(cr0 & X86_CR0_WP));
>> +       write_cr0(cr0);
>> +       barrier();
>> +       preempt_enable_no_resched();
>> +       return cr0 ^ X86_CR0_WP;
>> +}
>
> This seems like a wonderful ROP target.  Off-hand, I'd guess the
> reason it's never been a problem (that I've heard of) in grsecurity is
> that grsecurity isn't quite popular enough to be a big publicly
> discussed target.

The reason it's less risky is due to the inlining. This will always
produce code where WP is re-enabled before a "ret" path is
taken. That said, it is still a minor ROP target in my mind, since it
still has the form:

turn off read-only;
write a thing;
turn on read-only;

But frankly, if an attacker already has enough control over the stack
to build up registers for the "write a thing" step to do what they
want it to do, they likely have vast control over the kernel
already.

> Can't we at least make this:
>
> struct rare_write_mapping {
>   void *addr;
>   struct arch_rare_write_state arch_state;
> };
>
> static inline struct rare_write_mapping __arch_rare_write_map(void
> *addr, size_t len);
> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);

How do you envision this working with the more complex things I
included in the latter patches of the series, namely doing linked list
updates across multiple structure instances, etc?

ie, poor list manipulation pseudo-code:

turn off read-only;
struct_one->next = struct_tree->node;
struct_three->prev = struct_one->node;
struct_two->prev = struct_two->next = NULL;
turn on read-only;

That's three separate memory areas involved...

> or similar, thus allowing a non-terrifying implementation (e.g. switch
> CR3 to a specialized pagetable) down the road?

We'd need some kind of per-CPU mapping that other CPUs couldn't get
at... which is kind of what the WP bit gets us already. (And ARM is
similar: it elevates permissions on the kernel memory domain to ignore
restrictions.)

> I realize that if you get exactly the code generation you want, the
> CR0.WP approach *might* be okay, but that seems quite fragile.

I think with preempt off, barriers, and inlining, this should be pretty safe...

(Which reminds me that my x86 implementation needs to use
__always_inline, rather than just inline...)

-Kees
Andy Lutomirski Feb. 28, 2017, 10:54 p.m. UTC | #3
On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
>> Can't we at least make this:
>>
>> struct rare_write_mapping {
>>   void *addr;
>>   struct arch_rare_write_state arch_state;
>> };
>>
>> static inline struct rare_write_mapping __arch_rare_write_map(void
>> *addr, size_t len);
>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
>
> How do you envision this working with the more complex things I
> included in the latter patches of the series, namely doing linked list
> updates across multiple structure instances, etc?
>
> ie, poor list manipulation pseudo-code:
>
> turn off read-only;
> struct_one->next = struct_tree->node;
> struct_three->prev = struct_one->node;
> struct_two->prev = struct_two->next = NULL;
> turn on read-only;
>
> That's three separate memory areas involved...

Fair enough.  That being said, how is this supposed to work on
architectures that don't have a global "disable write protection" bit?
Surely these architectures exist.

>
>> or similar, thus allowing a non-terrifying implementation (e.g. switch
>> CR3 to a specialized pagetable) down the road?
>
> We'd need some kind of per-CPU mapping that other CPUs couldn't get
> at... which is kind of what the WP bit gets us already. (And ARM is
> similar: it elevates permissions on the kernel memory domain to ignore
> restrictions.)
>
>> I realize that if you get exactly the code generation you want, the
>> CR0.WP approach *might* be okay, but that seems quite fragile.
>
> I think with preempt off, barriers, and inlining, this should be pretty safe...
>
Kees Cook Feb. 28, 2017, 11:52 p.m. UTC | #4
On Tue, Feb 28, 2017 at 2:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
>>> Can't we at least make this:
>>>
>>> struct rare_write_mapping {
>>>   void *addr;
>>>   struct arch_rare_write_state arch_state;
>>> };
>>>
>>> static inline struct rare_write_mapping __arch_rare_write_map(void
>>> *addr, size_t len);
>>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
>>
>> How do you envision this working with the more complex things I
>> included in the latter patches of the series, namely doing linked list
>> updates across multiple structure instances, etc?
>>
>> ie, poor list manipulation pseudo-code:
>>
>> turn off read-only;
>> struct_one->next = struct_tree->node;
>> struct_three->prev = struct_one->node;
>> struct_two->prev = struct_two->next = NULL;
>> turn on read-only;
>>
>> That's three separate memory areas involved...
>
> Fair enough.  That being said, how is this supposed to work on
> architectures that don't have a global "disable write protection" bit?
> Surely these architectures exist.

I don't know. :) That's part of the reason for putting up this series:
looking to see what's possible on other architectures. It's not clear
to me what arm64 can do, for example. Without domains there didn't
seem to be an obvious global override. My intention is to make sure we
get a viable interface for the architectures that are interested in
these kinds of self-protection thingies. :)

-Kees
Mark Rutland March 1, 2017, 10:59 a.m. UTC | #5
On Tue, Feb 28, 2017 at 01:35:13PM -0800, Kees Cook wrote:
> On Tue, Feb 28, 2017 at 11:33 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Mon, Feb 27, 2017 at 12:43 PM, Kees Cook <keescook@chromium.org> wrote:
> >> Based on PaX's x86 pax_{open,close}_kernel() implementation, this
> >> allows HAVE_ARCH_RARE_WRITE to work on x86.
> >>
> >> There is missing work to sort out some header file issues where preempt.h
> >> is missing, though it can't be included in pgtable.h unconditionally...
> >> some other solution will be needed, perhaps an entirely separate header
> >> file for rare_write()-related defines...
> >>
> >> This patch is also missing paravirt support.
> >>
> >> Signed-off-by: Kees Cook <keescook@chromium.org>
> >> ---
> >> [...]
> >> +static inline unsigned long __arch_rare_write_unmap(void)
> >> +{
> >> +       unsigned long cr0;
> >> +
> >> +       barrier();
> >> +       cr0 = read_cr0() ^ X86_CR0_WP;
> >> +       BUG_ON(!(cr0 & X86_CR0_WP));
> >> +       write_cr0(cr0);
> >> +       barrier();
> >> +       preempt_enable_no_resched();
> >> +       return cr0 ^ X86_CR0_WP;
> >> +}

> > Can't we at least make this:
> >
> > struct rare_write_mapping {
> >   void *addr;
> >   struct arch_rare_write_state arch_state;
> > };
> >
> > static inline struct rare_write_mapping __arch_rare_write_map(void
> > *addr, size_t len);
> > static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
> 
> How do you envision this working with the more complex things I
> included in the latter patches of the series, namely doing linked list
> updates across multiple structure instances, etc?
> 
> ie, poor list manipulation pseudo-code:
> 
> turn off read-only;
> struct_one->next = struct_tree->node;
> struct_three->prev = struct_one->node;
> struct_two->prev = struct_two->next = NULL;
> turn on read-only;
> 
> That's three separate memory areas involved...

Do we need all list manipulation primitives, or is it just list_add()
and list_del() that we really care about?

If we only need to support a limited set of primitives, then we could
special-case those, e.g. for the above:

	rare_write_map();
	__rare_write(struct_one->next, struct_tree->node);
	__rare_write(struct_three->prev, struct_one->node);
	__rare_write(struct_two->prev, NULL);
	__rare_write(struct_two->next, NULL);
	rare_write_unmap();

... then the __rare_write() can map/unmap a separate page table if
required. We can have separate rare_write() and __rare_write() to fold
in the rare_write_{map,unmap}(), as with {__,}copy_*_user().

That doesn't work if we need arbitrary primitives, certainly.

Thanks,
Mark.
Mark Rutland March 1, 2017, 11:24 a.m. UTC | #6
On Tue, Feb 28, 2017 at 03:52:26PM -0800, Kees Cook wrote:
> On Tue, Feb 28, 2017 at 2:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
> >>> Can't we at least make this:
> >>>
> >>> struct rare_write_mapping {
> >>>   void *addr;
> >>>   struct arch_rare_write_state arch_state;
> >>> };
> >>>
> >>> static inline struct rare_write_mapping __arch_rare_write_map(void
> >>> *addr, size_t len);
> >>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
> >>
> >> How do you envision this working with the more complex things I
> >> included in the latter patches of the series, namely doing linked list
> >> updates across multiple structure instances, etc?
> >>
> >> ie, poor list manipulation pseudo-code:
> >>
> >> turn off read-only;
> >> struct_one->next = struct_tree->node;
> >> struct_three->prev = struct_one->node;
> >> struct_two->prev = struct_two->next = NULL;
> >> turn on read-only;
> >>
> >> That's three separate memory areas involved...
> >
> > Fair enough.  That being said, how is this supposed to work on
> > architectures that don't have a global "disable write protection" bit?
> > Surely these architectures exist.
> 
> I don't know. :) That's part of the reason for putting up this series:
> looking to see what's possible on other architectures. It's not clear
> to me what arm64 can do, for example.

There is no global override of this sort on arm64. Just having map/unmap,
open/close, shed/unshed, etc, won't work.

The options I can think of for arm64 are:

* Have a separate RW alias of just the write_rarely data, that we
  temporarily map-in on a given CPU (using TTBR0) to perform the write.
  The RW alias is at a different VA to the usual RO alias, so we have to
  convert each pointer to its RW alias to perform the write. That's why
  we need __rare_write_ptr() to hide this, and can't have uninstrumented
  writes.
  
  Since this would *only* map the write_rarely data, it's simple to set
  up, and we don't need to modify the tables at runtime.
  
  I also think we can implement this generically using switch_mm() and
  {get,put}_user(), or specialised variants thereof.

  Assuming we can figure out how to handle those complex cases, this is
  my preferred solution. :)
  
* Have a copy of the entire swapper page tables, which differs only in
  the write_rarely data being RW. We then switch TTBR1 over to this for
  the duration of the rare_write_map() .. rare_write_unmap() sequence.

  While this sounds conceptually simple, it's far more complex in
  practice. Keeping the not-quite-clone of the swapper in-sync is going
  to be very painful and ripe for error.

  Additionally, the TTBR1 switch will be very expensive, far more so
  than the TTBR0 switch due to TLB maintenance and other work (including
  switching TTBR0 twice per TTBR1 switch, itself requiring synchronised
  TLB maintenance and other work).

  I am not fond of this approach.

* Give up on this being per-cpu. Either change the permissions in place,
  or fixmap an RW alias as required. In either case, all CPUs will have
  RW access.

Thanks,
Mark.
Kees Cook March 1, 2017, 8:25 p.m. UTC | #7
On Wed, Mar 1, 2017 at 3:24 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Feb 28, 2017 at 03:52:26PM -0800, Kees Cook wrote:
>> On Tue, Feb 28, 2017 at 2:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> > On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
>> >>> Can't we at least make this:
>> >>>
>> >>> struct rare_write_mapping {
>> >>>   void *addr;
>> >>>   struct arch_rare_write_state arch_state;
>> >>> };
>> >>>
>> >>> static inline struct rare_write_mapping __arch_rare_write_map(void
>> >>> *addr, size_t len);
>> >>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
>> >>
>> >> How do you envision this working with the more complex things I
>> >> included in the latter patches of the series, namely doing linked list
>> >> updates across multiple structure instances, etc?
>> >>
>> >> ie, poor list manipulation pseudo-code:
>> >>
>> >> turn off read-only;
>> >> struct_one->next = struct_tree->node;
>> >> struct_three->prev = struct_one->node;
>> >> struct_two->prev = struct_two->next = NULL;
>> >> turn on read-only;
>> >>
>> >> That's three separate memory areas involved...
>> >
>> > Fair enough.  That being said, how is this supposed to work on
>> > architectures that don't have a global "disable write protection" bit?
>> > Surely these architectures exist.
>>
>> I don't know. :) That's part of the reason for putting up this series:
>> looking to see what's possible on other architectures. It's not clear
>> to me what arm64 can do, for example.
>
> There is no global override of this sort on arm64. Just having map/unmap,
> open/close, shed/unshed, etc, won't work.
>
> The options I can think of for arm64 are:
>
> * Have a separate RW alias of just the write_rarely data, that we
>   temporarily map-in on a given CPU (using TTBR0) to perform the write.
>   The RW alias is at a different VA to the usual RO alias, so we have to
>   convert each pointer to its RW alias to perform the write. That's why
>   we need __rare_write_ptr() to hide this, and can't have uninstrumented
>   writes.

I think only the list code isn't instrumented, and that's just because
it discards casts outside the function. There's no reason it couldn't
be instrumented. All the other places I can see are using the
const_cast() explicitly because otherwise gcc freaks out (since the
constify plugin forces the variables const). There are a few places
where things aren't explicitly const_cast() (like in the modules).

Oh hey, I just found a bug in the open/close stuff; that's why there
was a giant routine misdetected by scripts... it just changes one of the
outliers, though. I'll send a separate email with the correction...

>   Since this would *only* map the write_rarely data, it's simple to set
>   up, and we don't need to modify the tables at runtime.
>
>   I also think we can implement this generically using switch_mm() and
>   {get,put}_user(), or specialised variants thereof.
>
>   Assuming we can figure out how to handle those complex cases, this is
>   my preferred solution. :)

Would this alias be CPU-local? (I assume yes, given the "give up on
being per-cpu" option below...)

> * Have a copy of the entire swapper page tables, which differs only in
>   the write_rarely data being RW. We then switch TTBR1 over to this for
>   the duration of the rare_write_map() .. rare_write_unmap() sequence.
>
>   While this sounds conceptually simple, it's far more complex in
>   practice. Keeping the not-quite-clone of the swapper in-sync is going
>   to be very painful and ripe for error.
>
>   Additionally, the TTBR1 switch will be very expensive, far more so
>   than the TTBR0 switch due to TLB maintenance and other work (including
>   switching TTBR0 twice per TTBR1 switch, itself requiring synchronised
>   TLB maintenance and other work).
>
>   I am not fond of this approach.
>
> * Give up on this being per-cpu. Either change the permissions in place,
>   or fixmap an RW alias as required. In either case, all CPUs will have
>   RW access.

-Kees
Mark Rutland March 2, 2017, 11:20 a.m. UTC | #8
On Wed, Mar 01, 2017 at 12:25:11PM -0800, Kees Cook wrote:
> On Wed, Mar 1, 2017 at 3:24 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > There is no global override of this sort on arm64. Just having map/unmap,
> > open/close, shed/unshed, etc, won't work.
> >
> > The options I can think of for arm64 are:
> >
> > * Have a separate RW alias of just the write_rarely data, that we
> >   temporarily map-in on a given CPU (using TTBR0) to perform the write.
> >   The RW alias is at a different VA to the usual RO alias, so we have to
> >   convert each pointer to its RW alias to perform the write. That's why
> >   we need __rare_write_ptr() to hide this, and can't have uninstrumented
> >   writes.
> 
> I think only the list code isn't instrumented, and that's just because
> it discards casts outside the function. There's no reason it couldn't
> be instrumented. 

Ok, it sounds like we could make this work, then.

> >   Since this would *only* map the write_rarely data, it's simple to set
> >   up, and we don't need to modify the tables at runtime.
> >
> >   I also think we can implement this generically using switch_mm() and
> >   {get,put}_user(), or specialised variants thereof.
> >
> >   Assuming we can figure out how to handle those complex cases, this is
> >   my preferred solution. :)
> 
> Would this alias be CPU-local? (I assume yes, given the "give up on
> being per-cpu" option below...)

Yes, this would be CPU-local. It would be like mapping the idmap, or
userspace.

Thanks,
Mark.
Hoeun Ryu March 3, 2017, 12:59 a.m. UTC | #9
>> On Mar 1, 2017, at 8:24 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>> 
>>> On Tue, Feb 28, 2017 at 03:52:26PM -0800, Kees Cook wrote:
>>> On Tue, Feb 28, 2017 at 2:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
>>>>> Can't we at least make this:
>>>>> 
>>>>> struct rare_write_mapping {
>>>>> void *addr;
>>>>> struct arch_rare_write_state arch_state;
>>>>> };
>>>>> 
>>>>> static inline struct rare_write_mapping __arch_rare_write_map(void
>>>>> *addr, size_t len);
>>>>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
>>>> 
>>>> How do you envision this working with the more complex things I
>>>> included in the latter patches of the series, namely doing linked list
>>>> updates across multiple structure instances, etc?
>>>> 
>>>> ie, poor list manipulation pseudo-code:
>>>> 
>>>> turn off read-only;
>>>> struct_one->next = struct_tree->node;
>>>> struct_three->prev = struct_one->node;
>>>> struct_two->prev = struct_two->next = NULL;
>>>> turn on read-only;
>>>> 
>>>> That's three separate memory areas involved...
>>> 
>>> Fair enough.  That being said, how is this supposed to work on
>>> architectures that don't have a global "disable write protection" bit?
>>> Surely these architectures exist.
>> 
>> I don't know. :) That's part of the reason for putting up this series:
>> looking to see what's possible on other architectures. It's not clear
>> to me what arm64 can do, for example.
> 
> There is no global override of this sort on arm64. Just having map/unmap,
> open/close, shed/unshed, etc, won't work.
> 
> The options I can think of for arm64 are:
> 
> * Have a separate RW alias of just the write_rarely data, that we
> temporarily map-in on a given CPU (using TTBR0) to perform the write.
> The RW alias is at a different VA to the usual RO alias, so we have to
> convert each pointer to its RW alias to perform the write. That's why
> we need __rare_write_ptr() to hide this, and can't have uninstrumented
> writes.
> 
> Since this would *only* map the write_rarely data, it's simple to set
> up, and we don't need to modify the tables at runtime.
> 
> I also think we can implement this generically using switch_mm() and
> {get,put}_user(), or specialised variants thereof.
> 
> Assuming we can figure out how to handle those complex cases, this is
> my preferred solution. :)

I implemented RFC version of the option #1 .
It would be appreciated if you could review the code [1].

[1] http://www.openwall.com/lists/kernel-hardening/2017/03/02/5

> 
> * Have a copy of the entire swapper page tables, which differs only in
> the write_rarely data being RW. We then switch TTBR1 over to this for
> the duration of the rare_write_map() .. rare_write_unmap() sequence.
> 
> While this sounds conceptually simple, it's far more complex in
> practice. Keeping the not-quite-clone of the swapper in-sync is going
> to be very painful and ripe for error.
> 
> Additionally, the TTBR1 switch will be very expensive, far more so
> than the TTBR0 switch due to TLB maintenance and other work (including
> switching TTBR0 twice per TTBR1 switch, itself requiring synchronised
> TLB maintenance and other work).
> 
> I am not fond of this approach.
> 
> * Give up on this being per-cpu. Either change the permissions in place,
> or fixmap an RW alias as required. In either case, all CPUs will have
> RW access.
> 
> Thanks,
> Mark.

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e487493bbd47..6d4d6f73d4b8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@  config X86
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_ARCH_MMAP_RND_BITS		if MMU
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
+	select HAVE_ARCH_RARE_WRITE
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 437feb436efa..86e78e97738f 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -90,6 +90,37 @@  extern struct mm_struct *pgd_page_get_mm(struct page *page);
 
 #endif	/* CONFIG_PARAVIRT */
 
+/* TODO: Bad hack to deal with preempt macros being missing sometimes. */
+#ifndef preempt_disable
+#include <linux/preempt.h>
+#endif
+
+static inline unsigned long __arch_rare_write_map(void)
+{
+	unsigned long cr0;
+
+	preempt_disable();
+	barrier();
+	cr0 = read_cr0() ^ X86_CR0_WP;
+	BUG_ON(cr0 & X86_CR0_WP);
+	write_cr0(cr0);
+	barrier();
+	return cr0 ^ X86_CR0_WP;
+}
+
+static inline unsigned long __arch_rare_write_unmap(void)
+{
+	unsigned long cr0;
+
+	barrier();
+	cr0 = read_cr0() ^ X86_CR0_WP;
+	BUG_ON(!(cr0 & X86_CR0_WP));
+	write_cr0(cr0);
+	barrier();
+	preempt_enable_no_resched();
+	return cr0 ^ X86_CR0_WP;
+}
+
 /*
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..