Message ID | 20190220180934.GA46255@beast (mailing list archive)
---|---
State | New, archived
Series | [v2] x86/asm: Pin sensitive CR4 bits
On Wed, Feb 20, 2019 at 10:09:34AM -0800, Kees Cook wrote:
> +extern volatile unsigned long cr4_pin;
> +
>  static inline void native_write_cr4(unsigned long val)
>  {
> +again:
> +	val |= cr4_pin;
>  	asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
> +	/*
> +	 * If the MOV above was used directly as a ROP gadget we can
> +	 * notice the lack of pinned bits in "val" and start the function
> +	 * from the beginning to gain the cr4_pin bits for sure.
> +	 */
> +	if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))
> +		goto again;
>  }

I think "goto again" is too mild a response given that it occurs after a
successful write of a non-pinned value to CR4. I think it'd allow some
exploits to eventually win the race: make their desired use of whatever
functionality SMEP, etc. would have prevented - which may be just a few
instructions they need to run - before the CR4 value is reverted after
"goto again". I think it's one of those cases where a kernel panic would
be more appropriate.

Also, WARN_ONCE possibly introduces a delay sufficient to realistically
win this race on the first try. If we choose to warn, we should do it
after having reverted the CR4 value, not before.

Alexander
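A minimal sketch of the ordering Alexander is asking for - revert CR4
before taking the WARN path - reusing the cr4_pin and __force_order
machinery from the v2 patch. This is illustrative only, not code from the
thread; the "bypass" flag is a hypothetical addition and would itself be
skipped by a mid-function entry:

	static inline void native_write_cr4(unsigned long val)
	{
		bool bypass = false;

	again:
		val |= cr4_pin;
		asm volatile("mov %0,%%cr4" : : "r" (val), "m" (__force_order));
		if (unlikely((val & cr4_pin) != cr4_pin)) {
			/* Revert first: loop back and rewrite CR4 with the pinned bits. */
			bypass = true;
			goto again;
		}
		/* Warn only after the pinned bits are back in CR4. */
		WARN_ONCE(bypass, "cr4 bypass attempt?!\n");
	}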
On Wed, Feb 20, 2019 at 10:49 AM Solar Designer <solar@openwall.com> wrote:
>
> On Wed, Feb 20, 2019 at 10:09:34AM -0800, Kees Cook wrote:
> > +	if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))
> > +		goto again;
>
> I think "goto again" is too mild a response given that it occurs after a
> successful write of a non-pinned value to CR4. I think it'd allow some
> exploits to eventually win the race: make their desired use of whatever
> functionality SMEP, etc. would have prevented - which may be just a few
> instructions they need to run - before the CR4 value is reverted after
> "goto again". I think it's one of those cases where a kernel panic
> would be more appropriate.

It will not land upstream with a BUG() or panic(). Linus has
explicitly stated that none of this work can do that until it has
"baked" in the kernel for a couple years.

In his defense, anyone sufficiently paranoid can already raise the
priority of a WARN() into a panic via sysctl kernel.panic_on_warn (and
kernel.panic_on_oops).

> Also, WARN_ONCE possibly introduces a delay sufficient to realistically
> win this race on the first try. If we choose to warn, we should do it
> after having reverted the CR4 value, not before.

Isn't cr4 CPU-local though? Couldn't we turn off interrupts to stop
the race?
On Wed, Feb 20, 2019 at 01:20:58PM -0800, Kees Cook wrote:
> On Wed, Feb 20, 2019 at 10:49 AM Solar Designer <solar@openwall.com> wrote:
> >
> > On Wed, Feb 20, 2019 at 10:09:34AM -0800, Kees Cook wrote:
> > > +	if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))
> > > +		goto again;
> >
> > I think "goto again" is too mild a response given that it occurs after a
> > successful write of a non-pinned value to CR4. I think it'd allow some
> > exploits to eventually win the race: make their desired use of whatever
> > functionality SMEP, etc. would have prevented - which may be just a few
> > instructions they need to run - before the CR4 value is reverted after
> > "goto again". I think it's one of those cases where a kernel panic
> > would be more appropriate.
>
> It will not land upstream with a BUG() or panic(). Linus has
> explicitly stated that none of this work can do that until it has
> "baked" in the kernel for a couple years.

OK.

> In his defense, anyone sufficiently paranoid can already raise the
> priority of a WARN() into a panic via sysctl kernel.panic_on_warn (and
> kernel.panic_on_oops).

I think there are too many uses of WARN() for anyone sane to enable
that in production, whereas it'd have made sense to enable it for the
few security-related uses.

> > Also, WARN_ONCE possibly introduces a delay sufficient to realistically
> > win this race on the first try. If we choose to warn, we should do it
> > after having reverted the CR4 value, not before.
>
> Isn't cr4 CPU-local though?

Good point. I don't know. If CR4 is per hardware thread, then the race
would require an interrupt and would be much harder to win.

> Couldn't we turn off interrupts to stop the race?

This won't help. An attack would skip the code that disables interrupts
and land right on the MOV instruction.

Alexander
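To illustrate Alexander's point: even a hypothetical variant that disables
interrupts offers no protection, because a ROP chain does not enter the
function at its first instruction. A sketch, with a made-up name; nothing
like this was proposed in the thread:

	/* Hypothetical - shows why irq-disabling cannot close the hole. */
	static inline void native_write_cr4_irqsafe(unsigned long val)
	{
		unsigned long flags;

		local_irq_save(flags);	/* a ROP gadget simply skips this... */
		val |= cr4_pin;		/* ...and this... */
		/* ...and lands directly here, with attacker-chosen "val": */
		asm volatile("mov %0,%%cr4" : : "r" (val), "m" (__force_order));
		local_irq_restore(flags);
	}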
On Thu, Feb 21, 2019 at 5:06 AM Solar Designer <solar@openwall.com> wrote:
>
> On Wed, Feb 20, 2019 at 01:20:58PM -0800, Kees Cook wrote:
> > On Wed, Feb 20, 2019 at 10:49 AM Solar Designer <solar@openwall.com> wrote:
> > >
> > > On Wed, Feb 20, 2019 at 10:09:34AM -0800, Kees Cook wrote:
> > > > +	if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))
> > > > +		goto again;
> > >
> > > I think "goto again" is too mild a response given that it occurs after a
> > > successful write of a non-pinned value to CR4. I think it'd allow some
> > > exploits to eventually win the race: make their desired use of whatever
> > > functionality SMEP, etc. would have prevented - which may be just a few
> > > instructions they need to run - before the CR4 value is reverted after
> > > "goto again". I think it's one of those cases where a kernel panic
> > > would be more appropriate.
> >
> > It will not land upstream with a BUG() or panic(). Linus has
> > explicitly stated that none of this work can do that until it has
> > "baked" in the kernel for a couple years.
>
> OK.
>
> > In his defense, anyone sufficiently paranoid can already raise the
> > priority of a WARN() into a panic via sysctl kernel.panic_on_warn (and
> > kernel.panic_on_oops).
>
> I think there are too many uses of WARN() for anyone sane to enable
> that in production, whereas it'd have made sense to enable it for the
> few security-related uses.

Yeah, that's been my thinking too. I've been thinking about this for a
while trying to decide if we need something between WARN and BUG, but I
can't make up my mind. ;)

> > > Also, WARN_ONCE possibly introduces a delay sufficient to realistically
> > > win this race on the first try. If we choose to warn, we should do it
> > > after having reverted the CR4 value, not before.
> >
> > Isn't cr4 CPU-local though?
>
> Good point. I don't know. If CR4 is per hardware thread, then the race
> would require an interrupt and would be much harder to win.
>
> > Couldn't we turn off interrupts to stop the race?
>
> This won't help. An attack would skip the code that disables interrupts
> and land right on the MOV instruction.

Oh duh, yeah. ;) I think v2 is good enough given the constraints we've
got. Thanks for looking at it!
On Wed, Feb 20, 2019 at 10:09:34AM -0800, Kees Cook wrote:
> Several recent exploits have used direct calls to the native_write_cr4()
> function to disable SMEP and SMAP before then continuing their exploits
> using userspace memory access. This pins bits of cr4 so that they cannot
> be changed through a common function. This is not intended to be general
> ROP protection (which would require CFI to defend against properly), but
> rather a way to avoid trivial direct function calling (or CFI bypassing
> via a matching function prototype) as seen in:
>
> https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
> (https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)
>
> The goals of this change:
> - pin specific bits (SMEP, SMAP, and UMIP) when writing cr4.
> - avoid setting the bits too early (they must become pinned only after
>   first being used).
> - pinning mask needs to be read-only during normal runtime.
> - pinning needs to be rechecked after set to avoid jumps into the middle
>   of the function.
>
> Using __ro_after_init on the mask is done so it can't be first disabled
> with a malicious write. And since it becomes read-only, we must avoid
> writing to it later (hence the check for bits already having been set
> instead of unconditionally writing to the mask).
>
> The use of volatile is done to force the compiler to perform a full reload
> of the mask after setting cr4 (to protect against just jumping into the
> function past where the masking happens; we must check that the mask was
> applied after we do the set). Due to how this function can be built by the
> compiler (especially due to the removal of frame pointers), jumping into
> the middle of the function frequently doesn't require stack manipulation
> to construct a stack frame (there may only be a retq without pops, which
> is sufficient for use with exploits like timer overwrites mentioned above).
>
> For example, without the recheck, the function may appear as:
>
>   native_write_cr4:
>      mov    [pin], %rbx
>      or     %rbx, %rdi
>   1: mov    %rdi, %cr4
>      retq
>
> The masking "or" could be trivially bypassed by just calling to label "1"
> instead of "native_write_cr4". (CFI will force calls to only be able to
> call into native_write_cr4, but CFI and CET are uncommon currently.)
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v2: fix think-o in cr4_pin recheck (Jann Horn)
> ---
>  arch/x86/include/asm/special_insns.h | 11 +++++++++++
>  arch/x86/kernel/cpu/common.c         | 12 +++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 43c029cdc3fe..4c26004ed5d4 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -72,9 +72,20 @@ static inline unsigned long native_read_cr4(void)
>  	return val;
>  }
>
> +extern volatile unsigned long cr4_pin;
> +
>  static inline void native_write_cr4(unsigned long val)
>  {
> +again:
> +	val |= cr4_pin;
>  	asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
> +	/*
> +	 * If the MOV above was used directly as a ROP gadget we can
> +	 * notice the lack of pinned bits in "val" and start the function
> +	 * from the beginning to gain the cr4_pin bits for sure.
> +	 */
> +	if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))

Printing what bits diverged would be helpful in the unlikely event that
the WARN_ONCE triggers. "cr4 bypass attempt" is probably only meaningful
to people that are already familiar with the code, e.g.:

	if (WARN_ONCE((val & cr4_pin) != cr4_pin,
		      "Attempt to unpin cr4 bits: %lx, cr4 bypass attack?!",
		      ~val & cr4_pin))

> +		goto again;
>  }
>
>  #ifdef CONFIG_X86_64
On Thu, Feb 21, 2019 at 9:33 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
> On Wed, Feb 20, 2019 at 10:09:34AM -0800, Kees Cook wrote:
> > +	if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))
>
> Printing what bits diverged would be helpful in the unlikely event that the
> WARN_ONCE triggers. "cr4 bypass attempt" is probably only meaningful to
> people that are already familiar with the code, e.g.:
>
> 	if (WARN_ONCE((val & cr4_pin) != cr4_pin,
> 		      "Attempt to unpin cr4 bits: %lx, cr4 bypass attack?!",
> 		      ~val & cr4_pin))

Ah yeah, I like it. I'll send a v3. Thanks!
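Folding Sean's suggested message into the v2 function, the recheck in the
anticipated v3 would presumably read as follows (a sketch combining the two
snippets from this thread, not a posted patch):

	static inline void native_write_cr4(unsigned long val)
	{
	again:
		val |= cr4_pin;
		asm volatile("mov %0,%%cr4" : : "r" (val), "m" (__force_order));
		/*
		 * If the MOV above was used directly as a ROP gadget we can
		 * notice the lack of pinned bits in "val" and start the
		 * function from the beginning to gain the cr4_pin bits.
		 */
		if (WARN_ONCE((val & cr4_pin) != cr4_pin,
			      "Attempt to unpin cr4 bits: %lx, cr4 bypass attack?!",
			      ~val & cr4_pin))
			goto again;
	}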
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 43c029cdc3fe..4c26004ed5d4 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -72,9 +72,20 @@ static inline unsigned long native_read_cr4(void)
 	return val;
 }
 
+extern volatile unsigned long cr4_pin;
+
 static inline void native_write_cr4(unsigned long val)
 {
+again:
+	val |= cr4_pin;
 	asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
+	/*
+	 * If the MOV above was used directly as a ROP gadget we can
+	 * notice the lack of pinned bits in "val" and start the function
+	 * from the beginning to gain the cr4_pin bits for sure.
+	 */
+	if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))
+		goto again;
 }
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..7e0ea4470f8e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -312,10 +312,16 @@ static __init int setup_disable_smep(char *arg)
 }
 __setup("nosmep", setup_disable_smep);
 
+volatile unsigned long cr4_pin __ro_after_init;
+EXPORT_SYMBOL_GPL(cr4_pin);
+
 static __always_inline void setup_smep(struct cpuinfo_x86 *c)
 {
-	if (cpu_has(c, X86_FEATURE_SMEP))
+	if (cpu_has(c, X86_FEATURE_SMEP)) {
+		if (!(cr4_pin & X86_CR4_SMEP))
+			cr4_pin |= X86_CR4_SMEP;
 		cr4_set_bits(X86_CR4_SMEP);
+	}
 }
 
 static __init int setup_disable_smap(char *arg)
@@ -334,6 +340,8 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 
 	if (cpu_has(c, X86_FEATURE_SMAP)) {
 #ifdef CONFIG_X86_SMAP
+		if (!(cr4_pin & X86_CR4_SMAP))
+			cr4_pin |= X86_CR4_SMAP;
 		cr4_set_bits(X86_CR4_SMAP);
 #else
 		cr4_clear_bits(X86_CR4_SMAP);
@@ -351,6 +359,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	if (!cpu_has(c, X86_FEATURE_UMIP))
 		goto out;
 
+	if (!(cr4_pin & X86_CR4_UMIP))
+		cr4_pin |= X86_CR4_UMIP;
 	cr4_set_bits(X86_CR4_UMIP);
 
 	pr_info_once("x86/cpu: User Mode Instruction Prevention (UMIP) activated\n");
Several recent exploits have used direct calls to the native_write_cr4()
function to disable SMEP and SMAP before then continuing their exploits
using userspace memory access. This pins bits of cr4 so that they cannot
be changed through a common function. This is not intended to be general
ROP protection (which would require CFI to defend against properly), but
rather a way to avoid trivial direct function calling (or CFI bypassing
via a matching function prototype) as seen in:

https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
(https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)

The goals of this change:
- pin specific bits (SMEP, SMAP, and UMIP) when writing cr4.
- avoid setting the bits too early (they must become pinned only after
  first being used).
- pinning mask needs to be read-only during normal runtime.
- pinning needs to be rechecked after set to avoid jumps into the middle
  of the function.

Using __ro_after_init on the mask is done so it can't be first disabled
with a malicious write. And since it becomes read-only, we must avoid
writing to it later (hence the check for bits already having been set
instead of unconditionally writing to the mask).

The use of volatile is done to force the compiler to perform a full reload
of the mask after setting cr4 (to protect against just jumping into the
function past where the masking happens; we must check that the mask was
applied after we do the set). Due to how this function can be built by the
compiler (especially due to the removal of frame pointers), jumping into
the middle of the function frequently doesn't require stack manipulation
to construct a stack frame (there may only be a retq without pops, which
is sufficient for use with exploits like timer overwrites mentioned above).

For example, without the recheck, the function may appear as:

  native_write_cr4:
     mov    [pin], %rbx
     or     %rbx, %rdi
  1: mov    %rdi, %cr4
     retq

The masking "or" could be trivially bypassed by just calling to label "1"
instead of "native_write_cr4". (CFI will force calls to only be able to
call into native_write_cr4, but CFI and CET are uncommon currently.)

Signed-off-by: Kees Cook <keescook@chromium.org>
---
v2: fix think-o in cr4_pin recheck (Jann Horn)
---
 arch/x86/include/asm/special_insns.h | 11 +++++++++++
 arch/x86/kernel/cpu/common.c         | 12 +++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)
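As a concrete illustration of the commit's goal - once a bit is pinned,
clearing it through the common write path has no lasting effect - consider
this hypothetical post-boot caller. The names are from the patch; the
scenario itself is illustrative:

	unsigned long cr4 = native_read_cr4();

	/* Exploit-style attempt to disable SMEP via the common helper. */
	native_write_cr4(cr4 & ~X86_CR4_SMEP);

	/*
	 * SMEP remains set: native_write_cr4() OR'ed the pinned bits back
	 * into "val" before the MOV to CR4 (and would WARN on a recheck
	 * failure only if the masking had been jumped over).
	 */
	WARN_ON(!(native_read_cr4() & X86_CR4_SMEP));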