diff mbox

[v2,07/19] x86: introduce __uaccess_begin_nospec and ASM_IFENCE

Message ID 151571802258.27429.932636277047687877.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams Jan. 12, 2018, 12:47 a.m. UTC
For 'get_user' paths, do not allow the kernel to speculate on the value
of a user controlled pointer. In addition to the 'stac' instruction for
Supervisor Mode Access Protection, an 'ifence' causes the 'access_ok'
result to resolve in the pipeline before the cpu might take any
speculative action on the pointer value.

Since this is a major kernel interface that deals with user controlled
data, the '__uaccess_begin_nospec' mechanism will prevent speculative
execution past an 'access_ok' permission check. While speculative
execution past 'access_ok' is not enough to lead to a kernel memory
leak, it is a necessary precondition.

To be clear, '__uaccess_begin_nospec' and ASM_IFENCE are not addressing
any known issues with 'get_user' they are addressing a class of
potential problems that could be near 'get_user' usages. In other words,
these helpers are for hygiene not clinical fixes.

There are no functional changes in this patch.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/smap.h    |    4 ++++
 arch/x86/include/asm/uaccess.h |   10 ++++++++++
 2 files changed, 14 insertions(+)

Comments

Josh Poimboeuf Jan. 12, 2018, 5:51 p.m. UTC | #1
On Thu, Jan 11, 2018 at 04:47:02PM -0800, Dan Williams wrote:
> For 'get_user' paths, do not allow the kernel to speculate on the value
> of a user controlled pointer. In addition to the 'stac' instruction for
> Supervisor Mode Access Protection, an 'ifence' causes the 'access_ok'
> result to resolve in the pipeline before the cpu might take any
> speculative action on the pointer value.

So I understand the need to "patch first and ask questions later".  I
also understand that usercopy is an obvious attack point for speculative
bugs.  However, I'm still hopelessly confused about what exactly this
patch (and the next one) are supposed to accomplish.

I can't figure out if:

a) I'm missing something completely obvious;
b) this is poorly described; or
c) it doesn't actually fix/protect/harden anything.

The commit log doesn't help me at all.  In fact, it confuses me more.
For example, this paragraph:

> Since this is a major kernel interface that deals with user controlled
> data, the '__uaccess_begin_nospec' mechanism will prevent speculative
> execution past an 'access_ok' permission check. While speculative
> execution past 'access_ok' is not enough to lead to a kernel memory
> leak, it is a necessary precondition.

That just sounds wrong.  What if the speculation starts *after* the
access_ok() check?  Then the barrier has no purpose.

Most access_ok/get_user/copy_from_user calls are like this:

  if (copy_from_user(...uptr..))  /* or access_ok() or get_user() */
  	return -EFAULT;

So in other words, the usercopy function is called *before* the branch.

But to halt speculation, the lfence needs to come *after* the branch.
So putting lfences *before* the branch doesn't solve anything.

So what am I missing?
Dan Williams Jan. 12, 2018, 6:21 p.m. UTC | #2
On Fri, Jan 12, 2018 at 9:51 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jan 11, 2018 at 04:47:02PM -0800, Dan Williams wrote:
>> For 'get_user' paths, do not allow the kernel to speculate on the value
>> of a user controlled pointer. In addition to the 'stac' instruction for
>> Supervisor Mode Access Protection, an 'ifence' causes the 'access_ok'
>> result to resolve in the pipeline before the cpu might take any
>> speculative action on the pointer value.
>
> So I understand the need to "patch first and ask questions later".  I
> also understand that usercopy is an obvious attack point for speculative
> bugs.  However, I'm still hopelessly confused about what exactly this
> patch (and the next one) are supposed to accomplish.
>
> I can't figure out if:
>
> a) I'm missing something completely obvious;
> b) this is poorly described; or
> c) it doesn't actually fix/protect/harden anything.
>
> The commit log doesn't help me at all.  In fact, it confuses me more.
> For example, this paragraph:
>
>> Since this is a major kernel interface that deals with user controlled
>> data, the '__uaccess_begin_nospec' mechanism will prevent speculative
>> execution past an 'access_ok' permission check. While speculative
>> execution past 'access_ok' is not enough to lead to a kernel memory
>> leak, it is a necessary precondition.
>
> That just sounds wrong.  What if the speculation starts *after* the
> access_ok() check?  Then the barrier has no purpose.
>
> Most access_ok/get_user/copy_from_user calls are like this:
>
>   if (copy_from_user(...uptr..))  /* or access_ok() or get_user() */
>         return -EFAULT;
>
> So in other words, the usercopy function is called *before* the branch.
>
> But to halt speculation, the lfence needs to come *after* the branch.
> So putting lfences *before* the branch doesn't solve anything.
>
> So what am I missing?

We're trying to prevent a pointer under user control from being
de-referenced inside the kernel, before we know it has been limited to
something safe. In the following sequence the branch we are worried
about speculating is the privilege check:

if (access_ok(uptr))  /* <--- Privelege Check */
    if (copy_from_user_(uptr))

The cpu can speculatively skip that access_ok() check and cause a read
of kernel memory.
Josh Poimboeuf Jan. 12, 2018, 6:58 p.m. UTC | #3
On Fri, Jan 12, 2018 at 10:21:43AM -0800, Dan Williams wrote:
> > That just sounds wrong.  What if the speculation starts *after* the
> > access_ok() check?  Then the barrier has no purpose.
> >
> > Most access_ok/get_user/copy_from_user calls are like this:
> >
> >   if (copy_from_user(...uptr..))  /* or access_ok() or get_user() */
> >         return -EFAULT;
> >
> > So in other words, the usercopy function is called *before* the branch.
> >
> > But to halt speculation, the lfence needs to come *after* the branch.
> > So putting lfences *before* the branch doesn't solve anything.
> >
> > So what am I missing?
> 
> We're trying to prevent a pointer under user control from being
> de-referenced inside the kernel, before we know it has been limited to
> something safe. In the following sequence the branch we are worried
> about speculating is the privilege check:
> 
> if (access_ok(uptr))  /* <--- Privelege Check */
>     if (copy_from_user_(uptr))
> 
> The cpu can speculatively skip that access_ok() check and cause a read
> of kernel memory.

Converting your example code to assembly:

	call	access_ok # no speculation which started before this point is allowed to continue past this point
	test	%rax, %rax
	jne	error_path
dereference_uptr:
	(do nefarious things with the user pointer)

error_path:
	mov -EINVAL, %rax
	ret

So the CPU is still free to speculately execute the dereference_uptr
block because the lfence was before the 'jne error_path' branch.
Dan Williams Jan. 12, 2018, 7:26 p.m. UTC | #4
On Fri, Jan 12, 2018 at 10:58 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Fri, Jan 12, 2018 at 10:21:43AM -0800, Dan Williams wrote:
>> > That just sounds wrong.  What if the speculation starts *after* the
>> > access_ok() check?  Then the barrier has no purpose.
>> >
>> > Most access_ok/get_user/copy_from_user calls are like this:
>> >
>> >   if (copy_from_user(...uptr..))  /* or access_ok() or get_user() */
>> >         return -EFAULT;
>> >
>> > So in other words, the usercopy function is called *before* the branch.
>> >
>> > But to halt speculation, the lfence needs to come *after* the branch.
>> > So putting lfences *before* the branch doesn't solve anything.
>> >
>> > So what am I missing?
>>
>> We're trying to prevent a pointer under user control from being
>> de-referenced inside the kernel, before we know it has been limited to
>> something safe. In the following sequence the branch we are worried
>> about speculating is the privilege check:
>>
>> if (access_ok(uptr))  /* <--- Privelege Check */
>>     if (copy_from_user_(uptr))
>>
>> The cpu can speculatively skip that access_ok() check and cause a read
>> of kernel memory.
>
> Converting your example code to assembly:
>
>         call    access_ok # no speculation which started before this point is allowed to continue past this point
>         test    %rax, %rax
>         jne     error_path
> dereference_uptr:
>         (do nefarious things with the user pointer)
>
> error_path:
>         mov -EINVAL, %rax
>         ret
>
> So the CPU is still free to speculately execute the dereference_uptr
> block because the lfence was before the 'jne error_path' branch.

By the time we get to de-reference uptr we know it is not pointing at
kernel memory, because access_ok would have failed and the cpu would
have waited for that failure result before doing anything else.

Now the vulnerability that still exists after this fence is if we do
something with the value returned from de-referencing that pointer.
I.e. if we do:

get_user(val, uptr);
if (val < array_max)
    array[val];

...then we're back to having a user controlled pointer in the kernel.
This is the point where we need array_ptr() to sanitize things.
Linus Torvalds Jan. 12, 2018, 8:01 p.m. UTC | #5
On Fri, Jan 12, 2018 at 11:26 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> By the time we get to de-reference uptr we know it is not pointing at
> kernel memory, because access_ok would have failed and the cpu would
> have waited for that failure result before doing anything else.

I'm not actually convinced that's right in the original patches,
exactly because of the issue that Josh pointed out: even if there is a
comparison inside access_ok() that will be properly serialized, then
that comparison can (and sometimes does) just cause a truth value to
be generated, and then there  might be *another* comparison of that
return value after the lfence. And while the return value is table,
the conditional branch on that comparison isn't.

The new model of just doing it together with the STAC should be fine, though.

I do think that it would be a good idea to very expressly document the
fact that it's not that the user access itself is unsafe. I do agree
that things like "get_user()" want to be protected, but not because of
any direct bugs or problems with get_user() and friends, but simply
because get_user() is an excellent source of a pointer that is
obviously controlled from a potentially attacking user space. So it's
a prime candidate for then finding _subsequent_ accesses that can then
be used to perturb the cache.

                 Linus
Josh Poimboeuf Jan. 12, 2018, 8:41 p.m. UTC | #6
On Fri, Jan 12, 2018 at 12:01:04PM -0800, Linus Torvalds wrote:
> On Fri, Jan 12, 2018 at 11:26 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > By the time we get to de-reference uptr we know it is not pointing at
> > kernel memory, because access_ok would have failed and the cpu would
> > have waited for that failure result before doing anything else.
> 
> I'm not actually convinced that's right in the original patches,
> exactly because of the issue that Josh pointed out: even if there is a
> comparison inside access_ok() that will be properly serialized, then
> that comparison can (and sometimes does) just cause a truth value to
> be generated, and then there  might be *another* comparison of that
> return value after the lfence. And while the return value is table,
> the conditional branch on that comparison isn't.
> 
> The new model of just doing it together with the STAC should be fine, though.

Aha, that clears it up for me, thanks.  I was still thinking about the
previous version of the patch which had the barrier in access_ok().  I
didn't realize the new version moved the barrier to after the
access_ok() checks.
diff mbox

Patch

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index db333300bd4b..0b59707e0b46 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -40,6 +40,10 @@ 
 
 #endif /* CONFIG_X86_SMAP */
 
+#define ASM_IFENCE \
+	ALTERNATIVE_2 "", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+			  "lfence", X86_FEATURE_LFENCE_RDTSC
+
 #else /* __ASSEMBLY__ */
 
 #include <asm/alternative.h>
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 574dff4d2913..a31fd4fc6483 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -124,6 +124,11 @@  extern int __get_user_bad(void);
 
 #define __uaccess_begin() stac()
 #define __uaccess_end()   clac()
+#define __uaccess_begin_nospec()	\
+({					\
+	stac();				\
+	ifence();			\
+})
 
 /*
  * This is a type: either unsigned long, if the argument fits into
@@ -487,6 +492,11 @@  struct __large_struct { unsigned long buf[100]; };
 	__uaccess_begin();						\
 	barrier();
 
+#define uaccess_try_nospec do {						\
+	current->thread.uaccess_err = 0;				\
+	__uaccess_begin_nospec();					\
+	barrier();
+
 #define uaccess_catch(err)						\
 	__uaccess_end();						\
 	(err) |= (current->thread.uaccess_err ? -EFAULT : 0);		\