diff mbox

[v5,04/12] x86: introduce __uaccess_begin_nospec and ifence

Message ID 151703973427.26578.15693075353773519333.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams Jan. 27, 2018, 7:55 a.m. UTC
For '__get_user' paths, do not allow the kernel to speculate on the
value of a user controlled pointer. In addition to the 'stac'
instruction for Supervisor Mode Access Protection, an 'ifence' causes
the 'access_ok' result to resolve in the pipeline before the cpu might
take any speculative action on the pointer value.

Since __get_user is a major kernel interface that deals with user
controlled pointers, the '__uaccess_begin_nospec' mechanism will prevent
speculative execution past an 'access_ok' permission check. While
speculative execution past 'access_ok' is not enough to lead to a kernel
memory leak, it is a necessary precondition.

To be clear, '__uaccess_begin_nospec' is addressing a class of potential
problems near '__get_user' usages.

Note, that while ifence is used to protect '__get_user', pointer masking
will be used for 'get_user' since it incorporates a bounds check near
the usage.

There are no functional changes in this patch.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/barrier.h |    4 ++++
 arch/x86/include/asm/msr.h     |    3 +--
 arch/x86/include/asm/uaccess.h |    9 +++++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

Comments

Ingo Molnar Jan. 28, 2018, 9:06 a.m. UTC | #1
* Dan Williams <dan.j.williams@intel.com> wrote:

> For '__get_user' paths, do not allow the kernel to speculate on the
> value of a user controlled pointer. In addition to the 'stac'
> instruction for Supervisor Mode Access Protection, an 'ifence' causes
> the 'access_ok' result to resolve in the pipeline before the cpu might
> take any speculative action on the pointer value.
> 
> Since __get_user is a major kernel interface that deals with user
> controlled pointers, the '__uaccess_begin_nospec' mechanism will prevent
> speculative execution past an 'access_ok' permission check. While
> speculative execution past 'access_ok' is not enough to lead to a kernel
> memory leak, it is a necessary precondition.
> 
> To be clear, '__uaccess_begin_nospec' is addressing a class of potential
> problems near '__get_user' usages.
> 
> Note, that while ifence is used to protect '__get_user', pointer masking
> will be used for 'get_user' since it incorporates a bounds check near
> the usage.
> 
> There are no functional changes in this patch.

The style problems/inconsistencies of the #2 patch are repeated here too.

Also, please split this patch into two patches:

 - #1 introducing ifence() and using it where it was open coded before
 - #2 introducing the _nospec() uaccess variants

> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Suggested-by: Andi Kleen <ak@linux.intel.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: x86@kernel.org
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/include/asm/barrier.h |    4 ++++
>  arch/x86/include/asm/msr.h     |    3 +--
>  arch/x86/include/asm/uaccess.h |    9 +++++++++
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 30419b674ebd..5f11d4c5c862 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -46,6 +46,10 @@ static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
>  	return mask;
>  }
>  
> +/* prevent speculative execution past this barrier */

Please use consistent capitalization in comments.

Thanks,

	Ingo
Ingo Molnar Jan. 28, 2018, 9:14 a.m. UTC | #2
* Dan Williams <dan.j.williams@intel.com> wrote:

> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -124,6 +124,11 @@ extern int __get_user_bad(void);
>  
>  #define __uaccess_begin() stac()
>  #define __uaccess_end()   clac()
> +#define __uaccess_begin_nospec()	\
> +({					\
> +	stac();				\
> +	ifence();			\
> +})

BTW., wouldn't it be better to switch the barrier order here, i.e. to do:

	ifence();			\
	stac();				\

?

The reason is that stac()/clac() is usually paired, so there's a chance with short 
sequences that it would resolve with 'no externally visible changes to flags'.

Also, there's many cases where flags are modified _inside_ the STAC/CLAC section, 
so grouping them together inside a speculation atom could be beneficial.

The flip side is that if the MFENCE stalls the STAC that is ahead of it could be 
processed for 'free' - while it's always post barrier with my suggestion.

But in any case it would be nice to see a discussion of this aspect in the 
changelog, even if the patch does not change.

Thanks,

	Ingo
Dan Williams Jan. 29, 2018, 8:41 p.m. UTC | #3
On Sun, Jan 28, 2018 at 1:14 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> --- a/arch/x86/include/asm/uaccess.h
>> +++ b/arch/x86/include/asm/uaccess.h
>> @@ -124,6 +124,11 @@ extern int __get_user_bad(void);
>>
>>  #define __uaccess_begin() stac()
>>  #define __uaccess_end()   clac()
>> +#define __uaccess_begin_nospec()     \
>> +({                                   \
>> +     stac();                         \
>> +     ifence();                       \
>> +})
>
> BTW., wouldn't it be better to switch the barrier order here, i.e. to do:
>
>         ifence();                       \
>         stac();                         \
>
> ?
>
> The reason is that stac()/clac() is usually paired, so there's a chance with short
> sequences that it would resolve with 'no externally visible changes to flags'.
>
> Also, there's many cases where flags are modified _inside_ the STAC/CLAC section,
> so grouping them together inside a speculation atom could be beneficial.

I'm struggling a bit to grok this. Are you referring to the state of
the flags from the address limit comparison? That's the result that
needs fencing before speculative execution leaks to to the user
pointer de-reference.

> The flip side is that if the MFENCE stalls the STAC that is ahead of it could be
> processed for 'free' - while it's always post barrier with my suggestion.

This 'for free' aspect is what I aiming for.

>
> But in any case it would be nice to see a discussion of this aspect in the
> changelog, even if the patch does not change.

I'll add a note to the changelog that having the fence after the
'stac' hopefully allows some overlap of the cost of 'stac' and the
flushing of the instruction pipeline.
Ingo Molnar Jan. 30, 2018, 6:56 a.m. UTC | #4
* Dan Williams <dan.j.williams@intel.com> wrote:

> > The flip side is that if the MFENCE stalls the STAC that is ahead of it could be
> > processed for 'free' - while it's always post barrier with my suggestion.
> 
> This 'for free' aspect is what I aiming for.

Ok.

> >
> > But in any case it would be nice to see a discussion of this aspect in the
> > changelog, even if the patch does not change.
> 
> I'll add a note to the changelog that having the fence after the
> 'stac' hopefully allows some overlap of the cost of 'stac' and the
> flushing of the instruction pipeline.

Perfect!

Thanks,

	Ingo
diff mbox

Patch

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 30419b674ebd..5f11d4c5c862 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -46,6 +46,10 @@  static inline unsigned long array_idx_mask(unsigned long idx, unsigned long sz)
 	return mask;
 }
 
+/* prevent speculative execution past this barrier */
+#define ifence() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+				   "lfence", X86_FEATURE_LFENCE_RDTSC)
+
 #ifdef CONFIG_X86_PPRO_FENCE
 #define dma_rmb()	rmb()
 #else
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 07962f5f6fba..e426d2a33ff3 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -214,8 +214,7 @@  static __always_inline unsigned long long rdtsc_ordered(void)
 	 * that some other imaginary CPU is updating continuously with a
 	 * time stamp.
 	 */
-	alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
-			  "lfence", X86_FEATURE_LFENCE_RDTSC);
+	ifence();
 	return rdtsc();
 }
 
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 574dff4d2913..626caf58183a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -124,6 +124,11 @@  extern int __get_user_bad(void);
 
 #define __uaccess_begin() stac()
 #define __uaccess_end()   clac()
+#define __uaccess_begin_nospec()	\
+({					\
+	stac();				\
+	ifence();			\
+})
 
 /*
  * This is a type: either unsigned long, if the argument fits into
@@ -487,6 +492,10 @@  struct __large_struct { unsigned long buf[100]; };
 	__uaccess_begin();						\
 	barrier();
 
+#define uaccess_try_nospec do {						\
+	current->thread.uaccess_err = 0;				\
+	__uaccess_begin_nospec();					\
+
 #define uaccess_catch(err)						\
 	__uaccess_end();						\
 	(err) |= (current->thread.uaccess_err ? -EFAULT : 0);		\