diff mbox

[4.4,178/193] x86/syscall: Sanitize syscall table de-references under speculation

Message ID 20180223170354.028619665@linuxfoundation.org (mailing list archive)
State New, archived
Headers show

Commit Message

Greg Kroah-Hartman Feb. 23, 2018, 6:26 p.m. UTC
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

(cherry picked from commit 2fbd7af5af8665d18bcefae3e9700be07e22b681)

The syscall table base is a user controlled function pointer in kernel
space. Use array_index_nospec() to prevent any out of bounds speculation.

While retpoline prevents speculating into a userspace directed target it
does not stop the pointer de-reference, the concern is leaking memory
relative to the syscall table base, by observing instruction cache
behavior.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
Cc: gregkh@linuxfoundation.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: alan@linux.intel.com
Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
[jwang: port to 4.4, no syscall_64]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/common.c |    2 ++
 1 file changed, 2 insertions(+)

Comments

Jiri Slaby March 6, 2018, 2:21 p.m. UTC | #1
On 02/23/2018, 07:26 PM, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Dan Williams <dan.j.williams@intel.com>
> 
> (cherry picked from commit 2fbd7af5af8665d18bcefae3e9700be07e22b681)
> 
> The syscall table base is a user controlled function pointer in kernel
> space. Use array_index_nospec() to prevent any out of bounds speculation.
> 
> While retpoline prevents speculating into a userspace directed target it
> does not stop the pointer de-reference, the concern is leaking memory
> relative to the syscall table base, by observing instruction cache
> behavior.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-arch@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> Cc: gregkh@linuxfoundation.org
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: alan@linux.intel.com
> Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> [jwang: port to 4.4, no syscall_64]

This is not complete IMO, the syscall is indeed there, only written in
assembly in 4.4 yet.

So this patch looks like it is missing these two hunks (from my
SLE12-SP2 backport):

> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -184,6 +184,8 @@ entry_SYSCALL_64_fastpath:
>        cmpl    $__NR_syscall_max, %eax
>  #endif
>        ja      1f                              /* return -ENOSYS (already in pt_regs->ax) */
> +      sbb     %rcx, %rcx                      /* array_index_mask_nospec() */
> +      and     %rcx, %rax
>        movq    %r10, %rcx
>  #ifdef CONFIG_RETPOLINE
>        movq    sys_call_table(, %rax, 8), %rax
> @@ -282,6 +284,8 @@ tracesys_phase2:
>        cmpl    $__NR_syscall_max, %eax
>  #endif
>        ja      1f                              /* return -ENOSYS (already in pt_regs->ax) */
> +      sbb     %rcx, %rcx                      /* array_index_mask_nospec() */
> +      and     %rcx, %rax
>        movq    %r10, %rcx                      /* fixup for C */
>  #ifdef CONFIG_RETPOLINE
>        movq    sys_call_table(, %rax, 8), %rax

Discovered by Jan Beulich.

thanks,
Jiri Slaby March 6, 2018, 4:02 p.m. UTC | #2
On 03/06/2018, 03:21 PM, Jiri Slaby wrote:
> On 02/23/2018, 07:26 PM, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Dan Williams <dan.j.williams@intel.com>
>>
>> (cherry picked from commit 2fbd7af5af8665d18bcefae3e9700be07e22b681)
>>
>> The syscall table base is a user controlled function pointer in kernel
>> space. Use array_index_nospec() to prevent any out of bounds speculation.
>>
>> While retpoline prevents speculating into a userspace directed target it
>> does not stop the pointer de-reference, the concern is leaking memory
>> relative to the syscall table base, by observing instruction cache
>> behavior.
>>
>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: linux-arch@vger.kernel.org
>> Cc: kernel-hardening@lists.openwall.com
>> Cc: gregkh@linuxfoundation.org
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: alan@linux.intel.com
>> Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>> [jwang: port to 4.4, no syscall_64]
> 
> This is not complete IMO, the syscall is indeed there, only written in
> assembly in 4.4 yet.
> 
> So this patch looks like it is missing these two hunks (from my
> SLE12-SP2 backport):
> 
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -184,6 +184,8 @@ entry_SYSCALL_64_fastpath:
>>        cmpl    $__NR_syscall_max, %eax
>>  #endif
>>        ja      1f                              /* return -ENOSYS (already in pt_regs->ax) */
>> +      sbb     %rcx, %rcx                      /* array_index_mask_nospec() */
>> +      and     %rcx, %rax

Which is not completely correct either. The preceding comparison should
write:
    cmpl    $NR_syscalls, %eax
    jae     1f
to have sbb correctly working even on the last syscall number.

thanks,
Jinpu Wang March 6, 2018, 4:11 p.m. UTC | #3
On Tue, Mar 6, 2018 at 3:21 PM, Jiri Slaby <jslaby@suse.cz> wrote:
> On 02/23/2018, 07:26 PM, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Dan Williams <dan.j.williams@intel.com>
>>
>> (cherry picked from commit 2fbd7af5af8665d18bcefae3e9700be07e22b681)
>>
>> The syscall table base is a user controlled function pointer in kernel
>> space. Use array_index_nospec() to prevent any out of bounds speculation.
>>
>> While retpoline prevents speculating into a userspace directed target it
>> does not stop the pointer de-reference, the concern is leaking memory
>> relative to the syscall table base, by observing instruction cache
>> behavior.
>>
>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: linux-arch@vger.kernel.org
>> Cc: kernel-hardening@lists.openwall.com
>> Cc: gregkh@linuxfoundation.org
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: alan@linux.intel.com
>> Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>> [jwang: port to 4.4, no syscall_64]
>
> This is not complete IMO, the syscall is indeed there, only written in
> assembly in 4.4 yet.
>
> So this patch looks like it is missing these two hunks (from my
> SLE12-SP2 backport):
>
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -184,6 +184,8 @@ entry_SYSCALL_64_fastpath:
>>        cmpl    $__NR_syscall_max, %eax
>>  #endif
>>        ja      1f                              /* return -ENOSYS (already in pt_regs->ax) */
>> +      sbb     %rcx, %rcx                      /* array_index_mask_nospec() */
>> +      and     %rcx, %rax
>>        movq    %r10, %rcx
>>  #ifdef CONFIG_RETPOLINE
>>        movq    sys_call_table(, %rax, 8), %rax
>> @@ -282,6 +284,8 @@ tracesys_phase2:
>>        cmpl    $__NR_syscall_max, %eax
>>  #endif
>>        ja      1f                              /* return -ENOSYS (already in pt_regs->ax) */
>> +      sbb     %rcx, %rcx                      /* array_index_mask_nospec() */
>> +      and     %rcx, %rax
>>        movq    %r10, %rcx                      /* fixup for C */
>>  #ifdef CONFIG_RETPOLINE
>>        movq    sys_call_table(, %rax, 8), %rax
>
> Discovered by Jan Beulich.
>
> thanks,
> --
> js
> suse labs

Thanks Jiri, yes, indeed, could you send a formal patch of the fix?

Thanks!
Jack Wang
Jiri Slaby March 7, 2018, 7:55 a.m. UTC | #4
On 03/07/2018, 08:53 AM, Jiri Slaby wrote:
> From: Dan Williams <dan.j.williams@intel.com>

WTF, sorry, this is a bad header taken from the original patch -- I will
resend shortly.

thanks,
diff mbox

Patch

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -20,6 +20,7 @@ 
 #include <linux/export.h>
 #include <linux/context_tracking.h>
 #include <linux/user-return-notifier.h>
+#include <linux/nospec.h>
 #include <linux/uprobes.h>
 
 #include <asm/desc.h>
@@ -381,6 +382,7 @@  __always_inline void do_syscall_32_irqs_
 	}
 
 	if (likely(nr < IA32_NR_syscalls)) {
+		nr = array_index_nospec(nr, IA32_NR_syscalls);
 		/*
 		 * It's possible that a 32-bit syscall implementation
 		 * takes a 64-bit parameter but nonetheless assumes that