Message ID | 1406020499-5537-4-git-send-email-takahiro.akashi@linaro.org (mailing list archive) |
---|---|
State | New, archived |
On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:
> secure_computing() should always be called first in syscall_trace_enter().
>
> If secure_computing() returns -1, we should stop further handling. Then
> that system call may eventually fail with a specified return value (errno),
> be trapped or the process itself be killed depending on loaded rules.
> In these cases, syscall_trace_enter() also returns -1, that results in
> skipping a normal syscall handling as well as syscall_trace_exit().
>
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> ---
>  arch/arm64/Kconfig               | 14 ++++++++++++++
>  arch/arm64/include/asm/seccomp.h | 25 +++++++++++++++++++++++++
>  arch/arm64/include/asm/unistd.h  |  3 +++
>  arch/arm64/kernel/ptrace.c       |  5 +++++
>  4 files changed, 47 insertions(+)
>  create mode 100644 arch/arm64/include/asm/seccomp.h
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 3a18571..eeac003 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -32,6 +32,7 @@ config ARM64
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_JUMP_LABEL
>  	select HAVE_ARCH_KGDB
> +	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_TRACEHOOK
>  	select HAVE_C_RECORDMCOUNT
>  	select HAVE_DEBUG_BUGVERBOSE
> @@ -259,6 +260,19 @@ config ARCH_HAS_CACHE_LINE_SIZE
>
>  source "mm/Kconfig"
>
> +config SECCOMP
> +	bool "Enable seccomp to safely compute untrusted bytecode"
> +	---help---
> +	  This kernel feature is useful for number crunching applications
> +	  that may need to compute untrusted bytecode during their
> +	  execution. By using pipes or other transports made available to
> +	  the process as file descriptors supporting the read/write
> +	  syscalls, it's possible to isolate those applications in
> +	  their own address space using seccomp. Once seccomp is
> +	  enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
> +	  and the task is only allowed to execute a few safe syscalls
> +	  defined by each seccomp mode.
> +
>  config XEN_DOM0
>  	def_bool y
>  	depends on XEN
> diff --git a/arch/arm64/include/asm/seccomp.h b/arch/arm64/include/asm/seccomp.h
> new file mode 100644
> index 0000000..c76fac9
> --- /dev/null
> +++ b/arch/arm64/include/asm/seccomp.h
> @@ -0,0 +1,25 @@
> +/*
> + * arch/arm64/include/asm/seccomp.h
> + *
> + * Copyright (C) 2014 Linaro Limited
> + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef _ASM_SECCOMP_H
> +#define _ASM_SECCOMP_H
> +
> +#include <asm/unistd.h>
> +
> +#ifdef CONFIG_COMPAT
> +#define __NR_seccomp_read_32		__NR_compat_read
> +#define __NR_seccomp_write_32		__NR_compat_write
> +#define __NR_seccomp_exit_32		__NR_compat_exit
> +#define __NR_seccomp_sigreturn_32	__NR_compat_rt_sigreturn
> +#endif /* CONFIG_COMPAT */
> +
> +#include <asm-generic/seccomp.h>
> +
> +#endif /* _ASM_SECCOMP_H */
> diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
> index c980ab7..729c155 100644
> --- a/arch/arm64/include/asm/unistd.h
> +++ b/arch/arm64/include/asm/unistd.h
> @@ -31,6 +31,9 @@
>   * Compat syscall numbers used by the AArch64 kernel.
>   */
>  #define __NR_compat_restart_syscall	0
> +#define __NR_compat_exit		1
> +#define __NR_compat_read		3
> +#define __NR_compat_write		4
>  #define __NR_compat_sigreturn		119
>  #define __NR_compat_rt_sigreturn	173
>
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 100d7d1..e477f6f 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -28,6 +28,7 @@
>  #include <linux/smp.h>
>  #include <linux/ptrace.h>
>  #include <linux/user.h>
> +#include <linux/seccomp.h>
>  #include <linux/security.h>
>  #include <linux/init.h>
>  #include <linux/signal.h>
> @@ -1115,6 +1116,10 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
>  	saved_x0 = regs->regs[0];
>  	saved_x8 = regs->regs[8];
>
> +	if (secure_computing(regs->syscallno) == -1)
> +		/* seccomp failures shouldn't expose any additional code. */
> +		return -1;
> +

This will conflict with the fastpath stuff in Kees' tree. (Actually, it's likely to apply cleanly, but fail to
compile.) The fix is trivial, but, given that the fastpath stuff is new, can you take a look and see if arm64 can use
it effectively?

I suspect that the performance considerations are rather different on arm64 as compared to x86 (I really hope that x86
is the only architecture with the absurd sysret vs. iret distinction), but at least the seccomp_data stuff ought to help
anywhere. (It looks like there's a distinct fast path, too, so the two-phase thing might also be a fairly large win if
it's supportable.)

See:

https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/fastpath

Also, I'll ask the usual question: What are all of the factors other than nr and args that affect syscall execution?
What are the audit arch values? Do they match correctly?

For example, it looks like, if arm64 adds OABI support, you'll have a problem. (Note that arm currently disables audit
and seccomp if OABI is enabled for exactly this reason.)

Do any syscall implementations care whether the user code is LE or BE? Are the arguments encoded the same way?

An arm-specific question: will there be any confusion as a result of the fact that compat syscalls seem to stick nr in
w7, but arm64 puts them somewhere else?

--Andy

On 07/24/2014 12:52 PM, Andy Lutomirski wrote:
> On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:
>> [...]
>> +	if (secure_computing(regs->syscallno) == -1)
>> +		/* seccomp failures shouldn't expose any additional code. */
>> +		return -1;
>> +
>
> This will conflict with the fastpath stuff in Kees' tree. (Actually, it's likely to apply cleanly, but fail to
> compile.) The fix is trivial, but, given that the fastpath stuff is new, can you take a look and see if arm64 can use
> it effectively?

I will look into the code later.

> I suspect that the performance considerations are rather different on arm64 as compared to x86 (I really hope that x86
> is the only architecture with the absurd sysret vs. iret distinction), but at least the seccomp_data stuff ought to help
> anywhere. (It looks like there's a distinct fast path, too, so the two-phase thing might also be a fairly large win if
> it's supportable.)
>
> See:
>
> https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/fastpath
>
> Also, I'll ask the usual question: What are all of the factors other than nr and args that affect syscall execution?
> What are the audit arch values? Do they match correctly?

As far as I know,

> For example, it looks like, if arm64 adds OABI support, you'll have a problem. (Note that arm currently disables audit
> and seccomp if OABI is enabled for exactly this reason.)

I don't think that arm64 will add OABI support in the future.

> Do any syscall implementations care whether the user code is LE or BE? Are the arguments encoded the same way?

when I implemented audit for arm64, the assumptions were
* If userspace is LE, then the kernel is also LE and if BE, then the kernel is BE.
* the syscall numbers and how arguments are encoded are the same btw BE and LE.
So syscall_get_arch() always returns the same value.

> An arm-specific question: will there be any confusion as a result of the fact that compat syscalls seem to stick nr in
> w7, but arm64 puts them somewhere else?

I don't know, but syscall_get_arch() returns ARCH_ARM for 32-bit tasks.

Thanks,
-Takahiro AKASHI

> --Andy

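The question about audit arch values can be made concrete with a small sketch. It is not code from the patch: the `EX_`-prefixed names and the helper `ex_audit_arch_for_task()` are hypothetical, and the constants are redefined locally (copying the values used by the kernel's `linux/audit.h` and `linux/elf-em.h`) so that the block builds without kernel headers. It only illustrates how an arch value encodes word size and endianness, and the choice described above of reporting the 32-bit ARM value for compat tasks.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits and ELF machine numbers, redefined locally for illustration. */
#define EX_AUDIT_ARCH_64BIT 0x80000000u /* set for 64-bit ABIs */
#define EX_AUDIT_ARCH_LE    0x40000000u /* set for little-endian ABIs */
#define EX_EM_ARM           40u
#define EX_EM_AARCH64       183u

#define EX_AUDIT_ARCH_ARM     (EX_EM_ARM | EX_AUDIT_ARCH_LE)
#define EX_AUDIT_ARCH_ARMEB   (EX_EM_ARM)
#define EX_AUDIT_ARCH_AARCH64 (EX_EM_AARCH64 | EX_AUDIT_ARCH_64BIT | EX_AUDIT_ARCH_LE)

/* Nonzero if the arch value describes a 64-bit ABI. */
static int ex_arch_is_64bit(uint32_t arch)
{
	return (arch & EX_AUDIT_ARCH_64BIT) != 0;
}

/* Nonzero if the arch value describes a little-endian ABI. */
static int ex_arch_is_le(uint32_t arch)
{
	return (arch & EX_AUDIT_ARCH_LE) != 0;
}

/* ELF machine number with the two flag bits stripped. */
static uint32_t ex_arch_machine(uint32_t arch)
{
	return arch & ~(EX_AUDIT_ARCH_64BIT | EX_AUDIT_ARCH_LE);
}

/*
 * Hypothetical model of the behaviour described in the mail: compat
 * (32-bit) tasks report the ARM value, native tasks the AArch64 value.
 * Endianness is deliberately not an input, matching the stated assumption
 * that the value is the same for LE and BE.
 */
static uint32_t ex_audit_arch_for_task(int is_compat_task)
{
	assert(is_compat_task == 0 || is_compat_task == 1);
	return is_compat_task ? EX_AUDIT_ARCH_ARM : EX_AUDIT_ARCH_AARCH64;
}
```

A filter that pins `arch` to one of these values before looking at `nr` is the usual way to avoid confusing 32-bit and 64-bit syscall numbers; a mixed-endian userspace would need an input this model does not have.
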
On Jul 23, 2014 10:40 PM, "AKASHI Takahiro" <takahiro.akashi@linaro.org> wrote:
>
> On 07/24/2014 12:52 PM, Andy Lutomirski wrote:
>>
>> On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:
>>> [...]
>>> +	if (secure_computing(regs->syscallno) == -1)
>>> +		/* seccomp failures shouldn't expose any additional code. */
>>> +		return -1;
>>> +
>>
>> This will conflict with the fastpath stuff in Kees' tree. (Actually, it's likely to apply cleanly, but fail to
>> compile.) The fix is trivial, but, given that the fastpath stuff is new, can you take a look and see if arm64 can use
>> it effectively?
>
> I will look into the code later.
>
>> I suspect that the performance considerations are rather different on arm64 as compared to x86 (I really hope that x86
>> is the only architecture with the absurd sysret vs. iret distinction), but at least the seccomp_data stuff ought to help
>> anywhere. (It looks like there's a distinct fast path, too, so the two-phase thing might also be a fairly large win if
>> it's supportable.)
>>
>> See:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/fastpath
>>
>> Also, I'll ask the usual question: What are all of the factors other than nr and args that affect syscall execution?
>> What are the audit arch values? Do they match correctly?
>
> As far as I know,
>
>> For example, it looks like, if arm64 adds OABI support, you'll have a problem. (Note that arm currently disables audit
>> and seccomp if OABI is enabled for exactly this reason.)
>
> I don't think that arm64 will add OABI support in the future.
>
>> Do any syscall implementations care whether the user code is LE or BE? Are the arguments encoded the same way?
>
> when I implemented audit for arm64, the assumptions were
> * If userspace is LE, then the kernel is also LE and if BE, then the kernel is BE.
> * the syscall numbers and how arguments are encoded are the same btw BE and LE.
> So syscall_get_arch() always returns the same value.

If arm64 ever adds support for mixed-endian userspace, this could
become awkward. Hmm.

IMO this matters more for seccomp than for audit. The audit code
doesn't seem to do anything terribly interesting w/ the arch field, at
least in terms of interpretation of syscall args.

>> An arm-specific question: will there be any confusion as a result of the fact that compat syscalls seem to stick nr in
>> w7, but arm64 puts them somewhere else?
>
> I don't know, but syscall_get_arch() returns ARCH_ARM for 32-bit tasks.

Will 32-bit tracers be compatible between arm and arm64 kernels? That
is, if a 32-bit program installs a seccomp filter with a trace action
and traces a 32-bit process, will everything work correctly? (Kees'
and Will's tests should work for this, I think.)

--Andy

On Thu, Jul 24, 2014 at 04:00:03PM +0100, Andy Lutomirski wrote:
> On Jul 23, 2014 10:40 PM, "AKASHI Takahiro" <takahiro.akashi@linaro.org> wrote:
> > when I implemented audit for arm64, the assumptions were
> > * If userspace is LE, then the kernel is also LE and if BE, then the kernel is BE.
> > * the syscall numbers and how arguments are encoded are the same btw BE and LE.
> > So syscall_get_arch() always returns the same value.
>
> If arm64 ever adds support for mixed-endian userspace, this could
> become awkward. Hmm.

I really doubt we would ever support mixed endian user space. Too many
problems with translating syscalls, futexes (someone looked into this
and gave up eventually).

On 07/25/2014 12:00 AM, Andy Lutomirski wrote:
> On Jul 23, 2014 10:40 PM, "AKASHI Takahiro" <takahiro.akashi@linaro.org> wrote:
>>
>> On 07/24/2014 12:52 PM, Andy Lutomirski wrote:
>>>
>>> On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:
>>>> [...]
>>>> +	if (secure_computing(regs->syscallno) == -1)
>>>> +		/* seccomp failures shouldn't expose any additional code. */
>>>> +		return -1;
>>>> +
>>>
>>> This will conflict with the fastpath stuff in Kees' tree. (Actually, it's likely to apply cleanly, but fail to
>>> compile.) The fix is trivial, but, given that the fastpath stuff is new, can you take a look and see if arm64 can use
>>> it effectively?
>>
>> I will look into the code later.
>>
>>> I suspect that the performance considerations are rather different on arm64 as compared to x86 (I really hope that x86
>>> is the only architecture with the absurd sysret vs. iret distinction), but at least the seccomp_data stuff ought to help
>>> anywhere. (It looks like there's a distinct fast path, too, so the two-phase thing might also be a fairly large win if
>>> it's supportable.)
>>>
>>> See:
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/fastpath
>>>
>>> Also, I'll ask the usual question: What are all of the factors other than nr and args that affect syscall execution?
>>> What are the audit arch values? Do they match correctly?
>>
>> As far as I know,
>>
>>> For example, it looks like, if arm64 adds OABI support, you'll have a problem. (Note that arm currently disables audit
>>> and seccomp if OABI is enabled for exactly this reason.)
>>
>> I don't think that arm64 will add OABI support in the future.
>>
>>> Do any syscall implementations care whether the user code is LE or BE? Are the arguments encoded the same way?
>>
>> when I implemented audit for arm64, the assumptions were
>> * If userspace is LE, then the kernel is also LE and if BE, then the kernel is BE.
>> * the syscall numbers and how arguments are encoded are the same btw BE and LE.
>> So syscall_get_arch() always returns the same value.
>
> If arm64 ever adds support for mixed-endian userspace, this could
> become awkward. Hmm.
>
> IMO this matters more for seccomp than for audit. The audit code
> doesn't seem to do anything terribly interesting w/ the arch field, at
> least in terms of interpretation of syscall args.

I dug into libseccomp source files, and found that there is some endianness-dependent
code. Especially, "classic" BPF interpreter handles only 32-bit accumulator/registers,
and so special care should be taken when a filter wants to check a 64-bit argument
of system call.

If we don't support mixed-endianness, this issue can be fixed by statically compiling
the library with BYTE_ORDER macro. But otherwise syscall_get_arch() should return
a dedicated value for BE kernel and this change will also have some impact on audit commands.

>>> An arm-specific question: will there be any confusion as a result of the fact that compat syscalls seem to stick nr in
>>> w7, but arm64 puts them somewhere else?
>>
>> I don't know, but syscall_get_arch() returns ARCH_ARM for 32-bit tasks.
>
> Will 32-bit tracers be compatible between arm and arm64 kernels? That
> is, if a 32-bit program installs a seccomp filter with a trace action
> and traces a 32-bit process, will everything work correctly? (Kees'
> and Will's tests should work for this, I think.)

I found a bug in my current patch (v5). When 32-bit tracer skips a system call,
we should not update syscallno from x8 since syscallno is re-written directly
via ptrace(PTRACE_SET_SYSCALL).
I'm sure that my next version should work with 32/64-bit tracers on 64-bit
kernel.

Thanks,
-Takahiro AKASHI

> --Andy
>

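The endianness concern about 64-bit arguments can be sketched as follows. Because classic BPF loads 32 bits at a time, a filter that checks a 64-bit syscall argument has to load its low and high words from separate offsets, and which offset holds which word depends on byte order. The struct below is a local mirror of the `struct seccomp_data` layout, and the `ex_`-prefixed names are hypothetical helpers written for this illustration; this is not libseccomp's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the layout a seccomp filter sees; illustration only. */
struct ex_seccomp_data {
	int32_t  nr;                  /* syscall number */
	uint32_t arch;                /* audit arch value */
	uint64_t instruction_pointer;
	uint64_t args[6];             /* syscall arguments, always 64-bit slots */
};

/* Byte offset of args[idx] within the structure. */
static size_t ex_arg_base(unsigned int idx)
{
	assert(idx < 6);
	return offsetof(struct ex_seccomp_data, args) + idx * sizeof(uint64_t);
}

/* Offset of the least-significant 32 bits of args[idx]. */
static size_t ex_arg_lo_offset(unsigned int idx, int big_endian)
{
	return ex_arg_base(idx) + (big_endian ? sizeof(uint32_t) : 0);
}

/* Offset of the most-significant 32 bits of args[idx]. */
static size_t ex_arg_hi_offset(unsigned int idx, int big_endian)
{
	return ex_arg_base(idx) + (big_endian ? 0 : sizeof(uint32_t));
}
```

Taking `big_endian` as a parameter rather than reading a compile-time `BYTE_ORDER` macro is a choice made here only to show both cases side by side; the mail suggests the compile-time approach is enough as long as mixed-endian userspace is not supported.
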
On Fri, Jul 25, 2014 at 2:37 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> I found a bug in my current patch (v5). When 32-bit tracer skips a system call,
> we should not update syscallno from x8 since syscallno is re-written directly
> via ptrace(PTRACE_SET_SYSCALL).

Ah, yes. Will aarch64 have a PTRACE_SET_SYSCALL option, or is this
strictly a 32-bit vs 64-bit issue?

> I'm sure that my next version should work with 32/64-bit tracers on 64-bit
> kernel.

Do you have a git tree uploaded anywhere? I'd love to follow this more
closely. When do you expect a v6?

Thanks!

-Kees

On 08/06/2014 12:08 AM, Kees Cook wrote:
> On Fri, Jul 25, 2014 at 2:37 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
>> I found a bug in my current patch (v5). When 32-bit tracer skips a system call,
>> we should not update syscallno from x8 since syscallno is re-written directly
>> via ptrace(PTRACE_SET_SYSCALL).
>
> Ah, yes. Will aarch64 have a PTRACE_SET_SYSCALL option, or is this
> strictly a 32-bit vs 64-bit issue?

As discussed a few weeks ago, aarch64 won't support PTRACE_SET_SYSCALL.

>> I'm sure that my next version should work with 32/64-bit tracers on 64-bit
>> kernel.
>
> Do you have a git tree uploaded anywhere? I'd love to follow this more
> closely. When do you expect a v6?

I'd like to submit v6 as soon as possible, but

(1) how we should handle syscall(-1) is annoying me.
    Without ptracer, we will normally return -ENOSYS but, for example, what if
    some seccomp filter is installed and it does allow (or doesn't have
    any rule against) '-1' syscall?
    Since the kernel doesn't know tracer's intention, we should just let
    syscall(-1) return a bogus value.
    Thus we will see inconsistent results of syscall(-1).

(2) I'm investigating some failures in Kees' test suite.
    * 'TRACE.handler' case on compat task:
      Now I found a bug in arm64's compat_siginfo_t and fixed it.
    * 'TSYNC.two_siblings_*' cases on 32/64-bit task:
      I rebased my patch on pre-v3.17-rc1, but those cases still fail.
      I have no clues at this moment.

So please be patient for a while.

-Takahiro AKASHI

> Thanks!
>
> -Kees
>

On Fri, Aug 08, 2014 at 08:35:42AM +0100, AKASHI Takahiro wrote:
> On 08/06/2014 12:08 AM, Kees Cook wrote:
> > On Fri, Jul 25, 2014 at 2:37 AM, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> >> I found a bug in my current patch (v5). When 32-bit tracer skips a system call,
> >> we should not update syscallno from x8 since syscallno is re-written directly
> >> via ptrace(PTRACE_SET_SYSCALL).
> >
> > Ah, yes. Will aarch64 have a PTRACE_SET_SYSCALL option, or is this
> > strictly a 32-bit vs 64-bit issue?
>
> As discussed a few weeks ago, aarch64 won't support PTRACE_SET_SYSCALL.

Well, I don't think anything was set in stone. If you have a compelling
reason why adding the new request gives you something over setting w8
directly, then we can extend ptrace.

Will

Will,

On 08/11/2014 06:24 PM, Will Deacon wrote:
> On Fri, Aug 08, 2014 at 08:35:42AM +0100, AKASHI Takahiro wrote:
>> On 08/06/2014 12:08 AM, Kees Cook wrote:
>>> On Fri, Jul 25, 2014 at 2:37 AM, AKASHI Takahiro
>>> <takahiro.akashi@linaro.org> wrote:
>>>> I found a bug in my current patch (v5). When 32-bit tracer skips a system call,
>>>> we should not update syscallno from x8 since syscallno is re-written directly
>>>> via ptrace(PTRACE_SET_SYSCALL).
>>>
>>> Ah, yes. Will aarch64 have a PTRACE_SET_SYSCALL option, or is this
>>> strictly a 32-bit vs 64-bit issue?
>>
>> As discussed a few weeks ago, aarch64 won't support PTRACE_SET_SYSCALL.
>
> Well, I don't think anything was set in stone. If you have a compelling
> reason why adding the new request gives you something over setting w8
> directly, then we can extend ptrace.

Yeah, I think I may have to change my mind. Looking into __secure_computing(),
I found the code below:

>	case SECCOMP_MODE_FILTER:
>		case SECCOMP_RET_TRACE:
>			...
>			if (syscall_get_nr(current, regs) < 0)
>				goto skip;

This implies that we should modify syscallno *before* __secure_computing()
returns.

I assumed, in my next version, we could skip a system call by overwriting
syscallno with x8 in syscall_trace_enter() after __secure_computing()
returns 0, and it actually works.
But we'd better implement PTRACE_SET_SYSCALL to comply with what
__secure_computing() expects.

-Takahiro AKASHI

> Will
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

Hi Akashi,

On Tue, Aug 12, 2014 at 07:57:25AM +0100, AKASHI Takahiro wrote:
> On 08/11/2014 06:24 PM, Will Deacon wrote:
> > On Fri, Aug 08, 2014 at 08:35:42AM +0100, AKASHI Takahiro wrote:
> >> As discussed a few weeks ago, aarch64 won't support PTRACE_SET_SYSCALL.
> >
> > Well, I don't think anything was set in stone. If you have a compelling
> > reason why adding the new request gives you something over setting w8
> > directly, then we can extend ptrace.
>
> Yeah, I think I may have to change my mind. Looking into __secure_computing(),
> I found the code below:
>
> >	case SECCOMP_MODE_FILTER:
> >		case SECCOMP_RET_TRACE:
> >			...
> >			if (syscall_get_nr(current, regs) < 0)
> >				goto skip;
>
> This implies that we should modify syscallno *before* __secure_computing()
> returns.

Why does it imply that? There are four competing entities here:

 - seccomp
 - tracehook
 - ftrace (trace_sys_*)
 - audit

With the exception of ftrace, they can all potentially rewrite the pt_regs
(the code you cite above is just below a ptrace_event call), so we have
to choose some order in which to call them. On entry, x86 and arm call them
in the order I listed above, so it seems sensible to follow that.

> I assumed, in my next version, we could skip a system call by overwriting
> syscallno with x8 in syscall_trace_enter() after __secure_computing()
> returns 0, and it actually works.

Why does overwriting the syscallno with x8 skip the syscall?

I thought the idea was that we would save w8 prior to each call that could
change the pt_regs, then if it was changed to -1 we would replace it with
the saved value and return -1? The only confusion I have is whether we
should call the exit hooks after skipping a syscall. I *think* x86 does
call them, but ARM doesn't. Andy says this can trigger an OOPs:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/274988.html

so we should fix that for ARM while we're here.

Will

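The save-and-restore idea described in this mail can be modelled in userspace as a toy. This is not kernel code and not the eventual arm64 implementation: `struct ex_regs`, the hook type, and every `ex_`-prefixed function are hypothetical stand-ins, and the hooks array only mimics the seccomp, tracehook, ftrace, audit ordering. The sketch shows one reading of the proposal: snapshot x8 before each hook that may rewrite the registers, and if the hook sets w8 to -1, put the saved value back and report a skip.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file: only the syscall-number register is modelled. */
struct ex_regs {
	uint64_t x8;
};

/* A hook stands in for seccomp, tracehook, ftrace or audit. */
typedef void (*ex_hook)(struct ex_regs *regs);

/* Hook that leaves the registers alone (e.g. nothing attached). */
static void ex_hook_noop(struct ex_regs *regs)
{
	(void)regs;
}

/* Hook modelling a tracer that asks for the syscall to be skipped. */
static void ex_hook_skip(struct ex_regs *regs)
{
	regs->x8 = UINT32_MAX; /* w8 == -1 when read as a 32-bit value */
}

/* Hook modelling a tracer that redirects to another syscall number. */
static void ex_hook_rewrite(struct ex_regs *regs)
{
	regs->x8 = 172; /* arbitrary example number */
}

/*
 * Run the hooks in order. Returns -1 if a hook requested a skip (with x8
 * restored to its value before that hook ran), otherwise the final
 * syscall number. Hooks after a skip request are not called.
 */
static int ex_trace_enter(struct ex_regs *regs, const ex_hook *hooks, int nhooks)
{
	assert(nhooks >= 0);
	for (int i = 0; i < nhooks; i++) {
		uint64_t saved_x8 = regs->x8;

		hooks[i](regs);
		if ((uint32_t)regs->x8 == UINT32_MAX) {
			regs->x8 = saved_x8;
			return -1;
		}
	}
	return (int)(uint32_t)regs->x8;
}
```

Whether exit hooks should still run after a skip, the open question in the mail, is outside what this model covers.
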
On 08/12/2014 06:40 PM, Will Deacon wrote:
> Hi Akashi,
>
> On Tue, Aug 12, 2014 at 07:57:25AM +0100, AKASHI Takahiro wrote:
>> On 08/11/2014 06:24 PM, Will Deacon wrote:
>>> On Fri, Aug 08, 2014 at 08:35:42AM +0100, AKASHI Takahiro wrote:
>>>> As discussed a few weeks ago, aarch64 won't support PTRACE_SET_SYSCALL.
>>>
>>> Well, I don't think anything was set in stone. If you have a compelling
>>> reason why adding the new request gives you something over setting w8
>>> directly, then we can extend ptrace.
>>
>> Yeah, I think I may have to change my mind. Looking into __secure_computing(),
>> I found the code below:
>>
>> >	case SECCOMP_MODE_FILTER:
>> >		case SECCOMP_RET_TRACE:
>> >			...
>> >			if (syscall_get_nr(current, regs) < 0)
>> >				goto skip;
>>
>> This implies that we should modify syscallno *before* __secure_computing()
>> returns.
>
> Why does it imply that? There are four competing entities here:
>
>  - seccomp
>  - tracehook
>  - ftrace (trace_sys_*)
>  - audit
>
> With the exception of ftrace, they can all potentially rewrite the pt_regs
> (the code you cite above is just below a ptrace_event call), so we have
> to choose some order in which to call them.

(audit won't change registers.)

> On entry, x86 and arm call them in the order I listed above, so it seems
> sensible to follow that.

Right, but as far as I understand, ptrace_event() in __secure_computing()
calls ptrace_notify(), and eventually executes ptrace_stop(), which can
be stopped while tracer runs (until ptrace(PTRACE_CONT)?).
So syscall_get_nr() is expected to return -1 if the tracer changes a syscall
number to -1 (as far as syscall_get_nr() refers to syscallno in pt_regs).

That is why I think we should have PTRACE_SET_SYSCALL.

>> I assumed, in my next version, we could skip a system call by overwriting
>> syscallno with x8 in syscall_trace_enter() after __secure_computing()
>> returns 0, and it actually works.
>
> Why does overwriting the syscallno with x8 skip the syscall?
>
> I thought the idea was that we would save w8 prior to each call that could
> change the pt_regs, then if it was changed to -1 we would replace it with
> the saved value and return -1?

I think it's the right way to do it. But x86 rewrites orig_ax and arm rewrites
syscallno directly, and both refer to these values as "syscall numbers" later
on; for example, see the arguments to audit_syscall_entry().
So if we don't update syscallno, we may see different behaviors from x86 or arm?

> The only confusion I have is whether we
> should call the exit hooks after skipping a syscall. I *think* x86 does
> call them, but ARM doesn't. Andy says this can trigger an OOPs:

Again, right. We should definitely avoid OOPs.
But we may be able to avoid OOPs by not calling entry hooks for skipped
system calls, instead of calling exit hooks, if we rewrite syscallno
as mentioned above.
(Please note, as I mentioned, audit_syscall_xx() ignores any request for
logging invalid system calls.)

Thanks,
-Takahiro AKASHI

>   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/274988.html
>
> so we should fix that for ARM while we're here.
>
> Will
On Tue, Aug 12, 2014 at 12:17:53PM +0100, AKASHI Takahiro wrote:
> On 08/12/2014 06:40 PM, Will Deacon wrote:
> > On Tue, Aug 12, 2014 at 07:57:25AM +0100, AKASHI Takahiro wrote:
> >>
> >> >	case SECCOMP_MODE_FILTER:
> >> >		case SECCOMP_RET_TRACE:
> >> >			...
> >> >			if (syscall_get_nr(current, regs) < 0)
> >> >				goto skip;
> >>
> >> This implies that we should modify syscallno *before* __secure_computing()
> >> returns.
> >
> > Why does it imply that? There are four competing entities here:
> >
> >  - seccomp
> >  - tracehook
> >  - ftrace (trace_sys_*)
> >  - audit
> >
> > With the exception of ftrace, they can all potentially rewrite the pt_regs
> > (the code you cite above is just below a ptrace_event call), so we have
> > to choose some order in which to call them.
>
> (audit won't change registers.)

Sorry, you're quite right.

> > On entry, x86 and arm call them in the order I listed above, so it seems
> > sensible to follow that.
>
> Right, but as far as I understand, ptrace_event() in __secure_computing()
> calls ptrace_notify(), and eventually executes ptrace_stop(), which can
> be stopped while tracer runs (until ptrace(PTRACE_CONT)?).
> So syscall_get_nr() is expected to return -1 if the tracer changes a syscall
> number to -1 (as far as syscall_get_nr() refers to syscallno in pt_regs).
>
> That is why I think we should have PTRACE_SET_SYSCALL.

Gotcha, yeah that looks like the cleanest approach after all. Thanks for the
explanation.

Will
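The point the thread settles on here may be easier to see with a small userspace model. It is a sketch only, and it assumes (as the parenthetical above does) that syscall_get_nr() reads syscallno rather than x8. All names below (`rt_regs`, `rt_syscall_get_nr()`, `rt_seccomp_ret_trace()` and the two tracers) are hypothetical stand-ins, and the tracer callback merely plays the role of the ptrace_event() stop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pt_regs fields discussed in the thread. */
struct rt_regs {
	long x8;        /* general-purpose register a tracer can write */
	long syscallno; /* separate field holding the syscall number */
};

/* Assumption: the syscall number is read from syscallno, not from x8. */
static long rt_syscall_get_nr(const struct rt_regs *regs)
{
	return regs->syscallno;
}

/* Models what a tracer does while the tracee sits in the ptrace stop. */
typedef void (*rt_tracer_fn)(struct rt_regs *regs);

/* A tracer that can only rewrite the general-purpose register. */
static void rt_tracer_sets_x8(struct rt_regs *regs)
{
	regs->x8 = -1;
}

/* A tracer with a set-syscall style request that rewrites syscallno. */
static void rt_tracer_sets_syscallno(struct rt_regs *regs)
{
	regs->syscallno = -1;
}

/*
 * Models the SECCOMP_RET_TRACE path quoted above: stop for the tracer,
 * then check the syscall number straight away. Returns -1 for the
 * "goto skip" case and 0 when the syscall would go ahead.
 */
static int rt_seccomp_ret_trace(struct rt_regs *regs, rt_tracer_fn tracer)
{
	assert(regs != NULL && tracer != NULL);

	tracer(regs); /* stands in for the ptrace_event() stop */
	if (rt_syscall_get_nr(regs) < 0)
		return -1;
	return 0;
}
```

In this model, a tracer that writes only `x8` is not seen by the check, because nothing has copied `x8` into `syscallno` by the time the check runs. A tracer that can write `syscallno` directly does trigger the skip, which is the argument made above for a set-syscall request.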
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3a18571..eeac003 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -32,6 +32,7 @@ config ARM64
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_KGDB
+	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_C_RECORDMCOUNT
 	select HAVE_DEBUG_BUGVERBOSE
@@ -259,6 +260,19 @@ config ARCH_HAS_CACHE_LINE_SIZE
 
 source "mm/Kconfig"
 
+config SECCOMP
+	bool "Enable seccomp to safely compute untrusted bytecode"
+	---help---
+	  This kernel feature is useful for number crunching applications
+	  that may need to compute untrusted bytecode during their
+	  execution. By using pipes or other transports made available to
+	  the process as file descriptors supporting the read/write
+	  syscalls, it's possible to isolate those applications in
+	  their own address space using seccomp. Once seccomp is
+	  enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
+	  and the task is only allowed to execute a few safe syscalls
+	  defined by each seccomp mode.
+
 config XEN_DOM0
 	def_bool y
 	depends on XEN
diff --git a/arch/arm64/include/asm/seccomp.h b/arch/arm64/include/asm/seccomp.h
new file mode 100644
index 0000000..c76fac9
--- /dev/null
+++ b/arch/arm64/include/asm/seccomp.h
@@ -0,0 +1,25 @@
+/*
+ * arch/arm64/include/asm/seccomp.h
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _ASM_SECCOMP_H
+#define _ASM_SECCOMP_H
+
+#include <asm/unistd.h>
+
+#ifdef CONFIG_COMPAT
+#define __NR_seccomp_read_32		__NR_compat_read
+#define __NR_seccomp_write_32		__NR_compat_write
+#define __NR_seccomp_exit_32		__NR_compat_exit
+#define __NR_seccomp_sigreturn_32	__NR_compat_rt_sigreturn
+#endif /* CONFIG_COMPAT */
+
+#include <asm-generic/seccomp.h>
+
+#endif /* _ASM_SECCOMP_H */
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index c980ab7..729c155 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -31,6 +31,9 @@
  * Compat syscall numbers used by the AArch64 kernel.
  */
 #define __NR_compat_restart_syscall	0
+#define __NR_compat_exit		1
+#define __NR_compat_read		3
+#define __NR_compat_write		4
 #define __NR_compat_sigreturn		119
 #define __NR_compat_rt_sigreturn	173
 
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 100d7d1..e477f6f 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -28,6 +28,7 @@
 #include <linux/smp.h>
 #include <linux/ptrace.h>
 #include <linux/user.h>
+#include <linux/seccomp.h>
 #include <linux/security.h>
 #include <linux/init.h>
 #include <linux/signal.h>
@@ -1115,6 +1116,10 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 	saved_x0 = regs->regs[0];
 	saved_x8 = regs->regs[8];
 
+	if (secure_computing(regs->syscallno) == -1)
+		/* seccomp failures shouldn't expose any additional code. */
+		return -1;
+
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
secure_computing() should always be called first in syscall_trace_enter().

If secure_computing() returns -1, we should stop further handling. Then
that system call may eventually fail with a specified return value (errno),
be trapped or the process itself be killed depending on loaded rules.
In these cases, syscall_trace_enter() also returns -1, which results in
skipping normal syscall handling as well as syscall_trace_exit().

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/Kconfig               | 14 ++++++++++++++
 arch/arm64/include/asm/seccomp.h | 25 +++++++++++++++++++++++++
 arch/arm64/include/asm/unistd.h  |  3 +++
 arch/arm64/kernel/ptrace.c       |  5 +++++
 4 files changed, 47 insertions(+)
 create mode 100644 arch/arm64/include/asm/seccomp.h
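The ordering described in the commit message can be illustrated with a small userspace model. It is a sketch under stated assumptions rather than the kernel code: `struct entry_state`, `model_secure_computing()` and `model_syscall_trace_enter()` are hypothetical names, a boolean replaces the loaded seccomp rules, and a counter replaces the tracehook report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task state standing in for the real kernel flags. */
struct entry_state {
	bool seccomp_denies;   /* stand-in for secure_computing() == -1 */
	bool traced;           /* stand-in for TIF_SYSCALL_TRACE */
	int tracehook_reports; /* how many times the tracehook was told */
};

/* Stand-in for secure_computing(): -1 means "do not run this syscall". */
static int model_secure_computing(const struct entry_state *s)
{
	return s->seccomp_denies ? -1 : 0;
}

/*
 * Models the patched entry path: seccomp is consulted first, and a -1
 * from it ends the entry handling before any tracehook report is made.
 * Returns -1 to skip the syscall, otherwise the syscall number.
 */
static int model_syscall_trace_enter(struct entry_state *s, int syscallno)
{
	assert(s != NULL);

	if (model_secure_computing(s) == -1)
		return -1;

	if (s->traced)
		s->tracehook_reports++;

	return syscallno;
}
```

With `seccomp_denies` set, the function returns -1 and `tracehook_reports` stays at zero even when `traced` is set, which mirrors the "stop further handling" behaviour the commit message describes. With seccomp allowing the call, the tracehook is reported once and the syscall number is returned unchanged.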