diff mbox

[RFC] syscalls: Restore address limit after a syscall

Message ID 20170210192242.GM27312@n2100.armlinux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Russell King (Oracle) Feb. 10, 2017, 7:22 p.m. UTC
On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> > So by default it is in the wrapper. If selected, an architecture can
> > disable the wrapper put it in the best places. Understood correctly?
> 
> Sounds good to me.
> 
> Presumably the result should go through -mm.  Want to cc: akpm and
> linux-arch@ on the next version?
> 
> I've also cc'd arm and s390 folks -- those are the other arches that
> try to be on top of hardening.

The best place for this on ARM is in the assembly code, rather than in
the hundreds of system calls - having it in one place is surely better
for reducing the cache impact.

This (untested) patch should be sufficient for ARM - there's two choices
which I think make sense to do this:
1. Immediately after returning the syscall
2. Immediately before any returning to userspace

(1) has the advantage that the address limit will be forced for the
exit-path works that we do, preventing those making accesses to kernel
space.

(2) has the advantage that we'd guarantee that the address limit will
be forced while userspace is running for the next entry into kernel
space.

There's actually a third option as well:

(3) forcing the address limit on entry to the kernel from userspace.

This patch implements option 1.

 arch/arm/kernel/entry-common.S | 6 ++++++
 1 files changed, 6 insertions(+)

Comments

Kees Cook Feb. 10, 2017, 8:49 p.m. UTC | #1
On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
>> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
>> > So by default it is in the wrapper. If selected, an architecture can
>> > disable the wrapper put it in the best places. Understood correctly?
>>
>> Sounds good to me.
>>
>> Presumably the result should go through -mm.  Want to cc: akpm and
>> linux-arch@ on the next version?
>>
>> I've also cc'd arm and s390 folks -- those are the other arches that
>> try to be on top of hardening.
>
> The best place for this on ARM is in the assembly code, rather than in
> the hundreds of system calls - having it in one place is surely better
> for reducing the cache impact.
>
> This (untested) patch should be sufficient for ARM - there's two choices
> which I think make sense to do this:
> 1. Immediately after returning the syscall
> 2. Immediately before any returning to userspace
>
> (1) has the advantage that the address limit will be forced for the
> exit-path works that we do, preventing those making accesses to kernel
> space.
>
> (2) has the advantage that we'd guarantee that the address limit will
> be forced while userspace is running for the next entry into kernel
> space.
>
> There's actually a third option as well:
>
> (3) forcing the address limit on entry to the kernel from userspace.
>
> This patch implements option 1.
>
>  arch/arm/kernel/entry-common.S | 6 ++++++
>  1 files changed, 6 insertions(+)
>
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index eb5cd77bf1d8..6a717a2ccb88 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -39,6 +39,8 @@
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
>         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
>         b       ret_slow_syscall
>
>  __sys_trace_return:
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         mov     r0, sp
>         bl      syscall_trace_exit
>

That looks pretty great! If I'm reading the macros correctly, this'll
only happen on _actual_ syscall exit, right? So all the crazy OABI
stuff won't suddenly break? e.g.:

asmlinkage long sys_oabi_semtimedop(int semid,
...
                mm_segment_t fs = get_fs();
                set_fs(KERNEL_DS);
                err = sys_semtimedop(semid, sops, nsops, timeout);
                set_fs(fs);
...
        return err;
}

Is there a similarly good place to do this for arm64?

-Kees
Russell King (Oracle) Feb. 10, 2017, 9:49 p.m. UTC | #2
On Fri, Feb 10, 2017 at 12:49:34PM -0800, Kees Cook wrote:
> On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
> >> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> >> > So by default it is in the wrapper. If selected, an architecture can
> >> > disable the wrapper put it in the best places. Understood correctly?
> >>
> >> Sounds good to me.
> >>
> >> Presumably the result should go through -mm.  Want to cc: akpm and
> >> linux-arch@ on the next version?
> >>
> >> I've also cc'd arm and s390 folks -- those are the other arches that
> >> try to be on top of hardening.
> >
> > The best place for this on ARM is in the assembly code, rather than in
> > the hundreds of system calls - having it in one place is surely better
> > for reducing the cache impact.
> >
> > This (untested) patch should be sufficient for ARM - there's two choices
> > which I think make sense to do this:
> > 1. Immediately after returning the syscall
> > 2. Immediately before any returning to userspace
> >
> > (1) has the advantage that the address limit will be forced for the
> > exit-path works that we do, preventing those making accesses to kernel
> > space.
> >
> > (2) has the advantage that we'd guarantee that the address limit will
> > be forced while userspace is running for the next entry into kernel
> > space.
> >
> > There's actually a third option as well:
> >
> > (3) forcing the address limit on entry to the kernel from userspace.
> >
> > This patch implements option 1.
> >
> >  arch/arm/kernel/entry-common.S | 6 ++++++
> >  1 files changed, 6 insertions(+)
> >
> > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> > index eb5cd77bf1d8..6a717a2ccb88 100644
> > --- a/arch/arm/kernel/entry-common.S
> > +++ b/arch/arm/kernel/entry-common.S
> > @@ -39,6 +39,8 @@
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> >         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> > @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> > @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
> >         b       ret_slow_syscall
> >
> >  __sys_trace_return:
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         mov     r0, sp
> >         bl      syscall_trace_exit
> >
> 
> That looks pretty great! If I'm reading the macros correctly, this'll
> only happen on _actual_ syscall exit, right? So all the crazy OABI
> stuff won't suddenly break? e.g.:

Correct.

> Is there a similarly good place to do this for arm64?

I'd imagine similar places exist in arm64:

ret_fast_syscall
ret_to_user

maybe?
diff mbox

Patch

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index eb5cd77bf1d8..6a717a2ccb88 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -39,6 +39,8 @@ 
 ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	disable_irq_notrace			@ disable interrupts
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	tst	r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
@@ -64,6 +66,8 @@  ENDPROC(ret_fast_syscall)
 ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
 	disable_irq_notrace			@ disable interrupts
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
@@ -262,6 +266,8 @@  ENDPROC(vector_swi)
 	b	ret_slow_syscall
 
 __sys_trace_return:
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
 	mov	r0, sp
 	bl	syscall_trace_exit