Message ID | 20230804071045.never.134-kees@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Series | ARM: ptrace: Restore syscall skipping and restart while tracing |
On Fri, Aug 4, 2023, at 09:10, Kees Cook wrote:
> Since commit 4e57a4ddf6b0 ("ARM: 9107/1: syscall: always store
> thread_info->abi_syscall"), the seccomp selftests "syscall_errno",
> "syscall_faked", and "syscall_restart" have been broken. This was
> related to two issues:

While it looks like my patch introduced both problems, it might
be better to split your fix into two bits.

> - seccomp and PTRACE depend on using the special value of "-1" for
>   skipping syscalls. This value wasn't working because it was getting
>   masked by __NR_SYSCALL_MASK in both PTRACE_SET_SYSCALL and
>   get_syscall_nr().
>   Explicitly test for -1 in PTRACE_SET_SYSCALL and get_syscall_nr(),
>   leaving it exposed when present, allowing tracers to skip syscalls
>   again.

This part looks good to me, at least it seems to be one of multiple
ways of doing this, depending on how we want to encode the
syscall skipping in the variable.

> - the syscall entry label "local_restart" is used for resuming syscalls
>   interrupted by signals, but the updated syscall number (in scno) was
>   not being stored in current_thread_info()->abi_syscall, causing traced
>   syscall restarting to fail.
>
>   Move the AEABI-only assignment of current_thread_info()->abi_syscall
>   after the "local_restart" label to allow tracers to survive syscall
>   restarting.

I'm not following exactly what you are doing here yet, but I suspect
this part is wrong:

> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index bcc4c9ec3aa4..08bd624e4c6f 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -246,8 +246,6 @@ ENTRY(vector_swi)
>      bic scno, scno, #0xff000000          @ mask off SWI op-code
>      str scno, [tsk, #TI_ABI_SYSCALL]
>      eor scno, scno, #__NR_SYSCALL_BASE   @ check OS number
> -#else
> -    str scno, [tsk, #TI_ABI_SYSCALL]
>  #endif
>      /*
>       * Reload the registers that may have been corrupted on entry to
> @@ -256,6 +254,9 @@ ENTRY(vector_swi)
>   TRACE( ldmia sp, {r0 - r3} )
>
>  local_restart:
> +#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
> +    str scno, [tsk, #TI_ABI_SYSCALL]     @ store scno for syscall restart
> +#endif
>      ldr r10, [tsk, #TI_FLAGS]            @ check for syscall tracing
>      stmdb sp!, {r4, r5}                  @ push fifth and sixth args
>

If the local_restart code has to store the syscall number
for an EABI-only kernel, wouldn't it have to also do this
for a kernel with OABI-only or OABI_COMPAT support?

     Arnd

On Wed, Aug 09, 2023 at 09:47:24PM +0200, Arnd Bergmann wrote:
> On Fri, Aug 4, 2023, at 09:10, Kees Cook wrote:
> > Since commit 4e57a4ddf6b0 ("ARM: 9107/1: syscall: always store
> > thread_info->abi_syscall"), the seccomp selftests "syscall_errno",
> > "syscall_faked", and "syscall_restart" have been broken. This was
> > related to two issues:
>
> While it looks like my patch introduced both problems, it might
> be better to split your fix into two bits.

Okay, sounds good.

> > - seccomp and PTRACE depend on using the special value of "-1" for
> >   skipping syscalls. This value wasn't working because it was getting
> >   masked by __NR_SYSCALL_MASK in both PTRACE_SET_SYSCALL and
> >   get_syscall_nr().
> >
> >   Explicitly test for -1 in PTRACE_SET_SYSCALL and get_syscall_nr(),
> >   leaving it exposed when present, allowing tracers to skip syscalls
> >   again.
>
> This part looks good to me, at least it seems to be one of multiple
> ways of doing this, depending on how we want to encode the
> syscall skipping in the variable.
>
> > - the syscall entry label "local_restart" is used for resuming syscalls
> >   interrupted by signals, but the updated syscall number (in scno) was
> >   not being stored in current_thread_info()->abi_syscall, causing traced
> >   syscall restarting to fail.
> >
> >   Move the AEABI-only assignment of current_thread_info()->abi_syscall
> >   after the "local_restart" label to allow tracers to survive syscall
> >   restarting.
>
> I'm not following exactly what you are doing here yet, but I suspect
> this part is wrong:
>
> > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> > index bcc4c9ec3aa4..08bd624e4c6f 100644
> > --- a/arch/arm/kernel/entry-common.S
> > +++ b/arch/arm/kernel/entry-common.S
> > @@ -246,8 +246,6 @@ ENTRY(vector_swi)
> >      bic scno, scno, #0xff000000          @ mask off SWI op-code
> >      str scno, [tsk, #TI_ABI_SYSCALL]
> >      eor scno, scno, #__NR_SYSCALL_BASE   @ check OS number
> > -#else
> > -    str scno, [tsk, #TI_ABI_SYSCALL]
> >  #endif
> >      /*
> >       * Reload the registers that may have been corrupted on entry to
> > @@ -256,6 +254,9 @@ ENTRY(vector_swi)
> >   TRACE( ldmia sp, {r0 - r3} )
> >
> >  local_restart:
> > +#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
> > +    str scno, [tsk, #TI_ABI_SYSCALL]     @ store scno for syscall restart
> > +#endif
> >      ldr r10, [tsk, #TI_FLAGS]            @ check for syscall tracing
> >      stmdb sp!, {r4, r5}                  @ push fifth and sixth args
> >
>
> If the local_restart code has to store the syscall number
> for an EABI-only kernel, wouldn't it have to also do this
> for a kernel with OABI-only or OABI_COMPAT support?

This is the part I wasn't sure about. Initially I was thinking it didn't
matter because it's only a problem for a seccomp tracer, but I realize
it might be exposed to a PTRACE tracer too. I was only able to test with
EABI since seccomp is disabled for OABI_COMPAT.

Anyway, syscall restart is done this way:

    movlt scno, #(__NR_restart_syscall - __NR_SYSCALL_BASE)

Can a EABI call restart an OABI syscall? I think so?

So maybe we just need to add:

    str scno, [tsk, #TI_ABI_SYSCALL]     @ store scno for syscall restart

after that instead of moving it like I did originally? Let me test
that...

On Thu, Aug 10, 2023, at 21:32, Kees Cook wrote:
> On Wed, Aug 09, 2023 at 09:47:24PM +0200, Arnd Bergmann wrote:
>
>> If the local_restart code has to store the syscall number
>> for an EABI-only kernel, wouldn't it have to also do this
>> for a kernel with OABI-only or OABI_COMPAT support?
>
> This is the part I wasn't sure about. Initially I was thinking it didn't
> matter because it's only a problem for a seccomp tracer, but I realize
> it might be exposed to a PTRACE tracer too. I was only able to test with
> EABI since seccomp is disabled for OABI_COMPAT.
>
> Anyway, syscall restart is done this way:
>
>     movlt scno, #(__NR_restart_syscall - __NR_SYSCALL_BASE)
>
> Can a EABI call restart an OABI syscall? I think so?

There are very few differences between oabi and eabi syscalls, I
think it basically comes down to

- the syscall number, and register in which it is passed to the kernel
- a few syscalls that exist for OABI backward compatibility and were
  deprecated before EABI was added
- a few syscalls that pass a struct with different alignment rules
- epoll_wait() uses a runtime check for the output format

It also seems like the __NR_restart_syscall path is only relevant
for syscalls using restart_block for restarting, and that means
it's only poll(), futex(), nanosleep(), clock_nanosleep() and their
time64 counterparts. All of these are handled by the same entry
points for OABI and EABI, i.e. there is no overlap with the
exceptions above. Crucially, epoll does not use restart_block,
unlike poll().

> So maybe we just need to add:
>
>     str scno, [tsk, #TI_ABI_SYSCALL]     @ store scno for syscall restart
>
> after that instead of moving it like I did originally?

Yes, I think that works!

For pure EABI and pure OABI kernels, this just does the right thing,
storing a plain __NR_restart_syscall in the field without an ABI
marker. For an OABI compat task running on an EABI kernel, it will
call the EABI version of restart_syscall(), but that is exactly
the same as the OABI version, as shown above.

     Arnd

On Thu, Aug 10, 2023 at 10:10:08PM +0200, Arnd Bergmann wrote:
> On Thu, Aug 10, 2023, at 21:32, Kees Cook wrote:
> > On Wed, Aug 09, 2023 at 09:47:24PM +0200, Arnd Bergmann wrote:
> >
> >> If the local_restart code has to store the syscall number
> >> for an EABI-only kernel, wouldn't it have to also do this
> >> for a kernel with OABI-only or OABI_COMPAT support?
> >
> > This is the part I wasn't sure about. Initially I was thinking it didn't
> > matter because it's only a problem for a seccomp tracer, but I realize
> > it might be exposed to a PTRACE tracer too. I was only able to test with
> > EABI since seccomp is disabled for OABI_COMPAT.
> >
> > Anyway, syscall restart is done this way:
> >
> >     movlt scno, #(__NR_restart_syscall - __NR_SYSCALL_BASE)
> >
> > Can a EABI call restart an OABI syscall? I think so?
>
> There are very few differences between oabi and eabi syscalls, I
> think it basically comes down to
>
> - the syscall number, and register in which it is passed to the kernel
> - a few syscalls that exist for OABI backward compatibility and were
>   deprecated before EABI was added
> - a few syscalls that pass a struct with different alignment rules
> - epoll_wait() uses a runtime check for the output format
>
> It also seems like the __NR_restart_syscall path is only relevant
> for syscalls using restart_block for restarting, and that means
> it's only poll(), futex(), nanosleep(), clock_nanosleep() and their
> time64 counterparts. All of these are handled by the same entry

Right -- it's a tiny corner case I tripped over years ago while building
seccomp filters, so it got added to the selftests. :)

> points for OABI and EABI, i.e. there is no overlap with the
> exceptions above. Crucially, epoll does not use restart_block,
> unlike poll().
>
> > So maybe we just need to add:
> >
> >     str scno, [tsk, #TI_ABI_SYSCALL]     @ store scno for syscall restart
> >
> > after that instead of moving it like I did originally?
>
> Yes, I think that works!
>
> For pure EABI and pure OABI kernels, this just does the right thing,
> storing a plain __NR_restart_syscall in the field without an ABI
> marker. For an OABI compat task running on an EABI kernel, it will
> call the EABI version of restart_syscall(), but that is exactly
> the same as the OABI version, as shown above.

Okay, excellent. I came to the same conclusion. Patch 1 in the v2
addresses this and tested okay for me.

Thanks for looking at this!

-Kees

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index dfeed440254a..fe4326d938c1 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -25,6 +25,9 @@ static inline int syscall_get_nr(struct task_struct *task,
     if (IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))
         return task_thread_info(task)->abi_syscall;
 
+    if (task_thread_info(task)->abi_syscall == -1)
+        return -1;
+
     return task_thread_info(task)->abi_syscall & __NR_SYSCALL_MASK;
 }
 
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index bcc4c9ec3aa4..08bd624e4c6f 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -246,8 +246,6 @@ ENTRY(vector_swi)
     bic scno, scno, #0xff000000          @ mask off SWI op-code
     str scno, [tsk, #TI_ABI_SYSCALL]
     eor scno, scno, #__NR_SYSCALL_BASE   @ check OS number
-#else
-    str scno, [tsk, #TI_ABI_SYSCALL]
 #endif
     /*
      * Reload the registers that may have been corrupted on entry to
@@ -256,6 +254,9 @@ ENTRY(vector_swi)
  TRACE( ldmia sp, {r0 - r3} )
 
 local_restart:
+#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
+    str scno, [tsk, #TI_ABI_SYSCALL]     @ store scno for syscall restart
+#endif
     ldr r10, [tsk, #TI_FLAGS]            @ check for syscall tracing
     stmdb sp!, {r4, r5}                  @ push fifth and sixth args
 
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2d8e2516906b..fef32d73f912 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -783,8 +783,9 @@ long arch_ptrace(struct task_struct *child, long request,
         break;
 
     case PTRACE_SET_SYSCALL:
-        task_thread_info(child)->abi_syscall = data &
-                            __NR_SYSCALL_MASK;
+        if (data != -1)
+            data &= __NR_SYSCALL_MASK;
+        task_thread_info(child)->abi_syscall = data;
         ret = 0;
         break;
 

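As a rough illustration of the restart problem discussed in the thread, the following user-space C model contrasts a restart path that only updates the scno register with one that also writes it back to abi_syscall, as the proposed `str scno, [tsk, #TI_ABI_SYSCALL]` would. It is a sketch, not kernel code: `struct model_thread_info`, `model_restart()` and both `MODEL_NR_*` constants are hypothetical names, and the numeric values (0 for restart_syscall, 162 for nanosleep) are assumptions about the ARM syscall table rather than figures taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed, illustrative syscall numbers (not quoted from the thread). */
#define MODEL_NR_RESTART_SYSCALL 0   /* __NR_restart_syscall - __NR_SYSCALL_BASE */
#define MODEL_NR_NANOSLEEP       162 /* a restart_block user, per the thread */

/* Hypothetical stand-in for the one thread_info field that matters here. */
struct model_thread_info {
    int abi_syscall;
};

/*
 * Model of the restart path: the scno register is switched to
 * restart_syscall; the field a tracer reads only follows if the
 * new value is stored back. Returns what a tracer would observe.
 */
static inline int model_restart(struct model_thread_info *ti, bool store_back)
{
    int scno = MODEL_NR_RESTART_SYSCALL;  /* movlt scno, #(...) */

    if (store_back)
        ti->abi_syscall = scno;           /* str scno, [tsk, #TI_ABI_SYSCALL] */

    return ti->abi_syscall;
}

/* Small self-check of the two behaviours. */
static inline void model_restart_selfcheck(void)
{
    struct model_thread_info ti = { .abi_syscall = MODEL_NR_NANOSLEEP };

    /* Without the store, the tracer still sees the interrupted syscall. */
    assert(model_restart(&ti, false) == MODEL_NR_NANOSLEEP);
    /* With the store, the tracer sees restart_syscall. */
    assert(model_restart(&ti, true) == MODEL_NR_RESTART_SYSCALL);
}
```

In this model the diff above corresponds to doing the store at local_restart for EABI-only kernels, while the variant agreed on in the thread does it right after the movlt for every configuration; either way the point is that the tracer-visible field must be refreshed whenever scno changes.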
Since commit 4e57a4ddf6b0 ("ARM: 9107/1: syscall: always store
thread_info->abi_syscall"), the seccomp selftests "syscall_errno",
"syscall_faked", and "syscall_restart" have been broken. This was
related to two issues:

- seccomp and PTRACE depend on using the special value of "-1" for
  skipping syscalls. This value wasn't working because it was getting
  masked by __NR_SYSCALL_MASK in both PTRACE_SET_SYSCALL and
  get_syscall_nr().

- the syscall entry label "local_restart" is used for resuming syscalls
  interrupted by signals, but the updated syscall number (in scno) was
  not being stored in current_thread_info()->abi_syscall, causing traced
  syscall restarting to fail.

Explicitly test for -1 in PTRACE_SET_SYSCALL and get_syscall_nr(),
leaving it exposed when present, allowing tracers to skip syscalls
again.

Move the AEABI-only assignment of current_thread_info()->abi_syscall
after the "local_restart" label to allow tracers to survive syscall
restarting.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 4e57a4ddf6b0 ("ARM: 9107/1: syscall: always store thread_info->abi_syscall")
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Note that I haven't tested OABI at all, and AEABI+OABI_COMPAT doesn't
work with seccomp. I booted an AEABI system under AEABI+OABI_COMPAT, but
I wasn't able to test tracing...
---
 arch/arm/include/asm/syscall.h | 3 +++
 arch/arm/kernel/entry-common.S | 5 +++--
 arch/arm/kernel/ptrace.c       | 5 +++--
 3 files changed, 9 insertions(+), 4 deletions(-)
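
To make the first issue concrete, here is a small user-space C sketch of why masking destroys the "-1" skip sentinel and how an explicit test preserves it. It mirrors the shape of the syscall_get_nr() hunk in the diff but is not the kernel code: `MODEL_SYSCALL_MASK`, `MODEL_OABI_BASE`, `nr_masked()` and `nr_with_skip()` are hypothetical names, and the values 0x0fffff and 0x900000 are assumed to match __NR_SYSCALL_MASK and the OABI syscall base.

```c
#include <assert.h>

/* Assumed values; treat them as illustrative, not authoritative. */
#define MODEL_SYSCALL_MASK 0x0fffff  /* stands in for __NR_SYSCALL_MASK */
#define MODEL_OABI_BASE    0x900000  /* stands in for the OABI syscall base */

/* Old behaviour: always mask, so -1 turns into a large positive number. */
static inline int nr_masked(int abi_syscall)
{
    return abi_syscall & MODEL_SYSCALL_MASK;
}

/* Patched behaviour: leave the -1 skip sentinel exposed, mask otherwise. */
static inline int nr_with_skip(int abi_syscall)
{
    if (abi_syscall == -1)
        return -1;

    return abi_syscall & MODEL_SYSCALL_MASK;
}

/* Small self-check of both variants. */
static inline void nr_selfcheck(void)
{
    /* -1 is all ones, so masking leaves exactly the mask bits set. */
    assert(nr_masked(-1) == MODEL_SYSCALL_MASK);
    /* The explicit test keeps the sentinel intact. */
    assert(nr_with_skip(-1) == -1);
    /* Ordinary numbers still lose their ABI marker. */
    assert(nr_with_skip(MODEL_OABI_BASE | 162) == 162);
}
```

The same reasoning applies to the PTRACE_SET_SYSCALL hunk, which masks the tracer-supplied value before storing it; once -1 has been masked on the way in, no later reader can recognise it as a request to skip the syscall.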