diff mbox series

ARM: ptrace: Restore syscall skipping and restart while tracing

Message ID 20230804071045.never.134-kees@kernel.org (mailing list archive)
State New, archived

Commit Message

Kees Cook Aug. 4, 2023, 7:10 a.m. UTC
Since commit 4e57a4ddf6b0 ("ARM: 9107/1: syscall: always store
thread_info->abi_syscall"), the seccomp selftests "syscall_errno",
"syscall_faked", and "syscall_restart" have been broken. This was
related to two issues:

- seccomp and PTRACE depend on using the special value of "-1" for
  skipping syscalls. This value wasn't working because it was getting
  masked by __NR_SYSCALL_MASK in both PTRACE_SET_SYSCALL and
  get_syscall_nr().

- the syscall entry label "local_restart" is used for resuming syscalls
  interrupted by signals, but the updated syscall number (in scno) was
  not being stored in current_thread_info()->abi_syscall, causing traced
  syscall restarting to fail.

Explicitly test for -1 in PTRACE_SET_SYSCALL and get_syscall_nr(),
leaving it exposed when present, allowing tracers to skip syscalls
again.
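
To illustrate why the unconditional mask loses the skip marker, here is a
minimal userspace sketch in C. It is not the kernel code: the function and
macro names are hypothetical, and the values 0x0fffff (for
__NR_SYSCALL_MASK) and 0x900000 (for the OABI syscall base) are assumed to
match the ARM uapi headers.

```c
#include <assert.h>

/* Assumed to mirror __NR_SYSCALL_MASK from the ARM uapi headers. */
#define NR_SYSCALL_MASK	0x0fffffL
/* The special "skip this syscall" value used by seccomp and ptrace. */
#define SYSCALL_SKIP	(-1L)

/* Model of the pre-patch store: the mask is applied unconditionally. */
long set_syscall_masked(long data)
{
	return data & NR_SYSCALL_MASK;
}

/* Model of the patched store: -1 is left exposed, all else is masked. */
long set_syscall_patched(long data)
{
	if (data != SYSCALL_SKIP)
		data &= NR_SYSCALL_MASK;
	return data;
}

/* Hypothetical self-check of the two models. */
void check_skip_marker(void)
{
	/* Masking turns -1 into 0x0fffff, which no longer means "skip". */
	assert(set_syscall_masked(SYSCALL_SKIP) == NR_SYSCALL_MASK);
	/* The patched variant preserves the marker. */
	assert(set_syscall_patched(SYSCALL_SKIP) == SYSCALL_SKIP);
	/* Ordinary numbers still lose their ABI bits (0x900000 assumed). */
	assert(set_syscall_patched(0x900000L + 162) == 162);
}
```

In this model the masked store hands back 0x0fffff instead of -1, so a later
"is this a skip?" comparison against -1 can never succeed.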

Move the AEABI-only assignment of current_thread_info()->abi_syscall
after the "local_restart" label to allow tracers to survive syscall
restarting.
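
The restart problem can be modelled the same way. The sketch below is a
userspace approximation only: the struct, the function, and the numbers
(restart_syscall as 0 relative to the syscall base, nanosleep as 162) are
assumptions for illustration, not the real kernel definitions.

```c
#include <stdbool.h>

/* Assumed: __NR_restart_syscall - __NR_SYSCALL_BASE. */
#define NR_RESTART_SYSCALL	0L
/* Assumed ARM syscall number for nanosleep(). */
#define NR_NANOSLEEP		162L

/* Hypothetical stand-in for the relevant part of struct thread_info. */
struct model_thread_info {
	long abi_syscall;	/* what a tracer sees as the syscall number */
};

/*
 * Model of the restart path: scno is switched to restart_syscall before
 * the jump to local_restart. With store_scno false, abi_syscall keeps the
 * number of the interrupted syscall, so the tracer sees a stale value.
 */
long model_local_restart(struct model_thread_info *ti, bool store_scno)
{
	long scno = NR_RESTART_SYSCALL;

	if (store_scno)
		ti->abi_syscall = scno;	/* models "str scno, [tsk, #TI_ABI_SYSCALL]" */

	return ti->abi_syscall;		/* number reported to the tracer */
}
```

Starting from abi_syscall = NR_NANOSLEEP, the model reports 162 without the
store and 0 with it, which is the difference the "syscall_restart" selftest
is sensitive to.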

Cc: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 4e57a4ddf6b0 ("ARM: 9107/1: syscall: always store thread_info->abi_syscall")
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Note that I haven't tested OABI at all, and AEABI+OABI_COMPAT doesn't
work with seccomp. I booted an AEABI system under AEABI+OABI_COMPAT,
but I wasn't able to test tracing...
---
 arch/arm/include/asm/syscall.h | 3 +++
 arch/arm/kernel/entry-common.S | 5 +++--
 arch/arm/kernel/ptrace.c       | 5 +++--
 3 files changed, 9 insertions(+), 4 deletions(-)

Comments

Arnd Bergmann Aug. 9, 2023, 7:47 p.m. UTC | #1
On Fri, Aug 4, 2023, at 09:10, Kees Cook wrote:
> Since commit 4e57a4ddf6b0 ("ARM: 9107/1: syscall: always store
> thread_info->abi_syscall"), the seccomp selftests "syscall_errno",
> "syscall_faked", and "syscall_restart" have been broken. This was
> related to two issues:

While it looks like my patch introduced both problems, it might
be better to split your fix into two bits.

> - seccomp and PTRACE depend on using the special value of "-1" for
>   skipping syscalls. This value wasn't working because it was getting
>   masked by __NR_SYSCALL_MASK in both PTRACE_SET_SYSCALL and
>   get_syscall_nr().

> Explicitly test for -1 in PTRACE_SET_SYSCALL and get_syscall_nr(),
> leaving it exposed when present, allowing tracers to skip syscalls
> again.

This part looks good to me, at least it seems to be one of multiple
ways of doing this, depending on how we want to encode the
syscall skipping in the variable.

> - the syscall entry label "local_restart" is used for resuming syscalls
>   interrupted by signals, but the updated syscall number (in scno) was
>   not being stored in current_thread_info()->abi_syscall, causing traced
>   syscall restarting to fail.
>
> Move the AEABI-only assignment of current_thread_info()->abi_syscall
> after the "local_restart" label to allow tracers to survive syscall
> restarting.

I'm not following exactly what you are doing here yet, but I suspect
this part is wrong:

> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index bcc4c9ec3aa4..08bd624e4c6f 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -246,8 +246,6 @@ ENTRY(vector_swi)
>  	bic	scno, scno, #0xff000000		@ mask off SWI op-code
>  	str	scno, [tsk, #TI_ABI_SYSCALL]
>  	eor	scno, scno, #__NR_SYSCALL_BASE	@ check OS number
> -#else
> -	str	scno, [tsk, #TI_ABI_SYSCALL]
>  #endif
>  	/*
>  	 * Reload the registers that may have been corrupted on entry to
> @@ -256,6 +254,9 @@ ENTRY(vector_swi)
>   TRACE(	ldmia	sp, {r0 - r3}		)
> 
>  local_restart:
> +#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
> +	str	scno, [tsk, #TI_ABI_SYSCALL]	@ store scno for syscall restart
> +#endif
>  	ldr	r10, [tsk, #TI_FLAGS]		@ check for syscall tracing
>  	stmdb	sp!, {r4, r5}			@ push fifth and sixth args
> 

If the local_restart code has to store the syscall number
for an EABI-only kernel, wouldn't it have to also do this
for a kernel with OABI-only or OABI_COMPAT support?

      Arnd
Kees Cook Aug. 10, 2023, 7:32 p.m. UTC | #2
On Wed, Aug 09, 2023 at 09:47:24PM +0200, Arnd Bergmann wrote:
> On Fri, Aug 4, 2023, at 09:10, Kees Cook wrote:
> > Since commit 4e57a4ddf6b0 ("ARM: 9107/1: syscall: always store
> > thread_info->abi_syscall"), the seccomp selftests "syscall_errno",
> > "syscall_faked", and "syscall_restart" have been broken. This was
> > related to two issues:
> 
> While it looks like my patch introduced both problems, it might
> be better to split your fix into two bits.

Okay, sounds good.

> > - seccomp and PTRACE depend on using the special value of "-1" for
> >   skipping syscalls. This value wasn't working because it was getting
> >   masked by __NR_SYSCALL_MASK in both PTRACE_SET_SYSCALL and
> >   get_syscall_nr().
> 
> > Explicitly test for -1 in PTRACE_SET_SYSCALL and get_syscall_nr(),
> > leaving it exposed when present, allowing tracers to skip syscalls
> > again.
> 
> This part looks good to me, at least it seems to be one of multiple
> ways of doing this, depending on how we want to encode the
> syscall skipping in the variable.
> 
> > - the syscall entry label "local_restart" is used for resuming syscalls
> >   interrupted by signals, but the updated syscall number (in scno) was
> >   not being stored in current_thread_info()->abi_syscall, causing traced
> >   syscall restarting to fail.
> >
> > Move the AEABI-only assignment of current_thread_info()->abi_syscall
> > after the "local_restart" label to allow tracers to survive syscall
> > restarting.
> 
> I'm not following exactly what you are doing here yet, but I suspect
> this part is wrong:
> 
> > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> > index bcc4c9ec3aa4..08bd624e4c6f 100644
> > --- a/arch/arm/kernel/entry-common.S
> > +++ b/arch/arm/kernel/entry-common.S
> > @@ -246,8 +246,6 @@ ENTRY(vector_swi)
> >  	bic	scno, scno, #0xff000000		@ mask off SWI op-code
> >  	str	scno, [tsk, #TI_ABI_SYSCALL]
> >  	eor	scno, scno, #__NR_SYSCALL_BASE	@ check OS number
> > -#else
> > -	str	scno, [tsk, #TI_ABI_SYSCALL]
> >  #endif
> >  	/*
> >  	 * Reload the registers that may have been corrupted on entry to
> > @@ -256,6 +254,9 @@ ENTRY(vector_swi)
> >   TRACE(	ldmia	sp, {r0 - r3}		)
> > 
> >  local_restart:
> > +#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
> > +	str	scno, [tsk, #TI_ABI_SYSCALL]	@ store scno for syscall restart
> > +#endif
> >  	ldr	r10, [tsk, #TI_FLAGS]		@ check for syscall tracing
> >  	stmdb	sp!, {r4, r5}			@ push fifth and sixth args
> > 
> 
> If the local_restart code has to store the syscall number
> for an EABI-only kernel, wouldn't it have to also do this
> for a kernel with OABI-only or OABI_COMPAT support?

This is the part I wasn't sure about. Initially I was thinking it didn't
matter because it's only a problem for a seccomp tracer, but I realize
it might be exposed to a PTRACE tracer too. I was only able to test with
EABI since seccomp is disabled for OABI_COMPAT.

Anyway, syscall restart is done this way:

        movlt   scno, #(__NR_restart_syscall - __NR_SYSCALL_BASE)

Can an EABI call restart an OABI syscall? I think so?

So maybe we just need to add:

	str     scno, [tsk, #TI_ABI_SYSCALL]    @ store scno for syscall restart

after that instead of moving it like I did originally?

Let me test that...
Arnd Bergmann Aug. 10, 2023, 8:10 p.m. UTC | #3
On Thu, Aug 10, 2023, at 21:32, Kees Cook wrote:
> On Wed, Aug 09, 2023 at 09:47:24PM +0200, Arnd Bergmann wrote:
>
>> If the local_restart code has to store the syscall number
>> for an EABI-only kernel, wouldn't it have to also do this
>> for a kernel with OABI-only or OABI_COMPAT support?
>
> This is the part I wasn't sure about. Initially I was thinking it didn't
> matter because it's only a problem for a seccomp tracer, but I realize
> it might be exposed to a PTRACE tracer too. I was only able to test with
> EABI since seccomp is disabled for OABI_COMPAT.
>
> Anyway, syscall restart is done this way:
>
>         movlt   scno, #(__NR_restart_syscall - __NR_SYSCALL_BASE)
>
> Can an EABI call restart an OABI syscall? I think so?

There are very few differences between OABI and EABI syscalls; I
think it basically comes down to:

 - the syscall number, and register in which it is passed to the kernel
 - a few syscalls that exist for OABI backward compatibility and were
   deprecated before EABI was added
 - a few syscalls that pass a struct with different alignment rules
 - epoll_wait() uses a runtime check for the output format

It also seems like the __NR_restart_syscall path is only relevant
for syscalls using restart_block for restarting, and that means
it's only poll(), futex(), nanosleep(), clock_nanosleep() and their
time64 counterparts. All of these are handled by the same entry
points for OABI and EABI, i.e. there is no overlap with the
exceptions above. Crucially, epoll does not use restart_block,
unlike poll().

> So maybe we just need to add:
>
> 	str     scno, [tsk, #TI_ABI_SYSCALL]    @ store scno for syscall restart
>
> after that instead of moving it like I did originally?

Yes, I think that works!

For pure EABI and pure OABI kernels, this just does the right thing,
storing a plain __NR_restart_syscall in the field without an ABI
marker. For an OABI compat task running on an EABI kernel, it will
call the EABI version of restart_syscall(), but that is exactly
the same as the OABI version, as shown above.

    Arnd
Kees Cook Aug. 10, 2023, 8:15 p.m. UTC | #4
On Thu, Aug 10, 2023 at 10:10:08PM +0200, Arnd Bergmann wrote:
> On Thu, Aug 10, 2023, at 21:32, Kees Cook wrote:
> > On Wed, Aug 09, 2023 at 09:47:24PM +0200, Arnd Bergmann wrote:
> >
> >> If the local_restart code has to store the syscall number
> >> for an EABI-only kernel, wouldn't it have to also do this
> >> for a kernel with OABI-only or OABI_COMPAT support?
> >
> > This is the part I wasn't sure about. Initially I was thinking it didn't
> > matter because it's only a problem for a seccomp tracer, but I realize
> > it might be exposed to a PTRACE tracer too. I was only able to test with
> > EABI since seccomp is disabled for OABI_COMPAT.
> >
> > Anyway, syscall restart is done this way:
> >
> >         movlt   scno, #(__NR_restart_syscall - __NR_SYSCALL_BASE)
> >
> > Can an EABI call restart an OABI syscall? I think so?
> 
> There are very few differences between OABI and EABI syscalls; I
> think it basically comes down to:
> 
>  - the syscall number, and register in which it is passed to the kernel
>  - a few syscalls that exist for OABI backward compatibility and were
>    deprecated before EABI was added
>  - a few syscalls that pass a struct with different alignment rules
>  - epoll_wait() uses a runtime check for the output format
> 
> It also seems like the __NR_restart_syscall path is only relevant
> for syscalls using restart_block for restarting, and that means
> it's only poll(), futex(), nanosleep(), clock_nanosleep() and their
> time64 counterparts. All of these are handled by the same entry

Right -- it's a tiny corner case I tripped over years ago while building
seccomp filters, so it got added to the selftests. :)

> points for OABI and EABI, i.e. there is no overlap with the
> exceptions above. Crucially, epoll does not use restart_block,
> unlike poll().
> 
> > So maybe we just need to add:
> >
> > 	str     scno, [tsk, #TI_ABI_SYSCALL]    @ store scno for syscall restart
> >
> > after that instead of moving it like I did originally?
> 
> Yes, I think that works!
> 
> For pure EABI and pure OABI kernels, this just does the right thing,
> storing a plain __NR_restart_syscall in the field without an ABI
> marker. For an OABI compat task running on an EABI kernel, it will
> call the EABI version of restart_syscall(), but that is exactly
> the same as the OABI version, as shown above.

Okay, excellent. I came to the same conclusion. Patch 1 in v2
addresses this, and it tested okay for me.

Thanks for looking at this!

-Kees

Patch

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index dfeed440254a..fe4326d938c1 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -25,6 +25,9 @@  static inline int syscall_get_nr(struct task_struct *task,
 	if (IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))
 		return task_thread_info(task)->abi_syscall;
 
+	if (task_thread_info(task)->abi_syscall == -1)
+		return -1;
+
 	return task_thread_info(task)->abi_syscall & __NR_SYSCALL_MASK;
 }
 
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index bcc4c9ec3aa4..08bd624e4c6f 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -246,8 +246,6 @@  ENTRY(vector_swi)
 	bic	scno, scno, #0xff000000		@ mask off SWI op-code
 	str	scno, [tsk, #TI_ABI_SYSCALL]
 	eor	scno, scno, #__NR_SYSCALL_BASE	@ check OS number
-#else
-	str	scno, [tsk, #TI_ABI_SYSCALL]
 #endif
 	/*
 	 * Reload the registers that may have been corrupted on entry to
@@ -256,6 +254,9 @@  ENTRY(vector_swi)
  TRACE(	ldmia	sp, {r0 - r3}		)
 
 local_restart:
+#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
+	str	scno, [tsk, #TI_ABI_SYSCALL]	@ store scno for syscall restart
+#endif
 	ldr	r10, [tsk, #TI_FLAGS]		@ check for syscall tracing
 	stmdb	sp!, {r4, r5}			@ push fifth and sixth args
 
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2d8e2516906b..fef32d73f912 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -783,8 +783,9 @@  long arch_ptrace(struct task_struct *child, long request,
 			break;
 
 		case PTRACE_SET_SYSCALL:
-			task_thread_info(child)->abi_syscall = data &
-							__NR_SYSCALL_MASK;
+			if (data != -1)
+				data &= __NR_SYSCALL_MASK;
+			task_thread_info(child)->abi_syscall = data;
 			ret = 0;
 			break;