diff mbox series

[v5,04/10] ARM: syscall: always store thread_info->abi_syscall

Message ID 20210726141141.2839385-5-arnd@kernel.org (mailing list archive)
State New
Headers show
Series ARM: remove set_fs callers and implementation | expand

Commit Message

Arnd Bergmann July 26, 2021, 2:11 p.m. UTC
From: Arnd Bergmann <arnd@arndb.de>

The system call number is used in a a couple of places, in particular
ptrace, seccomp and /proc/<pid>/syscall.

The last one apparently never worked reliably on ARM for tasks that are
not currently getting traced.

Storing the syscall number in the normal entry path makes it work,
as well as allowing us to see if the current system call is for OABI
compat mode, which is the next thing I want to hook into.

Since the thread_info->syscall field is not just the number any more, it
is now renamed to abi_syscall. In kernels that enable both OABI and EABI,
the upper bits of this field encode 0x900000 (__NR_OABI_SYSCALL_BASE)
for OABI tasks, while normal EABI tasks do not set the upper bits. This
makes it possible to implement the in_oabi_syscall() helper later.

All other users of thread_info->syscall go through the syscall_get_nr()
helper, which in turn filters out the ABI bits.

Note that the ABI information is lost with PTRACE_SET_SYSCALL, so one
cannot set the internal number to a particular version, but this was
already the case. We could change it to let gdb encode the ABI type along
with the syscall in a CONFIG_OABI_COMPAT-enabled kernel, but that itself
would be a (backwards-compatible) ABI change, so I don't do it here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/syscall.h     |  5 ++++-
 arch/arm/include/asm/thread_info.h |  2 +-
 arch/arm/include/uapi/asm/unistd.h |  1 +
 arch/arm/kernel/asm-offsets.c      |  1 +
 arch/arm/kernel/entry-common.S     |  8 ++++++--
 arch/arm/kernel/ptrace.c           | 14 ++++++++------
 6 files changed, 21 insertions(+), 10 deletions(-)

Comments

Kees Cook Aug. 3, 2023, 11:17 p.m. UTC | #1
On Mon, Jul 26, 2021 at 04:11:35PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> The system call number is used in a a couple of places, in particular
> ptrace, seccomp and /proc/<pid>/syscall.

*thread necromancy*

Hi!

So, it seems like the seccomp selftests broke in a few places due to
this change (back in v5.15). I really thought kernelci.org was running
the seccomp tests, but it seems like the coverage is spotty.

Specifically, the syscall_restart selftest fails, as well as syscall_errno
and syscall_faked (both via seccomp and PTRACE), starting with this patch.

> The last one apparently never worked reliably on ARM for tasks that are
> not currently getting traced.
> 
> Storing the syscall number in the normal entry path makes it work,
> as well as allowing us to see if the current system call is for OABI
> compat mode, which is the next thing I want to hook into.
> 
> Since the thread_info->syscall field is not just the number any more, it
> is now renamed to abi_syscall. In kernels that enable both OABI and EABI,
> the upper bits of this field encode 0x900000 (__NR_OABI_SYSCALL_BASE)
> for OABI tasks, while normal EABI tasks do not set the upper bits. This
> makes it possible to implement the in_oabi_syscall() helper later.
> 
> All other users of thread_info->syscall go through the syscall_get_nr()
> helper, which in turn filters out the ABI bits.

While I've reproducing the bisect done by mediatek, I'm still poking
around in here to figure out what's gone wrong. There was a recent patch
to fix this, but it looks like it's not complete:
https://lore.kernel.org/all/20230724121655.7894-1-lecopzer.chen@mediatek.com/

With the above applied, syscall_errno and syscall_faked start working
again, but not the syscall_restart test.

> Note that the ABI information is lost with PTRACE_SET_SYSCALL, so one
> cannot set the internal number to a particular version, but this was
> already the case. We could change it to let gdb encode the ABI type along
> with the syscall in a CONFIG_OABI_COMPAT-enabled kernel, but that itself
> would be a (backwards-compatible) ABI change, so I don't do it here.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Another issue of note, which may just be "by design" for arm32, is that
an invalid syscall (or, at least, a negative syscall) results in SIGILL,
rather than a errno=ENOSYS failure. This seems to have been true at least
as far back as v5.8 (where this was cleaned up for at least arm64 and
s390). There was a seccomp test added for it in v5.9, but it has been
failing for arm32 since then. :(

I mention this because the behavior of the syscall_restart test looks
like an invalid syscall: on restart a SIGILL is caught instead of the
syscall correctly continuing.

Anyway, I'll keep debugging this, but figured I'd mention it in case
anyone else had been seeing issues in here.

-Kees
Kees Cook Aug. 4, 2023, 8:13 a.m. UTC | #2
On Thu, Aug 03, 2023 at 04:17:24PM -0700, Kees Cook wrote:
> Anyway, I'll keep debugging this, but figured I'd mention it in case
> anyone else had been seeing issues in here.

Okay, I think I have a working patch now. Sent here:
https://lore.kernel.org/lkml/20230804071045.never.134-kees@kernel.org/
Arnd Bergmann Aug. 9, 2023, 7:42 p.m. UTC | #3
On Fri, Aug 4, 2023, at 01:17, Kees Cook wrote:
> On Mon, Jul 26, 2021 at 04:11:35PM +0200, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>> 
>> The system call number is used in a a couple of places, in particular
>> ptrace, seccomp and /proc/<pid>/syscall.
>
> *thread necromancy*
>
> Hi!
>
> So, it seems like the seccomp selftests broke in a few places due to
> this change (back in v5.15). I really thought kernelci.org was running
> the seccomp tests, but it seems like the coverage is spotty.
>
> Specifically, the syscall_restart selftest fails, as well as syscall_errno
> and syscall_faked (both via seccomp and PTRACE), starting with this patch.

Thanks for tracking this down!

>> The last one apparently never worked reliably on ARM for tasks that are
>> not currently getting traced.
>> 
>> Storing the syscall number in the normal entry path makes it work,
>> as well as allowing us to see if the current system call is for OABI
>> compat mode, which is the next thing I want to hook into.
>> 
>> Since the thread_info->syscall field is not just the number any more, it
>> is now renamed to abi_syscall. In kernels that enable both OABI and EABI,
>> the upper bits of this field encode 0x900000 (__NR_OABI_SYSCALL_BASE)
>> for OABI tasks, while normal EABI tasks do not set the upper bits. This
>> makes it possible to implement the in_oabi_syscall() helper later.
>> 
>> All other users of thread_info->syscall go through the syscall_get_nr()
>> helper, which in turn filters out the ABI bits.
>
> While I've reproducing the bisect done by mediatek, I'm still poking
> around in here to figure out what's gone wrong. There was a recent patch
> to fix this, but it looks like it's not complete:
> https://lore.kernel.org/all/20230724121655.7894-1-lecopzer.chen@mediatek.com/
>
> With the above applied, syscall_errno and syscall_faked start working
> again, but not the syscall_restart test.

Right, I also see you addressed this better in your follow-up patch,
I'll comment there.

>> Note that the ABI information is lost with PTRACE_SET_SYSCALL, so one
>> cannot set the internal number to a particular version, but this was
>> already the case. We could change it to let gdb encode the ABI type along
>> with the syscall in a CONFIG_OABI_COMPAT-enabled kernel, but that itself
>> would be a (backwards-compatible) ABI change, so I don't do it here.
>> 
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Another issue of note, which may just be "by design" for arm32, is that
> an invalid syscall (or, at least, a negative syscall) results in SIGILL,
> rather than a errno=ENOSYS failure. This seems to have been true at least
> as far back as v5.8 (where this was cleaned up for at least arm64 and
> s390). There was a seccomp test added for it in v5.9, but it has been
> failing for arm32 since then. :(
>
> I mention this because the behavior of the syscall_restart test looks
> like an invalid syscall: on restart a SIGILL is caught instead of the
> syscall correctly continuing.


The odd arm behavior came up on IRC recently, and I saw that this
was what arm has always done, but I could not figure out why this
is done. I tried to see where s390 and arm64 changed the behavior
but can't find it. Do you have the commit IDs?

      Arnd
diff mbox series

Patch

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fd02761ba06c..f055e846a5cc 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -22,7 +22,10 @@  extern const unsigned long sys_call_table[];
 static inline int syscall_get_nr(struct task_struct *task,
 				 struct pt_regs *regs)
 {
-	return task_thread_info(task)->syscall;
+	if (IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))
+		return task_thread_info(task)->abi_syscall;
+
+	return task_thread_info(task)->abi_syscall & __NR_SYSCALL_MASK;
 }
 
 static inline void syscall_rollback(struct task_struct *task,
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 70d4cbc49ae1..17c56051747b 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -62,7 +62,7 @@  struct thread_info {
 	unsigned long		stack_canary;
 #endif
 	struct cpu_context_save	cpu_context;	/* cpu context */
-	__u32			syscall;	/* syscall number */
+	__u32			abi_syscall;	/* ABI type and syscall nr */
 	__u8			used_cp[16];	/* thread used copro */
 	unsigned long		tp_value[2];	/* TLS registers */
 #ifdef CONFIG_CRUNCH
diff --git a/arch/arm/include/uapi/asm/unistd.h b/arch/arm/include/uapi/asm/unistd.h
index ae7749e15726..a1149911464c 100644
--- a/arch/arm/include/uapi/asm/unistd.h
+++ b/arch/arm/include/uapi/asm/unistd.h
@@ -15,6 +15,7 @@ 
 #define _UAPI__ASM_ARM_UNISTD_H
 
 #define __NR_OABI_SYSCALL_BASE	0x900000
+#define __NR_SYSCALL_MASK	0x0fffff
 
 #if defined(__thumb__) || defined(__ARM_EABI__)
 #define __NR_SYSCALL_BASE	0
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 70993af22d80..a0945b898ca3 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -48,6 +48,7 @@  int main(void)
   DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
   DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
   DEFINE(TI_CPU_SAVE,		offsetof(struct thread_info, cpu_context));
+  DEFINE(TI_ABI_SYSCALL,	offsetof(struct thread_info, abi_syscall));
   DEFINE(TI_USED_CP,		offsetof(struct thread_info, used_cp));
   DEFINE(TI_TP_VALUE,		offsetof(struct thread_info, tp_value));
   DEFINE(TI_FPSTATE,		offsetof(struct thread_info, fpstate));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 7f0b7aba1498..e837af90cd44 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -226,6 +226,7 @@  ENTRY(vector_swi)
 	/* saved_psr and saved_pc are now dead */
 
 	uaccess_disable tbl
+	get_thread_info tsk
 
 	adr	tbl, sys_call_table		@ load syscall table pointer
 
@@ -237,13 +238,17 @@  ENTRY(vector_swi)
 	 * get the old ABI syscall table address.
 	 */
 	bics	r10, r10, #0xff000000
+	strne	r10, [tsk, #TI_ABI_SYSCALL]
+	streq	scno, [tsk, #TI_ABI_SYSCALL]
 	eorne	scno, r10, #__NR_OABI_SYSCALL_BASE
 	ldrne	tbl, =sys_oabi_call_table
 #elif !defined(CONFIG_AEABI)
 	bic	scno, scno, #0xff000000		@ mask off SWI op-code
+	str	scno, [tsk, #TI_ABI_SYSCALL]
 	eor	scno, scno, #__NR_SYSCALL_BASE	@ check OS number
+#else
+	str	scno, [tsk, #TI_ABI_SYSCALL]
 #endif
-	get_thread_info tsk
 	/*
 	 * Reload the registers that may have been corrupted on entry to
 	 * the syscall assembly (by tracing or context tracking.)
@@ -288,7 +293,6 @@  ENDPROC(vector_swi)
 	 * context switches, and waiting for our parent to respond.
 	 */
 __sys_trace:
-	mov	r1, scno
 	add	r0, sp, #S_OFF
 	bl	syscall_trace_enter
 	mov	scno, r0
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2771e682220b..d886ea8910cb 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -25,6 +25,7 @@ 
 #include <linux/tracehook.h>
 #include <linux/unistd.h>
 
+#include <asm/syscall.h>
 #include <asm/traps.h>
 
 #define CREATE_TRACE_POINTS
@@ -811,7 +812,8 @@  long arch_ptrace(struct task_struct *child, long request,
 			break;
 
 		case PTRACE_SET_SYSCALL:
-			task_thread_info(child)->syscall = data;
+			task_thread_info(child)->abi_syscall = data &
+							__NR_SYSCALL_MASK;
 			ret = 0;
 			break;
 
@@ -880,14 +882,14 @@  static void tracehook_report_syscall(struct pt_regs *regs,
 	if (dir == PTRACE_SYSCALL_EXIT)
 		tracehook_report_syscall_exit(regs, 0);
 	else if (tracehook_report_syscall_entry(regs))
-		current_thread_info()->syscall = -1;
+		current_thread_info()->abi_syscall = -1;
 
 	regs->ARM_ip = ip;
 }
 
-asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
+asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
-	current_thread_info()->syscall = scno;
+	int scno;
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
@@ -898,11 +900,11 @@  asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
 		return -1;
 #else
 	/* XXX: remove this once OABI gets fixed */
-	secure_computing_strict(current_thread_info()->syscall);
+	secure_computing_strict(syscall_get_nr(current, regs));
 #endif
 
 	/* Tracer or seccomp may have changed syscall. */
-	scno = current_thread_info()->syscall;
+	scno = syscall_get_nr(current, regs);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
 		trace_sys_enter(regs, scno);