diff mbox series

[4/9] ARM: syscall: always store thread_info->syscall

Message ID 20200907153701.2981205-5-arnd@arndb.de (mailing list archive)
State New, archived
Series ARM: remove set_fs callers and implementation

Commit Message

Arnd Bergmann Sept. 7, 2020, 3:36 p.m. UTC
The system call number is used in a couple of places, in particular
ptrace, seccomp and /proc/<pid>/syscall.

The last one apparently never worked reliably on ARM for tasks
that are not currently getting traced.

Storing the syscall number in the normal entry path makes it work,
as well as allowing us to see if the current system call is for
OABI compat mode, which is the next thing I want to hook into.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/syscall.h | 3 +++
 arch/arm/kernel/asm-offsets.c  | 1 +
 arch/arm/kernel/entry-common.S | 7 +++++--
 arch/arm/kernel/ptrace.c       | 4 ++--
 4 files changed, 11 insertions(+), 4 deletions(-)

Comments

Linus Walleij Sept. 28, 2020, 9:41 a.m. UTC | #1
Hi Arnd,

help me out here because I feel vaguely stupid...

On Mon, Sep 7, 2020 at 5:38 PM Arnd Bergmann <arnd@arndb.de> wrote:

>  {
> +       if (IS_ENABLED(CONFIG_OABI_COMPAT))
> +               return task_thread_info(task)->syscall & ~__NR_OABI_SYSCALL_BASE;

Where __NR_OABI_SYSCALL_BASE is
#define __NR_OABI_SYSCALL_BASE       0x900000

So you will end up with syscall number & 0xFF6FFFFF
masking off bits 20 and 23.

I suppose this is based on this:

>         bics    r10, r10, #0xff000000
> +       str     r10, [tsk, #TI_SYSCALL]

OK we mask off bits 24-31 before we store this.

>         bic     scno, scno, #0xff000000         @ mask off SWI op-code
> +       str     scno, [tsk, #TI_SYSCALL]

And here too.

>         eor     scno, scno, #__NR_SYSCALL_BASE  @ check OS number

And then something happens that will ... I don't really know.
Exclusive or with 0x900000 is not immediately intuitive or
evident to me, I suppose it is for everyone else... :/

I need some idea how this numberspace is managed in order to
understand the code so I can review it, I guess it all makes perfect
sense but I need some background here.

Thanks,
Linus Walleij
Arnd Bergmann Sept. 28, 2020, 12:42 p.m. UTC | #2
On Mon, Sep 28, 2020 at 11:41 AM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> Hi Arnd,
>
> help me out here because I feel vaguely stupid...
>
> On Mon, Sep 7, 2020 at 5:38 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> >  {
> > +       if (IS_ENABLED(CONFIG_OABI_COMPAT))
> > +               return task_thread_info(task)->syscall & ~__NR_OABI_SYSCALL_BASE;
>
> Where __NR_OABI_SYSCALL_BASE is
> #define __NR_OABI_SYSCALL_BASE       0x900000
>
> So you will end up with syscall number & 0xFF6FFFFF
> masking off bits 20 and 23.

Right. I fixed a bug in here since I sent this, the correct version also
needs to mask away the __NR_OABI_SYSCALL_BASE for a native
oabi kernel, not just for an eabi kernel with oabi-compat mode.
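
The mask arithmetic discussed above can be checked with a small standalone
C sketch. It is only an illustration, not kernel code: the constant is the
value quoted in this thread, and the helper name sketch_syscall_get_nr is
made up for the example.

```c
#include <stdint.h>

/* Value of __NR_OABI_SYSCALL_BASE as quoted in this thread. */
#define SKETCH_OABI_SYSCALL_BASE 0x900000u

/*
 * Hypothetical stand-in for the masking done in syscall_get_nr():
 * clear bits 20 and 23 of the stored value, leaving the plain
 * syscall number. 'stored' is assumed to already have bits 24-31
 * cleared, as the entry code does before the str.
 */
static inline uint32_t sketch_syscall_get_nr(uint32_t stored)
{
	return stored & ~SKETCH_OABI_SYSCALL_BASE;
}
```

Under these assumptions, ~SKETCH_OABI_SYSCALL_BASE is 0xFF6FFFFF as a
32-bit value, so a stored OABI value such as 0x900004 comes back as 4,
and a plain EABI number such as 4 is returned unchanged.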

> I suppose this is based on this:
>
> >         bics    r10, r10, #0xff000000
> > +       str     r10, [tsk, #TI_SYSCALL]
>
> OK we mask off bits 24-31 before we store this.
>
> >         bic     scno, scno, #0xff000000         @ mask off SWI op-code
> > +       str     scno, [tsk, #TI_SYSCALL]
>
> And here too.
>
> >         eor     scno, scno, #__NR_SYSCALL_BASE  @ check OS number
>
> And then something happens that will ... I don't really know.
> Exclusive or with 0x900000 is not immediately intuitive or
> evident to me, I suppose it is for everyone else... :/

This is how the SWI/SVC immediate argument gets turned into
a system call number that is used as an offset into the sys_call_table.

OABI syscalls are called with '__NR_OABI_SYSCALL_BASE | scno'
in the immediate argument of the instruction, so using an
'eor ... , #__NR_SYSCALL_BASE' means that any valid
argument afterwards is a number between zero and
__NR_syscalls, and any invalid argument is a number outside
of that range.

EABI syscalls are just 'SVC 0' with the syscall number in register 7
and no offset.

See also
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3f2829a31573e3e502b874c8d69a765f7a778793
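
The OABI decoding described above can be sketched in plain C as follows.
This is a rough model of the bic/eor sequence, not the kernel's
implementation: SKETCH_NR_SYSCALLS is a made-up table size, and all
sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of __NR_OABI_SYSCALL_BASE as quoted in this thread. */
#define SKETCH_OABI_SYSCALL_BASE 0x900000u
/* Hypothetical table size, standing in for the real syscall count. */
#define SKETCH_NR_SYSCALLS 400u

/* Models 'bic scno, scno, #0xff000000': keep the 24-bit immediate. */
static inline uint32_t sketch_swi_immediate(uint32_t insn)
{
	return insn & ~0xff000000u;
}

/*
 * Models the following 'eor': a well-formed OABI immediate
 * (base | scno) becomes a small table index, while an immediate
 * without the base bits ends up with them set, far out of range.
 */
static inline uint32_t sketch_oabi_scno(uint32_t insn)
{
	return sketch_swi_immediate(insn) ^ SKETCH_OABI_SYSCALL_BASE;
}

/* One unsigned range check then rejects every invalid immediate. */
static inline bool sketch_scno_valid(uint32_t scno)
{
	return scno < SKETCH_NR_SYSCALLS;
}
```

For example, the instruction word 0xEF900004 (an unconditional SWI with
immediate 0x900004) decodes to index 4, whereas 0xEF000004 decodes to
0x900004 and fails the range check. The exclusive or therefore strips
the base and validates it in one step, which a plain bit clear would
not do.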

> I need some idea how this numberspace is managed in order to
> understand the code so I can review it, I guess it all makes perfect
> sense but I need some background here.

I also had never understood this part before, and I'm still not
sure where the 0x900000 actually comes from, though my best
guess is that this was intended as an OS-specific number space,
with '9' being assigned to Linux (similar to the way Itanium and
MIPS do with their respective offsets). By the time EABI got added,
this was apparently no longer considered helpful.

        Arnd
Russell King (Oracle) Sept. 28, 2020, 3:08 p.m. UTC | #3
On Mon, Sep 28, 2020 at 02:42:43PM +0200, Arnd Bergmann wrote:
> > I need some idea how this numberspace is managed in order to
> > understand the code so I can review it, I guess it all makes perfect
> > sense but I need some background here.
> 
> I also had never understood this part before, and I'm still not
> sure where the 0x900000 actually comes from, though my best
> guess is that this was intended as an OS-specific number space,
> with '9' being assigned to Linux (similar to the way Itanium and
> MIPS do with their respective offsets). By the time EABI got added,
> this was apparently no longer considered helpful.

It is an OS specific number space, originally designed to allow
RISC OS programs to be run under Linux.  There was indeed such a
project, but that died and the code was ripped out. EABI, by using
SWI 0 - or more accurately, not reading the SWI opcode, trampled
over the ability for RISC OS programs to be run under Linux.

Patch

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fd02761ba06c..ff6cc365eaf7 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -22,6 +22,9 @@  extern const unsigned long sys_call_table[];
 static inline int syscall_get_nr(struct task_struct *task,
 				 struct pt_regs *regs)
 {
+	if (IS_ENABLED(CONFIG_OABI_COMPAT))
+		return task_thread_info(task)->syscall & ~__NR_OABI_SYSCALL_BASE;
+
 	return task_thread_info(task)->syscall;
 }
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index a1570c8bab25..97af6735172b 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -46,6 +46,7 @@  int main(void)
   DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
   DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
   DEFINE(TI_CPU_SAVE,		offsetof(struct thread_info, cpu_context));
+  DEFINE(TI_SYSCALL,		offsetof(struct thread_info, syscall));
   DEFINE(TI_USED_CP,		offsetof(struct thread_info, used_cp));
   DEFINE(TI_TP_VALUE,		offsetof(struct thread_info, tp_value));
   DEFINE(TI_FPSTATE,		offsetof(struct thread_info, fpstate));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 271cb8a1eba1..2ea3a1989fed 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -223,6 +223,7 @@  ENTRY(vector_swi)
 	/* saved_psr and saved_pc are now dead */
 
 	uaccess_disable tbl
+	get_thread_info tsk
 
 	adr	tbl, sys_call_table		@ load syscall table pointer
 
@@ -234,13 +235,16 @@  ENTRY(vector_swi)
 	 * get the old ABI syscall table address.
 	 */
 	bics	r10, r10, #0xff000000
+	str	r10, [tsk, #TI_SYSCALL]
 	eorne	scno, r10, #__NR_OABI_SYSCALL_BASE
 	ldrne	tbl, =sys_oabi_call_table
 #elif !defined(CONFIG_AEABI)
 	bic	scno, scno, #0xff000000		@ mask off SWI op-code
+	str	scno, [tsk, #TI_SYSCALL]
 	eor	scno, scno, #__NR_SYSCALL_BASE	@ check OS number
+#else
+	str	scno, [tsk, #TI_SYSCALL]
 #endif
-	get_thread_info tsk
 	/*
 	 * Reload the registers that may have been corrupted on entry to
 	 * the syscall assembly (by tracing or context tracking.)
@@ -285,7 +289,6 @@  ENDPROC(vector_swi)
 	 * context switches, and waiting for our parent to respond.
 	 */
 __sys_trace:
-	mov	r1, scno
 	add	r0, sp, #S_OFF
 	bl	syscall_trace_enter
 	mov	scno, r0
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2771e682220b..252060663b00 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -885,9 +885,9 @@  static void tracehook_report_syscall(struct pt_regs *regs,
 	regs->ARM_ip = ip;
 }
 
-asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
+asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
-	current_thread_info()->syscall = scno;
+	int scno;
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);