Message ID | 1500381724-19003-2-git-send-email-Dave.Martin@arm.com (mailing list archive) |
---|---|
State | New, archived |
Hi,

On Tue, 18 Jul 2017 13:41:43 +0100 Dave Martin wrote:
> The upper 32 bits of the syscallno field in struct pt_regs are handled
> inconsistently, being sometimes zero extended and sometimes
> sign-extended. The reason for this appears to be that they are not
> intentionally significant at all.
>
My english parser fails on "not intentionally significant at all".


Lothar Waßmann
On Wed, Jul 19, 2017 at 03:14:55PM +0200, Lothar Waßmann wrote:
> Hi,
>
> On Tue, 18 Jul 2017 13:41:43 +0100 Dave Martin wrote:
> > The upper 32 bits of the syscallno field in struct pt_regs are handled
> > inconsistently, being sometimes zero extended and sometimes
> > sign-extended. The reason for this appears to be that they are not
> > intentionally significant at all.
> >
> My english parser fails on "not intentionally significant at all".

Hmmm, that was rather convoluted.  How about:

--8<--
In fact, only the lower 32 bits seem to have any real significance for
the behaviour of the code: it's been OK to handle the upper bits
inconsistently because they don't matter.
--8<--

Cheers
---Dave
On Tue, Jul 18, 2017 at 01:41:43PM +0100, Dave Martin wrote:
> The upper 32 bits of the syscallno field in struct pt_regs are handled
> inconsistently, being sometimes zero extended and sometimes
> sign-extended. The reason for this appears to be that they are not
> intentionally significant at all.
>
> Currently, the only place I can find where those bits are
> significant is in calling trace_sys_enter(), which may be
> unintentional: for example, if a compat tracer attempts to cancel a
> syscall by passing -1 to (COMPAT_)PTRACE_SET_SYSCALL at the
> syscall-enter-stop, it will be traced as syscall 4294967295
> rather than -1 as might be expected (and as occurs for a native
> tracer doing the same thing).  Elsewhere, reads of syscallno cast
> it to an int or truncate it.
>
> There's also a conspicuous amount of code and casting to bodge
> around the fact that although semantically an int, syscallno is
> stored as a u64.
>
> Let's not pretend any more.
>
> In order to preserve the stp x instruction that stores the syscall
> number in entry.S, this patch special-cases the layout of struct
> pt_regs for big endian so that the newly 32-bit syscallno field
> maps onto the low bits of the stored value.  This is not beautiful,
> but benchmarking of the getpid syscall on Juno suggests a minor
> slowdown if the stp is split into an stp x and stp w.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> ---
>  arch/arm64/include/asm/processor.h |  2 +-
>  arch/arm64/include/asm/ptrace.h    |  9 ++++++++-
>  arch/arm64/kernel/entry.S          | 34 +++++++++++++++++-----------------
>  arch/arm64/kernel/ptrace.c         |  2 +-
>  arch/arm64/kernel/signal.c         |  6 +++---
>  arch/arm64/kernel/signal32.c       |  2 +-
>  arch/arm64/kernel/traps.c          |  2 +-
>  7 files changed, 32 insertions(+), 25 deletions(-)
>
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 64c9e78..379def1 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -112,7 +112,7 @@ void tls_preserve_current_state(void);
>  static inline void start_thread_common(struct pt_regs *regs, unsigned long pc)
>  {
>  	memset(regs, 0, sizeof(*regs));
> -	regs->syscallno = ~0UL;
> +	regs->syscallno = ~0;
>  	regs->pc = pc;
>  }
>
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index 11403fd..21c87dc 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -116,7 +116,14 @@ struct pt_regs {
>  		};
>  	};
>  	u64 orig_x0;
> -	u64 syscallno;
> +#ifdef __AARCH64EB__
> +	u32 unused2;
> +	s32 syscallno;
> +#else
> +	s32 syscallno;
> +	u32 unused2;
> +#endif
> +
>  	u64 orig_addr_limit;
>  	u64 unused;	// maintain 16 byte alignment

This isn't a UAPI struct, so we could juggle these about a bit to put all
the unused padding at the end. Other than that, this looks good:

Acked-by: Will Deacon <will.deacon@arm.com>

Will
On Tue, Aug 01, 2017 at 12:36:47PM +0100, Will Deacon wrote:
> On Tue, Jul 18, 2017 at 01:41:43PM +0100, Dave Martin wrote:
> > The upper 32 bits of the syscallno field in struct pt_regs are handled
> > inconsistently, being sometimes zero extended and sometimes
> > sign-extended. The reason for this appears to be that they are not
> > intentionally significant at all.
> >
> > Currently, the only place I can find where those bits are
> > significant is in calling trace_sys_enter(), which may be
> > unintentional: for example, if a compat tracer attempts to cancel a
> > syscall by passing -1 to (COMPAT_)PTRACE_SET_SYSCALL at the
> > syscall-enter-stop, it will be traced as syscall 4294967295
> > rather than -1 as might be expected (and as occurs for a native
> > tracer doing the same thing).  Elsewhere, reads of syscallno cast
> > it to an int or truncate it.
> >
> > There's also a conspicuous amount of code and casting to bodge
> > around the fact that although semantically an int, syscallno is
> > stored as a u64.
> >
> > Let's not pretend any more.
> >
> > In order to preserve the stp x instruction that stores the syscall
> > number in entry.S, this patch special-cases the layout of struct
> > pt_regs for big endian so that the newly 32-bit syscallno field
> > maps onto the low bits of the stored value.  This is not beautiful,
> > but benchmarking of the getpid syscall on Juno suggests a minor
> > slowdown if the stp is split into an stp x and stp w.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>

[...]

> > diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> > index 11403fd..21c87dc 100644
> > --- a/arch/arm64/include/asm/ptrace.h
> > +++ b/arch/arm64/include/asm/ptrace.h
> > @@ -116,7 +116,14 @@ struct pt_regs {
> >  		};
> >  	};
> >  	u64 orig_x0;
> > -	u64 syscallno;
> > +#ifdef __AARCH64EB__
> > +	u32 unused2;
> > +	s32 syscallno;
> > +#else
> > +	s32 syscallno;
> > +	u32 unused2;
> > +#endif
> > +
> >  	u64 orig_addr_limit;
> >  	u64 unused;	// maintain 16 byte alignment
>
> This isn't a UAPI struct, so we could juggle these about a bit to put all
> the unused padding at the end.

In the __AARCH64EB__ case syscallno still needs to be padded up to an
odd 32-bit boundary, so we need padding before it in that case; also
we'd still need the #ifdef either way, and the struct is already
intentionally a multiple of 16 bytes to align it on the stack.  So I'm
not sure what we'd save here (?)

I could put orig_addr_limit before (padded) syscallno, but I'm not sure
that's better.

Cheers
---Dave
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 64c9e78..379def1 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -112,7 +112,7 @@ void tls_preserve_current_state(void);
 static inline void start_thread_common(struct pt_regs *regs, unsigned long pc)
 {
 	memset(regs, 0, sizeof(*regs));
-	regs->syscallno = ~0UL;
+	regs->syscallno = ~0;
 	regs->pc = pc;
 }
 
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 11403fd..21c87dc 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -116,7 +116,14 @@ struct pt_regs {
 		};
 	};
 	u64 orig_x0;
-	u64 syscallno;
+#ifdef __AARCH64EB__
+	u32 unused2;
+	s32 syscallno;
+#else
+	s32 syscallno;
+	u32 unused2;
+#endif
+
 	u64 orig_addr_limit;
 	u64 unused;	// maintain 16 byte alignment
 };
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index b738880..3bf0bd7 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -142,8 +142,8 @@ alternative_else_nop_endif
 	 * Set syscallno to -1 by default (overridden later if real syscall).
 	 */
 	.if	\el == 0
-	mvn	x21, xzr
-	str	x21, [sp, #S_SYSCALLNO]
+	mvn	w21, wzr
+	str	w21, [sp, #S_SYSCALLNO]
 	.endif
 
 	/*
@@ -290,8 +290,9 @@ alternative_else_nop_endif
 	 *
 	 * x7 is reserved for the system call number in 32-bit mode.
 	 */
-sc_nr	.req	x25		// number of system calls
-scno	.req	x26		// syscall number
+wsc_nr	.req	w25		// number of system calls
+wscno	.req	w26		// syscall number
+xscno	.req	x26		// syscall number (zero-extended)
 stbl	.req	x27		// syscall table pointer
 tsk	.req	x28		// current thread_info
 
@@ -577,8 +578,8 @@ el0_svc_compat:
 	 * AArch32 syscall handling
 	 */
 	adrp	stbl, compat_sys_call_table	// load compat syscall table pointer
-	uxtw	scno, w7			// syscall number in w7 (r7)
-	mov	sc_nr, #__NR_compat_syscalls
+	mov	wscno, w7			// syscall number in w7 (r7)
+	mov	wsc_nr, #__NR_compat_syscalls
 	b	el0_svc_naked
 
 	.align	6
@@ -798,19 +799,19 @@ ENDPROC(ret_from_fork)
 	.align	6
 el0_svc:
 	adrp	stbl, sys_call_table		// load syscall table pointer
-	uxtw	scno, w8			// syscall number in w8
-	mov	sc_nr, #__NR_syscalls
+	mov	wscno, w8			// syscall number in w8
+	mov	wsc_nr, #__NR_syscalls
 el0_svc_naked:					// compat entry point
-	stp	x0, scno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
+	stp	x0, xscno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
 	enable_dbg_and_irq
 	ct_user_exit 1
 
 	ldr	x16, [tsk, #TSK_TI_FLAGS]	// check for syscall hooks
 	tst	x16, #_TIF_SYSCALL_WORK
 	b.ne	__sys_trace
-	cmp	scno, sc_nr			// check upper syscall limit
+	cmp	wscno, wsc_nr			// check upper syscall limit
 	b.hs	ni_sys
-	ldr	x16, [stbl, scno, lsl #3]	// address in the syscall table
+	ldr	x16, [stbl, xscno, lsl #3]	// address in the syscall table
 	blr	x16				// call sys_* routine
 	b	ret_fast_syscall
 ni_sys:
@@ -824,24 +825,23 @@ ENDPROC(el0_svc)
 	 * switches, and waiting for our parent to respond.
 	 */
 __sys_trace:
-	mov	w0, #-1				// set default errno for
-	cmp	scno, x0			// user-issued syscall(-1)
+	cmp	wscno, #-1			// user-issued syscall(-1)?
 	b.ne	1f
-	mov	x0, #-ENOSYS
+	mov	x0, #-ENOSYS			// set default errno if so
 	str	x0, [sp, #S_X0]
 1:	mov	x0, sp
 	bl	syscall_trace_enter
 	cmp	w0, #-1				// skip the syscall?
 	b.eq	__sys_trace_return_skipped
-	uxtw	scno, w0			// syscall number (possibly new)
+	mov	wscno, w0			// syscall number (possibly new)
 	mov	x1, sp				// pointer to regs
-	cmp	scno, sc_nr			// check upper syscall limit
+	cmp	wscno, wsc_nr			// check upper syscall limit
 	b.hs	__ni_sys_trace
 	ldp	x0, x1, [sp]			// restore the syscall args
 	ldp	x2, x3, [sp, #S_X2]
 	ldp	x4, x5, [sp, #S_X4]
 	ldp	x6, x7, [sp, #S_X6]
-	ldr	x16, [stbl, scno, lsl #3]	// address in the syscall table
+	ldr	x16, [stbl, xscno, lsl #3]	// address in the syscall table
 	blr	x16				// call sys_* routine
 
 __sys_trace_return:
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 1b38c01..de77480 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1363,7 +1363,7 @@ static void tracehook_report_syscall(struct pt_regs *regs,
 	if (dir == PTRACE_SYSCALL_EXIT)
 		tracehook_report_syscall_exit(regs, 0);
 	else if (tracehook_report_syscall_entry(regs))
-		regs->syscallno = ~0UL;
+		regs->syscallno = ~0;
 
 	regs->regs[regno] = saved_reg;
 }
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 089c3747..4d04b89 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -387,7 +387,7 @@ static int restore_sigframe(struct pt_regs *regs,
 	/*
 	 * Avoid sys_rt_sigreturn() restarting.
 	 */
-	regs->syscallno = ~0UL;
+	regs->syscallno = ~0;
 
 	err |= !valid_user_regs(&regs->user_regs, current);
 	if (err == 0)
@@ -673,7 +673,7 @@ static void do_signal(struct pt_regs *regs)
 {
 	unsigned long continue_addr = 0, restart_addr = 0;
 	int retval = 0;
-	int syscall = (int)regs->syscallno;
+	int syscall = regs->syscallno;
 	struct ksignal ksig;
 
 	/*
@@ -687,7 +687,7 @@ static void do_signal(struct pt_regs *regs)
 	/*
 	 * Avoid additional syscall restarting via ret_to_user.
 	 */
-	regs->syscallno = ~0UL;
+	regs->syscallno = ~0;
 
 	/*
 	 * Prepare for system call restart. We do this here so that a
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index c747a0f..d98ca76 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -354,7 +354,7 @@ static int compat_restore_sigframe(struct pt_regs *regs,
 	/*
 	 * Avoid compat_sys_sigreturn() restarting.
 	 */
-	regs->syscallno = ~0UL;
+	regs->syscallno = ~0;
 
 	err |= !valid_user_regs(&regs->user_regs, current);
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index c7c7088..ad2aa71 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -589,7 +589,7 @@ asmlinkage long do_ni_syscall(struct pt_regs *regs)
 	if (show_unhandled_signals_ratelimited()) {
 		pr_info("%s[%d]: syscall %d\n", current->comm,
-			task_pid_nr(current), (int)regs->syscallno);
+			task_pid_nr(current), regs->syscallno);
 		dump_instr("", regs);
 		if (user_mode(regs))
 			__show_regs(regs);
The upper 32 bits of the syscallno field in struct pt_regs are handled
inconsistently, being sometimes zero extended and sometimes
sign-extended. The reason for this appears to be that they are not
intentionally significant at all.

Currently, the only place I can find where those bits are
significant is in calling trace_sys_enter(), which may be
unintentional: for example, if a compat tracer attempts to cancel a
syscall by passing -1 to (COMPAT_)PTRACE_SET_SYSCALL at the
syscall-enter-stop, it will be traced as syscall 4294967295
rather than -1 as might be expected (and as occurs for a native
tracer doing the same thing).  Elsewhere, reads of syscallno cast
it to an int or truncate it.

There's also a conspicuous amount of code and casting to bodge
around the fact that although semantically an int, syscallno is
stored as a u64.

Let's not pretend any more.

In order to preserve the stp x instruction that stores the syscall
number in entry.S, this patch special-cases the layout of struct
pt_regs for big endian so that the newly 32-bit syscallno field
maps onto the low bits of the stored value.  This is not beautiful,
but benchmarking of the getpid syscall on Juno suggests a minor
slowdown if the stp is split into an stp x and stp w.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/processor.h |  2 +-
 arch/arm64/include/asm/ptrace.h    |  9 ++++++++-
 arch/arm64/kernel/entry.S          | 34 +++++++++++++++++-----------------
 arch/arm64/kernel/ptrace.c         |  2 +-
 arch/arm64/kernel/signal.c         |  6 +++---
 arch/arm64/kernel/signal32.c       |  2 +-
 arch/arm64/kernel/traps.c          |  2 +-
 7 files changed, 32 insertions(+), 25 deletions(-)