Message ID | 1414620418-29472-1-git-send-email-rabin@rab.in (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Oct 29, 2014 at 11:06:58PM +0100, Rabin Vincent wrote: > ARM has some private syscalls (for example, set_tls(2)) which lie > outside the range of NR_syscalls. If any of these are called while > syscall tracing is being performed, out-of-bounds array access will > occur in the ftrace and perf sys_{enter,exit} handlers. While this patch looks like good caution, having syscalls outside of NR_syscalls seems like a receipe for a disaster. Can you try to fix that issue as ell, please?
On Thu, Oct 30, 2014 at 01:26:06AM -0700, Christoph Hellwig wrote: > On Wed, Oct 29, 2014 at 11:06:58PM +0100, Rabin Vincent wrote: > > ARM has some private syscalls (for example, set_tls(2)) which lie > > outside the range of NR_syscalls. If any of these are called while > > syscall tracing is being performed, out-of-bounds array access will > > occur in the ftrace and perf sys_{enter,exit} handlers. > > While this patch looks like good caution, having syscalls outside of > NR_syscalls seems like a receipe for a disaster. Can you try to fix > that issue as ell, please? No. We've had them since the inception of Linux on ARM. They predate this tracing crap by more than a decade. We're not changing them because that would be a massive user API breakage.
On Thu, 30 Oct 2014 10:18:08 +0000 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Thu, Oct 30, 2014 at 01:26:06AM -0700, Christoph Hellwig wrote: > > On Wed, Oct 29, 2014 at 11:06:58PM +0100, Rabin Vincent wrote: > > > ARM has some private syscalls (for example, set_tls(2)) which lie > > > outside the range of NR_syscalls. If any of these are called while > > > syscall tracing is being performed, out-of-bounds array access will > > > occur in the ftrace and perf sys_{enter,exit} handlers. > > > > While this patch looks like good caution, having syscalls outside of > > NR_syscalls seems like a receipe for a disaster. Can you try to fix > > that issue as ell, please? > > No. We've had them since the inception of Linux on ARM. They predate > this tracing crap by more than a decade. We're not changing them > because that would be a massive user API breakage. > Since syscall tracing is only broken on ARM, then the fix needs to be ARM specific, and not remove the check for all other architectures that have a sane NR_syscalls variable. -- Steve
On Thu, Oct 30, 2014 at 07:10:39AM -0400, Steven Rostedt wrote: > On Thu, 30 Oct 2014 10:18:08 +0000 > Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > > > On Thu, Oct 30, 2014 at 01:26:06AM -0700, Christoph Hellwig wrote: > > > On Wed, Oct 29, 2014 at 11:06:58PM +0100, Rabin Vincent wrote: > > > > ARM has some private syscalls (for example, set_tls(2)) which lie > > > > outside the range of NR_syscalls. If any of these are called while > > > > syscall tracing is being performed, out-of-bounds array access will > > > > occur in the ftrace and perf sys_{enter,exit} handlers. > > > > > > While this patch looks like good caution, having syscalls outside of > > > NR_syscalls seems like a receipe for a disaster. Can you try to fix > > > that issue as ell, please? > > > > No. We've had them since the inception of Linux on ARM. They predate > > this tracing crap by more than a decade. We're not changing them > > because that would be a massive user API breakage. > > > > Since syscall tracing is only broken on ARM, then the fix needs to be > ARM specific, and not remove the check for all other architectures that > have a sane NR_syscalls variable. This issue came up before. We have always had syscall number range of 0x900000 or so. The tracing design does not expect that. Therefore, the tracing design did not take account of ARM when it was created. Therefore, it's up to the tracing people to decide how to properly fit their ill-designed subsystem into one of the popular and well-established kernel architectures - or at least suggest a way to work around this issue.
On Thu, 30 Oct 2014 11:14:41 +0000 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > We have always had syscall number range of 0x900000 or so. The tracing > design does not expect that. Therefore, the tracing design did not take > account of ARM when it was created. Therefore, it's up to the tracing > people to decide how to properly fit their ill-designed subsystem into > one of the popular and well-established kernel architectures - or at > least suggest a way to work around this issue. > Fine, lets define a MAX_SYSCALL_NR that is by default NR_syscalls, but an architecture can override it. In trace_syscalls.c, where the checks are done, have this: #ifndef MAX_SYSCALL_NR # define MAX_SYSCALL_NR NR_syscalls #endif change all the checks to test against MAX_SYSCALL_NR instead of NR_syscalls. Then in arch/arm/include/asm/syscall.h have: #define MAX_SYSCALL_NR 0xa00000 or whatever would be the highest syscall number for ARM. -- Steve
On Thu, Oct 30, 2014 at 07:30:28AM -0400, Steven Rostedt wrote: > On Thu, 30 Oct 2014 11:14:41 +0000 > Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > > > > We have always had syscall number range of 0x900000 or so. The tracing > > design does not expect that. Therefore, the tracing design did not take > > account of ARM when it was created. Therefore, it's up to the tracing > > people to decide how to properly fit their ill-designed subsystem into > > one of the popular and well-established kernel architectures - or at > > least suggest a way to work around this issue. > > > > > Fine, lets define a MAX_SYSCALL_NR that is by default NR_syscalls, but > an architecture can override it. > > In trace_syscalls.c, where the checks are done, have this: > > #ifndef MAX_SYSCALL_NR > # define MAX_SYSCALL_NR NR_syscalls > #endif > > change all the checks to test against MAX_SYSCALL_NR instead of > NR_syscalls. > > Then in arch/arm/include/asm/syscall.h have: > > #define MAX_SYSCALL_NR 0xa00000 > > or whatever would be the highest syscall number for ARM. Or do we just ignore the high "special" ARM syscalls and treat them (from the tracing point of view) as non-syscalls, avoiding the allocation of something around 1.2MB for the syscall bitmap. I really don't know, I don't use any of this tracing stuff, so it isn't something I care about. Maybe those who do use the facility should have an input here?
On Thu, 30 Oct 2014 07:10:39 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > On Thu, 30 Oct 2014 10:18:08 +0000 > Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > > > On Thu, Oct 30, 2014 at 01:26:06AM -0700, Christoph Hellwig wrote: > > > On Wed, Oct 29, 2014 at 11:06:58PM +0100, Rabin Vincent wrote: > > > > ARM has some private syscalls (for example, set_tls(2)) which lie > > > > outside the range of NR_syscalls. If any of these are called while > > > > syscall tracing is being performed, out-of-bounds array access will > > > > occur in the ftrace and perf sys_{enter,exit} handlers. > > > > > > While this patch looks like good caution, having syscalls outside of > > > NR_syscalls seems like a receipe for a disaster. Can you try to fix > > > that issue as ell, please? > > > > No. We've had them since the inception of Linux on ARM. They predate > > this tracing crap by more than a decade. We're not changing them > > because that would be a massive user API breakage. > > > > Since syscall tracing is only broken on ARM, then the fix needs to be > ARM specific, and not remove the check for all other architectures that > have a sane NR_syscalls variable. Bah, I misread the patch. I shouldn't read patches before having my morning coffee :-/ I read it backwards. I thought it was removing the checks for NR_syscalls, and not adding them. I'm fine with the patch as is, and will take it. But I agree that the syscall tracing code needs a rewrite to handle these types of issues. It has problems with compat calls as well, which we simply ignore. Sorry for the confusion. -- Steve
On Thu, 30 Oct 2014 07:52:23 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > Bah, I misread the patch. I shouldn't read patches before having my > morning coffee :-/ > That's what I get by reading email before doing my morning Physical Therapy treatment. I'm off to do my PT exercises and then have breakfast. I'll pull the patch in today, test it out, and then push it off for 3.18-rcX. Thanks, -- Steve
* Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Thu, Oct 30, 2014 at 01:26:06AM -0700, Christoph Hellwig wrote: > > On Wed, Oct 29, 2014 at 11:06:58PM +0100, Rabin Vincent wrote: > > > ARM has some private syscalls (for example, set_tls(2)) which lie > > > outside the range of NR_syscalls. If any of these are called while > > > syscall tracing is being performed, out-of-bounds array access will > > > occur in the ftrace and perf sys_{enter,exit} handlers. > > > > While this patch looks like good caution, having syscalls > > outside of NR_syscalls seems like a receipe for a disaster. > > Can you try to fix that issue as ell, please? > > No. We've had them since the inception of Linux on ARM. They > predate this tracing crap by more than a decade. We're not > changing them because that would be a massive user API > breakage. So if you go around calling other people's code 'crap' so easily: if we should call something 'crap' in this area it's the decision of ARM to deviate from all other architectures arbitrarily and to introduce 'private' syscalls outside NR_syscalls... There's a reason why we have NR_syscalls with relatively tighly packed syscall numbers and there's a reason why we don't do 'private' syscalls on other architectures. I'd probably have NAK-ed ARM's 'private syscalls' had I known about it when this was introduced for ARM. IMO you should be ashamed for it instead of blaming others for the complication ... But yes, it's probably an ABI, albeit a crappy one, which is now hurting the introduction of a generic kernel facility in the ARM space. Thanks, Ingo
On 10/30/2014 06:35 AM, Russell King - ARM Linux wrote: > On Thu, Oct 30, 2014 at 07:30:28AM -0400, Steven Rostedt wrote: >> On Thu, 30 Oct 2014 11:14:41 +0000 >> Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: >> >> >>> We have always had syscall number range of 0x900000 or so. The tracing >>> design does not expect that. Therefore, the tracing design did not take >>> account of ARM when it was created. Therefore, it's up to the tracing >>> people to decide how to properly fit their ill-designed subsystem into >>> one of the popular and well-established kernel architectures - or at >>> least suggest a way to work around this issue. >>> >> >> >> Fine, lets define a MAX_SYSCALL_NR that is by default NR_syscalls, but >> an architecture can override it. >> >> In trace_syscalls.c, where the checks are done, have this: >> >> #ifndef MAX_SYSCALL_NR >> # define MAX_SYSCALL_NR NR_syscalls >> #endif >> >> change all the checks to test against MAX_SYSCALL_NR instead of >> NR_syscalls. >> >> Then in arch/arm/include/asm/syscall.h have: >> >> #define MAX_SYSCALL_NR 0xa00000 >> >> or whatever would be the highest syscall number for ARM. > > Or do we just ignore the high "special" ARM syscalls and treat them (from > the tracing point of view) as non-syscalls, avoiding the allocation of > something around 1.2MB for the syscall bitmap. I really don't know, I > don't use any of this tracing stuff, so it isn't something I care about. > > Maybe those who do use the facility should have an input here? I checked strace and it knows about ARM's high syscalls. I wouldn't want to go from casually using strace to digging deeper with ftrace only to get the impression that syscalls are disappearing.
On Mon, 3 Nov 2014 11:08:03 -0600 Nathan Lynch <Nathan_Lynch@mentor.com> wrote: > > Or do we just ignore the high "special" ARM syscalls and treat them (from > > the tracing point of view) as non-syscalls, avoiding the allocation of > > something around 1.2MB for the syscall bitmap. I really don't know, I > > don't use any of this tracing stuff, so it isn't something I care about. > > > > Maybe those who do use the facility should have an input here? > > I checked strace and it knows about ARM's high syscalls. I wouldn't > want to go from casually using strace to digging deeper with ftrace only > to get the impression that syscalls are disappearing. I agree, but currently the syscall tracing does not support different mappings, and if there's a group of calls outside of NR_syscalls range, they will currently be ignored. The fix may be to restructure how syscall tracing works. But for now, the only answer we have is to just ignore them. x86 has the same issue with compat calls (i386 syscalls on x86_64 kernels). -- Steve
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index 4dc8b79..29228c4 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -313,7 +313,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id) int size; syscall_nr = trace_get_syscall_nr(current, regs); - if (syscall_nr < 0) + if (syscall_nr < 0 || syscall_nr >= NR_syscalls) return; /* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */ @@ -360,7 +360,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret) int syscall_nr; syscall_nr = trace_get_syscall_nr(current, regs); - if (syscall_nr < 0) + if (syscall_nr < 0 || syscall_nr >= NR_syscalls) return; /* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE()) */ @@ -567,7 +567,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id) int size; syscall_nr = trace_get_syscall_nr(current, regs); - if (syscall_nr < 0) + if (syscall_nr < 0 || syscall_nr >= NR_syscalls) return; if (!test_bit(syscall_nr, enabled_perf_enter_syscalls)) return; @@ -641,7 +641,7 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret) int size; syscall_nr = trace_get_syscall_nr(current, regs); - if (syscall_nr < 0) + if (syscall_nr < 0 || syscall_nr >= NR_syscalls) return; if (!test_bit(syscall_nr, enabled_perf_exit_syscalls)) return;
ARM has some private syscalls (for example, set_tls(2)) which lie outside the range of NR_syscalls. If any of these are called while syscall tracing is being performed, out-of-bounds array access will occur in the ftrace and perf sys_{enter,exit} handlers. # trace-cmd record -e raw_syscalls:* true && trace-cmd report ... true-653 [000] 384.675777: sys_enter: NR 192 (0, 1000, 3, 4000022, ffffffff, 0) true-653 [000] 384.675812: sys_exit: NR 192 = 1995915264 true-653 [000] 384.675971: sys_enter: NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1) true-653 [000] 384.675988: sys_exit: NR 983045 = 0 ... # trace-cmd record -e syscalls:* true [ 17.289329] Unable to handle kernel paging request at virtual address aaaaaace [ 17.289590] pgd = 9e71c000 [ 17.289696] [aaaaaace] *pgd=00000000 [ 17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 17.290169] Modules linked in: [ 17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21 [ 17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000 [ 17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8 [ 17.290866] LR is at syscall_trace_enter+0x124/0x184 Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers. Signed-off-by: Rabin Vincent <rabin@rab.in> --- kernel/trace/trace_syscalls.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)