diff mbox

Kernel oops on 32-bit arm with syscall with invalid sysno

Message ID 20150528214256.GF2067@n2100.arm.linux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Russell King - ARM Linux May 28, 2015, 9:42 p.m. UTC
On Thu, May 28, 2015 at 04:41:14PM -0400, William Cohen wrote:
> When reviewing testsuite failures for systemtap I found that the
> 32-bit arm kernels (both 4.1.0-rc5 and 3.19.8) were not handling the
> libc syscall with invalid sysno in the manner described by
> http://www.gnu.org/software/libc/manual/html_node/System-Calls.html.
> Rather than returning -1 and setting errno to ENOSYS the invalid
> syscall gives segfault and a kernel oops.

Looking at this, it seems that we're triggering this:

        BUG_ON(context->in_syscall || context->name_count);

which seems to imply that we've called audit_syscall_entry() twice
without a call to audit_syscall_exit().  That is something we can
fix - and something which only happens with the syscall of "-1"
(which is our "syscall was cancelled" value.)

If I have diagnosed this correctly, the following patch should fix
it.  However, as you're asking for the "cancelled" syscall value,
what you'll get returned from the syscall is the r0 value preserved
as the result of the syscall.  In other words, you won't get -1 and
errno set to ENOSYS.  Not much can be done about that without breaking
syscall cancelling, so expect your test case to continue failing.

Comments

William Cohen May 29, 2015, 3:50 p.m. UTC | #1
On 05/28/2015 05:42 PM, Russell King - ARM Linux wrote:
> On Thu, May 28, 2015 at 04:41:14PM -0400, William Cohen wrote:
>> When reviewing testsuite failures for systemtap I found that the
>> 32-bit arm kernels (both 4.1.0-rc5 and 3.19.8) were not handling the
>> libc syscall with invalid sysno in the manner described by
>> http://www.gnu.org/software/libc/manual/html_node/System-Calls.html.
>> Rather than returning -1 and setting errno to ENOSYS the invalid
>> syscall gives segfault and a kernel oops.
> 
> Looking at this, it seems that we're triggering this:
> 
>         BUG_ON(context->in_syscall || context->name_count);
> 
> which seems to imply that we've called audit_syscall_entry() twice
> without a call to audit_syscall_exit().  That is something we can
> fix - and something which only happens with the syscall of "-1"
> (which is our "syscall was cancelled" value.)

Hi Russell,

The patch below does eliminate the kernel oops for -1, but it breaks things for other invalid/unimplemented syscalls.  For the attached test, invalid_syscall_plus.c:


$ gcc -g -o invalid_syscall_plus invalid_syscall_plus.c
$ ./invalid_syscall_plus 
Illegal instruction (core dumped)

Previously this would print out the expected messages.

-Will
> 
> If I have diagnosed this correctly, the following patch should fix
> it.  However, as you're asking for the "cancelled" syscall value,
> what you'll get returned from the syscall is the r0 value preserved
> as the result of the syscall.  In other words, you won't get -1 and
> errno set to ENOSYS.  Not much can be done about that without breaking
> syscall cancelling, so expect your test case to continue failing.
> 
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index f8ccc21fa032..2c40c1214a72 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -241,11 +241,11 @@ __sys_trace:
>  	cmp	scno, #-1			@ skip the syscall?
>  	bne	2b
>  	add	sp, sp, #S_OFF			@ restore stack
> -	b	ret_slow_syscall
> +	b	3f
>  
>  __sys_trace_return:
>  	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
> -	mov	r0, sp
> +3:	mov	r0, sp
>  	bl	syscall_trace_exit
>  	b	ret_slow_syscall
>  
> 
>
Russell King - ARM Linux May 29, 2015, 4:10 p.m. UTC | #2
On Fri, May 29, 2015 at 11:50:15AM -0400, William Cohen wrote:
> On 05/28/2015 05:42 PM, Russell King - ARM Linux wrote:
> > On Thu, May 28, 2015 at 04:41:14PM -0400, William Cohen wrote:
> >> When reviewing testsuite failures for systemtap I found that the
> >> 32-bit arm kernels (both 4.1.0-rc5 and 3.19.8) were not handling the
> >> libc syscall with invalid sysno in the manner described by
> >> http://www.gnu.org/software/libc/manual/html_node/System-Calls.html.
> >> Rather than returning -1 and setting errno to ENOSYS the invalid
> >> syscall gives segfault and a kernel oops.
> > 
> > Looking at this, it seems that we're triggering this:
> > 
> >         BUG_ON(context->in_syscall || context->name_count);
> > 
> > which seems to imply that we've called audit_syscall_entry() twice
> > without a call to audit_syscall_exit().  That is something we can
> > fix - and something which only happens with the syscall of "-1"
> > (which is our "syscall was cancelled" value.)
> 
> Hi Russell,
> 
> The patch below does eliminate the kernel oops for -1, but it breaks things for other invalid/unimplemented syscalls.  For the attached test, invalid_syscall_plus.c:
> 
> 
> $ gcc -g -o invalid_syscall_plus invalid_syscall_plus.c
> $ ./invalid_syscall_plus 
> Illegal instruction (core dumped)
> 
> Previously this would print out the expected messages.

The patch /doesn't/ change that behaviour at all.

> > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> > index f8ccc21fa032..2c40c1214a72 100644
> > --- a/arch/arm/kernel/entry-common.S
> > +++ b/arch/arm/kernel/entry-common.S
> > @@ -241,11 +241,11 @@ __sys_trace:
> >  	cmp	scno, #-1			@ skip the syscall?

If the system call number was not -1 (in your case it isn't, it's 0xdeadbeef)

> >  	bne	2b

Branch to the "2" label backwards, otherwise execute this code:

> >  	add	sp, sp, #S_OFF			@ restore stack
> > -	b	ret_slow_syscall
> > +	b	3f
> >  
> >  __sys_trace_return:
> >  	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
> > -	mov	r0, sp
> > +3:	mov	r0, sp
> >  	bl	syscall_trace_exit
> >  	b	ret_slow_syscall

The code at the referenced local "2" is:

2:      cmp     scno, #(__ARM_NR_BASE - __NR_SYSCALL_BASE)
        eor     r0, scno, #__NR_SYSCALL_BASE    @ put OS number back
        bcs     arm_syscall
        mov     why, #0                         @ no longer a real syscall
        b       sys_ni_syscall                  @ not private func

__NR_SYSCALL_BASE will be zero for your kernel.

What this says is that if the system call number is greater than
__ARM_NR_BASE, then branch to arm_syscall(), otherwise call
sys_ni_syscall().

sys_ni_syscall() will return the -1 / ENOSYS you're expecting.

However, __ARM_NR_BASE is:

#define __ARM_NR_BASE                   (__NR_SYSCALL_BASE+0x0f0000)

which, I fully described in my previous email.

arm_syscall() intentionally gives a SIGILL for cases it doesn't handle.

Your case you are now reporting is behaviour that it's always had going
back more than 15 years, and is most definitely a WONTFIX.  Sorry.
William Cohen May 29, 2015, 6:43 p.m. UTC | #3
On 05/29/2015 12:10 PM, Russell King - ARM Linux wrote:
> On Fri, May 29, 2015 at 11:50:15AM -0400, William Cohen wrote:
>> On 05/28/2015 05:42 PM, Russell King - ARM Linux wrote:
>>> On Thu, May 28, 2015 at 04:41:14PM -0400, William Cohen wrote:
>>>> When reviewing testsuite failures for systemtap I found that the
>>>> 32-bit arm kernels (both 4.1.0-rc5 and 3.19.8) were not handling the
>>>> libc syscall with invalid sysno in the manner described by
>>>> http://www.gnu.org/software/libc/manual/html_node/System-Calls.html.
>>>> Rather than returning -1 and setting errno to ENOSYS the invalid
>>>> syscall gives segfault and a kernel oops.
>>>
>>> Looking at this, it seems that we're triggering this:
>>>
>>>         BUG_ON(context->in_syscall || context->name_count);
>>>
>>> which seems to imply that we've called audit_syscall_entry() twice
>>> without a call to audit_syscall_exit().  That is something we can
>>> fix - and something which only happens with the syscall of "-1"
>>> (which is our "syscall was cancelled" value.)
>>
>> Hi Russell,
>>
>> The patch below does eliminate the kernel oops for -1, but it breaks things for other invalid/unimplemented syscalls.  For the attached test, invalid_syscall_plus.c:
>>
>>
>> $ gcc -g -o invalid_syscall_plus invalid_syscall_plus.c
>> $ ./invalid_syscall_plus 
>> Illegal instruction (core dumped)
>>
>> Previously this would print out the expected messages.
> 
> The patch /doesn't/ change that behaviour at all.

You are correct. I was looking at previous results on the wrong machine/architecture.  Sorry.


> 
>>> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
>>> index f8ccc21fa032..2c40c1214a72 100644
>>> --- a/arch/arm/kernel/entry-common.S
>>> +++ b/arch/arm/kernel/entry-common.S
>>> @@ -241,11 +241,11 @@ __sys_trace:
>>>  	cmp	scno, #-1			@ skip the syscall?
> 
> If the system call number was not -1 (in your case it isn't, it's 0xdeadbeef)
> 
>>>  	bne	2b
> 
> Branch to the "2" label backwards, otherwise execute this code:
> 
>>>  	add	sp, sp, #S_OFF			@ restore stack
>>> -	b	ret_slow_syscall
>>> +	b	3f
>>>  
>>>  __sys_trace_return:
>>>  	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
>>> -	mov	r0, sp
>>> +3:	mov	r0, sp
>>>  	bl	syscall_trace_exit
>>>  	b	ret_slow_syscall
> 
> The code at the referenced local "2" is:
> 
> 2:      cmp     scno, #(__ARM_NR_BASE - __NR_SYSCALL_BASE)
>         eor     r0, scno, #__NR_SYSCALL_BASE    @ put OS number back
>         bcs     arm_syscall
>         mov     why, #0                         @ no longer a real syscall
>         b       sys_ni_syscall                  @ not private func
> 
> __NR_SYSCALL_BASE will be zero for your kernel.
> 
> What this says is that if the system call number is greater than
> __ARM_NR_BASE, then branch to arm_syscall(), otherwise call
> sys_ni_syscall().
>
> sys_ni_syscall() will return the -1 / ENOSYS you're expecting.
> 
> However, __ARM_NR_BASE is:
> 
> #define __ARM_NR_BASE                   (__NR_SYSCALL_BASE+0x0f0000)
> 
> which, I fully described in my previous email.
> 
> arm_syscall() intentionally gives a SIGILL for cases it doesn't handle.
> 
> Your case you are now reporting is behaviour that it's always had going
> back more than 15 years, and is most definitely a WONTFIX.  Sorry.
> 

0xdeadbeef is a negative number, so arm_syscall will be called rather than sys_ni_syscall.  What it looks like is that the systemtap testsuite should be using some large (but not too large) positive number such as 0xffff to get the desired unimplemented syscall

-Will
diff mbox

Patch

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index f8ccc21fa032..2c40c1214a72 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -241,11 +241,11 @@  __sys_trace:
 	cmp	scno, #-1			@ skip the syscall?
 	bne	2b
 	add	sp, sp, #S_OFF			@ restore stack
-	b	ret_slow_syscall
+	b	3f
 
 __sys_trace_return:
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
-	mov	r0, sp
+3:	mov	r0, sp
 	bl	syscall_trace_exit
 	b	ret_slow_syscall