diff mbox

parisc: Fix syscall restarts (v2)

Message ID 20151221091933.GA22935@p100.box (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Helge Deller Dec. 21, 2015, 9:19 a.m. UTC
This is version 2 of the patch:

On parisc syscalls which are interrupted by signals sometimes failed to
restart and instead returned -ENOSYS which in the worst case lead to
userspace crashes.
A similiar problem existed on MIPS and was fixed by commit e967ef02
("MIPS: Fix restart of indirect syscalls").

On parisc the current syscall restart code assumes that all syscall
callers load the syscall number in the delay slot of the ble
instruction. That's how it is e.g. done in the unistd.h header file:
	ble 0x100(%sr2, %r0)
	ldi #syscall_nr, %r20
Because of that assumption the current code never restored %r20 before
returning to userspace.

This assumption is at least not true for code which uses the glibc
syscall() function, which instead uses this syntax:
	ble 0x100(%sr2, %r0)
	copy regX, %r20
where regX depend on how the compiler optimizes the code and register
usage.

This patch fixes this problem by adding code to analyze how the syscall
number is loaded in the delay branch and - if needed - copy the syscall
number to regX prior returning to userspace for the syscall restart.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

John David Anglin Dec. 21, 2015, 1:11 p.m. UTC | #1
On 2015-12-21, at 4:19 AM, Helge Deller wrote:

> On parisc the current syscall restart code assumes that all syscall
> callers load the syscall number in the delay slot of the ble
> instruction. That's how it is e.g. done in the unistd.h header file:
> 	ble 0x100(%sr2, %r0)
> 	ldi #syscall_nr, %r20
> Because of that assumption the current code never restored %r20 before
> returning to userspace.

Is %r20 restored for nop case?  The comments in the patch suggest that it is
not necessary to restore it.

I'm  thinking we need to review syscall clobbered register list for glibc.

Dave
--
John David Anglin	dave.anglin@bell.net



--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Dec. 21, 2015, 8:27 p.m. UTC | #2
----- On Dec 21, 2015, at 4:19 AM, Helge Deller deller@gmx.de wrote:

> This is version 2 of the patch:
> 
> On parisc syscalls which are interrupted by signals sometimes failed to
> restart and instead returned -ENOSYS which in the worst case lead to
> userspace crashes.
> A similiar problem existed on MIPS and was fixed by commit e967ef02
> ("MIPS: Fix restart of indirect syscalls").
> 
> On parisc the current syscall restart code assumes that all syscall
> callers load the syscall number in the delay slot of the ble
> instruction. That's how it is e.g. done in the unistd.h header file:
>	ble 0x100(%sr2, %r0)
>	ldi #syscall_nr, %r20
> Because of that assumption the current code never restored %r20 before
> returning to userspace.
> 
> This assumption is at least not true for code which uses the glibc
> syscall() function, which instead uses this syntax:
>	ble 0x100(%sr2, %r0)
>	copy regX, %r20
> where regX depend on how the compiler optimizes the code and register
> usage.
> 
> This patch fixes this problem by adding code to analyze how the syscall
> number is loaded in the delay branch and - if needed - copy the syscall
> number to regX prior returning to userspace for the syscall restart.
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> Cc: stable@vger.kernel.org
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> 
> diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
> index dc1ea79..2264f68 100644
> --- a/arch/parisc/kernel/signal.c
> +++ b/arch/parisc/kernel/signal.c
> @@ -435,6 +435,55 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs,
> int in_syscall)
> 		regs->gr[28]);
> }
> 
> +/*
> + * Check how the syscall number gets loaded into %r20 within
> + * the delay branch in userspace and adjust as needed.
> + */
> +
> +static void check_syscallno_in_delay_branch(struct pt_regs *regs)
> +{
> +	u32 opcode, source_reg;
> +	u32 __user *uaddr;
> +	int err;
> +
> +	/* Usually we don't have to restore %r20 (the system call number)
> +	 * because it gets loaded in the delay slot of the branch external
> +	 * instruction via the ldi instruction.
> +	 * In some cases a register-to-register copy instruction might have
> +	 * been used instead, in which case we need to copy the syscall
> +	 * number into the source register before returning to userspace.
> +	 */
> +
> +	/* A syscall is just a branch, so all we have to do is fiddle the
> +	 * return pointer so that the ble instruction gets executed again.
> +	 */
> +	regs->gr[31] -= 8; /* delayed branching */
> +
> +	/* Get assembler opcode of code in delay branch */
> +	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);

Is it valid to have unaligned instructions ? Does the architecture
allow it, or it's a fumble and we should pr_warn ?

> +	err = get_user(opcode, uaddr);
> +	if (err)

Should we add a pr_warn here ?

> +		return;
> +
> +	/* Check if delay branch uses "ldi int,%r20" */
> +	if ((opcode & 0xffff0000) == 0x34140000)
> +		return;	/* everything ok, just return */
> +
> +	/* Check if delay branch uses "nop" */
> +	if (opcode == INSN_NOP)
> +		return;

When we find a NOP in the delay slot, how can we be sure %r20
still holds the syscall value when we re-play the branch
instruction ? Can it be overwritten during the syscall,
either from start of syscall to here, or from here to
return to userspace ?

> +
> +	/* Check if delay branch uses "copy %rX,%r20" */
> +	if ((opcode & 0xffe0ffff) == 0x08000254) {
> +		source_reg = (opcode >> 16) & 31;
> +		regs->gr[source_reg] = regs->gr[20];

Similar question here, how can we be sure regs->gr[20]
still has the system call number at this point (not
overwritten from start of syscall to here) ?

Thanks,

Mathieu

> +		return;
> +	}
> +
> +	pr_warn("syscall restart: %s (pid %d): unexpected opcode 0x%08x\n",
> +		current->comm, task_pid_nr(current), opcode);
> +}
> +
> static inline void
> syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
> {
> @@ -457,10 +506,7 @@ syscall_restart(struct pt_regs *regs, struct k_sigaction
> *ka)
> 		}
> 		/* fallthrough */
> 	case -ERESTARTNOINTR:
> -		/* A syscall is just a branch, so all
> -		 * we have to do is fiddle the return pointer.
> -		 */
> -		regs->gr[31] -= 8; /* delayed branching */
> +		check_syscallno_in_delay_branch(regs);
> 		break;
> 	}
> }
> @@ -510,15 +556,9 @@ insert_restart_trampoline(struct pt_regs *regs)
> 	}
> 	case -ERESTARTNOHAND:
> 	case -ERESTARTSYS:
> -	case -ERESTARTNOINTR: {
> -		/* Hooray for delayed branching.  We don't
> -		 * have to restore %r20 (the system call
> -		 * number) because it gets loaded in the delay
> -		 * slot of the branch external instruction.
> -		 */
> -		regs->gr[31] -= 8;
> +	case -ERESTARTNOINTR:
> +		check_syscallno_in_delay_branch(regs);
> 		return;
> -	}
> 	default:
> 		break;
>  	}
Helge Deller Dec. 21, 2015, 8:54 p.m. UTC | #3
On 21.12.2015 21:27, Mathieu Desnoyers wrote:
> ----- On Dec 21, 2015, at 4:19 AM, Helge Deller deller@gmx.de wrote:
> 
>> This is version 2 of the patch:
>>
>> On parisc syscalls which are interrupted by signals sometimes failed to
>> restart and instead returned -ENOSYS which in the worst case lead to
>> userspace crashes.
>> A similiar problem existed on MIPS and was fixed by commit e967ef02
>> ("MIPS: Fix restart of indirect syscalls").
>>
>> On parisc the current syscall restart code assumes that all syscall
>> callers load the syscall number in the delay slot of the ble
>> instruction. That's how it is e.g. done in the unistd.h header file:
>> 	ble 0x100(%sr2, %r0)
>> 	ldi #syscall_nr, %r20
>> Because of that assumption the current code never restored %r20 before
>> returning to userspace.
>>
>> This assumption is at least not true for code which uses the glibc
>> syscall() function, which instead uses this syntax:
>> 	ble 0x100(%sr2, %r0)
>> 	copy regX, %r20
>> where regX depend on how the compiler optimizes the code and register
>> usage.
>>
>> This patch fixes this problem by adding code to analyze how the syscall
>> number is loaded in the delay branch and - if needed - copy the syscall
>> number to regX prior returning to userspace for the syscall restart.
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> Cc: stable@vger.kernel.org
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>
>> diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
>> index dc1ea79..2264f68 100644
>> --- a/arch/parisc/kernel/signal.c
>> +++ b/arch/parisc/kernel/signal.c
>> @@ -435,6 +435,55 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs,
>> int in_syscall)
>> 		regs->gr[28]);
>> }
>>
>> +/*
>> + * Check how the syscall number gets loaded into %r20 within
>> + * the delay branch in userspace and adjust as needed.
>> + */
>> +
>> +static void check_syscallno_in_delay_branch(struct pt_regs *regs)
>> +{
>> +	u32 opcode, source_reg;
>> +	u32 __user *uaddr;
>> +	int err;
>> +
>> +	/* Usually we don't have to restore %r20 (the system call number)
>> +	 * because it gets loaded in the delay slot of the branch external
>> +	 * instruction via the ldi instruction.
>> +	 * In some cases a register-to-register copy instruction might have
>> +	 * been used instead, in which case we need to copy the syscall
>> +	 * number into the source register before returning to userspace.
>> +	 */
>> +
>> +	/* A syscall is just a branch, so all we have to do is fiddle the
>> +	 * return pointer so that the ble instruction gets executed again.
>> +	 */
>> +	regs->gr[31] -= 8; /* delayed branching */
>> +
>> +	/* Get assembler opcode of code in delay branch */
>> +	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);
> 
> Is it valid to have unaligned instructions ? Does the architecture
> allow it, or it's a fumble and we should pr_warn ?

How can it be unaligned? It's about u32...
And, no, unaligned instructions are not allowed (I think that at least).
 
>> +	err = get_user(opcode, uaddr);
>> +	if (err)
> 
> Should we add a pr_warn here ?

No. There is no gain to have a warning here.
 
>> +		return;
>> +
>> +	/* Check if delay branch uses "ldi int,%r20" */
>> +	if ((opcode & 0xffff0000) == 0x34140000)
>> +		return;	/* everything ok, just return */
>> +
>> +	/* Check if delay branch uses "nop" */
>> +	if (opcode == INSN_NOP)
>> +		return;
> 
> When we find a NOP in the delay slot, how can we be sure %r20
> still holds the syscall value when we re-play the branch
> instruction ? 

I looked at the code and even tested it (with your testcase actually).

> Can it be overwritten during the syscall,
> either from start of syscall to here, or from here to
> return to userspace ?

No.
 
>> +
>> +	/* Check if delay branch uses "copy %rX,%r20" */
>> +	if ((opcode & 0xffe0ffff) == 0x08000254) {
>> +		source_reg = (opcode >> 16) & 31;
>> +		regs->gr[source_reg] = regs->gr[20];
> 
> Similar question here, how can we be sure regs->gr[20]
> still has the system call number at this point (not
> overwritten from start of syscall to here) ?

Those registers are saved at entry of syscall and
restored at exit (with exception of a few registers
e.g. like r28 which is the return value of the syscall). 

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Dec. 24, 2015, 4:07 p.m. UTC | #4
----- On Dec 21, 2015, at 3:54 PM, Helge Deller deller@gmx.de wrote:

> On 21.12.2015 21:27, Mathieu Desnoyers wrote:
>> ----- On Dec 21, 2015, at 4:19 AM, Helge Deller deller@gmx.de wrote:
>> 
>>> This is version 2 of the patch:
>>>
>>> On parisc syscalls which are interrupted by signals sometimes failed to
>>> restart and instead returned -ENOSYS which in the worst case lead to
>>> userspace crashes.
>>> A similiar problem existed on MIPS and was fixed by commit e967ef02
>>> ("MIPS: Fix restart of indirect syscalls").
>>>
>>> On parisc the current syscall restart code assumes that all syscall
>>> callers load the syscall number in the delay slot of the ble
>>> instruction. That's how it is e.g. done in the unistd.h header file:
>>> 	ble 0x100(%sr2, %r0)
>>> 	ldi #syscall_nr, %r20
>>> Because of that assumption the current code never restored %r20 before
>>> returning to userspace.
>>>
>>> This assumption is at least not true for code which uses the glibc
>>> syscall() function, which instead uses this syntax:
>>> 	ble 0x100(%sr2, %r0)
>>> 	copy regX, %r20
>>> where regX depend on how the compiler optimizes the code and register
>>> usage.
>>>
>>> This patch fixes this problem by adding code to analyze how the syscall
>>> number is loaded in the delay branch and - if needed - copy the syscall
>>> number to regX prior returning to userspace for the syscall restart.
>>>
>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>> Cc: stable@vger.kernel.org
>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>
>>> diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
>>> index dc1ea79..2264f68 100644
>>> --- a/arch/parisc/kernel/signal.c
>>> +++ b/arch/parisc/kernel/signal.c
>>> @@ -435,6 +435,55 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs,
>>> int in_syscall)
>>> 		regs->gr[28]);
>>> }
>>>
>>> +/*
>>> + * Check how the syscall number gets loaded into %r20 within
>>> + * the delay branch in userspace and adjust as needed.
>>> + */
>>> +
>>> +static void check_syscallno_in_delay_branch(struct pt_regs *regs)
>>> +{
>>> +	u32 opcode, source_reg;
>>> +	u32 __user *uaddr;
>>> +	int err;
>>> +
>>> +	/* Usually we don't have to restore %r20 (the system call number)
>>> +	 * because it gets loaded in the delay slot of the branch external
>>> +	 * instruction via the ldi instruction.
>>> +	 * In some cases a register-to-register copy instruction might have
>>> +	 * been used instead, in which case we need to copy the syscall
>>> +	 * number into the source register before returning to userspace.
>>> +	 */
>>> +
>>> +	/* A syscall is just a branch, so all we have to do is fiddle the
>>> +	 * return pointer so that the ble instruction gets executed again.
>>> +	 */
>>> +	regs->gr[31] -= 8; /* delayed branching */
>>> +
>>> +	/* Get assembler opcode of code in delay branch */
>>> +	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);
>> 
>> Is it valid to have unaligned instructions ? Does the architecture
>> allow it, or it's a fumble and we should pr_warn ?
> 
> How can it be unaligned? It's about u32...

That would be an instruction that is volountarily offset
from 1, 2, 3 bytes from 4-bytes multiples by the application.
The only situation where I have seen this is in cases where
applications are trying to play games with the debugger or
disassembler and hide what they are doing: they can offset
the start of a function like this, and therefore all the
instructions within that function.

> And, no, unaligned instructions are not allowed (I think that at least).

Might be interesting to try it out though. I'm not saying it's
a valid use-case for an application, but it would be at least good
to known whether this is an input we can expect.

> 
>>> +	err = get_user(opcode, uaddr);
>>> +	if (err)
>> 
>> Should we add a pr_warn here ?
> 
> No. There is no gain to have a warning here.

Allright.

> 
>>> +		return;
>>> +
>>> +	/* Check if delay branch uses "ldi int,%r20" */
>>> +	if ((opcode & 0xffff0000) == 0x34140000)
>>> +		return;	/* everything ok, just return */
>>> +
>>> +	/* Check if delay branch uses "nop" */
>>> +	if (opcode == INSN_NOP)
>>> +		return;
>> 
>> When we find a NOP in the delay slot, how can we be sure %r20
>> still holds the syscall value when we re-play the branch
>> instruction ?
> 
> I looked at the code and even tested it (with your testcase actually).
> 
>> Can it be overwritten during the syscall,
>> either from start of syscall to here, or from here to
>> return to userspace ?
> 
> No.
> 
>>> +
>>> +	/* Check if delay branch uses "copy %rX,%r20" */
>>> +	if ((opcode & 0xffe0ffff) == 0x08000254) {
>>> +		source_reg = (opcode >> 16) & 31;
>>> +		regs->gr[source_reg] = regs->gr[20];
>> 
>> Similar question here, how can we be sure regs->gr[20]
>> still has the system call number at this point (not
>> overwritten from start of syscall to here) ?
> 
> Those registers are saved at entry of syscall and
> restored at exit (with exception of a few registers
> e.g. like r28 which is the return value of the syscall).

Can r28 be used as a source_reg ? If so, what happens then ?

Thanks,

Mathieu

> 
> Helge
John David Anglin Dec. 24, 2015, 4:51 p.m. UTC | #5
On 2015-12-24 11:07 AM, Mathieu Desnoyers wrote:
>>> Is it valid to have unaligned instructions ? Does the architecture
>>> >>allow it, or it's a fumble and we should pr_warn ?
>> >
>> >How can it be unaligned? It's about u32...
> That would be an instruction that is volountarily offset
> from 1, 2, 3 bytes from 4-bytes multiples by the application.
> The only situation where I have seen this is in cases where
> applications are trying to play games with the debugger or
> disassembler and hide what they are doing: they can offset
> the start of a function like this, and therefore all the
> instructions within that function.
>
This is not possible on PA-RISC.  Indeed, user instruction addresses are 
always
offset by three.  There is no way to branch to an instruction that is 
offset.

The least significant two bits of an instruction address contain a 
priority level.
User code on linux and hpux executes at level 3.  The only way a user 
can change
privilege level is with a "gate" instruction (a special branch). Whether 
this is
permitted or not depends on page table permissions that the user can't 
change.
In practice, a level change is only allowed on the gateway page.

Dave
diff mbox

Patch

diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
index dc1ea79..2264f68 100644
--- a/arch/parisc/kernel/signal.c
+++ b/arch/parisc/kernel/signal.c
@@ -435,6 +435,55 @@  handle_signal(struct ksignal *ksig, struct pt_regs *regs, int in_syscall)
 		regs->gr[28]);
 }
 
+/*
+ * Check how the syscall number gets loaded into %r20 within
+ * the delay branch in userspace and adjust as needed.
+ */
+
+static void check_syscallno_in_delay_branch(struct pt_regs *regs)
+{
+	u32 opcode, source_reg;
+	u32 __user *uaddr;
+	int err;
+
+	/* Usually we don't have to restore %r20 (the system call number)
+	 * because it gets loaded in the delay slot of the branch external
+	 * instruction via the ldi instruction.
+	 * In some cases a register-to-register copy instruction might have
+	 * been used instead, in which case we need to copy the syscall
+	 * number into the source register before returning to userspace.
+	 */
+
+	/* A syscall is just a branch, so all we have to do is fiddle the
+	 * return pointer so that the ble instruction gets executed again.
+	 */
+	regs->gr[31] -= 8; /* delayed branching */
+
+	/* Get assembler opcode of code in delay branch */
+	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);
+	err = get_user(opcode, uaddr);
+	if (err)
+		return;
+
+	/* Check if delay branch uses "ldi int,%r20" */
+	if ((opcode & 0xffff0000) == 0x34140000)
+		return;	/* everything ok, just return */
+
+	/* Check if delay branch uses "nop" */
+	if (opcode == INSN_NOP)
+		return;
+
+	/* Check if delay branch uses "copy %rX,%r20" */
+	if ((opcode & 0xffe0ffff) == 0x08000254) {
+		source_reg = (opcode >> 16) & 31;
+		regs->gr[source_reg] = regs->gr[20];
+		return;
+	}
+
+	pr_warn("syscall restart: %s (pid %d): unexpected opcode 0x%08x\n",
+		current->comm, task_pid_nr(current), opcode);
+}
+
 static inline void
 syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
 {
@@ -457,10 +506,7 @@  syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
 		}
 		/* fallthrough */
 	case -ERESTARTNOINTR:
-		/* A syscall is just a branch, so all
-		 * we have to do is fiddle the return pointer.
-		 */
-		regs->gr[31] -= 8; /* delayed branching */
+		check_syscallno_in_delay_branch(regs);
 		break;
 	}
 }
@@ -510,15 +556,9 @@  insert_restart_trampoline(struct pt_regs *regs)
 	}
 	case -ERESTARTNOHAND:
 	case -ERESTARTSYS:
-	case -ERESTARTNOINTR: {
-		/* Hooray for delayed branching.  We don't
-		 * have to restore %r20 (the system call
-		 * number) because it gets loaded in the delay
-		 * slot of the branch external instruction.
-		 */
-		regs->gr[31] -= 8;
+	case -ERESTARTNOINTR:
+		check_syscallno_in_delay_branch(regs);
 		return;
-	}
 	default:
 		break;
 	}