Message ID | 20110825145558.GF8883@n2100.arm.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thursday 25 August 2011, Russell King - ARM Linux wrote: > > Arnd, can you test this to make sure your gdb test case still works, and > Mark, can you test this to make sure it fixes your problem please? Hi Russell, The patch in question was not actually from me but from Ulrich Weigand, so he's probably the right person to test your patch. I'm forwarding it in full to Uli for reference. Arnd
On Thu, Aug 25, 2011 at 03:55:58PM +0100, Russell King - ARM Linux wrote: > Here's the patch. As the kernel I've run this against doesn't have the > change to try_to_freeze(), I added a might_sleep() in do_signal() during > my testing to verify that it fixes Mark's problem (which it does.) > Mark, can you test this to make sure it fixes your problem please? Yes, it resolves the problem for me - thanks! Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Sorry about the delay, was away for the bank holiday weekend.
On Tue, Aug 30, 2011 at 09:58:03PM +0100, Mark Brown wrote: > On Thu, Aug 25, 2011 at 03:55:58PM +0100, Russell King - ARM Linux wrote: > > > Here's the patch. As the kernel I've run this against doesn't have the > > change to try_to_freeze(), I added a might_sleep() in do_signal() during > > my testing to verify that it fixes Mark's problem (which it does.) > > > Mark, can you test this to make sure it fixes your problem please? > > Yes, it resolves the problem for me - thanks! > > Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> > > Sorry about the delay, was away for the bank holiday weekend. That's good, thanks. Just waiting for Ulrich Weigand to verify that his gdb test-case doesn't break with this patch.
Arnd Bergmann <arnd@arndb.de> wrote on 08/26/2011 04:44:26 PM: > On Thursday 25 August 2011, Russell King - ARM Linux wrote: > > > > Arnd, can you test this to make sure your gdb test case still works, and > > Mark, can you test this to make sure it fixes your problem please? > > Hi Russell, > > The patch in question was not actually from me but from Ulrich Weigand, > so he's probably the right person to test your patch. > I'm forwarding it in full to Uli for reference. Hi Arnd, hi Russell, sorry for the late reply, I've just returned from vacation today ... I've not yet run the test, but just from reading through the patch it seems that this will at least partially re-introduce the problem my original patch was trying to fix. The situation here is about GDB performing an "inferior function call", e.g. via the GDB "call" command. To do so, GDB will: 0. [ Have gotten control of the target process via some ptrace intercept previously, and then ... ] 1. Save the register state 2. Create a dummy frame on the stack and set up registers (PC, SP, argument registers, ...) as appropriate for a function call 3. Restart via PTRACE_CONTINUE [ ... at this point, the target process runs the function until it returns to a breakpoint instruction and GDB gets control again via another ptrace intercept ... ] 4. Restore the register state saved in [1.] 5. At some later point, continue the target process [at its original location] with PTRACE_CONTINUE The problem now occurs if at point [0.] the target process just happened to be blocked in a restartable system call. For this sequence to then work as expected, two things have to happen: - at point [3.], the kernel must *not* attempt to restart a system call, even though it thinks we're stopped in a restartable system call - at point [5.], the kernel now *must* restart the originally interrupted system call, even though it thinks we're stopped at some breakpoint, and not within a system call My patch achieved both these goals, while it would seem your patch only solves the first issue, not the second one. In fact, since any interaction with ptrace will always cause the TIF_SYS_RESTART flag to be *reset*, and there is no way at all to *set* it, there doesn't appear to be any way for GDB to achive that second goal. [ With my patch, that second goal was implicitly achieved by the fact that at [1.] GDB would save a register state that already corresponds to the way things should be for restarting the system call. When that register set is then restored in [4.], restart just happens automatically without any further kernel intervention. ] One way to fix this might be to make the TIF_SYS_RESTART flag itself visible to ptrace, so the GDB could save/restore it along with the rest of the register set; this would be similar to how that problem is handled on other platforms. However, there doesn't appear to be an obvious place for the flag in the ptrace register set ... Bye, Ulrich
On Thu, Sep 01, 2011 at 03:41:22PM +0200, Ulrich Weigand wrote: > The problem now occurs if at point [0.] the target process just > happened to be blocked in a restartable system call. For this > sequence to then work as expected, two things have to happen: > > - at point [3.], the kernel must *not* attempt to restart a > system call, even though it thinks we're stopped in a > restartable system call > > - at point [5.], the kernel now *must* restart the originally > interrupted system call, even though it thinks we're stopped > at some breakpoint, and not within a system call > > My patch achieved both these goals, while it would seem your > patch only solves the first issue, not the second one. In > fact, since any interaction with ptrace will always cause the > TIF_SYS_RESTART flag to be *reset*, and there is no way at all > to *set* it, there doesn't appear to be any way for GDB to > achive that second goal. ... > One way to fix this might be to make the TIF_SYS_RESTART flag > itself visible to ptrace, so the GDB could save/restore it > along with the rest of the register set; this would be similar > to how that problem is handled on other platforms. However, > there doesn't appear to be an obvious place for the flag in > the ptrace register set ... Thanks for looking at this. I don't think we can augment the ptrace register set - that would be a major API change which would immediately break lots of userspace, causing user stack overflows and such like. I can't see a way out of this - and given the seriousness of the kernel side issue (causing kernel warnings), and that your change altered the strace behaviour (an unintended user-visible change) I think we're going to have to live with the gdb testcase failing until we can come up with a better fix for it. I also wonder what the validity of this behaviour is - there are cases where you can't do what gdb's trying to do - eg, with a syscall using a restart block (-ERESTART_RESTARTBLOCK) because the restart information could be wiped out by a new syscall performed by the function gdb wants to run. Or when the program receives a signal for it to handle while running that function.
Russell King - ARM Linux <linux@arm.linux.org.uk> wrote on 09/01/2011 04:00:00 PM: > On Thu, Sep 01, 2011 at 03:41:22PM +0200, Ulrich Weigand wrote: > > One way to fix this might be to make the TIF_SYS_RESTART flag > > itself visible to ptrace, so the GDB could save/restore it > > along with the rest of the register set; this would be similar > > to how that problem is handled on other platforms. However, > > there doesn't appear to be an obvious place for the flag in > > the ptrace register set ... > > Thanks for looking at this. > > I don't think we can augment the ptrace register set - that would be a > major API change which would immediately break lots of userspace, > causing user stack overflows and such like. Well, it wouldn't need to go into the default register set (REGSET_GPR), but could be in a new register set. This shouldn't change anything for existing applications, but GDB could probe for support for the new regset and use it if present. > I can't see a way out of this - and given the seriousness of the kernel > side issue (causing kernel warnings), and that your change altered the > strace behaviour (an unintended user-visible change) I think we're going > to have to live with the gdb testcase failing until we can come up with > a better fix for it. The change you suggest, with the addition of making the flag available to the debugger in a new register set, should be fine for GDB. I agree that the race you pointed out in the initial mail to this thread needs to be fixed; an in fact, it seems that this race is also present on s390 ... In discussions with Martin, we were wondering however whether the fix you suggest does actually fully plug the race: Assume the scenario you initally describe, where a first signal is ignored and leads to system call restart. With your latest patch, you call into syscall_restart which sets everything up to restart the call (with interrupts disabled). Then, you go back to user space, with PC pointing to the SVC that is to be restarted. Now assume that some interrupt is pending at this point. At the switch to user space, interrupts will be enabled. Can it now happen that this pending interrupt is recognized *before* the SVC is executed? (Maybe the hardware doesn't allow that on ARM ... it definitely *is* possible on s390, though.) If so, can this not then lead to the exact same race you initially described (the interrupt causing some other task to be scheduled, and then an second signal to be delivered to the task about to restart the SVC)? I guess one way to fix this would be to not actually go back to user space for system call restart at all, but instead just short- cut this in entry.S and simply go right back to the system call entry path. As a side benefit, this could eliminate the need to create a stack frame for the OABI RESTARTBLOCK case ... (This is what we do on s390 b.t.w., where there is a similar issue with the system call number being an immediate argument to the SVC instruction.) > I also wonder what the validity of this behaviour is - there are cases > where you can't do what gdb's trying to do - eg, with a syscall using > a restart block (-ERESTART_RESTARTBLOCK) because the restart information > could be wiped out by a new syscall performed by the function gdb wants > to run. Or when the program receives a signal for it to handle while > running that function. Yes, performing an inferior call during a system call that restarts using RESTARTBLOCK is broken. This is a problem on all architectures, and has been ever since the RESTARTBLOCK mechanism was introduced ... To really fix this case would probably require some way for the debugger to save and restore the restore_block saved state. This is not quite trivial, since it would expose that state to user space, effectively creating a new ABI (and probably requiring sanity checks to ensure a valid state is restored). This probably cannot be fixed by one architecture for itself, but would need support from common kernel code. Bye, Ulrich
On Fri, Sep 02, 2011 at 04:47:35PM +0200, Ulrich Weigand wrote: > Assume the scenario you initally describe, where a first signal is > ignored and leads to system call restart. With your latest patch, > you call into syscall_restart which sets everything up to restart > the call (with interrupts disabled). I don't think SIG_IGN signals even set the TIF work flag, so they never even cause a call into do_signal(). Therefore, as far as syscalls go, attempting to send a process (eg) a SIGINT which its handler is set to SIG_IGN results in the process not even being notified about the attempt - we won't even wake up while the syscall is sleeping. > To really fix this case would probably require some way for the > debugger to save and restore the restore_block saved state. This > is not quite trivial, since it would expose that state to user space, > effectively creating a new ABI (and probably requiring sanity checks > to ensure a valid state is restored). This probably cannot be fixed > by one architecture for itself, but would need support from common > kernel code. Such state would have to be crytographically signed or kept entirely within the kernel, as it would otherwise mean that you could redirect the kernel PC to anywhere...
Russell King - ARM Linux <linux@arm.linux.org.uk> wrote on 09/02/2011 07:22:59 PM: > On Fri, Sep 02, 2011 at 04:47:35PM +0200, Ulrich Weigand wrote: > > Assume the scenario you initally describe, where a first signal is > > ignored and leads to system call restart. With your latest patch, > > you call into syscall_restart which sets everything up to restart > > the call (with interrupts disabled). > > I don't think SIG_IGN signals even set the TIF work flag, so they > never even cause a call into do_signal(). Therefore, as far as > syscalls go, attempting to send a process (eg) a SIGINT which its > handler is set to SIG_IGN results in the process not even being > notified about the attempt - we won't even wake up while the > syscall is sleeping. I don't see why SIG_IGN signals shouldn't set the TIF work flag; the decision whether to ignore a signal is only made once we've got to get_signal_to_deliver. In any case, whether or not the signal is SIG_IGN doesn't matter for the example at all; I'm simply talking about the case whether the first signal we get leads to system call restart, exactly the same as in the original example you initially described here: http://lists.arm.linux.org.uk/lurker/message/20110823.154329.a3e65f95.en.html > > To really fix this case would probably require some way for the > > debugger to save and restore the restore_block saved state. This > > is not quite trivial, since it would expose that state to user space, > > effectively creating a new ABI (and probably requiring sanity checks > > to ensure a valid state is restored). This probably cannot be fixed > > by one architecture for itself, but would need support from common > > kernel code. > > Such state would have to be crytographically signed or kept entirely > within the kernel, as it would otherwise mean that you could redirect > the kernel PC to anywhere... Agreed, that's why the state would need to be verified (in the case of the function pointer, we probably would not want to export the kernel code address to user space in any case, but identify which of the possible target functions is to be called in some other manner). Bye, Ulrich
On Fri, Sep 02, 2011 at 07:40:34PM +0200, Ulrich Weigand wrote: > Russell King - ARM Linux <linux@arm.linux.org.uk> wrote on 09/02/2011 > 07:22:59 PM: > > On Fri, Sep 02, 2011 at 04:47:35PM +0200, Ulrich Weigand wrote: > > > Assume the scenario you initally describe, where a first signal is > > > ignored and leads to system call restart. With your latest patch, > > > you call into syscall_restart which sets everything up to restart > > > the call (with interrupts disabled). > > > > I don't think SIG_IGN signals even set the TIF work flag, so they > > never even cause a call into do_signal(). Therefore, as far as > > syscalls go, attempting to send a process (eg) a SIGINT which its > > handler is set to SIG_IGN results in the process not even being > > notified about the attempt - we won't even wake up while the > > syscall is sleeping. > > I don't see why SIG_IGN signals shouldn't set the TIF work flag; > the decision whether to ignore a signal is only made once we've > got to get_signal_to_deliver. Yes, having looked deeper, you seem to be right - but only if the thread is being ptraced. If it's not being ptraced, ignored signals don't make it that far. And yes, we can end up processing the interrupt before the SVC is executed, which is still a hole. So we need to avoid doing the restart in userspace - which might actually make things easier. I'll take a look into that over the weekend.
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h index 7b5cc8d..40df533 100644 --- a/arch/arm/include/asm/thread_info.h +++ b/arch/arm/include/asm/thread_info.h @@ -129,6 +129,7 @@ extern void vfp_flush_hwstate(struct thread_info *); /* * thread information flags: * TIF_SYSCALL_TRACE - syscall trace active + * TIF_SYS_RESTART - syscall restart processing * TIF_SIGPENDING - signal pending * TIF_NEED_RESCHED - rescheduling necessary * TIF_NOTIFY_RESUME - callback before returning to user @@ -139,6 +140,7 @@ extern void vfp_flush_hwstate(struct thread_info *); #define TIF_NEED_RESCHED 1 #define TIF_NOTIFY_RESUME 2 /* callback before returning to user */ #define TIF_SYSCALL_TRACE 8 +#define TIF_SYS_RESTART 9 #define TIF_POLLING_NRFLAG 16 #define TIF_USING_IWMMXT 17 #define TIF_MEMDIE 18 /* is terminating due to OOM killer */ @@ -147,6 +149,7 @@ extern void vfp_flush_hwstate(struct thread_info *); #define TIF_SECCOMP 21 #define _TIF_SIGPENDING (1 << TIF_SIGPENDING) +#define _TIF_SYS_RESTART (1 << TIF_SYS_RESTART) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index b2a27b6..e922b85 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -45,6 +45,7 @@ ret_fast_syscall: fast_work_pending: str r0, [sp, #S_R0+S_OFF]! @ returned r0 work_pending: + enable_irq tst r1, #_TIF_NEED_RESCHED bne work_resched tst r1, #_TIF_SIGPENDING|_TIF_NOTIFY_RESUME @@ -56,6 +57,13 @@ work_pending: bl do_notify_resume b ret_slow_syscall @ Check work again +work_syscall_restart: + mov r0, sp @ 'regs' + bl syscall_restart @ process system call restart + teq r0, #0 @ if ret=0 -> success, so + beq ret_restart @ return to userspace directly + b ret_slow_syscall @ otherwise, we have a segfault + work_resched: bl schedule /* @@ -69,6 +77,9 @@ ENTRY(ret_to_user_from_irq) tst r1, #_TIF_WORK_MASK bne work_pending no_work_pending: + tst r1, #_TIF_SYS_RESTART + bne work_syscall_restart +ret_restart: #if defined(CONFIG_IRQSOFF_TRACER) asm_trace_hardirqs_on #endif diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c index 2491f3b..ac8c34e 100644 --- a/arch/arm/kernel/ptrace.c +++ b/arch/arm/kernel/ptrace.c @@ -177,6 +177,7 @@ put_user_reg(struct task_struct *task, int offset, long data) if (valid_user_regs(&newregs)) { regs->uregs[offset] = data; + clear_ti_thread_flag(task_thread_info(task), TIF_SYS_RESTART); ret = 0; } @@ -604,6 +605,7 @@ static int gpr_set(struct task_struct *target, return -EINVAL; *task_pt_regs(target) = newregs; + clear_ti_thread_flag(task_thread_info(target), TIF_SYS_RESTART); return 0; } diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index 0340224..42a1521 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c @@ -649,6 +649,135 @@ handle_signal(unsigned long sig, struct k_sigaction *ka, } /* + * Syscall restarting codes + * + * -ERESTARTSYS: restart system call if no handler, or if there is a + * handler but it's marked SA_RESTART. Otherwise return -EINTR. + * -ERESTARTNOINTR: always restart system call + * -ERESTARTNOHAND: restart system call only if no handler, otherwise + * return -EINTR if invoking a user signal handler. + * -ERESTART_RESTARTBLOCK: call restart syscall if no handler, otherwise + * return -EINTR if invoking a user signal handler. + */ +static void setup_syscall_restart(struct pt_regs *regs) +{ + regs->ARM_r0 = regs->ARM_ORIG_r0; + regs->ARM_pc -= thumb_mode(regs) ? 2 : 4; +} + +/* + * Depending on the signal settings we may need to revert the decision + * to restart the system call. But skip this if a debugger has chosen + * to restart at a different PC. + */ +static void syscall_restart_handler(struct pt_regs *regs, struct k_sigaction *ka) +{ + if (test_and_clear_thread_flag(TIF_SYS_RESTART)) { + long r0 = regs->ARM_r0; + + /* + * By default, return -EINTR to the user process for any + * syscall which would otherwise be restarted. + */ + regs->ARM_r0 = -EINTR; + + if (r0 == -ERESTARTNOINTR || + (r0 == -ERESTARTSYS && !(ka->sa.sa_flags & SA_RESTART))) + setup_syscall_restart(regs); + } +} + +/* + * Handle syscall restarting when there is no user handler in place for + * a delivered signal. Rather than doing this as part of the normal + * signal processing, we do this on the final return to userspace, after + * we've finished handling signals and checking for schedule events. + * + * This avoids bad behaviour such as: + * - syscall returns -ERESTARTNOHAND + * - signal with no handler (so we set things up to restart the syscall) + * - schedule + * - signal with handler (eg, SIGALRM) + * - we call the handler and then restart the syscall + * + * In order to avoid races with TIF_NEED_RESCHED, IRQs must be disabled + * when this function is called and remain disabled until we exit to + * userspace. + */ +asmlinkage int syscall_restart(struct pt_regs *regs) +{ + struct thread_info *thread = current_thread_info(); + + clear_ti_thread_flag(thread, TIF_SYS_RESTART); + + /* + * Restart the system call. We haven't setup a signal handler + * to invoke, and the regset hasn't been usurped by ptrace. + */ + if (regs->ARM_r0 == -ERESTART_RESTARTBLOCK) { + if (thumb_mode(regs)) { + regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE; + regs->ARM_pc -= 2; + } else { +#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT) + regs->ARM_r7 = __NR_restart_syscall; + regs->ARM_pc -= 4; +#else + u32 sp = regs->ARM_sp - 4; + u32 __user *usp = (u32 __user *)sp; + int ret; + + /* + * For OABI, we need to play some extra games, because + * we need to write to the users stack, which we can't + * do reliably from IRQs-disabled context. Temporarily + * re-enable IRQs, perform the store, and then plug + * the resulting race afterwards. + */ + local_irq_enable(); + ret = put_user(regs->ARM_pc, usp); + local_irq_disable(); + + /* + * Plug the reschedule race - if we need to reschedule, + * abort the syscall restarting. We haven't modified + * anything other than the attempted write to the stack + * so we can merely retry later. + */ + if (need_resched()) { + set_ti_thread_flag(thread, TIF_SYS_RESTART); + return -EINTR; + } + + /* + * We failed (for some reason) to write to the stack. + * Terminate the task. + */ + if (ret) { + force_sigsegv(0, current); + return -EFAULT; + } + + /* + * Success, update the stack pointer and point the + * PC at the restarting code. + */ + regs->ARM_sp = sp; + regs->ARM_pc = KERN_RESTART_CODE; +#endif + } + } else { + /* + * Simple restart - just back up and re-execute the last + * instruction. + */ + setup_syscall_restart(regs); + } + + return 0; +} + +/* * Note that 'init' is a special process: it doesn't get signals it doesn't * want to handle. Thus you cannot kill init even with a SIGKILL even by * mistake. @@ -659,7 +788,6 @@ handle_signal(unsigned long sig, struct k_sigaction *ka, */ static void do_signal(struct pt_regs *regs, int syscall) { - unsigned int retval = 0, continue_addr = 0, restart_addr = 0; struct k_sigaction ka; siginfo_t info; int signr; @@ -674,32 +802,16 @@ static void do_signal(struct pt_regs *regs, int syscall) return; /* - * If we were from a system call, check for system call restarting... + * Set the SYS_RESTART flag to indicate that we have some + * cleanup of the restart state to perform when returning to + * userspace. */ - if (syscall) { - continue_addr = regs->ARM_pc; - restart_addr = continue_addr - (thumb_mode(regs) ? 2 : 4); - retval = regs->ARM_r0; - - /* - * Prepare for system call restart. We do this here so that a - * debugger will see the already changed PSW. - */ - switch (retval) { - case -ERESTARTNOHAND: - case -ERESTARTSYS: - case -ERESTARTNOINTR: - regs->ARM_r0 = regs->ARM_ORIG_r0; - regs->ARM_pc = restart_addr; - break; - case -ERESTART_RESTARTBLOCK: - regs->ARM_r0 = -EINTR; - break; - } - } - - if (try_to_freeze()) - goto no_signal; + if (syscall && + (regs->ARM_r0 == -ERESTARTSYS || + regs->ARM_r0 == -ERESTARTNOINTR || + regs->ARM_r0 == -ERESTARTNOHAND || + regs->ARM_r0 == -ERESTART_RESTARTBLOCK)) + set_thread_flag(TIF_SYS_RESTART); /* * Get the signal to deliver. When running under ptrace, at this @@ -709,19 +821,7 @@ static void do_signal(struct pt_regs *regs, int syscall) if (signr > 0) { sigset_t *oldset; - /* - * Depending on the signal settings we may need to revert the - * decision to restart the system call. But skip this if a - * debugger has chosen to restart at a different PC. - */ - if (regs->ARM_pc == restart_addr) { - if (retval == -ERESTARTNOHAND - || (retval == -ERESTARTSYS - && !(ka.sa.sa_flags & SA_RESTART))) { - regs->ARM_r0 = -EINTR; - regs->ARM_pc = continue_addr; - } - } + syscall_restart_handler(regs, &ka); if (test_thread_flag(TIF_RESTORE_SIGMASK)) oldset = ¤t->saved_sigmask; @@ -740,38 +840,7 @@ static void do_signal(struct pt_regs *regs, int syscall) return; } - no_signal: if (syscall) { - /* - * Handle restarting a different system call. As above, - * if a debugger has chosen to restart at a different PC, - * ignore the restart. - */ - if (retval == -ERESTART_RESTARTBLOCK - && regs->ARM_pc == continue_addr) { - if (thumb_mode(regs)) { - regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE; - regs->ARM_pc -= 2; - } else { -#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT) - regs->ARM_r7 = __NR_restart_syscall; - regs->ARM_pc -= 4; -#else - u32 __user *usp; - - regs->ARM_sp -= 4; - usp = (u32 __user *)regs->ARM_sp; - - if (put_user(regs->ARM_pc, usp) == 0) { - regs->ARM_pc = KERN_RESTART_CODE; - } else { - regs->ARM_sp += 4; - force_sigsegv(0, current); - } -#endif - } - } - /* If there's no signal to deliver, we just put the saved sigmask * back. */