Message ID | 507ADBBB.9090209@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, Oct 14, 2012 at 05:35:23PM +0200, Daniel Mack wrote: > I rebased my ARM development branch and figured that your patch 9fff2fa > ("arm: switch to saner kernel_execve() semantics") breaks the boot on my > board right after init is invoked via NFS: OK, revert it is, then. Nothing in the tree has dependencies on that sucker and while it survives testing here, it's obviously not ready for mainline. So, with abject apologies to everyone involved, please revert.
On Oct 14, 2012 6:40 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote: > > On Sun, Oct 14, 2012 at 05:35:23PM +0200, Daniel Mack wrote: > > > I rebased my ARM development branch and figured that your patch 9fff2fa > > ("arm: switch to saner kernel_execve() semantics") breaks the boot on my > > board right after init is invoked via NFS: > > OK, revert it is, then. Nothing in the tree has dependencies on that sucker > and while it survives testing here, it's obviously not ready for mainline. > So, with abject apologies to everyone involved, please revert. Reverting it is not straight forward, and half of this patch doesn't seem to cause issues. I can resend my patch with an S-o-b if you want me to.
On Sun, Oct 14, 2012 at 06:44:12PM +0200, Daniel Mack wrote: > On Oct 14, 2012 6:40 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote: > > > > On Sun, Oct 14, 2012 at 05:35:23PM +0200, Daniel Mack wrote: > > > > > I rebased my ARM development branch and figured that your patch 9fff2fa > > > ("arm: switch to saner kernel_execve() semantics") breaks the boot on my > > > board right after init is invoked via NFS: > > > > OK, revert it is, then. Nothing in the tree has dependencies on that > sucker > > and while it survives testing here, it's obviously not ready for mainline. > > So, with abject apologies to everyone involved, please revert. > > Reverting it is not straight forward, and half of this patch doesn't seem > to cause issues. > > I can resend my patch with an S-o-b if you want me to. Um... That's _really_ interesting. First of all, revert is absolutely straightforward; the only change in Kconfig is "remove the damn select" and it's not hard to resolve. But I actually wonder what the hell is going on with that breakage - the *only* thing your revert changes is that instead of letting the kernel_thread callback return all the way to returning 0 to ret_from_kernel_thread() on do_execve() success you have it do ret_from_kernel_execve() instead. Hmm... Could you try to print current_pt_regs()->ARM_r0 in kernel_execve() before calling ret_from_kernel_execve() with your patch applied? If that somehow got non-zero, we'd see trouble, all right, but I don't see any places where it could. Wait a minute... I think I see what might be going on, but I don't understand it at all. Look: arm start_thread() is #define start_thread(regs,pc,sp) \ ({ \ unsigned long *stack = (unsigned long *)sp; \ memset(regs->uregs, 0, sizeof(regs->uregs)); \ if (current->personality & ADDR_LIMIT_32BIT) \ regs->ARM_cpsr = USR_MODE; \ else \ regs->ARM_cpsr = USR26_MODE; \ if (elf_hwcap & HWCAP_THUMB && pc & 1) \ regs->ARM_cpsr |= PSR_T_BIT; \ regs->ARM_cpsr |= PSR_ENDSTATE; \ regs->ARM_pc = pc & ~1; /* pc */ \ regs->ARM_sp = sp; /* sp */ \ regs->ARM_r2 = stack[2]; /* r2 (envp) */ \ regs->ARM_r1 = stack[1]; /* r1 (argv) */ \ regs->ARM_r0 = stack[0]; /* r0 (argc) */ \ nommu_start_thread(regs); \ }) and the last 3 make no sense whatsoever. Note that on normal execve() we'll be going through the syscall return, so the userland will see 0 in there, no matter what do we do here. Theoretically, it might've been done for ptrace sake (it will be able to observe the values in those registers before the tracee reaches userland), but there's another oddity involved - "stack" is a userland pointer here. Granted, it's been recently written to, so we are not likely to hit a pagefault here, but... What happens if we are under enough memory pressure to swap those pages out? PF in the kernel mode with no exception table entries for those insns?
On Sun, Oct 14, 2012 at 06:26:40PM +0100, Al Viro wrote: > On Sun, Oct 14, 2012 at 06:44:12PM +0200, Daniel Mack wrote: > > On Oct 14, 2012 6:40 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote: > > > > > > On Sun, Oct 14, 2012 at 05:35:23PM +0200, Daniel Mack wrote: > > > > > > > I rebased my ARM development branch and figured that your patch 9fff2fa > > > > ("arm: switch to saner kernel_execve() semantics") breaks the boot on my > > > > board right after init is invoked via NFS: > > > > > > OK, revert it is, then. Nothing in the tree has dependencies on that > > sucker > > > and while it survives testing here, it's obviously not ready for mainline. > > > So, with abject apologies to everyone involved, please revert. > > > > Reverting it is not straight forward, and half of this patch doesn't seem > > to cause issues. > > > > I can resend my patch with an S-o-b if you want me to. > > Um... That's _really_ interesting. First of all, revert is absolutely > straightforward; the only change in Kconfig is "remove the damn select" > and it's not hard to resolve. But I actually wonder what the hell is > going on with that breakage - the *only* thing your revert changes is > that instead of letting the kernel_thread callback return all the way > to returning 0 to ret_from_kernel_thread() on do_execve() success you > have it do ret_from_kernel_execve() instead. Hmm... > > Could you try to print current_pt_regs()->ARM_r0 in kernel_execve() before > calling ret_from_kernel_execve() with your patch applied? If that somehow > got non-zero, we'd see trouble, all right, but I don't see any places where > it could. > > Wait a minute... I think I see what might be going on, but I don't > understand it at all. Look: arm start_thread() is > #define start_thread(regs,pc,sp) \ > ({ \ > unsigned long *stack = (unsigned long *)sp; \ > memset(regs->uregs, 0, sizeof(regs->uregs)); \ > if (current->personality & ADDR_LIMIT_32BIT) \ > regs->ARM_cpsr = USR_MODE; \ > else \ > regs->ARM_cpsr = USR26_MODE; \ > if (elf_hwcap & HWCAP_THUMB && pc & 1) \ > regs->ARM_cpsr |= PSR_T_BIT; \ > regs->ARM_cpsr |= PSR_ENDSTATE; \ > regs->ARM_pc = pc & ~1; /* pc */ \ > regs->ARM_sp = sp; /* sp */ \ > regs->ARM_r2 = stack[2]; /* r2 (envp) */ \ > regs->ARM_r1 = stack[1]; /* r1 (argv) */ \ > regs->ARM_r0 = stack[0]; /* r0 (argc) */ \ > nommu_start_thread(regs); \ > }) > and the last 3 make no sense whatsoever. Note that on normal execve() we'll > be going through the syscall return, so the userland will see 0 in there, > no matter what do we do here. Theoretically, it might've been done for > ptrace sake (it will be able to observe the values in those registers before > the tracee reaches userland), but there's another oddity involved - "stack" > is a userland pointer here. Granted, it's been recently written to, so > we are not likely to hit a pagefault here, but... What happens if we are > under enough memory pressure to swap those pages out? PF in the kernel > mode with no exception table entries for those insns? FWIW, if you don't mind an experiment, try to take mainline (with that commit not reverted) and add strne r0, [sp, #S_R0] right before get_thread_info tsk in ret_from_fork(). And see if that changes behaviour.
On 14.10.2012 19:55, Al Viro wrote: > On Sun, Oct 14, 2012 at 06:26:40PM +0100, Al Viro wrote: >> On Sun, Oct 14, 2012 at 06:44:12PM +0200, Daniel Mack wrote: >>> On Oct 14, 2012 6:40 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote: >>>> >>>> On Sun, Oct 14, 2012 at 05:35:23PM +0200, Daniel Mack wrote: >>>> >>>>> I rebased my ARM development branch and figured that your patch 9fff2fa >>>>> ("arm: switch to saner kernel_execve() semantics") breaks the boot on my >>>>> board right after init is invoked via NFS: >>>> >>>> OK, revert it is, then. Nothing in the tree has dependencies on that >>> sucker >>>> and while it survives testing here, it's obviously not ready for mainline. >>>> So, with abject apologies to everyone involved, please revert. >>> >>> Reverting it is not straight forward, and half of this patch doesn't seem >>> to cause issues. >>> >>> I can resend my patch with an S-o-b if you want me to. >> >> Um... That's _really_ interesting. First of all, revert is absolutely >> straightforward; the only change in Kconfig is "remove the damn select" >> and it's not hard to resolve. But I actually wonder what the hell is >> going on with that breakage - the *only* thing your revert changes is >> that instead of letting the kernel_thread callback return all the way >> to returning 0 to ret_from_kernel_thread() on do_execve() success you >> have it do ret_from_kernel_execve() instead. Hmm... >> >> Could you try to print current_pt_regs()->ARM_r0 in kernel_execve() before >> calling ret_from_kernel_execve() with your patch applied? If that somehow >> got non-zero, we'd see trouble, all right, but I don't see any places where >> it could. >> >> Wait a minute... I think I see what might be going on, but I don't >> understand it at all. Look: arm start_thread() is >> #define start_thread(regs,pc,sp) \ >> ({ \ >> unsigned long *stack = (unsigned long *)sp; \ >> memset(regs->uregs, 0, sizeof(regs->uregs)); \ >> if (current->personality & ADDR_LIMIT_32BIT) \ >> regs->ARM_cpsr = USR_MODE; \ >> else \ >> regs->ARM_cpsr = USR26_MODE; \ >> if (elf_hwcap & HWCAP_THUMB && pc & 1) \ >> regs->ARM_cpsr |= PSR_T_BIT; \ >> regs->ARM_cpsr |= PSR_ENDSTATE; \ >> regs->ARM_pc = pc & ~1; /* pc */ \ >> regs->ARM_sp = sp; /* sp */ \ >> regs->ARM_r2 = stack[2]; /* r2 (envp) */ \ >> regs->ARM_r1 = stack[1]; /* r1 (argv) */ \ >> regs->ARM_r0 = stack[0]; /* r0 (argc) */ \ >> nommu_start_thread(regs); \ >> }) >> and the last 3 make no sense whatsoever. Note that on normal execve() we'll >> be going through the syscall return, so the userland will see 0 in there, >> no matter what do we do here. Theoretically, it might've been done for >> ptrace sake (it will be able to observe the values in those registers before >> the tracee reaches userland), but there's another oddity involved - "stack" >> is a userland pointer here. Granted, it's been recently written to, so >> we are not likely to hit a pagefault here, but... What happens if we are >> under enough memory pressure to swap those pages out? PF in the kernel >> mode with no exception table entries for those insns? > > FWIW, if you don't mind an experiment, try to take mainline (with that > commit not reverted) and add > strne r0, [sp, #S_R0] > right before > get_thread_info tsk > in ret_from_fork(). And see if that changes behaviour. > I don't mind experiments at all :) However, with that extra line in place as described, I'm still getting the Oops below. If you want me to test anything else, please let me know. [ 4.683182] VFS: Mounted root (nfs filesystem) on device 0:12. [ 4.742007] devtmpfs: mounted [ 4.745746] Freeing init memory: 172K [ 5.038724] Internal error: Oops - undefined instruction: 0 [#1] SMP THUMB2 [ 5.046044] Modules linked in: [ 5.049263] CPU: 0 Not tainted (3.6.0-11053-g56c8535-dirty #136) [ 5.055925] PC is at cpsw_probe+0x422/0x9ac [ 5.060314] LR is at trace_hardirqs_on_caller+0x8f/0xfc [ 5.065790] pc : [<c03493de>] lr : [<c005e81f>] psr: 60000113 [ 5.065790] sp : cf055fb0 ip : 00000000 fp : 00000000 [ 5.077800] r10: 00000000 r9 : 00000000 r8 : 00000000 [ 5.083270] r7 : 00000000 r6 : 00000000 r5 : c034458d r4 : 00000000 [ 5.090101] r3 : cf057a40 r2 : 00000000 r1 : 00000001 r0 : 00000000 [ 5.096936] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 5.104406] Control: 50c5387d Table: 8f434019 DAC: 00000015 [ 5.110422] Process init (pid: 1, stack limit = 0xcf054240) [ 5.116257] Stack: (0xcf055fb0 to 0xcf056000) [ 5.120824] 5fa0: 00000001 00000000 00000000 00000000 [ 5.129390] 5fc0: cf055fb0 c000d1a8 00000000 00000000 00000000 00000000 00000000 00000000 [ 5.137957] 5fe0: 00000000 becedf10 00000000 b6f81dd0 00000010 00000000 aaaabfaf a8babbaa [ 5.146529] Code: 2206a010 718ef508 0184f8da f8b1f65d (3070f8d8) [ 5.152915] ---[ end trace 7362bbe8e73e6b07 ]--- [ 5.158324] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 5.158324]
On Sun, Oct 14, 2012 at 08:21:53PM +0200, Daniel Mack wrote: > On 14.10.2012 19:55, Al Viro wrote: > > On Sun, Oct 14, 2012 at 06:26:40PM +0100, Al Viro wrote: > >> On Sun, Oct 14, 2012 at 06:44:12PM +0200, Daniel Mack wrote: > >>> On Oct 14, 2012 6:40 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote: > >>>> > >>>> On Sun, Oct 14, 2012 at 05:35:23PM +0200, Daniel Mack wrote: > >>>> > >>>>> I rebased my ARM development branch and figured that your patch 9fff2fa > >>>>> ("arm: switch to saner kernel_execve() semantics") breaks the boot on my > >>>>> board right after init is invoked via NFS: > >>>> > >>>> OK, revert it is, then. Nothing in the tree has dependencies on that > >>> sucker > >>>> and while it survives testing here, it's obviously not ready for mainline. > >>>> So, with abject apologies to everyone involved, please revert. > >>> > >>> Reverting it is not straight forward, and half of this patch doesn't seem > >>> to cause issues. > >>> > >>> I can resend my patch with an S-o-b if you want me to. > >> > >> Um... That's _really_ interesting. First of all, revert is absolutely > >> straightforward; the only change in Kconfig is "remove the damn select" > >> and it's not hard to resolve. But I actually wonder what the hell is > >> going on with that breakage - the *only* thing your revert changes is > >> that instead of letting the kernel_thread callback return all the way > >> to returning 0 to ret_from_kernel_thread() on do_execve() success you > >> have it do ret_from_kernel_execve() instead. Hmm... > >> > >> Could you try to print current_pt_regs()->ARM_r0 in kernel_execve() before > >> calling ret_from_kernel_execve() with your patch applied? If that somehow > >> got non-zero, we'd see trouble, all right, but I don't see any places where > >> it could. > >> > >> Wait a minute... I think I see what might be going on, but I don't > >> understand it at all. Look: arm start_thread() is > >> #define start_thread(regs,pc,sp) \ > >> ({ \ > >> unsigned long *stack = (unsigned long *)sp; \ > >> memset(regs->uregs, 0, sizeof(regs->uregs)); \ > >> if (current->personality & ADDR_LIMIT_32BIT) \ > >> regs->ARM_cpsr = USR_MODE; \ > >> else \ > >> regs->ARM_cpsr = USR26_MODE; \ > >> if (elf_hwcap & HWCAP_THUMB && pc & 1) \ > >> regs->ARM_cpsr |= PSR_T_BIT; \ > >> regs->ARM_cpsr |= PSR_ENDSTATE; \ > >> regs->ARM_pc = pc & ~1; /* pc */ \ > >> regs->ARM_sp = sp; /* sp */ \ > >> regs->ARM_r2 = stack[2]; /* r2 (envp) */ \ > >> regs->ARM_r1 = stack[1]; /* r1 (argv) */ \ > >> regs->ARM_r0 = stack[0]; /* r0 (argc) */ \ > >> nommu_start_thread(regs); \ > >> }) > >> and the last 3 make no sense whatsoever. Note that on normal execve() we'll > >> be going through the syscall return, so the userland will see 0 in there, > >> no matter what do we do here. Theoretically, it might've been done for > >> ptrace sake (it will be able to observe the values in those registers before > >> the tracee reaches userland), but there's another oddity involved - "stack" > >> is a userland pointer here. Granted, it's been recently written to, so > >> we are not likely to hit a pagefault here, but... What happens if we are > >> under enough memory pressure to swap those pages out? PF in the kernel > >> mode with no exception table entries for those insns? > > > > FWIW, if you don't mind an experiment, try to take mainline (with that > > commit not reverted) and add > > strne r0, [sp, #S_R0] > > right before > > get_thread_info tsk > > in ret_from_fork(). And see if that changes behaviour. > > > > I don't mind experiments at all :) > > However, with that extra line in place as described, I'm still getting > the Oops below. If you want me to test anything else, please let me know. > > > [ 4.683182] VFS: Mounted root (nfs filesystem) on device 0:12. > [ 4.742007] devtmpfs: mounted > [ 4.745746] Freeing init memory: 172K > [ 5.038724] Internal error: Oops - undefined instruction: 0 [#1] SMP > THUMB2 > [ 5.046044] Modules linked in: > [ 5.049263] CPU: 0 Not tainted (3.6.0-11053-g56c8535-dirty #136) > [ 5.055925] PC is at cpsw_probe+0x422/0x9ac > [ 5.060314] LR is at trace_hardirqs_on_caller+0x8f/0xfc > [ 5.065790] pc : [<c03493de>] lr : [<c005e81f>] psr: 60000113 > [ 5.065790] sp : cf055fb0 ip : 00000000 fp : 00000000 > [ 5.077800] r10: 00000000 r9 : 00000000 r8 : 00000000 > [ 5.083270] r7 : 00000000 r6 : 00000000 r5 : c034458d r4 : 00000000 > [ 5.090101] r3 : cf057a40 r2 : 00000000 r1 : 00000001 r0 : 00000000 > [ 5.096936] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM > Segment user > [ 5.104406] Control: 50c5387d Table: 8f434019 DAC: 00000015 > [ 5.110422] Process init (pid: 1, stack limit = 0xcf054240) > [ 5.116257] Stack: (0xcf055fb0 to 0xcf056000) > [ 5.120824] 5fa0: 00000001 > 00000000 00000000 00000000 > [ 5.129390] 5fc0: cf055fb0 c000d1a8 00000000 00000000 00000000 > 00000000 00000000 00000000 > [ 5.137957] 5fe0: 00000000 becedf10 00000000 b6f81dd0 00000010 > 00000000 aaaabfaf a8babbaa > [ 5.146529] Code: 2206a010 718ef508 0184f8da f8b1f65d (3070f8d8) > [ 5.152915] ---[ end trace 7362bbe8e73e6b07 ]--- > [ 5.158324] Kernel panic - not syncing: Attempted to kill init! > exitcode=0x0000000b > [ 5.158324] Very interesting... So we have kernel_thread() payload called and we have it reach kernel_execve() (otherwise your reverting kernel_execve() change would've had no effect). Said payload returns, and sp value seems to be sane. Buggered return address, perhaps? But that would be killing the damn thing everywhere... Just in case - print __builtin_return_address(0) in the beginning of kernel_init(); it ought to point at the end of ret_from_fork... And kill that strne along with assignment to ->ARM_r0 in start_thread(). I've missed the obvious problem with strne - flag values won't survive the call of payload ;-/ IOW, it's still possible that we are getting bitten by strange value left in there. Removing the assignment in start_thread() would check that possiblity...
On Sun, Oct 14, 2012 at 06:26:40PM +0100, Al Viro wrote: > and the last 3 make no sense whatsoever. Note that on normal execve() we'll > be going through the syscall return, so the userland will see 0 in there, > no matter what do we do here. Theoretically, it might've been done for > ptrace sake (it will be able to observe the values in those registers before > the tracee reaches userland), Except that it won't be able to see what start_thread() puts in r0 either; on successful exceve(2) we will store return value of sys_execve() (i.e. 0) in regs->ARM_r0 before we get to any of the places where it could have examine the sucker. So what was that assignment for? And as far as I can see, ARM ELF ABI says that general register values on process startup are undefined, so r1 and r2 assignments also seem to be pointless. OTOH, they predate the ELF conversion by quite a but - that code had been there since 1.x times, when we used to use a.out... In any case, they were *not* going to be usable as main() arguments - zero argc would make userland rather unhappy. I don't have arm libc sources from those times, but I'd expect it to have all those suckers read from userland stack... Russell, could you recall what those had been about? I'm not sure if that had been oopsable that far back (again, oops scenario is userland stack page getting swapped out before we get to start_thread(), leading to direct read from an absent page in start_thread() by plain ldr, without anything in exception table about that insn), but it looks very odd regardless of that problem.
On Sun, Oct 14, 2012 at 08:24:03PM +0100, Al Viro wrote: > Russell, could you recall what those had been about? I'm not sure if that > had been oopsable that far back (again, oops scenario is userland stack > page getting swapped out before we get to start_thread(), leading to > direct read from an absent page in start_thread() by plain ldr, without > anything in exception table about that insn), but it looks very odd > regardless of that problem. BTW, arm64 has copied that logics, so it also seems to be unsafe and very odd - there we definitely have only ELF to cope with. arm64 folks Cc'd...
On Sun, Oct 14, 2012 at 05:35:23PM +0200, Daniel Mack wrote: > I rebased my ARM development branch and figured that your patch 9fff2fa > ("arm: switch to saner kernel_execve() semantics") breaks the boot on my > board right after init is invoked via NFS: Ok, I'm not going to assign blame to Al's commits (I never reviewed his stuff before they hit mainline - patches never posted to the ARM mailing list, and the development actually happened within the merge window, all things we tell people not to do...) I _still_ haven't reviewed that stuff yet. But... nevertheless... > [ 4.682072] VFS: Mounted root (nfs filesystem) on device 0:12. > [ 4.690744] devtmpfs: mounted > [ 4.694395] Freeing init memory: 172K > [ 5.291417] Internal error: Oops - undefined instruction: 0 [#1] SMP > THUMB2 Ok, so this tells us the kernel was built using Thumb2 ISA. > [ 5.298734] Modules linked in: > [ 5.301952] CPU: 0 Not tainted (3.6.0-11053-g56c8535 #128) > [ 5.308071] PC is at cpsw_probe+0x422/0x9ac PC is not word aligned, so it can't be running in the ARM ISA. > [ 5.312459] LR is at trace_hardirqs_on_caller+0x8f/0xfc > [ 5.317934] pc : [<c03493de>] lr : [<c005e81f>] psr: 60000113 Note that this reconfirms the above (well, it should do, it's the same value.) > [ 5.317934] sp : cf055fb0 ip : 00000000 fp : 00000000 > [ 5.329944] r10: 00000000 r9 : 00000000 r8 : 00000000 > [ 5.335413] r7 : 00000000 r6 : 00000000 r5 : c034458d r4 : 00000000 > [ 5.342244] r3 : cf057a40 r2 : 00000000 r1 : 00000001 r0 : 00000000 > [ 5.349078] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM > Segment user And this tells us that we're running in ARM mode, not Thumb mode. > [ 5.356546] Control: 50c5387d Table: 8f434019 DAC: 00000015 > [ 5.362562] Process init (pid: 1, stack limit = 0xcf054240) > [ 5.368395] Stack: (0xcf055fb0 to 0xcf056000) > [ 5.372961] 5fa0: 00000001 > 00000000 00000000 00000000 > [ 5.381525] 5fc0: cf055fb0 c000d1a8 00000000 00000000 00000000 > 00000000 00000000 00000000 > [ 5.390091] 5fe0: 00000000 bee83f10 00000000 b6fdedd0 00000010 > 00000000 aaaabfaf a8babbaa No stack backtrace (and it's silent about why that is). The other strange thing here is that the stack dump above is showing that the stack is completely empty - which shouldn't be the case if we're in a driver probe function - driver probe functions are called via the driver model layers... > [ 5.398664] Code: 2206a010 718ef508 0184f8da f8b1f65d (3070f8d8) And now we come to the Code: line, which makes no sense as an ARM ISA: 0: 2206a010 andcs sl, r6, #16 4: 718ef508 orrvc pc, lr, r8, lsl #10 8: 0184f8da ldrdeq pc, [r4, sl] c: f8b1f65d ; <UNDEFINED> instruction: 0xf8b1f65d 10: 3070f8d8 ldrsbtcc pc, [r0], #-136 ; 0xffffff78 ; <UNPREDICTABLE> But as Thumb, it looks more reasonable: 0: a010 add r0, pc, #64 ; (adr r0, 44 <foo+0x44>) 2: 2206 movs r2, #6 4: f508 718e add.w r1, r8, #284 ; 0x11c 8: f8da 0184 ldr.w r0, [sl, #388] ; 0x184 c: f65d f8b1 bl ffe5d172 <foo+0xffe5d172> 10: f8d8 3070 ldr.w r3, [r8, #112] ; 0x70 I don't have any further comments to make on this yet, as I've no idea what state stuff is in, but the above oops dump to me suggests that we've randomly jumped into some part of the kernel which just happens to be cpsw_probe(). Please send me (in private mail) your vmlinux file and a corresponding oops dump from that same kernel, and I'll dig and try and work out what's going on... This kind of investigation reminds me of those I did back in the 1990s when stuff was rather unstable and ARM was a young architecture. Now all we need is for an ARM platform to dump its entire memory out the ethernet port, bringing an university department network to a halt (I did that once - back in the 1990s - sorry Tim!)
On Sun, Oct 14, 2012 at 08:56:11PM +0100, Al Viro wrote: > On Sun, Oct 14, 2012 at 08:24:03PM +0100, Al Viro wrote: > > > Russell, could you recall what those had been about? I'm not sure if that > > had been oopsable that far back (again, oops scenario is userland stack > > page getting swapped out before we get to start_thread(), leading to > > direct read from an absent page in start_thread() by plain ldr, without > > anything in exception table about that insn), but it looks very odd > > regardless of that problem. > > BTW, arm64 has copied that logics, so it also seems to be unsafe and very > odd - there we definitely have only ELF to cope with. arm64 folks Cc'd... Good point. We don't need this on arm64 and probably neither on arm (at least since EABI). Setting x0 may cause other issues as well. The dynamic loader simply ignores the startup registers but for static binaries the _start code in glibc expects r0 to contain a function pointer to be registered with atexit() in __libc_start_main() or NULL. Since we pass argc in there, for static binaries the rtld_fini argument to __libc_start_main() is neither NULL nor something meaningful. Russell, do you know whether setting these registers is needed for OABI?
On Mon, Oct 15, 2012 at 05:07:10PM +0100, Catalin Marinas wrote: > On Sun, Oct 14, 2012 at 08:56:11PM +0100, Al Viro wrote: > > On Sun, Oct 14, 2012 at 08:24:03PM +0100, Al Viro wrote: > > > > > Russell, could you recall what those had been about? I'm not sure if that > > > had been oopsable that far back (again, oops scenario is userland stack > > > page getting swapped out before we get to start_thread(), leading to > > > direct read from an absent page in start_thread() by plain ldr, without > > > anything in exception table about that insn), but it looks very odd > > > regardless of that problem. > > > > BTW, arm64 has copied that logics, so it also seems to be unsafe and very > > odd - there we definitely have only ELF to cope with. arm64 folks Cc'd... > > Good point. We don't need this on arm64 and probably neither on arm (at > least since EABI). > > Setting x0 may cause other issues as well. The dynamic loader simply > ignores the startup registers but for static binaries the _start code in > glibc expects r0 to contain a function pointer to be registered with > atexit() in __libc_start_main() or NULL. Since we pass argc in there, > for static binaries the rtld_fini argument to __libc_start_main() is > neither NULL nor something meaningful. The value left there by start_thread() will not reach the userland anyway...
On Mon, Oct 15, 2012 at 05:27:32PM +0100, Al Viro wrote: > On Mon, Oct 15, 2012 at 05:07:10PM +0100, Catalin Marinas wrote: > > On Sun, Oct 14, 2012 at 08:56:11PM +0100, Al Viro wrote: > > > On Sun, Oct 14, 2012 at 08:24:03PM +0100, Al Viro wrote: > > > > > > > Russell, could you recall what those had been about? I'm not sure if that > > > > had been oopsable that far back (again, oops scenario is userland stack > > > > page getting swapped out before we get to start_thread(), leading to > > > > direct read from an absent page in start_thread() by plain ldr, without > > > > anything in exception table about that insn), but it looks very odd > > > > regardless of that problem. > > > > > > BTW, arm64 has copied that logics, so it also seems to be unsafe and very > > > odd - there we definitely have only ELF to cope with. arm64 folks Cc'd... > > > > Good point. We don't need this on arm64 and probably neither on arm (at > > least since EABI). > > > > Setting x0 may cause other issues as well. The dynamic loader simply > > ignores the startup registers but for static binaries the _start code in > > glibc expects r0 to contain a function pointer to be registered with > > atexit() in __libc_start_main() or NULL. Since we pass argc in there, > > for static binaries the rtld_fini argument to __libc_start_main() is > > neither NULL nor something meaningful. > > The value left there by start_thread() will not reach the userland anyway... Ah, yes. So not causing any user issues (apart from the possible fault in the kernel while accessing the user stack).
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 261fdd0..1bc092e 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -12,7 +12,6 @@ config ARM select GENERIC_IRQ_PROBE select GENERIC_IRQ_SHOW select GENERIC_KERNEL_THREAD - select GENERIC_KERNEL_EXECVE select GENERIC_PCI_IOMAP select GENERIC_SMP_IDLE_THREAD select GENERIC_STRNCPY_FROM_USER diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h index 8f60b6e..202bc3a 100644 --- a/arch/arm/include/asm/unistd.h +++ b/arch/arm/include/asm/unistd.h @@ -42,6 +42,7 @@ #define __ARCH_WANT_SYS_SOCKETCALL #endif #define __ARCH_WANT_SYS_EXECVE +#define __ARCH_WANT_KERNEL_EXECVE /* * "Conditional" syscalls diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index 417bac1..c05bd28 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -94,6 +94,18 @@ ENTRY(ret_from_fork) b ret_slow_syscall ENDPROC(ret_from_fork) +/* + * turn a kernel thread into userland process + * use: ret_from_kernel_execve(struct pt_regs *normal) + */ +ENTRY(ret_from_kernel_execve) + mov why, #0 @ not a syscall + str why, [r0, #S_R0] @ ... and we want 0 in ->ARM_r0 as well + get_thread_info tsk @ thread structure + mov sp, r0 @ stack pointer just under pt_regs + b ret_slow_syscall +ENDPROC(ret_from_kernel_execve) + .equ NR_syscalls,0 #define CALL(x) .equ NR_syscalls,NR_syscalls+1 #include "calls.S"