Message ID | 20160530213529.GS19428@n2100.arm.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, May 30, 2016 at 10:35:29PM +0100, Russell King - ARM Linux wrote: > With that, on a single CPU, it seems to work correctly every time, but > if I bring up a secondary CPU I start seeing the same problems you've > reported - which seems to need the following patch to solve. Please can > you check whether this resolves your problem? More background behind this patch... I think that commit 8130b9d7b9d8 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers") was wrong. Let's look at what's happening. The above commit had the following effect: vfp_sync_hwstate(); new_vfp = thread->vfpstate.hard; modify new_vfp (but not new_vfp.cpu) + vfp_flush_hwstate(thread); thread->vfpstate.hard = new_vfp; - vfp_flush_hwstate(thread); Now, the commit above claims that a context switch mid-copy of new_vfp into thread->vfpstate.hard may result in corrupted state. I'm not quite sure how we could get to that point: the only thread which is allowed to touch the traced child's state is the ptracing parent, and the ptracing parent can't do anything until after vfp_set() has returned. Even if the traced child gets a fatal signal, the point at which the signal is delivered to the child is when returning to userspace, which means the child must be runnable. That in turn means that we are not in the middle of changing the register state. So, I'm not sure where Will got the idea that the partially copied state would be visible: it shouldn't ever be visible. If it is visible, then we've got problems even with Will's patch applied, because a partial copy whenever it happens would be visible. In any case, the above change is wrong: vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to an invalid CPU number. Switching the order as Will has done _undoes_ the effect of vfp_flush_hwstate(), which leads to the bug you are seeing. So, I think the correct approach is to revert it. If the partially copied state _is_ visible somehow, that needs to be better documented - the original commit fails to indicate how the partially copied state becomes visible as a result of a context switch. > > Thanks. > > arch/arm/kernel/ptrace.c | 2 +- > 1 files changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c > index ef9119f7462e..4d9375814b53 100644 > --- a/arch/arm/kernel/ptrace.c > +++ b/arch/arm/kernel/ptrace.c > @@ -733,8 +733,8 @@ static int vfp_set(struct task_struct *target, > if (ret) > return ret; > > - vfp_flush_hwstate(thread); > thread->vfpstate.hard = new_vfp; > + vfp_flush_hwstate(thread); > > return 0; > } > > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Mon, May 30, 2016 at 11:40:28PM +0100, Russell King - ARM Linux wrote: > On Mon, May 30, 2016 at 10:35:29PM +0100, Russell King - ARM Linux wrote: > > With that, on a single CPU, it seems to work correctly every time, but > > if I bring up a secondary CPU I start seeing the same problems you've > > reported - which seems to need the following patch to solve. Please can > > you check whether this resolves your problem? > > More background behind this patch... I think that commit 8130b9d7b9d8 > ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers") > was wrong. > > Let's look at what's happening. The above commit had the following effect: > > vfp_sync_hwstate(); > new_vfp = thread->vfpstate.hard; > modify new_vfp (but not new_vfp.cpu) > + vfp_flush_hwstate(thread); > thread->vfpstate.hard = new_vfp; > - vfp_flush_hwstate(thread); > > Now, the commit above claims that a context switch mid-copy of new_vfp > into thread->vfpstate.hard may result in corrupted state. > > I'm not quite sure how we could get to that point: the only thread which > is allowed to touch the traced child's state is the ptracing parent, and > the ptracing parent can't do anything until after vfp_set() has returned. > Even if the traced child gets a fatal signal, the point at which the > signal is delivered to the child is when returning to userspace, which > means the child must be runnable. That in turn means that we are not > in the middle of changing the register state. > > So, I'm not sure where Will got the idea that the partially copied state > would be visible: it shouldn't ever be visible. If it is visible, then > we've got problems even with Will's patch applied, because a partial > copy whenever it happens would be visible. > > In any case, the above change is wrong: vfp_flush_hwstate() sets > thread->vfpstate.hard.cpu to an invalid CPU number. Switching the order > as Will has done _undoes_ the effect of vfp_flush_hwstate(), which leads > to the bug you are seeing. > > So, I think the correct approach is to revert it. Yes, it looks like I was over-zealous in fixing all users of vfp_flush_hwstate after I fixed the signal handling code. Given that the child isn't runnable until the parent has finished poking at the register state, the original code looks fine. The only thing I'm uncertain of is whether or not PTRACE_SEIZE/PTRACE_LISTEN allow switching to the child (but even then, how is the parent doing to issue such a request?). Will
On Tue, May 31, 2016 at 02:52:52PM +0100, Will Deacon wrote: > The only thing I'm uncertain of is whether or not PTRACE_SEIZE/PTRACE_LISTEN > allow switching to the child (but even then, how is the parent doing > to issue such a request?). I can't see how that would be possible - the parent needs to finish the vfp_set() call before it can issue any others - and there can only be one tracer. IOW, if we're in vfp_set(), the thread must have attached to the child, at which point PTRACE_SEIZE will be rejected. PTRACE_LISTEN also requires the child to be in STOP state as well, and again, can only be issued from the current tracer.
On 16-05-30 05:35 PM, Russell King - ARM Linux wrote: > So, the gdb verisons I have here seem to be particularly poor - but with > some modifications, I can test out on iMX6 by forcing gdb to do the right > thing - by inserting a couple of "mov r0, r0" instructions after the > "break_here" label. I see that problem too with older versions, bisecting shows it has been fixed in commit 6e22494e5076 Do not skip prologue for asm (.S) files in gdb, which is included in gdb 7.10 and up. > With that, on a single CPU, it seems to work correctly every time, but > if I bring up a secondary CPU I start seeing the same problems you've > reported - which seems to need the following patch to solve. Please can > you check whether this resolves your problem? Yes that fixes the problem, the test case succeeds every time. I have stared at those lines in ptrace.c for some time, but couldn't find the problem. Thanks for looking into it!
On Wed, Jun 01, 2016 at 08:54:05AM -0400, Simon Marchi wrote: > On 16-05-30 05:35 PM, Russell King - ARM Linux wrote: > > So, the gdb verisons I have here seem to be particularly poor - but with > > some modifications, I can test out on iMX6 by forcing gdb to do the right > > thing - by inserting a couple of "mov r0, r0" instructions after the > > "break_here" label. > > I see that problem too with older versions, bisecting shows it has been fixed > in commit > > 6e22494e5076 Do not skip prologue for asm (.S) files > > in gdb, which is included in gdb 7.10 and up. > > > With that, on a single CPU, it seems to work correctly every time, but > > if I bring up a secondary CPU I start seeing the same problems you've > > reported - which seems to need the following patch to solve. Please can > > you check whether this resolves your problem? > > Yes that fixes the problem, the test case succeeds every time. I have stared > at those lines in ptrace.c for some time, but couldn't find the problem. Thanks > for looking into it! Hi, can I add a: Tested-by: Simon Marchi <simon.marchi@ericsson.com> tag to the commit please? Many thanks.
On 16-06-02 09:15 AM, Russell King - ARM Linux wrote: > Hi, can I add a: > > Tested-by: Simon Marchi <simon.marchi@ericsson.com> > > tag to the commit please? Yes, of course.
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c index ef9119f7462e..4d9375814b53 100644 --- a/arch/arm/kernel/ptrace.c +++ b/arch/arm/kernel/ptrace.c @@ -733,8 +733,8 @@ static int vfp_set(struct task_struct *target, if (ret) return ret; - vfp_flush_hwstate(thread); thread->vfpstate.hard = new_vfp; + vfp_flush_hwstate(thread); return 0; }