[RFC,0/2] arm64 kgdb fixes for single stepping

Message ID 20200213031131.13255-1-minyard@acm.org (mailing list archive)

Message

Corey Minyard Feb. 13, 2020, 3:11 a.m. UTC
I got a bug report about using kgdb on arm64, and it turns out it was
fairly broken.  Patch 2 has a description of what was going on.  I am
using a Marvell 8100 board.

The following patches fix the problem, but probably not in the
best way.  They are what I hacked out to show the problems.

I am not quite sure how this will interact with kprobes and hardware
breakpoints which use the same code, but they would have been broken,
too, so this is not making them any worse.

Comments

Will Deacon Feb. 13, 2020, 10:10 a.m. UTC | #1
On Wed, Feb 12, 2020 at 09:11:29PM -0600, minyard@acm.org wrote:
> I got a bug report about using kgdb on arm64, and it turns out it was
> fairly broken.  Patch 2 has a description of what was going on.  I am
> using a Marvell 8100 board.
> 
> The following patches fix the problem, but probably not in the
> best way.  They are what I hacked out to show the problems.
> 
> I am not quite sure how this will interact with kprobes and hardware
> breakpoints which use the same code, but they would have been broken,
> too, so this is not making them any worse.

This should all be handled by kgdb itself, not by changing the low-level
debug exception handling. For example, the '&kgdb_step_hook' can take
care of re-arming the step state machine and kgdb can also simply disable
interrupts during the step if it doesn't want to step into the handler.

Will
Corey Minyard Feb. 13, 2020, 3:57 p.m. UTC | #2
On Thu, Feb 13, 2020 at 10:10:58AM +0000, Will Deacon wrote:
> On Wed, Feb 12, 2020 at 09:11:29PM -0600, minyard@acm.org wrote:
> > I got a bug report about using kgdb on arm64, and it turns out it was
> > fairly broken.  Patch 2 has a description of what was going on.  I am
> > using a Marvell 8100 board.
> > 
> > The following patches fix the problem, but probably not in the
> > best way.  They are what I hacked out to show the problems.
> > 
> > I am not quite sure how this will interact with kprobes and hardware
> > breakpoints which use the same code, but they would have been broken,
> > too, so this is not making them any worse.
> 
> This should all be handled by kgdb itself, not by changing the low-level
> debug exception handling. For example, the '&kgdb_step_hook' can take
> care of re-arming the step state machine and kgdb can also simply disable
> interrupts during the step if it doesn't want to step into the handler.

How can kgdb disable the SS bit in MDSCR, or re-enable it on the right
CPU, without doing this in the exception handling?

I'm actually thinking that this may be a hardware bug.  Looking at the
ARMv8 manual, it looks like PSTATE.SS should be set to 0 if the
processor takes an exception.  That's definitely not happening; if I do
an instruction step from, say, sys_sync(), it gets the single-step trap
on the instruction after the PSTATE.D bit is disabled in el1_irq.

Even so, I think the migration issue is still a problem.  If you do an
eret set up for single-step, and interrupts are on, and you get a timer
interrupt, it could migrate the task to a different CPU if
PREEMPT_ENABLE is set, right?  If so, the MDSCR.SS bit will be set on
the wrong CPU and the single step trap won't happen.  That will break
kprobes, too.

You mention turning off interrupts in kgdb when single-stepping, which
you could do and it would solve this problem.  But it wouldn't solve the
problem of taking a paging exception, which you want to take in this
case.  And you could still migrate on a paging exception.  So I don't
think disabling interrupts is a good solution.

I don't see a solution besides clearing MDSCR.SS on an el1 exception
entry and conditionally setting it on an el1 exception return.  It might
be better to have a thread flag to do this instead of depending on the
setting of that bit; I'm not sure how expensive accessing the MDSCR
register is.
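
To make the entry/return idea concrete, here is a rough user-space model
of it.  This is only a sketch, not kernel code: the per-CPU array, the
task struct, the step_pending flag and every function name below are
made up for illustration, and real MDSCR handling would sit in the
arm64 exception entry path.  It just shows a step armed on CPU 0, an
exception that migrates the task to CPU 1, and whether the step trap
can still fire with and without the entry/return hooks.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical model: one MDSCR.SS bit per CPU. */
static bool mdscr_ss[NR_CPUS];

/* Hypothetical task state; step_pending plays the role of a thread flag. */
struct task {
	int cpu;            /* CPU the task is currently running on */
	bool step_pending;  /* the debugger asked for a single step */
};

/* Debugger arms a step: set the flag and MDSCR.SS on the current CPU. */
static void arm_step(struct task *t)
{
	t->step_pending = true;
	mdscr_ss[t->cpu] = true;
}

/* EL1 exception entry: clear MDSCR.SS so the handler is not stepped. */
static void el1_entry(struct task *t)
{
	mdscr_ss[t->cpu] = false;
}

/* The scheduler moves the task while the exception is being handled. */
static void migrate(struct task *t, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	t->cpu = cpu;
}

/* EL1 exception return: re-arm on whichever CPU the task is on now. */
static void el1_exit(struct task *t)
{
	if (t->step_pending)
		mdscr_ss[t->cpu] = true;
}

/* The step trap fires only if the CPU running the task has SS set. */
static bool step_trap_fires(const struct task *t)
{
	return t->step_pending && mdscr_ss[t->cpu];
}

/* Arm a step on CPU 0, then take an exception that migrates to CPU 1. */
static bool run_scenario(bool use_entry_exit_hooks)
{
	struct task t = { .cpu = 0, .step_pending = false };

	mdscr_ss[0] = mdscr_ss[1] = false;
	arm_step(&t);
	if (use_entry_exit_hooks)
		el1_entry(&t);
	migrate(&t, 1);
	if (use_entry_exit_hooks)
		el1_exit(&t);
	return step_trap_fires(&t);
}
```

In this model run_scenario(false) leaves the bit set on CPU 0 only, so
the trap is lost after the migration, while run_scenario(true) clears
it on CPU 0 and sets it on CPU 1.  Keying the return path off a flag in
the task rather than off the register is what lets it follow the task.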

Setting SPSR.SS on subsequent single steps is definitely an issue, but I
can split that out into a separate patch.

-corey

> 
> Will