mbox series

[0/4] arm64: kgdb/kdb: Fix single-step debugging issues

Message ID 20200509214159.19680-1-liwei391@huawei.com (mailing list archive)
Headers show
Series arm64: kgdb/kdb: Fix single-step debugging issues | expand

Message

Wei Li May 9, 2020, 9:41 p.m. UTC
This patch set is to fix several issues of single-step debugging
in kgdb/kdb on arm64.

It seems that these issues have been shelved a very long time,
but i still hope to solve them, as the single-step debugging
is an useful feature.

Note:
Based on patch "arm64: cacheflush: Fix KGDB trap detection",
https://www.spinics.net/lists/arm-kernel/msg803741.html

Wei Li (4):
  arm64: kgdb: Fix single-step exception handling oops
  arm64: Extract kprobes_save_local_irqflag() and
    kprobes_restore_local_irqflag()
  arm64: kgdb: Fix single-stepping into the irq handler wrongly
  arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step

 arch/arm64/include/asm/debug-monitors.h |  6 ++++++
 arch/arm64/kernel/debug-monitors.c      | 28 ++++++++++++++++++++++++-
 arch/arm64/kernel/kgdb.c                | 16 +++++++++++---
 arch/arm64/kernel/probes/kprobes.c      | 28 ++-----------------------
 4 files changed, 48 insertions(+), 30 deletions(-)

Comments

Doug Anderson May 14, 2020, 12:34 a.m. UTC | #1
Hi,

On Sat, May 9, 2020 at 6:49 AM Wei Li <liwei391@huawei.com> wrote:
>
> This patch set is to fix several issues of single-step debugging
> in kgdb/kdb on arm64.
>
> It seems that these issues have been shelved a very long time,
> but i still hope to solve them, as the single-step debugging
> is an useful feature.
>
> Note:
> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> https://www.spinics.net/lists/arm-kernel/msg803741.html
>
> Wei Li (4):
>   arm64: kgdb: Fix single-step exception handling oops
>   arm64: Extract kprobes_save_local_irqflag() and
>     kprobes_restore_local_irqflag()
>   arm64: kgdb: Fix single-stepping into the irq handler wrongly
>   arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
>
>  arch/arm64/include/asm/debug-monitors.h |  6 ++++++
>  arch/arm64/kernel/debug-monitors.c      | 28 ++++++++++++++++++++++++-
>  arch/arm64/kernel/kgdb.c                | 16 +++++++++++---
>  arch/arm64/kernel/probes/kprobes.c      | 28 ++-----------------------
>  4 files changed, 48 insertions(+), 30 deletions(-)

Just an overall note that I'm very happy that you posted this patch
series!  It's always been a thorn in my side that stepping and
breakpoints were so broken on arm64 and I'm really excited that you're
fixing them.  Now I'll have to get in the habit of using kgdb for more
than just debugging crashes and maybe I can use it more for exploring
how functions work more.  :-)

I'll also note that with your patch series I'm even seeing the "call"
feature of gdb working now.  That has always been terribly broken for
me.

-Doug
Wei Li May 16, 2020, 8:20 a.m. UTC | #2
Hi Douglas,

On 2020/5/14 8:34, Doug Anderson wrote:
> Hi,
> 
> On Sat, May 9, 2020 at 6:49 AM Wei Li <liwei391@huawei.com> wrote:
>>
>> This patch set is to fix several issues of single-step debugging
>> in kgdb/kdb on arm64.
>>
>> It seems that these issues have been shelved a very long time,
>> but i still hope to solve them, as the single-step debugging
>> is an useful feature.
>>
>> Note:
>> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
>> https://www.spinics.net/lists/arm-kernel/msg803741.html
>>
>> Wei Li (4):
>>   arm64: kgdb: Fix single-step exception handling oops
>>   arm64: Extract kprobes_save_local_irqflag() and
>>     kprobes_restore_local_irqflag()
>>   arm64: kgdb: Fix single-stepping into the irq handler wrongly
>>   arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
>>
>>  arch/arm64/include/asm/debug-monitors.h |  6 ++++++
>>  arch/arm64/kernel/debug-monitors.c      | 28 ++++++++++++++++++++++++-
>>  arch/arm64/kernel/kgdb.c                | 16 +++++++++++---
>>  arch/arm64/kernel/probes/kprobes.c      | 28 ++-----------------------
>>  4 files changed, 48 insertions(+), 30 deletions(-)
> 
> Just an overall note that I'm very happy that you posted this patch
> series!  It's always been a thorn in my side that stepping and
> breakpoints were so broken on arm64 and I'm really excited that you're
> fixing them.  Now I'll have to get in the habit of using kgdb for more
> than just debugging crashes and maybe I can use it more for exploring
> how functions work more.  :-)
> > I'll also note that with your patch series I'm even seeing the "call"
> feature of gdb working now.  That has always been terribly broken for
> me.
> 
Thanks for reviewing and testing this series.
Enjoy exploring the kernel with kgdb/kdb! :-)

Thanks,
Wei
Doug Anderson June 29, 2020, 9:20 p.m. UTC | #3
Wei,

On Sat, May 16, 2020 at 1:20 AM liwei (GF) <liwei391@huawei.com> wrote:
>
> Hi Douglas,
>
> On 2020/5/14 8:34, Doug Anderson wrote:
> > Hi,
> >
> > On Sat, May 9, 2020 at 6:49 AM Wei Li <liwei391@huawei.com> wrote:
> >>
> >> This patch set is to fix several issues of single-step debugging
> >> in kgdb/kdb on arm64.
> >>
> >> It seems that these issues have been shelved a very long time,
> >> but i still hope to solve them, as the single-step debugging
> >> is an useful feature.
> >>
> >> Note:
> >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> >>
> >> Wei Li (4):
> >>   arm64: kgdb: Fix single-step exception handling oops
> >>   arm64: Extract kprobes_save_local_irqflag() and
> >>     kprobes_restore_local_irqflag()
> >>   arm64: kgdb: Fix single-stepping into the irq handler wrongly
> >>   arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> >>
> >>  arch/arm64/include/asm/debug-monitors.h |  6 ++++++
> >>  arch/arm64/kernel/debug-monitors.c      | 28 ++++++++++++++++++++++++-
> >>  arch/arm64/kernel/kgdb.c                | 16 +++++++++++---
> >>  arch/arm64/kernel/probes/kprobes.c      | 28 ++-----------------------
> >>  4 files changed, 48 insertions(+), 30 deletions(-)
> >
> > Just an overall note that I'm very happy that you posted this patch
> > series!  It's always been a thorn in my side that stepping and
> > breakpoints were so broken on arm64 and I'm really excited that you're
> > fixing them.  Now I'll have to get in the habit of using kgdb for more
> > than just debugging crashes and maybe I can use it more for exploring
> > how functions work more.  :-)
> > > I'll also note that with your patch series I'm even seeing the "call"
> > feature of gdb working now.  That has always been terribly broken for
> > me.
> >
> Thanks for reviewing and testing this series.
> Enjoy exploring the kernel with kgdb/kdb! :-)

I'm curious to know if you plan another spin.  The feedback you
received was fairly minor so it hopefully shouldn't be too hard.  I'd
very much like to get your patches landed and I'd be happy to try to
address the feedback and spin them myself if you're no longer
available for it.

Thanks!

-Doug
Will Deacon June 30, 2020, 7:22 a.m. UTC | #4
On Mon, Jun 29, 2020 at 02:20:11PM -0700, Doug Anderson wrote:
> On Sat, May 16, 2020 at 1:20 AM liwei (GF) <liwei391@huawei.com> wrote:
> > On 2020/5/14 8:34, Doug Anderson wrote:
> > > On Sat, May 9, 2020 at 6:49 AM Wei Li <liwei391@huawei.com> wrote:
> > >>
> > >> This patch set is to fix several issues of single-step debugging
> > >> in kgdb/kdb on arm64.
> > >>
> > >> It seems that these issues have been shelved a very long time,
> > >> but i still hope to solve them, as the single-step debugging
> > >> is an useful feature.
> > >>
> > >> Note:
> > >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> > >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> > >>
> > >> Wei Li (4):
> > >>   arm64: kgdb: Fix single-step exception handling oops
> > >>   arm64: Extract kprobes_save_local_irqflag() and
> > >>     kprobes_restore_local_irqflag()
> > >>   arm64: kgdb: Fix single-stepping into the irq handler wrongly
> > >>   arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> > >>
> > >>  arch/arm64/include/asm/debug-monitors.h |  6 ++++++
> > >>  arch/arm64/kernel/debug-monitors.c      | 28 ++++++++++++++++++++++++-
> > >>  arch/arm64/kernel/kgdb.c                | 16 +++++++++++---
> > >>  arch/arm64/kernel/probes/kprobes.c      | 28 ++-----------------------
> > >>  4 files changed, 48 insertions(+), 30 deletions(-)
> > >
> > > Just an overall note that I'm very happy that you posted this patch
> > > series!  It's always been a thorn in my side that stepping and
> > > breakpoints were so broken on arm64 and I'm really excited that you're
> > > fixing them.  Now I'll have to get in the habit of using kgdb for more
> > > than just debugging crashes and maybe I can use it more for exploring
> > > how functions work more.  :-)
> > > > I'll also note that with your patch series I'm even seeing the "call"
> > > feature of gdb working now.  That has always been terribly broken for
> > > me.
> > >
> > Thanks for reviewing and testing this series.
> > Enjoy exploring the kernel with kgdb/kdb! :-)
> 
> I'm curious to know if you plan another spin.  The feedback you
> received was fairly minor so it hopefully shouldn't be too hard.  I'd
> very much like to get your patches landed and I'd be happy to try to
> address the feedback and spin them myself if you're no longer
> available for it.

I'd welcome some feedback on the proposal I sent out at the end of last
week:

https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck

which looks to solve some (all?) of these issues slightly differently,
because it turns out we need to perform some low-level surgery for
preempt-rt *anyway*...

I'm particularly keen to hear any thoughts concerning the reschedule on
return to EL1 after handling an interrupt that hit during a step.

Will
Doug Anderson July 6, 2020, 9:37 p.m. UTC | #5
Hi,

On Tue, Jun 30, 2020 at 12:22 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Jun 29, 2020 at 02:20:11PM -0700, Doug Anderson wrote:
> > On Sat, May 16, 2020 at 1:20 AM liwei (GF) <liwei391@huawei.com> wrote:
> > > On 2020/5/14 8:34, Doug Anderson wrote:
> > > > On Sat, May 9, 2020 at 6:49 AM Wei Li <liwei391@huawei.com> wrote:
> > > >>
> > > >> This patch set is to fix several issues of single-step debugging
> > > >> in kgdb/kdb on arm64.
> > > >>
> > > >> It seems that these issues have been shelved a very long time,
> > > >> but i still hope to solve them, as the single-step debugging
> > > >> is an useful feature.
> > > >>
> > > >> Note:
> > > >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> > > >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> > > >>
> > > >> Wei Li (4):
> > > >>   arm64: kgdb: Fix single-step exception handling oops
> > > >>   arm64: Extract kprobes_save_local_irqflag() and
> > > >>     kprobes_restore_local_irqflag()
> > > >>   arm64: kgdb: Fix single-stepping into the irq handler wrongly
> > > >>   arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> > > >>
> > > >>  arch/arm64/include/asm/debug-monitors.h |  6 ++++++
> > > >>  arch/arm64/kernel/debug-monitors.c      | 28 ++++++++++++++++++++++++-
> > > >>  arch/arm64/kernel/kgdb.c                | 16 +++++++++++---
> > > >>  arch/arm64/kernel/probes/kprobes.c      | 28 ++-----------------------
> > > >>  4 files changed, 48 insertions(+), 30 deletions(-)
> > > >
> > > > Just an overall note that I'm very happy that you posted this patch
> > > > series!  It's always been a thorn in my side that stepping and
> > > > breakpoints were so broken on arm64 and I'm really excited that you're
> > > > fixing them.  Now I'll have to get in the habit of using kgdb for more
> > > > than just debugging crashes and maybe I can use it more for exploring
> > > > how functions work more.  :-)
> > > > > I'll also note that with your patch series I'm even seeing the "call"
> > > > feature of gdb working now.  That has always been terribly broken for
> > > > me.
> > > >
> > > Thanks for reviewing and testing this series.
> > > Enjoy exploring the kernel with kgdb/kdb! :-)
> >
> > I'm curious to know if you plan another spin.  The feedback you
> > received was fairly minor so it hopefully shouldn't be too hard.  I'd
> > very much like to get your patches landed and I'd be happy to try to
> > address the feedback and spin them myself if you're no longer
> > available for it.
>
> I'd welcome some feedback on the proposal I sent out at the end of last
> week:
>
> https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
>
> which looks to solve some (all?) of these issues

Actually, I'm pretty sure that patch #1 of Wei Li's patch series would
still be needed.  Would you object to landing it now just to get one
patch outta the way?

https://lkml.kernel.org/r/20200509214159.19680-2-liwei391@huawei.com


> slightly differently,
> because it turns out we need to perform some low-level surgery for
> preempt-rt *anyway*...
>
> I'm particularly keen to hear any thoughts concerning the reschedule on
> return to EL1 after handling an interrupt that hit during a step.

Thanks for the pointer!  Unfortunately this is yet another area that
I'm keenly interested in it working but pretty much clueless about how
this whole area of the system works.  :-|

My first question, though, is why would we want interrupts enabled
while we're single stepping?  If you're trying to get the processor to
just step one instruction forward you don't really want a bunch of
IRQs going off in the middle of it, do you?  While in kgdb and
debugging the kernel I would assume that a single step would run the
very least amount of code that it could (other than debugger code).
In general every time I exit kgdb (and re-start running the kernel
normally) I sorta assume that there's a chance that something in the
system will hit a timeout (maybe it's talking to external hardware
that will give up or something).  If I'm stepping through code I just
want that little bit of code I'm running to move forward and the rest
of the kernel to stand still.  If I want the rest of the kernel to
proceed I guess I'd set a breakpoint and then say "continue".

It seemed like with Wei Li's patch that we were closing holes and
being more consistent about keeping interrupts disabled when stepping
and I liked that.  Maybe if we made it so that taking interrupts
didn't break everything then it would be _OK_ to let them fire, but if
I had a choice I'd rather they didn't.

...of course, I'm looking at all this from the point of view of kgdb.
I don't know nearly enough about how other parts of the kernel use
single step to comment much on what would be best for them.


Did what I said make sense at all, or was it gibberish?  ...or not
gibberish but not what you were looking for?


-Doug
Wei Li July 7, 2020, 1:37 a.m. UTC | #6
Hi Doug,

On 2020/6/30 5:20, Doug Anderson wrote:
> Wei,
> 
> On Sat, May 16, 2020 at 1:20 AM liwei (GF) <liwei391@huawei.com> wrote:
>>
>> Hi Douglas,
>>
>> On 2020/5/14 8:34, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Sat, May 9, 2020 at 6:49 AM Wei Li <liwei391@huawei.com> wrote:
>>>>
>>>> This patch set is to fix several issues of single-step debugging
>>>> in kgdb/kdb on arm64.
>>>>
>>>> It seems that these issues have been shelved a very long time,
>>>> but i still hope to solve them, as the single-step debugging
>>>> is an useful feature.
>>>>
>>>> Note:
>>>> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
>>>> https://www.spinics.net/lists/arm-kernel/msg803741.html
>>>>
>>>> Wei Li (4):
>>>>   arm64: kgdb: Fix single-step exception handling oops
>>>>   arm64: Extract kprobes_save_local_irqflag() and
>>>>     kprobes_restore_local_irqflag()
>>>>   arm64: kgdb: Fix single-stepping into the irq handler wrongly
>>>>   arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
>>>>
>>>>  arch/arm64/include/asm/debug-monitors.h |  6 ++++++
>>>>  arch/arm64/kernel/debug-monitors.c      | 28 ++++++++++++++++++++++++-
>>>>  arch/arm64/kernel/kgdb.c                | 16 +++++++++++---
>>>>  arch/arm64/kernel/probes/kprobes.c      | 28 ++-----------------------
>>>>  4 files changed, 48 insertions(+), 30 deletions(-)
>>>
>>> Just an overall note that I'm very happy that you posted this patch
>>> series!  It's always been a thorn in my side that stepping and
>>> breakpoints were so broken on arm64 and I'm really excited that you're
>>> fixing them.  Now I'll have to get in the habit of using kgdb for more
>>> than just debugging crashes and maybe I can use it more for exploring
>>> how functions work more.  :-)
>>>> I'll also note that with your patch series I'm even seeing the "call"
>>> feature of gdb working now.  That has always been terribly broken for
>>> me.
>>>
>> Thanks for reviewing and testing this series.
>> Enjoy exploring the kernel with kgdb/kdb! :-)
> 
> I'm curious to know if you plan another spin.  The feedback you
> received was fairly minor so it hopefully shouldn't be too hard.  I'd
> very much like to get your patches landed and I'd be happy to try to
> address the feedback and spin them myself if you're no longer
> available for it.
> 

Sorry for the long delay. I was busy on a project and missed some mail.
I did receive some feedback about typo or coding style before, so i was expecting
more comment about the logic or people just don't care about these. After all,
this issue has lived a very long time.

It's a good news to hear that Will has plan to solve these issues, i will
follow that to do my bit.
Before that, i can send the next version in this week.


Thanks,
Wei
Will Deacon July 8, 2020, 10:02 p.m. UTC | #7
On Sun, 10 May 2020 05:41:55 +0800, Wei Li wrote:
> This patch set is to fix several issues of single-step debugging
> in kgdb/kdb on arm64.
> 
> It seems that these issues have been shelved a very long time,
> but i still hope to solve them, as the single-step debugging
> is an useful feature.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: kgdb: Fix single-step exception handling oops
      https://git.kernel.org/arm64/c/8523c006264d

Cheers,
Will Deacon July 8, 2020, 10:06 p.m. UTC | #8
Doug,

On Mon, Jul 06, 2020 at 02:37:05PM -0700, Doug Anderson wrote:
> On Tue, Jun 30, 2020 at 12:22 AM Will Deacon <will@kernel.org> wrote:
> > On Mon, Jun 29, 2020 at 02:20:11PM -0700, Doug Anderson wrote:
> > > On Sat, May 16, 2020 at 1:20 AM liwei (GF) <liwei391@huawei.com> wrote:
> > > > On 2020/5/14 8:34, Doug Anderson wrote:
> > > > > On Sat, May 9, 2020 at 6:49 AM Wei Li <liwei391@huawei.com> wrote:
> > > > >>
> > > > >> This patch set is to fix several issues of single-step debugging
> > > > >> in kgdb/kdb on arm64.
> > > > >>
> > > > >> It seems that these issues have been shelved a very long time,
> > > > >> but i still hope to solve them, as the single-step debugging
> > > > >> is an useful feature.
> > > > >>
> > > > >> Note:
> > > > >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> > > > >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> > > > >>
> > > > >> Wei Li (4):
> > > > >>   arm64: kgdb: Fix single-step exception handling oops
> > > > >>   arm64: Extract kprobes_save_local_irqflag() and
> > > > >>     kprobes_restore_local_irqflag()
> > > > >>   arm64: kgdb: Fix single-stepping into the irq handler wrongly
> > > > >>   arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> > > > >>
> > > > >>  arch/arm64/include/asm/debug-monitors.h |  6 ++++++
> > > > >>  arch/arm64/kernel/debug-monitors.c      | 28 ++++++++++++++++++++++++-
> > > > >>  arch/arm64/kernel/kgdb.c                | 16 +++++++++++---
> > > > >>  arch/arm64/kernel/probes/kprobes.c      | 28 ++-----------------------
> > > > >>  4 files changed, 48 insertions(+), 30 deletions(-)
> > > > >
> > > > > Just an overall note that I'm very happy that you posted this patch
> > > > > series!  It's always been a thorn in my side that stepping and
> > > > > breakpoints were so broken on arm64 and I'm really excited that you're
> > > > > fixing them.  Now I'll have to get in the habit of using kgdb for more
> > > > > than just debugging crashes and maybe I can use it more for exploring
> > > > > how functions work more.  :-)
> > > > > > I'll also note that with your patch series I'm even seeing the "call"
> > > > > feature of gdb working now.  That has always been terribly broken for
> > > > > me.
> > > > >
> > > > Thanks for reviewing and testing this series.
> > > > Enjoy exploring the kernel with kgdb/kdb! :-)
> > >
> > > I'm curious to know if you plan another spin.  The feedback you
> > > received was fairly minor so it hopefully shouldn't be too hard.  I'd
> > > very much like to get your patches landed and I'd be happy to try to
> > > address the feedback and spin them myself if you're no longer
> > > available for it.
> >
> > I'd welcome some feedback on the proposal I sent out at the end of last
> > week:
> >
> > https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
> >
> > which looks to solve some (all?) of these issues
> 
> Actually, I'm pretty sure that patch #1 of Wei Li's patch series would
> still be needed.  Would you object to landing it now just to get one
> patch outta the way?
> 
> https://lkml.kernel.org/r/20200509214159.19680-2-liwei391@huawei.com

I've grabbed patch 1, cheers.

> > slightly differently,
> > because it turns out we need to perform some low-level surgery for
> > preempt-rt *anyway*...
> >
> > I'm particularly keen to hear any thoughts concerning the reschedule on
> > return to EL1 after handling an interrupt that hit during a step.
> 
> Thanks for the pointer!  Unfortunately this is yet another area that
> I'm keenly interested in it working but pretty much clueless about how
> this whole area of the system works.  :-|

I'm also keen to fix it up but, although I roughly know how it works, I
always fail to find the time to spend on it. :-|

> My first question, though, is why would we want interrupts enabled
> while we're single stepping?  If you're trying to get the processor to
> just step one instruction forward you don't really want a bunch of
> IRQs going off in the middle of it, do you?  While in kgdb and
> debugging the kernel I would assume that a single step would run the
> very least amount of code that it could (other than debugger code).
> In general every time I exit kgdb (and re-start running the kernel
> normally) I sorta assume that there's a chance that something in the
> system will hit a timeout (maybe it's talking to external hardware
> that will give up or something).  If I'm stepping through code I just
> want that little bit of code I'm running to move forward and the rest
> of the kernel to stand still.  If I want the rest of the kernel to
> proceed I guess I'd set a breakpoint and then say "continue".

I understand where you're coming from, but I also think it's a reasonably
narrow viewpoint. If you disable IRQs during a step, you can change the
behaviour of the instruction being stepped. For example, an MRS of DAIF
will now see the I bit set instead of clear, and so something like
irqs_disabled() could go wrong while being stepped. But I think the main
realisation is that instructions being stepped can generate exceptions for
other reasons too, for example if you try to step a BUG() or a get_user().
Not only does that mean that we have to deal with exceptions during a step,
but it also means that disabling interrupts is a pretty bad idea because
the exception context may rely on them being enabled (for example, if it has
to sleep while handling a page fault).

> It seemed like with Wei Li's patch that we were closing holes and
> being more consistent about keeping interrupts disabled when stepping
> and I liked that.  Maybe if we made it so that taking interrupts
> didn't break everything then it would be _OK_ to let them fire, but if
> I had a choice I'd rather they didn't.

Although I appreciate somebody sending patches to improve this logic,
I really worry that it just moves the problem around and it won't be long
before somebody else sends another set of patches trying to deal with the
fallout. That ends up being a waste of everybody's time.

> ...of course, I'm looking at all this from the point of view of kgdb.
> I don't know nearly enough about how other parts of the kernel use
> single step to comment much on what would be best for them.

Sure, and I see you as Mr KGDB in this area anyway, but that's an important
user of this infrastructure (esp. single-step).

> Did what I said make sense at all, or was it gibberish?  ...or not
> gibberish but not what you were looking for?

I'm fairly set on unmasking IRQs during step for the reasons I mention
above. The question is really whether or not to forcibly prevent preemption
during such an irq.

Will
Doug Anderson July 8, 2020, 10:22 p.m. UTC | #9
Hi,

On Wed, Jul 8, 2020 at 3:06 PM Will Deacon <will@kernel.org> wrote:
>
> Doug,
>
> On Mon, Jul 06, 2020 at 02:37:05PM -0700, Doug Anderson wrote:
> > On Tue, Jun 30, 2020 at 12:22 AM Will Deacon <will@kernel.org> wrote:
> > > On Mon, Jun 29, 2020 at 02:20:11PM -0700, Doug Anderson wrote:
> > > > On Sat, May 16, 2020 at 1:20 AM liwei (GF) <liwei391@huawei.com> wrote:
> > > > > On 2020/5/14 8:34, Doug Anderson wrote:
> > > > > > On Sat, May 9, 2020 at 6:49 AM Wei Li <liwei391@huawei.com> wrote:
> > > > > >>
> > > > > >> This patch set is to fix several issues of single-step debugging
> > > > > >> in kgdb/kdb on arm64.
> > > > > >>
> > > > > >> It seems that these issues have been shelved a very long time,
> > > > > >> but i still hope to solve them, as the single-step debugging
> > > > > >> is an useful feature.
> > > > > >>
> > > > > >> Note:
> > > > > >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> > > > > >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> > > > > >>
> > > > > >> Wei Li (4):
> > > > > >>   arm64: kgdb: Fix single-step exception handling oops
> > > > > >>   arm64: Extract kprobes_save_local_irqflag() and
> > > > > >>     kprobes_restore_local_irqflag()
> > > > > >>   arm64: kgdb: Fix single-stepping into the irq handler wrongly
> > > > > >>   arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> > > > > >>
> > > > > >>  arch/arm64/include/asm/debug-monitors.h |  6 ++++++
> > > > > >>  arch/arm64/kernel/debug-monitors.c      | 28 ++++++++++++++++++++++++-
> > > > > >>  arch/arm64/kernel/kgdb.c                | 16 +++++++++++---
> > > > > >>  arch/arm64/kernel/probes/kprobes.c      | 28 ++-----------------------
> > > > > >>  4 files changed, 48 insertions(+), 30 deletions(-)
> > > > > >
> > > > > > Just an overall note that I'm very happy that you posted this patch
> > > > > > series!  It's always been a thorn in my side that stepping and
> > > > > > breakpoints were so broken on arm64 and I'm really excited that you're
> > > > > > fixing them.  Now I'll have to get in the habit of using kgdb for more
> > > > > > than just debugging crashes and maybe I can use it more for exploring
> > > > > > how functions work more.  :-)
> > > > > > > I'll also note that with your patch series I'm even seeing the "call"
> > > > > > feature of gdb working now.  That has always been terribly broken for
> > > > > > me.
> > > > > >
> > > > > Thanks for reviewing and testing this series.
> > > > > Enjoy exploring the kernel with kgdb/kdb! :-)
> > > >
> > > > I'm curious to know if you plan another spin.  The feedback you
> > > > received was fairly minor so it hopefully shouldn't be too hard.  I'd
> > > > very much like to get your patches landed and I'd be happy to try to
> > > > address the feedback and spin them myself if you're no longer
> > > > available for it.
> > >
> > > I'd welcome some feedback on the proposal I sent out at the end of last
> > > week:
> > >
> > > https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
> > >
> > > which looks to solve some (all?) of these issues
> >
> > Actually, I'm pretty sure that patch #1 of Wei Li's patch series would
> > still be needed.  Would you object to landing it now just to get one
> > patch outta the way?
> >
> > https://lkml.kernel.org/r/20200509214159.19680-2-liwei391@huawei.com
>
> I've grabbed patch 1, cheers.

Thanks!


> > > slightly differently,
> > > because it turns out we need to perform some low-level surgery for
> > > preempt-rt *anyway*...
> > >
> > > I'm particularly keen to hear any thoughts concerning the reschedule on
> > > return to EL1 after handling an interrupt that hit during a step.
> >
> > Thanks for the pointer!  Unfortunately this is yet another area that
> > I'm keenly interested in it working but pretty much clueless about how
> > this whole area of the system works.  :-|
>
> I'm also keen to fix it up but, although I roughly know how it works, I
> always fail to find the time to spend on it. :-|

Yeah, I know how it is.  I love working on kgdb but I always have a
hard time putting it ahead of other tasks and there are always other
tasks.  Knowing that other people use it helps me, at least, and some
of my recent work on kgdb was because a whole pile of other people
were all discussing how to get kgdb working.  ;-)


> > My first question, though, is why would we want interrupts enabled
> > while we're single stepping?  If you're trying to get the processor to
> > just step one instruction forward you don't really want a bunch of
> > IRQs going off in the middle of it, do you?  While in kgdb and
> > debugging the kernel I would assume that a single step would run the
> > very least amount of code that it could (other than debugger code).
> > In general every time I exit kgdb (and re-start running the kernel
> > normally) I sorta assume that there's a chance that something in the
> > system will hit a timeout (maybe it's talking to external hardware
> > that will give up or something).  If I'm stepping through code I just
> > want that little bit of code I'm running to move forward and the rest
> > of the kernel to stand still.  If I want the rest of the kernel to
> > proceed I guess I'd set a breakpoint and then say "continue".
>
> I understand where you're coming from, but I also think it's a reasonably
> narrow viewpoint. If you disable IRQs during a step, you can change the
> behaviour of the instruction being stepped. For example, an MRS of DAIF
> will now see the I bit set instead of clear, and so something like
> irqs_disabled() could go wrong while being stepped. But I think the main
> realisation is that instructions being stepped can generate exceptions for
> other reasons too, for example if you try to step a BUG() or a get_user().
> Not only does that mean that we have to deal with exceptions during a step,
> but it also means that disabling interrupts is a pretty bad idea because
> the exception context may rely on them being enabled (for example, if it has
> to sleep while handling a page fault).

OK, you make a fair argument.


> > It seemed like with Wei Li's patch that we were closing holes and
> > being more consistent about keeping interrupts disabled when stepping
> > and I liked that.  Maybe if we made it so that taking interrupts
> > didn't break everything then it would be _OK_ to let them fire, but if
> > I had a choice I'd rather they didn't.
>
> Although I appreciate somebody sending patches to improve this logic,
> I really worry that it just moves the problem around and it won't be long
> before somebody else sends another set of patches trying to deal with the
> fallout. That ends up being a waste of everybody's time.

Sure, though I think right now all of the non-kgdb stepping logic
already disables interrupts, right?  ...and the kgdb logic for single
step is (and has always been) very broken for arm64.  Unless there is
hope in the short or medium term of the "right" solution getting done,
it does feel like Wei Li's patches improve the situation...


> > ...of course, I'm looking at all this from the point of view of kgdb.
> > I don't know nearly enough about how other parts of the kernel use
> > single step to comment much on what would be best for them.
>
> Sure, and I see you as Mr KGDB in this area anyway, but that's an important
> user of this infrastructure (esp. single-step).

Yeah, I think that's how I'll be most useful in this situation.  While
it'd be really cool to understand all the inner workings of this code,
realistically I don't have time and it's probably better for me to
treat it as a black box and just be useful as a tester of the end
result.


> > Did what I said make sense at all, or was it gibberish?  ...or not
> > gibberish but not what you were looking for?
>
> I'm fairly set on unmasking IRQs during step for the reasons I mention
> above. The question is really whether or not to forcibly prevent preemption
> during such an irq.

My first take would be the same argument I made for keeping IRQs off:
the fewer things that happen during single stepping the better.

-Doug