[0/2] arm64: kgdb/kdb: Fix pending single-step debugging issues

Message ID: 20220411093819.1012583-1-sumit.garg@linaro.org

Message

Sumit Garg April 11, 2022, 9:38 a.m. UTC
This patch-set reworks pending fixes from Wei's series [1] to make
single-step debugging via kgdb/kdb on arm64 work as expected. There was
a prior discussion on the ML [2] about whether we should keep interrupts
enabled during single-stepping, but it turns out that in the case of kgdb
it is risky to enable interrupts, as a resume after single-stepping an
interrupt handler sometimes leads to the following unbalanced locking
issue:

[  300.328300] WARNING: bad unlock balance detected!
[  300.328608] 5.18.0-rc1-00016-g3e732ebf7316-dirty #6 Not tainted
[  300.329058] -------------------------------------
[  300.329298] sh/173 is trying to release lock (dbg_slave_lock) at:
[  300.329718] [<ffffd57c951c016c>] kgdb_cpu_enter+0x7ac/0x820
[  300.330029] but there are no more locks to release!
[  300.330265] 
[  300.330265] other info that might help us debug this:
[  300.330668] 4 locks held by sh/173:
[  300.330891]  #0: ffff4f5e454d8438 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x98/0x204
[  300.331735]  #1: ffffd57c973bc2f0 (dbg_slave_lock){+.+.}-{2:2}, at: kgdb_cpu_enter+0x5b4/0x820
[  300.332259]  #2: ffffd57c973a9460 (rcu_read_lock){....}-{1:2}, at: kgdb_cpu_enter+0xe0/0x820
[  300.332717]  #3: ffffd57c973bc2a8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x1ec/0x820

So, I chose to keep interrupts disabled specifically for kgdb. This
series has been rebased to Linux 5.18-rc1 and I have dropped Doug's
review and test tags as there is significant rework involved.

[1] https://lore.kernel.org/all/20200509214159.19680-1-liwei391@huawei.com/
[2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@mail.gmail.com/

Sumit Garg (2):
  arm64: kgdb: Fix incorrect single stepping into the irq handler
  arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step

 arch/arm64/include/asm/debug-monitors.h |  1 +
 arch/arm64/kernel/debug-monitors.c      |  5 ++++
 arch/arm64/kernel/kgdb.c                | 35 +++++++++++++++++++++++--
 3 files changed, 39 insertions(+), 2 deletions(-)

Comments

Doug Anderson April 12, 2022, 12:09 a.m. UTC | #1
Hi,

On Mon, Apr 11, 2022 at 2:38 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> This patch-set reworks pending fixes from Wei's series [1] to make
> single-step debugging via kgdb/kdb on arm64 work as expected. There was
> a prior discussion on ML [2] regarding if we should keep the interrupts
> enabled during single-stepping but it turns out that in case of kgdb, it
> is risky to enable interrupts as sometimes a resume after single
> stepping an interrupt handler leads to following unbalanced locking
> issue:
>
> [  300.328300] WARNING: bad unlock balance detected!
> [  300.328608] 5.18.0-rc1-00016-g3e732ebf7316-dirty #6 Not tainted
> [  300.329058] -------------------------------------
> [  300.329298] sh/173 is trying to release lock (dbg_slave_lock) at:
> [  300.329718] [<ffffd57c951c016c>] kgdb_cpu_enter+0x7ac/0x820
> [  300.330029] but there are no more locks to release!
> [  300.330265]
> [  300.330265] other info that might help us debug this:
> [  300.330668] 4 locks held by sh/173:
> [  300.330891]  #0: ffff4f5e454d8438 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x98/0x204
> [  300.331735]  #1: ffffd57c973bc2f0 (dbg_slave_lock){+.+.}-{2:2}, at: kgdb_cpu_enter+0x5b4/0x820
> [  300.332259]  #2: ffffd57c973a9460 (rcu_read_lock){....}-{1:2}, at: kgdb_cpu_enter+0xe0/0x820
> [  300.332717]  #3: ffffd57c973bc2a8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x1ec/0x820
>
> So, I choose to keep interrupts disabled specifically for kgdb. This
> series has been rebased to Linux 5.18-rc1 and I have dropped Doug's
> review and test tags as there is significant rework involved.

Hmmmm. I guess it's really up to Will here, but re-reading his
previous email made it pretty clear that he wasn't willing to land a
solution that left interrupts disabled during step. He also pointed
out some things that would actually be broken, like single-stepping
over a call to irqs_disabled() or single-stepping over something that
caused an exception where the exception handler needed interrupts
enabled.

I thought he had a proposal at:

https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck

...that was supposed to make all the problems go away and it was just
that nobody had time to implement his proposal?

Sumit Garg April 13, 2022, 7:03 a.m. UTC | #2
Hi Doug,

Thanks for looking into this patch-set.

On Tue, 12 Apr 2022 at 05:39, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> Hmmmm. I guess it's really up to Will here, but re-reading his
> previous email made it pretty clear that he wasn't willing to land a
> solution that left interrupts disabled during step. He also pointed
> out some things that would actually be broken, like single-stepping
> over a call to irqs_disabled() or single-stepping over something that
> caused an exception where the exception handler needed interrupts
> enabled.
>
> I thought he had a proposal at:
>
> https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
>
> ...that was supposed to make all the problems go away and it was just
> that nobody had time to implement his proposal?
>

So I took a shot at Will's proposal as a replacement for patch #1 in v2
[1]. I hope that it is aligned with Will's thinking.

[1] https://lkml.org/lkml/2022/4/13/136

-Sumit
