arm64: don't preempt_disable in do_debug_exception

Message ID 1592501369-27645-1-git-send-email-paul.gortmaker@windriver.com
State New, archived

Commit Message

Paul Gortmaker June 18, 2020, 5:29 p.m. UTC
In commit d8bb6718c4db ("arm64: Make debug exception handlers visible
from RCU") debug_exception_enter and exit were added to deal with better
tracking of RCU state - however, in addition to that, but not mentioned
in the commit log, a preempt_disable/enable pair were also added.

Based on the comment (being removed here) it would seem that the pair
were not added to address a specific problem, but just as a proactive,
preventative measure - as in "seemed like a good idea at the time".

The problem is that do_debug_exception() eventually calls out to
generic kernel code like do_force_sig_info() which takes non-raw locks
and on -rt enabled kernels, results in things looking like the following,
since on -rt kernels, it is noticed that preemption is still disabled.

 BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:975
 in_atomic(): 1, irqs_disabled(): 0, pid: 35658, name: gdbtest
 Preemption disabled at:
 [<ffff000010081578>] do_debug_exception+0x38/0x1a4
 Call trace:
 dump_backtrace+0x0/0x138
 show_stack+0x24/0x30
 dump_stack+0x94/0xbc
 ___might_sleep+0x13c/0x168
 rt_spin_lock+0x40/0x80
 do_force_sig_info+0x30/0xe0
 force_sig_fault+0x64/0x90
 arm64_force_sig_fault+0x50/0x80
 send_user_sigtrap+0x50/0x80
 brk_handler+0x98/0xc8
 do_debug_exception+0x70/0x1a4
 el0_dbg+0x18/0x20

The reproducer was basically an automated gdb test that set a breakpoint
on a simple "hello world" program and then quit gdb once the breakpoint
was hit - i.e. "(gdb) A debugging session is active.  Quit anyway? "

Fixes: d8bb6718c4db ("arm64: Make debug exception handlers visible from RCU")
Cc: stable@vger.kernel.org
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/arm64/mm/fault.c | 11 -----------
 1 file changed, 11 deletions(-)
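
The failure mode described in the commit message can be modelled in userspace. What follows is a toy sketch, not kernel code: every `model_*` name is hypothetical, and it assumes only what the log above shows, namely that on -rt a non-raw lock is taken via rt_spin_lock(), which runs a might-sleep check that fires whenever the preempt count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model: all model_* names are hypothetical stand-ins, not kernel APIs. */
static int model_preempt_count;	/* stands in for the task's preempt count */
static int model_splats;	/* number of "sleeping function" reports */

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

static bool model_in_atomic(void)
{
	return model_preempt_count != 0;
}

/* Stands in for ___might_sleep(): report if we could sleep while atomic. */
static void model_might_sleep(const char *where)
{
	if (model_in_atomic()) {
		model_splats++;
		fprintf(stderr,
			"BUG: sleeping function called from invalid context at %s\n",
			where);
	}
}

/* On -rt, a non-raw spinlock may sleep, so taking it runs the check. */
static void model_rt_spin_lock(void)
{
	model_might_sleep("rt_spin_lock");
}

/* Stands in for the SIGTRAP delivery path, which takes a non-raw lock. */
static void model_send_sigtrap(void)
{
	model_rt_spin_lock();
}

/* Returns how many reports one pass through the handler produced. */
static int model_do_debug_exception(bool disable_preemption)
{
	int before = model_splats;

	if (disable_preemption)
		model_preempt_disable();
	model_send_sigtrap();
	if (disable_preemption)
		model_preempt_enable();

	return model_splats - before;
}
```

Calling `model_do_debug_exception(true)` models the current code and produces one report; calling it with `false` models the patched code and produces none. The model says nothing about whether the rest of the debug path is safe to run preemptibly.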

Comments

Will Deacon June 23, 2020, 3:59 p.m. UTC | #1
On Thu, Jun 18, 2020 at 01:29:29PM -0400, Paul Gortmaker wrote:
> In commit d8bb6718c4db ("arm64: Make debug exception handlers visible
> from RCU") debug_exception_enter and exit were added to deal with better
> tracking of RCU state - however, in addition to that, but not mentioned
> in the commit log, a preempt_disable/enable pair were also added.
> 
> Based on the comment (being removed here) it would seem that the pair
> were not added to address a specific problem, but just as a proactive,
> preventative measure - as in "seemed like a good idea at the time".
> 
> The problem is that do_debug_exception() eventually calls out to
> generic kernel code like do_force_sig_info() which takes non-raw locks
> and on -rt enabled kernels, results in things looking like the following,
> since on -rt kernels, it is noticed that preemption is still disabled.
> 
>  BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:975
>  in_atomic(): 1, irqs_disabled(): 0, pid: 35658, name: gdbtest
>  Preemption disabled at:
>  [<ffff000010081578>] do_debug_exception+0x38/0x1a4
>  Call trace:
>  dump_backtrace+0x0/0x138
>  show_stack+0x24/0x30
>  dump_stack+0x94/0xbc
>  ___might_sleep+0x13c/0x168
>  rt_spin_lock+0x40/0x80
>  do_force_sig_info+0x30/0xe0
>  force_sig_fault+0x64/0x90
>  arm64_force_sig_fault+0x50/0x80
>  send_user_sigtrap+0x50/0x80
>  brk_handler+0x98/0xc8
>  do_debug_exception+0x70/0x1a4
>  el0_dbg+0x18/0x20
> 
> The reproducer was basically an automated gdb test that set a breakpoint
> on a simple "hello world" program and then quit gdb once the breakpoint
> was hit - i.e. "(gdb) A debugging session is active.  Quit anyway? "

Hmm, the debug exception handler path was definitely written with the
expectation that preemption is disabled, so this is unfortunate. For
exceptions from kernelspace, we need to keep that guarantee as we implement
things like BUG() using this path. For exceptions from userspace, it's
plausible that we could re-enable preemption, but then we should also
re-enable interrupts and debug exceptions too because we don't
context-switch pstate in switch_to() and we would end up with holes in our
kernel debug coverage (and these might be fatal if e.g. single step doesn't
work in a kprobe OOL buffer). However, that then means that any common code
when handling user and kernel debug exceptions needs to be re-entrant,
which it probably isn't at the moment (I haven't checked).

So although I'm alright with this idea for user debug exceptions, I think
it needs more work.

Will
Mark Rutland June 23, 2020, 4:55 p.m. UTC | #2
On Tue, Jun 23, 2020 at 04:59:01PM +0100, Will Deacon wrote:
> On Thu, Jun 18, 2020 at 01:29:29PM -0400, Paul Gortmaker wrote:
> > In commit d8bb6718c4db ("arm64: Make debug exception handlers visible
> > from RCU") debug_exception_enter and exit were added to deal with better
> > tracking of RCU state - however, in addition to that, but not mentioned
> > in the commit log, a preempt_disable/enable pair were also added.
> > 
> > Based on the comment (being removed here) it would seem that the pair
> > were not added to address a specific problem, but just as a proactive,
> > preventative measure - as in "seemed like a good idea at the time".
> > 
> > The problem is that do_debug_exception() eventually calls out to
> > generic kernel code like do_force_sig_info() which takes non-raw locks
> > and on -rt enabled kernels, results in things looking like the following,
> > since on -rt kernels, it is noticed that preemption is still disabled.
> > 
> >  BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:975
> >  in_atomic(): 1, irqs_disabled(): 0, pid: 35658, name: gdbtest
> >  Preemption disabled at:
> >  [<ffff000010081578>] do_debug_exception+0x38/0x1a4
> >  Call trace:
> >  dump_backtrace+0x0/0x138
> >  show_stack+0x24/0x30
> >  dump_stack+0x94/0xbc
> >  ___might_sleep+0x13c/0x168
> >  rt_spin_lock+0x40/0x80
> >  do_force_sig_info+0x30/0xe0
> >  force_sig_fault+0x64/0x90
> >  arm64_force_sig_fault+0x50/0x80
> >  send_user_sigtrap+0x50/0x80
> >  brk_handler+0x98/0xc8
> >  do_debug_exception+0x70/0x1a4
> >  el0_dbg+0x18/0x20
> > 
> > The reproducer was basically an automated gdb test that set a breakpoint
> > on a simple "hello world" program and then quit gdb once the breakpoint
> > was hit - i.e. "(gdb) A debugging session is active.  Quit anyway? "
> 
> Hmm, the debug exception handler path was definitely written with the
> expectation that preemption is disabled, so this is unfortunate. For
> exceptions from kernelspace, we need to keep that guarantee as we implement
> things like BUG() using this path. For exceptions from userspace, it's
> plausible that we could re-enable preemption, but then we should also
> re-enable interrupts and debug exceptions too because we don't
> context-switch pstate in switch_to() and we would end up with holes in our
> kernel debug coverage (and these might be fatal if e.g. single step doesn't
> work in a kprobe OOL buffer). However, that then means that any common code
> when handling user and kernel debug exceptions needs to be re-entrant,
> which it probably isn't at the moment (I haven't checked).

I'm pretty certain existing code is not reentrant, and regardless it's
going to be a mess to reason about this generally if we have to undo our
strict exception nesting rules.

I reckon we need to treat this like an NMI instead -- is that plausible?

Mark.
Masami Hiramatsu June 25, 2020, 4:03 p.m. UTC | #3
On Tue, 23 Jun 2020 17:55:57 +0100
Mark Rutland <mark.rutland@arm.com> wrote:

> On Tue, Jun 23, 2020 at 04:59:01PM +0100, Will Deacon wrote:
> > On Thu, Jun 18, 2020 at 01:29:29PM -0400, Paul Gortmaker wrote:
> > > In commit d8bb6718c4db ("arm64: Make debug exception handlers visible
> > > from RCU") debug_exception_enter and exit were added to deal with better
> > > tracking of RCU state - however, in addition to that, but not mentioned
> > > in the commit log, a preempt_disable/enable pair were also added.
> > > 
> > > Based on the comment (being removed here) it would seem that the pair
> > > were not added to address a specific problem, but just as a proactive,
> > > preventative measure - as in "seemed like a good idea at the time".
> > > 
> > > The problem is that do_debug_exception() eventually calls out to
> > > generic kernel code like do_force_sig_info() which takes non-raw locks
> > > and on -rt enabled kernels, results in things looking like the following,
> > > since on -rt kernels, it is noticed that preemption is still disabled.
> > > 
> > >  BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:975
> > >  in_atomic(): 1, irqs_disabled(): 0, pid: 35658, name: gdbtest
> > >  Preemption disabled at:
> > >  [<ffff000010081578>] do_debug_exception+0x38/0x1a4
> > >  Call trace:
> > >  dump_backtrace+0x0/0x138
> > >  show_stack+0x24/0x30
> > >  dump_stack+0x94/0xbc
> > >  ___might_sleep+0x13c/0x168
> > >  rt_spin_lock+0x40/0x80
> > >  do_force_sig_info+0x30/0xe0
> > >  force_sig_fault+0x64/0x90
> > >  arm64_force_sig_fault+0x50/0x80
> > >  send_user_sigtrap+0x50/0x80
> > >  brk_handler+0x98/0xc8
> > >  do_debug_exception+0x70/0x1a4
> > >  el0_dbg+0x18/0x20
> > > 
> > > The reproducer was basically an automated gdb test that set a breakpoint
> > > on a simple "hello world" program and then quit gdb once the breakpoint
> > > was hit - i.e. "(gdb) A debugging session is active.  Quit anyway? "
> > 
> > Hmm, the debug exception handler path was definitely written with the
> > expectation that preemption is disabled, so this is unfortunate. For
> > exceptions from kernelspace, we need to keep that guarantee as we implement
> > things like BUG() using this path. For exceptions from userspace, it's
> > plausible that we could re-enable preemption, but then we should also
> > re-enable interrupts and debug exceptions too because we don't
> > context-switch pstate in switch_to() and we would end up with holes in our
> > kernel debug coverage (and these might be fatal if e.g. single step doesn't
> > work in a kprobe OOL buffer). However, that then means that any common code
> > when handling user and kernel debug exceptions needs to be re-entrant,
> > which it probably isn't at the moment (I haven't checked).
> 
> I'm pretty certain existing code is not reentrant, and regardless it's
> going to be a mess to reason about this generally if we have to undo our
> strict exception nesting rules.

This sounds like the case where a kprobe post-handler hits another kprobe,
which invokes the debug handler while already in debug context. When kprobes
detects that, it skips the nested probe, but it still needs to single-step
the probed instruction in order to exit.
Is that not possible on arm64?

Thank you,

> 
> I reckon we need to treat this like an NMI instead -- is that plausible?
> 
> Mark.
Will Deacon June 26, 2020, 9:55 a.m. UTC | #4
On Tue, Jun 23, 2020 at 05:55:57PM +0100, Mark Rutland wrote:
> On Tue, Jun 23, 2020 at 04:59:01PM +0100, Will Deacon wrote:
> > On Thu, Jun 18, 2020 at 01:29:29PM -0400, Paul Gortmaker wrote:
> > >  BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:975
> > >  in_atomic(): 1, irqs_disabled(): 0, pid: 35658, name: gdbtest
> > >  Preemption disabled at:
> > >  [<ffff000010081578>] do_debug_exception+0x38/0x1a4
> > >  Call trace:
> > >  dump_backtrace+0x0/0x138
> > >  show_stack+0x24/0x30
> > >  dump_stack+0x94/0xbc
> > >  ___might_sleep+0x13c/0x168
> > >  rt_spin_lock+0x40/0x80
> > >  do_force_sig_info+0x30/0xe0
> > >  force_sig_fault+0x64/0x90
> > >  arm64_force_sig_fault+0x50/0x80
> > >  send_user_sigtrap+0x50/0x80
> > >  brk_handler+0x98/0xc8
> > >  do_debug_exception+0x70/0x1a4
> > >  el0_dbg+0x18/0x20
> > > 
> > > The reproducer was basically an automated gdb test that set a breakpoint
> > > on a simple "hello world" program and then quit gdb once the breakpoint
> > > was hit - i.e. "(gdb) A debugging session is active.  Quit anyway? "
> > 
> > Hmm, the debug exception handler path was definitely written with the
> > expectation that preemption is disabled, so this is unfortunate. For
> > exceptions from kernelspace, we need to keep that guarantee as we implement
> > things like BUG() using this path. For exceptions from userspace, it's
> > plausible that we could re-enable preemption, but then we should also
> > re-enable interrupts and debug exceptions too because we don't
> > context-switch pstate in switch_to() and we would end up with holes in our
> > kernel debug coverage (and these might be fatal if e.g. single step doesn't
> > work in a kprobe OOL buffer). However, that then means that any common code
> > when handling user and kernel debug exceptions needs to be re-entrant,
> > which it probably isn't at the moment (I haven't checked).
> 
> I'm pretty certain existing code is not reentrant, and regardless it's
> going to be a mess to reason about this generally if we have to undo our
> strict exception nesting rules.

Are these rules written down somewhere? I'll need to update them if we
get this working for preempt-rt (and we should try to do that).

> I reckon we need to treat this like an NMI instead -- is that plausible?

I don't think so. It's very much a synchronous exception, and delivering a
signal to the exceptional context doesn't feel like an NMI to me. There's
also a fair amount of code that can run in debug context (hw_breakpoint,
kprobes, uprobes, kasan) which might not be happy to suddenly be in an
NMI-like environment. Furthermore, the masking rules are different depending
on what triggers the exception.

One of the things I've started looking at is ripping out our dodgy
hw_breakpoint code so that kernel debug exceptions are easier to reason
about. Specifically, I think we end up with something like:

- On taking a non-debug exception from EL0, unmask D as soon as we can.

- On taking a debug exception from EL0, unmask {D,I} and invoke user
  handlers. I think this always means SIGTRAP, apart from uprobes.
  This will mean making those paths preemptible, as I don't think they
  are right now (e.g. traversing the callback hooks uses an RCU-protected
  list).

- On taking a non-debug, non-fatal synchronous exception from EL1, unmask
  D as soon as we can (i.e. we step into these exceptions). Fatal exceptions
  can obviously leave D masked.

- On taking an interrupt from EL1, stash MDSCR_EL1.SS in a pcpu variable and
  clear the register bit if it was set. Then unmask only D and leave I set. On
  return from the exception, set D and restore MDSCR_EL1.SS. If we decide to
  reschedule, unmask D (i.e. we only step into interrupts if we need a
  reschedule. Alternatively, we could skip the reschedule if we were
  stepping.)

- On taking a debug exception from EL1, leave {D,I} set. Watchpoints on
  uaccess are silently stepped over.

Thoughts? We could probably simplify this if we could state that stepping an
instruction in kernel space could only ever be interrupted by an interrupt.
That's probably true for kprobes, but relying on it feels like it might bite
us later on.

Will
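
The masking rules proposed in the list above can be summarised as a small decision table. This is only a sketch of the proposal as written, not of any existing arm64 entry code: the `model_*` names are hypothetical, and for non-debug exceptions from EL0 it reports only D because the proposal does not say what happens to I in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this models the proposal, not the kernel. */
enum model_el { MODEL_EL0, MODEL_EL1 };

enum model_exc {
	MODEL_EXC_SYNC,		/* non-debug, non-fatal synchronous exception */
	MODEL_EXC_SYNC_FATAL,	/* non-debug, fatal synchronous exception */
	MODEL_EXC_IRQ,		/* interrupt */
	MODEL_EXC_DEBUG,	/* debug exception */
};

#define MODEL_UNMASK_D	(1u << 0)
#define MODEL_UNMASK_I	(1u << 1)

/* Which PSTATE bits the proposal says to unmask after taking the exception. */
static unsigned int model_unmask_policy(enum model_el from, enum model_exc exc)
{
	if (from == MODEL_EL0)
		return exc == MODEL_EXC_DEBUG ?
			(MODEL_UNMASK_D | MODEL_UNMASK_I) : MODEL_UNMASK_D;

	switch (exc) {
	case MODEL_EXC_SYNC:
	case MODEL_EXC_IRQ:
		return MODEL_UNMASK_D;	/* I stays set for interrupts */
	case MODEL_EXC_SYNC_FATAL:
	case MODEL_EXC_DEBUG:
		return 0;		/* leave everything masked */
	}
	return 0;
}

/* Per-CPU state for the "stash MDSCR_EL1.SS" step on an EL1 interrupt. */
struct model_cpu {
	bool mdscr_ss;	/* models MDSCR_EL1.SS */
	bool saved_ss;	/* models the pcpu stash */
	bool d_masked;	/* models PSTATE.D */
};

static void model_el1_irq_enter(struct model_cpu *cpu)
{
	assert(cpu->d_masked);		/* exception entry masks D */
	cpu->saved_ss = cpu->mdscr_ss;	/* stash SS */
	cpu->mdscr_ss = false;		/* do not step the interrupt handler */
	cpu->d_masked = false;		/* unmask only D */
}

static void model_el1_irq_exit(struct model_cpu *cpu)
{
	cpu->d_masked = true;		/* set D again */
	cpu->mdscr_ss = cpu->saved_ss;	/* resume stepping if it was active */
}
```

The reschedule case (unmasking D when the interrupt return decides to reschedule) is deliberately left out of the sketch, since the proposal offers two alternatives for it.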
Masami Hiramatsu Nov. 24, 2020, 2:48 a.m. UTC | #5
On Fri, 26 Jun 2020 10:55:54 +0100
Will Deacon <will@kernel.org> wrote:

> On Tue, Jun 23, 2020 at 05:55:57PM +0100, Mark Rutland wrote:
> > On Tue, Jun 23, 2020 at 04:59:01PM +0100, Will Deacon wrote:
> > > On Thu, Jun 18, 2020 at 01:29:29PM -0400, Paul Gortmaker wrote:
> > > >  BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:975
> > > >  in_atomic(): 1, irqs_disabled(): 0, pid: 35658, name: gdbtest
> > > >  Preemption disabled at:
> > > >  [<ffff000010081578>] do_debug_exception+0x38/0x1a4
> > > >  Call trace:
> > > >  dump_backtrace+0x0/0x138
> > > >  show_stack+0x24/0x30
> > > >  dump_stack+0x94/0xbc
> > > >  ___might_sleep+0x13c/0x168
> > > >  rt_spin_lock+0x40/0x80
> > > >  do_force_sig_info+0x30/0xe0
> > > >  force_sig_fault+0x64/0x90
> > > >  arm64_force_sig_fault+0x50/0x80
> > > >  send_user_sigtrap+0x50/0x80
> > > >  brk_handler+0x98/0xc8
> > > >  do_debug_exception+0x70/0x1a4
> > > >  el0_dbg+0x18/0x20
> > > > 
> > > > The reproducer was basically an automated gdb test that set a breakpoint
> > > > on a simple "hello world" program and then quit gdb once the breakpoint
> > > > was hit - i.e. "(gdb) A debugging session is active.  Quit anyway? "
> > > 
> > > Hmm, the debug exception handler path was definitely written with the
> > > expectation that preemption is disabled, so this is unfortunate. For
> > > exceptions from kernelspace, we need to keep that guarantee as we implement
> > > things like BUG() using this path. For exceptions from userspace, it's
> > > plausible that we could re-enable preemption, but then we should also
> > > re-enable interrupts and debug exceptions too because we don't
> > > context-switch pstate in switch_to() and we would end up with holes in our
> > > kernel debug coverage (and these might be fatal if e.g. single step doesn't
> > > work in a kprobe OOL buffer). However, that then means that any common code
> > > when handling user and kernel debug exceptions needs to be re-entrant,
> > > which it probably isn't at the moment (I haven't checked).
> > 
> > I'm pretty certain existing code is not reentrant, and regardless it's
> > going to be a mess to reason about this generally if we have to undo our
> > strict exception nesting rules.
> 
> Are these rules written down somewhere? I'll need to update them if we
> get this working for preempt-rt (and we should try to do that).
> 
> > I reckon we need to treat this like an NMI instead -- is that plausible?
> 
> I don't think so. It's very much a synchronous exception, and delivering a
> signal to the exceptional context doesn't feel like an NMI to me. There's
> also a fair amount of code that can run in debug context (hw_breakpoint,
> kprobes, uprobes, kasan) which might not be happy to suddenly be in an
> NMI-like environment. Furthermore, the masking rules are different depending
> on what triggers the exception.
> 
> One of the things I've started looking at is ripping out our dodgy
> hw_breakpoint code so that kernel debug exceptions are easier to reason
> about. Specifically, I think we end up with something like:
> 
> - On taking a non-debug exception from EL0, unmask D as soon as we can.
> 
> - On taking a debug exception from EL0, unmask {D,I} and invoke user
>   handlers. I think this always means SIGTRAP, apart from uprobes.
>   This will mean making those paths preemptible, as I don't think they
>   are right now (e.g. traversing the callback hooks uses an RCU-protected
>   list).
> 
> - On taking a non-debug, non-fatal synchronous exception from EL1, unmask
>   D as soon as we can (i.e. we step into these exceptions). Fatal exceptions
>   can obviously leave D masked.

To be clear, the BRK exception would be a non-fatal synchronous exception,
correct? If so, do you mean that we would single-step into these exception
handlers too?

As we discussed in another thread, I'm OK with this once the BRK-only
kprobes work is merged. But we also need to take care of recursive BRK
exceptions: if someone puts a kprobe in the single-step handler, we can take
a BRK while another BRK handler is running. (kprobes itself can handle this
case, because it uses current_kprobe as the recursion-detection flag.)
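
The recursion check mentioned here can be sketched roughly as follows. This is a simplified userspace model rather than the arm64 kprobes code: the `model_*` names are hypothetical, a single global stands in for the per-CPU current_kprobe, and the single-stepping of the nested probe's instruction is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's struct kprobe. */
struct model_kprobe {
	const char *name;
	unsigned long hits;	/* times the handler actually ran */
	unsigned long nmissed;	/* times the probe was skipped as nested */
};

typedef void (*model_handler_t)(void);

/* Stands in for the per-CPU current_kprobe recursion-detection flag. */
static struct model_kprobe *model_current_kprobe;

static void model_kprobe_hit(struct model_kprobe *p, model_handler_t handler)
{
	if (model_current_kprobe != NULL) {
		/* Already inside a probe: skip the handler, count the miss. */
		p->nmissed++;
		return;
	}

	model_current_kprobe = p;
	p->hits++;
	if (handler)
		handler();
	assert(model_current_kprobe == p);	/* nested hits must not clobber it */
	model_current_kprobe = NULL;
}

/* A probe that sits inside the outer probe's handler path. */
static struct model_kprobe model_inner = { "inner", 0, 0 };

static void model_outer_handler(void)
{
	model_kprobe_hit(&model_inner, NULL);
}
```

Hitting an outer probe whose handler is `model_outer_handler` runs the outer handler once and records one miss for `model_inner` instead of running it.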

> 
> - On taking an interrupt from EL1, stash MDSCR_EL1.SS in a pcpu variable and
>   clear the register bit if it was set. Then unmask only D and leave I set. On
>   return from the exception, set D and restore MDSCR_EL1.SS. If we decide to
>   reschedule, unmask D (i.e. we only step into interrupts if we need a
>   reschedule. Alternatively, we could skip the reschedule if we were
>   stepping.)

This sounds good to me (context-based single-stepping).

Thank you,

> 
> - On taking a debug exception from EL1, leave {D,I} set. Watchpoints on
>   uaccess are silently stepped over.
> 
> Thoughts? We could probably simplify this if we could state that stepping an
> instruction in kernel space could only ever be interrupted by an interrupt.
> That's probably true for kprobes, but relying on it feels like it might bite
> us later on.
> 
> Will
Patch

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8afb238ff335..4d83ec991b33 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -788,13 +788,6 @@  void __init hook_debug_fault_code(int nr,
 	debug_fault_info[nr].name	= name;
 }
 
-/*
- * In debug exception context, we explicitly disable preemption despite
- * having interrupts disabled.
- * This serves two purposes: it makes it much less likely that we would
- * accidentally schedule in exception context and it will force a warning
- * if we somehow manage to schedule by accident.
- */
 static void debug_exception_enter(struct pt_regs *regs)
 {
 	/*
@@ -816,8 +809,6 @@  static void debug_exception_enter(struct pt_regs *regs)
 		rcu_nmi_enter();
 	}
 
-	preempt_disable();
-
 	/* This code is a bit fragile.  Test it. */
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "exception_enter didn't work");
 }
@@ -825,8 +816,6 @@  NOKPROBE_SYMBOL(debug_exception_enter);
 
 static void debug_exception_exit(struct pt_regs *regs)
 {
-	preempt_enable_no_resched();
-
 	if (!user_mode(regs))
 		rcu_nmi_exit();