diff mbox series

arm64: Clear OS lock in enable_debug_monitors

Message ID 20220609033322.12436-1-mark-pk.tsai@mediatek.com (mailing list archive)
State New, archived
Headers show
Series arm64: Clear OS lock in enable_debug_monitors | expand

Commit Message

Mark-PK Tsai (蔡沛剛) June 9, 2022, 3:33 a.m. UTC
Always clear OS lock before enable debug event.

The OS lock is clear in cpuhp ops in recent kernel,
but when the debug exception happened before it
kernel might crash because debug event enable didn't
take effect when OS lock is hold.

Below is the use case that having this problem:

Register kprobe in console_unlock and kernel will
panic at secondary_start_kernel on secondary core.

CPU: 1 PID: 0 Comm: swapper/1 Tainted: P
...
pstate: 004001c5 (nzcv dAIF +PAN -UAO)
pc : do_undefinstr+0x5c/0x60
lr : do_undefinstr+0x2c/0x60
sp : ffffffc01338bc50
pmr_save: 000000f0
x29: ffffffc01338bc50 x28: ffffff8115e95a00 T
x27: ffffffc01258e000 x26: ffffff8115e95a00
x25: 00000000ffffffff x24: 0000000000000000
x23: 00000000604001c5 x22: ffffffc014015008
x21: 000000002232f000 x20: 00000000000000f0 j
x19: ffffffc01338bc70 x18: ffffffc0132ed040
x17: ffffffc01258eb48 x16: 0000000000000403 L&
x15: 0000000000016480 x14: ffffffc01258e000 i/
x13: 0000000000000006 x12: 0000000000006985
x11: 00000000d5300000 x10: 0000000000000000
x9 : 9f6c79217a8a0400 x8 : 00000000000000c5
x7 : 0000000000000000 x6 : ffffffc01338bc08 2T
x5 : ffffffc01338bc08 x4 : 0000000000000002
x3 : 0000000000000000 x2 : 0000000000000004
x1 : 0000000000000000 x0 : 0000000000000001 *q
Call trace:
 do_undefinstr+0x5c/0x60
 el1_undef+0x10/0xb4
 0xffffffc014015008
 vprintk_func+0x210/0x290
 printk+0x64/0x90
 cpuinfo_detect_icache_policy+0x80/0xe0
 __cpuinfo_store_cpu+0x150/0x160
 secondary_start_kernel+0x154/0x440

The root cause is that OS_LSR_EL1.OSLK is reset
to 1 on a cold reset[1] and the firmware didn't
unlock it by default.
So the core didn't go to el1_dbg as expected after
kernel_enable_single_step and eret.

[1] https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/OSLSR-EL1--OS-Lock-Status-Register?lang=en
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
---
 arch/arm64/kernel/debug-monitors.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

Comments

Marc Zyngier June 9, 2022, 11:20 a.m. UTC | #1
On Thu, 09 Jun 2022 04:33:18 +0100,
Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote:
> 
> Always clear OS lock before enable debug event.
> 
> The OS lock is clear in cpuhp ops in recent kernel,
> but when the debug exception happened before it
> kernel might crash because debug event enable didn't
> take effect when OS lock is hold.
> 
> Below is the use case that having this problem:
> 
> Register kprobe in console_unlock and kernel will
> panic at secondary_start_kernel on secondary core.

Feels a bit extreme to do that, but hey...

> 
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: P
> ...
> pstate: 004001c5 (nzcv dAIF +PAN -UAO)
> pc : do_undefinstr+0x5c/0x60
> lr : do_undefinstr+0x2c/0x60
> sp : ffffffc01338bc50
> pmr_save: 000000f0
> x29: ffffffc01338bc50 x28: ffffff8115e95a00 T
> x27: ffffffc01258e000 x26: ffffff8115e95a00
> x25: 00000000ffffffff x24: 0000000000000000
> x23: 00000000604001c5 x22: ffffffc014015008
> x21: 000000002232f000 x20: 00000000000000f0 j
> x19: ffffffc01338bc70 x18: ffffffc0132ed040
> x17: ffffffc01258eb48 x16: 0000000000000403 L&
> x15: 0000000000016480 x14: ffffffc01258e000 i/
> x13: 0000000000000006 x12: 0000000000006985
> x11: 00000000d5300000 x10: 0000000000000000
> x9 : 9f6c79217a8a0400 x8 : 00000000000000c5
> x7 : 0000000000000000 x6 : ffffffc01338bc08 2T
> x5 : ffffffc01338bc08 x4 : 0000000000000002
> x3 : 0000000000000000 x2 : 0000000000000004
> x1 : 0000000000000000 x0 : 0000000000000001 *q
> Call trace:
>  do_undefinstr+0x5c/0x60
>  el1_undef+0x10/0xb4
>  0xffffffc014015008
>  vprintk_func+0x210/0x290
>  printk+0x64/0x90
>  cpuinfo_detect_icache_policy+0x80/0xe0
>  __cpuinfo_store_cpu+0x150/0x160
>  secondary_start_kernel+0x154/0x440
> 
> The root cause is that OS_LSR_EL1.OSLK is reset
> to 1 on a cold reset[1] and the firmware didn't
> unlock it by default.
> So the core didn't go to el1_dbg as expected after
> kernel_enable_single_step and eret.
> 
> [1] https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/OSLSR-EL1--OS-Lock-Status-Register?lang=en
> Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
> ---
>  arch/arm64/kernel/debug-monitors.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> index bf9fe71589bc..186f2846d652 100644
> --- a/arch/arm64/kernel/debug-monitors.c
> +++ b/arch/arm64/kernel/debug-monitors.c
> @@ -70,6 +70,17 @@ static int __init early_debug_disable(char *buf)
>  
>  early_param("nodebugmon", early_debug_disable);
>  
> +/*
> + * OS lock clearing.
> + */
> +static int clear_os_lock(unsigned int cpu)
> +{
> +	write_sysreg(0, osdlr_el1);
> +	write_sysreg(0, oslar_el1);
> +	isb();
> +	return 0;
> +}
> +
>  /*
>   * Keep track of debug users on each core.
>   * The ref counts are per-cpu so we use a local_t type.
> @@ -91,6 +102,7 @@ void enable_debug_monitors(enum dbg_active_el el)
>  		enable |= DBG_MDSCR_KDE;
>  
>  	if (enable && debug_enabled) {
> +		clear_os_lock(0);
>  		mdscr = mdscr_read();
>  		mdscr |= enable;
>  		mdscr_write(mdscr);
> @@ -119,17 +131,6 @@ void disable_debug_monitors(enum dbg_active_el el)
>  }
>  NOKPROBE_SYMBOL(disable_debug_monitors);
>  
> -/*
> - * OS lock clearing.
> - */
> -static int clear_os_lock(unsigned int cpu)
> -{
> -	write_sysreg(0, osdlr_el1);
> -	write_sysreg(0, oslar_el1);
> -	isb();
> -	return 0;
> -}
> -
>  static int __init debug_monitors_init(void)
>  {
>  	return cpuhp_setup_state(CPUHP_AP_ARM64_DEBUG_MONITORS_STARTING,

If the OS Lock is cleared every time we enabled debug, what is the
point of the notifier then?

Thanks,

	M.
Will Deacon June 9, 2022, 11:57 a.m. UTC | #2
On Thu, Jun 09, 2022 at 11:33:18AM +0800, Mark-PK Tsai wrote:
> Always clear OS lock before enable debug event.
> 
> The OS lock is clear in cpuhp ops in recent kernel,
> but when the debug exception happened before it
> kernel might crash because debug event enable didn't
> take effect when OS lock is hold.
> 
> Below is the use case that having this problem:
> 
> Register kprobe in console_unlock and kernel will
> panic at secondary_start_kernel on secondary core.
> 
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: P
> ...
> pstate: 004001c5 (nzcv dAIF +PAN -UAO)
> pc : do_undefinstr+0x5c/0x60
> lr : do_undefinstr+0x2c/0x60
> sp : ffffffc01338bc50
> pmr_save: 000000f0
> x29: ffffffc01338bc50 x28: ffffff8115e95a00 T
> x27: ffffffc01258e000 x26: ffffff8115e95a00
> x25: 00000000ffffffff x24: 0000000000000000
> x23: 00000000604001c5 x22: ffffffc014015008
> x21: 000000002232f000 x20: 00000000000000f0 j
> x19: ffffffc01338bc70 x18: ffffffc0132ed040
> x17: ffffffc01258eb48 x16: 0000000000000403 L&
> x15: 0000000000016480 x14: ffffffc01258e000 i/
> x13: 0000000000000006 x12: 0000000000006985
> x11: 00000000d5300000 x10: 0000000000000000
> x9 : 9f6c79217a8a0400 x8 : 00000000000000c5
> x7 : 0000000000000000 x6 : ffffffc01338bc08 2T
> x5 : ffffffc01338bc08 x4 : 0000000000000002
> x3 : 0000000000000000 x2 : 0000000000000004
> x1 : 0000000000000000 x0 : 0000000000000001 *q
> Call trace:
>  do_undefinstr+0x5c/0x60
>  el1_undef+0x10/0xb4
>  0xffffffc014015008
>  vprintk_func+0x210/0x290
>  printk+0x64/0x90
>  cpuinfo_detect_icache_policy+0x80/0xe0
>  __cpuinfo_store_cpu+0x150/0x160
>  secondary_start_kernel+0x154/0x440
> 
> The root cause is that OS_LSR_EL1.OSLK is reset
> to 1 on a cold reset[1] and the firmware didn't
> unlock it by default.
> So the core didn't go to el1_dbg as expected after
> kernel_enable_single_step and eret.

Hmm, I thought we didn't use hardware single-step for kprobes after
7ee31a3aa8f4 ("arm64: kprobes: Use BRK instead of single-step when executing
instructions out-of-line"). What is triggering this exception?

Will
Mark-PK Tsai (蔡沛剛) June 10, 2022, 6:36 a.m. UTC | #3
> On Thu, Jun 09, 2022 at 11:33:18AM +0800, Mark-PK Tsai wrote:
> > Always clear OS lock before enable debug event.
> > 
> > The OS lock is clear in cpuhp ops in recent kernel,
> > but when the debug exception happened before it
> > kernel might crash because debug event enable didn't
> > take effect when OS lock is hold.
> > 
> > Below is the use case that having this problem:
> > 
> > Register kprobe in console_unlock and kernel will
> > panic at secondary_start_kernel on secondary core.
> > 
> > CPU: 1 PID: 0 Comm: swapper/1 Tainted: P
> > ...
> > pstate: 004001c5 (nzcv dAIF +PAN -UAO)
> > pc : do_undefinstr+0x5c/0x60
> > lr : do_undefinstr+0x2c/0x60
> > sp : ffffffc01338bc50
> > pmr_save: 000000f0
> > x29: ffffffc01338bc50 x28: ffffff8115e95a00 T
> > x27: ffffffc01258e000 x26: ffffff8115e95a00
> > x25: 00000000ffffffff x24: 0000000000000000
> > x23: 00000000604001c5 x22: ffffffc014015008
> > x21: 000000002232f000 x20: 00000000000000f0 j
> > x19: ffffffc01338bc70 x18: ffffffc0132ed040
> > x17: ffffffc01258eb48 x16: 0000000000000403 L&
> > x15: 0000000000016480 x14: ffffffc01258e000 i/
> > x13: 0000000000000006 x12: 0000000000006985
> > x11: 00000000d5300000 x10: 0000000000000000
> > x9 : 9f6c79217a8a0400 x8 : 00000000000000c5
> > x7 : 0000000000000000 x6 : ffffffc01338bc08 2T
> > x5 : ffffffc01338bc08 x4 : 0000000000000002
> > x3 : 0000000000000000 x2 : 0000000000000004
> > x1 : 0000000000000000 x0 : 0000000000000001 *q
> > Call trace:
> >  do_undefinstr+0x5c/0x60
> >  el1_undef+0x10/0xb4
> >  0xffffffc014015008
> >  vprintk_func+0x210/0x290
> >  printk+0x64/0x90
> >  cpuinfo_detect_icache_policy+0x80/0xe0
> >  __cpuinfo_store_cpu+0x150/0x160
> >  secondary_start_kernel+0x154/0x440
> > 
> > The root cause is that OS_LSR_EL1.OSLK is reset
> > to 1 on a cold reset[1] and the firmware didn't
> > unlock it by default.
> > So the core didn't go to el1_dbg as expected after
> > kernel_enable_single_step and eret.
> 
> Hmm, I thought we didn't use hardware single-step for kprobes after
> 7ee31a3aa8f4 ("arm64: kprobes: Use BRK instead of single-step when executing
> instructions out-of-line"). What is triggering this exception?
> 
> Will

You're right.
Actually this issue happend in 5.4 LTS, and the commit you mentioned
can avoid the kernel panic by not using hardware single-step.

I think 5.4 LTS should apply this commit.

7ee31a3aa8f4 ("arm64: kprobes: Use BRK instead of single-step when executing instructions out-of-line")

Cc: stable@vger.kernel.org



And I'm not sure if there is other use case may have problem if the
kernel don't clear OS lock in enable_debug_monitors everytime.
So should we do this to prevent someone face the similar issue?
diff mbox series

Patch

diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index bf9fe71589bc..186f2846d652 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -70,6 +70,17 @@  static int __init early_debug_disable(char *buf)
 
 early_param("nodebugmon", early_debug_disable);
 
+/*
+ * OS lock clearing.
+ */
+static int clear_os_lock(unsigned int cpu)
+{
+	write_sysreg(0, osdlr_el1);
+	write_sysreg(0, oslar_el1);
+	isb();
+	return 0;
+}
+
 /*
  * Keep track of debug users on each core.
  * The ref counts are per-cpu so we use a local_t type.
@@ -91,6 +102,7 @@  void enable_debug_monitors(enum dbg_active_el el)
 		enable |= DBG_MDSCR_KDE;
 
 	if (enable && debug_enabled) {
+		clear_os_lock(0);
 		mdscr = mdscr_read();
 		mdscr |= enable;
 		mdscr_write(mdscr);
@@ -119,17 +131,6 @@  void disable_debug_monitors(enum dbg_active_el el)
 }
 NOKPROBE_SYMBOL(disable_debug_monitors);
 
-/*
- * OS lock clearing.
- */
-static int clear_os_lock(unsigned int cpu)
-{
-	write_sysreg(0, osdlr_el1);
-	write_sysreg(0, oslar_el1);
-	isb();
-	return 0;
-}
-
 static int __init debug_monitors_init(void)
 {
 	return cpuhp_setup_state(CPUHP_AP_ARM64_DEBUG_MONITORS_STARTING,