Message ID | 20220609033322.12436-1-mark-pk.tsai@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: Clear OS lock in enable_debug_monitors | expand |
On Thu, 09 Jun 2022 04:33:18 +0100, Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote: > > Always clear OS lock before enable debug event. > > The OS lock is clear in cpuhp ops in recent kernel, > but when the debug exception happened before it > kernel might crash because debug event enable didn't > take effect when OS lock is hold. > > Below is the use case that having this problem: > > Register kprobe in console_unlock and kernel will > panic at secondary_start_kernel on secondary core. Feels a bit extreme to do that, but hey... > > CPU: 1 PID: 0 Comm: swapper/1 Tainted: P > ... > pstate: 004001c5 (nzcv dAIF +PAN -UAO) > pc : do_undefinstr+0x5c/0x60 > lr : do_undefinstr+0x2c/0x60 > sp : ffffffc01338bc50 > pmr_save: 000000f0 > x29: ffffffc01338bc50 x28: ffffff8115e95a00 T > x27: ffffffc01258e000 x26: ffffff8115e95a00 > x25: 00000000ffffffff x24: 0000000000000000 > x23: 00000000604001c5 x22: ffffffc014015008 > x21: 000000002232f000 x20: 00000000000000f0 j > x19: ffffffc01338bc70 x18: ffffffc0132ed040 > x17: ffffffc01258eb48 x16: 0000000000000403 L& > x15: 0000000000016480 x14: ffffffc01258e000 i/ > x13: 0000000000000006 x12: 0000000000006985 > x11: 00000000d5300000 x10: 0000000000000000 > x9 : 9f6c79217a8a0400 x8 : 00000000000000c5 > x7 : 0000000000000000 x6 : ffffffc01338bc08 2T > x5 : ffffffc01338bc08 x4 : 0000000000000002 > x3 : 0000000000000000 x2 : 0000000000000004 > x1 : 0000000000000000 x0 : 0000000000000001 *q > Call trace: > do_undefinstr+0x5c/0x60 > el1_undef+0x10/0xb4 > 0xffffffc014015008 > vprintk_func+0x210/0x290 > printk+0x64/0x90 > cpuinfo_detect_icache_policy+0x80/0xe0 > __cpuinfo_store_cpu+0x150/0x160 > secondary_start_kernel+0x154/0x440 > > The root cause is that OS_LSR_EL1.OSLK is reset > to 1 on a cold reset[1] and the firmware didn't > unlock it by default. > So the core didn't go to el1_dbg as expected after > kernel_enable_single_step and eret. > > [1] https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/OSLSR-EL1--OS-Lock-Status-Register?lang=en > Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> > --- > arch/arm64/kernel/debug-monitors.c | 23 ++++++++++++----------- > 1 file changed, 12 insertions(+), 11 deletions(-) > > diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c > index bf9fe71589bc..186f2846d652 100644 > --- a/arch/arm64/kernel/debug-monitors.c > +++ b/arch/arm64/kernel/debug-monitors.c > @@ -70,6 +70,17 @@ static int __init early_debug_disable(char *buf) > > early_param("nodebugmon", early_debug_disable); > > +/* > + * OS lock clearing. > + */ > +static int clear_os_lock(unsigned int cpu) > +{ > + write_sysreg(0, osdlr_el1); > + write_sysreg(0, oslar_el1); > + isb(); > + return 0; > +} > + > /* > * Keep track of debug users on each core. > * The ref counts are per-cpu so we use a local_t type. > @@ -91,6 +102,7 @@ void enable_debug_monitors(enum dbg_active_el el) > enable |= DBG_MDSCR_KDE; > > if (enable && debug_enabled) { > + clear_os_lock(0); > mdscr = mdscr_read(); > mdscr |= enable; > mdscr_write(mdscr); > @@ -119,17 +131,6 @@ void disable_debug_monitors(enum dbg_active_el el) > } > NOKPROBE_SYMBOL(disable_debug_monitors); > > -/* > - * OS lock clearing. > - */ > -static int clear_os_lock(unsigned int cpu) > -{ > - write_sysreg(0, osdlr_el1); > - write_sysreg(0, oslar_el1); > - isb(); > - return 0; > -} > - > static int __init debug_monitors_init(void) > { > return cpuhp_setup_state(CPUHP_AP_ARM64_DEBUG_MONITORS_STARTING, If the OS Lock is cleared every time we enabled debug, what is the point of the notifier then? Thanks, M.
On Thu, Jun 09, 2022 at 11:33:18AM +0800, Mark-PK Tsai wrote: > Always clear OS lock before enable debug event. > > The OS lock is clear in cpuhp ops in recent kernel, > but when the debug exception happened before it > kernel might crash because debug event enable didn't > take effect when OS lock is hold. > > Below is the use case that having this problem: > > Register kprobe in console_unlock and kernel will > panic at secondary_start_kernel on secondary core. > > CPU: 1 PID: 0 Comm: swapper/1 Tainted: P > ... > pstate: 004001c5 (nzcv dAIF +PAN -UAO) > pc : do_undefinstr+0x5c/0x60 > lr : do_undefinstr+0x2c/0x60 > sp : ffffffc01338bc50 > pmr_save: 000000f0 > x29: ffffffc01338bc50 x28: ffffff8115e95a00 T > x27: ffffffc01258e000 x26: ffffff8115e95a00 > x25: 00000000ffffffff x24: 0000000000000000 > x23: 00000000604001c5 x22: ffffffc014015008 > x21: 000000002232f000 x20: 00000000000000f0 j > x19: ffffffc01338bc70 x18: ffffffc0132ed040 > x17: ffffffc01258eb48 x16: 0000000000000403 L& > x15: 0000000000016480 x14: ffffffc01258e000 i/ > x13: 0000000000000006 x12: 0000000000006985 > x11: 00000000d5300000 x10: 0000000000000000 > x9 : 9f6c79217a8a0400 x8 : 00000000000000c5 > x7 : 0000000000000000 x6 : ffffffc01338bc08 2T > x5 : ffffffc01338bc08 x4 : 0000000000000002 > x3 : 0000000000000000 x2 : 0000000000000004 > x1 : 0000000000000000 x0 : 0000000000000001 *q > Call trace: > do_undefinstr+0x5c/0x60 > el1_undef+0x10/0xb4 > 0xffffffc014015008 > vprintk_func+0x210/0x290 > printk+0x64/0x90 > cpuinfo_detect_icache_policy+0x80/0xe0 > __cpuinfo_store_cpu+0x150/0x160 > secondary_start_kernel+0x154/0x440 > > The root cause is that OS_LSR_EL1.OSLK is reset > to 1 on a cold reset[1] and the firmware didn't > unlock it by default. > So the core didn't go to el1_dbg as expected after > kernel_enable_single_step and eret. Hmm, I thought we didn't use hardware single-step for kprobes after 7ee31a3aa8f4 ("arm64: kprobes: Use BRK instead of single-step when executing instructions out-of-line"). What is triggering this exception? Will
> On Thu, Jun 09, 2022 at 11:33:18AM +0800, Mark-PK Tsai wrote: > > Always clear OS lock before enable debug event. > > > > The OS lock is clear in cpuhp ops in recent kernel, > > but when the debug exception happened before it > > kernel might crash because debug event enable didn't > > take effect when OS lock is hold. > > > > Below is the use case that having this problem: > > > > Register kprobe in console_unlock and kernel will > > panic at secondary_start_kernel on secondary core. > > > > CPU: 1 PID: 0 Comm: swapper/1 Tainted: P > > ... > > pstate: 004001c5 (nzcv dAIF +PAN -UAO) > > pc : do_undefinstr+0x5c/0x60 > > lr : do_undefinstr+0x2c/0x60 > > sp : ffffffc01338bc50 > > pmr_save: 000000f0 > > x29: ffffffc01338bc50 x28: ffffff8115e95a00 T > > x27: ffffffc01258e000 x26: ffffff8115e95a00 > > x25: 00000000ffffffff x24: 0000000000000000 > > x23: 00000000604001c5 x22: ffffffc014015008 > > x21: 000000002232f000 x20: 00000000000000f0 j > > x19: ffffffc01338bc70 x18: ffffffc0132ed040 > > x17: ffffffc01258eb48 x16: 0000000000000403 L& > > x15: 0000000000016480 x14: ffffffc01258e000 i/ > > x13: 0000000000000006 x12: 0000000000006985 > > x11: 00000000d5300000 x10: 0000000000000000 > > x9 : 9f6c79217a8a0400 x8 : 00000000000000c5 > > x7 : 0000000000000000 x6 : ffffffc01338bc08 2T > > x5 : ffffffc01338bc08 x4 : 0000000000000002 > > x3 : 0000000000000000 x2 : 0000000000000004 > > x1 : 0000000000000000 x0 : 0000000000000001 *q > > Call trace: > > do_undefinstr+0x5c/0x60 > > el1_undef+0x10/0xb4 > > 0xffffffc014015008 > > vprintk_func+0x210/0x290 > > printk+0x64/0x90 > > cpuinfo_detect_icache_policy+0x80/0xe0 > > __cpuinfo_store_cpu+0x150/0x160 > > secondary_start_kernel+0x154/0x440 > > > > The root cause is that OS_LSR_EL1.OSLK is reset > > to 1 on a cold reset[1] and the firmware didn't > > unlock it by default. > > So the core didn't go to el1_dbg as expected after > > kernel_enable_single_step and eret. > > Hmm, I thought we didn't use hardware single-step for kprobes after > 7ee31a3aa8f4 ("arm64: kprobes: Use BRK instead of single-step when executing > instructions out-of-line"). What is triggering this exception? > > Will You're right. Actually this issue happend in 5.4 LTS, and the commit you mentioned can avoid the kernel panic by not using hardware single-step. I think 5.4 LTS should apply this commit. 7ee31a3aa8f4 ("arm64: kprobes: Use BRK instead of single-step when executing instructions out-of-line") Cc: stable@vger.kernel.org And I'm not sure if there is other use case may have problem if the kernel don't clear OS lock in enable_debug_monitors everytime. So should we do this to prevent someone face the similar issue?
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c index bf9fe71589bc..186f2846d652 100644 --- a/arch/arm64/kernel/debug-monitors.c +++ b/arch/arm64/kernel/debug-monitors.c @@ -70,6 +70,17 @@ static int __init early_debug_disable(char *buf) early_param("nodebugmon", early_debug_disable); +/* + * OS lock clearing. + */ +static int clear_os_lock(unsigned int cpu) +{ + write_sysreg(0, osdlr_el1); + write_sysreg(0, oslar_el1); + isb(); + return 0; +} + /* * Keep track of debug users on each core. * The ref counts are per-cpu so we use a local_t type. @@ -91,6 +102,7 @@ void enable_debug_monitors(enum dbg_active_el el) enable |= DBG_MDSCR_KDE; if (enable && debug_enabled) { + clear_os_lock(0); mdscr = mdscr_read(); mdscr |= enable; mdscr_write(mdscr); @@ -119,17 +131,6 @@ void disable_debug_monitors(enum dbg_active_el el) } NOKPROBE_SYMBOL(disable_debug_monitors); -/* - * OS lock clearing. - */ -static int clear_os_lock(unsigned int cpu) -{ - write_sysreg(0, osdlr_el1); - write_sysreg(0, oslar_el1); - isb(); - return 0; -} - static int __init debug_monitors_init(void) { return cpuhp_setup_state(CPUHP_AP_ARM64_DEBUG_MONITORS_STARTING,
Always clear OS lock before enable debug event. The OS lock is clear in cpuhp ops in recent kernel, but when the debug exception happened before it kernel might crash because debug event enable didn't take effect when OS lock is hold. Below is the use case that having this problem: Register kprobe in console_unlock and kernel will panic at secondary_start_kernel on secondary core. CPU: 1 PID: 0 Comm: swapper/1 Tainted: P ... pstate: 004001c5 (nzcv dAIF +PAN -UAO) pc : do_undefinstr+0x5c/0x60 lr : do_undefinstr+0x2c/0x60 sp : ffffffc01338bc50 pmr_save: 000000f0 x29: ffffffc01338bc50 x28: ffffff8115e95a00 T x27: ffffffc01258e000 x26: ffffff8115e95a00 x25: 00000000ffffffff x24: 0000000000000000 x23: 00000000604001c5 x22: ffffffc014015008 x21: 000000002232f000 x20: 00000000000000f0 j x19: ffffffc01338bc70 x18: ffffffc0132ed040 x17: ffffffc01258eb48 x16: 0000000000000403 L& x15: 0000000000016480 x14: ffffffc01258e000 i/ x13: 0000000000000006 x12: 0000000000006985 x11: 00000000d5300000 x10: 0000000000000000 x9 : 9f6c79217a8a0400 x8 : 00000000000000c5 x7 : 0000000000000000 x6 : ffffffc01338bc08 2T x5 : ffffffc01338bc08 x4 : 0000000000000002 x3 : 0000000000000000 x2 : 0000000000000004 x1 : 0000000000000000 x0 : 0000000000000001 *q Call trace: do_undefinstr+0x5c/0x60 el1_undef+0x10/0xb4 0xffffffc014015008 vprintk_func+0x210/0x290 printk+0x64/0x90 cpuinfo_detect_icache_policy+0x80/0xe0 __cpuinfo_store_cpu+0x150/0x160 secondary_start_kernel+0x154/0x440 The root cause is that OS_LSR_EL1.OSLK is reset to 1 on a cold reset[1] and the firmware didn't unlock it by default. So the core didn't go to el1_dbg as expected after kernel_enable_single_step and eret. [1] https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/OSLSR-EL1--OS-Lock-Status-Register?lang=en Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> --- arch/arm64/kernel/debug-monitors.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)