Message ID | 20170216013109.GA772@jagdpanzerIV.localdomain (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
* Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]: > On (02/15/17 10:01), Tony Lindgren wrote: > [..] > > Below is another issue I noticed caused by commit f975237b7682 that > > I noticed during booting. > > do you mean that with f975237b7682 you _always_ see that illegal RCU > usage warning? Yeah on every boot on devices using cpuidle_coupled. > > 8< -------------------------- > > [ 2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0. > > [ 2.591613] hw-breakpoint: CPU 0 failed to disable vector catch > > [ 2.597686] > > [ 2.597717] =============================== > > [ 2.597717] [ ERR: suspicious RCU usage. ] > > [ 2.597717] 4.10.0-rc8-next-20170215+ #119 Not tainted > > [ 2.597717] ------------------------------- > > [ 2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage! > > so this is trace_console(text, len) from > > cpuidle_enter_state() > ... > vprintk_emit() << error: Failed to enable monitor mode/CPU 0 failed to disable... > console_unlock() > call_console_drivers() > trace_console() > > > > [ 2.597717] > > [ 2.597717] other info that might help us debug this: > > [ 2.597717] > > [ 2.597717] > > [ 2.597717] RCU used illegally from idle CPU! > > um... is this what rcuidle() for? Yeah I think that's all there is to it. We've had a bunch of similar issues pop up over past year :) See for example: $ git log --author=McKenney drivers/base/power > --- > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 7180088cbb23..34da86e73d00 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len, > { > struct console *con; > > - trace_console(text, len); > + trace_console_rcuidle(text, len); > > if (!console_drivers) > return; > > --- > > > I need to look more at this. That fixes it for me thanks, so feel free to add: Tested-by: Tony Lindgren <tony@atomide.com> > > [ 2.597717] rcu_scheduler_active = 2, debug_locks = 0 > > [ 2.597747] RCU used illegally from extended quiescent state! > > > > [ 2.597747] 2 locks held by swapper/0/0: > > [ 2.597747] #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54 > > [ 2.597747] #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474 > > [ 2.597747] > > [ 2.597747] stack backtrace: > > [ 2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119 > > [ 2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree) > > [ 2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14) > > [ 2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0) > > [ 2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650) > > [ 2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474) > > [ 2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28) > > [ 2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30) > > [ 2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288) > > [ 2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30) > > [ 2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80) > > [ 2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54) > > [ 2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208) > > [ 2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac) > > [ 2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c) > > [ 2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218) > > [ 2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c) > > [ 2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4) > > [ 2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c)
On (02/15/17 20:03), Tony Lindgren wrote: > * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]: > > On (02/15/17 10:01), Tony Lindgren wrote: > > [..] > > > Below is another issue I noticed caused by commit f975237b7682 that > > > I noticed during booting. > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU > > usage warning? > > Yeah on every boot on devices using cpuidle_coupled. does this mean that with the printk-safe patches reverted (so, basically, the same conditions module 4 printk patches) you don't see illegal RCU usage reports? at the moment I can't see any connection between f975237b7682 and RCU usage from idle CPU. [..] > > --- > > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index 7180088cbb23..34da86e73d00 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len, > > { > > struct console *con; > > > > - trace_console(text, len); > > + trace_console_rcuidle(text, len); > > > > if (!console_drivers) > > return; > > > > --- > > > > > > I need to look more at this. > > That fixes it for me thanks, so feel free to add: > > Tested-by: Tony Lindgren <tony@atomide.com> thanks. -ss
* Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 20:26]: > On (02/15/17 20:03), Tony Lindgren wrote: > > * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]: > > > On (02/15/17 10:01), Tony Lindgren wrote: > > > [..] > > > > Below is another issue I noticed caused by commit f975237b7682 that > > > > I noticed during booting. > > > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU > > > usage warning? > > > > Yeah on every boot on devices using cpuidle_coupled. > > does this mean that with the printk-safe patches reverted > (so, basically, the same conditions module 4 printk patches) > you don't see illegal RCU usage reports? at the moment I can't > see any connection between f975237b7682 and RCU usage from idle CPU. Yes reverting those four patches I listed earlier also makes it go away. Regards, Tony
On (02/16/17 07:10), Tony Lindgren wrote: [..] > > > > [..] > > > > > Below is another issue I noticed caused by commit f975237b7682 that > > > > > I noticed during booting. > > > > > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU > > > > usage warning? > > > > > > Yeah on every boot on devices using cpuidle_coupled. > > > > does this mean that with the printk-safe patches reverted > > (so, basically, the same conditions module 4 printk patches) > > you don't see illegal RCU usage reports? at the moment I can't > > see any connection between f975237b7682 and RCU usage from idle CPU. > > Yes reverting those four patches I listed earlier also makes it go > away. aha... so, the previous RCU warning was simply suppressed by lockdep_off() that we used to have in printk(). RCU dereference check #define __rcu_dereference_check(p, c, space) \ ({ \ /* Dependency order vs. p above. */ \ typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \ rcu_dereference_sparse(p, space); \ ((typeof(*p) __force __kernel *)(________p1)); \ }) where RCU_LOCKDEP_WARN() that prints "suspicious rcu_dereference_check() usage" is #define RCU_LOCKDEP_WARN(c, s) \ do { \ static bool __section(.data.unlikely) __warned; \ if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ __warned = true; \ lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ } \ } while (0) where debug_lockdep_rcu_enabled() int notrace debug_lockdep_rcu_enabled(void) { return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && current->lockdep_recursion == 0; } depends on lockdep state. and we just used to have 'current->lockdep_recursion != 0' here, because of lockdep_off() in printk() around console_unlock(), which increments ->lockdep_recursion. now we have lockdep enabled and the ->lockdep_recursion == 0. so the RCU warning is valid and I need to Cc stable on that _rcuidle patch, the tracepoint is pretty old. it's from 3.4 -ss
* Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170216 08:33]: > On (02/16/17 07:10), Tony Lindgren wrote: > [..] > > > > > [..] > > > > > > Below is another issue I noticed caused by commit f975237b7682 that > > > > > > I noticed during booting. > > > > > > > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU > > > > > usage warning? > > > > > > > > Yeah on every boot on devices using cpuidle_coupled. > > > > > > does this mean that with the printk-safe patches reverted > > > (so, basically, the same conditions module 4 printk patches) > > > you don't see illegal RCU usage reports? at the moment I can't > > > see any connection between f975237b7682 and RCU usage from idle CPU. > > > > Yes reverting those four patches I listed earlier also makes it go > > away. > > aha... so, the previous RCU warning was simply suppressed by lockdep_off() > that we used to have in printk(). > > > RCU dereference check > > #define __rcu_dereference_check(p, c, space) \ > ({ \ > /* Dependency order vs. p above. */ \ > typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ > RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \ > rcu_dereference_sparse(p, space); \ > ((typeof(*p) __force __kernel *)(________p1)); \ > }) > > > where RCU_LOCKDEP_WARN() that prints "suspicious rcu_dereference_check() usage" > is > > > #define RCU_LOCKDEP_WARN(c, s) \ > do { \ > static bool __section(.data.unlikely) __warned; \ > if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ > __warned = true; \ > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > } \ > } while (0) > > > > where debug_lockdep_rcu_enabled() > > int notrace debug_lockdep_rcu_enabled(void) > { > return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > current->lockdep_recursion == 0; > } > > depends on lockdep state. and we just used to have > 'current->lockdep_recursion != 0' here, because of lockdep_off() > in printk() around console_unlock(), which increments ->lockdep_recursion. > > now we have lockdep enabled and the ->lockdep_recursion == 0. > > > so the RCU warning is valid and I need to Cc stable on that _rcuidle > patch, the tracepoint is pretty old. it's from 3.4 OK thanks for checking why it changed. Regards, Tony
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 7180088cbb23..34da86e73d00 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len, { struct console *con; - trace_console(text, len); + trace_console_rcuidle(text, len); if (!console_drivers) return;