diff mbox

Regression in next with use printk_safe buffers in printk

Message ID 20170216013109.GA772@jagdpanzerIV.localdomain (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Sergey Senozhatsky Feb. 16, 2017, 1:31 a.m. UTC
On (02/15/17 10:01), Tony Lindgren wrote:
[..]
> Below is another issue I noticed caused by commit f975237b7682 that
> I noticed during booting.

do you mean that with f975237b7682 you _always_ see that illegal RCU
usage warning?


> 8< --------------------------
> [    2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0.
> [    2.591613] hw-breakpoint: CPU 0 failed to disable vector catch
> [    2.597686]
> [    2.597717] ===============================
> [    2.597717] [ ERR: suspicious RCU usage.  ]
> [    2.597717] 4.10.0-rc8-next-20170215+ #119 Not tainted
> [    2.597717] -------------------------------
> [    2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!

so this is trace_console(text, len) from

	cpuidle_enter_state()
	...
		vprintk_emit()		<< error: Failed to enable monitor mode/CPU 0 failed to disable...
		  console_unlock()
		    call_console_drivers()
		      trace_console()


> [    2.597717]
> [    2.597717] other info that might help us debug this:
> [    2.597717]
> [    2.597717]
> [    2.597717] RCU used illegally from idle CPU!

um... is this what rcuidle() for?


---


---


I need to look more at this.

	-ss

> [    2.597717] rcu_scheduler_active = 2, debug_locks = 0
> [    2.597747] RCU used illegally from extended quiescent state!


> [    2.597747] 2 locks held by swapper/0/0:
> [    2.597747]  #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
> [    2.597747]  #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
> [    2.597747]
> [    2.597747] stack backtrace:
> [    2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
> [    2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree)
> [    2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
> [    2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0)
> [    2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650)
> [    2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474)
> [    2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28)
> [    2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30)
> [    2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288)
> [    2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30)
> [    2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80)
> [    2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54)
> [    2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208)
> [    2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac)
> [    2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
> [    2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218)
> [    2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c)
> [    2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4)
> [    2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c)

Comments

Tony Lindgren Feb. 16, 2017, 4:03 a.m. UTC | #1
* Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]:
> On (02/15/17 10:01), Tony Lindgren wrote:
> [..]
> > Below is another issue I noticed caused by commit f975237b7682 that
> > I noticed during booting.
> 
> do you mean that with f975237b7682 you _always_ see that illegal RCU
> usage warning?

Yeah on every boot on devices using cpuidle_coupled.

> > 8< --------------------------
> > [    2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0.
> > [    2.591613] hw-breakpoint: CPU 0 failed to disable vector catch
> > [    2.597686]
> > [    2.597717] ===============================
> > [    2.597717] [ ERR: suspicious RCU usage.  ]
> > [    2.597717] 4.10.0-rc8-next-20170215+ #119 Not tainted
> > [    2.597717] -------------------------------
> > [    2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
> 
> so this is trace_console(text, len) from
> 
> 	cpuidle_enter_state()
> 	...
> 		vprintk_emit()		<< error: Failed to enable monitor mode/CPU 0 failed to disable...
> 		  console_unlock()
> 		    call_console_drivers()
> 		      trace_console()
> 
> 
> > [    2.597717]
> > [    2.597717] other info that might help us debug this:
> > [    2.597717]
> > [    2.597717]
> > [    2.597717] RCU used illegally from idle CPU!
> 
> um... is this what rcuidle() for?

Yeah I think that's all there is to it. We've had a bunch of similar
issues pop up over past year :) See for example:

$ git log --author=McKenney drivers/base/power

> ---
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 7180088cbb23..34da86e73d00 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
>  {
>  	struct console *con;
>  
> -	trace_console(text, len);
> +	trace_console_rcuidle(text, len);
>  
>  	if (!console_drivers)
>  		return;
> 
> ---
> 
> 
> I need to look more at this.

That fixes it for me thanks, so feel free to add:

Tested-by: Tony Lindgren <tony@atomide.com>


> > [    2.597717] rcu_scheduler_active = 2, debug_locks = 0
> > [    2.597747] RCU used illegally from extended quiescent state!
> 
> 
> > [    2.597747] 2 locks held by swapper/0/0:
> > [    2.597747]  #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
> > [    2.597747]  #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
> > [    2.597747]
> > [    2.597747] stack backtrace:
> > [    2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
> > [    2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree)
> > [    2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
> > [    2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0)
> > [    2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650)
> > [    2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474)
> > [    2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28)
> > [    2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30)
> > [    2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288)
> > [    2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30)
> > [    2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80)
> > [    2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54)
> > [    2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208)
> > [    2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac)
> > [    2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
> > [    2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218)
> > [    2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c)
> > [    2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4)
> > [    2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c)
Sergey Senozhatsky Feb. 16, 2017, 4:25 a.m. UTC | #2
On (02/15/17 20:03), Tony Lindgren wrote:
> * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]:
> > On (02/15/17 10:01), Tony Lindgren wrote:
> > [..]
> > > Below is another issue I noticed caused by commit f975237b7682 that
> > > I noticed during booting.
> > 
> > do you mean that with f975237b7682 you _always_ see that illegal RCU
> > usage warning?
> 
> Yeah on every boot on devices using cpuidle_coupled.

does this mean that with the printk-safe patches reverted
(so, basically, the same conditions module 4 printk patches)
you don't see illegal RCU usage reports? at the moment I can't
see any connection between f975237b7682 and RCU usage from idle CPU.

[..]

> > ---
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 7180088cbb23..34da86e73d00 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
> >  {
> >  	struct console *con;
> >  
> > -	trace_console(text, len);
> > +	trace_console_rcuidle(text, len);
> >  
> >  	if (!console_drivers)
> >  		return;
> > 
> > ---
> > 
> > 
> > I need to look more at this.
> 
> That fixes it for me thanks, so feel free to add:
> 
> Tested-by: Tony Lindgren <tony@atomide.com>

thanks.

	-ss
Tony Lindgren Feb. 16, 2017, 3:10 p.m. UTC | #3
* Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 20:26]:
> On (02/15/17 20:03), Tony Lindgren wrote:
> > * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]:
> > > On (02/15/17 10:01), Tony Lindgren wrote:
> > > [..]
> > > > Below is another issue I noticed caused by commit f975237b7682 that
> > > > I noticed during booting.
> > > 
> > > do you mean that with f975237b7682 you _always_ see that illegal RCU
> > > usage warning?
> > 
> > Yeah on every boot on devices using cpuidle_coupled.
> 
> does this mean that with the printk-safe patches reverted
> (so, basically, the same conditions module 4 printk patches)
> you don't see illegal RCU usage reports? at the moment I can't
> see any connection between f975237b7682 and RCU usage from idle CPU.

Yes reverting those four patches I listed earlier also makes it go
away.

Regards,

Tony
Sergey Senozhatsky Feb. 16, 2017, 4:31 p.m. UTC | #4
On (02/16/17 07:10), Tony Lindgren wrote:
[..]
> > > > [..]
> > > > > Below is another issue I noticed caused by commit f975237b7682 that
> > > > > I noticed during booting.
> > > > 
> > > > do you mean that with f975237b7682 you _always_ see that illegal RCU
> > > > usage warning?
> > > 
> > > Yeah on every boot on devices using cpuidle_coupled.
> > 
> > does this mean that with the printk-safe patches reverted
> > (so, basically, the same conditions module 4 printk patches)
> > you don't see illegal RCU usage reports? at the moment I can't
> > see any connection between f975237b7682 and RCU usage from idle CPU.
> 
> Yes reverting those four patches I listed earlier also makes it go
> away.

aha... so, the previous RCU warning was simply suppressed by lockdep_off()
that we used to have in printk().


RCU dereference check

#define __rcu_dereference_check(p, c, space) \
({ \
	/* Dependency order vs. p above. */ \
	typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
	RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
	rcu_dereference_sparse(p, space); \
	((typeof(*p) __force __kernel *)(________p1)); \
})


where RCU_LOCKDEP_WARN() that prints "suspicious rcu_dereference_check() usage"
is


#define RCU_LOCKDEP_WARN(c, s)						\
	do {								\
		static bool __section(.data.unlikely) __warned;		\
		if (debug_lockdep_rcu_enabled() && !__warned && (c)) {	\
			__warned = true;				\
			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
		}							\
	} while (0)



where debug_lockdep_rcu_enabled()

int notrace debug_lockdep_rcu_enabled(void)
{
	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
	       current->lockdep_recursion == 0;
}

depends on lockdep state. and we just used to have
'current->lockdep_recursion != 0' here, because of lockdep_off()
in printk() around console_unlock(), which increments ->lockdep_recursion.

now we have lockdep enabled and the ->lockdep_recursion == 0.


so the RCU warning is valid and I need to Cc stable on that _rcuidle
patch, the tracepoint is pretty old. it's from 3.4

	-ss
Tony Lindgren Feb. 16, 2017, 7:13 p.m. UTC | #5
* Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170216 08:33]:
> On (02/16/17 07:10), Tony Lindgren wrote:
> [..]
> > > > > [..]
> > > > > > Below is another issue I noticed caused by commit f975237b7682 that
> > > > > > I noticed during booting.
> > > > > 
> > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU
> > > > > usage warning?
> > > > 
> > > > Yeah on every boot on devices using cpuidle_coupled.
> > > 
> > > does this mean that with the printk-safe patches reverted
> > > (so, basically, the same conditions module 4 printk patches)
> > > you don't see illegal RCU usage reports? at the moment I can't
> > > see any connection between f975237b7682 and RCU usage from idle CPU.
> > 
> > Yes reverting those four patches I listed earlier also makes it go
> > away.
> 
> aha... so, the previous RCU warning was simply suppressed by lockdep_off()
> that we used to have in printk().
> 
> 
> RCU dereference check
> 
> #define __rcu_dereference_check(p, c, space) \
> ({ \
> 	/* Dependency order vs. p above. */ \
> 	typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
> 	RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
> 	rcu_dereference_sparse(p, space); \
> 	((typeof(*p) __force __kernel *)(________p1)); \
> })
> 
> 
> where RCU_LOCKDEP_WARN() that prints "suspicious rcu_dereference_check() usage"
> is
> 
> 
> #define RCU_LOCKDEP_WARN(c, s)						\
> 	do {								\
> 		static bool __section(.data.unlikely) __warned;		\
> 		if (debug_lockdep_rcu_enabled() && !__warned && (c)) {	\
> 			__warned = true;				\
> 			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
> 		}							\
> 	} while (0)
> 
> 
> 
> where debug_lockdep_rcu_enabled()
> 
> int notrace debug_lockdep_rcu_enabled(void)
> {
> 	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
> 	       current->lockdep_recursion == 0;
> }
> 
> depends on lockdep state. and we just used to have
> 'current->lockdep_recursion != 0' here, because of lockdep_off()
> in printk() around console_unlock(), which increments ->lockdep_recursion.
> 
> now we have lockdep enabled and the ->lockdep_recursion == 0.
> 
> 
> so the RCU warning is valid and I need to Cc stable on that _rcuidle
> patch, the tracepoint is pretty old. it's from 3.4

OK thanks for checking why it changed.

Regards,

Tony
diff mbox

Patch

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7180088cbb23..34da86e73d00 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1563,7 +1563,7 @@  static void call_console_drivers(const char *ext_text, size_t ext_len,
 {
 	struct console *con;
 
-	trace_console(text, len);
+	trace_console_rcuidle(text, len);
 
 	if (!console_drivers)
 		return;