diff mbox

[16/20] sched/idle: Use explicit broadcast oneshot control function

Message ID 18575128.8R6Lp68k0Q@vostro.rjw.lan (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Rafael J. Wysocki April 1, 2015, 10:22 p.m. UTC
From: Thomas Gleixner <tglx@linutronix.de>

Replace the clockevents_notify() call with an explicit function call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/sched/idle.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Linus Walleij April 28, 2015, 10:11 a.m. UTC | #1
On Thu, Apr 2, 2015 at 12:22 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

> From: Thomas Gleixner <tglx@linutronix.de>
>
> Replace the clockevents_notify() call with an explicit function call.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

For some reason this makes my Ux500 system arbitrarily hang,
especially during boot. Bisected down to this commit.
Since the entire changeset is removing the notifications
altogether I can't just revert it.

Disabling CONFIG_CPU_IDLE removes the problem.

Tried registering a stub driver (I just #if 0 all the code in
drivers/cpuidle/cpuidle-ux500.c) it still crashes.

That makes me think something inside the cpuidle subsystem
is locking up after this, but my other idea is that the timer
may be involved in some way, like this is stressing the timer
in some new yet untested way.

Has anyone else seen problems with this or is it only
ux500?

I'm looking closer at it but feel a bit clueless...

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep Holla April 28, 2015, 10:17 a.m. UTC | #2
On 28/04/15 11:11, Linus Walleij wrote:
> On Thu, Apr 2, 2015 at 12:22 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
>> From: Thomas Gleixner <tglx@linutronix.de>
>>
>> Replace the clockevents_notify() call with an explicit function call.
>>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> For some reason this makes my Ux500 system arbitrarily hang,
> especially during boot. Bisected down to this commit.
> Since the entire changeset is removing the notifications
> altogether I can't just revert it.
>
> Disabling CONFIG_CPU_IDLE removes the problem.
>
> Tried registering a stub driver (I just #if 0 all the code in
> drivers/cpuidle/cpuidle-ux500.c) it still crashes.
>
> That makes me think something inside the cpuidle subsystem
> is locking up after this, but my other idea is that the timer
> may be involved in some way, like this is stressing the timer
> in some new yet untested way.
>
> Has anyone else seen problems with this or is it only
> ux500?
>
> I'm looking closer at it but feel a bit clueless...
>

Yes I am too in similar situation with my Vexpress platform.
I did report with the lockdep logs[1] before v4.1-rc1.
I still see the issue with v4.1-rc1.

Regards,
Sudeep

[1] https://lkml.org/lkml/2015/4/23/329
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Lezcano April 28, 2015, 10:34 a.m. UTC | #3
On 04/28/2015 12:11 PM, Linus Walleij wrote:
> On Thu, Apr 2, 2015 at 12:22 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
>> From: Thomas Gleixner <tglx@linutronix.de>
>>
>> Replace the clockevents_notify() call with an explicit function call.
>>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> For some reason this makes my Ux500 system arbitrarily hang,
> especially during boot. Bisected down to this commit.
> Since the entire changeset is removing the notifications
> altogether I can't just revert it.
>
> Disabling CONFIG_CPU_IDLE removes the problem.
>
> Tried registering a stub driver (I just #if 0 all the code in
> drivers/cpuidle/cpuidle-ux500.c) it still crashes.
>
> That makes me think something inside the cpuidle subsystem
> is locking up after this, but my other idea is that the timer
> may be involved in some way, like this is stressing the timer
> in some new yet untested way.
>
> Has anyone else seen problems with this or is it only
> ux500?
>
> I'm looking closer at it but feel a bit clueless...

If you keep the #if 0 and remove the CPUIDLE_FLAG_TIMER_STOP flag, does 
it still crash ?

> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Sudeep Holla April 28, 2015, 10:42 a.m. UTC | #4
On 28/04/15 11:34, Daniel Lezcano wrote:
> On 04/28/2015 12:11 PM, Linus Walleij wrote:
>> On Thu, Apr 2, 2015 at 12:22 AM, Rafael J. Wysocki
>> <rjw@rjwysocki.net> wrote:
>>
>>> From: Thomas Gleixner <tglx@linutronix.de>
>>>
>>> Replace the clockevents_notify() call with an explicit function
>>> call.
>>>
>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> For some reason this makes my Ux500 system arbitrarily hang,
>> especially during boot. Bisected down to this commit. Since the
>> entire changeset is removing the notifications altogether I can't
>> just revert it.
>>
>> Disabling CONFIG_CPU_IDLE removes the problem.
>>
>> Tried registering a stub driver (I just #if 0 all the code in
>> drivers/cpuidle/cpuidle-ux500.c) it still crashes.
>>
>> That makes me think something inside the cpuidle subsystem is
>> locking up after this, but my other idea is that the timer may be
>> involved in some way, like this is stressing the timer in some new
>> yet untested way.
>>
>> Has anyone else seen problems with this or is it only ux500?
>>
>> I'm looking closer at it but feel a bit clueless...
>
> If you keep the #if 0 and remove the CPUIDLE_FLAG_TIMER_STOP flag,
> does it still crash ?
>

At-least I observed issue only when I am using hardware broadcast timer.
It doesn't hang when I am using hrtimer as broadcast timer in which case
one of the cpu will be not enter deeper idle states that lose timer.
I will rerun on v4.1-rc1 and post the complete log.

Regards,
Sudeep
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki April 28, 2015, 12:19 p.m. UTC | #5
On Tue, Apr 28, 2015 at 12:42 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
>
> On 28/04/15 11:34, Daniel Lezcano wrote:
>>
>> On 04/28/2015 12:11 PM, Linus Walleij wrote:
>>>
>>> On Thu, Apr 2, 2015 at 12:22 AM, Rafael J. Wysocki
>>> <rjw@rjwysocki.net> wrote:
>>>
>>>> From: Thomas Gleixner <tglx@linutronix.de>
>>>>
>>>> Replace the clockevents_notify() call with an explicit function
>>>> call.
>>>>
>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>>
>>> For some reason this makes my Ux500 system arbitrarily hang,
>>> especially during boot. Bisected down to this commit. Since the
>>> entire changeset is removing the notifications altogether I can't
>>> just revert it.
>>>
>>> Disabling CONFIG_CPU_IDLE removes the problem.
>>>
>>> Tried registering a stub driver (I just #if 0 all the code in
>>> drivers/cpuidle/cpuidle-ux500.c) it still crashes.
>>>
>>> That makes me think something inside the cpuidle subsystem is
>>> locking up after this, but my other idea is that the timer may be
>>> involved in some way, like this is stressing the timer in some new
>>> yet untested way.
>>>
>>> Has anyone else seen problems with this or is it only ux500?
>>>
>>> I'm looking closer at it but feel a bit clueless...
>>
>>
>> If you keep the #if 0 and remove the CPUIDLE_FLAG_TIMER_STOP flag,
>> does it still crash ?
>>
>
> At-least I observed issue only when I am using hardware broadcast timer.
> It doesn't hang when I am using hrtimer as broadcast timer in which case
> one of the cpu will be not enter deeper idle states that lose timer.
> I will rerun on v4.1-rc1 and post the complete log.

So the bug here is that cpuidle_enter() enables interrupts, so the
assumption about them being not enabled made by
tick_broadcast_oneshot_control() is actually not valid.

It looks like we need to acquire the clockevents_lock at least in this
particular case.  Let me see where to put it and I'll send a patch for
testing.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Walleij April 28, 2015, 12:37 p.m. UTC | #6
On Tue, Apr 28, 2015 at 2:19 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> Sudeep:
>> At-least I observed issue only when I am using hardware broadcast timer.
>> It doesn't hang when I am using hrtimer as broadcast timer in which case
>> one of the cpu will be not enter deeper idle states that lose timer.
>> I will rerun on v4.1-rc1 and post the complete log.
>
> So the bug here is that cpuidle_enter() enables interrupts, so the
> assumption about them being not enabled made by
> tick_broadcast_oneshot_control() is actually not valid.
>
> It looks like we need to acquire the clockevents_lock at least in this
> particular case.  Let me see where to put it and I'll send a patch for
> testing.

Aha that looks very much like it. Put me on the patch and I'll
take it for a spin.

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep Holla April 28, 2015, 1:04 p.m. UTC | #7
Hi Rafael,

On 28/04/15 13:19, Rafael J. Wysocki wrote:
> On Tue, Apr 28, 2015 at 12:42 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:

[...]

>>
>> At-least I observed issue only when I am using hardware broadcast timer.
>> It doesn't hang when I am using hrtimer as broadcast timer in which case
>> one of the cpu will be not enter deeper idle states that lose timer.
>> I will rerun on v4.1-rc1 and post the complete log.
>
> So the bug here is that cpuidle_enter() enables interrupts, so the
> assumption about them being not enabled made by
> tick_broadcast_oneshot_control() is actually not valid.
>
> It looks like we need to acquire the clockevents_lock at least in this
> particular case.  Let me see where to put it and I'll send a patch for
> testing.
>

Thanks for looking at this, I can also help in testing. Sending the 
complete log again with v4.1-rc1 if it's any useful.

Regards,
Sudeep

--->8

BUG: spinlock lockup suspected on CPU#0, swapper/0/0
  lock: tick_broadcast_lock+0x0/0x40, .magic: dead4ead, .owner: 
swapper/0/0, .owner_cpu: 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc1 #16
Hardware name: ARM-Versatile Express
[<c0014b1d>] (unwind_backtrace) from [<c0011659>] (show_stack+0x11/0x14)
[<c0011659>] (show_stack) from [<c05192ed>] (dump_stack+0x6d/0x78)
[<c05192ed>] (dump_stack) from [<c005c897>] (do_raw_spin_lock+0xc3/0x144)
[<c005c897>] (do_raw_spin_lock) from [<c00811f9>] 
(tick_handle_oneshot_broadcast+0x25/0x164)
[<c00811f9>] (tick_handle_oneshot_broadcast) from [<c001dd21>] 
(sp804_timer_interrupt+0x31/0x34)
[<c001dd21>] (sp804_timer_interrupt) from [<c0067ee1>] 
(handle_irq_event_percpu+0x45/0x154)
[<c0067ee1>] (handle_irq_event_percpu) from [<c006801f>] 
(handle_irq_event+0x2f/0x44)
[<c006801f>] (handle_irq_event) from [<c0069edb>] 
(handle_fasteoi_irq+0x6f/0xf0)
[<c0069edb>] (handle_fasteoi_irq) from [<c00677c7>] 
(generic_handle_irq+0x23/0x2c)
[<c00677c7>] (generic_handle_irq) from [<c0067a0d>] 
(__handle_domain_irq+0x45/0x84)
[<c0067a0d>] (__handle_domain_irq) from [<c0009303>] 
(gic_handle_irq+0x27/0x50)
[<c0009303>] (gic_handle_irq) from [<c0011eff>] (__irq_svc+0x3f/0x64)
Exception stack(0xc07ebf08 to 0xc07ebf50)
bf00:                   00000000 c1038814 00000001 00000001 c07f3f80 
ee78dec0
bf20: 00000000 c1038810 00000000 00000001 c07ebf88 c0520fc8 00000000 
c07ebf50
bf40: c008151d c02e104e 40000173 ffffffff
[<c0011eff>] (__irq_svc) from [<c02e104e>] (_test_and_clear_bit+0x26/0x48)
[<c02e104e>] (_test_and_clear_bit) from [<c008151d>] 
(tick_broadcast_oneshot_control+0x159/0x1ec)
[<c008151d>] (tick_broadcast_oneshot_control) from [<c0050273>] 
(cpu_startup_entry+0x2c3/0x2f8)
[<c0050273>] (cpu_startup_entry) from [<c0782a47>] 
(start_kernel+0x327/0x330)
BUG: spinlock lockup suspected on CPU#3, swapper/3/0
BUG: spinlock lockup suspected on CPU#4, swapper/4/0
  lock: tick_broadcast_lock+0x0/0x40, .magic: dead4ead, .owner: 
swapper/0/0, .owner_cpu: 0
CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.1.0-rc1 #16
Hardware name: ARM-Versatile Express
[<c0014b1d>] (unwind_backtrace) from [<c0011659>] (show_stack+0x11/0x14)
[<c0011659>] (show_stack) from [<c05192ed>] (dump_stack+0x6d/0x78)
BUG: spinlock lockup suspected on CPU#2, swapper/2/0
[<c05192ed>] (dump_stack) from [<c005c897>] (do_raw_spin_lock+0xc3/0x144)
  lock: tick_broadcast_lock+0x0/0x40, .magic: dead4ead, .owner: 
swapper/0/0, .owner_cpu: 0
[<c005c897>] (do_raw_spin_lock) from [<c0081409>] 
(tick_broadcast_oneshot_control+0x45/0x1ec)
[<c0081409>] (tick_broadcast_oneshot_control) from [<c0050273>] 
(cpu_startup_entry+0x2c3/0x2f8)
[<c0050273>] (cpu_startup_entry) from [<800093d1>] (0x800093d1)
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.1.0-rc1 #16
Hardware name: ARM-Versatile Express
[<c0014b1d>] (unwind_backtrace) from [<c0011659>] (show_stack+0x11/0x14)
[<c0011659>] (show_stack) from [<c05192ed>] (dump_stack+0x6d/0x78)
[<c05192ed>] (dump_stack) from [<c005c897>] (do_raw_spin_lock+0xc3/0x144)
[<c005c897>] (do_raw_spin_lock) from [<c0081409>] 
(tick_broadcast_oneshot_control+0x45/0x1ec)
[<c0081409>] (tick_broadcast_oneshot_control) from [<c0050273>] 
(cpu_startup_entry+0x2c3/0x2f8)
[<c0050273>] (cpu_startup_entry) from [<800093d1>] (0x800093d1)
  lock: tick_broadcast_lock+0x0/0x40, .magic: dead4ead, .owner: 
swapper/0/0, .owner_cpu: 0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.1.0-rc1 #16
Hardware name: ARM-Versatile Express
[<c0014b1d>] (unwind_backtrace) from [<c0011659>] (show_stack+0x11/0x14)
[<c0011659>] (show_stack) from [<c05192ed>] (dump_stack+0x6d/0x78)
[<c05192ed>] (dump_stack) from [<c005c897>] (do_raw_spin_lock+0xc3/0x144)
[<c005c897>] (do_raw_spin_lock) from [<c0081409>] 
(tick_broadcast_oneshot_control+0x45/0x1ec)
[<c0081409>] (tick_broadcast_oneshot_control) from [<c0050273>] 
(cpu_startup_entry+0x2c3/0x2f8)
[<c0050273>] (cpu_startup_entry) from [<800093d1>] (0x800093d1)





--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-pm/kernel/sched/idle.c
===================================================================
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -158,8 +158,7 @@  static void cpuidle_idle_call(void)
 	 * is used from another cpu as a broadcast timer, this call may
 	 * fail if it is not available
 	 */
-	if (broadcast &&
-	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
+	if (broadcast && tick_broadcast_enter())
 		goto use_default;
 
 	/* Take note of the planned idle state. */
@@ -176,7 +175,7 @@  static void cpuidle_idle_call(void)
 	idle_set_state(this_rq(), NULL);
 
 	if (broadcast)
-		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
+		tick_broadcast_exit();
 
 	/*
 	 * Give the governor an opportunity to reflect on the outcome