diff mbox

[Bug] ARM: mxs: STI: console can't wake up from freeze

Message ID 322177156.158733.9867e3e7-5710-4844-a098-6f44bd852a6d.open-xchange@email.1und1.de (mailing list archive)
State New, archived
Headers show

Commit Message

Stefan Wahren Nov. 5, 2016, 11:07 a.m. UTC
Hi,

[add Rui]

> Russell King - ARM Linux <linux@armlinux.org.uk> hat am 1. November 2016 um
> 10:23 geschrieben:
> 
> 
> On Mon, Oct 31, 2016 at 08:54:33PM +0100, Stefan Wahren wrote:
> > [  366.696043] INFO: task ext4lazyinit:70 blocked for more than 120 seconds.
> > [  366.703046]       Not tainted 4.9.0-rc1 #7
> > [  366.707188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this
> > message.
> > [  366.715161] ext4lazyinit    D c05aa6ac     0    70      2 0x00000000
> 
> This looks like a very different problem - I guess one for ext4 people.
> 

i investigated the issue more further. This trace wasn't representative, because
the stacktrace is different in most cases. The "endless loop" is located in
freeze_enter(). So i added some debug messages:

-------------------------------->8------------------------------------

-------------------------------->8------------------------------------

and repeated the test cases for freeze which shows none the representative
stacktraces:

1. cmdline contains no_console_suspend

* echo freeze > /sys/power/state

...
[  139.371308] PM: suspend of devices complete after 1342.721 msecs
[  139.385203] PM: late suspend of devices complete after 7.668 msecs
[  139.399428] PM: noirq suspend of devices complete after 7.936 msecs
[  139.406639] PM: calling cpuidle_resume()
[  139.410619] PM: calling wake_up_all_idle_cpus()
[  139.415339] PM: suspend-to-idle

>>> no reaction to input via Debug UART

[  366.683570] INFO: task bash:373 blocked for more than 120 seconds.
[  366.689814]       Not tainted 4.9.0-rc1-dirty #14
[  366.694705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  366.702685] bash            D c05aa6ec     0   373    275 0x00000000
[  366.709227] [<c05aa6ec>] (__schedule) from [<c05aaff8>] (schedule+0x3c/0xbc)
[  366.716495] [<c05aaff8>] (schedule) from [<c005b588>]
(suspend_devices_and_enter+0x888/0xa78)
[  366.725228] [<c005b588>] (suspend_devices_and_enter) from [<c005bea4>]
(pm_suspend+0x72c/0x81c)
[  366.734150] [<c005bea4>] (pm_suspend) from [<c005a008>]
(state_store+0x80/0xcc)
[  366.741554] [<c005a008>] (state_store) from [<c02f0270>]
(kobj_attr_store+0x18/0x1c)
[  366.749515] [<c02f0270>] (kobj_attr_store) from [<c01911e8>]
(sysfs_kf_write+0x48/0x4c)
[  366.757735] [<c01911e8>] (sysfs_kf_write) from [<c0190308>]
(kernfs_fop_write+0xfc/0x1d0)
[  366.766130] [<c0190308>] (kernfs_fop_write) from [<c011f578>]
(__vfs_write+0x2c/0x124)
[  366.774255] [<c011f578>] (__vfs_write) from [<c011f724>]
(vfs_write+0xb4/0x1a4)
[  366.781640] [<c011f724>] (vfs_write) from [<c011f8e8>] (SyS_write+0x44/0x88)
[  366.788890] [<c011f8e8>] (SyS_write) from [<c000a2c0>]
(ret_fast_syscall+0x0/0x1c)
[  366.796627]
[  366.796627] Showing all locks held in the system:
[  366.803011] 2 locks held by khungtaskd/10:
[  366.807149]  #0: [  366.808931]  (
rcu_read_lock[  366.811784] ){......}
, at: [  366.814813] [<c0093a40>] watchdog+0xb4/0x61c
[  366.819128]  #1: [  366.820902]  (
tasklist_lock[  366.823876] ){.+.+..}
, at: [  366.826765] [<c0051dbc>] debug_show_all_locks+0x28/0x1bc
[  366.832151] 4 locks held by bash/373:
[  366.835973]  #0: [  366.837765]  (
sb_writers[  366.840365] #4
){.+.+.+}[  366.842987] , at:
[  366.845079] [<c011f804>] vfs_write+0x194/0x1a4
[  366.849551]  #1: [  366.851324]  (
&of->mutex[  366.854058] ){+.+.+.}
, at: [  366.856944] [<c01902cc>] kernfs_fop_write+0xc0/0x1d0
[  366.861941]  #2: [  366.863839]  (
s_active[  366.866288] #43
){.+.+.+}[  366.868877] , at:
[  366.870938] [<c01902d4>] kernfs_fop_write+0xc8/0x1d0
[  366.876053]  #3: [  366.877843]  (
pm_mutex[  366.880272] ){+.+.+.}
, at: [  366.883266] [<c005b808>] pm_suspend+0x90/0x81c
[  366.887757]
[  366.889268] =============================================
[  366.889268]

2. cmdline contains no_console_suspend

* echo enabled > /sys/class/tty/ttyAMA0/power/wakeup
* echo freeze > /sys/power/state

the same as in 1.

3. cmdline doesn't contains no_console_suspend

* echo enabled > /sys/class/tty/ttyAMA0/power/wakeup
* echo freeze > /sys/power/state

[  161.093187] PM: Syncing filesystems ... [  161.734413] done.
[  161.793242] Freezing user space processes ... (elapsed 0.008 seconds) done.
[  161.809797] Freezing remaining freezable tasks ... (elapsed 0.004 seconds)
done.
[  161.821953] Suspending console(s) (use no_console_suspend to debug)

>>>> no reaction to Debug UART

I expected that in all 3 cases freeze_wake() will be called. Why not?

Here my config again:

CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_PM=y
CONFIG_CPU_IDLE is not set

Comments

Zhang, Rui Nov. 5, 2016, 11:39 a.m. UTC | #1
On Sat, 2016-11-05 at 12:07 +0100, Stefan Wahren wrote:
> Hi,
> 
> [add Rui]
> 
> > 
> > Russell King - ARM Linux <linux@armlinux.org.uk> hat am 1. November
> > 2016 um
> > 10:23 geschrieben:
> > 
> > 
> > On Mon, Oct 31, 2016 at 08:54:33PM +0100, Stefan Wahren wrote:
> > > 
> > > [  366.696043] INFO: task ext4lazyinit:70 blocked for more than
> > > 120 seconds.
> > > [  366.703046]       Not tainted 4.9.0-rc1 #7
> > > [  366.707188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this
> > > message.
> > > [  366.715161] ext4lazyinit    D c05aa6ac     0    70      2
> > > 0x00000000
> > This looks like a very different problem - I guess one for ext4
> > people.
> > 
> i investigated the issue more further. This trace wasn't
> representative, because
> the stacktrace is different in most cases. The "endless loop" is
> located in
> freeze_enter(). So i added some debug messages:
> 
is this a regression?

> -------------------------------->8-----------------------------------
> -
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 1e7f5da..079c08d 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -67,17 +67,20 @@ static void freeze_enter(void)
>  	spin_unlock_irq(&suspend_freeze_lock);
>  
>  	get_online_cpus();
> +	pr_info("PM: calling cpuidle_resume()\n");
>  	cpuidle_resume();
>  
>  	/* Push all the CPUs into the idle loop. */
> +	pr_info("PM: calling wake_up_all_idle_cpus()\n");
>  	wake_up_all_idle_cpus();
> -	pr_debug("PM: suspend-to-idle\n");
> +	pr_info("PM: suspend-to-idle\n");
>  	/* Make the current CPU wait so it can enter the idle loop
> too. */
>  	wait_event(suspend_freeze_wait_head,
>  		   suspend_freeze_state == FREEZE_STATE_WAKE);
> -	pr_debug("PM: resume from suspend-to-idle\n");
> +	pr_info("PM: resume from suspend-to-idle\n");
>  
>  	cpuidle_pause();
> +	pr_info("PM: called cpuidle_pause()\n");
>  	put_online_cpus();
>  
>  	spin_lock_irq(&suspend_freeze_lock);
> @@ -91,6 +94,8 @@ void freeze_wake(void)
>  {
>  	unsigned long flags;
>  
> +	pr_info("PM: freeze_wake()\n");
> +
>  	spin_lock_irqsave(&suspend_freeze_lock, flags);
>  	if (suspend_freeze_state > FREEZE_STATE_NONE) {
>  		suspend_freeze_state = FREEZE_STATE_WAKE;
> 
> -------------------------------->8-----------------------------------
> -
> 
> and repeated the test cases for freeze which shows none the
> representative
> stacktraces:
> 
> 1. cmdline contains no_console_suspend
> 
> * echo freeze > /sys/power/state
> 
> ...
> [  139.371308] PM: suspend of devices complete after 1342.721 msecs
> [  139.385203] PM: late suspend of devices complete after 7.668 msecs
> [  139.399428] PM: noirq suspend of devices complete after 7.936
> msecs
> [  139.406639] PM: calling cpuidle_resume()
> [  139.410619] PM: calling wake_up_all_idle_cpus()
> [  139.415339] PM: suspend-to-idle
> 
> > 
> > > 
> > > > 
> > > > no reaction to input via Debug UART
> [  366.683570] INFO: task bash:373 blocked for more than 120 seconds.
> [  366.689814]       Not tainted 4.9.0-rc1-dirty #14
> [  366.694705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this
> message.
> [  366.702685] bash            D c05aa6ec     0   373    275
> 0x00000000
> [  366.709227] [<c05aa6ec>] (__schedule) from [<c05aaff8>]
> (schedule+0x3c/0xbc)
> [  366.716495] [<c05aaff8>] (schedule) from [<c005b588>]
> (suspend_devices_and_enter+0x888/0xa78)
> [  366.725228] [<c005b588>] (suspend_devices_and_enter) from
> [<c005bea4>]
> (pm_suspend+0x72c/0x81c)
> [  366.734150] [<c005bea4>] (pm_suspend) from [<c005a008>]
> (state_store+0x80/0xcc)
> [  366.741554] [<c005a008>] (state_store) from [<c02f0270>]
> (kobj_attr_store+0x18/0x1c)
> [  366.749515] [<c02f0270>] (kobj_attr_store) from [<c01911e8>]
> (sysfs_kf_write+0x48/0x4c)
> [  366.757735] [<c01911e8>] (sysfs_kf_write) from [<c0190308>]
> (kernfs_fop_write+0xfc/0x1d0)
> [  366.766130] [<c0190308>] (kernfs_fop_write) from [<c011f578>]
> (__vfs_write+0x2c/0x124)
> [  366.774255] [<c011f578>] (__vfs_write) from [<c011f724>]
> (vfs_write+0xb4/0x1a4)
> [  366.781640] [<c011f724>] (vfs_write) from [<c011f8e8>]
> (SyS_write+0x44/0x88)
> [  366.788890] [<c011f8e8>] (SyS_write) from [<c000a2c0>]
> (ret_fast_syscall+0x0/0x1c)
> [  366.796627]
> [  366.796627] Showing all locks held in the system:
> [  366.803011] 2 locks held by khungtaskd/10:
> [  366.807149]  #0: [  366.808931]  (
> rcu_read_lock[  366.811784] ){......}
> , at: [  366.814813] [<c0093a40>] watchdog+0xb4/0x61c
> [  366.819128]  #1: [  366.820902]  (
> tasklist_lock[  366.823876] ){.+.+..}
> , at: [  366.826765] [<c0051dbc>] debug_show_all_locks+0x28/0x1bc
> [  366.832151] 4 locks held by bash/373:
> [  366.835973]  #0: [  366.837765]  (
> sb_writers[  366.840365] #4
> ){.+.+.+}[  366.842987] , at:
> [  366.845079] [<c011f804>] vfs_write+0x194/0x1a4
> [  366.849551]  #1: [  366.851324]  (
> &of->mutex[  366.854058] ){+.+.+.}
> , at: [  366.856944] [<c01902cc>] kernfs_fop_write+0xc0/0x1d0
> [  366.861941]  #2: [  366.863839]  (
> s_active[  366.866288] #43
> ){.+.+.+}[  366.868877] , at:
> [  366.870938] [<c01902d4>] kernfs_fop_write+0xc8/0x1d0
> [  366.876053]  #3: [  366.877843]  (
> pm_mutex[  366.880272] ){+.+.+.}
> , at: [  366.883266] [<c005b808>] pm_suspend+0x90/0x81c
> [  366.887757]
> [  366.889268] =============================================
> [  366.889268]
> 
> 2. cmdline contains no_console_suspend
> 
> * echo enabled > /sys/class/tty/ttyAMA0/power/wakeup
> * echo freeze > /sys/power/state
> 
> the same as in 1.
> 
> 3. cmdline doesn't contains no_console_suspend
> 
> * echo enabled > /sys/class/tty/ttyAMA0/power/wakeup
> * echo freeze > /sys/power/state
> 
> [  161.093187] PM: Syncing filesystems ... [  161.734413] done.
> [  161.793242] Freezing user space processes ... (elapsed 0.008
> seconds) done.
> [  161.809797] Freezing remaining freezable tasks ... (elapsed 0.004
> seconds)
> done.
> [  161.821953] Suspending console(s) (use no_console_suspend to
> debug)
> 
> > 
> > > > > no reaction to Debug UART

Then the system does not have any response? or the system freezes and
wakes up as expected?

> I expected that in all 3 cases freeze_wake() will be called. Why not?
> 
> Here my config again:
> 
> CONFIG_PM_SLEEP=y
> CONFIG_SUSPEND=y
> CONFIG_SUSPEND_FREEZER=y
> CONFIG_PM=y
> CONFIG_CPU_IDLE is not set

hmmm, why not have CONFIG_CPU_IDLE set?

thanks,
rui
Stefan Wahren Nov. 5, 2016, 12:01 p.m. UTC | #2
Hi Rui,

> Zhang Rui <rui.zhang@intel.com> hat am 5. November 2016 um 12:39 geschrieben:
> 
> 
> On Sat, 2016-11-05 at 12:07 +0100, Stefan Wahren wrote:
> > Hi,
> > 
> > [add Rui]
> > 
> > > 
> > > Russell King - ARM Linux <linux@armlinux.org.uk> hat am 1. November
> > > 2016 um
> > > 10:23 geschrieben:
> > > 
> > > 
> > > On Mon, Oct 31, 2016 at 08:54:33PM +0100, Stefan Wahren wrote:
> > > > 
> > > > [  366.696043] INFO: task ext4lazyinit:70 blocked for more than
> > > > 120 seconds.
> > > > [  366.703046]       Not tainted 4.9.0-rc1 #7
> > > > [  366.707188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > > disables
> > > > this
> > > > message.
> > > > [  366.715161] ext4lazyinit    D c05aa6ac     0    70      2
> > > > 0x00000000
> > > This looks like a very different problem - I guess one for ext4
> > > people.
> > > 
> > i investigated the issue more further. This trace wasn't
> > representative, because
> > the stacktrace is different in most cases. The "endless loop" is
> > located in
> > freeze_enter(). So i added some debug messages:
> > 
> is this a regression?

i've never seen this working on MXS platform, but maybe i hit a corner case. The
oldest kernel that i tested was 3.18 and this issue was reproducible. Should i
test an older version?

> > 
> > and repeated the test cases for freeze which shows none the
> > representative
> > stacktraces:
> > 
> > 1. cmdline contains no_console_suspend
> > 
> > * echo freeze > /sys/power/state
> > 
> > ...
> > [  139.371308] PM: suspend of devices complete after 1342.721 msecs
> > [  139.385203] PM: late suspend of devices complete after 7.668 msecs
> > [  139.399428] PM: noirq suspend of devices complete after 7.936
> > msecs
> > [  139.406639] PM: calling cpuidle_resume()
> > [  139.410619] PM: calling wake_up_all_idle_cpus()
> > [  139.415339] PM: suspend-to-idle
> > 
> > > 
> > > > 
> > > > > 
> > > > > no reaction to input via Debug UART
> > [  366.683570] INFO: task bash:373 blocked for more than 120 seconds.
> > [  366.689814]       Not tainted 4.9.0-rc1-dirty #14
> > [  366.694705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this
> > message.
> > [  366.702685] bash            D c05aa6ec     0   373    275
> > 0x00000000
> > [  366.709227] [<c05aa6ec>] (__schedule) from [<c05aaff8>]
> > (schedule+0x3c/0xbc)
> > [  366.716495] [<c05aaff8>] (schedule) from [<c005b588>]
> > (suspend_devices_and_enter+0x888/0xa78)
> > [  366.725228] [<c005b588>] (suspend_devices_and_enter) from
> > [<c005bea4>]
> > (pm_suspend+0x72c/0x81c)
> > [  366.734150] [<c005bea4>] (pm_suspend) from [<c005a008>]
> > (state_store+0x80/0xcc)
> > [  366.741554] [<c005a008>] (state_store) from [<c02f0270>]
> > (kobj_attr_store+0x18/0x1c)
> > [  366.749515] [<c02f0270>] (kobj_attr_store) from [<c01911e8>]
> > (sysfs_kf_write+0x48/0x4c)
> > [  366.757735] [<c01911e8>] (sysfs_kf_write) from [<c0190308>]
> > (kernfs_fop_write+0xfc/0x1d0)
> > [  366.766130] [<c0190308>] (kernfs_fop_write) from [<c011f578>]
> > (__vfs_write+0x2c/0x124)
> > [  366.774255] [<c011f578>] (__vfs_write) from [<c011f724>]
> > (vfs_write+0xb4/0x1a4)
> > [  366.781640] [<c011f724>] (vfs_write) from [<c011f8e8>]
> > (SyS_write+0x44/0x88)
> > [  366.788890] [<c011f8e8>] (SyS_write) from [<c000a2c0>]
> > (ret_fast_syscall+0x0/0x1c)
> > [  366.796627]
> > [  366.796627] Showing all locks held in the system:
> > [  366.803011] 2 locks held by khungtaskd/10:
> > [  366.807149]  #0: [  366.808931]  (
> > rcu_read_lock[  366.811784] ){......}
> > , at: [  366.814813] [<c0093a40>] watchdog+0xb4/0x61c
> > [  366.819128]  #1: [  366.820902]  (
> > tasklist_lock[  366.823876] ){.+.+..}
> > , at: [  366.826765] [<c0051dbc>] debug_show_all_locks+0x28/0x1bc
> > [  366.832151] 4 locks held by bash/373:
> > [  366.835973]  #0: [  366.837765]  (
> > sb_writers[  366.840365] #4
> > ){.+.+.+}[  366.842987] , at:
> > [  366.845079] [<c011f804>] vfs_write+0x194/0x1a4
> > [  366.849551]  #1: [  366.851324]  (
> > &of->mutex[  366.854058] ){+.+.+.}
> > , at: [  366.856944] [<c01902cc>] kernfs_fop_write+0xc0/0x1d0
> > [  366.861941]  #2: [  366.863839]  (
> > s_active[  366.866288] #43
> > ){.+.+.+}[  366.868877] , at:
> > [  366.870938] [<c01902d4>] kernfs_fop_write+0xc8/0x1d0
> > [  366.876053]  #3: [  366.877843]  (
> > pm_mutex[  366.880272] ){+.+.+.}
> > , at: [  366.883266] [<c005b808>] pm_suspend+0x90/0x81c
> > [  366.887757]
> > [  366.889268] =============================================
> > [  366.889268]
> > 
> > 2. cmdline contains no_console_suspend
> > 
> > * echo enabled > /sys/class/tty/ttyAMA0/power/wakeup
> > * echo freeze > /sys/power/state
> > 
> > the same as in 1.
> > 
> > 3. cmdline doesn't contains no_console_suspend
> > 
> > * echo enabled > /sys/class/tty/ttyAMA0/power/wakeup
> > * echo freeze > /sys/power/state
> > 
> > [  161.093187] PM: Syncing filesystems ... [  161.734413] done.
> > [  161.793242] Freezing user space processes ... (elapsed 0.008
> > seconds) done.
> > [  161.809797] Freezing remaining freezable tasks ... (elapsed 0.004
> > seconds)
> > done.
> > [  161.821953] Suspending console(s) (use no_console_suspend to
> > debug)
> > 
> > > 
> > > > > > no reaction to Debug UART
> 
> Then the system does not have any response? or the system freezes and
> wakes up as expected?

The system doesn't react to any input from Debug UART and after 2 minutes i got
the following stacktrace about hung task. But i didn't get any change to come
back to the console. Interestingly basic PM debugging tests doesn't show this
issue. Also "echo mem > /sys/power/state" doesn't cause this issue.

> 
> > I expected that in all 3 cases freeze_wake() will be called. Why not?
> > 
> > Here my config again:
> > 
> > CONFIG_PM_SLEEP=y
> > CONFIG_SUSPEND=y
> > CONFIG_SUSPEND_FREEZER=y
> > CONFIG_PM=y
> > CONFIG_CPU_IDLE is not set
> 
> hmmm, why not have CONFIG_CPU_IDLE set?

I'm using mxs_defconfig which doesn't select the ARM CPU idle. Is this
necessary?

> 
> thanks,
> rui
Daniel Lezcano Nov. 5, 2016, 1 p.m. UTC | #3
On Sat, Nov 05, 2016 at 01:01:26PM +0100, Stefan Wahren wrote:

[ ... ]

> > > CONFIG_SUSPEND=y
> > > CONFIG_SUSPEND_FREEZER=y
> > > CONFIG_PM=y
> > > CONFIG_CPU_IDLE is not set
> > 
> > hmmm, why not have CONFIG_CPU_IDLE set?
> 
> I'm using mxs_defconfig which doesn't select the ARM CPU idle. Is this
> necessary?

Very likely :) suspend_freezer and cpuidle are tied together.

Moreover, while reading the code, it appears without CONFIG_CPU_IDLE the
function cpuidle_idle_call is constantly failing, ending up by having the
cpu looping again and again in the idle loop. The function stubs return
-ENODEV in the cpuidle's header when CONFIG_CPU_IDLE is not set.

If I'm not wrong the traces should show the cpu actually does never go
to suspend. As soon as it enters the state, it should exit immediately
without an interrupt event.

Probably there is an inconsistent configuration leaving the kernel with
a strange wakeup condition or a race with the short the suspend/resume
cycle delay.

First test would be to enable CONFIG_CPU_IDLE.

If it is confirmed, it would be nice to feed bugzilla.
Stefan Wahren Nov. 5, 2016, 3:28 p.m. UTC | #4
Hi Daniel,

> Daniel Lezcano <daniel.lezcano@linaro.org> hat am 5. November 2016 um 14:00
> geschrieben:
> 
> 
> On Sat, Nov 05, 2016 at 01:01:26PM +0100, Stefan Wahren wrote:
> 
> [ ... ]
> 
> > > > CONFIG_SUSPEND=y
> > > > CONFIG_SUSPEND_FREEZER=y
> > > > CONFIG_PM=y
> > > > CONFIG_CPU_IDLE is not set
> > > 
> > > hmmm, why not have CONFIG_CPU_IDLE set?
> > 
> > I'm using mxs_defconfig which doesn't select the ARM CPU idle. Is this
> > necessary?
> 
> Very likely :) suspend_freezer and cpuidle are tied together.
> 
> Moreover, while reading the code, it appears without CONFIG_CPU_IDLE the
> function cpuidle_idle_call is constantly failing, ending up by having the
> cpu looping again and again in the idle loop. The function stubs return
> -ENODEV in the cpuidle's header when CONFIG_CPU_IDLE is not set.
> 
> If I'm not wrong the traces should show the cpu actually does never go
> to suspend. As soon as it enters the state, it should exit immediately
> without an interrupt event.
> 
> Probably there is an inconsistent configuration leaving the kernel with
> a strange wakeup condition or a race with the short the suspend/resume
> cycle delay.
> 
> First test would be to enable CONFIG_CPU_IDLE.
> 
> If it is confirmed, it would be nice to feed bugzilla.

first i tried with CONFIG_CPU_IDLE=Y and then additionally with
CONFIG_ARM_CPUIDLE=y.

In both cases the issue is still reproducible.

As i wrote in my email before, i added a pr_info() into freeze_wake. But i never
see the output of this message. So i assume freeze_wake is never called. Again,
how could this happen?
Russell King (Oracle) Nov. 5, 2016, 6:05 p.m. UTC | #5
On Sat, Nov 05, 2016 at 04:28:37PM +0100, Stefan Wahren wrote:
> As i wrote in my email before, i added a pr_info() into freeze_wake.
> But i never see the output of this message. So i assume freeze_wake
> is never called. Again, how could this happen?

Hmm, so the bit that you're getting stuck on is:

        wait_event(suspend_freeze_wait_head,
                   suspend_freeze_state == FREEZE_STATE_WAKE);

Now there's two things about this here - it's a non-interruptible wait,
so I think the hung task detection may trigger on that (I'm not entirely
sure on that point though, and I don't have time this evening to read
the code to find out.)

The second thing is, that in order to pass this point, something has to
call freeze_wake().

There's not that many possibilities for that:

$ git grep freeze_wake drivers kernel
drivers/base/power/wakeup.c:    freeze_wake();
drivers/base/power/wakeup.c:    freeze_wake();
kernel/power/suspend.c:void freeze_wake(void)
kernel/power/suspend.c:EXPORT_SYMBOL_GPL(freeze_wake);

One of those is pm_system_wakeup(), the other is wakeup_source_activate()
via wakeup_source_report_event() via __pm_stay_awake() or
__pm_wakeup_event().

Looking at the results of:

$ grep 'pm_wakeup_event\|pm_stay_awake\|pm_system_wakeup' drivers kernel -r

it looks like for freeze support to work, various drivers need to call
one of these functions.

The iMX serial driver doesn't call any of these functions, so I can't
see how we'd get past this point - and from that grep you'll see nothing
in kernel/irq touches any of these functions.

Documentation/power/suspend-and-interrupts.txt does a very poor job
(which is typical) of describing what the requirements are for
"suspend-to-idle", it doesn't really say that any of the above
functions must be called and it doesn't say who's responsible for
calling these functions.  It does talk about "pm_system_wakeup()"
for "rare cases".

So my conclusion, based on the poor documentation and the results of
my greps, is that "freeze" aka "suspend-to-idle" is not supported on
the majority of hardware, and attempting to use it will result in the
system locking up in exactly the way you're seeing.
diff mbox

Patch

diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 1e7f5da..079c08d 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -67,17 +67,20 @@  static void freeze_enter(void)
 	spin_unlock_irq(&suspend_freeze_lock);
 
 	get_online_cpus();
+	pr_info("PM: calling cpuidle_resume()\n");
 	cpuidle_resume();
 
 	/* Push all the CPUs into the idle loop. */
+	pr_info("PM: calling wake_up_all_idle_cpus()\n");
 	wake_up_all_idle_cpus();
-	pr_debug("PM: suspend-to-idle\n");
+	pr_info("PM: suspend-to-idle\n");
 	/* Make the current CPU wait so it can enter the idle loop too. */
 	wait_event(suspend_freeze_wait_head,
 		   suspend_freeze_state == FREEZE_STATE_WAKE);
-	pr_debug("PM: resume from suspend-to-idle\n");
+	pr_info("PM: resume from suspend-to-idle\n");
 
 	cpuidle_pause();
+	pr_info("PM: called cpuidle_pause()\n");
 	put_online_cpus();
 
 	spin_lock_irq(&suspend_freeze_lock);
@@ -91,6 +94,8 @@  void freeze_wake(void)
 {
 	unsigned long flags;
 
+	pr_info("PM: freeze_wake()\n");
+
 	spin_lock_irqsave(&suspend_freeze_lock, flags);
 	if (suspend_freeze_state > FREEZE_STATE_NONE) {
 		suspend_freeze_state = FREEZE_STATE_WAKE;