[1/2] cpufreq: try to resume policies which failed on last resume

Message ID 5562479.pVWRuDL0y6@vostro.rjw.lan (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Rafael J. Wysocki Dec. 26, 2013, 1:05 a.m. UTC
On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote:
> __cpufreq_add_dev() can fail sometimes while we are resuming our system.
> Currently we are clearing all sysfs nodes for cpufreq's failed policy as that
> could make userspace unstable. But if we suspend/resume again, we should at least
> try to bring back those policies.
> 
> This patch fixes this issue by clearing fallback data on failure and trying to
> allocate a new struct cpufreq_policy on second resume.
> 
> Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Well, while I appreciate the work done here, I don't like the changelog,
I don't really like the way the code is written and I don't like the comments.
Sorry about that.

Bjorn, can you please test the patch below instead along with the [2/2]
from this series (on top of linux-pm.git/pm-cpufreq)?

Rafael


---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: cpufreq: Clean up after a failing light-weight initialization

If cpufreq_policy_restore() returns NULL during system resume,
__cpufreq_add_dev() should just fall back to the full initialization
instead of returning an error, because that may actually make things
work.  Moreover, it should not leave stale fallback data behind after
it has failed to restore a previously existing policy.

This change is based on Viresh Kumar's work.

Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq.c |   23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Viresh Kumar Dec. 26, 2013, 2:47 a.m. UTC | #1
On 26 December 2013 06:35, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: cpufreq: Clean up after a failing light-weight initialization
>
> If cpufreq_policy_restore() returns NULL during system resume,
> __cpufreq_add_dev() should just fall back to the full initialization
> instead of returning an error, because that may actually make things
> work.  Moreover, it should not leave stale fallback data behind after
> it has failed to restore a previously existing policy.
>
> This change is based on Viresh Kumar's work.
>
> Reported-by: Bjørn Mork <bjorn@mork.no>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Viresh Kumar Dec. 27, 2013, 9:57 a.m. UTC | #2
On 26 December 2013 08:17, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 26 December 2013 06:35, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Subject: cpufreq: Clean up after a failing light-weight initialization
>>
>> If cpufreq_policy_restore() returns NULL during system resume,
>> __cpufreq_add_dev() should just fall back to the full initialization
>> instead of returning an error, because that may actually make things
>> work.  Moreover, it should not leave stale fallback data behind after
>> it has failed to restore a previously existing policy.
>>
>> This change is based on Viresh Kumar's work.
>>
>> Reported-by: Bjørn Mork <bjorn@mork.no>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

I think there is nothing much different in this patch compared to what Bjorn
tested. So you can probably push that now and let him test linux-next later
once he is back?
Viresh Kumar Dec. 27, 2013, 9:58 a.m. UTC | #3
On 27 December 2013 15:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> I think there is nothing much different in this patch compared to what Bjorn
> tested. So you can probably push that now and let him test linux-next later
> once he is back?

Just saw that you already pushed it. :)
Bjørn Mork Dec. 30, 2013, 4:40 p.m. UTC | #4
Yes, that's good. As you might have guessed by the lack of any response, I won't be able to test this in 2013.

Thanks for all the good work!


Bjørn 

Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 27 December 2013 15:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> I think there is nothing much different in this patch compared to what
>> Bjorn tested. So you can probably push that now and let him test
>> linux-next later once he is back?
>
> Just saw that you already pushed it. :)

Bjørn Mork Jan. 2, 2014, 12:15 p.m. UTC | #5
"Rafael J. Wysocki" <rjw@rjwysocki.net> writes:

> On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote:
>> __cpufreq_add_dev() can fail sometimes while we are resuming our system.
>> Currently we are clearing all sysfs nodes for cpufreq's failed policy as that
>> could make userspace unstable. But if we suspend/resume again, we should at least
>> try to bring back those policies.
>> 
>> This patch fixes this issue by clearing fallback data on failure and trying to
>> allocate a new struct cpufreq_policy on second resume.
>> 
>> Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>
> Well, while I appreciate the work done here, I don't like the changelog,
> I don't really like the way the code is written and I don't like the comments.
> Sorry about that.
>
> Bjorn, can you please test the patch below instead along with the [2/2]
> from this series (on top of linux-pm.git/pm-cpufreq)?

I tested this series with your modified 1/2 today, but on top of a
current mainline (commit 9a0bb2966ef) instead of linux-pm.git/pm-cpufreq.
Shouldn't make much difference since Linus already has pulled your
'pm+acpi-3.13-rc6' tag, which included that pm-cpufreq branch.

This series fixes the reported bug for me.


But I observed this, most likely unrelated, splat during the test:

ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20131115/evregion-282)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff880232499c18), AE_BAD_PARAMETER (20131115/psparse-536)
ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff8802326f93d0), AE_BAD_PARAMETER (20131115/psparse-536)
ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff8802326f9268), AE_BAD_PARAMETER (20131115/psparse-536)
ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20131115/processor_perflib-140)

======================================================
[ INFO: possible circular locking dependency detected ]
3.13.0-rc6+ #181 Not tainted
-------------------------------------------------------
s2disk/5651 is trying to acquire lock:
 (s_active#170){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46

but task is already holding lock:
 (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (cpu_hotplug.lock){+.+.+.}:
       [<ffffffff81075027>] lock_acquire+0xfb/0x144
       [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
       [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
       [<ffffffff812a6974>] store+0x20/0xad
       [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
       [<ffffffff8112a428>] vfs_write+0x9c/0x102
       [<ffffffff8112a716>] SyS_write+0x50/0x85
       [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b

-> #0 (s_active#170){++++.+}:
       [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
       [<ffffffff81075027>] lock_acquire+0xfb/0x144
       [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
       [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
       [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
       [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
       [<ffffffff811d5d11>] kobject_del+0x18/0x42
       [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
       [<ffffffff811d5ede>] kobject_put+0x45/0x49
       [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83
       [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c
       [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88
       [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
       [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
       [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
       [<ffffffff81039ea1>] cpu_notify+0xe/0x10
       [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
       [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
       [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
       [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
       [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
       [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
       [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(cpu_hotplug.lock);
                               lock(s_active#170);
                               lock(cpu_hotplug.lock);
  lock(s_active#170);

 *** DEADLOCK ***

7 locks held by s2disk/5651:
 #0:  (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431
 #1:  (device_hotplug_lock){+.+.+.}, at: [<ffffffff81283730>] lock_device_hotplug+0x12/0x14
 #2:  (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c657>] acpi_scan_lock_acquire+0x12/0x14
 #3:  (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38
 #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14
 #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
 #6:  (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a84cc>] __cpufreq_add_dev.isra.18+0x7f/0x78c

stack backtrace:
CPU: 0 PID: 5651 Comm: s2disk Not tainted 3.13.0-rc6+ #181
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
 ffffffff81d3edf0 ffff8802174898f8 ffffffff81399cac 0000000000004549
 ffffffff81d3edf0 ffff880217489948 ffffffff81397dc5 ffff880217489928
 ffff88023198ea50 0000000000000006 ffff88023198f1d8 0000000000000006
Call Trace:
 [<ffffffff81399cac>] dump_stack+0x4e/0x68
 [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
 [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
 [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143
 [<ffffffff81075027>] lock_acquire+0xfb/0x144
 [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
 [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160
 [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
 [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
 [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
 [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
 [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
 [<ffffffff811d5d11>] kobject_del+0x18/0x42
 [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
 [<ffffffff811d5ede>] kobject_put+0x45/0x49
 [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83
 [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c
 [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
 [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88
 [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
 [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
 [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
 [<ffffffff81039ea1>] cpu_notify+0xe/0x10
 [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
 [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
 [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
 [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
 [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
 [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4
 [<ffffffff81141a67>] ? fget_light+0x33/0x9a
 [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
 [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
CPU1 is up
ACPI: Waking up from system sleep state S4
PM: noirq thaw of devices complete after 2.727 msecs
PM: early thaw of devices complete after 1.137 msecs



This warning appeared when I tried cancelling hibernation, which is
known to fail when using the acpi-cpufreq driver.  The point of the test
was to verify that such failures still produce the expected result:

 - all stale sysfs files are cleaned up and removed
 - later suspend/resume actions will restore (or actually reinitiate)
   the cpufreq driver for the non-boot CPUs

Which was successful, except that it produced the above lockdep warning.

I have not verified that this is a new warning (which means that it most
likely is not).  And I expect the whole acpi-cpufreq problem, including
that warning, to go away when you eventually push
http://www.spinics.net/lists/cpufreq/msg08714.html with your improved
changelog (thanks for doing that BTW).  So I don't worry too much about
it.  Just wanted to let you know.



Bjørn
Rafael J. Wysocki Jan. 3, 2014, 12:40 a.m. UTC | #6
On Thursday, January 02, 2014 01:15:10 PM Bjørn Mork wrote:
> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
> 
> > On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote:
> >> __cpufreq_add_dev() can fail sometimes while we are resuming our system.
> >> Currently we are clearing all sysfs nodes for cpufreq's failed policy as that
> >> could make userspace unstable. But if we suspend/resume again, we should at least
> >> try to bring back those policies.
> >> 
> >> This patch fixes this issue by clearing fallback data on failure and trying to
> >> allocate a new struct cpufreq_policy on second resume.
> >> 
> >> Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
> >> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> >
> > Well, while I appreciate the work done here, I don't like the changelog,
> > I don't really like the way the code is written and I don't like the comments.
> > Sorry about that.
> >
> > Bjorn, can you please test the patch below instead along with the [2/2]
> > from this series (on top of linux-pm.git/pm-cpufreq)?
> 
> I tested this series with your modified 1/2 today, but on top of a
> current mainline (commit 9a0bb2966ef) instead of linux-pm.git/pm-cpufreq.
> Shouldn't make much difference since Linus already has pulled your
> 'pm+acpi-3.13-rc6' tag, which included that pm-cpufreq branch.
> 
> This series fixes the reported bug for me.
> 
> 
> But I observed this, most likely unrelated, splat during the test:
> 
> [...]
> 
> This warning appeared when I tried cancelling hibernation, which is
> known to fail when using the acpi-cpufreq driver.  The point of the test
> was to verify that such failures still produce the expected result:
> 
>  - all stale sysfs files are cleaned up and removed
>  - later suspend/resume actions will restore (or actually reinitiate)
>    the cpufreq driver for the non-boot CPUs
> 
> Which was successful, except that it produced the above lockdep warning.
> 
> I have not verified that this is a new warning (which means that it most
> likely is not).  And I expect the whole acpi-cpufreq problem, including
> that warning, to go away when you eventually push
> http://www.spinics.net/lists/cpufreq/msg08714.html with your improved
> changelog (thanks for doing that BTW).  So I don't worry too much about
> it.  Just wanted to let you know.

OK, thanks!  Well, this is somewhat worrisome.

Could you also check the linux-pm.git/fixes branch that contains all patches
I'm planning to push for 3.13-rc7 shortly?

Rafael

Bjørn Mork Jan. 3, 2014, 9:24 a.m. UTC | #7
"Rafael J. Wysocki" <rjw@rjwysocki.net> writes:

> OK, thanks!  Well, this is somewhat worrisome.
>
> Could you also check the linux-pm.git/fixes branch that contains all patches
> I'm planning to push for 3.13-rc7 shortly?

It's definitely still there.  But I'm less sure about the exact trigger.

I have now booted and suspended that branch a few times, trying to figure
out which actions it depends on. It seems to depend on a modified
scaling_max_freq (or any other attribute, I guess - I haven't tested
anything else) in combination with suspend.  But the order doesn't
necessarily matter.  I can sometimes set off the warning by writing to
sysfs after resuming, and sometimes at suspend time if the value is
already modified.  The results are not consistent though.  Sometimes
there is no warning at all.

But what is 100% sure is that I can still trigger this with the
linux-pm.git/fixes branch.


For example, triggered by

  echo 800000 >/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

after resume (I believe the taint is due to a _request_firmware failure
for one of the bluetooth devices):

 
 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.13.0-rc6+ #183 Tainted: G        W   
 -------------------------------------------------------
 bash/4897 is trying to acquire lock:
  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
 
 but task is already holding lock:
  (s_active#273){++++.+}, at: [<ffffffff8118a93f>] sysfs_write_file+0xd6/0x18b
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 -> #1 (s_active#273){++++.+}:
        [<ffffffff81075027>] lock_acquire+0xfb/0x144
        [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
        [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
        [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
        [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
        [<ffffffff811d5d11>] kobject_del+0x18/0x42
        [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
        [<ffffffff811d5ede>] kobject_put+0x45/0x49
        [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
        [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
        [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
        [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
        [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
        [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
        [<ffffffff81039ea1>] cpu_notify+0xe/0x10
        [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
        [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
        [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
        [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
        [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
        [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
        [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
 
 -> #0 (cpu_hotplug.lock){+.+.+.}:
        [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
        [<ffffffff81075027>] lock_acquire+0xfb/0x144
        [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
        [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
        [<ffffffff812a69c4>] store+0x20/0xad
        [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
        [<ffffffff8112a428>] vfs_write+0x9c/0x102
        [<ffffffff8112a716>] SyS_write+0x50/0x85
        [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
 
 other info that might help us debug this:
 
  Possible unsafe locking scenario:
 
        CPU0                    CPU1
        ----                    ----
   lock(s_active#273);
                                lock(cpu_hotplug.lock);
                                lock(s_active#273);
   lock(cpu_hotplug.lock);
 
  *** DEADLOCK ***
 
 3 locks held by bash/4897:
  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811297c7>] file_start_write+0x27/0x29
  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8118a937>] sysfs_write_file+0xce/0x18b
  #2:  (s_active#273){++++.+}, at: [<ffffffff8118a93f>] sysfs_write_file+0xd6/0x18b
 
 stack backtrace:
 CPU: 1 PID: 4897 Comm: bash Tainted: G        W    3.13.0-rc6+ #183
 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
  ffffffff81ca83b0 ffff88021fa65c08 ffffffff81399cac 0000000000000006
  ffffffff81ca83b0 ffff88021fa65c58 ffffffff81397dc5 0000000000000002
  ffff8800b1af2810 0000000000000003 ffff8800b1af2ef0 0000000000000003
 Call Trace:
  [<ffffffff81399cac>] dump_stack+0x4e/0x68
  [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
  [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
  [<ffffffff81075027>] lock_acquire+0xfb/0x144
  [<ffffffff81039f4a>] ? get_online_cpus+0x3c/0x50
  [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
  [<ffffffff81039f4a>] ? get_online_cpus+0x3c/0x50
  [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
  [<ffffffff81039f4a>] ? get_online_cpus+0x3c/0x50
  [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
  [<ffffffff812a69c4>] store+0x20/0xad
  [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
  [<ffffffff8112a428>] vfs_write+0x9c/0x102
  [<ffffffff8112a716>] SyS_write+0x50/0x85
  [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b


Or by s2disk (not the same boot - this time without the taint):

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.13.0-rc6+ #183 Not tainted
 -------------------------------------------------------
 s2disk/5123 is trying to acquire lock:
  (s_active#171){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
 
 but task is already holding lock:
  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 -> #1 (cpu_hotplug.lock){+.+.+.}:
        [<ffffffff81075027>] lock_acquire+0xfb/0x144
        [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
        [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
        [<ffffffff812a69c4>] store+0x20/0xad
        [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
        [<ffffffff8112a428>] vfs_write+0x9c/0x102
        [<ffffffff8112a716>] SyS_write+0x50/0x85
        [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
 
 -> #0 (s_active#171){++++.+}:
        [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
        [<ffffffff81075027>] lock_acquire+0xfb/0x144
        [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
        [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
        [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
        [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
        [<ffffffff811d5d11>] kobject_del+0x18/0x42
        [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
        [<ffffffff811d5ede>] kobject_put+0x45/0x49
        [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
        [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
        [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
        [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
        [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
        [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
        [<ffffffff81039ea1>] cpu_notify+0xe/0x10
        [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
        [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
        [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
        [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
        [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
        [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
        [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
 
 other info that might help us debug this:
 
  Possible unsafe locking scenario:
 
        CPU0                    CPU1
        ----                    ----
   lock(cpu_hotplug.lock);
                                lock(s_active#171);
                                lock(cpu_hotplug.lock);
   lock(s_active#171);
 
  *** DEADLOCK ***
 
 7 locks held by s2disk/5123:
  #0:  (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431
  #1:  (device_hotplug_lock){+.+.+.}, at: [<ffffffff812837ac>] lock_device_hotplug+0x12/0x14
  #2:  (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c6d3>] acpi_scan_lock_acquire+0x12/0x14
  #3:  (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38
  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14
  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
  #6:  (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a851c>] __cpufreq_add_dev.isra.18+0x7f/0x78c
 
 stack backtrace:
 CPU: 0 PID: 5123 Comm: s2disk Not tainted 3.13.0-rc6+ #183
 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
  ffffffff81d3de70 ffff8800b10718f8 ffffffff81399cac 0000000000003a93
  ffffffff81d3de70 ffff8800b1071948 ffffffff81397dc5 ffff8800b1071928
  ffff8800b10c8bd0 0000000000000006 ffff8800b10c9358 0000000000000006
 Call Trace:
  [<ffffffff81399cac>] dump_stack+0x4e/0x68
  [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
  [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
  [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143
  [<ffffffff81075027>] lock_acquire+0xfb/0x144
  [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
  [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160
  [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
  [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
  [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
  [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
  [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
  [<ffffffff811d5d11>] kobject_del+0x18/0x42
  [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
  [<ffffffff811d5ede>] kobject_put+0x45/0x49
  [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
  [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
  [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
  [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
  [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
  [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
  [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
  [<ffffffff81039ea1>] cpu_notify+0xe/0x10
  [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
  [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
  [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
  [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
  [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
  [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4
  [<ffffffff81141a67>] ? fget_light+0x33/0x9a
  [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
  [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
 CPU1 is up
 ACPI: Waking up from system sleep state S4
 PM: noirq thaw of devices complete after 1.722 msecs
 PM: early thaw of devices complete after 1.180 msecs



Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjørn Mork Jan. 3, 2014, 9:53 a.m. UTC | #8
Bjørn Mork <bjorn@mork.no> writes:
> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
>
>> OK, thanks!  Well, this is somewhat worrisome.
>>
>> Could you also check the linux-pm.git/fixes branch that contains all patches
>> I'm planning to push for 3.13-rc7 shortly?
>
> It's definitely still there.  But I'm less sure about the exact trigger.

I did a couple of more tests after clean reboots, and 100% sure ways to
trigger the circular locking on my laptop are

  echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  s2disk

or

  echo 1401000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 
  s2disk

or

  echo 1401000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq 
  s2disk


Note that "ondemand" and "1401000" are the default values, so I don't
actually change anything here.  The write is causing the problem, not
the value.  As expected, I guess.

Also note that boot vs non-boot cpu doesn't seem to matter.  Nor does
cancelling the hibernation.  The warning appears on hibernate - not on
resume.

Hope this helps.


Bjørn
Viresh Kumar Jan. 3, 2014, 11:19 a.m. UTC | #9
On 3 January 2014 15:23, Bjørn Mork <bjorn@mork.no> wrote:
> Note that "ondemand" and "1401000" are the default values, so I don't
> actually change anything here.  The write is causing the problem, not
> the value.  As expected, I guess.
>
> Also note that boot vs non-boot cpu doesn't seem to matter.  Nor does
> cancelling the hibernation.  The warning appears on hibernate - not on
> resume.

Hmm... I spent quite some time trying to understand what's going on and really
couldn't figure anything out as of now. I haven't tried reproducing it though.

A few things that I can make out of this mail chain so far:
- Apart from the log, everything is working fine, i.e. the system is back in
working condition.
- It only happens when cpufreq_add_dev() fails during hibernation while
we enable non-boot CPUs again to save the image to disk. So it isn't a problem
for a system which doesn't have any issues with add_dev() failing on
hibernation.
- There is a contention between locks in the order they are taken, and it
looks to be between the hotplug lock taken by get_online_cpus() and the
s_active lock for sysfs files. I don't know what role the previous write to
sysfs files plays, as that should finish before hibernation starts and so all
locks should be released by then.

--
viresh
Bjørn Mork Jan. 3, 2014, 11:55 a.m. UTC | #10
Viresh Kumar <viresh.kumar@linaro.org> writes:

> On 3 January 2014 15:23, Bjørn Mork <bjorn@mork.no> wrote:
>> Note that "ondemand" and "1401000" are the default values, so I don't
>> actually change anything here.  The write is causing the problem, not
>> the value.  As expected, I guess.
>>
>> Also note that boot vs non-boot cpu doesn't seem to matter.  Nor does
>> cancelling the hibernation.  The warning appears on hibernate - not on
>> resume.
>
> Hmm... I spent quite some time trying to understand what's going on and really
> couldn't figure anything out as of now. I haven't tried reproducing it though.
>
> Few things that I can make out of this mail chain so far:
> - Apart from the log, everything is working fine. i.e. system is back in
> working condition.

Correct. And users not running a lock debugging kernel will of course
not even see the warning.

> - It only happens when cpufreq_add_dev() fails during hibernation while
> we enable non-boot CPUs again to save image to disk. So, isn't a problem
> for a system which doesn't have any issues with add_dev() failing on
> hibernation

Wrong.  This was my initial assumption but I later found out that the
issue is unrelated to hibernation failures.  Sorry about the confusion.

> - There is a contention of locks in the order they are taken. And the contention
> looks to be between the hotplug lock taken by get_online_cpus() and the s_active
> lock for sysfs files. Don't know what's the role of previous write to
> sysfs files.
> As that should finish before hibernation starts and so all locks should be back
> in place.

Yes, that seems logical.  But I guess this is where it fails?



Bjørn
Viresh Kumar Jan. 6, 2014, 6:27 a.m. UTC | #11
On 3 January 2014 17:25, Bjørn Mork <bjorn@mork.no> wrote:
> Correct. And users not running a lock debugging kernel will of course
> not even see the warning.

Okay..

>> - It only happens when cpufreq_add_dev() fails during hibernation while
>> we enable non-boot CPUs again to save image to disk. So, isn't a problem
>> for a system which doesn't have any issues with add_dev() failing on
>> hibernation
>
> Wrong.  This was my initial assumption but I later found out that the
> issue is unrelated to hibernation failures.  Sorry about the confusion.

Hmm... Could you share the latest warning logs you have? The earlier ones
were related to hibernation.

>> - There is a contention of locks in the order they are taken. And the contention
>> looks to be between the hotplug lock taken by get_online_cpus() and the s_active
>> lock for sysfs files. Don't know what's the role of previous write to
>> sysfs files.
>> As that should finish before hibernation starts and so all locks should be back
>> in place.
>
> Yes, that seems logical.  But I guess this is where it fails?

It looked like that.. Though your new logs might indicate something else.
Bjørn Mork Jan. 6, 2014, 9:01 a.m. UTC | #12
Viresh Kumar <viresh.kumar@linaro.org> writes:
> On 3 January 2014 17:25, Bjørn Mork <bjorn@mork.no> wrote:
>> Correct. And users not running a lock debugging kernel will of course
>> not even see the warning.
>
> Okay..
>
>>> - It only happens when cpufreq_add_dev() fails during hibernation while
>>> we enable non-boot CPUs again to save image to disk. So, isn't a problem
>>> for a system which doesn't have any issues with add_dev() failing on
>>> hibernation
>>
>> Wrong.  This was my initial assumption but I later found out that the
>> issue is unrelated to hibernation failures.  Sorry about the confusion.
>
> Hmm.. Can we have the latest warning logs you have? Earlier ones were
> related to hibernation..

That's correct.  I have not observed this on suspend to RAM.  But then
again I haven't rigged any way to log that, so I don't think it's
conclusive..

The point I tried to make is that it isn't related to any hibernation
*failures*.  The warning appears even if the add_dev() is successful,
and it also appears if I touch only the *boot* cpu cpufreq attributes.

I.e., this seems to be unrelated to the hotplugging code.

Here's another copy of the warning, captured by cancelling hibernation
as I don't have any other way to log it at the moment.  But I do see the
warning appear on the console *before* cancelling.  And I also see this
warning appear when trying the in-kernel hibernation instead of
userspace.

[  267.893084] ======================================================
[  267.893085] [ INFO: possible circular locking dependency detected ]
[  267.893087] 3.13.0-rc6+ #183 Tainted: G        W   
[  267.893089] -------------------------------------------------------
[  267.893090] s2disk/5450 is trying to acquire lock:
[  267.893101]  (s_active#164){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[  267.893102] 
[  267.893102] but task is already holding lock:
[  267.893111]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
[  267.893112] 
[  267.893112] which lock already depends on the new lock.
[  267.893112] 
[  267.893113] 
[  267.893113] the existing dependency chain (in reverse order) is:
[  267.893117] 
[  267.893117] -> #1 (cpu_hotplug.lock){+.+.+.}:
[  267.893123]        [<ffffffff81075027>] lock_acquire+0xfb/0x144
[  267.893128]        [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
[  267.893132]        [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
[  267.893137]        [<ffffffff812a69c4>] store+0x20/0xad
[  267.893142]        [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
[  267.893147]        [<ffffffff8112a428>] vfs_write+0x9c/0x102
[  267.893151]        [<ffffffff8112a716>] SyS_write+0x50/0x85
[  267.893155]        [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
[  267.893160] 
[  267.893160] -> #0 (s_active#164){++++.+}:
[  267.893164]        [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[  267.893168]        [<ffffffff81075027>] lock_acquire+0xfb/0x144
[  267.893172]        [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
[  267.893175]        [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[  267.893178]        [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
[  267.893182]        [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
[  267.893186]        [<ffffffff811d5d11>] kobject_del+0x18/0x42
[  267.893190]        [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
[  267.893193]        [<ffffffff811d5ede>] kobject_put+0x45/0x49
[  267.893197]        [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
[  267.893201]        [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
[  267.893204]        [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
[  267.893208]        [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
[  267.893213]        [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
[  267.893217]        [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
[  267.893221]        [<ffffffff81039ea1>] cpu_notify+0xe/0x10
[  267.893225]        [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
[  267.893230]        [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
[  267.893234]        [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
[  267.893238]        [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
[  267.893242]        [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
[  267.893245]        [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
[  267.893249]        [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
[  267.893250] 
[  267.893250] other info that might help us debug this:
[  267.893250] 
[  267.893251]  Possible unsafe locking scenario:
[  267.893251] 
[  267.893252]        CPU0                    CPU1
[  267.893253]        ----                    ----
[  267.893256]   lock(cpu_hotplug.lock);
[  267.893259]                                lock(s_active#164);
[  267.893261]                                lock(cpu_hotplug.lock);
[  267.893265]   lock(s_active#164);
[  267.893266] 
[  267.893266]  *** DEADLOCK ***
[  267.893266] 
[  267.893268] 7 locks held by s2disk/5450:
[  267.893275]  #0:  (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431
[  267.893283]  #1:  (device_hotplug_lock){+.+.+.}, at: [<ffffffff812837ac>] lock_device_hotplug+0x12/0x14
[  267.893290]  #2:  (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c6d3>] acpi_scan_lock_acquire+0x12/0x14
[  267.893297]  #3:  (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38
[  267.893305]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14
[  267.893311]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
[  267.893318]  #6:  (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a851c>] __cpufreq_add_dev.isra.18+0x7f/0x78c
[  267.893319] 
[  267.893319] stack backtrace:
[  267.893322] CPU: 0 PID: 5450 Comm: s2disk Tainted: G        W    3.13.0-rc6+ #183
[  267.893324] Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
[  267.893329]  ffffffff81d3cd00 ffff8800b7acd8f8 ffffffff81399cac 00000000000068f7
[  267.893334]  ffffffff81d3cd00 ffff8800b7acd948 ffffffff81397dc5 ffff8800b7acd928
[  267.893338]  ffff8800b7b24990 0000000000000006 ffff8800b7b25118 0000000000000006
[  267.893339] Call Trace:
[  267.893344]  [<ffffffff81399cac>] dump_stack+0x4e/0x68
[  267.893349]  [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209
[  267.893353]  [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
[  267.893357]  [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143
[  267.893361]  [<ffffffff81075027>] lock_acquire+0xfb/0x144
[  267.893365]  [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
[  267.893369]  [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160
[  267.893372]  [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
[  267.893376]  [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46
[  267.893380]  [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
[  267.893383]  [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
[  267.893386]  [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
[  267.893390]  [<ffffffff811d5d11>] kobject_del+0x18/0x42
[  267.893393]  [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
[  267.893396]  [<ffffffff811d5ede>] kobject_put+0x45/0x49
[  267.893400]  [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
[  267.893404]  [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
[  267.893407]  [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54
[  267.893413]  [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
[  267.893416]  [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
[  267.893421]  [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
[  267.893425]  [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
[  267.893428]  [<ffffffff81039ea1>] cpu_notify+0xe/0x10
[  267.893432]  [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
[  267.893436]  [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
[  267.893439]  [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
[  267.893443]  [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431
[  267.893447]  [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8
[  267.893451]  [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4
[  267.893454]  [<ffffffff81141a67>] ? fget_light+0x33/0x9a
[  267.893457]  [<ffffffff8113911c>] SyS_ioctl+0x52/0x82
[  267.893462]  [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  267.893466]  [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b



I don't think I do anything extra-ordinary to trigger this, so I would
be surprised if you can't reproduce it by doing 

  export x=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`
  echo $x >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  s2disk

or similar on a kernel with CONFIG_PROVE_LOCKING=y


Bjørn
Viresh Kumar Jan. 6, 2014, 9:57 a.m. UTC | #13
On 6 January 2014 14:31, Bjørn Mork <bjorn@mork.no> wrote:
> That's correct.  I have not observed this on suspend to RAM.  But then
> again I haven't rigged any way to log that, so I don't think it's
> conclusive..
>
> The point I tried to make is that it isn't related to any hibernation
> *failures*.  The warning appears even if the add_dev() is successful,
> and it also appears if I touch only the *boot* cpu cpufreq attributes.

Okay.. But your log below is for the add_dev() failure case.

> I.e., this seems to be unrelated to the hotplugging code.
>
> Here's another copy of the warning, captured by cancelling hibernation
> as I don't have any other way to log it at the moment.  But I do see the
> warning appear on the console *before* cancelling.  And I also see this
> warning appear when trying the in-kernel hibernation instead of
> userspace.

How do you cancel hibernation here? Sorry, what is in-kernel hibernation?

> [  267.893084] ======================================================
> [  267.893085] [ INFO: possible circular locking dependency detected ]
> [  267.893087] 3.13.0-rc6+ #183 Tainted: G        W
> [  267.893089] -------------------------------------------------------
> [  267.893090] s2disk/5450 is trying to acquire lock:
> [  267.893101]  (s_active#164){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
> [  267.893102]
> [  267.893102] but task is already holding lock:
> [  267.893111]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
> [  267.893112]
> [  267.893112] which lock already depends on the new lock.
> [  267.893112]
> [  267.893113]
> [  267.893113] the existing dependency chain (in reverse order) is:
> [  267.893117]
> [  267.893117] -> #1 (cpu_hotplug.lock){+.+.+.}:
> [  267.893123]        [<ffffffff81075027>] lock_acquire+0xfb/0x144
> [  267.893128]        [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
> [  267.893132]        [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
> [  267.893137]        [<ffffffff812a69c4>] store+0x20/0xad
> [  267.893142]        [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
> [  267.893147]        [<ffffffff8112a428>] vfs_write+0x9c/0x102
> [  267.893151]        [<ffffffff8112a716>] SyS_write+0x50/0x85
> [  267.893155]        [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
> [  267.893160]
> [  267.893160] -> #0 (s_active#164){++++.+}:
> [  267.893164]        [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
> [  267.893168]        [<ffffffff81075027>] lock_acquire+0xfb/0x144
> [  267.893172]        [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
> [  267.893175]        [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
> [  267.893178]        [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
> [  267.893182]        [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
> [  267.893186]        [<ffffffff811d5d11>] kobject_del+0x18/0x42
> [  267.893190]        [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
> [  267.893193]        [<ffffffff811d5ede>] kobject_put+0x45/0x49
> [  267.893197]        [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83

All of the above is for the case where add_dev() was required to clean up an
earlier allocated policy because something failed during add_dev().

> [  267.893201]        [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
> [  267.893204]        [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
> [  267.893208]        [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
> [  267.893213]        [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
> [  267.893217]        [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
> [  267.893221]        [<ffffffff81039ea1>] cpu_notify+0xe/0x10
> [  267.893225]        [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
> [  267.893230]        [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
> [  267.893234]        [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed

And this is again the same log as the one you sent last time. It happened
while saving the image after disabling all non-boot CPUs. And it happened in
your case because your driver->init() was failing, so it may not be
reproducible at my end. I haven't tried it though.

> I don't think I do anything extra-ordinary to trigger this, so I would
> be surprised if you can't reproduce it by doing
>
>   export x=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`
>   echo $x >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>   s2disk

I don't see you cancelling hibernation here. Sorry, I got confused about how
exactly you reproduced it.
Bjørn Mork Jan. 6, 2014, 10:49 a.m. UTC | #14
Viresh Kumar <viresh.kumar@linaro.org> writes:

> On 6 January 2014 14:31, Bjørn Mork <bjorn@mork.no> wrote:
>> That's correct.  I have not observed this on suspend to RAM.  But then
>> again I haven't rigged any way to log that, so I don't think it's
>> conclusive..
>>
>> The point I tried to make is that it isn't related to any hibernation
>> *failures*.  The warning appears even if the add_dev() is successful,
>> and it also appears if I touch only the *boot* cpu cpufreq attributes.
>
> Okay.. But your log below is for add_dev() failure case.

Any suggestion on how to capture warnings issued between freeze and thaw
is appreciated :-)

Will a netconsole work?  I can try that later.

>> I.e., this seems to be unrelated to the hotplugging code.
>>
>> Here's another copy of the warning, captured by cancelling hibernation
>> as I don't have any other way to log it at the moment.  But I do see the
>> warning appear on the console *before* cancelling.  And I also see this
>> warning appear when trying the in-kernel hibernation instead of
>> userspace.
>
> How do you cancel hibernation here? Sorry, what is in-kernel hibernation?

I don't know the proper terms here.  I normally use s2disk from uswsusp
to hibernate.  By "in-kernel hibernation" I meant the process initiated
by doing

 echo disk >/sys/power/state



>> [  267.893084] ======================================================
>> [  267.893085] [ INFO: possible circular locking dependency detected ]
>> [  267.893087] 3.13.0-rc6+ #183 Tainted: G        W
>> [  267.893089] -------------------------------------------------------
>> [  267.893090] s2disk/5450 is trying to acquire lock:
>> [  267.893101]  (s_active#164){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
>> [  267.893102]
>> [  267.893102] but task is already holding lock:
>> [  267.893111]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50
>> [  267.893112]
>> [  267.893112] which lock already depends on the new lock.
>> [  267.893112]
>> [  267.893113]
>> [  267.893113] the existing dependency chain (in reverse order) is:
>> [  267.893117]
>> [  267.893117] -> #1 (cpu_hotplug.lock){+.+.+.}:
>> [  267.893123]        [<ffffffff81075027>] lock_acquire+0xfb/0x144
>> [  267.893128]        [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397
>> [  267.893132]        [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50
>> [  267.893137]        [<ffffffff812a69c4>] store+0x20/0xad
>> [  267.893142]        [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b
>> [  267.893147]        [<ffffffff8112a428>] vfs_write+0x9c/0x102
>> [  267.893151]        [<ffffffff8112a716>] SyS_write+0x50/0x85
>> [  267.893155]        [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b
>> [  267.893160]
>> [  267.893160] -> #0 (s_active#164){++++.+}:
>> [  267.893164]        [<ffffffff81074760>] __lock_acquire+0xae3/0xe68
>> [  267.893168]        [<ffffffff81075027>] lock_acquire+0xfb/0x144
>> [  267.893172]        [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108
>> [  267.893175]        [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46
>> [  267.893178]        [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31
>> [  267.893182]        [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b
>> [  267.893186]        [<ffffffff811d5d11>] kobject_del+0x18/0x42
>> [  267.893190]        [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f
>> [  267.893193]        [<ffffffff811d5ede>] kobject_put+0x45/0x49
>> [  267.893197]        [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83
>
> All above is for case where add dev was required to clear up earlier
> allocated policy as something failed during add_dev()

I see.  I'll try capturing at least part of the warning with a camera,
when it's not failing.

>> [  267.893201]        [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c
>> [  267.893204]        [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88
>> [  267.893208]        [<ffffffff813a314c>] notifier_call_chain+0x67/0x92
>> [  267.893213]        [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb
>> [  267.893217]        [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32
>> [  267.893221]        [<ffffffff81039ea1>] cpu_notify+0xe/0x10
>> [  267.893225]        [<ffffffff8103a12b>] _cpu_up+0xf1/0x124
>> [  267.893230]        [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf
>> [  267.893234]        [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed
>
> And this is again the same log as what you have sent last time. It happened
> while saving image after disabling all non-boot CPUs. And so it happened for
> your case as your driver->init() was failing, so it may not get reproduced at
> my end. Haven't tried it though.
>
>> I don't think I do anything extra-ordinary to trigger this, so I would
>> be surprised if you can't reproduce it by doing
>>
>>   export x=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`
>>   echo $x >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>>   s2disk
>
> I don't see you cancelling hibernation here. Sorry got confused on how
> exactly you reproduced it.

I cancelled by pressing backspace while the image was being written.


Bjørn
Viresh Kumar Jan. 6, 2014, 10:54 a.m. UTC | #15
On 6 January 2014 16:19, Bjørn Mork <bjorn@mork.no> wrote:
> I cancelled by pressing backspace while the image was being written.

I will answer the other queries in my next mail, but a quick question:
Do you get these warnings even if you don't cancel hibernation?
Rafael J. Wysocki Jan. 6, 2014, 11:14 a.m. UTC | #16
On Monday, January 06, 2014 11:49:00 AM Bjørn Mork wrote:
> Viresh Kumar <viresh.kumar@linaro.org> writes:
> 
> > On 6 January 2014 14:31, Bjørn Mork <bjorn@mork.no> wrote:
> >> That's correct.  I have not observed this on suspend to RAM.  But then
> >> again I haven't rigged any way to log that, so I don't think it's
> >> conclusive..
> >>
> >> The point I tried to make is that it isn't related to any hibernation
> >> *failures*.  The warning appears even if the add_dev() is successful,
> >> and it also appears if I touch only the *boot* cpu cpufreq attributes.
> >
> > Okay.. But your log below is for add_dev() failure case.
> 
> Any suggestion on how to capure warnings issued between freeze and thaw
> is appreciated :-)

You can try to add no_console_suspend to the kernel's command line.

Thanks!
Rafael J. Wysocki Jan. 6, 2014, 11:33 a.m. UTC | #17
On Monday, January 06, 2014 04:24:09 PM Viresh Kumar wrote:
> On 6 January 2014 16:19, Bjørn Mork <bjorn@mork.no> wrote:
> > I cancelled by pressing backspace while the image was being written.

Basically, stuff is still frozen at this point, but devices are fully
functional.  It's kind of like returning an error code from swsusp_write()
in hibernate() (kernel/power/hibernate.c).

> I will answer other queries on next mail, but a quick question:
> Do you get these warnings even if you don't cancel hibernation?

Good question. :-)

Thanks!
Viresh Kumar Jan. 8, 2014, 5:51 a.m. UTC | #18
On 6 January 2014 16:47, Bjørn Mork <bjorn@mork.no> wrote:
> Viresh Kumar <viresh.kumar@linaro.org> writes:
>
>> On 6 January 2014 16:19, Bjørn Mork <bjorn@mork.no> wrote:
>>> I cancelled by pressing backspace while the image was being written.
>>
>> I will answer other queries on next mail, but a quick question:
>> Do you get these warnings even if you don't cancel hibernation?
>
> Yes.

Hmm... I see. Now, I really can't believe that you would get these
warnings only on hibernation and not on suspend/resume.

> I believe the photo shows that you are right: This warning happens when
> the failure path is entered, even if both the hibernation and following
> resume is successful.  We see __cpufreq_add_dev calling
> cpufreq_policy_put_kobj here as well.

@Rafael: I need your expert opinion here, I am not so good with these
concepts :)

Here is my theory about these warnings:
- Bjorn gets these only when CONFIG_PROVE_LOCKING=y.
- This feature enables the kernel to report locking-related deadlocks
before they actually occur (lib/Kconfig.debug).
- So no locking issue has actually happened here; it's just a warning that
you might run into such an issue later.
- The contention is between the sysfs lock and the cpu-hotplug lock.
- While changing stuff from sysfs we do this:
   sysfs lock (taken by the kernel)
   cpu-hotplug lock (taken by cpufreq)
- And while doing hibernation, if add_dev() fails, then we do this:
   cpu-hotplug lock (taken by the kernel while adding CPUs)
   sysfs lock (taken by cpufreq while removing sysfs directories)

And the kernel is recording all these sequences all the time. Because we
first tried changing values from sysfs, the kernel recorded that sequence,
and as soon as it found the reverse one it WARNed us?

While on suspend/resume, we would get into an add_dev() failure only on
resume, and probably by that time these recordings are refreshed? And
that's why Bjorn isn't facing any issue there? Just a wild guess.

And if all of the above theory is correct, then I don't know if we
can/want to fix that.
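The two orderings above can be sketched as a toy, userspace model of what
CONFIG_PROVE_LOCKING records (this is illustrative C, not kernel code; the
lock ids and names are stand-ins — 0 for s_active, 1 for cpu_hotplug.lock):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of lockdep's dependency recording: for every acquisition,
 * remember which locks were already held, and flag an acquisition whose
 * reverse ordering was recorded earlier.
 */
#define NLOCKS 2

static bool seen_before[NLOCKS][NLOCKS]; /* [a][b]: a was held when b was taken */
static int held[NLOCKS];
static int nheld;

/* Returns true if taking `lock` now completes an AB-BA cycle. */
bool acquire(int lock)
{
	bool inversion = false;
	int i;

	for (i = 0; i < nheld; i++) {
		int h = held[i];

		if (seen_before[lock][h])	/* the reverse order was seen */
			inversion = true;
		seen_before[h][lock] = true;	/* record: h held before lock */
	}
	held[nheld++] = lock;
	return inversion;
}

void release_all(void)
{
	nheld = 0;
}
```

Note that, exactly as in the report, neither path deadlocks on its own; the
warning fires on the second path only because the first one was recorded
earlier, which would fit the observation that a prior sysfs write is needed
to trigger it.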

> Thinking about this: This failure will always happen, regardless of
> cancelling or not, won't it?

Yeah.

> But it happens *after* the freeze so it
> doesn't matter if we resume from the frozen image.

yeah..

So, even tests like this should give you this warning:

Documentation/power/basic-pm-debugging.txt
core
- test the freezing of processes, suspending of devices, platform global
  control methods(*), the disabling of nonboot CPUs and suspending of
  platform/system devices

> So you are probably right as usual:

Really? Ask Rafael, he doesn't agree :)

> This warning is related to the
> acpi-cpufreq driver failing, and you cannot reproduce it without
> simulating such a failure (or using that driver yourself).  Which means
> that the warning disappears with my patch in
> http://www.spinics.net/lists/cpufreq/msg08714.html
> which explains why I don't normally see it.  I depend on being able to
> cancel hibernation, so I use that patch most of the time. Except when
> I'm testing one of your branches.

Correct.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patch

Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -1016,15 +1016,17 @@  static int __cpufreq_add_dev(struct devi
 	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
 #endif
 
-	if (frozen)
-		/* Restore the saved policy when doing light-weight init */
-		policy = cpufreq_policy_restore(cpu);
-	else
+	/*
+	 * Restore the saved policy when doing light-weight init and fall back
+	 * to the full init if that fails.
+	 */
+	policy = frozen ? cpufreq_policy_restore(cpu) : NULL;
+	if (!policy) {
+		frozen = false;
 		policy = cpufreq_policy_alloc();
-
-	if (!policy)
-		goto nomem_out;
-
+		if (!policy)
+			goto nomem_out;
+	}
 
 	/*
 	 * In the resume path, since we restore a saved policy, the assignment
@@ -1118,8 +1120,11 @@  err_get_freq:
 	if (cpufreq_driver->exit)
 		cpufreq_driver->exit(policy);
 err_set_policy_cpu:
-	if (frozen)
+	if (frozen) {
+		/* Do not leave stale fallback data behind. */
+		per_cpu(cpufreq_cpu_data_fallback, cpu) = NULL;
 		cpufreq_policy_put_kobj(policy);
+	}
 	cpufreq_policy_free(policy);
 
 nomem_out:
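The restore-or-fall-back control flow the patch introduces can be sketched in
plain userspace C; the types and helpers below are hypothetical stand-ins
mimicking the kernel names, not the actual cpufreq implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct policy {
	int cpu;
};

#define NR_CPUS 4

/* plays the role of the per-CPU cpufreq_cpu_data_fallback pointers */
static struct policy *fallback_data[NR_CPUS];

static struct policy *policy_restore(int cpu)
{
	return fallback_data[cpu];	/* may be NULL on a failed resume */
}

static struct policy *policy_alloc(int cpu)
{
	struct policy *p = calloc(1, sizeof(*p));

	if (p)
		p->cpu = cpu;
	return p;
}

/*
 * Mirrors the patched __cpufreq_add_dev() flow: attempt the light-weight
 * restore when frozen, and fall back to a full allocation (clearing
 * *frozen) when the restore returns NULL, so the rest of the function
 * and its error paths behave as in a full initialization.
 */
struct policy *add_dev(int cpu, bool *frozen)
{
	struct policy *p = *frozen ? policy_restore(cpu) : NULL;

	if (!p) {
		*frozen = false;
		p = policy_alloc(cpu);
	}
	return p;
}
```

The point of clearing the frozen flag is visible here: once the fallback
allocation runs, later cleanup must treat the policy as freshly created
rather than as restored fallback data.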