Message ID | 5562479.pVWRuDL0y6@vostro.rjw.lan (mailing list archive) |
---|---|
State | Accepted, archived |
On 26 December 2013 06:35, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: cpufreq: Clean up after a failing light-weight initialization
>
> If cpufreq_policy_restore() returns NULL during system resume,
> __cpufreq_add_dev() should just fall back to the full initialization
> instead of returning an error, because that may actually make things
> work. Moreover, it should not leave stale fallback data behind after
> it has failed to restore a previously existing policy.
>
> This change is based on Viresh Kumar's work.
>
> Reported-by: Bjørn Mork <bjorn@mork.no>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
On 26 December 2013 08:17, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 26 December 2013 06:35, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Subject: cpufreq: Clean up after a failing light-weight initialization
>>
>> If cpufreq_policy_restore() returns NULL during system resume,
>> __cpufreq_add_dev() should just fall back to the full initialization
>> instead of returning an error, because that may actually make things
>> work. Moreover, it should not leave stale fallback data behind after
>> it has failed to restore a previously existing policy.
>>
>> This change is based on Viresh Kumar's work.
>>
>> Reported-by: Bjørn Mork <bjorn@mork.no>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

I think there is nothing much different in this patch compared to what Bjorn
tested. So you can probably push that now and let him test linux-next later
once he is back?
On 27 December 2013 15:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> I think there is nothing much different in this patch compared to what Bjorn
> tested. So you can probably push that now and let him test linux-next later
> once he is back?

Just saw that you already pushed it. :)
Yes, that's good. As you might have guessed by the lack of any response,
I won't be able to test this in 2013.

Thanks for all the good work!

Bjørn

Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 27 December 2013 15:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> I think there is nothing much different in this patch compared to what Bjorn
>> tested. So you can probably push that now and let him test linux-next later
>> once he is back?
>
> Just saw that you already pushed it. :)
"Rafael J. Wysocki" <rjw@rjwysocki.net> writes: > On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote: >> __cpufreq_add_dev() can fail sometimes while we are resuming our system. >> Currently we are clearing all sysfs nodes for cpufreq's failed policy as that >> could make userspace unstable. But if we suspend/resume again, we should atleast >> try to bring back those policies. >> >> This patch fixes this issue by clearing fallback data on failure and trying to >> allocate a new struct cpufreq_policy on second resume. >> >> Reported-and-tested-by: Bjørn Mork <bjorn@mork.no> >> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > > Well, while I appreciate the work done here, I don't like the changelog, > I don't really like the way the code is written and I don't like the comments. > Sorry about that. > > Bjorn, can you please test the patch below instead along with the [2/2] > from this series (on top of linux-pm.git/pm-cpufreq)? I tested this series with your modified 1/2 today, but on top of a current mainline (commit 9a0bb2966ef) instead of linux-pm.git/pm-cpufreq. Shouldn't make much difference since Linus already has pulled your 'pm+acpi-3.13-rc6' tag, which included that pm-cpufreq branch. This series fixes the reported bug for me. But I observed this, most likely unrelated, splat during the test: ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20131115/evregion-282) ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff880232499c18), AE_BAD_PARAMETER (20131115/psparse-536) ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff8802326f93d0), AE_BAD_PARAMETER (20131115/psparse-536) ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff8802326f9268), AE_BAD_PARAMETER (20131115/psparse-536) ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20131115/processor_perflib-140) ====================================================== [ INFO: possible circular locking dependency detected ] 3.13.0-rc6+ #181 Not tainted ------------------------------------------------------- s2disk/5651 is trying to acquire lock: (s_active#170){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 but task is already holding lock: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #1 (cpu_hotplug.lock){+.+.+.}: [<ffffffff81075027>] lock_acquire+0xfb/0x144 [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397 [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50 [<ffffffff812a6974>] store+0x20/0xad [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b [<ffffffff8112a428>] vfs_write+0x9c/0x102 [<ffffffff8112a716>] SyS_write+0x50/0x85 [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b -> #0 (s_active#170){++++.+}: [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 [<ffffffff81075027>] lock_acquire+0xfb/0x144 [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b [<ffffffff811d5d11>] kobject_del+0x18/0x42 [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f [<ffffffff811d5ede>] kobject_put+0x45/0x49 [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83 [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88 [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 [<ffffffff81039ea1>] cpu_notify+0xe/0x10 [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431 [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8 [<ffffffff8113911c>] SyS_ioctl+0x52/0x82 [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpu_hotplug.lock); lock(s_active#170); lock(cpu_hotplug.lock); lock(s_active#170); *** DEADLOCK *** 7 locks held by s2disk/5651: #0: (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431 #1: (device_hotplug_lock){+.+.+.}, at: [<ffffffff81283730>] lock_device_hotplug+0x12/0x14 #2: (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c657>] acpi_scan_lock_acquire+0x12/0x14 #3: (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38 #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14 #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 #6: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a84cc>] __cpufreq_add_dev.isra.18+0x7f/0x78c stack backtrace: CPU: 0 PID: 5651 Comm: s2disk Not tainted 3.13.0-rc6+ #181 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 ffffffff81d3edf0 ffff8802174898f8 ffffffff81399cac 0000000000004549 ffffffff81d3edf0 ffff880217489948 ffffffff81397dc5 ffff880217489928 ffff88023198ea50 0000000000000006 ffff88023198f1d8 0000000000000006 Call Trace: [<ffffffff81399cac>] dump_stack+0x4e/0x68 [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209 [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143 [<ffffffff81075027>] lock_acquire+0xfb/0x144 [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46 [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160 [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 [<ffffffff8118b9d7>] ? 
sysfs_addrm_finish+0x28/0x46 [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b [<ffffffff811d5d11>] kobject_del+0x18/0x42 [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f [<ffffffff811d5ede>] kobject_put+0x45/0x49 [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83 [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54 [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88 [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 [<ffffffff81039ea1>] cpu_notify+0xe/0x10 [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431 [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8 [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4 [<ffffffff81141a67>] ? fget_light+0x33/0x9a [<ffffffff8113911c>] SyS_ioctl+0x52/0x82 [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b CPU1 is up ACPI: Waking up from system sleep state S4 PM: noirq thaw of devices complete after 2.727 msecs PM: early thaw of devices complete after 1.137 msecs This warning appeared when I tried cancelling hibernation, which is known to fail when using the acpi-cpufreq driver. The point of the test was to verify that such failures still produce the expected result: - all stale sysfs files are cleaned up and removed - later suspend/resume actions will restore (or actually reinitiate) the cpufreq driver for the non-boot CPUs Which was successful, except that it produced the above lockdep warning. I have not verified that this is a new warning (which means that it most likely is not). And I expect the whole acpi-cpufreq problem, including that warning, to go away when you eventually push http://www.spinics.net/lists/cpufreq/msg08714.html with your improved changelog (thanks for doing that BTW). So I don't worry too much about it. Just wanted to let you know. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday, January 02, 2014 01:15:10 PM Bjørn Mork wrote: > "Rafael J. Wysocki" <rjw@rjwysocki.net> writes: > > > On Tuesday, December 24, 2013 07:11:00 AM Viresh Kumar wrote: > >> __cpufreq_add_dev() can fail sometimes while we are resuming our system. > >> Currently we are clearing all sysfs nodes for cpufreq's failed policy as that > >> could make userspace unstable. But if we suspend/resume again, we should atleast > >> try to bring back those policies. > >> > >> This patch fixes this issue by clearing fallback data on failure and trying to > >> allocate a new struct cpufreq_policy on second resume. > >> > >> Reported-and-tested-by: Bjørn Mork <bjorn@mork.no> > >> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > > > > Well, while I appreciate the work done here, I don't like the changelog, > > I don't really like the way the code is written and I don't like the comments. > > Sorry about that. > > > > Bjorn, can you please test the patch below instead along with the [2/2] > > from this series (on top of linux-pm.git/pm-cpufreq)? > > I tested this series with your modified 1/2 today, but on top of a > current mainline (commit 9a0bb2966ef) instead of linux-pm.git/pm-cpufreq. > Shouldn't make much difference since Linus already has pulled your > 'pm+acpi-3.13-rc6' tag, which included that pm-cpufreq branch. > > This series fixes the reported bug for me. > > > But I observed this, most likely unrelated, splat during the test: > > ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20131115/evregion-282) > ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff880232499c18), AE_BAD_PARAMETER (20131115/psparse-536) > ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff8802326f93d0), AE_BAD_PARAMETER (20131115/psparse-536) > ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff8802326f9268), AE_BAD_PARAMETER (20131115/psparse-536) > ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20131115/processor_perflib-140) > > ====================================================== > [ INFO: possible circular locking dependency detected ] > 3.13.0-rc6+ #181 Not tainted > ------------------------------------------------------- > s2disk/5651 is trying to acquire lock: > (s_active#170){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 > > but task is already holding lock: > (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 > > which lock already depends on the new lock. 
> > > the existing dependency chain (in reverse order) is: > > -> #1 (cpu_hotplug.lock){+.+.+.}: > [<ffffffff81075027>] lock_acquire+0xfb/0x144 > [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397 > [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50 > [<ffffffff812a6974>] store+0x20/0xad > [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b > [<ffffffff8112a428>] vfs_write+0x9c/0x102 > [<ffffffff8112a716>] SyS_write+0x50/0x85 > [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b > > -> #0 (s_active#170){++++.+}: > [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 > [<ffffffff81075027>] lock_acquire+0xfb/0x144 > [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 > [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 > [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 > [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b > [<ffffffff811d5d11>] kobject_del+0x18/0x42 > [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f > [<ffffffff811d5ede>] kobject_put+0x45/0x49 > [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83 > [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c > [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88 > [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 > [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb > [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 > [<ffffffff81039ea1>] cpu_notify+0xe/0x10 > [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 > [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf > [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed > [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431 > [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8 > [<ffffffff8113911c>] SyS_ioctl+0x52/0x82 > [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(cpu_hotplug.lock); > lock(s_active#170); > lock(cpu_hotplug.lock); > lock(s_active#170); > > *** DEADLOCK *** > > 7 locks held by s2disk/5651: > #0: (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431 > #1: (device_hotplug_lock){+.+.+.}, at: [<ffffffff81283730>] lock_device_hotplug+0x12/0x14 > #2: (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c657>] acpi_scan_lock_acquire+0x12/0x14 > #3: (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38 > #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14 > #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 > #6: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a84cc>] __cpufreq_add_dev.isra.18+0x7f/0x78c > > stack backtrace: > CPU: 0 PID: 5651 Comm: s2disk Not tainted 3.13.0-rc6+ #181 > Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 > ffffffff81d3edf0 ffff8802174898f8 ffffffff81399cac 0000000000004549 > ffffffff81d3edf0 ffff880217489948 ffffffff81397dc5 ffff880217489928 > ffff88023198ea50 0000000000000006 ffff88023198f1d8 0000000000000006 > Call Trace: > [<ffffffff81399cac>] dump_stack+0x4e/0x68 > [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209 > [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 > [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143 > [<ffffffff81075027>] lock_acquire+0xfb/0x144 > [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46 > [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160 > [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 > [<ffffffff8118b9d7>] ? 
sysfs_addrm_finish+0x28/0x46 > [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 > [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 > [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b > [<ffffffff811d5d11>] kobject_del+0x18/0x42 > [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f > [<ffffffff811d5ede>] kobject_put+0x45/0x49 > [<ffffffff812a6e85>] cpufreq_policy_put_kobj+0x37/0x83 > [<ffffffff812a8bab>] __cpufreq_add_dev.isra.18+0x75e/0x78c > [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54 > [<ffffffff812a8c39>] cpufreq_cpu_callback+0x53/0x88 > [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 > [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb > [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 > [<ffffffff81039ea1>] cpu_notify+0xe/0x10 > [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 > [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf > [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed > [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431 > [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8 > [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4 > [<ffffffff81141a67>] ? fget_light+0x33/0x9a > [<ffffffff8113911c>] SyS_ioctl+0x52/0x82 > [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f > [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b > CPU1 is up > ACPI: Waking up from system sleep state S4 > PM: noirq thaw of devices complete after 2.727 msecs > PM: early thaw of devices complete after 1.137 msecs > > > > This warning appeared when I tried cancelling hibernation, which is > known to fail when using the acpi-cpufreq driver. The point of the test > was to verify that such failures still produce the expected result: > > - all stale sysfs files are cleaned up and removed > - later suspend/resume actions will restore (or actually reinitiate) > the cpufreq driver for the non-boot CPUs > > Which was successful, except that it produced the above lockdep warning. > > I have not verified that this is a new warning (which means that it most > likely is not). And I expect the whole acpi-cpufreq problem, including > that warning, to go away when you eventually push > http://www.spinics.net/lists/cpufreq/msg08714.html with your improved > changelog (thanks for doing that BTW). So I don't worry too much about > it. Just wanted to let you know. OK, thanks! Well, this is somewhat worrisome. Could you also check the linux-pm.git/fixes branch that contains all patches I'm planning to push for 3.13-rc7 shortly? Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
"Rafael J. Wysocki" <rjw@rjwysocki.net> writes: > OK, thanks! Well, this is somewhat worrisome. > > Could you also check the linux-pm.git/fixes branch that contains all patches > I'm planning to push for 3.13-rc7 shortly? It's definitely still there. But I'm less sure about the exact trigger. I have now booted and suspend that branch a few times trying to figure out which actions it depends on. It seems to depend on a modified scaling_max_freq (or any other attribute, I guess - haven't testing anything else) in combination with suspend. But the order doesn't necessarily matter. I can sometimes set off the warning by writing to sysfs after resuming, and sometimes at suspend time if the value is already modified. The results are not consistent though. Sometimes there is no warning at all. But what's 100% sure is that I still can trigger this with the linux-pm.git/fixes branch. For example, triggered by echo 800000 >/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq after resume (I believe the taint is due to a _request_firmware failure for one of the bluetooth devices): ====================================================== [ INFO: possible circular locking dependency detected ] 3.13.0-rc6+ #183 Tainted: G W ------------------------------------------------------- bash/4897 is trying to acquire lock: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50 but task is already holding lock: (s_active#273){++++.+}, at: [<ffffffff8118a93f>] sysfs_write_file+0xd6/0x18b which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (s_active#273){++++.+}: [<ffffffff81075027>] lock_acquire+0xfb/0x144 [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b [<ffffffff811d5d11>] kobject_del+0x18/0x42 [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f [<ffffffff811d5ede>] kobject_put+0x45/0x49 [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83 [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88 [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 [<ffffffff81039ea1>] cpu_notify+0xe/0x10 [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431 [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8 [<ffffffff8113911c>] SyS_ioctl+0x52/0x82 [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b -> #0 (cpu_hotplug.lock){+.+.+.}: [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 [<ffffffff81075027>] lock_acquire+0xfb/0x144 [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397 [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50 [<ffffffff812a69c4>] store+0x20/0xad [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b [<ffffffff8112a428>] vfs_write+0x9c/0x102 [<ffffffff8112a716>] SyS_write+0x50/0x85 [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#273); lock(cpu_hotplug.lock); lock(s_active#273); lock(cpu_hotplug.lock); *** DEADLOCK *** 3 locks held by bash/4897: #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811297c7>] file_start_write+0x27/0x29 #1: (&of->mutex){+.+.+.}, at: [<ffffffff8118a937>] sysfs_write_file+0xce/0x18b #2: 
(s_active#273){++++.+}, at: [<ffffffff8118a93f>] sysfs_write_file+0xd6/0x18b stack backtrace: CPU: 1 PID: 4897 Comm: bash Tainted: G W 3.13.0-rc6+ #183 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 ffffffff81ca83b0 ffff88021fa65c08 ffffffff81399cac 0000000000000006 ffffffff81ca83b0 ffff88021fa65c58 ffffffff81397dc5 0000000000000002 ffff8800b1af2810 0000000000000003 ffff8800b1af2ef0 0000000000000003 Call Trace: [<ffffffff81399cac>] dump_stack+0x4e/0x68 [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209 [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 [<ffffffff81075027>] lock_acquire+0xfb/0x144 [<ffffffff81039f4a>] ? get_online_cpus+0x3c/0x50 [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397 [<ffffffff81039f4a>] ? get_online_cpus+0x3c/0x50 [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54 [<ffffffff81039f4a>] ? get_online_cpus+0x3c/0x50 [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50 [<ffffffff812a69c4>] store+0x20/0xad [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b [<ffffffff8112a428>] vfs_write+0x9c/0x102 [<ffffffff8112a716>] SyS_write+0x50/0x85 [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b Or by s2disk (not the same boot - this time without the taint): ====================================================== [ INFO: possible circular locking dependency detected ] 3.13.0-rc6+ #183 Not tainted ------------------------------------------------------- s2disk/5123 is trying to acquire lock: (s_active#171){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 but task is already holding lock: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (cpu_hotplug.lock){+.+.+.}: [<ffffffff81075027>] lock_acquire+0xfb/0x144 [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397 [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50 [<ffffffff812a69c4>] store+0x20/0xad [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b [<ffffffff8112a428>] vfs_write+0x9c/0x102 [<ffffffff8112a716>] SyS_write+0x50/0x85 [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b -> #0 (s_active#171){++++.+}: [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 [<ffffffff81075027>] lock_acquire+0xfb/0x144 [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b [<ffffffff811d5d11>] kobject_del+0x18/0x42 [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f [<ffffffff811d5ede>] kobject_put+0x45/0x49 [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83 [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88 [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 [<ffffffff81039ea1>] cpu_notify+0xe/0x10 [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431 [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8 [<ffffffff8113911c>] SyS_ioctl+0x52/0x82 [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpu_hotplug.lock); lock(s_active#171); lock(cpu_hotplug.lock); lock(s_active#171); *** DEADLOCK *** 7 locks held by s2disk/5123: #0: (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] 
snapshot_ioctl+0x4b/0x431 #1: (device_hotplug_lock){+.+.+.}, at: [<ffffffff812837ac>] lock_device_hotplug+0x12/0x14 #2: (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c6d3>] acpi_scan_lock_acquire+0x12/0x14 #3: (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38 #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14 #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 #6: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a851c>] __cpufreq_add_dev.isra.18+0x7f/0x78c stack backtrace: CPU: 0 PID: 5123 Comm: s2disk Not tainted 3.13.0-rc6+ #183 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 ffffffff81d3de70 ffff8800b10718f8 ffffffff81399cac 0000000000003a93 ffffffff81d3de70 ffff8800b1071948 ffffffff81397dc5 ffff8800b1071928 ffff8800b10c8bd0 0000000000000006 ffff8800b10c9358 0000000000000006 Call Trace: [<ffffffff81399cac>] dump_stack+0x4e/0x68 [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209 [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143 [<ffffffff81075027>] lock_acquire+0xfb/0x144 [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46 [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160 [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46 [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b [<ffffffff811d5d11>] kobject_del+0x18/0x42 [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f [<ffffffff811d5ede>] kobject_put+0x45/0x49 [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83 [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54 [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88 [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 [<ffffffff81039ea1>] cpu_notify+0xe/0x10 [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431 [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8 [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4 [<ffffffff81141a67>] ? fget_light+0x33/0x9a [<ffffffff8113911c>] SyS_ioctl+0x52/0x82 [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b CPU1 is up ACPI: Waking up from system sleep state S4 PM: noirq thaw of devices complete after 1.722 msecs PM: early thaw of devices complete after 1.180 msecs Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjørn Mork <bjorn@mork.no> writes:
> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
>
>> OK, thanks! Well, this is somewhat worrisome.
>>
>> Could you also check the linux-pm.git/fixes branch that contains all patches
>> I'm planning to push for 3.13-rc7 shortly?
>
> It's definitely still there. But I'm less sure about the exact trigger.

I did a couple more tests after clean reboots, and 100% sure ways to trigger
the circular locking warning on my laptop are

  echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  s2disk

or

  echo 1401000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
  s2disk

or

  echo 1401000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
  s2disk

Note that "ondemand" and "1401000" are the default values, so I don't
actually change anything here. The write is causing the problem, not the
value. As expected, I guess.

Also note that boot vs non-boot cpu doesn't seem to matter. Nor does
cancelling the hibernation. The warning appears on hibernate - not on
resume.

Hope this helps.

Bjørn
On 3 January 2014 15:23, Bjørn Mork <bjorn@mork.no> wrote:
> Note that "ondemand" and "1401000" are the default values, so I don't
> actually change anything here. The write is causing the problem, not
> the value. As expected, I guess.
>
> Also note that boot vs non-boot cpu doesn't seem to matter. Nor does
> cancelling the hibernation. The warning appears on hibernate - not on
> resume.

Hmm... I spent quite some time trying to understand what's going on and
really couldn't conclude anything yet. I haven't tried reproducing it,
though.

A few things I can make out of this mail chain so far:
- Apart from the log, everything is working fine, i.e. the system comes
  back up in working condition.
- It only happens when cpufreq_add_dev() fails during hibernation, while
  we re-enable the non-boot CPUs in order to save the image to disk. So it
  isn't a problem for a system which doesn't have any issues with
  add_dev() failing on hibernation.
- There is contention between locks in the order they are taken, and it
  looks to be between the hotplug lock taken by get_online_cpus() and the
  s_active lock for sysfs files. I don't know what the role of the
  previous write to the sysfs files is, as that should finish before
  hibernation starts and so all locks should be back in place.

--
viresh
Viresh Kumar <viresh.kumar@linaro.org> writes:
> On 3 January 2014 15:23, Bjørn Mork <bjorn@mork.no> wrote:
>> Note that "ondemand" and "1401000" are the default values, so I don't
>> actually change anything here. The write is causing the problem, not
>> the value. As expected, I guess.
>>
>> Also note that boot vs non-boot cpu doesn't seem to matter. Nor does
>> cancelling the hibernation. The warning appears on hibernate - not on
>> resume.
>
> Hmm... I spent quite some time trying to understand what's going on and
> really couldn't conclude anything yet. I haven't tried reproducing it,
> though.
>
> A few things I can make out of this mail chain so far:
> - Apart from the log, everything is working fine, i.e. the system comes
>   back up in working condition.

Correct. And users not running a lock debugging kernel will of course
not even see the warning.

> - It only happens when cpufreq_add_dev() fails during hibernation, while
>   we re-enable the non-boot CPUs in order to save the image to disk. So it
>   isn't a problem for a system which doesn't have any issues with
>   add_dev() failing on hibernation.

Wrong. This was my initial assumption, but I later found out that the
issue is unrelated to hibernation failures. Sorry about the confusion.

> - There is contention between locks in the order they are taken, and it
>   looks to be between the hotplug lock taken by get_online_cpus() and the
>   s_active lock for sysfs files. I don't know what the role of the
>   previous write to the sysfs files is, as that should finish before
>   hibernation starts and so all locks should be back in place.

Yes, that seems logical. But I guess this is where it fails?

Bjørn
On 3 January 2014 17:25, Bjørn Mork <bjorn@mork.no> wrote:
> Correct. And users not running a lock debugging kernel will of course
> not even see the warning.

Okay..

>> - It only happens when cpufreq_add_dev() fails during hibernation, while
>>   we re-enable the non-boot CPUs in order to save the image to disk. So it
>>   isn't a problem for a system which doesn't have any issues with
>>   add_dev() failing on hibernation.
>
> Wrong. This was my initial assumption, but I later found out that the
> issue is unrelated to hibernation failures. Sorry about the confusion.

Hmm.. Can you share the latest warning logs you have? The earlier ones were
related to hibernation.

>> - There is contention between locks in the order they are taken, and it
>>   looks to be between the hotplug lock taken by get_online_cpus() and the
>>   s_active lock for sysfs files. I don't know what the role of the
>>   previous write to the sysfs files is, as that should finish before
>>   hibernation starts and so all locks should be back in place.
>
> Yes, that seems logical. But I guess this is where it fails?

It looked like that.. though your new logs might indicate something else.
Viresh Kumar <viresh.kumar@linaro.org> writes: > On 3 January 2014 17:25, Bjørn Mork <bjorn@mork.no> wrote: >> Correct. And users not running a lock debugging kernel will of course >> not even see the warning. > > Okay.. > >>> - It only happens when cpufreq_add_dev() fails during hibernation while >>> we enable non-boot CPUs again to save image to disk. So, isn't a problem >>> for a system which doesn't have any issues with add_dev() failing on >>> hibernation >> >> Wrong. This was my initial assumption but I later found out that the >> issue is unrelated to hibernation failures. Sorry about the confusion. > > Hmm.. Can we have the latest warning logs you have? Earlier ones were > related to hibernation.. That's correct. I have not observed this on suspend to RAM. But then again I haven't rigged any way to log that, so I don't think it's conclusive.. The point I tried to make is that it isn't related to any hibernation *failures*. The warning appears even if the add_dev() is successful, and it also appears if I touch only the *boot* cpu cpufreq attributes. I.e., this seems to be unrelated to the hotplugging code. Here's another copy of the warning, captured by cancelling hibernation as I don't have any other way to log it at the moment. But I do see the warning appear on the console *before* cancelling. And I also see this warning appear when trying the in-kernel hibernation instead of userspace. [ 267.893084] ====================================================== [ 267.893085] [ INFO: possible circular locking dependency detected ] [ 267.893087] 3.13.0-rc6+ #183 Tainted: G W [ 267.893089] ------------------------------------------------------- [ 267.893090] s2disk/5450 is trying to acquire lock: [ 267.893101] (s_active#164){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 [ 267.893102] [ 267.893102] but task is already holding lock: [ 267.893111] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 [ 267.893112] [ 267.893112] which lock already depends on the new lock. 
[ 267.893112] [ 267.893113] [ 267.893113] the existing dependency chain (in reverse order) is: [ 267.893117] [ 267.893117] -> #1 (cpu_hotplug.lock){+.+.+.}: [ 267.893123] [<ffffffff81075027>] lock_acquire+0xfb/0x144 [ 267.893128] [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397 [ 267.893132] [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50 [ 267.893137] [<ffffffff812a69c4>] store+0x20/0xad [ 267.893142] [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b [ 267.893147] [<ffffffff8112a428>] vfs_write+0x9c/0x102 [ 267.893151] [<ffffffff8112a716>] SyS_write+0x50/0x85 [ 267.893155] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b [ 267.893160] [ 267.893160] -> #0 (s_active#164){++++.+}: [ 267.893164] [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 [ 267.893168] [<ffffffff81075027>] lock_acquire+0xfb/0x144 [ 267.893172] [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 [ 267.893175] [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 [ 267.893178] [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 [ 267.893182] [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b [ 267.893186] [<ffffffff811d5d11>] kobject_del+0x18/0x42 [ 267.893190] [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f [ 267.893193] [<ffffffff811d5ede>] kobject_put+0x45/0x49 [ 267.893197] [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83 [ 267.893201] [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c [ 267.893204] [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88 [ 267.893208] [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 [ 267.893213] [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb [ 267.893217] [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 [ 267.893221] [<ffffffff81039ea1>] cpu_notify+0xe/0x10 [ 267.893225] [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 [ 267.893230] [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf [ 267.893234] [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed [ 267.893238] [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431 [ 267.893242] [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8 [ 267.893245] [<ffffffff8113911c>] SyS_ioctl+0x52/0x82 [ 267.893249] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b [ 267.893250] [ 267.893250] other info that might help us debug this: [ 267.893250] [ 267.893251] Possible unsafe locking scenario: [ 267.893251] [ 267.893252] CPU0 CPU1 [ 267.893253] ---- ---- [ 267.893256] lock(cpu_hotplug.lock); [ 267.893259] lock(s_active#164); [ 267.893261] lock(cpu_hotplug.lock); [ 267.893265] lock(s_active#164); [ 267.893266] [ 267.893266] *** DEADLOCK *** [ 267.893266] [ 267.893268] 7 locks held by s2disk/5450: [ 267.893275] #0: (pm_mutex){+.+.+.}, at: [<ffffffff8107e9ea>] snapshot_ioctl+0x4b/0x431 [ 267.893283] #1: (device_hotplug_lock){+.+.+.}, at: [<ffffffff812837ac>] lock_device_hotplug+0x12/0x14 [ 267.893290] #2: (acpi_scan_lock){+.+.+.}, at: [<ffffffff8122c6d3>] acpi_scan_lock_acquire+0x12/0x14 [ 267.893297] #3: (console_lock){+.+.+.}, at: [<ffffffff810817f2>] suspend_console+0x20/0x38 [ 267.893305] #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81039fb9>] cpu_maps_update_begin+0x12/0x14 [ 267.893311] #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 [ 267.893318] #6: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff812a851c>] __cpufreq_add_dev.isra.18+0x7f/0x78c [ 267.893319] [ 267.893319] stack backtrace: [ 267.893322] CPU: 0 PID: 5450 Comm: s2disk Tainted: G W 3.13.0-rc6+ #183 [ 267.893324] Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 [ 267.893329] ffffffff81d3cd00 ffff8800b7acd8f8 ffffffff81399cac 00000000000068f7 [ 
267.893334] ffffffff81d3cd00 ffff8800b7acd948 ffffffff81397dc5 ffff8800b7acd928 [ 267.893338] ffff8800b7b24990 0000000000000006 ffff8800b7b25118 0000000000000006 [ 267.893339] Call Trace: [ 267.893344] [<ffffffff81399cac>] dump_stack+0x4e/0x68 [ 267.893349] [<ffffffff81397dc5>] print_circular_bug+0x1f8/0x209 [ 267.893353] [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 [ 267.893357] [<ffffffff8107565d>] ? debug_check_no_locks_freed+0x12c/0x143 [ 267.893361] [<ffffffff81075027>] lock_acquire+0xfb/0x144 [ 267.893365] [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46 [ 267.893369] [<ffffffff81072201>] ? lockdep_init_map+0x14e/0x160 [ 267.893372] [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 [ 267.893376] [<ffffffff8118b9d7>] ? sysfs_addrm_finish+0x28/0x46 [ 267.893380] [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 [ 267.893383] [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 [ 267.893386] [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b [ 267.893390] [<ffffffff811d5d11>] kobject_del+0x18/0x42 [ 267.893393] [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f [ 267.893396] [<ffffffff811d5ede>] kobject_put+0x45/0x49 [ 267.893400] [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83 [ 267.893404] [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c [ 267.893407] [<ffffffff81071b04>] ? __lock_is_held+0x32/0x54 [ 267.893413] [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88 [ 267.893416] [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 [ 267.893421] [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb [ 267.893425] [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 [ 267.893428] [<ffffffff81039ea1>] cpu_notify+0xe/0x10 [ 267.893432] [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 [ 267.893436] [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf [ 267.893439] [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed [ 267.893443] [<ffffffff8107eb84>] snapshot_ioctl+0x1e5/0x431 [ 267.893447] [<ffffffff81139080>] do_vfs_ioctl+0x45e/0x4a8 [ 267.893451] [<ffffffff811414c8>] ? fcheck_files+0xa1/0xe4 [ 267.893454] [<ffffffff81141a67>] ? fget_light+0x33/0x9a [ 267.893457] [<ffffffff8113911c>] SyS_ioctl+0x52/0x82 [ 267.893462] [<ffffffff811df4ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 267.893466] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b I don't think I do anything extra-ordinary to trigger this, so I would be surprised if you can't reproduce it by doing export x=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor` echo $x >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor s2disk or similar on a kernel with CONFIG_PROVE_LOCKING=y Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6 January 2014 14:31, Bjørn Mork <bjorn@mork.no> wrote: > That's correct. I have not observed this on suspend to RAM. But then > again I haven't rigged any way to log that, so I don't think it's > conclusive.. > > The point I tried to make is that it isn't related to any hibernation > *failures*. The warning appears even if the add_dev() is successful, > and it also appears if I touch only the *boot* cpu cpufreq attributes. Okay.. But your log below is for add_dev() failure case. > I.e., this seems to be unrelated to the hotplugging code. > > Here's another copy of the warning, captured by cancelling hibernation > as I don't have any other way to log it at the moment. But I do see the > warning appear on the console *before* cancelling. And I also see this > warning appear when trying the in-kernel hibernation instead of > userspace. How do you cancel hibernation here? Sorry, what is in-kernel hibernation? > [ 267.893084] ====================================================== > [ 267.893085] [ INFO: possible circular locking dependency detected ] > [ 267.893087] 3.13.0-rc6+ #183 Tainted: G W > [ 267.893089] ------------------------------------------------------- > [ 267.893090] s2disk/5450 is trying to acquire lock: > [ 267.893101] (s_active#164){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 > [ 267.893102] > [ 267.893102] but task is already holding lock: > [ 267.893111] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 > [ 267.893112] > [ 267.893112] which lock already depends on the new lock. > [ 267.893112] > [ 267.893113] > [ 267.893113] the existing dependency chain (in reverse order) is: > [ 267.893117] > [ 267.893117] -> #1 (cpu_hotplug.lock){+.+.+.}: > [ 267.893123] [<ffffffff81075027>] lock_acquire+0xfb/0x144 > [ 267.893128] [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397 > [ 267.893132] [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50 > [ 267.893137] [<ffffffff812a69c4>] store+0x20/0xad > [ 267.893142] [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b > [ 267.893147] [<ffffffff8112a428>] vfs_write+0x9c/0x102 > [ 267.893151] [<ffffffff8112a716>] SyS_write+0x50/0x85 > [ 267.893155] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b > [ 267.893160] > [ 267.893160] -> #0 (s_active#164){++++.+}: > [ 267.893164] [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 > [ 267.893168] [<ffffffff81075027>] lock_acquire+0xfb/0x144 > [ 267.893172] [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 > [ 267.893175] [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 > [ 267.893178] [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 > [ 267.893182] [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b > [ 267.893186] [<ffffffff811d5d11>] kobject_del+0x18/0x42 > [ 267.893190] [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f > [ 267.893193] [<ffffffff811d5ede>] kobject_put+0x45/0x49 > [ 267.893197] [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83 All above is for case where add dev was required to clear up earlier allocated policy as something failed during add_dev() > [ 267.893201] [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c > [ 267.893204] [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88 > [ 267.893208] [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 > [ 267.893213] [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb > [ 267.893217] [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 > [ 267.893221] [<ffffffff81039ea1>] cpu_notify+0xe/0x10 > [ 267.893225] [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 > [ 267.893230] [<ffffffff8138ee7d>] 
enable_nonboot_cpus+0x52/0xbf > [ 267.893234] [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed And this is again the same log as what you have sent last time. It happened while saving image after disabling all non-boot CPUs. And so it happened for your case as your driver->init() was failing, so it may not get reproduced at my end. Haven't tried it though. > I don't think I do anything extra-ordinary to trigger this, so I would > be surprised if you can't reproduce it by doing > > export x=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor` > echo $x >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor > s2disk I don't see you cancelling hibernation here. Sorry got confused on how exactly you reproduced it. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Viresh Kumar <viresh.kumar@linaro.org> writes: > On 6 January 2014 14:31, Bjørn Mork <bjorn@mork.no> wrote: >> That's correct. I have not observed this on suspend to RAM. But then >> again I haven't rigged any way to log that, so I don't think it's >> conclusive.. >> >> The point I tried to make is that it isn't related to any hibernation >> *failures*. The warning appears even if the add_dev() is successful, >> and it also appears if I touch only the *boot* cpu cpufreq attributes. > > Okay.. But your log below is for add_dev() failure case. Any suggestion on how to capure warnings issued between freeze and thaw is appreciated :-) Will a netconsole work? I can try that later. >> I.e., this seems to be unrelated to the hotplugging code. >> >> Here's another copy of the warning, captured by cancelling hibernation >> as I don't have any other way to log it at the moment. But I do see the >> warning appear on the console *before* cancelling. And I also see this >> warning appear when trying the in-kernel hibernation instead of >> userspace. > > How do you cancel hibernation here? Sorry, what is in-kernel hibernation? I don't know the proper terms here. I normally use s2disk from uswsusp to hibernate. By "in-kernel hibernation" I meant the process initiated by doing echo disk >/sys/power/state >> [ 267.893084] ====================================================== >> [ 267.893085] [ INFO: possible circular locking dependency detected ] >> [ 267.893087] 3.13.0-rc6+ #183 Tainted: G W >> [ 267.893089] ------------------------------------------------------- >> [ 267.893090] s2disk/5450 is trying to acquire lock: >> [ 267.893101] (s_active#164){++++.+}, at: [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 >> [ 267.893102] >> [ 267.893102] but task is already holding lock: >> [ 267.893111] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81039ff5>] cpu_hotplug_begin+0x28/0x50 >> [ 267.893112] >> [ 267.893112] which lock already depends on the new lock. >> [ 267.893112] >> [ 267.893113] >> [ 267.893113] the existing dependency chain (in reverse order) is: >> [ 267.893117] >> [ 267.893117] -> #1 (cpu_hotplug.lock){+.+.+.}: >> [ 267.893123] [<ffffffff81075027>] lock_acquire+0xfb/0x144 >> [ 267.893128] [<ffffffff8139d4d2>] mutex_lock_nested+0x6c/0x397 >> [ 267.893132] [<ffffffff81039f4a>] get_online_cpus+0x3c/0x50 >> [ 267.893137] [<ffffffff812a69c4>] store+0x20/0xad >> [ 267.893142] [<ffffffff8118a9a1>] sysfs_write_file+0x138/0x18b >> [ 267.893147] [<ffffffff8112a428>] vfs_write+0x9c/0x102 >> [ 267.893151] [<ffffffff8112a716>] SyS_write+0x50/0x85 >> [ 267.893155] [<ffffffff813a57a2>] system_call_fastpath+0x16/0x1b >> [ 267.893160] >> [ 267.893160] -> #0 (s_active#164){++++.+}: >> [ 267.893164] [<ffffffff81074760>] __lock_acquire+0xae3/0xe68 >> [ 267.893168] [<ffffffff81075027>] lock_acquire+0xfb/0x144 >> [ 267.893172] [<ffffffff8118b027>] sysfs_deactivate+0xa5/0x108 >> [ 267.893175] [<ffffffff8118b9d7>] sysfs_addrm_finish+0x28/0x46 >> [ 267.893178] [<ffffffff8118bd3f>] sysfs_remove+0x2a/0x31 >> [ 267.893182] [<ffffffff8118be2f>] sysfs_remove_dir+0x66/0x6b >> [ 267.893186] [<ffffffff811d5d11>] kobject_del+0x18/0x42 >> [ 267.893190] [<ffffffff811d5e1c>] kobject_cleanup+0xe1/0x14f >> [ 267.893193] [<ffffffff811d5ede>] kobject_put+0x45/0x49 >> [ 267.893197] [<ffffffff812a6ed5>] cpufreq_policy_put_kobj+0x37/0x83 > > All above is for case where add dev was required to clear up earlier > allocated policy as something failed during add_dev() I see. 
I'll try capturing at least part of the warning with a camera, when it's not failing. >> [ 267.893201] [<ffffffff812a8bfb>] __cpufreq_add_dev.isra.18+0x75e/0x78c >> [ 267.893204] [<ffffffff812a8c89>] cpufreq_cpu_callback+0x53/0x88 >> [ 267.893208] [<ffffffff813a314c>] notifier_call_chain+0x67/0x92 >> [ 267.893213] [<ffffffff8105bce4>] __raw_notifier_call_chain+0x9/0xb >> [ 267.893217] [<ffffffff81039e7c>] __cpu_notify+0x1b/0x32 >> [ 267.893221] [<ffffffff81039ea1>] cpu_notify+0xe/0x10 >> [ 267.893225] [<ffffffff8103a12b>] _cpu_up+0xf1/0x124 >> [ 267.893230] [<ffffffff8138ee7d>] enable_nonboot_cpus+0x52/0xbf >> [ 267.893234] [<ffffffff8107a2a3>] hibernation_snapshot+0x1be/0x2ed > > And this is again the same log as what you have sent last time. It happened > while saving image after disabling all non-boot CPUs. And so it happened for > your case as your driver->init() was failing, so it may not get reproduced at > my end. Haven't tried it though. > >> I don't think I do anything extra-ordinary to trigger this, so I would >> be surprised if you can't reproduce it by doing >> >> export x=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor` >> echo $x >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor >> s2disk > > I don't see you cancelling hibernation here. Sorry got confused on how > exactly you reproduced it. I cancelled by pressing backspace while the image was being written. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6 January 2014 16:19, Bjørn Mork <bjorn@mork.no> wrote:
> I cancelled by pressing backspace while the image was being written.
I will answer the other queries in my next mail, but a quick question:
Do you get these warnings even if you don't cancel hibernation?
On Monday, January 06, 2014 11:49:00 AM Bjørn Mork wrote:
> Viresh Kumar <viresh.kumar@linaro.org> writes:
>
> > On 6 January 2014 14:31, Bjørn Mork <bjorn@mork.no> wrote:
> >> That's correct. I have not observed this on suspend to RAM. But then
> >> again I haven't rigged any way to log that, so I don't think it's
> >> conclusive..
> >>
> >> The point I tried to make is that it isn't related to any hibernation
> >> *failures*. The warning appears even if the add_dev() is successful,
> >> and it also appears if I touch only the *boot* cpu cpufreq attributes.
> >
> > Okay.. But your log below is for the add_dev() failure case.
>
> Any suggestion on how to capture warnings issued between freeze and thaw
> is appreciated :-)

You can try to add no_console_suspend to the kernel's command line.

Thanks!
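As a concrete example of where that parameter goes (a sketch assuming a
GRUB 2 setup; the file location and the regeneration command depend on the
distribution), it can be appended to the default kernel command line:

# /etc/default/grub (assumed GRUB 2 layout)
GRUB_CMDLINE_LINUX_DEFAULT="quiet no_console_suspend"

# then regenerate the bootloader configuration, e.g.:
#   update-grub                               (Debian/Ubuntu)
#   grub2-mkconfig -o /boot/grub2/grub.cfg    (Fedora/openSUSE)

With no_console_suspend set, the consoles are not suspended during the
sleep transition, so messages printed between freeze and thaw stay visible
on the console instead of being lost.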
On Monday, January 06, 2014 04:24:09 PM Viresh Kumar wrote:
> On 6 January 2014 16:19, Bjørn Mork <bjorn@mork.no> wrote:
> > I cancelled by pressing backspace while the image was being written.

Basically, stuff is still frozen at this point, but devices are fully
functional. It's kind of like returning an error code from swsusp_write()
in hibernate() (kernel/power/hibernate.c).

> I will answer the other queries in my next mail, but a quick question:
> Do you get these warnings even if you don't cancel hibernation?

Good question. :-)

Thanks!
On 6 January 2014 16:47, Bjørn Mork <bjorn@mork.no> wrote:
> Viresh Kumar <viresh.kumar@linaro.org> writes:
>
>> On 6 January 2014 16:19, Bjørn Mork <bjorn@mork.no> wrote:
>>> I cancelled by pressing backspace while the image was being written.
>>
>> I will answer the other queries in my next mail, but a quick question:
>> Do you get these warnings even if you don't cancel hibernation?
>
> Yes.

Hmm.. I see. Now I really don't believe that you would get these warnings
only on hibernation and not on suspend/resume.

> I believe the photo shows that you are right: This warning happens when
> the failure path is entered, even if both the hibernation and following
> resume is successful. We see __cpufreq_add_dev calling
> cpufreq_policy_put_kobj here as well.

@Rafael: I need your expert opinion here, I am not so good with these
concepts :)

Here is my theory about these warnings:
- Bjorn gets these only when CONFIG_PROVE_LOCKING=y.
- This feature enables the kernel to report locking-related deadlocks
  before they actually occur (lib/Kconfig.debug).
- So no locking issue has actually happened here; it is just a warning
  that we might run into such an issue later.
- The contention is between the sysfs lock and the cpu-hotplug lock.
- While changing stuff from sysfs we do this:
    sysfs lock       (taken by the kernel)
    cpu-hotplug lock (taken by cpufreq)
- And while doing hibernation, if add_dev() fails, we do this:
    cpu-hotplug lock (taken by the kernel while bringing CPUs back up)
    sysfs lock       (taken by cpufreq while removing sysfs directories)

The kernel is recording all these locking sequences all the time. Because
we first tried changing values from sysfs, the kernel recorded that
sequence, and as soon as it saw the opposite order it WARNed us?

On a suspend/resume we would only get into the add_dev() failure path on
resume, and probably by that time these recordings are refreshed? And
that's why Bjorn isn't facing any issue there? Just a wild guess.

And if all of the above "virtual" theory is correct, then I don't know if
we can/want to fix that..

> Thinking about this: This failure will always happen, regardless of
> cancelling or not, won't it?

Yeah.

> But it happens *after* the freeze so it
> doesn't matter if we resume from the frozen image.

Yeah.. So even tests like this should give you this warning
(Documentation/power/basic-pm-debugging.txt):

  core - test the freezing of processes, suspending of devices, platform
         global control methods(*), the disabling of nonboot CPUs and
         suspending of platform/system devices

> So you are probably right as usual:

Really? Ask Rafael, he doesn't agree :)

> This warning is related to the
> acpi-cpufreq driver failing, and you cannot reproduce it without
> simulating such a failure (or using that driver yourself). Which means
> that the warning disappears with my patch in
> http://www.spinics.net/lists/cpufreq/msg08714.html
> which explains why I don't normally see it. I depend on being able to
> cancel hibernation, so I use that patch most of the time. Except when
> I'm testing one of your branches.

Correct.
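To make the ordering concrete, here is a minimal userspace sketch (plain C
with pthreads, not kernel code) of the AB-BA pattern described above. The
mutex names "sysfs_lock" and "hotplug_lock" are only stand-ins for s_active
and cpu_hotplug.lock, and the two thread bodies mirror the two call chains
from the lockdep report; lockdep complains as soon as it has seen both
acquisition orders, even if a particular run never actually deadlocks:

/* Illustration only: two threads taking the same pair of locks in
 * opposite order.  Build with: cc -pthread abba.c -o abba */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t sysfs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* sysfs write path: the sysfs core holds s_active, then the cpufreq
 * store() callback calls get_online_cpus() */
static void *sysfs_store(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&sysfs_lock);	/* s_active */
	pthread_mutex_lock(&hotplug_lock);	/* get_online_cpus() */
	/* ... update the policy ... */
	pthread_mutex_unlock(&hotplug_lock);
	pthread_mutex_unlock(&sysfs_lock);
	return NULL;
}

/* resume path: cpu_hotplug_begin() is held, then the failing
 * __cpufreq_add_dev() drops the policy kobject, which removes its
 * sysfs directory */
static void *cpu_up_fail(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&hotplug_lock);	/* cpu_hotplug_begin() */
	pthread_mutex_lock(&sysfs_lock);	/* kobject_del() -> sysfs_remove() */
	/* ... drop the half-initialized policy ... */
	pthread_mutex_unlock(&sysfs_lock);
	pthread_mutex_unlock(&hotplug_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, sysfs_store, NULL);
	pthread_create(&b, NULL, cpu_up_fail, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("finished (this run happened not to deadlock)\n");
	return 0;
}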
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -1016,15 +1016,17 @@ static int __cpufreq_add_dev(struct devi
 	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
 #endif
 
-	if (frozen)
-		/* Restore the saved policy when doing light-weight init */
-		policy = cpufreq_policy_restore(cpu);
-	else
+	/*
+	 * Restore the saved policy when doing light-weight init and fall back
+	 * to the full init if that fails.
+	 */
+	policy = frozen ? cpufreq_policy_restore(cpu) : NULL;
+	if (!policy) {
+		frozen = false;
 		policy = cpufreq_policy_alloc();
-
-	if (!policy)
-		goto nomem_out;
-
+		if (!policy)
+			goto nomem_out;
+	}
 
 	/*
 	 * In the resume path, since we restore a saved policy, the assignment
@@ -1118,8 +1120,11 @@ err_get_freq:
 	if (cpufreq_driver->exit)
 		cpufreq_driver->exit(policy);
 err_set_policy_cpu:
-	if (frozen)
+	if (frozen) {
+		/* Do not leave stale fallback data behind. */
+		per_cpu(cpufreq_cpu_data_fallback, cpu) = NULL;
 		cpufreq_policy_put_kobj(policy);
+	}
 	cpufreq_policy_free(policy);
 
 nomem_out: