Message ID | 53DB6B81.6050400@redhat.com (mailing list archive)
---|---
State | Not Applicable, archived
On 08/01/14 03:27, Prarit Bhargava wrote:
>
> Can you send me the test and the trace of the deadlock? I'm not creating it with:
>

This was with conservative as the default, and switching to ondemand

# cd /sys/devices/system/cpu/cpu2/cpufreq
# ls
affected_cpus                  scaling_available_governors
conservative                   scaling_cur_freq
cpuinfo_cur_freq               scaling_driver
cpuinfo_max_freq               scaling_governor
cpuinfo_min_freq               scaling_max_freq
cpuinfo_transition_latency     scaling_min_freq
related_cpus                   scaling_setspeed
scaling_available_frequencies  stats
# cat conservative/down_threshold
20
# echo ondemand > scaling_governor

======================================================
[ INFO: possible circular locking dependency detected ]
3.16.0-rc3-00039-ge1e38f124d87 #47 Not tainted
-------------------------------------------------------
sh/75 is trying to acquire lock:
 (s_active#9){++++..}, at: [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84

but task is already holding lock:
 (&policy->rwsem){+++++.}, at: [<c05ab1f0>] store+0x68/0xb8

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&policy->rwsem){+++++.}:
       [<c0359234>] kernfs_fop_open+0x138/0x298
       [<c02fa3f4>] do_dentry_open.isra.12+0x1b0/0x2f0
       [<c02fa604>] finish_open+0x20/0x38
       [<c0308d34>] do_last.isra.37+0x5ac/0xb68
       [<c03093a4>] path_openat+0xb4/0x5d8
       [<c0309bcc>] do_filp_open+0x2c/0x80
       [<c02fb558>] do_sys_open+0x10c/0x1c8
       [<c020f0a0>] ret_fast_syscall+0x0/0x48

-> #0 (s_active#9){++++..}:
       [<c0357d18>] __kernfs_remove+0x250/0x300
       [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84
       [<c035aa78>] remove_files+0x34/0x78
       [<c035aee0>] sysfs_remove_group+0x40/0x98
       [<c05b0560>] cpufreq_governor_dbs+0x4c0/0x6ec
       [<c05abebc>] __cpufreq_governor+0x118/0x200
       [<c05ac0fc>] cpufreq_set_policy+0x158/0x2ac
       [<c05ad5e4>] store_scaling_governor+0x6c/0x94
       [<c05ab210>] store+0x88/0xb8
       [<c035a00c>] sysfs_kf_write+0x4c/0x50
       [<c03594d4>] kernfs_fop_write+0xc0/0x180
       [<c02fc5c8>] vfs_write+0xa0/0x1a8
       [<c02fc9d4>] SyS_write+0x40/0x8c
       [<c020f0a0>] ret_fast_syscall+0x0/0x48

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&policy->rwsem);
                               lock(s_active#9);
                               lock(&policy->rwsem);
  lock(s_active#9);

 *** DEADLOCK ***

6 locks held by sh/75:
 #0:  (sb_writers#4){.+.+..}, at: [<c02fc6a8>] vfs_write+0x180/0x1a8
 #1:  (&of->mutex){+.+...}, at: [<c0359498>] kernfs_fop_write+0x84/0x180
 #2:  (s_active#10){.+.+..}, at: [<c03594a0>] kernfs_fop_write+0x8c/0x180
 #3:  (cpu_hotplug.lock){++++++}, at: [<c0221ef8>] get_online_cpus+0x38/0x9c
 #4:  (cpufreq_rwsem){.+.+.+}, at: [<c05ab1d8>] store+0x50/0xb8
 #5:  (&policy->rwsem){+++++.}, at: [<c05ab1f0>] store+0x68/0xb8

stack backtrace:
CPU: 0 PID: 75 Comm: sh Not tainted 3.16.0-rc3-00039-ge1e38f124d87 #47
[<c0214de8>] (unwind_backtrace) from [<c02123f8>] (show_stack+0x10/0x14)
[<c02123f8>] (show_stack) from [<c0709e5c>] (dump_stack+0x70/0xbc)
[<c0709e5c>] (dump_stack) from [<c070722c>] (print_circular_bug+0x280/0x2d4)
[<c070722c>] (print_circular_bug) from [<c02629cc>] (__lock_acquire+0x18d0/0x1abc)
[<c02629cc>] (__lock_acquire) from [<c026310c>] (lock_acquire+0x9c/0x138)
[<c026310c>] (lock_acquire) from [<c0357d18>] (__kernfs_remove+0x250/0x300)
[<c0357d18>] (__kernfs_remove) from [<c0358a94>] (kernfs_remove_by_name_ns+0x3c/0x84)
[<c0358a94>] (kernfs_remove_by_name_ns) from [<c035aa78>] (remove_files+0x34/0x78)
[<c035aa78>] (remove_files) from [<c035aee0>] (sysfs_remove_group+0x40/0x98)
[<c035aee0>] (sysfs_remove_group) from [<c05b0560>] (cpufreq_governor_dbs+0x4c0/0x6ec)
[<c05b0560>] (cpufreq_governor_dbs) from [<c05abebc>] (__cpufreq_governor+0x118/0x200)
[<c05abebc>] (__cpufreq_governor) from [<c05ac0fc>] (cpufreq_set_policy+0x158/0x2ac)
[<c05ac0fc>] (cpufreq_set_policy) from [<c05ad5e4>] (store_scaling_governor+0x6c/0x94)
[<c05ad5e4>] (store_scaling_governor) from [<c05ab210>] (store+0x88/0xb8)
[<c05ab210>] (store) from [<c035a00c>] (sysfs_kf_write+0x4c/0x50)
[<c035a00c>] (sysfs_kf_write) from [<c03594d4>] (kernfs_fop_write+0xc0/0x180)
[<c03594d4>] (kernfs_fop_write) from [<c02fc5c8>] (vfs_write+0xa0/0x1a8)
[<c02fc5c8>] (vfs_write) from [<c02fc9d4>] (SyS_write+0x40/0x8c)
[<c02fc9d4>] (SyS_write) from [<c020f0a0>] (ret_fast_syscall+0x0/0x48)
On 08/01/2014 01:18 PM, Stephen Boyd wrote:
> On 08/01/14 03:27, Prarit Bhargava wrote:
>>
>> Can you send me the test and the trace of the deadlock? I'm not creating it with:
>>
>
> This was with conservative as the default, and switching to ondemand
>
> # cd /sys/devices/system/cpu/cpu2/cpufreq
> # ls
> affected_cpus                  scaling_available_governors
> conservative                   scaling_cur_freq
> cpuinfo_cur_freq               scaling_driver
> cpuinfo_max_freq               scaling_governor
> cpuinfo_min_freq               scaling_max_freq
> cpuinfo_transition_latency     scaling_min_freq
> related_cpus                   scaling_setspeed
> scaling_available_frequencies  stats
> # cat conservative/down_threshold
> 20
> # echo ondemand > scaling_governor

Thanks Stephen,

There's obviously a difference in our .configs. I have a global conservative
directory, ie) /sys/devices/system/cpu/cpufreq/conservative, instead of a
per-cpu governor directory.

ie) what are your .config options for CPUFREQ? Mine are:

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y

Is there some other config option I have to set?

P.

> [full lockdep splat snipped]

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/01/14 12:15, Prarit Bhargava wrote:
>
> On 08/01/2014 01:18 PM, Stephen Boyd wrote:
>> [test sequence snipped]
>
> Thanks Stephen,
>
> There's obviously a difference in our .configs. I have a global conservative
> directory, ie) /sys/devices/system/cpu/cpufreq/conservative instead of a
> per-cpu governor directory.
>
> ie) what are your .config options for CPUFREQ?
>
> [.config options snipped]
>
> Is there some other config option I have to set?

I have the same options. The difference is that my driver has a governor
per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
If I remove that flag I can't trigger the lockdep splat anymore with
this sequence and your patch.
On 08/01/2014 03:36 PM, Stephen Boyd wrote:
> On 08/01/14 12:15, Prarit Bhargava wrote:
>> [.config discussion snipped]
>
> I have the same options. The difference is that my driver has a governor
> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
> If I remove that flag I can't trigger the lockdep splat anymore with
> this sequence and your patch.

I see -- so you're seeing this on arm then? If so, let me know so I can
reserve one to work on :)

Thanks!

P.
On 08/01/14 12:43, Prarit Bhargava wrote:
>
> On 08/01/2014 03:36 PM, Stephen Boyd wrote:
>> [.config discussion snipped]
>>
>> I have the same options. The difference is that my driver has a governor
>> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
>> If I remove that flag I can't trigger the lockdep splat anymore with
>> this sequence and your patch.
>
> I see -- so you're seeing this on arm then? If so, let me know so I can
> reserve one to work on :)
>

I only have ARM to test on. You can set this flag in your cpufreq driver
if you have independently scaling CPU frequencies, so it isn't
necessarily an ARM thing.
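For readers following along: the flag Stephen refers to is set in the driver's struct cpufreq_driver. A minimal, hedged sketch of what that looks like — the driver name and the init/target callbacks here are placeholders for illustration, not anything from this thread (cpufreq_generic_frequency_table_verify is a real helper, though):

```c
/* Illustrative fragment only: a cpufreq driver opting into per-policy
 * governor tunables via CPUFREQ_HAVE_GOVERNOR_PER_POLICY.  With the flag
 * set, governor directories appear under each policy's sysfs node
 * (/sys/devices/system/cpu/cpuN/cpufreq/<governor>/) instead of the
 * single global /sys/devices/system/cpu/cpufreq/<governor>/. */
static struct cpufreq_driver my_cpufreq_driver = {
	.name		= "my-cpufreq",			/* hypothetical name */
	.flags		= CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
	.init		= my_cpufreq_init,		/* hypothetical callback */
	.verify		= cpufreq_generic_frequency_table_verify,
	.target_index	= my_cpufreq_target_index,	/* hypothetical callback */
};
```

This is the per-policy layout that makes the governor's sysfs group live under policy->rwsem's directory, which is exactly what the teardown path in the splat is removing.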
On 08/01/2014 12:54 PM, Stephen Boyd wrote:
> On 08/01/14 12:43, Prarit Bhargava wrote:
>> [snip]
>>
>> I see -- so you're seeing this on arm then? If so, let me know so I can
>> reserve one to work on :)
>
> I only have ARM to test on. You can set this flag in your cpufreq driver
> if you have independently scaling CPU frequencies, so it isn't
> necessarily an ARM thing.

Sorry, forgot to mention this in the earlier email. But yeah, this is a
totally SW feature. You should be able to enable this flag in your driver.

-Saravana
On 08/01/2014 05:25 PM, Saravana Kannan wrote:
> [snip]
>
> Sorry, forgot to mention this in the earlier email. But yeah, this is a
> totally SW feature. You should be able to enable this flag in your driver.

Thanks Saravana, I'll get back to this today. I'm also going to try to track
down an ARM server as well. I'll get back to everyone in a bit.

P.
Sorry for the delay guys, was away :(

Adding Robert as well, he reported something similar, so better to discuss it
here.

On 1 August 2014 22:48, Stephen Boyd <sboyd@codeaurora.org> wrote:
> This was with conservative as the default, and switching to ondemand
>
> # cd /sys/devices/system/cpu/cpu2/cpufreq
> # cat conservative/down_threshold
> 20
> # echo ondemand > scaling_governor
>
> [full lockdep splat snipped]

Thanks for coming to my rescue Stephen :), I was quite sure I got this
with ondemand as well..

I will be looking very closely at the code now to see what's going wrong.
And btw, does anybody here have an exact understanding of why this lockdep
splat happens? I mean, what was the real problem for which we just dropped
the rwsems.. I understood that earlier but couldn't get it again :)

Thanks all for your work on getting this fixed.

--
viresh
On 08/04/2014 06:36 AM, Viresh Kumar wrote:
> Sorry for the delay guys, was away :(
>
>> Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(&policy->rwsem);
>>                                lock(s_active#9);
>>                                lock(&policy->rwsem);
>>   lock(s_active#9);
>
> Thanks for coming to my rescue Stephen :), I was quite sure I got this
> with ondemand as well..

Yeah, I'm a bit confused by it because I wasn't able to reproduce it. But
I've got some pretty clear instructions on how to do it.

> I will be looking very closely at the code now to see what's going wrong.
> And btw, does anybody here have an exact understanding of why this lockdep
> splat happens? I mean, what was the real problem for which we

The issue is the collision between the setup & teardown of the policy's
governor sysfs files. On creation the kernel does:

    down_write(&policy->rwsem)
    mutex_lock(kernfs_mutex)  <- note this is similar to the "old" sysfs_mutex

The opposite happens on a governor switch, specifically on the existing
governor's exit, and then we get a lockdep warning.

I tried to reproduce with the instructions but was unable to ... but that was
on Friday ;) and I'm going to try again this morning. I've also pinged some
of the engineers here in the office who are working on ARM to get access to a
system for further analysis and testing.

I'll ping back later in the day with some results. Sorry I don't have a
better answer or solution yet, Viresh.

P.
On 4 August 2014 17:55, Prarit Bhargava <prarit@redhat.com> wrote:
> The issue is the collision between the setup & teardown of the policy's
> governor sysfs files.
>
> On creation the kernel does:
>
>     down_write(&policy->rwsem)
>     mutex_lock(kernfs_mutex)  <- note this is similar to the "old" sysfs_mutex
>
> The opposite happens on a governor switch, specifically on the existing
> governor's exit, and then we get a lockdep warning.

Okay, a bit more clarity is probably what I was looking for. Suppose we try
to change the governor; now tell me what will happen.

> I tried to reproduce with the instructions but was unable to ... but that was
> on Friday ;) and I'm going to try again this morning. I've also pinged some
> of the engineers here in the office who are working on ARM to get access to
> a system for further analysis and testing.

You DON'T need ARM for that; just try it on any x86 machine which has
multiple groups of CPUs sharing a clock line, or in other terms, where there
are multiple policy structures. You just need to enable the flag we were
discussing; it only decides the location where the governor's directory gets
created. Nothing else.
On 08/04/2014 06:38 AM, Viresh Kumar wrote:
> On 4 August 2014 17:55, Prarit Bhargava <prarit@redhat.com> wrote:
>> [lock ordering discussion snipped]
>
> Okay, a bit more clarity is probably what I was looking for. Suppose we try
> to change the governor; now tell me what will happen.

Viresh,

I have a good understanding of what's going on. I was planning to fix this
after the rest of my patches are done. Didn't want to keep too many
outstanding unaccepted patches.

The problem is a race between one thread trying to cat a governor's file
(say, sampling_rate) and the governor getting a POLICY_EXIT. In the read
thread, the sysfs lock is grabbed first and then the policy lock is grabbed
next. When the governor gets the POLICY_EXIT, we call it with the policy lock
held and the governor tries to do the sysfs remove, which tries to grab the
sysfs lock. Classic deadlock.

Could you please look at my policy free/remove patches? If you can do that,
I can work on a fix for this. It might also be simpler to fix after my patch
series (not sure, might be).

Thanks,
Saravana
On 5 August 2014 01:46, Saravana Kannan <skannan@codeaurora.org> wrote:
> The problem is a race between one thread trying to cat a governor's file
> (say, sampling_rate) and the governor getting a POLICY_EXIT.

I don't see two threads racing against each other here. Simply changing
the governor from conservative->ondemand creates this. Or is it that the
kernel is detecting two different orders of taking the locks?

But during a governor change, isn't the sysfs lock taken first, since we are
storing a value to "scaling_governor"? So isn't the sysfs lock first in all
cases? In short, I am still unclear about the *exact* problem here.

> Could you please look at my policy free/remove patches? If you can do that,
> I can work on a fix for this. It might also be simpler to fix after my patch
> series (not sure, might be).

I had an overall look at those on the day you posted them, but haven't
commented yet as I was going away. There is no way those can land in 3.17-rc1
at least, so we still have some time to get them pushed.

Anyway, they are my number two priority; number one is this bug, which we
have to fix in stable kernels as well. So a dependency on your series
wouldn't work. Let me see if I can get to some solution to this problem
first.
Viresh Kumar wrote:
> On 5 August 2014 01:46, Saravana Kannan <skannan@codeaurora.org> wrote:
>> The problem is a race between one thread trying to cat a governor's file
>> (say, sampling_rate) and the governor getting a POLICY_EXIT.
>
> I don't see two threads racing against each other here. Simply changing
> the governor from conservative->ondemand creates this.
>
> Or is it that the kernel is detecting two different orders of taking the
> locks?
>
> But during a governor change, isn't the sysfs lock taken first, as we are
> storing a value to "scaling_governor"? So isn't the sysfs lock taken first
> in all cases?
>
> In short, I am still unclear about the *exact* problem here.
>
>> Could you please look at my policy free/remove patches? If you can do
>> that, I can work on a fix for this. It might also be simpler to fix after
>> my patch series (not sure, might be).
>
> I had an overall look at those on the day you posted them, but haven't
> commented yet as I was going away.
>
> There is no way those can land in 3.17-rc1 at least, so we still have some
> time to get them pushed.
>
> Anyway, they are my number two priority; number one is this bug, which we
> have to fix in stable kernels as well. So a dependency on your series
> wouldn't work.

Sigh... ok. I too will try to fix this one. I already have something in
mind for it.

Looks like Srivatsa has gone off the grid too. I'm hoping at least one of
you can do a review of my series. Come on guys, not everyone has to work
on the same patch/issue. :-(

-Saravana
On 5 August 2014 11:59, <skannan@codeaurora.org> wrote:
> Looks like Srivatsa has gone off the grid too. I'm hoping at least one of
> you can do a review of my series. Come on guys, not everyone has to work
> on the same patch/issue. :-(

Srivatsa is currently going through some personal stuff. He is joining MIT
for his Masters/PhD and so will be moving to the US soon :)
On 2 August 2014 01:06, Stephen Boyd <sboyd@codeaurora.org> wrote:
> I have the same options. The difference is that my driver has a governor
> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.

You may call me stupid, but I got a bit confused after looking into the
code again. Why does the crash dump depend on this flag?

We *always* remove the governor-specific directory while switching
governors (of course, only once it's updated for all CPUs). So on a
dual-core platform where CPUs 0 & 1 share a clock line, shouldn't switching
governors result in this crash dump?

I may know the answer to my own stupid question, but I'm not sure why that
is a problem. The only (and quite significant) difference this flag makes
is the location of the governor-specific directory:
- w/o this flag: /sys/devices/system/cpu/cpufreq/<here>
- w/ this flag: /sys/devices/system/cpu/cpu*/cpufreq/<here>

So, is there some issue with the sysfs lock for the <cpu*/cpufreq/> node,
since while switching governors we change <cpu*/cpufreq/scaling_governor>
at the same location?

--
viresh
On 08/05/2014 03:46 AM, Viresh Kumar wrote:
> On 2 August 2014 01:06, Stephen Boyd <sboyd@codeaurora.org> wrote:
>> I have the same options. The difference is that my driver has a governor
>> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
>
> You may call me stupid, but I got a bit confused after looking into the
> code again. Why does the crash dump depend on this flag?

Nope, not a stupid question. After reproducing this (finally!) yesterday
I've been wondering the same thing.

> We *always* remove the governor-specific directory while switching
> governors (of course, only once it's updated for all CPUs). So on a
> dual-core platform where CPUs 0 & 1 share a clock line, shouldn't
> switching governors result in this crash dump?

I've been looking into *exactly* this. On any platform where
cpumask_weight(affected_cpus) == 1 for a particular cpu, this lockdep
trace should happen.

> I may know the answer to my own stupid question, but I'm not sure why
> that is a problem. The only (and quite significant) difference this flag
> makes is the location of the governor-specific directory:
> - w/o this flag: /sys/devices/system/cpu/cpufreq/<here>
> - w/ this flag: /sys/devices/system/cpu/cpu*/cpufreq/<here>

cpufreq_global_kobject vs. the policy's kobject.

> So, is there some issue with the sysfs lock for the <cpu*/cpufreq/> node,
> since while switching governors we change <cpu*/cpufreq/scaling_governor>
> at the same location?

That's what I'm wondering too. I'm going to instrument the code this
morning to find out. I'm wondering if this comes down to a lockdep class
issue (perhaps lockdep puts globally defined locks like
cpufreq_global_kobject in a different class?).

P.
On 1 August 2014 22:48, Stephen Boyd <sboyd@codeaurora.org> wrote:
> On 08/01/14 03:27, Prarit Bhargava wrote:
>>
>> Can you send me the test and the trace of the deadlock? I'm not creating
>> it with:
>
> This was with conservative as the default, and switching to ondemand
>
> # cd /sys/devices/system/cpu/cpu2/cpufreq
> # ls
> affected_cpus                  scaling_available_governors
> conservative                   scaling_cur_freq
> cpuinfo_cur_freq               scaling_driver
> cpuinfo_max_freq               scaling_governor
> cpuinfo_min_freq               scaling_max_freq
> cpuinfo_transition_latency     scaling_min_freq
> related_cpus                   scaling_setspeed
> scaling_available_frequencies  stats
> # cat conservative/down_threshold
> 20
> # echo ondemand > scaling_governor
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.16.0-rc3-00039-ge1e38f124d87 #47 Not tainted
> -------------------------------------------------------
> sh/75 is trying to acquire lock:
>  (s_active#9){++++..}, at: [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84

Can you please retry this on mainline? I wasn't able to reproduce it now
over 3.17. I am trying this on an Exynos big.LITTLE implementation.
On 10/13/2014 06:43 AM, Viresh Kumar wrote:
> On 1 August 2014 22:48, Stephen Boyd <sboyd@codeaurora.org> wrote:
>> On 08/01/14 03:27, Prarit Bhargava wrote:
>>>
>>> Can you send me the test and the trace of the deadlock? I'm not
>>> creating it with:
>>
>> This was with conservative as the default, and switching to ondemand
>>
>> # cd /sys/devices/system/cpu/cpu2/cpufreq
>> # ls
>> affected_cpus                  scaling_available_governors
>> conservative                   scaling_cur_freq
>> cpuinfo_cur_freq               scaling_driver
>> cpuinfo_max_freq               scaling_governor
>> cpuinfo_min_freq               scaling_max_freq
>> cpuinfo_transition_latency     scaling_min_freq
>> related_cpus                   scaling_setspeed
>> scaling_available_frequencies  stats
>> # cat conservative/down_threshold
>> 20
>> # echo ondemand > scaling_governor
>>
>> ======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 3.16.0-rc3-00039-ge1e38f124d87 #47 Not tainted
>> -------------------------------------------------------
>> sh/75 is trying to acquire lock:
>>  (s_active#9){++++..}, at: [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84
>
> Can you please retry this on mainline? I wasn't able to reproduce it now
> over 3.17. I am trying this on an Exynos big.LITTLE implementation.

I have 100% reproducibility on latest mainline. Viresh, please see my next
post on the locking issues in cpufreq.

P.
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6f02485..fa11a7d 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2200,9 +2200,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *polic
 	/* end old governor */
 	if (old_gov) {
 		__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
-		up_write(&policy->rwsem);
 		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-		down_write(&policy->rwsem);
 	}

 	/* start new governor */
@@ -2211,9 +2209,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *polic
 		if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
 			goto out;

-		up_write(&policy->rwsem);
 		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-		down_write(&policy->rwsem);
 	}

 	/* new governor failed, so re-start old one */