diff mbox

cpufreq, store_scaling_governor requires policy->rwsem to be held for duration of changing governors [v2]

Message ID 53DB6B81.6050400@redhat.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Prarit Bhargava Aug. 1, 2014, 10:27 a.m. UTC
On 07/31/2014 08:55 PM, Saravana Kannan wrote:
> On 07/31/2014 03:58 PM, Prarit Bhargava wrote:
>>
>>
>> On 07/31/2014 06:13 PM, Saravana Kannan wrote:
>>> On 07/31/2014 02:08 PM, Prarit Bhargava wrote:
>>>>
>>>>
>>>> On 07/31/2014 04:38 PM, Saravana Kannan wrote:
>>>>> On 07/31/2014 01:30 PM, Prarit Bhargava wrote:
>>>>>>
>>>>>>
>>>>>> On 07/31/2014 04:24 PM, Saravana Kannan wrote:
>>>>>>>
>>>>>>> Prarit,
>>>>>>>
>>>>>>> I'm not an expert on sysfs locking, but I would think the specific sysfs
>>>>>>> lock
>>>>>>> would depend on the file/attribute group. So, can you please try to
>>>>>>> hotplug a
>>>>>>> core in/out (to trigger the POLICY_EXIT) and then read a sysfs file
>>>>>>> exported by
>>>>>>> the governor? scaling_governor doesn't cut it since that file is not
>>>>>>> removed on
>>>>>>> policy exit event to the governor. If it's ondemand, try reading/writing
>>>>>>> its sampling_rate file.
>>>>>>
>>>>>> Thanks Saravana -- will do.  I will get back to you shortly on this.
>>>>>>
>>>>>
>>>>> Thanks. Btw, in case you weren't already aware of it: you'll have to hotplug
>>>>> out all the CPUs in a cluster to trigger a POLICY_EXIT for that cluster/policy.
>>>>
>>>> Yep -- the affected_cpus file should show all the cpus in the policy IIRC.  One
>>>> of the systems I have has 1 cpu/policy and has 48 threads so the POLICY_EXIT is
>>>> called.
>>>>
>>>> I'll put something like
>>>>
>>>> while true;
>>>> do
>>>> echo ondemand > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
>>>> cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
>>>> echo 20000 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
>>>> cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
>>>> echo 0 > /sys/devices/system/cpu/cpu1/online
>>>> sleep 1
>>>> echo 1 > /sys/devices/system/cpu/cpu1/online
>>>> sleep 1
>>>> done
>>>>
>>>
>>> The actual race can only happen with 2 threads. I'm just trying to trigger a
>>> lockdep warning here.
>>
>> I ran the above in two separate terminals with cpuset -c 0 and cpuset -c 1 to
>> multi-thread it all.  No deadlock or LOCKDEP trace after about 1/2 hour, so I
>> think we're in the clear on that concern.
>>
> 
> I wasn't convinced. So, I took some help from Stephen to test it.
> 
> It's been a while, so I didn't remember the original issue clearly when I gave
> you some test suggestions. Now that I looked at the code more closely, I have a
> proper way to reproduce the original issue.
> 
> Nack for this patch for 2 reasons:
> 1. You seem to have accidentally removed a GOV_STOP in your patch. We definitely
> can't do that. This broke changing governors, which is why your patch didn't
> cause any issues: all your governor echoes were failing.
> 2. When we fixed that and actually tried a proper test (not the one I gave you),
> we reproduced the original issue.
> 
> To reproduce original issue:
> Preconditions:
> * lockdep is enabled
> * governor per policy is enabled
> 
> Steps:
> 1. Set governor to ondemand.
> 2. Cat one of the ondemand sysfs files.
> 3. Change governor to conservative.
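
For reference, the three steps quoted above can be put into a small script. This is a sketch only: cpu2 and the per-policy governor directory are assumptions (they require a driver with CPUFREQ_HAVE_GOVERNOR_PER_POLICY), and the script is a harmless no-op where that sysfs layout is absent.

```shell
#!/bin/sh
# Sketch of the repro steps above; paths are illustrative and assume
# per-policy governor directories. Skips cleanly if the expected sysfs
# layout is not present.
P=/sys/devices/system/cpu/cpu2/cpufreq
if [ ! -d "$P" ] || [ ! -w "$P/scaling_governor" ]; then
    echo "per-policy cpufreq sysfs not present; skipping"
    exit 0
fi
echo ondemand > "$P/scaling_governor"      # step 1: start with ondemand
cat "$P/ondemand/sampling_rate"            # step 2: open a governor attribute
echo conservative > "$P/scaling_governor"  # step 3: switch; tears down ondemand's sysfs group
```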

Can you send me the test and the trace of the deadlock?  I'm not creating it with:


> 
> When you do that, there's an AB-BA deadlock: one thread trying to
> cat a governor sysfs file while another thread tries to change governors.
> 



> -Saravana
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Stephen Boyd Aug. 1, 2014, 5:18 p.m. UTC | #1
On 08/01/14 03:27, Prarit Bhargava wrote:
>
> Can you send me the test and the trace of the deadlock?  I'm not creating it with:
>

This was with conservative as the default, and switching to ondemand:

# cd /sys/devices/system/cpu/cpu2/cpufreq
# ls
affected_cpus                  scaling_available_governors
conservative                   scaling_cur_freq
cpuinfo_cur_freq               scaling_driver
cpuinfo_max_freq               scaling_governor
cpuinfo_min_freq               scaling_max_freq
cpuinfo_transition_latency     scaling_min_freq
related_cpus                   scaling_setspeed
scaling_available_frequencies  stats
# cat conservative/down_threshold
20
# echo ondemand > scaling_governor

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.16.0-rc3-00039-ge1e38f124d87 #47 Not tainted
 -------------------------------------------------------
 sh/75 is trying to acquire lock:
  (s_active#9){++++..}, at: [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84

 but task is already holding lock:
  (&policy->rwsem){+++++.}, at: [<c05ab1f0>] store+0x68/0xb8

 which lock already depends on the new lock.


 the existing dependency chain (in reverse order) is:

-> #1 (&policy->rwsem){+++++.}:
        [<c0359234>] kernfs_fop_open+0x138/0x298
        [<c02fa3f4>] do_dentry_open.isra.12+0x1b0/0x2f0
        [<c02fa604>] finish_open+0x20/0x38
        [<c0308d34>] do_last.isra.37+0x5ac/0xb68
        [<c03093a4>] path_openat+0xb4/0x5d8
        [<c0309bcc>] do_filp_open+0x2c/0x80
        [<c02fb558>] do_sys_open+0x10c/0x1c8
        [<c020f0a0>] ret_fast_syscall+0x0/0x48

-> #0 (s_active#9){++++..}:
        [<c0357d18>] __kernfs_remove+0x250/0x300
        [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84
        [<c035aa78>] remove_files+0x34/0x78
        [<c035aee0>] sysfs_remove_group+0x40/0x98
        [<c05b0560>] cpufreq_governor_dbs+0x4c0/0x6ec
        [<c05abebc>] __cpufreq_governor+0x118/0x200
        [<c05ac0fc>] cpufreq_set_policy+0x158/0x2ac
        [<c05ad5e4>] store_scaling_governor+0x6c/0x94
        [<c05ab210>] store+0x88/0xb8
        [<c035a00c>] sysfs_kf_write+0x4c/0x50
        [<c03594d4>] kernfs_fop_write+0xc0/0x180
        [<c02fc5c8>] vfs_write+0xa0/0x1a8
        [<c02fc9d4>] SyS_write+0x40/0x8c
        [<c020f0a0>] ret_fast_syscall+0x0/0x48

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&policy->rwsem);
                                lock(s_active#9);
                                lock(&policy->rwsem);
   lock(s_active#9);

  *** DEADLOCK ***

 6 locks held by sh/75:
  #0:  (sb_writers#4){.+.+..}, at: [<c02fc6a8>] vfs_write+0x180/0x1a8
  #1:  (&of->mutex){+.+...}, at: [<c0359498>] kernfs_fop_write+0x84/0x180
  #2:  (s_active#10){.+.+..}, at: [<c03594a0>] kernfs_fop_write+0x8c/0x180
  #3:  (cpu_hotplug.lock){++++++}, at: [<c0221ef8>] get_online_cpus+0x38/0x9c
  #4:  (cpufreq_rwsem){.+.+.+}, at: [<c05ab1d8>] store+0x50/0xb8
  #5:  (&policy->rwsem){+++++.}, at: [<c05ab1f0>] store+0x68/0xb8

 stack backtrace:
 CPU: 0 PID: 75 Comm: sh Not tainted 3.16.0-rc3-00039-ge1e38f124d87 #47
 [<c0214de8>] (unwind_backtrace) from [<c02123f8>] (show_stack+0x10/0x14)
 [<c02123f8>] (show_stack) from [<c0709e5c>] (dump_stack+0x70/0xbc)
 [<c0709e5c>] (dump_stack) from [<c070722c>] (print_circular_bug+0x280/0x2d4)
 [<c070722c>] (print_circular_bug) from [<c02629cc>] (__lock_acquire+0x18d0/0x1abc)
 [<c02629cc>] (__lock_acquire) from [<c026310c>] (lock_acquire+0x9c/0x138)
 [<c026310c>] (lock_acquire) from [<c0357d18>] (__kernfs_remove+0x250/0x300)
 [<c0357d18>] (__kernfs_remove) from [<c0358a94>] (kernfs_remove_by_name_ns+0x3c/0x84)
 [<c0358a94>] (kernfs_remove_by_name_ns) from [<c035aa78>] (remove_files+0x34/0x78)
 [<c035aa78>] (remove_files) from [<c035aee0>] (sysfs_remove_group+0x40/0x98)
 [<c035aee0>] (sysfs_remove_group) from [<c05b0560>] (cpufreq_governor_dbs+0x4c0/0x6ec)
 [<c05b0560>] (cpufreq_governor_dbs) from [<c05abebc>] (__cpufreq_governor+0x118/0x200)
 [<c05abebc>] (__cpufreq_governor) from [<c05ac0fc>] (cpufreq_set_policy+0x158/0x2ac)
 [<c05ac0fc>] (cpufreq_set_policy) from [<c05ad5e4>] (store_scaling_governor+0x6c/0x94)
 [<c05ad5e4>] (store_scaling_governor) from [<c05ab210>] (store+0x88/0xb8)
 [<c05ab210>] (store) from [<c035a00c>] (sysfs_kf_write+0x4c/0x50)
 [<c035a00c>] (sysfs_kf_write) from [<c03594d4>] (kernfs_fop_write+0xc0/0x180)
 [<c03594d4>] (kernfs_fop_write) from [<c02fc5c8>] (vfs_write+0xa0/0x1a8)
 [<c02fc5c8>] (vfs_write) from [<c02fc9d4>] (SyS_write+0x40/0x8c)
 [<c02fc9d4>] (SyS_write) from [<c020f0a0>] (ret_fast_syscall+0x0/0x48)
Prarit Bhargava Aug. 1, 2014, 7:15 p.m. UTC | #2
On 08/01/2014 01:18 PM, Stephen Boyd wrote:
> On 08/01/14 03:27, Prarit Bhargava wrote:
>>
>> Can you send me the test and the trace of the deadlock?  I'm not creating it with:
>>
> 
> This was with conservative as the default, and switching to ondemand
> 
> # cd /sys/devices/system/cpu/cpu2/cpufreq
> # ls
> affected_cpus                  scaling_available_governors
> conservative                   scaling_cur_freq
> cpuinfo_cur_freq               scaling_driver
> cpuinfo_max_freq               scaling_governor
> cpuinfo_min_freq               scaling_max_freq
> cpuinfo_transition_latency     scaling_min_freq
> related_cpus                   scaling_setspeed
> scaling_available_frequencies  stats
> # cat conservative/down_threshold
> 20
> # echo ondemand > scaling_governor

Thanks Stephen,

There's obviously a difference in our .configs.  I have a global conservative
directory, i.e. /sys/devices/system/cpu/cpufreq/conservative, instead of a
per-cpu governor directory.

What are your .config options for CPUFREQ?

Mine are:

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y

Is there some other config option I have to set?

P.

> 
>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.16.0-rc3-00039-ge1e38f124d87 #47 Not tainted
>  -------------------------------------------------------
>  sh/75 is trying to acquire lock:
>   (s_active#9){++++..}, at: [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84
> 
>  but task is already holding lock:
>   (&policy->rwsem){+++++.}, at: [<c05ab1f0>] store+0x68/0xb8
> 
>  which lock already depends on the new lock.
> 
> 
>  the existing dependency chain (in reverse order) is:
> 
> -> #1 (&policy->rwsem){+++++.}:
>         [<c0359234>] kernfs_fop_open+0x138/0x298
>         [<c02fa3f4>] do_dentry_open.isra.12+0x1b0/0x2f0
>         [<c02fa604>] finish_open+0x20/0x38
>         [<c0308d34>] do_last.isra.37+0x5ac/0xb68
>         [<c03093a4>] path_openat+0xb4/0x5d8
>         [<c0309bcc>] do_filp_open+0x2c/0x80
>         [<c02fb558>] do_sys_open+0x10c/0x1c8
>         [<c020f0a0>] ret_fast_syscall+0x0/0x48
> 
> -> #0 (s_active#9){++++..}:
>         [<c0357d18>] __kernfs_remove+0x250/0x300
>         [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84
>         [<c035aa78>] remove_files+0x34/0x78
>         [<c035aee0>] sysfs_remove_group+0x40/0x98
>         [<c05b0560>] cpufreq_governor_dbs+0x4c0/0x6ec
>         [<c05abebc>] __cpufreq_governor+0x118/0x200
>         [<c05ac0fc>] cpufreq_set_policy+0x158/0x2ac
>         [<c05ad5e4>] store_scaling_governor+0x6c/0x94
>         [<c05ab210>] store+0x88/0xb8
>         [<c035a00c>] sysfs_kf_write+0x4c/0x50
>         [<c03594d4>] kernfs_fop_write+0xc0/0x180
>         [<c02fc5c8>] vfs_write+0xa0/0x1a8
>         [<c02fc9d4>] SyS_write+0x40/0x8c
>         [<c020f0a0>] ret_fast_syscall+0x0/0x48
> 
>  other info that might help us debug this:
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&policy->rwsem);
>                                 lock(s_active#9);
>                                 lock(&policy->rwsem);
>    lock(s_active#9);
> 
>   *** DEADLOCK ***
> 
>  6 locks held by sh/75:
>   #0:  (sb_writers#4){.+.+..}, at: [<c02fc6a8>] vfs_write+0x180/0x1a8
>   #1:  (&of->mutex){+.+...}, at: [<c0359498>] kernfs_fop_write+0x84/0x180
>   #2:  (s_active#10){.+.+..}, at: [<c03594a0>] kernfs_fop_write+0x8c/0x180
>   #3:  (cpu_hotplug.lock){++++++}, at: [<c0221ef8>] get_online_cpus+0x38/0x9c
>   #4:  (cpufreq_rwsem){.+.+.+}, at: [<c05ab1d8>] store+0x50/0xb8
>   #5:  (&policy->rwsem){+++++.}, at: [<c05ab1f0>] store+0x68/0xb8
> 
>  stack backtrace:
>  CPU: 0 PID: 75 Comm: sh Not tainted 3.16.0-rc3-00039-ge1e38f124d87 #47
>  [<c0214de8>] (unwind_backtrace) from [<c02123f8>] (show_stack+0x10/0x14)
>  [<c02123f8>] (show_stack) from [<c0709e5c>] (dump_stack+0x70/0xbc)
>  [<c0709e5c>] (dump_stack) from [<c070722c>] (print_circular_bug+0x280/0x2d4)
>  [<c070722c>] (print_circular_bug) from [<c02629cc>] (__lock_acquire+0x18d0/0x1abc)
>  [<c02629cc>] (__lock_acquire) from [<c026310c>] (lock_acquire+0x9c/0x138)
>  [<c026310c>] (lock_acquire) from [<c0357d18>] (__kernfs_remove+0x250/0x300)
>  [<c0357d18>] (__kernfs_remove) from [<c0358a94>] (kernfs_remove_by_name_ns+0x3c/0x84)
>  [<c0358a94>] (kernfs_remove_by_name_ns) from [<c035aa78>] (remove_files+0x34/0x78)
>  [<c035aa78>] (remove_files) from [<c035aee0>] (sysfs_remove_group+0x40/0x98)
>  [<c035aee0>] (sysfs_remove_group) from [<c05b0560>] (cpufreq_governor_dbs+0x4c0/0x6ec)
>  [<c05b0560>] (cpufreq_governor_dbs) from [<c05abebc>] (__cpufreq_governor+0x118/0x200)
>  [<c05abebc>] (__cpufreq_governor) from [<c05ac0fc>] (cpufreq_set_policy+0x158/0x2ac)
>  [<c05ac0fc>] (cpufreq_set_policy) from [<c05ad5e4>] (store_scaling_governor+0x6c/0x94)
>  [<c05ad5e4>] (store_scaling_governor) from [<c05ab210>] (store+0x88/0xb8)
>  [<c05ab210>] (store) from [<c035a00c>] (sysfs_kf_write+0x4c/0x50)
>  [<c035a00c>] (sysfs_kf_write) from [<c03594d4>] (kernfs_fop_write+0xc0/0x180)
>  [<c03594d4>] (kernfs_fop_write) from [<c02fc5c8>] (vfs_write+0xa0/0x1a8)
>  [<c02fc5c8>] (vfs_write) from [<c02fc9d4>] (SyS_write+0x40/0x8c)
>  [<c02fc9d4>] (SyS_write) from [<c020f0a0>] (ret_fast_syscall+0x0/0x48)
> 
> 
Stephen Boyd Aug. 1, 2014, 7:36 p.m. UTC | #3
On 08/01/14 12:15, Prarit Bhargava wrote:
>
> On 08/01/2014 01:18 PM, Stephen Boyd wrote:
>> On 08/01/14 03:27, Prarit Bhargava wrote:
>>> Can you send me the test and the trace of the deadlock?  I'm not creating it with:
>>>
>> This was with conservative as the default, and switching to ondemand
>>
>> # cd /sys/devices/system/cpu/cpu2/cpufreq
>> # ls
>> affected_cpus                  scaling_available_governors
>> conservative                   scaling_cur_freq
>> cpuinfo_cur_freq               scaling_driver
>> cpuinfo_max_freq               scaling_governor
>> cpuinfo_min_freq               scaling_max_freq
>> cpuinfo_transition_latency     scaling_min_freq
>> related_cpus                   scaling_setspeed
>> scaling_available_frequencies  stats
>> # cat conservative/down_threshold
>> 20
>> # echo ondemand > scaling_governor
> Thanks Stephen,
>
> There's obviously a difference in our .configs.  I have a global conservative
> directory, ie) /sys/devices/system/cpu/cpufreq/conservative instead of a per-cpu
> governor file.
>
> ie) what are your .config options for CPUFREQ?
>
> Mine are:
>
> #
> # CPU Frequency scaling
> #
> CONFIG_CPU_FREQ=y
> CONFIG_CPU_FREQ_GOV_COMMON=y
> CONFIG_CPU_FREQ_STAT=m
> CONFIG_CPU_FREQ_STAT_DETAILS=y
> # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
> # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
> # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
> CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> CONFIG_CPU_FREQ_GOV_USERSPACE=y
> CONFIG_CPU_FREQ_GOV_ONDEMAND=y
> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
>
> Is there some other config option I have to set?

I have the same options. The difference is that my driver has a governor
per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
If I remove that flag I can't trigger the lockdep splat anymore with
this sequence and your patch.
Prarit Bhargava Aug. 1, 2014, 7:43 p.m. UTC | #4
On 08/01/2014 03:36 PM, Stephen Boyd wrote:
> On 08/01/14 12:15, Prarit Bhargava wrote:
>>
>> On 08/01/2014 01:18 PM, Stephen Boyd wrote:
>>> On 08/01/14 03:27, Prarit Bhargava wrote:
>>>> Can you send me the test and the trace of the deadlock?  I'm not creating it with:
>>>>
>>> This was with conservative as the default, and switching to ondemand
>>>
>>> # cd /sys/devices/system/cpu/cpu2/cpufreq
>>> # ls
>>> affected_cpus                  scaling_available_governors
>>> conservative                   scaling_cur_freq
>>> cpuinfo_cur_freq               scaling_driver
>>> cpuinfo_max_freq               scaling_governor
>>> cpuinfo_min_freq               scaling_max_freq
>>> cpuinfo_transition_latency     scaling_min_freq
>>> related_cpus                   scaling_setspeed
>>> scaling_available_frequencies  stats
>>> # cat conservative/down_threshold
>>> 20
>>> # echo ondemand > scaling_governor
>> Thanks Stephen,
>>
>> There's obviously a difference in our .configs.  I have a global conservative
>> directory, ie) /sys/devices/system/cpu/cpufreq/conservative instead of a per-cpu
>> governor file.
>>
>> ie) what are your .config options for CPUFREQ?
>>
>> Mine are:
>>
>> #
>> # CPU Frequency scaling
>> #
>> CONFIG_CPU_FREQ=y
>> CONFIG_CPU_FREQ_GOV_COMMON=y
>> CONFIG_CPU_FREQ_STAT=m
>> CONFIG_CPU_FREQ_STAT_DETAILS=y
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
>> # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
>> CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
>> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
>> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
>> CONFIG_CPU_FREQ_GOV_USERSPACE=y
>> CONFIG_CPU_FREQ_GOV_ONDEMAND=y
>> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
>>
>> Is there some other config option I have to set?
> 
> I have the same options. The difference is that my driver has a governor
> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
> If I remove that flag I can't trigger the lockdep splat anymore with
> this sequence and your patch.

I see -- so you're seeing this on arm then?  If so, let me know so I can reserve
one to work on :)

Thanks!

P.
> 
Stephen Boyd Aug. 1, 2014, 7:54 p.m. UTC | #5
On 08/01/14 12:43, Prarit Bhargava wrote:
>
> On 08/01/2014 03:36 PM, Stephen Boyd wrote:
>> On 08/01/14 12:15, Prarit Bhargava wrote:
>>> On 08/01/2014 01:18 PM, Stephen Boyd wrote:
>>>> On 08/01/14 03:27, Prarit Bhargava wrote:
>>>>> Can you send me the test and the trace of the deadlock?  I'm not creating it with:
>>>>>
>>>> This was with conservative as the default, and switching to ondemand
>>>>
>>>> # cd /sys/devices/system/cpu/cpu2/cpufreq
>>>> # ls
>>>> affected_cpus                  scaling_available_governors
>>>> conservative                   scaling_cur_freq
>>>> cpuinfo_cur_freq               scaling_driver
>>>> cpuinfo_max_freq               scaling_governor
>>>> cpuinfo_min_freq               scaling_max_freq
>>>> cpuinfo_transition_latency     scaling_min_freq
>>>> related_cpus                   scaling_setspeed
>>>> scaling_available_frequencies  stats
>>>> # cat conservative/down_threshold
>>>> 20
>>>> # echo ondemand > scaling_governor
>>> Thanks Stephen,
>>>
>>> There's obviously a difference in our .configs.  I have a global conservative
>>> directory, ie) /sys/devices/system/cpu/cpufreq/conservative instead of a per-cpu
>>> governor file.
>>>
>>> ie) what are your .config options for CPUFREQ?
>>>
>>> Mine are:
>>>
>>> #
>>> # CPU Frequency scaling
>>> #
>>> CONFIG_CPU_FREQ=y
>>> CONFIG_CPU_FREQ_GOV_COMMON=y
>>> CONFIG_CPU_FREQ_STAT=m
>>> CONFIG_CPU_FREQ_STAT_DETAILS=y
>>> # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
>>> # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
>>> # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
>>> CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
>>> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
>>> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
>>> CONFIG_CPU_FREQ_GOV_USERSPACE=y
>>> CONFIG_CPU_FREQ_GOV_ONDEMAND=y
>>> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
>>>
>>> Is there some other config option I have to set?
>> I have the same options. The difference is that my driver has a governor
>> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
>> If I remove that flag I can't trigger the lockdep splat anymore with
>> this sequence and your patch.
> I see -- so you're seeing this on arm then?  If so, let me know so I can reserve
> one to work on :)
>

I only have ARM to test on. You can set this flag in your cpufreq driver
if you have independently scaling CPU frequencies, so it isn't
necessarily an ARM thing.
Saravana Kannan Aug. 1, 2014, 9:25 p.m. UTC | #6
On 08/01/2014 12:54 PM, Stephen Boyd wrote:
> On 08/01/14 12:43, Prarit Bhargava wrote:
>>
>> On 08/01/2014 03:36 PM, Stephen Boyd wrote:
>>> On 08/01/14 12:15, Prarit Bhargava wrote:
>>>> On 08/01/2014 01:18 PM, Stephen Boyd wrote:
>>>>> On 08/01/14 03:27, Prarit Bhargava wrote:
>>>>>> Can you send me the test and the trace of the deadlock?  I'm not creating it with:
>>>>>>
>>>>> This was with conservative as the default, and switching to ondemand
>>>>>
>>>>> # cd /sys/devices/system/cpu/cpu2/cpufreq
>>>>> # ls
>>>>> affected_cpus                  scaling_available_governors
>>>>> conservative                   scaling_cur_freq
>>>>> cpuinfo_cur_freq               scaling_driver
>>>>> cpuinfo_max_freq               scaling_governor
>>>>> cpuinfo_min_freq               scaling_max_freq
>>>>> cpuinfo_transition_latency     scaling_min_freq
>>>>> related_cpus                   scaling_setspeed
>>>>> scaling_available_frequencies  stats
>>>>> # cat conservative/down_threshold
>>>>> 20
>>>>> # echo ondemand > scaling_governor
>>>> Thanks Stephen,
>>>>
>>>> There's obviously a difference in our .configs.  I have a global conservative
>>>> directory, ie) /sys/devices/system/cpu/cpufreq/conservative instead of a per-cpu
>>>> governor file.
>>>>
>>>> ie) what are your .config options for CPUFREQ?
>>>>
>>>> Mine are:
>>>>
>>>> #
>>>> # CPU Frequency scaling
>>>> #
>>>> CONFIG_CPU_FREQ=y
>>>> CONFIG_CPU_FREQ_GOV_COMMON=y
>>>> CONFIG_CPU_FREQ_STAT=m
>>>> CONFIG_CPU_FREQ_STAT_DETAILS=y
>>>> # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
>>>> # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
>>>> # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
>>>> CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
>>>> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
>>>> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
>>>> CONFIG_CPU_FREQ_GOV_USERSPACE=y
>>>> CONFIG_CPU_FREQ_GOV_ONDEMAND=y
>>>> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
>>>>
>>>> Is there some other config option I have to set?
>>> I have the same options. The difference is that my driver has a governor
>>> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
>>> If I remove that flag I can't trigger the lockdep splat anymore with
>>> this sequence and your patch.
>> I see -- so you're seeing this on arm then?  If so, let me know so I can reserve
>> one to work on :)
>>
>
> I only have ARM to test on. You can set this flag in your cpufreq driver
> if you have independently scaling CPU frequencies, so it isn't
> necessarily an ARM thing.
>

Sorry, forgot to mention this in the earlier email. But yeah, this is a 
purely software feature. You should be able to enable this flag in your driver.

-Saravana
Prarit Bhargava Aug. 4, 2014, 10:11 a.m. UTC | #7
On 08/01/2014 05:25 PM, Saravana Kannan wrote:
> On 08/01/2014 12:54 PM, Stephen Boyd wrote:
>> On 08/01/14 12:43, Prarit Bhargava wrote:
>>>
>>> On 08/01/2014 03:36 PM, Stephen Boyd wrote:
>>>> On 08/01/14 12:15, Prarit Bhargava wrote:
>>>>> On 08/01/2014 01:18 PM, Stephen Boyd wrote:
>>>>>> On 08/01/14 03:27, Prarit Bhargava wrote:
>>>>>>> Can you send me the test and the trace of the deadlock?  I'm not creating
>>>>>>> it with:
>>>>>>>
>>>>>> This was with conservative as the default, and switching to ondemand
>>>>>>
>>>>>> # cd /sys/devices/system/cpu/cpu2/cpufreq
>>>>>> # ls
>>>>>> affected_cpus                  scaling_available_governors
>>>>>> conservative                   scaling_cur_freq
>>>>>> cpuinfo_cur_freq               scaling_driver
>>>>>> cpuinfo_max_freq               scaling_governor
>>>>>> cpuinfo_min_freq               scaling_max_freq
>>>>>> cpuinfo_transition_latency     scaling_min_freq
>>>>>> related_cpus                   scaling_setspeed
>>>>>> scaling_available_frequencies  stats
>>>>>> # cat conservative/down_threshold
>>>>>> 20
>>>>>> # echo ondemand > scaling_governor
>>>>> Thanks Stephen,
>>>>>
>>>>> There's obviously a difference in our .configs.  I have a global conservative
>>>>> directory, ie) /sys/devices/system/cpu/cpufreq/conservative instead of a
>>>>> per-cpu
>>>>> governor file.
>>>>>
>>>>> ie) what are your .config options for CPUFREQ?
>>>>>
>>>>> Mine are:
>>>>>
>>>>> #
>>>>> # CPU Frequency scaling
>>>>> #
>>>>> CONFIG_CPU_FREQ=y
>>>>> CONFIG_CPU_FREQ_GOV_COMMON=y
>>>>> CONFIG_CPU_FREQ_STAT=m
>>>>> CONFIG_CPU_FREQ_STAT_DETAILS=y
>>>>> # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
>>>>> # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
>>>>> # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
>>>>> CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
>>>>> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
>>>>> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
>>>>> CONFIG_CPU_FREQ_GOV_USERSPACE=y
>>>>> CONFIG_CPU_FREQ_GOV_ONDEMAND=y
>>>>> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
>>>>>
>>>>> Is there some other config option I have to set?
>>>> I have the same options. The difference is that my driver has a governor
>>>> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
>>>> If I remove that flag I can't trigger the lockdep splat anymore with
>>>> this sequence and your patch.
>>> I see -- so you're seeing this on arm then?  If so, let me know so I can reserve
>>> one to work on :)
>>>
>>
>> I only have ARM to test on. You can set this flag in your cpufreq driver
>> if you have independently scaling CPU frequencies, so it isn't
>> necessarily an ARM thing.
>>
> 
> Sorry, forgot to mention this in the earlier email. But yeah, this is a totally
> SW feature. You should be able to enable this flag in your driver.

Thanks Saravana, I'll get back to this today.  I'm also going to try to track
down an ARM server as well.

I'll get back to everyone in a bit.

P.

> 
> -Saravana
> 
Viresh Kumar Aug. 4, 2014, 10:36 a.m. UTC | #8
Sorry for the delay guys, was away :(

Adding Robert as well; he reported something similar, so it's better to discuss it here.

On 1 August 2014 22:48, Stephen Boyd <sboyd@codeaurora.org> wrote:
> This was with conservative as the default, and switching to ondemand
>
> # cd /sys/devices/system/cpu/cpu2/cpufreq
> # ls
> affected_cpus                  scaling_available_governors
> conservative                   scaling_cur_freq
> cpuinfo_cur_freq               scaling_driver
> cpuinfo_max_freq               scaling_governor
> cpuinfo_min_freq               scaling_max_freq
> cpuinfo_transition_latency     scaling_min_freq
> related_cpus                   scaling_setspeed
> scaling_available_frequencies  stats
> # cat conservative/down_threshold
> 20
> # echo ondemand > scaling_governor
>
>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.16.0-rc3-00039-ge1e38f124d87 #47 Not tainted
>  -------------------------------------------------------
>  sh/75 is trying to acquire lock:
>   (s_active#9){++++..}, at: [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84
>
>  but task is already holding lock:
>   (&policy->rwsem){+++++.}, at: [<c05ab1f0>] store+0x68/0xb8
>
>  which lock already depends on the new lock.
>
>
>  the existing dependency chain (in reverse order) is:
>
> -> #1 (&policy->rwsem){+++++.}:
>         [<c0359234>] kernfs_fop_open+0x138/0x298
>         [<c02fa3f4>] do_dentry_open.isra.12+0x1b0/0x2f0
>         [<c02fa604>] finish_open+0x20/0x38
>         [<c0308d34>] do_last.isra.37+0x5ac/0xb68
>         [<c03093a4>] path_openat+0xb4/0x5d8
>         [<c0309bcc>] do_filp_open+0x2c/0x80
>         [<c02fb558>] do_sys_open+0x10c/0x1c8
>         [<c020f0a0>] ret_fast_syscall+0x0/0x48
>
> -> #0 (s_active#9){++++..}:
>         [<c0357d18>] __kernfs_remove+0x250/0x300
>         [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84
>         [<c035aa78>] remove_files+0x34/0x78
>         [<c035aee0>] sysfs_remove_group+0x40/0x98
>         [<c05b0560>] cpufreq_governor_dbs+0x4c0/0x6ec
>         [<c05abebc>] __cpufreq_governor+0x118/0x200
>         [<c05ac0fc>] cpufreq_set_policy+0x158/0x2ac
>         [<c05ad5e4>] store_scaling_governor+0x6c/0x94
>         [<c05ab210>] store+0x88/0xb8
>         [<c035a00c>] sysfs_kf_write+0x4c/0x50
>         [<c03594d4>] kernfs_fop_write+0xc0/0x180
>         [<c02fc5c8>] vfs_write+0xa0/0x1a8
>         [<c02fc9d4>] SyS_write+0x40/0x8c
>         [<c020f0a0>] ret_fast_syscall+0x0/0x48
>
>  other info that might help us debug this:
>
>   Possible unsafe locking scenario:
>
>         CPU0                    CPU1
>         ----                    ----
>    lock(&policy->rwsem);
>                                 lock(s_active#9);
>                                 lock(&policy->rwsem);
>    lock(s_active#9);
>
>   *** DEADLOCK ***
>
>  6 locks held by sh/75:
>   #0:  (sb_writers#4){.+.+..}, at: [<c02fc6a8>] vfs_write+0x180/0x1a8
>   #1:  (&of->mutex){+.+...}, at: [<c0359498>] kernfs_fop_write+0x84/0x180
>   #2:  (s_active#10){.+.+..}, at: [<c03594a0>] kernfs_fop_write+0x8c/0x180
>   #3:  (cpu_hotplug.lock){++++++}, at: [<c0221ef8>] get_online_cpus+0x38/0x9c
>   #4:  (cpufreq_rwsem){.+.+.+}, at: [<c05ab1d8>] store+0x50/0xb8
>   #5:  (&policy->rwsem){+++++.}, at: [<c05ab1f0>] store+0x68/0xb8
>
>  stack backtrace:
>  CPU: 0 PID: 75 Comm: sh Not tainted 3.16.0-rc3-00039-ge1e38f124d87 #47
>  [<c0214de8>] (unwind_backtrace) from [<c02123f8>] (show_stack+0x10/0x14)
>  [<c02123f8>] (show_stack) from [<c0709e5c>] (dump_stack+0x70/0xbc)
>  [<c0709e5c>] (dump_stack) from [<c070722c>] (print_circular_bug+0x280/0x2d4)
>  [<c070722c>] (print_circular_bug) from [<c02629cc>] (__lock_acquire+0x18d0/0x1abc)
>  [<c02629cc>] (__lock_acquire) from [<c026310c>] (lock_acquire+0x9c/0x138)
>  [<c026310c>] (lock_acquire) from [<c0357d18>] (__kernfs_remove+0x250/0x300)
>  [<c0357d18>] (__kernfs_remove) from [<c0358a94>] (kernfs_remove_by_name_ns+0x3c/0x84)
>  [<c0358a94>] (kernfs_remove_by_name_ns) from [<c035aa78>] (remove_files+0x34/0x78)
>  [<c035aa78>] (remove_files) from [<c035aee0>] (sysfs_remove_group+0x40/0x98)
>  [<c035aee0>] (sysfs_remove_group) from [<c05b0560>] (cpufreq_governor_dbs+0x4c0/0x6ec)
>  [<c05b0560>] (cpufreq_governor_dbs) from [<c05abebc>] (__cpufreq_governor+0x118/0x200)
>  [<c05abebc>] (__cpufreq_governor) from [<c05ac0fc>] (cpufreq_set_policy+0x158/0x2ac)
>  [<c05ac0fc>] (cpufreq_set_policy) from [<c05ad5e4>] (store_scaling_governor+0x6c/0x94)
>  [<c05ad5e4>] (store_scaling_governor) from [<c05ab210>] (store+0x88/0xb8)
>  [<c05ab210>] (store) from [<c035a00c>] (sysfs_kf_write+0x4c/0x50)
>  [<c035a00c>] (sysfs_kf_write) from [<c03594d4>] (kernfs_fop_write+0xc0/0x180)
>  [<c03594d4>] (kernfs_fop_write) from [<c02fc5c8>] (vfs_write+0xa0/0x1a8)
>  [<c02fc5c8>] (vfs_write) from [<c02fc9d4>] (SyS_write+0x40/0x8c)
>  [<c02fc9d4>] (SyS_write) from [<c020f0a0>] (ret_fast_syscall+0x0/0x48)

Thanks for coming to my rescue Stephen :), I was quite sure I had hit this
with ondemand as well..

I will be looking very closely at the code now to see what's going wrong.
And btw, does anybody here have an exact understanding of why this
lockdep warning happens? I mean, what was the real problem for which we
just dropped the rwsems? I understood it earlier but couldn't recall it
again :)

Thanks all for your work on getting this fixed.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Aug. 4, 2014, 12:25 p.m. UTC | #9
On 08/04/2014 06:36 AM, Viresh Kumar wrote:
> Sorry for the delay guys, was away :(
> 
>>
>>   Possible unsafe locking scenario:
>>
>>         CPU0                    CPU1
>>         ----                    ----
>>    lock(&policy->rwsem);
>>                                 lock(s_active#9);
>>                                 lock(&policy->rwsem);
>>    lock(s_active#9);
>>
> 
> Thanks for coming to my rescue Stephen :), I was quite sure I got this
> with ondemand
> as well..

Yeah, I'm a bit confused by it because I wasn't able to reproduce it.  But I've
got some pretty clear instructions on how to do it.

> 
> I will be looking very closely at the code now to see what's going wrong.
> And btw, does anybody here has the exact understanding of why this
> lockdep does happen? I mean what was the real problem for which we

The issue is the collision between the setup & teardown of the policy's governor
sysfs files.

On creation the kernel does:

down_write(&policy->rwsem)
mutex_lock(kernfs_mutex) <- note this is similar to the "old" sysfs_mutex.

The opposite happens on a governor switch, specifically the existing governor's
exit, and then we get a lockdep warning.

I tried to reproduce with the instructions but was unable to ... but that was on
Friday ;) and I'm going to try again this morning.  I've also pinged some of the
engineers here in the office who are working on ARM to get access to a system for
further analysis and testing.

I'll ping back later in the day with some results.

Sorry I don't have a better answer or solution yet, Viresh.

P.
Viresh Kumar Aug. 4, 2014, 1:38 p.m. UTC | #10
On 4 August 2014 17:55, Prarit Bhargava <prarit@redhat.com> wrote:
> The issue is the collision between the setup & teardown of the policy's governor
> sysfs files.
>
> On creation the kernel does:
>
> down_write(&policy->rwsem)
> mutex_lock(kernfs_mutex) <- note this is similar to the "old" sysfs_mutex.
>
> The opposite happens on a governor switch, specifically the existing governor's
> exit, and then we get a lockdep warning.

Okay, a bit more clarity is probably what I was looking for. Suppose we try
to change the governor; now tell me what will happen.

> I tried to reproduce with the instructions but was unable to ... ut that was on
> Friday ;) and I'm going to try again this morning.  I've also ping'd some of the
> engineers here in the office who are working on ARM to get access to a system to
> do further analysis and testing.

You DON'T need an ARM system for that; just try it on any x86 machine which has
multiple groups of CPUs sharing a clock line. Or, in other words, where there are
multiple policy structures..

You just need to enable the flag we were discussing; it just decides the
location where the governor's directory gets created. Nothing else.
Saravana Kannan Aug. 4, 2014, 8:16 p.m. UTC | #11
On 08/04/2014 06:38 AM, Viresh Kumar wrote:
> On 4 August 2014 17:55, Prarit Bhargava <prarit@redhat.com> wrote:
>> The issue is the collision between the setup & teardown of the policy's governor
>> sysfs files.
>>
>> On creation the kernel does:
>>
>> down_write(&policy->rwsem)
>> mutex_lock(kernfs_mutex) <- note this is similar to the "old" sysfs_mutex.
>>
>> The opposite happens on a governor switch, specifically the existing governor's
>> exit, and then we get a lockdep warning.
>
> Okay, probably a bit more clarity is what I was looking for. Suppose we try
> to change governor, now tell me what will happen.
>
Viresh, I have a good understanding of what's going on. I was planning 
to fix this after the rest of my patches are done. Didn't want to keep 
too many outstanding unaccepted patches.

The problem is a race between one thread trying to cat a governor's file
(say, sampling_rate) and the governor getting a POLICY_EXIT.

In the read thread, the sysfs lock is grabbed first and the policy
lock next. When the governor gets the POLICY_EXIT, we call it
with the policy lock held, and the governor tries to do a sysfs remove,
which grabs the sysfs lock. Classic deadlock.

Could you please look at my policy free/remove patches? If you can do
that, I can work on a fix for this. It might also be simpler to fix
after my patch series (not sure, but it might be).

Thanks,
Saravana
Viresh Kumar Aug. 5, 2014, 6:14 a.m. UTC | #12
On 5 August 2014 01:46, Saravana Kannan <skannan@codeaurora.org> wrote:
> The problem is when between one thread trying to cat a governor's file (say,
> sampling_rate) vs the governor getting a POLICY_EXIT.

I don't see two threads racing against each other here. Simply changing
the governor from conservative->ondemand creates this.

Or is it that the kernel is detecting two different orders of taking the locks?

But during governor change, isn't the sysfs lock taken first as we are
storing a value to "scaling_governor"? So, isn't this a sysfs lock first
in all cases?

In short, I am still unclear about the *exact* problem here.

> Could you please look at my policy free/remove patches? If you can do that,
> I can work on a fix for this. It might also be simpler to fix after my patch
> series (not sure, might be).

I had an overall look at those on the day you posted them, but haven't
commented yet as I was going away..

There is no way those can land in 3.17-rc1 at least, so we still have
some time to get them pushed..

Anyway, they are my number two priority; number one is this bug,
which we have to fix in stable kernels as well. So a dependency on your
series wouldn't work..

Let me see if I can get some solution to this problem first.
Saravana Kannan Aug. 5, 2014, 6:29 a.m. UTC | #13
Viresh Kumar wrote:
> On 5 August 2014 01:46, Saravana Kannan <skannan@codeaurora.org> wrote:
>> The problem is when between one thread trying to cat a governor's file
>> (say,
>> sampling_rate) vs the governor getting a POLICY_EXIT.
>
> I don't see two threads racing against each other here. Simply changing
> the governor from conservative->ondemand creates this.
>
> Or is it that the kernel is detecting two different orders of taking lock?
>
> But during governor change, isn't the sysfs lock taken first as we are
> storing a value to "scaling_governor"? So, isn't this a sysfs lock first
> in all cases?
>
> In short, I am still unclear about the *exact* problem here.
>
>> Could you please look at my policy free/remove patches? If you can do
>> that,
>> I can work on a fix for this. It might also be simpler to fix after my
>> patch
>> series (not sure, might be).
>
> I had an overall look of those on the day you posted them, but haven't
> commented yet as was going away..
>
> There is no way those can land in 3.17-rc1 atleast and so we still have
> some time to get them pushed..
>
> Anyway, they are my number two priority and the number one is this bug,
> which we have to fix in stable kernels as well. So, a dependency on your
> series wouldn't work..

Sigh... ok. I too will try to fix this one. I already have something in
mind for this.

Looks like Srivatsa has gone off the grid too. I'm hoping at least one of
you can do a review of my series. Come on guys, not everyone has to work
on the same patch/issue. :-(

-Saravana
Viresh Kumar Aug. 5, 2014, 6:43 a.m. UTC | #14
On 5 August 2014 11:59,  <skannan@codeaurora.org> wrote:
> Looks like Srivatsa has gone off the grid too. I'm hoping at least one of
> you can do a review of my series. Come on guys, not everyone has to work
> on the same patch/issue. :-(

Srivatsa is currently going through some personal stuff. He is joining MIT
for his Masters/PhD and so will be moving to the US soon :)
Viresh Kumar Aug. 5, 2014, 7:46 a.m. UTC | #15
On 2 August 2014 01:06, Stephen Boyd <sboyd@codeaurora.org> wrote:
> I have the same options. The difference is that my driver has a governor
> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.

You may call me stupid, but I got a bit confused after looking into the code
again. Why does the crash dump depend on this flag?

We *always* remove the governor-specific directory while switching governors
(of course, only if it's updated for all CPUs). And so on a dual-core platform
where both CPU 0 & 1 share a clock line, switching governors should result
in this crash dump too, shouldn't it?

I may know the answer to my own stupid question, but I'm not sure why that is a
problem. The only (and quite significant) difference that this flag makes
is the location of the governor-specific directory:
- w/o this flag: /sys/devices/system/cpu/cpufreq/<here>
- w/ this flag: /sys/devices/system/cpu/cpu*/cpufreq/<here>

So, is there some issue with the sysfs lock for the <cpu*/cpufreq/> node, since
while switching governors we write <cpu*/cpufreq/scaling_governor> at the same
location?

--
viresh

Prarit Bhargava Aug. 5, 2014, 10:47 a.m. UTC | #16
On 08/05/2014 03:46 AM, Viresh Kumar wrote:
> On 2 August 2014 01:06, Stephen Boyd <sboyd@codeaurora.org> wrote:
>> I have the same options. The difference is that my driver has a governor
>> per policy. That's set with the CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag.
> 
> You may call me stupid but I got a bit confused after looking into the code
> again. Why does the crash dump depends on this flag?

Nope, not a stupid question.  After reproducing (finally!) yesterday I've been
wondering the same thing.

> 
> We *always* remove the governor specific directory while switching governors
> (Ofcourse only if its updated for All CPUs). And so on a dual core platform,
> where both CPU 0 & 1 share a clock line, switching of governors should result
> in this crash dump?

I've been looking into *exactly* this.  On any platform where
cpumask_weight(affected_cpus) == 1 for a particular cpu, this lockdep trace should
happen.


> 
> I may know the answer to the stupid question I had, but not sure why that is a
> problem. The only (and quite significant) difference that this flag makes
> is the location of governor-specific directory:
> - w/o this flag: /sys/devices/system/cpu/cpufreq/<here>
> - w/ this flag: /sys/devices/system/cpu/cpu*/cpufreq/<here>
> 

cpufreq_global_kobject vs the policy's kobject.

> So, is there some issue with the sysfs lock for <cpu*/cpufreq/> node as while
> switching governor we change  <cpu*/cpufreq/scaling_governor> at the same
> location?

That's what I'm wondering too.  I'm going to instrument the code to find out
this morning.  I'm wondering if this comes down to a lockdep class issue
(perhaps lockdep puts globally defined locks like cpufreq_global_kobject in a
different class?).

P.
Viresh Kumar Oct. 13, 2014, 10:43 a.m. UTC | #17
On 1 August 2014 22:48, Stephen Boyd <sboyd@codeaurora.org> wrote:
> On 08/01/14 03:27, Prarit Bhargava wrote:
>>
>> Can you send me the test and the trace of the deadlock?  I'm not creating it with:
>>
>
> This was with conservative as the default, and switching to ondemand
>
> # cd /sys/devices/system/cpu/cpu2/cpufreq
> # ls
> affected_cpus                  scaling_available_governors
> conservative                   scaling_cur_freq
> cpuinfo_cur_freq               scaling_driver
> cpuinfo_max_freq               scaling_governor
> cpuinfo_min_freq               scaling_max_freq
> cpuinfo_transition_latency     scaling_min_freq
> related_cpus                   scaling_setspeed
> scaling_available_frequencies  stats
> # cat conservative/down_threshold
> 20
> # echo ondemand > scaling_governor
>
>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.16.0-rc3-00039-ge1e38f124d87 #47 Not tainted
>  -------------------------------------------------------
>  sh/75 is trying to acquire lock:
>   (s_active#9){++++..}, at: [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84

Can you please retry this on mainline? I wasn't able to reproduce it
now on top of 3.17.
I am trying this on an Exynos b.L implementation..
Prarit Bhargava Oct. 13, 2014, 11:52 a.m. UTC | #18
On 10/13/2014 06:43 AM, Viresh Kumar wrote:
> On 1 August 2014 22:48, Stephen Boyd <sboyd@codeaurora.org> wrote:
>> On 08/01/14 03:27, Prarit Bhargava wrote:
>>>
>>> Can you send me the test and the trace of the deadlock?  I'm not creating it with:
>>>
>>
>> This was with conservative as the default, and switching to ondemand
>>
>> # cd /sys/devices/system/cpu/cpu2/cpufreq
>> # ls
>> affected_cpus                  scaling_available_governors
>> conservative                   scaling_cur_freq
>> cpuinfo_cur_freq               scaling_driver
>> cpuinfo_max_freq               scaling_governor
>> cpuinfo_min_freq               scaling_max_freq
>> cpuinfo_transition_latency     scaling_min_freq
>> related_cpus                   scaling_setspeed
>> scaling_available_frequencies  stats
>> # cat conservative/down_threshold
>> 20
>> # echo ondemand > scaling_governor
>>
>>  ======================================================
>>  [ INFO: possible circular locking dependency detected ]
>>  3.16.0-rc3-00039-ge1e38f124d87 #47 Not tainted
>>  -------------------------------------------------------
>>  sh/75 is trying to acquire lock:
>>   (s_active#9){++++..}, at: [<c0358a94>] kernfs_remove_by_name_ns+0x3c/0x84
> 
> Can you please retry this on mainline? I wasn't able to reproduce it
> now over 3.17.
> I am trying this on Exynos b.L implementation..

I have 100% reproducibility on latest mainline.

Viresh, please see my next post on the locking issues in cpufreq.

P.

> 

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6f02485..fa11a7d 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2200,9 +2200,7 @@  static int cpufreq_set_policy(struct cpufreq_policy *polic
        /* end old governor */
        if (old_gov) {
                __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
-               up_write(&policy->rwsem);
                __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-               down_write(&policy->rwsem);
        }

        /* start new governor */
@@ -2211,9 +2209,7 @@  static int cpufreq_set_policy(struct cpufreq_policy *polic
                if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
                        goto out;

-               up_write(&policy->rwsem);
                __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-               down_write(&policy->rwsem);
        }

        /* new governor failed, so re-start old one */