
cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

Message ID ed8fd187687cb4ea9afd0bc32107ca5abf03e679.1422580135.git.viresh.kumar@linaro.org (mailing list archive)
State Not Applicable, archived

Commit Message

Viresh Kumar Jan. 30, 2015, 1:13 a.m. UTC
cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances
has missed this. And as a result we get this:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
 kobject_get+0x41/0x50()
 Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
 lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
 ...
 Call Trace:
  [<ffffffff81661b14>] dump_stack+0x46/0x58
  [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
  [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff812e16d1>] kobject_get+0x41/0x50
  [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
  [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
  [<ffffffff810b8cb2>] ? up+0x32/0x50
  [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
  [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
  [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
  [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
  [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
  [<ffffffff81089566>] ? move_linked_works+0x66/0x90
  [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
  [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
  [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
  [<ffffffff8108c910>] process_one_work+0x160/0x410
  [<ffffffff8108d05b>] worker_thread+0x11b/0x520
  [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
  [<ffffffff81092421>] kthread+0xe1/0x100
  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
  [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
 ---[ end trace 89e66eb9795efdf7 ]---

And here is the race:

 Thread A: Workqueue: kacpi_notify

 acpi_processor_notify()
   acpi_processor_ppc_has_changed()
         cpufreq_update_policy()
           cpufreq_cpu_get()
             kobject_get()

 Thread B: xenbus_thread()

 xenbus_thread()
   msg->u.watch.handle->callback()
     handle_vcpu_hotplug_event()
       vcpu_hotplug()
         cpu_down()
           __cpu_notify(CPU_POST_DEAD..)
             cpufreq_cpu_callback()
               __cpufreq_remove_dev_finish()
                 cpufreq_policy_put_kobj()
                   kobject_put()

cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
freed until cpufreq_cpu_put() is called.
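
For reference, the reader side looks roughly like this (a simplified sketch of cpufreq_cpu_get() from kernels of that era, with unrelated checks trimmed; see drivers/cpufreq/cpufreq.c in the actual tree):

  struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
  {
  	struct cpufreq_policy *policy = NULL;
  	unsigned long flags;

  	read_lock_irqsave(&cpufreq_driver_lock, flags);
  	if (cpufreq_driver) {
  		policy = per_cpu(cpufreq_cpu_data, cpu);
  		if (policy)
  			/* take the reference while still holding the lock */
  			kobject_get(&policy->kobj);
  	}
  	read_unlock_irqrestore(&cpufreq_driver_lock, flags);

  	return policy;
  }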

But the race happens when another thread puts the kobject first and updates
cpufreq_cpu_data later and that too without these locks. And so the first thread
gets a valid policy structure and before it does kobject_get() on it, the second
one does kobject_put(). And so this WARN().
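
In terms of ordering, the problematic interleaving is roughly (a sketch; the exact call chains are shown above):

  Thread B (teardown, old code)           Thread A (cpufreq_cpu_get())

  kobject_put(&policy->kobj)
    /* last reference dropped */
                                          read_lock(&cpufreq_driver_lock)
                                          policy = per_cpu(cpufreq_cpu_data, cpu)
                                            /* stale, still non-NULL */
                                          kobject_get(&policy->kobj)
                                            /* refcount goes 0 -> 1: WARN */
                                          read_unlock(&cpufreq_driver_lock)
  per_cpu(cpufreq_cpu_data, cpu) = NULL
    /* too late, and done without the lock */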

Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
too under locks.

Cc: <stable@vger.kernel.org>        # 3.12+
Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
@Santosh: I have changed read locks to write locks here and so you need to test
again.

 drivers/cpufreq/cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

ethan zhao Jan. 30, 2015, 1:30 a.m. UTC | #1
Viresh,

On 2015/1/30 9:13, Viresh Kumar wrote:
> cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances
> has missed this. And as a result we get this:
>
>   ------------[ cut here ]------------
>   WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
>   kobject_get+0x41/0x50()
>   Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
>   lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
>   ...
>   Call Trace:
>    [<ffffffff81661b14>] dump_stack+0x46/0x58
>    [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
>    [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
>    [<ffffffff812e16d1>] kobject_get+0x41/0x50
>    [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
>    [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
>    [<ffffffff810b8cb2>] ? up+0x32/0x50
>    [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
>    [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
>    [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
>    [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
>    [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
>    [<ffffffff81089566>] ? move_linked_works+0x66/0x90
>    [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
>    [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
>    [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
>    [<ffffffff8108c910>] process_one_work+0x160/0x410
>    [<ffffffff8108d05b>] worker_thread+0x11b/0x520
>    [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
>    [<ffffffff81092421>] kthread+0xe1/0x100
>    [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>    [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
>    [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>   ---[ end trace 89e66eb9795efdf7 ]---
>
> And here is the race:
>
>   Thread A: Workqueue: kacpi_notify
>
>   acpi_processor_notify()
>     acpi_processor_ppc_has_changed()
>           cpufreq_update_policy()
>             cpufreq_cpu_get()
>               kobject_get()
>
>   Thread B: xenbus_thread()
>
>   xenbus_thread()
>     msg->u.watch.handle->callback()
>       handle_vcpu_hotplug_event()
>         vcpu_hotplug()
>           cpu_down()
>             __cpu_notify(CPU_POST_DEAD..)
>               cpufreq_cpu_callback()
>                 __cpufreq_remove_dev_finish()
>                   cpufreq_policy_put_kobj()
>                     kobject_put()
>
> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
> freed until cpufreq_cpu_put() is called.
  It is another bug.

>
> But the race happens when another thread puts the kobject first and updates
> cpufreq_cpu_data later and that too without these locks. And so the first thread
> gets a valid policy structure and before it does kobject_get() on it, the second
> one does kobject_put(). And so this WARN().
>
> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
> too under locks.
>
> Cc: <stable@vger.kernel.org>        # 3.12+
> Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
> Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> @Santosh: I have changed read locks to write locks here and so you need to test
> again.
>
>   drivers/cpufreq/cpufreq.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4473eba1d6b0..e3bf702b5588 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>   	unsigned long flags;
>   	struct cpufreq_policy *policy;
>   
> -	read_lock_irqsave(&cpufreq_driver_lock, flags);
> +	write_lock_irqsave(&cpufreq_driver_lock, flags);
>   	policy = per_cpu(cpufreq_cpu_data, cpu);
> -	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +	per_cpu(cpufreq_cpu_data, cpu) = NULL;
> +	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>   
>   	if (!policy) {
>   		pr_debug("%s: No cpu_data found\n", __func__);
> @@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>   		}
>   	}
>   
> -	per_cpu(cpufreq_cpu_data, cpu) = NULL;
  Yes, here is one bug that should be fixed, but it seems not enough to avoid
  the issue completely. How about Thread B running here:

  Thread B: xenbus_thread()

  xenbus_thread()
    msg->u.watch.handle->callback()
      handle_vcpu_hotplug_event()
        vcpu_hotplug()
          cpu_down()
            __cpu_notify(CPU_POST_DEAD..)
              cpufreq_cpu_callback()
                __cpufreq_remove_dev_prepare
		 update_policy_cpu(){
                    ...
                    down_write(&policy->rwsem);
                    policy->last_cpu = policy->cpu;
                    policy->cpu = cpu;
                    up_write(&policy->rwsem);
                    ---->
                 }

And Thread A runs to the same position as above. Don't ignore my work blindly;
that piece of bandage could save your time :>
Thanks,
Ethan


>   	return 0;
>   }
>   

Viresh Kumar Jan. 30, 2015, 2:05 a.m. UTC | #2
On 30 January 2015 at 07:00, ethan zhao <ethan.zhao@oracle.com> wrote:

>  It is another bug.

Hmm, let's see..

>  Yes, here is one bug should be fix. but seems not enough to avoid the issue
> completely,

Did you test it ?

>  how about the Thread B running here
>
>  Thread B: xenbus_thread()
>
>  xenbus_thread()
>    msg->u.watch.handle->callback()
>      handle_vcpu_hotplug_event()
>        vcpu_hotplug()
>          cpu_down()
>            __cpu_notify(CPU_POST_DEAD..)
>              cpufreq_cpu_callback()
>                __cpufreq_remove_dev_prepare
>                  update_policy_cpu(){
>                    ...
>                    down_write(&policy->rwsem);
>                    policy->last_cpu = policy->cpu;
>                    policy->cpu = cpu;
>                    up_write(&policy->rwsem);
>                    ---->
>                 }
>
> And thread A run to the same position as above, don't ignore my work
> blindly, that piece of bandage
> could save your time

Oh, please don't misunderstand me. I didn't have any intention of showing
any kind of disrespect.

Okay, why do you say that the thread you have shown above has a bug in it?
It's just updating policy->cpu, and that shouldn't make anything unstable.

Please explain one more time, in detail, why you think you hit a
different bug. Also look at kref.h to see which piece of code hit that
WARN(); it will happen only in the case I have shown. Let's see if I
missed something :)

--
viresh
ethan zhao Jan. 30, 2015, 2:10 a.m. UTC | #3
Viresh,

On 2015/1/30 10:05, Viresh Kumar wrote:
> On 30 January 2015 at 07:00, ethan zhao <ethan.zhao@oracle.com> wrote:
>
>>   It is another bug.
> Hmm, lets see..
>
>>   Yes, here is one bug should be fix. but seems not enough to avoid the issue
>> completely,
> Did you test it ?
For a race between a PPC notification and the xenbus thread, could you tell me
a way to reproduce it by triggering the PPC notification and xenbus events
manually? Do you really want me to write some code into a test kernel to flood
the PPC and xenbus paths at the same time? If we could analyse the code and
understand the issue clearly, we wouldn't have to wait for users to yell.

Thanks,
Ethan

>
>>   how about the Thread B running here
>>
>>   Thread B: xenbus_thread()
>>
>>   xenbus_thread()
>>     msg->u.watch.handle->callback()
>>       handle_vcpu_hotplug_event()
>>         vcpu_hotplug()
>>           cpu_down()
>>             __cpu_notify(CPU_POST_DEAD..)
>>               cpufreq_cpu_callback()
>>                 __cpufreq_remove_dev_prepare
>>                   update_policy_cpu(){
>>                     ...
>>                     down_write(&policy->rwsem);
>>                     policy->last_cpu = policy->cpu;
>>                     policy->cpu = cpu;
>>                     up_write(&policy->rwsem);
>>                     ---->
>>                  }
>>
>> And thread A run to the same position as above, don't ignore my work
>> blindly, that piece of bandage
>> could save your time
> Oh, please don't misunderstand me. I didn't had any intention of showing
> any kind of disrespect.
>
> Okay, why do you say that the above thread you shown has a bug in there?
> Its juse updating policy->cpu and that shouldn't make anything unstable.
>
> Please explain again one more time, in details why do you think you hit a
> different bug. Also look at kref.h to see which piece of code has hit that
> WARN() and it will happen only in the case I have shown. Lets see if I
> missed something :)
>
> --
> viresh

Viresh Kumar Jan. 30, 2015, 2:13 a.m. UTC | #4
On 30 January 2015 at 07:40, ethan zhao <ethan.zhao@oracle.com> wrote:
> For a PPC notification and xen-bus thread race, could you tell me a way how
> to reproduce it by trigger the PPC notification and xen-bus events manually
> ?
> You really want me write some code into a test kernel to flood the PPC and
> xen-bus at the same time ? if we could analysis code and get the issue
> clearly, we wouldn't wait the users to yell out.

I thought you already had a test where you were hitting the issue you originally
reported. At least Santosh did confirm that he hits it 3 out of 5 times in his
kernel during boot..

My reasoning of why your observation doesn't fit here:

Copying from your earlier mail..

 Thread A: Workqueue: kacpi_notify

 acpi_processor_notify()
   acpi_processor_ppc_has_changed()
         cpufreq_update_policy()
           cpufreq_cpu_get()
             kobject_get()

This tries to increment the count, and the warning you have mentioned
happens because:

WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);

i.e. even after incrementing the count, it is < 2, which I believe means it
is 1. That means we have tried to do kobject_get() on a kobject for which
kobject_put() has already been done.
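
For context, kref_get() in kernels of that vintage looks roughly like this (a sketch; see include/linux/kref.h in the tree that produced the trace), and kobject_get() is a thin wrapper around it:

  static inline void kref_get(struct kref *kref)
  {
  	/* If the result of the increment is below 2, the refcount was 0
  	 * before the get, i.e. the last kobject_put() has already run.
  	 * That is exactly the case this WARN catches.
  	 */
  	WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
  }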

 Thread B: xenbus_thread()

 xenbus_thread()
   msg->u.watch.handle->callback()
     handle_vcpu_hotplug_event()
       vcpu_hotplug()
         cpu_down()
           __cpu_notify(CPU_DOWN_PREPARE..)
             cpufreq_cpu_callback()
               __cpufreq_remove_dev_prepare()
                 update_policy_cpu()
                   kobject_move()


Okay, where is the race or the kobject_put() here? We are just moving
the kobject, and that has nothing to do with its refcount.

Why do you see it as a race?
ethan zhao Jan. 30, 2015, 2:21 a.m. UTC | #5
On 2015/1/30 10:13, Viresh Kumar wrote:
> On 30 January 2015 at 07:40, ethan zhao <ethan.zhao@oracle.com> wrote:
>> For a PPC notification and xen-bus thread race, could you tell me a way how
>> to reproduce it by trigger the PPC notification and xen-bus events manually
>> ?
>> You really want me write some code into a test kernel to flood the PPC and
>> xen-bus at the same time ? if we could analysis code and get the issue
>> clearly, we wouldn't wait the users to yell out.
> I thought you already have a test where you are hitting the issue you originally
> reported. Atleast Santosh did confirm that he is hitting 3/5 times in his kernel
> during boot..
  As far as I know, a PPC notification only happens when power capping is
  needed, for example when the server runs too hot; if the cooling condition
  recovers, you couldn't reproduce it either!
>
> My reasoning of why your observation doesn't fit here:
>
> Copying from your earlier mail..
>
>   Thread A: Workqueue: kacpi_notify
>
>   acpi_processor_notify()
>     acpi_processor_ppc_has_changed()
>           cpufreq_update_policy()
>             cpufreq_cpu_get()
>               kobject_get()
>
> This tries to increment the count and the warning you have mentioned
> happen because:
>
> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>
> i.e. even after incrementing the count, it is < 2. Which I believe will be
> 1. Which means that we have tried to do kobject_get() on a kobject
> for which kobject_put() is already done.
>
>   Thread B: xenbus_thread()
>
>   xenbus_thread()
>     msg->u.watch.handle->callback()
>       handle_vcpu_hotplug_event()
>         vcpu_hotplug()
>           cpu_down()
>             __cpu_notify(CPU_DOWN_PREPARE..)
>               cpufreq_cpu_callback()
>                 __cpufreq_remove_dev_prepare()
>                   update_policy_cpu()
>                     kobject_move()
>
>
> Okay, where is the race or kobject_put() here ? We are just moving
> the kobject and it has nothing to do with the refcount of kobject.
>
> Why do you see its a race ?
  I mean that policy->cpu has been changed and that CPU is about to go down,
  while Thread A continues to get and update the policy for it blindly. That is
  what I call a 'race', not the refcount itself.

  Thanks,
  Ethan

Viresh Kumar Jan. 30, 2015, 3:14 a.m. UTC | #6
On 30 January 2015 at 07:51, ethan zhao <ethan.zhao@oracle.com> wrote:
>> My reasoning of why your observation doesn't fit here:
>>
>> Copying from your earlier mail..
>>
>>   Thread A: Workqueue: kacpi_notify
>>
>>   acpi_processor_notify()
>>     acpi_processor_ppc_has_changed()
>>           cpufreq_update_policy()
>>             cpufreq_cpu_get()
>>               kobject_get()
>>
>> This tries to increment the count and the warning you have mentioned
>> happen because:
>>
>> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>>
>> i.e. even after incrementing the count, it is < 2. Which I believe will be
>> 1. Which means that we have tried to do kobject_get() on a kobject
>> for which kobject_put() is already done.
>>
>>   Thread B: xenbus_thread()
>>
>>   xenbus_thread()
>>     msg->u.watch.handle->callback()
>>       handle_vcpu_hotplug_event()
>>         vcpu_hotplug()
>>           cpu_down()
>>             __cpu_notify(CPU_DOWN_PREPARE..)
>>               cpufreq_cpu_callback()
>>                 __cpufreq_remove_dev_prepare()
>>                   update_policy_cpu()
>>                     kobject_move()
>>
>>
>> Okay, where is the race or kobject_put() here ? We are just moving
>> the kobject and it has nothing to do with the refcount of kobject.
>>
>> Why do you see its a race ?
>
>  I mean the policy->cpu has been changed, that CPU is about to be down,
>  Thread A continue to get and update the policy for it blindly, that is
>  what I Say 'race', not the refcount itself.

First of all, the WARN you had in your patch doesn't have anything to do
with the so-called race you just defined. It's because of the reason I explained
earlier.

Second, what if policy->cpu is getting updated? A policy manages a group
of CPUs, not a single CPU. There are still other CPUs online for that
policy, so kobject_get() on that policy->kobj is perfectly valid.
ethan zhao Jan. 30, 2015, 3:46 a.m. UTC | #7
On 2015/1/30 11:14, Viresh Kumar wrote:
> On 30 January 2015 at 07:51, ethan zhao <ethan.zhao@oracle.com> wrote:
>>> My reasoning of why your observation doesn't fit here:
>>>
>>> Copying from your earlier mail..
>>>
>>>    Thread A: Workqueue: kacpi_notify
>>>
>>>    acpi_processor_notify()
>>>      acpi_processor_ppc_has_changed()
>>>            cpufreq_update_policy()
>>>              cpufreq_cpu_get()
>>>                kobject_get()
>>>
>>> This tries to increment the count and the warning you have mentioned
>>> happen because:
>>>
>>> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>>>
>>> i.e. even after incrementing the count, it is < 2. Which I believe will be
>>> 1. Which means that we have tried to do kobject_get() on a kobject
>>> for which kobject_put() is already done.
>>>
>>>    Thread B: xenbus_thread()
>>>
>>>    xenbus_thread()
>>>      msg->u.watch.handle->callback()
>>>        handle_vcpu_hotplug_event()
>>>          vcpu_hotplug()
>>>            cpu_down()
>>>              __cpu_notify(CPU_DOWN_PREPARE..)
>>>                cpufreq_cpu_callback()
>>>                  __cpufreq_remove_dev_prepare()
>>>                    update_policy_cpu()
>>>                      kobject_move()
>>>
>>>
>>> Okay, where is the race or kobject_put() here ? We are just moving
>>> the kobject and it has nothing to do with the refcount of kobject.
>>>
>>> Why do you see its a race ?
>>   I mean the policy->cpu has been changed, that CPU is about to be down,
>>   Thread A continue to get and update the policy for it blindly, that is
>>   what I Say 'race', not the refcount itself.
> First of all, the WARN you had in your patch doesn't have anything to do
> with the so-called race you just define. Its because of the reason I defined
> earlier.


>
> Second, what if policy->cpu is getting updated? A policy manages a group
> of CPUs, not a single cpu. And there still are other CPUs online for that
> policy and so kobject_get() for that policy->kobj is perfectly valid.
  You mean the policy is shared by all CPUs, so a PPC notification about one
  CPU should update every CPU's policy, right? Even when the requested CPU is
  shutting down?

Thanks,
Ethan


Viresh Kumar Jan. 30, 2015, 4:14 a.m. UTC | #8
On 30 January 2015 at 09:16, ethan zhao <ethan.zhao@oracle.com> wrote:
>  You mean the policy is shared by all CPUs, so PPC notification about one

A policy may or may not be shared by multiple CPUs. It all depends on your
system's configuration. All CPUs that share a clock line share a policy.

You can verify that from /sys/devices/system/cpu/cpu*/cpufreq/related_cpus.
This gives the CPUs that share a policy.

>  CPU should update all CPU's policy, right ? even the requested CPU is
> shutting  down.

CPUs sharing a policy have a single policy structure, and so the policy would
be updated for all CPUs that share it. Even if some CPU goes down,
the policy might still be up and running.
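
To make the terminology concrete, these are roughly the struct cpufreq_policy fields being discussed (a sketch of the relevant members only; see include/linux/cpufreq.h in the tree for the full definition):

  struct cpufreq_policy {
  	cpumask_var_t		cpus;		/* online CPUs currently managed by this policy */
  	cpumask_var_t		related_cpus;	/* online + offline CPUs sharing the clock line */
  	unsigned int		cpu;		/* CPU currently "owning" the policy */
  	unsigned int		last_cpu;	/* previous owner, as used in update_policy_cpu() */
  	struct kobject		kobj;		/* the refcounted object the WARN is about */
  	struct rw_semaphore	rwsem;		/* serializes policy updates */
  	/* ... */
  };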

--
viresh
Santosh Shilimkar Jan. 30, 2015, 10:55 p.m. UTC | #9
On 1/30/2015 2:57 PM, Rafael J. Wysocki wrote:
> On Friday, January 30, 2015 06:43:12 AM Viresh Kumar wrote:
>> cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances
>> has missed this.
>
> No, it isn't.  cpufreq_cpu_data is a pointer and doesn't need any locks to
> protect it.  What the lock does is to guarantee the callers of cpufreq_cpu_get()
> that the policy object won't go away from under them (it is used for some other
> purposes too, but they are unrelated).
>
> What technically happens is an ordering problem.  per_cpu(cpufreq_cpu_data, cpu)
> needs to be cleared before calling kobject_put(&policy->kobj) *and* under the
> lock, because otherwise if someone else calls cpufreq_cpu_get() in parallel
> with that, they can obtain a non-NULL from it *after* kobject_put(&policy->kobj)
> was executed.
>
> So the lock is needed not just because it protects cpufreq_cpu_data, but because
> it is supposed to prevent writes to cpufreq_cpu_data from happening between the
> read from it and the kobject_get(&policy->kobj) in cpufreq_cpu_get().
>
>> And as a result we get this:
>>
>>   ------------[ cut here ]------------
>>   WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
>>   kobject_get+0x41/0x50()
>>   Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
>>   lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
>>   ...
>>   Call Trace:
>>    [<ffffffff81661b14>] dump_stack+0x46/0x58
>>    [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
>>    [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
>>    [<ffffffff812e16d1>] kobject_get+0x41/0x50
>>    [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
>>    [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
>>    [<ffffffff810b8cb2>] ? up+0x32/0x50
>>    [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
>>    [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
>>    [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
>>    [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
>>    [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
>>    [<ffffffff81089566>] ? move_linked_works+0x66/0x90
>>    [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
>>    [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
>>    [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
>>    [<ffffffff8108c910>] process_one_work+0x160/0x410
>>    [<ffffffff8108d05b>] worker_thread+0x11b/0x520
>>    [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
>>    [<ffffffff81092421>] kthread+0xe1/0x100
>>    [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>>    [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
>>    [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>>   ---[ end trace 89e66eb9795efdf7 ]---
>>
>> And here is the race:
>>
>>   Thread A: Workqueue: kacpi_notify
>>
>>   acpi_processor_notify()
>>     acpi_processor_ppc_has_changed()
>>           cpufreq_update_policy()
>>             cpufreq_cpu_get()
>>               kobject_get()
>>
>>   Thread B: xenbus_thread()
>>
>>   xenbus_thread()
>>     msg->u.watch.handle->callback()
>>       handle_vcpu_hotplug_event()
>>         vcpu_hotplug()
>>           cpu_down()
>>             __cpu_notify(CPU_POST_DEAD..)
>>               cpufreq_cpu_callback()
>>                 __cpufreq_remove_dev_finish()
>>                   cpufreq_policy_put_kobj()
>>                     kobject_put()
>>
>> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
>> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
>> freed until cpufreq_cpu_put() is called.
>
> Because it does the kobject_get() under the lock too.
>
>> But the race happens when another thread puts the kobject first and updates
>> cpufreq_cpu_data later
>
> This is an ordering problem.
>
>> and that too without these locks.
>
> And this is racy.
>
>> And so the first thread
>> gets a valid policy structure and before it does kobject_get() on it, the second
>> one does kobject_put(). And so this WARN().
>>
>> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
>> too under locks.
>
> That's almost correct.  In addition to the above (albeit maybe unintentionally)
> the patch also fixes the possible race between two instances of
> __cpufreq_remove_dev_finish() with the same arguments running in parallel with
> each other.  The proof is left as an exercise to the reader. :-)
>
> Please try to improve the changelog ...
>
>> Cc: <stable@vger.kernel.org>        # 3.12+
>> Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
>> Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> ---
>> @Santosh: I have changed read locks to write locks here and so you need to test
>> again.
>>
Right. I tested this version and can confirm that it fixes the reported WARN().

Regards,
Santosh

Rafael J. Wysocki Jan. 30, 2015, 10:57 p.m. UTC | #10
On Friday, January 30, 2015 06:43:12 AM Viresh Kumar wrote:
> cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances
> has missed this.

No, it isn't.  cpufreq_cpu_data is a pointer and doesn't need any locks to
protect it.  What the lock does is to guarantee the callers of cpufreq_cpu_get()
that the policy object won't go away from under them (it is used for some other
purposes too, but they are unrelated).

What technically happens is an ordering problem.  per_cpu(cpufreq_cpu_data, cpu)
needs to be cleared before calling kobject_put(&policy->kobj) *and* under the
lock, because otherwise if someone else calls cpufreq_cpu_get() in parallel
with that, they can obtain a non-NULL from it *after* kobject_put(&policy->kobj)
was executed.

So the lock is needed not just because it protects cpufreq_cpu_data, but because
it is supposed to prevent writes to cpufreq_cpu_data from happening between the
read from it and the kobject_get(&policy->kobj) in cpufreq_cpu_get().
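
Concretely, with the patch applied the teardown side becomes (a simplified sketch of the patched __cpufreq_remove_dev_finish() flow, ignoring error handling and the case where other CPUs in the policy stay online):

  	write_lock_irqsave(&cpufreq_driver_lock, flags);
  	policy = per_cpu(cpufreq_cpu_data, cpu);
  	per_cpu(cpufreq_cpu_data, cpu) = NULL;	/* readers now see NULL under the lock */
  	write_unlock_irqrestore(&cpufreq_driver_lock, flags);

  	/* ... */

  	/* reached only once no reader can pick up the stale pointer */
  	cpufreq_policy_put_kobj(policy);	/* ends in kobject_put(&policy->kobj) */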

> And as a result we get this:
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
>  kobject_get+0x41/0x50()
>  Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
>  lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
>  ...
>  Call Trace:
>   [<ffffffff81661b14>] dump_stack+0x46/0x58
>   [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
>   [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
>   [<ffffffff812e16d1>] kobject_get+0x41/0x50
>   [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
>   [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
>   [<ffffffff810b8cb2>] ? up+0x32/0x50
>   [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
>   [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
>   [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
>   [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
>   [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
>   [<ffffffff81089566>] ? move_linked_works+0x66/0x90
>   [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
>   [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
>   [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
>   [<ffffffff8108c910>] process_one_work+0x160/0x410
>   [<ffffffff8108d05b>] worker_thread+0x11b/0x520
>   [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
>   [<ffffffff81092421>] kthread+0xe1/0x100
>   [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>   [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>  ---[ end trace 89e66eb9795efdf7 ]---
> 
> And here is the race:
>
>  Thread A: Workqueue: kacpi_notify
> 
>  acpi_processor_notify()
>    acpi_processor_ppc_has_changed()
>          cpufreq_update_policy()
>            cpufreq_cpu_get()
>              kobject_get()
> 
>  Thread B: xenbus_thread()
> 
>  xenbus_thread()
>    msg->u.watch.handle->callback()
>      handle_vcpu_hotplug_event()
>        vcpu_hotplug()
>          cpu_down()
>            __cpu_notify(CPU_POST_DEAD..)
>              cpufreq_cpu_callback()
>                __cpufreq_remove_dev_finish()
>                  cpufreq_policy_put_kobj()
>                    kobject_put()
> 
> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
> freed until cpufreq_cpu_put() is called.

Because it does the kobject_get() under the lock too.

> But the race happens when another thread puts the kobject first and updates
> cpufreq_cpu_data later

This is an ordering problem.

> and that too without these locks.

And this is racy.

> And so the first thread
> gets a valid policy structure and before it does kobject_get() on it, the second
> one does kobject_put(). And so this WARN().
> 
> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
> too under locks.

That's almost correct.  In addition to the above (albeit maybe unintentionally)
the patch also fixes the possible race between two instances of
__cpufreq_remove_dev_finish() with the same arguments running in parallel with
each other.  The proof is left as an exercise to the reader. :-)

Please try to improve the changelog ...

> Cc: <stable@vger.kernel.org>        # 3.12+
> Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
> Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> @Santosh: I have changed read locks to write locks here and so you need to test
> again.
> 
>  drivers/cpufreq/cpufreq.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4473eba1d6b0..e3bf702b5588 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>  	unsigned long flags;
>  	struct cpufreq_policy *policy;
>  
> -	read_lock_irqsave(&cpufreq_driver_lock, flags);
> +	write_lock_irqsave(&cpufreq_driver_lock, flags);
>  	policy = per_cpu(cpufreq_cpu_data, cpu);
> -	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +	per_cpu(cpufreq_cpu_data, cpu) = NULL;
> +	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>  
>  	if (!policy) {
>  		pr_debug("%s: No cpu_data found\n", __func__);
> @@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>  		}
>  	}
>  
> -	per_cpu(cpufreq_cpu_data, cpu) = NULL;
>  	return 0;
>  }
>  
>
Viresh Kumar Jan. 31, 2015, 12:31 a.m. UTC | #11
On 31 January 2015 at 04:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> No, it isn't.  cpufreq_cpu_data is a pointer and doesn't need any locks to
> protect it.  What the lock does is to guarantee the callers of cpufreq_cpu_get()
> that the policy object won't go away from under them (it is used for some other
> purposes too, but they are unrelated).

Yeah, it's not just about locking. Otherwise, taking the lock only at the
bottom of this routine should have fixed it.

> That's almost correct.  In addition to the above (albeit maybe unintentionally)
> the patch also fixes the possible race between two instances of
> __cpufreq_remove_dev_finish() with the same arguments running in parallel with
> each other.  The proof is left as an exercise to the reader. :-)

Two instances of __cpufreq_remove_dev_finish() can't get called in parallel,
so it doesn't fix any race there :).

> Please try to improve the changelog ...

Yes.
ethan zhao Feb. 2, 2015, 1:54 a.m. UTC | #12
Viresh,

On 2015/1/30 12:14, Viresh Kumar wrote:
> On 30 January 2015 at 09:16, ethan zhao <ethan.zhao@oracle.com> wrote:
>>   You mean the policy is shared by all CPUs, so PPC notification about one
> A policy may or maynot be shared by multiple CPUs. It all depends on your
> systems configuration. All CPUs which share clock line, share a policy.
>
> You can verify that from /sys/devices/system/cpu/cpu*/related_cpus. This
> gives the CPUs that share policy.
>
>>   CPU should update all CPU's policy, right ? even the requested CPU is
>> shutting  down.
> CPUs sharing policy have a single policy structure. And so policy would
> be updated for all CPUs that share the poilcy. Even if some cpu goes down,
> the policy might still be up and running.
  Let me check the ACPI spec and BIOS against your implementation.

  About the patch you suggested, there is another question left pending:

  The policy will be freed only when the last CPU of that policy shuts down,
  so how does that happen while the system is booting?

  The description of the patch would then be wrong (the xenbus call stack) --
  did xenbus really shut down every CPU, down to the last one, during boot?

  Thanks,
  Ethan
>
> --
> viresh

Viresh Kumar Feb. 2, 2015, 3:20 a.m. UTC | #13
On 2 February 2015 at 07:24, ethan zhao <ethan.zhao@oracle.com> wrote:
>  Policy will be freed only when that's last CPU shutting down, how does it
>  happen when system booting ?

I couldn't understand what you are asking here. Though a policy will be
allocated when the first CPU of that policy comes up during boot..

>  The description of the patch would be wrong (the Xen_bus call stack) --
>  Did the xen_bus shut down every CPU till the last during booting ?

I don't know about that..

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..e3bf702b5588 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1409,9 +1409,10 @@  static int __cpufreq_remove_dev_finish(struct device *dev,
 	unsigned long flags;
 	struct cpufreq_policy *policy;
 
-	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	write_lock_irqsave(&cpufreq_driver_lock, flags);
 	policy = per_cpu(cpufreq_cpu_data, cpu);
-	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+	per_cpu(cpufreq_cpu_data, cpu) = NULL;
+	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (!policy) {
 		pr_debug("%s: No cpu_data found\n", __func__);
@@ -1466,7 +1467,6 @@  static int __cpufreq_remove_dev_finish(struct device *dev,
 		}
 	}
 
-	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 	return 0;
 }