diff mbox

[Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

Message ID ed8fd187687cb4ea9afd0bc32107ca5abf03e679.1422663249.git.viresh.kumar@linaro.org (mailing list archive)
State Accepted, archived
Delegated to: Rafael Wysocki
Headers show

Commit Message

Viresh Kumar Jan. 31, 2015, 12:32 a.m. UTC
In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs to be cleared
before calling kobject_put(&policy->kobj) *and* under the lock. Otherwise if
someone else calls cpufreq_cpu_get() in parallel with it, they can obtain a
non-NULL policy from it *after* kobject_put(&policy->kobj) was executed.

Consider this case:

Thread A				Thread B
cpufreq_cpu_get()
  read_lock_irqsave()
  read-per-cpu cpufreq_cpu_data
					per_cpu(&cpufreq_cpu_data, cpu) = NULL
					kobject_put(&policy->kobj);
  kobject_get(&policy->kobj);


And this will result in below Warnings:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
 kobject_get+0x41/0x50()
 Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
 lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
 ...
 Call Trace:
  [<ffffffff81661b14>] dump_stack+0x46/0x58
  [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
  [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff812e16d1>] kobject_get+0x41/0x50
  [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
  [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
  [<ffffffff810b8cb2>] ? up+0x32/0x50
  [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
  [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
  [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
  [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
  [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
  [<ffffffff81089566>] ? move_linked_works+0x66/0x90
  [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
  [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
  [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
  [<ffffffff8108c910>] process_one_work+0x160/0x410
  [<ffffffff8108d05b>] worker_thread+0x11b/0x520
  [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
  [<ffffffff81092421>] kthread+0xe1/0x100
  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
  [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
 ---[ end trace 89e66eb9795efdf7 ]---

And here is the actual race (+ the race mentioned above):

 Thread A: Workqueue: kacpi_notify

 acpi_processor_notify()
   acpi_processor_ppc_has_changed()
         cpufreq_update_policy()
           cpufreq_cpu_get()
             kobject_get()

 Thread B: xenbus_thread()

 xenbus_thread()
   msg->u.watch.handle->callback()
     handle_vcpu_hotplug_event()
       vcpu_hotplug()
         cpu_down()
           __cpu_notify(CPU_POST_DEAD..)
             cpufreq_cpu_callback()
               __cpufreq_remove_dev_finish()
                 cpufreq_policy_put_kobj()
                   kobject_put()

cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
freed until cpufreq_cpu_put() is called.

But the race happens when another thread puts the kobject first and updates
cpufreq_cpu_data before or later. And so the first thread gets a valid policy
structure and before it does kobject_get() on it, the second one has already
done kobject_put().

Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
too under locks.

Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
Reported-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Viresh Kumar Jan. 31, 2015, 2:14 a.m. UTC | #1
On 31 January 2015 at 06:02, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs to be cleared
> before calling kobject_put(&policy->kobj) *and* under the lock. Otherwise if
> someone else calls cpufreq_cpu_get() in parallel with it, they can obtain a
> non-NULL policy from it *after* kobject_put(&policy->kobj) was executed.
>
> Consider this case:
>
> Thread A                                Thread B
> cpufreq_cpu_get()
>   read_lock_irqsave()
>   read-per-cpu cpufreq_cpu_data
>                                         per_cpu(&cpufreq_cpu_data, cpu) = NULL
>                                         kobject_put(&policy->kobj);
>   kobject_get(&policy->kobj);
>
>
> And this will result in below Warnings:
>
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
>  kobject_get+0x41/0x50()
>  Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
>  lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
>  ...
>  Call Trace:
>   [<ffffffff81661b14>] dump_stack+0x46/0x58
>   [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
>   [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
>   [<ffffffff812e16d1>] kobject_get+0x41/0x50
>   [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
>   [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
>   [<ffffffff810b8cb2>] ? up+0x32/0x50
>   [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
>   [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
>   [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
>   [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
>   [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
>   [<ffffffff81089566>] ? move_linked_works+0x66/0x90
>   [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
>   [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
>   [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
>   [<ffffffff8108c910>] process_one_work+0x160/0x410
>   [<ffffffff8108d05b>] worker_thread+0x11b/0x520
>   [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
>   [<ffffffff81092421>] kthread+0xe1/0x100
>   [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>   [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>  ---[ end trace 89e66eb9795efdf7 ]---
>
> And here is the actual race (+ the race mentioned above):
>
>  Thread A: Workqueue: kacpi_notify
>
>  acpi_processor_notify()
>    acpi_processor_ppc_has_changed()
>          cpufreq_update_policy()
>            cpufreq_cpu_get()
>              kobject_get()
>
>  Thread B: xenbus_thread()
>
>  xenbus_thread()
>    msg->u.watch.handle->callback()
>      handle_vcpu_hotplug_event()
>        vcpu_hotplug()
>          cpu_down()
>            __cpu_notify(CPU_POST_DEAD..)
>              cpufreq_cpu_callback()
>                __cpufreq_remove_dev_finish()
>                  cpufreq_policy_put_kobj()
>                    kobject_put()
>
> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
> freed until cpufreq_cpu_put() is called.
>
> But the race happens when another thread puts the kobject first and updates
> cpufreq_cpu_data before or later. And so the first thread gets a valid policy
> structure and before it does kobject_get() on it, the second one has already
> done kobject_put().
>
> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
> too under locks.
>

Cc: <stable@vger.kernel.org>        # 3.12+
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
ethan zhao Feb. 2, 2015, 3:20 a.m. UTC | #2
Viresh,

On 2015/1/31 8:32, Viresh Kumar wrote:
> In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs to be cleared
> before calling kobject_put(&policy->kobj) *and* under the lock. Otherwise if
> someone else calls cpufreq_cpu_get() in parallel with it, they can obtain a
> non-NULL policy from it *after* kobject_put(&policy->kobj) was executed.
>
> Consider this case:
>
> Thread A				Thread B
> cpufreq_cpu_get()
>    read_lock_irqsave()
>    read-per-cpu cpufreq_cpu_data
> 					per_cpu(&cpufreq_cpu_data, cpu) = NULL
> 					kobject_put(&policy->kobj);
>    kobject_get(&policy->kobj);
>
>
> And this will result in below Warnings:
>
>   ------------[ cut here ]------------
>   WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
>   kobject_get+0x41/0x50()
>   Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
>   lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
>   ...
>   Call Trace:
>    [<ffffffff81661b14>] dump_stack+0x46/0x58
>    [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
>    [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
>    [<ffffffff812e16d1>] kobject_get+0x41/0x50
>    [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
>    [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
>    [<ffffffff810b8cb2>] ? up+0x32/0x50
>    [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
>    [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
>    [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
>    [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
>    [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
>    [<ffffffff81089566>] ? move_linked_works+0x66/0x90
>    [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
>    [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
>    [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
>    [<ffffffff8108c910>] process_one_work+0x160/0x410
>    [<ffffffff8108d05b>] worker_thread+0x11b/0x520
>    [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
>    [<ffffffff81092421>] kthread+0xe1/0x100
>    [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>    [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
>    [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>   ---[ end trace 89e66eb9795efdf7 ]---
>
> And here is the actual race (+ the race mentioned above):
>
>   Thread A: Workqueue: kacpi_notify
>
>   acpi_processor_notify()
>     acpi_processor_ppc_has_changed()
>           cpufreq_update_policy()
>             cpufreq_cpu_get()
>               kobject_get()
>
>   Thread B: xenbus_thread()
>
>   xenbus_thread()
>     msg->u.watch.handle->callback()
>       handle_vcpu_hotplug_event()
>         vcpu_hotplug()
>           cpu_down()
>             __cpu_notify(CPU_POST_DEAD..)
>               cpufreq_cpu_callback()
>                 __cpufreq_remove_dev_finish()
>                   cpufreq_policy_put_kobj()
>                     kobject_put()
>
> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
> freed until cpufreq_cpu_put() is called.
>
> But the race happens when another thread puts the kobject first and updates
> cpufreq_cpu_data before or later. And so the first thread gets a valid policy
> structure and before it does kobject_get() on it, the second one has already
> done kobject_put().
>
> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
> too under locks.
>
> Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
> Reported-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>   drivers/cpufreq/cpufreq.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4473eba1d6b0..e3bf702b5588 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>   	unsigned long flags;
>   	struct cpufreq_policy *policy;
>   
> -	read_lock_irqsave(&cpufreq_driver_lock, flags);
> +	write_lock_irqsave(&cpufreq_driver_lock, flags);
>   	policy = per_cpu(cpufreq_cpu_data, cpu);
> -	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +	per_cpu(cpufreq_cpu_data, cpu) = NULL;
> +	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>   
>   	if (!policy) {
>   		pr_debug("%s: No cpu_data found\n", __func__);
> @@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>   		}
>   	}
>   
> -	per_cpu(cpufreq_cpu_data, cpu) = NULL;
>   	return 0;
>   }
  This seems couldn't prevent all the 'bad thing' from happening, E.G.


  Thread A: Workqueue: kacpi_notify

  acpi_processor_notify()
    acpi_processor_ppc_has_changed()
          cpufreq_update_policy()
            cpufreq_cpu_get()

            beginning the deference of policy        Thread B:
            ... ... __cpufreq_remove_dev_finish()
cpufreq_policy_free(policy);


Perhaps move policy->rwsem out side the policy structure is a way to 
avoid it completely.
and you could stopping the PPC thread stepping forward as my patch as 
temporary workaround.

Thanks,
Ethan




>   

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 3:24 a.m. UTC | #3
On 2 February 2015 at 08:50, ethan zhao <ethan.zhao@oracle.com> wrote:
>  This seems couldn't prevent all the 'bad thing' from happening, E.G.
>
>
>  Thread A: Workqueue: kacpi_notify
>
>  acpi_processor_notify()
>    acpi_processor_ppc_has_changed()
>          cpufreq_update_policy()
>            cpufreq_cpu_get()

We take cpufreq_driver_lock() here, and so this will
block thread B.

>            beginning the deference of policy        Thread B:
>            ... ... __cpufreq_remove_dev_finish()
> cpufreq_policy_free(policy);
>
>
> Perhaps move policy->rwsem out side the policy structure is a way to avoid
> it completely.
> and you could stopping the PPC thread stepping forward as my patch as
> temporary workaround.

I couldn't understand your problem completely. Apart from giving a detailed
look of what's going on both threads, always specify where the BUG actually
is..
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
ethan zhao Feb. 2, 2015, 3:38 a.m. UTC | #4
On 2015/2/2 11:24, Viresh Kumar wrote:
> On 2 February 2015 at 08:50, ethan zhao <ethan.zhao@oracle.com> wrote:
>>   This seems couldn't prevent all the 'bad thing' from happening, E.G.
>>
>>
>>   Thread A: Workqueue: kacpi_notify
>>
>>   acpi_processor_notify()
>>     acpi_processor_ppc_has_changed()
>>           cpufreq_update_policy()
>>             cpufreq_cpu_get()
> We take cpufreq_driver_lock() here, and so this will
> block thread B.
  No, there is no  cpufreq_driver_lock acquired between

  cpufreq_cpu_get()  and cpufreq_cpu_put()


>>             beginning the deference of policy        Thread B:
>>             ... ... __cpufreq_remove_dev_finish()
>> cpufreq_policy_free(policy);
>>
>>
>> Perhaps move policy->rwsem out side the policy structure is a way to avoid
>> it completely.
>> and you could stopping the PPC thread stepping forward as my patch as
>> temporary workaround.
> I couldn't understand your problem completely. Apart from giving a detailed
> look of what's going on both threads, always specify where the BUG actually
> is..
The problem is you are using a rwsem inside policy structure to protect its
assessment, that is bad design.

Thanks,
Ethan

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 3:43 a.m. UTC | #5
On 2 February 2015 at 09:08, ethan zhao <ethan.zhao@oracle.com> wrote:
>> We take cpufreq_driver_lock() here, and so this will
>> block thread B.
>
>  No, there is no  cpufreq_driver_lock acquired between
>
>  cpufreq_cpu_get()  and cpufreq_cpu_put()

I am not saying that the lock is taken between them. But within
cpufreq_cpu_get() to make sure policy doesn't get freed while we
are doing kobject_get().

>>>             beginning the deference of policy        Thread B:
>>>             ... ... __cpufreq_remove_dev_finish()
>>> cpufreq_policy_free(policy);
>>>
>>>
>>> Perhaps move policy->rwsem out side the policy structure is a way to
>>> avoid
>>> it completely.
>>> and you could stopping the PPC thread stepping forward as my patch as
>>> temporary workaround.
>>
>> I couldn't understand your problem completely. Apart from giving a
>> detailed
>> look of what's going on both threads, always specify where the BUG
>> actually
>> is..
>
> The problem is you are using a rwsem inside policy structure to protect its
> assessment, that is bad design.

What is the current bug you are facing right now, I haven't understood it well.
Also a lock within the structure isn't new. Its all over the kernel. I
don't understand
why you say its a bad design.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
ethan zhao Feb. 2, 2015, 3:56 a.m. UTC | #6
On 2015/2/2 11:43, Viresh Kumar wrote:
> On 2 February 2015 at 09:08, ethan zhao <ethan.zhao@oracle.com> wrote:
>>> We take cpufreq_driver_lock() here, and so this will
>>> block thread B.
>>   No, there is no  cpufreq_driver_lock acquired between
>>
>>   cpufreq_cpu_get()  and cpufreq_cpu_put()
> I am not saying that the lock is taken between them. But within
> cpufreq_cpu_get() to make sure policy doesn't get freed while we
> are doing kobject_get().
  How to prevent the policy to be freed between

cpufreq_cpu_get()  and cpufreq_cpu_put() ?

>>>>              beginning the deference of policy        Thread B:
>>>>              ... ... __cpufreq_remove_dev_finish()
>>>> cpufreq_policy_free(policy);
>>>>
>>>>
>>>> Perhaps move policy->rwsem out side the policy structure is a way to
>>>> avoid
>>>> it completely.
>>>> and you could stopping the PPC thread stepping forward as my patch as
>>>> temporary workaround.
>>> I couldn't understand your problem completely. Apart from giving a
>>> detailed
>>> look of what's going on both threads, always specify where the BUG
>>> actually
>>> is..
>> The problem is you are using a rwsem inside policy structure to protect its
>> assessment, that is bad design.
> What is the current bug you are facing right now, I haven't understood it well.
> Also a lock within the structure isn't new. Its all over the kernel. I
> don't understand
> why you say its a bad design.
You are maxing up the water with sand ?

Thanks,
Ethan

>
> --
> viresh

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 3:59 a.m. UTC | #7
On 2 February 2015 at 09:26, ethan zhao <ethan.zhao@oracle.com> wrote:
>  How to prevent the policy to be freed between
>
> cpufreq_cpu_get()  and cpufreq_cpu_put() ?

kobject_get() increases the reference count of a policy and the policy
will only be freed when this is zero. And it will only be zero once all
cpufreq_cpu_get() are matched with corresponding cpufreq_cpu_put().

> You are maxing up the water with sand ?

:)
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
ethan zhao Feb. 2, 2015, 4:06 a.m. UTC | #8
On 2015/2/2 11:59, Viresh Kumar wrote:
> On 2 February 2015 at 09:26, ethan zhao <ethan.zhao@oracle.com> wrote:
>>   How to prevent the policy to be freed between
>>
>> cpufreq_cpu_get()  and cpufreq_cpu_put() ?
> kobject_get() increases the reference count of a policy and the policy
> will only be freed when this is zero. And it will only be zero once all
> cpufreq_cpu_get() are matched with corresponding cpufreq_cpu_put().
  Is that an idea it supposed to be or fact ?

if (!cpufreq_suspended)
             cpufreq_policy_free(policy);

  static void cpufreq_policy_free(struct cpufreq_policy *policy)
{
     free_cpumask_var(policy->related_cpus);
     free_cpumask_var(policy->cpus);
     kfree(policy);
}

  It seems you just think about it ideally in mind.

  Thanks,
  Ethan
>
>> You are maxing up the water with sand ?
> :)

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 4:09 a.m. UTC | #9
On 2 February 2015 at 09:36, ethan zhao <ethan.zhao@oracle.com> wrote:
>  Is that an idea it supposed to be or fact ?
>
> if (!cpufreq_suspended)
>             cpufreq_policy_free(policy);
>
>  static void cpufreq_policy_free(struct cpufreq_policy *policy)
> {
>     free_cpumask_var(policy->related_cpus);
>     free_cpumask_var(policy->cpus);
>     kfree(policy);
> }
>
>  It seems you just think about it ideally in mind.

if (!cpufreq_suspended)
        cpufreq_policy_put_kobj(policy);

static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
{
        ...

        kobject_put(kobj);

        /*
        * We need to make sure that the underlying kobj is
        * actually not referenced anymore by anybody before we
        * proceed with unloading.
        */
        pr_debug("waiting for dropping of refcount\n");
        wait_for_completion(cmp);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
ethan zhao Feb. 2, 2015, 4:16 a.m. UTC | #10
On 2015/2/2 12:09, Viresh Kumar wrote:
> On 2 February 2015 at 09:36, ethan zhao <ethan.zhao@oracle.com> wrote:
>>   Is that an idea it supposed to be or fact ?
>>
>> if (!cpufreq_suspended)
>>              cpufreq_policy_free(policy);
>>
>>   static void cpufreq_policy_free(struct cpufreq_policy *policy)
>> {
>>      free_cpumask_var(policy->related_cpus);
>>      free_cpumask_var(policy->cpus);
>>      kfree(policy);
>> }
>>
>>   It seems
>>   you just think about it ideally in mind.
  We am talking about the policy allocation and de-allocation. right ?
  I showed you the cpufreq_policy_free(policy) doesn't check kobject
  refcount  as above.

  Hmmm, you are still sleeping in the kobject, wake up and don't mix
  water anymore.

Thanks,
Ethan


> if (!cpufreq_suspended)
>          cpufreq_policy_put_kobj(policy);
>
> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
> {
>          ...
>
>          kobject_put(kobj);
>
>          /*
>          * We need to make sure that the underlying kobj is
>          * actually not referenced anymore by anybody before we
>          * proceed with unloading.
>          */
>          pr_debug("waiting for dropping of refcount\n");
>          wait_for_completion(cmp);
> }

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 4:26 a.m. UTC | #11
On 2 February 2015 at 09:46, ethan zhao <ethan.zhao@oracle.com> wrote:
>  We am talking about the policy allocation and de-allocation. right ?
>  I showed you the cpufreq_policy_free(policy) doesn't check kobject
>  refcount  as above.
>
>  Hmmm, you are still sleeping in the kobject, wake up and don't mix
>  water anymore.

It would be nice if we give each other the respect we deserve, And talk
about concrete points here.

>> if (!cpufreq_suspended)
>>          cpufreq_policy_put_kobj(policy);
>>
>> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
>> {
>>          ...
>>
>>          kobject_put(kobj);
>>
>>          /*
>>          * We need to make sure that the underlying kobj is
>>          * actually not referenced anymore by anybody before we
>>          * proceed with unloading.
>>          */
>>          pr_debug("waiting for dropping of refcount\n");
>>          wait_for_completion(cmp);
>> }

I gave you exactly what you wanted to go through, but it seems you
haven't tried enough.

Before freeing policy with cpufreq_policy_free(), we call
cpufreq_policy_put_kobj(). Now what does this function do? It waits
for the completion to fire (wait_for_completion()). This completion
will only fire when refcount of a kobject becomes zero.

Initially when we create the kobject, it is initialized to one. And the
last kobject_put() you see above in cpufreq_policy_put_kobj()
makes it zero. All other cpufreq_cpu_get() and put() should happen
in pairs, otherwise this refcount will never be zero again.

As soon as the refcount becomes zero, we are sure no one else is
using the policy structure anymore. And so we free it with
cpufreq_policy_free().

That routines doesn't have any tricks and simply frees the policy.
Because, before calling cpufreq_policy_put_kobj(), we have set
the per-cpu variable to NULL, nobody else will get the policy
structure by calling cpufreq_cpu_get(). And that's what my patch
tried to solve.

Let me know if I wasn't explanatory enough..

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
ethan zhao Feb. 2, 2015, 4:45 a.m. UTC | #12
On 2015/2/2 12:26, Viresh Kumar wrote:
> On 2 February 2015 at 09:46, ethan zhao <ethan.zhao@oracle.com> wrote:
>>   We am talking about the policy allocation and de-allocation. right ?
>>   I showed you the cpufreq_policy_free(policy) doesn't check kobject
>>   refcount  as above.
>>
>>   Hmmm, you are still sleeping in the kobject, wake up and don't mix
>>   water anymore.
> It would be nice if we give each other the respect we deserve, And talk
> about concrete points here.
  Welcome back to the right way.
>>> if (!cpufreq_suspended)
>>>           cpufreq_policy_put_kobj(policy);
>>>
>>> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
>>> {
>>>           ...
>>>
>>>           kobject_put(kobj);
>>>
>>>           /*
>>>           * We need to make sure that the underlying kobj is
>>>           * actually not referenced anymore by anybody before we
>>>           * proceed with unloading.
>>>           */
>>>           pr_debug("waiting for dropping of refcount\n");
>>>           wait_for_completion(cmp);
>>> }
> I gave you exactly what you wanted to go through, but it seems you
> haven't tried enough.
>
> Before freeing policy with cpufreq_policy_free(), we call
> cpufreq_policy_put_kobj(). Now what does this function do? It waits
> for the completion to fire (wait_for_completion()). This completion
> will only fire when refcount of a kobject becomes zero.
>
> Initially when we create the kobject, it is initialized to one. And the
> last kobject_put() you see above in cpufreq_policy_put_kobj()
> makes it zero. All other cpufreq_cpu_get() and put() should happen
> in pairs, otherwise this refcount will never be zero again.
>
> As soon as the refcount becomes zero, we are sure no one else is
> using the policy structure anymore. And so we free it with
> cpufreq_policy_free().
  But there is no checking against refcount in or before

  cpufreq_policy_free(), that is one issue I mentioned.

>
> That routines doesn't have any tricks and simply frees the policy.
> Because, before calling cpufreq_policy_put_kobj(), we have set
> the per-cpu variable to NULL, nobody else will get the policy
  It is possible cpufreq_cpu_get() within the PPC thread was called just
  before __cpufreq_remove_dev_finish() is to be called in another thread,
  so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
  the actions between cpufreq_cpu_get and cpufreq_cpu_put().

  And then the freeing happens in __cpufreq_remove_dev_finish().
> structure by calling cpufreq_cpu_get(). And that's what my patch
> tried to solve.
>
> Let me know if I wasn't explanatory enough..

Ethan
>
> --
> viresh

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 4:54 a.m. UTC | #13
On 2 February 2015 at 10:15, ethan zhao <ethan.zhao@oracle.com> wrote:
> On 2015/2/2 12:26, Viresh Kumar wrote:

>  But there is no checking against refcount in or before
>
>  cpufreq_policy_free(), that is one issue I mentioned.

As I said earlier, the completion will only fire once the refcount
is zero. And so there is no need of any check here.

>> That routines doesn't have any tricks and simply frees the policy.
>> Because, before calling cpufreq_policy_put_kobj(), we have set
>> the per-cpu variable to NULL, nobody else will get the policy
>
>  It is possible cpufreq_cpu_get() within the PPC thread was called just
>  before __cpufreq_remove_dev_finish() is to be called in another thread,
>  so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
>  the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>
>  And then the freeing happens in __cpufreq_remove_dev_finish().

It will.. You aren't looking closely enough. If cpufreq_cpu_get() is called just
before remove-dev, then cpufreq_cpu_get() will take:

read_lock_irqsave(&cpufreq_driver_lock, flags);

And it will do:

read_unlock_irqrestore(&cpufreq_driver_lock, flags);

only after increasing the refcount with kobject_get().

While on the other side __cpufreq_remove_dev_finish() will do this:

       write_lock_irqsave(&cpufreq_driver_lock, flags);
       policy = per_cpu(cpufreq_cpu_data, cpu);
       per_cpu(cpufreq_cpu_data, cpu) = NULL;
       write_unlock_irqrestore(&cpufreq_driver_lock, flags);

So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
setting per-cpu variable to NULL. And so, after kobject_put() in
cpufreq_policy_put_kobj(), we will wait for the completion to fire
and that will only happen once a corresponding cpufreq_cpu_put()
is issued.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
ethan zhao Feb. 2, 2015, 3:04 p.m. UTC | #14
On 2015/2/2 12:54, Viresh Kumar wrote:
> On 2 February 2015 at 10:15, ethan zhao <ethan.zhao@oracle.com> wrote:
>> On 2015/2/2 12:26, Viresh Kumar wrote:
>>   But there is no checking against refcount in or before
>>
>>   cpufreq_policy_free(), that is one issue I mentioned.
> As I said earlier, the completion will only fire once the refcount
> is zero. And so there is no need of any check here.
>
>>> That routines doesn't have any tricks and simply frees the policy.
>>> Because, before calling cpufreq_policy_put_kobj(), we have set
>>> the per-cpu variable to NULL, nobody else will get the policy
>>   It is possible cpufreq_cpu_get() within the PPC thread was called just
>>   before __cpufreq_remove_dev_finish() is to be called in another thread,
>>   so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
>>   the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>>
>>   And then the freeing happens in __cpufreq_remove_dev_finish().
> It will.. You aren't looking closely enough. If cpufreq_cpu_get() is called just
> before remove-dev, then cpufreq_cpu_get() will take:
>
> read_lock_irqsave(&cpufreq_driver_lock, flags);
>
> And it will do:
>
> read_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> only after increasing the refcount with kobject_get().
>
> While on the other side __cpufreq_remove_dev_finish() will do this:
>
>         write_lock_irqsave(&cpufreq_driver_lock, flags);
>         policy = per_cpu(cpufreq_cpu_data, cpu);
>         per_cpu(cpufreq_cpu_data, cpu) = NULL;
>         write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
> setting per-cpu variable to NULL. And so, after kobject_put() in
> cpufreq_policy_put_kobj(), we will wait for the completion to fire
  Closely enough this time, understood, thanks for your explanation.


  Ethan
> and that will only happen once a corresponding cpufreq_cpu_put()
> is issued.

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ethan Zhao Feb. 25, 2015, 3:24 a.m. UTC | #15
Viresh,
  With this patch applied, still got the following warning and panic,
seems it needs more care.

     54.474618] ------------[ cut here ]------------
[   54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
kobject_get+0x41/0x50()
[   54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci
dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[   55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
3.18.5
[   55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER
   /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
[   55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
[   55.308598]  0000000000000000 00000000bd730b61 ffff88046742baf8
ffffffff816b7edb
[   55.398305]  0000000000000000 0000000000000000 ffff88046742bb38
ffffffff81078ae1
[   55.488040]  ffff88046742bbd8 ffff8806706b3000 0000000000000292
0000000000000000
[   55.577776] Call Trace:
[   55.608228]  [<ffffffff816b7edb>] dump_stack+0x46/0x58
[   55.670895]  [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
[   55.743952]  [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
[   55.814929]  [<ffffffff8130d0d1>] kobject_get+0x41/0x50
[   55.878654]  [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
[   55.946528]  [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
[   56.021682]  [<ffffffff810bf9d2>] ? up+0x32/0x50
[   56.078126]  [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
[   56.148974]  [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
[   56.226066]  [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
[   56.295871]  [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
[   56.365661]  [<ffffffff813bbcd4>] acpi_processor_ppc_has_changed+0x77/0x82
[   56.448956]  [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
[   56.520842]  [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
[   56.594807]  [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
[   56.670859]  [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
[   56.747936]  [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
[   56.818766]  [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
[   56.886486]  [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
[   56.957316]  [<ffffffff810984f1>] kthread+0xe1/0x100
[   57.017742]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
[   57.096903]  [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
[   57.162534]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
[   57.241680] ---[ end trace dce06bb76f547de5 ]---


Any idea ?

Thanks,
Ethan

On Mon, Feb 2, 2015 at 11:04 PM, ethan zhao <ethan.zhao@oracle.com> wrote:
>
> On 2015/2/2 12:54, Viresh Kumar wrote:
>>
>> On 2 February 2015 at 10:15, ethan zhao <ethan.zhao@oracle.com> wrote:
>>>
>>> On 2015/2/2 12:26, Viresh Kumar wrote:
>>>   But there is no checking against refcount in or before
>>>
>>>   cpufreq_policy_free(), that is one issue I mentioned.
>>
>> As I said earlier, the completion will only fire once the refcount
>> is zero. And so there is no need of any check here.
>>
>>>> That routines doesn't have any tricks and simply frees the policy.
>>>> Because, before calling cpufreq_policy_put_kobj(), we have set
>>>> the per-cpu variable to NULL, nobody else will get the policy
>>>
>>>   It is possible cpufreq_cpu_get() within the PPC thread was called just
>>>   before __cpufreq_remove_dev_finish() is to be called in another thread,
>>>   so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
>>>   the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>>>
>>>   And then the freeing happens in __cpufreq_remove_dev_finish().
>>
>> It will.. You aren't looking closely enough. If cpufreq_cpu_get() is
>> called just
>> before remove-dev, then cpufreq_cpu_get() will take:
>>
>> read_lock_irqsave(&cpufreq_driver_lock, flags);
>>
>> And it will do:
>>
>> read_unlock_irqrestore(&cpufreq_driver_lock, flags);
>>
>> only after increasing the refcount with kobject_get().
>>
>> While on the other side __cpufreq_remove_dev_finish() will do this:
>>
>>         write_lock_irqsave(&cpufreq_driver_lock, flags);
>>         policy = per_cpu(cpufreq_cpu_data, cpu);
>>         per_cpu(cpufreq_cpu_data, cpu) = NULL;
>>         write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>>
>> So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
>> setting per-cpu variable to NULL. And so, after kobject_put() in
>> cpufreq_policy_put_kobj(), we will wait for the completion to fire
>
>  Closely enough this time, understood, thanks for your explanation.
>
>
>  Ethan
>
>> and that will only happen once a corresponding cpufreq_cpu_put()
>> is issued.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..e3bf702b5588 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1409,9 +1409,10 @@  static int __cpufreq_remove_dev_finish(struct device *dev,
 	unsigned long flags;
 	struct cpufreq_policy *policy;
 
-	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	write_lock_irqsave(&cpufreq_driver_lock, flags);
 	policy = per_cpu(cpufreq_cpu_data, cpu);
-	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+	per_cpu(cpufreq_cpu_data, cpu) = NULL;
+	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (!policy) {
 		pr_debug("%s: No cpu_data found\n", __func__);
@@ -1466,7 +1467,6 @@  static int __cpufreq_remove_dev_finish(struct device *dev,
 		}
 	}
 
-	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 	return 0;
 }