Message ID | ed8fd187687cb4ea9afd0bc32107ca5abf03e679.1422580135.git.viresh.kumar@linaro.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Viresh, On 2015/1/30 9:13, Viresh Kumar wrote: > cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances > has missed this. And as a result we get this: > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 > kobject_get+0x41/0x50() > Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl > lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon > ... > Call Trace: > [<ffffffff81661b14>] dump_stack+0x46/0x58 > [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0 > [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20 > [<ffffffff812e16d1>] kobject_get+0x41/0x50 > [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0 > [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0 > [<ffffffff810b8cb2>] ? up+0x32/0x50 > [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2 > [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252 > [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0 > [<ffffffff81360967>] ? acpi_has_method+0x25/0x40 > [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82 > [<ffffffff81089566>] ? move_linked_works+0x66/0x90 > [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7 > [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c > [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22 > [<ffffffff8108c910>] process_one_work+0x160/0x410 > [<ffffffff8108d05b>] worker_thread+0x11b/0x520 > [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380 > [<ffffffff81092421>] kthread+0xe1/0x100 > [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 > [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0 > [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 > ---[ end trace 89e66eb9795efdf7 ]--- > > And here is the race: > > Thread A: Workqueue: kacpi_notify > > acpi_processor_notify() > acpi_processor_ppc_has_changed() > cpufreq_update_policy() > cpufreq_cpu_get() > kobject_get() > > Thread B: xenbus_thread() > > xenbus_thread() > msg->u.watch.handle->callback() > handle_vcpu_hotplug_event() > vcpu_hotplug() > cpu_down() > __cpu_notify(CPU_POST_DEAD..) > cpufreq_cpu_callback() > __cpufreq_remove_dev_finish() > cpufreq_policy_put_kobj() > kobject_put() > > cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under > cpufreq_driver_lock, and once it gets a valid policy it expects it to not be > freed until cpufreq_cpu_put() is called. It is another bug. > > But the race happens when another thread puts the kobject first and updates > cpufreq_cpu_data later and that too without these locks. And so the first thread > gets a valid policy structure and before it does kobject_get() on it, the second > one does kobject_put(). And so this WARN(). > > Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that > too under locks. > > Cc: <stable@vger.kernel.org> # 3.12+ > Reported-by: Ethan Zhao <ethan.zhao@oracle.com> > Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > --- > @Santosh: I have changed read locks to write locks here and so you need to test > again. > > drivers/cpufreq/cpufreq.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 4473eba1d6b0..e3bf702b5588 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev, > unsigned long flags; > struct cpufreq_policy *policy; > > - read_lock_irqsave(&cpufreq_driver_lock, flags); > + write_lock_irqsave(&cpufreq_driver_lock, flags); > policy = per_cpu(cpufreq_cpu_data, cpu); > - read_unlock_irqrestore(&cpufreq_driver_lock, flags); > + per_cpu(cpufreq_cpu_data, cpu) = NULL; > + write_unlock_irqrestore(&cpufreq_driver_lock, flags); > > if (!policy) { > pr_debug("%s: No cpu_data found\n", __func__); > @@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev, > } > } > > - per_cpu(cpufreq_cpu_data, cpu) = NULL; Yes, here is one bug should be fix. but seems not enough to avoid the issue completely, how about the Thread B running here Thread B: xenbus_thread() xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_POST_DEAD..) cpufreq_cpu_callback() __cpufreq_remove_dev_prepare update_policy_cpu(){ ... down_write(&policy->rwsem); policy->last_cpu = policy->cpu; policy->cpu = cpu; up_write(&policy->rwsem); ----> } And thread A run to the same position as above, don't ignore my work blindly, that piece of bandage could save your time :> Thanks, Ethan > return 0; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 30 January 2015 at 07:00, ethan zhao <ethan.zhao@oracle.com> wrote: > It is another bug. Hmm, lets see.. > Yes, here is one bug should be fix. but seems not enough to avoid the issue > completely, Did you test it ? > how about the Thread B running here > > Thread B: xenbus_thread() > > xenbus_thread() > msg->u.watch.handle->callback() > handle_vcpu_hotplug_event() > vcpu_hotplug() > cpu_down() > __cpu_notify(CPU_POST_DEAD..) > cpufreq_cpu_callback() > __cpufreq_remove_dev_prepare > update_policy_cpu(){ > ... > down_write(&policy->rwsem); > policy->last_cpu = policy->cpu; > policy->cpu = cpu; > up_write(&policy->rwsem); > ----> > } > > And thread A run to the same position as above, don't ignore my work > blindly, that piece of bandage > could save your time Oh, please don't misunderstand me. I didn't had any intention of showing any kind of disrespect. Okay, why do you say that the above thread you shown has a bug in there? Its juse updating policy->cpu and that shouldn't make anything unstable. Please explain again one more time, in details why do you think you hit a different bug. Also look at kref.h to see which piece of code has hit that WARN() and it will happen only in the case I have shown. Lets see if I missed something :) -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Viresh, On 2015/1/30 10:05, Viresh Kumar wrote: > On 30 January 2015 at 07:00, ethan zhao <ethan.zhao@oracle.com> wrote: > >> It is another bug. > Hmm, lets see.. > >> Yes, here is one bug should be fix. but seems not enough to avoid the issue >> completely, > Did you test it ? For a PPC notification and xen-bus thread race, could you tell me a way how to reproduce it by trigger the PPC notification and xen-bus events manually ? You really want me write some code into a test kernel to flood the PPC and xen-bus at the same time ? if we could analysis code and get the issue clearly, we wouldn't wait the users to yell out. Thanks, Ethan > >> how about the Thread B running here >> >> Thread B: xenbus_thread() >> >> xenbus_thread() >> msg->u.watch.handle->callback() >> handle_vcpu_hotplug_event() >> vcpu_hotplug() >> cpu_down() >> __cpu_notify(CPU_POST_DEAD..) >> cpufreq_cpu_callback() >> __cpufreq_remove_dev_prepare >> update_policy_cpu(){ >> ... >> down_write(&policy->rwsem); >> policy->last_cpu = policy->cpu; >> policy->cpu = cpu; >> up_write(&policy->rwsem); >> ----> >> } >> >> And thread A run to the same position as above, don't ignore my work >> blindly, that piece of bandage >> could save your time > Oh, please don't misunderstand me. I didn't had any intention of showing > any kind of disrespect. > > Okay, why do you say that the above thread you shown has a bug in there? > Its juse updating policy->cpu and that shouldn't make anything unstable. > > Please explain again one more time, in details why do you think you hit a > different bug. Also look at kref.h to see which piece of code has hit that > WARN() and it will happen only in the case I have shown. Lets see if I > missed something :) > > -- > viresh -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 30 January 2015 at 07:40, ethan zhao <ethan.zhao@oracle.com> wrote: > For a PPC notification and xen-bus thread race, could you tell me a way how > to reproduce it by trigger the PPC notification and xen-bus events manually > ? > You really want me write some code into a test kernel to flood the PPC and > xen-bus at the same time ? if we could analysis code and get the issue > clearly, we wouldn't wait the users to yell out. I thought you already have a test where you are hitting the issue you originally reported. Atleast Santosh did confirm that he is hitting 3/5 times in his kernel during boot.. My reasoning of why your observation doesn't fit here: Copying from your earlier mail.. Thread A: Workqueue: kacpi_notify acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get() This tries to increment the count and the warning you have mentioned happen because: WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2); i.e. even after incrementing the count, it is < 2. Which I believe will be 1. Which means that we have tried to do kobject_get() on a kobject for which kobject_put() is already done. Thread B: xenbus_thread() xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_DOWN_PREPARE..) cpufreq_cpu_callback() __cpufreq_remove_dev_prepare() update_policy_cpu() kobject_move() Okay, where is the race or kobject_put() here ? We are just moving the kobject and it has nothing to do with the refcount of kobject. Why do you see its a race ? -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015/1/30 10:13, Viresh Kumar wrote: > On 30 January 2015 at 07:40, ethan zhao <ethan.zhao@oracle.com> wrote: >> For a PPC notification and xen-bus thread race, could you tell me a way how >> to reproduce it by trigger the PPC notification and xen-bus events manually >> ? >> You really want me write some code into a test kernel to flood the PPC and >> xen-bus at the same time ? if we could analysis code and get the issue >> clearly, we wouldn't wait the users to yell out. > I thought you already have a test where you are hitting the issue you originally > reported. Atleast Santosh did confirm that he is hitting 3/5 times in his kernel > during boot.. As I know, PPC notification only happens when power capping needed, maybe the server over-hot, if the cooling condition recover, you couldn't reproduce it either !. > > My reasoning of why your observation doesn't fit here: > > Copying from your earlier mail.. > > Thread A: Workqueue: kacpi_notify > > acpi_processor_notify() > acpi_processor_ppc_has_changed() > cpufreq_update_policy() > cpufreq_cpu_get() > kobject_get() > > This tries to increment the count and the warning you have mentioned > happen because: > > WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2); > > i.e. even after incrementing the count, it is < 2. Which I believe will be > 1. Which means that we have tried to do kobject_get() on a kobject > for which kobject_put() is already done. > > Thread B: xenbus_thread() > > xenbus_thread() > msg->u.watch.handle->callback() > handle_vcpu_hotplug_event() > vcpu_hotplug() > cpu_down() > __cpu_notify(CPU_DOWN_PREPARE..) > cpufreq_cpu_callback() > __cpufreq_remove_dev_prepare() > update_policy_cpu() > kobject_move() > > > Okay, where is the race or kobject_put() here ? We are just moving > the kobject and it has nothing to do with the refcount of kobject. > > Why do you see its a race ? I mean the policy->cpu has been changed, that CPU is about to be down, Thread A continue to get and update the policy for it blindly, that is what I Say 'race', not the refcount itself. Thanks, Ethan -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 30 January 2015 at 07:51, ethan zhao <ethan.zhao@oracle.com> wrote: >> My reasoning of why your observation doesn't fit here: >> >> Copying from your earlier mail.. >> >> Thread A: Workqueue: kacpi_notify >> >> acpi_processor_notify() >> acpi_processor_ppc_has_changed() >> cpufreq_update_policy() >> cpufreq_cpu_get() >> kobject_get() >> >> This tries to increment the count and the warning you have mentioned >> happen because: >> >> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2); >> >> i.e. even after incrementing the count, it is < 2. Which I believe will be >> 1. Which means that we have tried to do kobject_get() on a kobject >> for which kobject_put() is already done. >> >> Thread B: xenbus_thread() >> >> xenbus_thread() >> msg->u.watch.handle->callback() >> handle_vcpu_hotplug_event() >> vcpu_hotplug() >> cpu_down() >> __cpu_notify(CPU_DOWN_PREPARE..) >> cpufreq_cpu_callback() >> __cpufreq_remove_dev_prepare() >> update_policy_cpu() >> kobject_move() >> >> >> Okay, where is the race or kobject_put() here ? We are just moving >> the kobject and it has nothing to do with the refcount of kobject. >> >> Why do you see its a race ? > > I mean the policy->cpu has been changed, that CPU is about to be down, > Thread A continue to get and update the policy for it blindly, that is > what I Say 'race', not the refcount itself. First of all, the WARN you had in your patch doesn't have anything to do with the so-called race you just define. Its because of the reason I defined earlier. Second, what if policy->cpu is getting updated? A policy manages a group of CPUs, not a single cpu. And there still are other CPUs online for that policy and so kobject_get() for that policy->kobj is perfectly valid. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015/1/30 11:14, Viresh Kumar wrote: > On 30 January 2015 at 07:51, ethan zhao <ethan.zhao@oracle.com> wrote: >>> My reasoning of why your observation doesn't fit here: >>> >>> Copying from your earlier mail.. >>> >>> Thread A: Workqueue: kacpi_notify >>> >>> acpi_processor_notify() >>> acpi_processor_ppc_has_changed() >>> cpufreq_update_policy() >>> cpufreq_cpu_get() >>> kobject_get() >>> >>> This tries to increment the count and the warning you have mentioned >>> happen because: >>> >>> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2); >>> >>> i.e. even after incrementing the count, it is < 2. Which I believe will be >>> 1. Which means that we have tried to do kobject_get() on a kobject >>> for which kobject_put() is already done. >>> >>> Thread B: xenbus_thread() >>> >>> xenbus_thread() >>> msg->u.watch.handle->callback() >>> handle_vcpu_hotplug_event() >>> vcpu_hotplug() >>> cpu_down() >>> __cpu_notify(CPU_DOWN_PREPARE..) >>> cpufreq_cpu_callback() >>> __cpufreq_remove_dev_prepare() >>> update_policy_cpu() >>> kobject_move() >>> >>> >>> Okay, where is the race or kobject_put() here ? We are just moving >>> the kobject and it has nothing to do with the refcount of kobject. >>> >>> Why do you see its a race ? >> I mean the policy->cpu has been changed, that CPU is about to be down, >> Thread A continue to get and update the policy for it blindly, that is >> what I Say 'race', not the refcount itself. > First of all, the WARN you had in your patch doesn't have anything to do > with the so-called race you just define. Its because of the reason I defined > earlier. > > Second, what if policy->cpu is getting updated? A policy manages a group > of CPUs, not a single cpu. And there still are other CPUs online for that > policy and so kobject_get() for that policy->kobj is perfectly valid. You mean the policy is shared by all CPUs, so PPC notification about one CPU should update all CPU's policy, right ? even the requested CPU is shutting down. Thanks, Ethan -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 30 January 2015 at 09:16, ethan zhao <ethan.zhao@oracle.com> wrote: > You mean the policy is shared by all CPUs, so PPC notification about one A policy may or maynot be shared by multiple CPUs. It all depends on your systems configuration. All CPUs which share clock line, share a policy. You can verify that from /sys/devices/system/cpu/cpu*/related_cpus. This gives the CPUs that share policy. > CPU should update all CPU's policy, right ? even the requested CPU is > shutting down. CPUs sharing policy have a single policy structure. And so policy would be updated for all CPUs that share the poilcy. Even if some cpu goes down, the policy might still be up and running. -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 1/30/2015 2:57 PM, Rafael J. Wysocki wrote: > On Friday, January 30, 2015 06:43:12 AM Viresh Kumar wrote: >> cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances >> has missed this. > > No, it isn't. cpufreq_cpu_data is a pointer and doesn't need any locks to > protect it. What the lock does is to guarantee the callers of cpufreq_cpu_get() > that the policy object won't go away from under them (it is used for some other > purposes too, but they are unrelated). > > What technically happens is an ordering problem. per_cpu(cpufreq_cpu_data, cpu) > needs to be cleared before calling kobject_put(&policy->kobj) *and* under the > lock, because otherwise if someone else calls cpufreq_cpu_get() in parallel > with that, they can obtain a non-NULL from it *after* kobject_put(&policy->kobj) > was executed. > > So the lock is needed not just because it protects cpufreq_cpu_data, but because > it is supposed to prevent writes to cpufreq_cpu_data from happening between the > read from it and the kobject_get(&policy->kobj) in cpufreq_cpu_get(). > >> And as a result we get this: >> >> ------------[ cut here ]------------ >> WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 >> kobject_get+0x41/0x50() >> Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl >> lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon >> ... >> Call Trace: >> [<ffffffff81661b14>] dump_stack+0x46/0x58 >> [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0 >> [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20 >> [<ffffffff812e16d1>] kobject_get+0x41/0x50 >> [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0 >> [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0 >> [<ffffffff810b8cb2>] ? up+0x32/0x50 >> [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2 >> [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252 >> [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0 >> [<ffffffff81360967>] ? acpi_has_method+0x25/0x40 >> [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82 >> [<ffffffff81089566>] ? move_linked_works+0x66/0x90 >> [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7 >> [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c >> [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22 >> [<ffffffff8108c910>] process_one_work+0x160/0x410 >> [<ffffffff8108d05b>] worker_thread+0x11b/0x520 >> [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380 >> [<ffffffff81092421>] kthread+0xe1/0x100 >> [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 >> [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0 >> [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 >> ---[ end trace 89e66eb9795efdf7 ]--- >> >> And here is the race: >> >> Thread A: Workqueue: kacpi_notify >> >> acpi_processor_notify() >> acpi_processor_ppc_has_changed() >> cpufreq_update_policy() >> cpufreq_cpu_get() >> kobject_get() >> >> Thread B: xenbus_thread() >> >> xenbus_thread() >> msg->u.watch.handle->callback() >> handle_vcpu_hotplug_event() >> vcpu_hotplug() >> cpu_down() >> __cpu_notify(CPU_POST_DEAD..) >> cpufreq_cpu_callback() >> __cpufreq_remove_dev_finish() >> cpufreq_policy_put_kobj() >> kobject_put() >> >> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under >> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be >> freed until cpufreq_cpu_put() is called. > > Because it does the kobject_get() under the lock too. > >> But the race happens when another thread puts the kobject first and updates >> cpufreq_cpu_data later > > This is an ordering problem. > >> and that too without these locks. > > And this is racy. > >> And so the first thread >> gets a valid policy structure and before it does kobject_get() on it, the second >> one does kobject_put(). And so this WARN(). >> >> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that >> too under locks. > > That's almost correct. In addition to the above (albeit maybe unintentionally) > the patch also fixes the possible race between two instances of > __cpufreq_remove_dev_finish() with the same arguments running in parallel with > each other. The proof is left as an exercise to the reader. :-) > > Please try to improve the changelog ... > >> Cc: <stable@vger.kernel.org> # 3.12+ >> Reported-by: Ethan Zhao <ethan.zhao@oracle.com> >> Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> >> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> >> --- >> @Santosh: I have changed read locks to write locks here and so you need to test >> again. >> Right. Tested this version and confirm that it fixes the reported WARN() Regards, Santosh -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, January 30, 2015 06:43:12 AM Viresh Kumar wrote: > cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances > has missed this. No, it isn't. cpufreq_cpu_data is a pointer and doesn't need any locks to protect it. What the lock does is to guarantee the callers of cpufreq_cpu_get() that the policy object won't go away from under them (it is used for some other purposes too, but they are unrelated). What technically happens is an ordering problem. per_cpu(cpufreq_cpu_data, cpu) needs to be cleared before calling kobject_put(&policy->kobj) *and* under the lock, because otherwise if someone else calls cpufreq_cpu_get() in parallel with that, they can obtain a non-NULL from it *after* kobject_put(&policy->kobj) was executed. So the lock is needed not just because it protects cpufreq_cpu_data, but because it is supposed to prevent writes to cpufreq_cpu_data from happening between the read from it and the kobject_get(&policy->kobj) in cpufreq_cpu_get(). > And as a result we get this: > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 > kobject_get+0x41/0x50() > Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl > lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon > ... > Call Trace: > [<ffffffff81661b14>] dump_stack+0x46/0x58 > [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0 > [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20 > [<ffffffff812e16d1>] kobject_get+0x41/0x50 > [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0 > [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0 > [<ffffffff810b8cb2>] ? up+0x32/0x50 > [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2 > [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252 > [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0 > [<ffffffff81360967>] ? acpi_has_method+0x25/0x40 > [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82 > [<ffffffff81089566>] ? move_linked_works+0x66/0x90 > [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7 > [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c > [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22 > [<ffffffff8108c910>] process_one_work+0x160/0x410 > [<ffffffff8108d05b>] worker_thread+0x11b/0x520 > [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380 > [<ffffffff81092421>] kthread+0xe1/0x100 > [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 > [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0 > [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 > ---[ end trace 89e66eb9795efdf7 ]--- > > And here is the race: > > Thread A: Workqueue: kacpi_notify > > acpi_processor_notify() > acpi_processor_ppc_has_changed() > cpufreq_update_policy() > cpufreq_cpu_get() > kobject_get() > > Thread B: xenbus_thread() > > xenbus_thread() > msg->u.watch.handle->callback() > handle_vcpu_hotplug_event() > vcpu_hotplug() > cpu_down() > __cpu_notify(CPU_POST_DEAD..) > cpufreq_cpu_callback() > __cpufreq_remove_dev_finish() > cpufreq_policy_put_kobj() > kobject_put() > > cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under > cpufreq_driver_lock, and once it gets a valid policy it expects it to not be > freed until cpufreq_cpu_put() is called. Because it does the kobject_get() under the lock too. > But the race happens when another thread puts the kobject first and updates > cpufreq_cpu_data later This is an ordering problem. > and that too without these locks. And this is racy. > And so the first thread > gets a valid policy structure and before it does kobject_get() on it, the second > one does kobject_put(). And so this WARN(). > > Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that > too under locks. That's almost correct. In addition to the above (albeit maybe unintentionally) the patch also fixes the possible race between two instances of __cpufreq_remove_dev_finish() with the same arguments running in parallel with each other. The proof is left as an exercise to the reader. :-) Please try to improve the changelog ... > Cc: <stable@vger.kernel.org> # 3.12+ > Reported-by: Ethan Zhao <ethan.zhao@oracle.com> > Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > --- > @Santosh: I have changed read locks to write locks here and so you need to test > again. > > drivers/cpufreq/cpufreq.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 4473eba1d6b0..e3bf702b5588 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev, > unsigned long flags; > struct cpufreq_policy *policy; > > - read_lock_irqsave(&cpufreq_driver_lock, flags); > + write_lock_irqsave(&cpufreq_driver_lock, flags); > policy = per_cpu(cpufreq_cpu_data, cpu); > - read_unlock_irqrestore(&cpufreq_driver_lock, flags); > + per_cpu(cpufreq_cpu_data, cpu) = NULL; > + write_unlock_irqrestore(&cpufreq_driver_lock, flags); > > if (!policy) { > pr_debug("%s: No cpu_data found\n", __func__); > @@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev, > } > } > > - per_cpu(cpufreq_cpu_data, cpu) = NULL; > return 0; > } > >
On 31 January 2015 at 04:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > No, it isn't. cpufreq_cpu_data is a pointer and doesn't need any locks to > protect it. What the lock does is to guarantee the callers of cpufreq_cpu_get() > that the policy object won't go away from under them (it is used for some other > purposes too, but they are unrelated). Yeah, its not just locking. Otherwise the locking only at the bottom of this routine should have fixed it. > That's almost correct. In addition to the above (albeit maybe unintentionally) > the patch also fixes the possible race between two instances of > __cpufreq_remove_dev_finish() with the same arguments running in parallel with > each other. The proof is left as an exercise to the reader. :-) Two __cpufreq_remove_dev_finish() can't get called in parallel, so it doesn't fix any race there :). > Please try to improve the changelog ... Yes. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Viresh, On 2015/1/30 12:14, Viresh Kumar wrote: > On 30 January 2015 at 09:16, ethan zhao <ethan.zhao@oracle.com> wrote: >> You mean the policy is shared by all CPUs, so PPC notification about one > A policy may or maynot be shared by multiple CPUs. It all depends on your > systems configuration. All CPUs which share clock line, share a policy. > > You can verify that from /sys/devices/system/cpu/cpu*/related_cpus. This > gives the CPUs that share policy. > >> CPU should update all CPU's policy, right ? even the requested CPU is >> shutting down. > CPUs sharing policy have a single policy structure. And so policy would > be updated for all CPUs that share the poilcy. Even if some cpu goes down, > the policy might still be up and running. Let' me check ACPI spec and BIOS to match your implementation. About that patch you suggested, there is another question left pending: Policy will be freed only when that's last CPU shutting down, how does it happen when system booting ? The description of the patch would be wrong (the Xen_bus call stack) -- Did the xen_bus shut down every CPU till the last during booting ? Thanks, Ethan > > -- > viresh -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2 February 2015 at 07:24, ethan zhao <ethan.zhao@oracle.com> wrote: > Policy will be freed only when that's last CPU shutting down, how does it > happen when system booting ? I couldn't understand what you are asking here. Though policy will be allocated for the first CPU of every policy during boot.. > The description of the patch would be wrong (the Xen_bus call stack) -- > Did the xen_bus shut down every CPU till the last during booting ? I don't know about that.. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4473eba1d6b0..e3bf702b5588 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev, unsigned long flags; struct cpufreq_policy *policy; - read_lock_irqsave(&cpufreq_driver_lock, flags); + write_lock_irqsave(&cpufreq_driver_lock, flags); policy = per_cpu(cpufreq_cpu_data, cpu); - read_unlock_irqrestore(&cpufreq_driver_lock, flags); + per_cpu(cpufreq_cpu_data, cpu) = NULL; + write_unlock_irqrestore(&cpufreq_driver_lock, flags); if (!policy) { pr_debug("%s: No cpu_data found\n", __func__); @@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev, } } - per_cpu(cpufreq_cpu_data, cpu) = NULL; return 0; }
cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances has missed this. And as a result we get this: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 kobject_get+0x41/0x50() Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon ... Call Trace: [<ffffffff81661b14>] dump_stack+0x46/0x58 [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0 [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20 [<ffffffff812e16d1>] kobject_get+0x41/0x50 [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0 [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0 [<ffffffff810b8cb2>] ? up+0x32/0x50 [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2 [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252 [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0 [<ffffffff81360967>] ? acpi_has_method+0x25/0x40 [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82 [<ffffffff81089566>] ? move_linked_works+0x66/0x90 [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7 [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22 [<ffffffff8108c910>] process_one_work+0x160/0x410 [<ffffffff8108d05b>] worker_thread+0x11b/0x520 [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380 [<ffffffff81092421>] kthread+0xe1/0x100 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0 [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0 ---[ end trace 89e66eb9795efdf7 ]--- And here is the race: Thread A: Workqueue: kacpi_notify acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get() Thread B: xenbus_thread() xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_POST_DEAD..) cpufreq_cpu_callback() __cpufreq_remove_dev_finish() cpufreq_policy_put_kobj() kobject_put() cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under cpufreq_driver_lock, and once it gets a valid policy it expects it to not be freed until cpufreq_cpu_put() is called. But the race happens when another thread puts the kobject first and updates cpufreq_cpu_data later and that too without these locks. And so the first thread gets a valid policy structure and before it does kobject_get() on it, the second one does kobject_put(). And so this WARN(). Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that too under locks. Cc: <stable@vger.kernel.org> # 3.12+ Reported-by: Ethan Zhao <ethan.zhao@oracle.com> Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> --- @Santosh: I have changed read locks to write locks here and so you need to test again. drivers/cpufreq/cpufreq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)