Message ID | 2362640.pUofnXyzOi@vostro.rjw.lan (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On 08/01/2013 05:38 AM, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The cpufreq core is a little inconsistent in the way it uses the > driver module refcount. > > Namely, if __cpufreq_add_dev() is called for a CPU without siblings > or generally a CPU for which a new policy object has to be created, > it grabs a reference to the driver module to start with, but drops > that reference before returning. As a result, the driver module > refcount is then equal to 0 after __cpufreq_add_dev() has returned. > > On the other hand, if the given CPU is a sibling of some other > CPU already having a policy, cpufreq_add_policy_cpu() is called > to link the new CPU to the existing policy. In that case, > cpufreq_cpu_get() is called to obtain that policy and grabs a > reference to the driver module, but that reference is not > released and the module refcount will be different from 0 after > __cpufreq_add_dev() returns (unless there is an error). That > prevents the driver module from being unloaded until > __cpufreq_remove_dev() is called for all the CPUs that > cpufreq_add_policy_cpu() was called for previously. > > To remove that inconsistency make cpufreq_add_policy_cpu() execute > cpufreq_cpu_put() for the given policy before returning, which > decrements the driver module refcount so that it will be 0 after > __cpufreq_add_dev() returns, Removing the inconsistency is a good thing, but I think we should make it consistent the other way around - make a CPU-online increment the driver module refcount and decrement it only on CPU-offline. The reason is, one should not be able to unload the back-end cpufreq driver module when some CPUs are still being managed. Nasty things will result if we allow that. For example, if we unload the module, and then try to do a CPU offline, then the cpufreq hotplug notifier won't even be called (because cpufreq_unregister_driver also unregisters the hotplug notifier). And that might be troublesome. Even worse, if we unload a cpufreq driver module and load a new one and *then* try to offline the CPU, then the cpufreq_driver->exit() function that we call during CPU offline will end up calling the corresponding function of an entirely different driver! So the ->init() and ->exit() calls won't match. These complications won't exist if we simply prevent unloading the driver module as long as they are used in managing atleast one CPU. So I think it would be good to make the code consistent that way. Regards, Srivatsa S. Bhat > but also make it take a reference to > the policy itself using kobject_get() and do not release that > reference (unless there's an error or system resume is under way), > which again is consistent with the "raw" __cpufreq_add_dev() > behavior. > > Accordingly, modify __cpufreq_remove_dev() to use kobject_put() to > drop policy references taken by cpufreq_add_policy_cpu(). > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > On top of current linux-pm.git/linux-next. > > Thanks, > Rafael > > --- > drivers/cpufreq/cpufreq.c | 24 +++++++++++++++--------- > 1 file changed, 15 insertions(+), 9 deletions(-) > > Index: linux-pm/drivers/cpufreq/cpufreq.c > =================================================================== > --- linux-pm.orig/drivers/cpufreq/cpufreq.c > +++ linux-pm/drivers/cpufreq/cpufreq.c > @@ -908,8 +908,10 @@ static int cpufreq_add_policy_cpu(unsign > unsigned long flags; > > policy = cpufreq_cpu_get(sibling); > - WARN_ON(!policy); > + if (WARN_ON_ONCE(!policy)) > + return -ENODATA; > > + kobject_get(&policy->kobj); > if (has_target) > __cpufreq_governor(policy, CPUFREQ_GOV_STOP); > > @@ -932,14 +934,14 @@ static int cpufreq_add_policy_cpu(unsign > /* Don't touch sysfs links during light-weight init */ > if (frozen) { > /* Drop the extra refcount that we took above */ > - cpufreq_cpu_put(policy); > - return 0; > + kobject_put(&policy->kobj); > + } else { > + ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq"); > + if (ret) > + kobject_put(&policy->kobj); > } > > - ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq"); > - if (ret) > - cpufreq_cpu_put(policy); > - > + cpufreq_cpu_put(policy); > return ret; > } > #endif > @@ -1298,10 +1300,14 @@ static int __cpufreq_remove_dev(struct d > if (!frozen) > cpufreq_policy_free(data); > } else { > - > + /* > + * There are more CPUs using the same policy, so only drop the > + * reference taken by cpufreq_add_policy_cpu() (unless the > + * system is suspending). > + */ > if (!frozen) { > pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); > - cpufreq_cpu_put(data); > + kobject_put(&data->kobj); > } > > if (cpufreq_driver->target) { > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 1 August 2013 13:41, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > On 08/01/2013 05:38 AM, Rafael J. Wysocki wrote: >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> The cpufreq core is a little inconsistent in the way it uses the >> driver module refcount. >> >> Namely, if __cpufreq_add_dev() is called for a CPU without siblings >> or generally a CPU for which a new policy object has to be created, >> it grabs a reference to the driver module to start with, but drops >> that reference before returning. As a result, the driver module >> refcount is then equal to 0 after __cpufreq_add_dev() has returned. >> >> On the other hand, if the given CPU is a sibling of some other >> CPU already having a policy, cpufreq_add_policy_cpu() is called >> to link the new CPU to the existing policy. In that case, >> cpufreq_cpu_get() is called to obtain that policy and grabs a >> reference to the driver module, but that reference is not >> released and the module refcount will be different from 0 after >> __cpufreq_add_dev() returns (unless there is an error). That >> prevents the driver module from being unloaded until >> __cpufreq_remove_dev() is called for all the CPUs that >> cpufreq_add_policy_cpu() was called for previously. >> >> To remove that inconsistency make cpufreq_add_policy_cpu() execute >> cpufreq_cpu_put() for the given policy before returning, which >> decrements the driver module refcount so that it will be 0 after >> __cpufreq_add_dev() returns, > > Removing the inconsistency is a good thing, but I think we should > make it consistent the other way around - make a CPU-online increment > the driver module refcount and decrement it only on CPU-offline. I took time to review to this mail as I was looking at the problem yesterday. I am sorry to say, but I have completely different views as compared to You and Rafael both :) First of all, Rafael's patch is incomplete as it hasn't fixed the issue completely. When we have multiple CPUs per policy and cpufreq_add_dev() is called for the first one, it call cpufreq_get_cpu() for all cpus of this policy(), so count is == x (no. of CPUs in this policy) + 1 (This is the initial value of .owner). And so we still have module count getting incremented for other cpus :) Now few lines about My point of view to this whole thing. I believe we should get rid of .owner field from struct cpufreq_driver completely. It doesn't make sense to me in doing all this management at all. Surprised? Shocked? Laughing at me? :) Well I may be wrong but this is what I think: - It looks stupid to me that I can't do this from userspace in one go: $ insmod cpufreq_driver.ko $ rmmod cpufreq_driver.ko What the hell changed in between that isn't visible to user? It looked completely stupid in that way.. Something like this sure makes sense: $ insmod ondemand-governor.ko $ change governor to ondemand for few CPUs $ rmmod ondemand-governor.ko as we have deliberately add few users of governor. And so without second step, rmmod should really work smoothly. And it does. Now, why shouldn't there be a problem with this approach? I will write that inline to the problems you just described. > The reason is, one should not be able to unload the back-end cpufreq > driver module when some CPUs are still being managed. Nasty things > will result if we allow that. For example, if we unload the module, > and then try to do a CPU offline, then the cpufreq hotplug notifier > won't even be called (because cpufreq_unregister_driver also > unregisters the hotplug notifier). And that might be troublesome. So what? Its simply equivalent to we have booted our system, haven't inserted cpufreq module and taken out a cpu. > Even worse, if we unload a cpufreq driver module and load a new > one and *then* try to offline the CPU, then the cpufreq_driver->exit() > function that we call during CPU offline will end up calling the > corresponding function of an entirely different driver! So the > ->init() and ->exit() calls won't match. That's not true. When we unload the module, it must call cpufreq_unregister_driver() which should call cpufreq_remove_cpu() for all cpus and so exit() is already called for last module. If we get something new now, it should simply work. What do you think gentlemen? -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/01/2013 08:14 PM, Viresh Kumar wrote: > On 1 August 2013 13:41, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> On 08/01/2013 05:38 AM, Rafael J. Wysocki wrote: >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>> >>> The cpufreq core is a little inconsistent in the way it uses the >>> driver module refcount. >>> >>> Namely, if __cpufreq_add_dev() is called for a CPU without siblings >>> or generally a CPU for which a new policy object has to be created, >>> it grabs a reference to the driver module to start with, but drops >>> that reference before returning. As a result, the driver module >>> refcount is then equal to 0 after __cpufreq_add_dev() has returned. >>> >>> On the other hand, if the given CPU is a sibling of some other >>> CPU already having a policy, cpufreq_add_policy_cpu() is called >>> to link the new CPU to the existing policy. In that case, >>> cpufreq_cpu_get() is called to obtain that policy and grabs a >>> reference to the driver module, but that reference is not >>> released and the module refcount will be different from 0 after >>> __cpufreq_add_dev() returns (unless there is an error). That >>> prevents the driver module from being unloaded until >>> __cpufreq_remove_dev() is called for all the CPUs that >>> cpufreq_add_policy_cpu() was called for previously. >>> >>> To remove that inconsistency make cpufreq_add_policy_cpu() execute >>> cpufreq_cpu_put() for the given policy before returning, which >>> decrements the driver module refcount so that it will be 0 after >>> __cpufreq_add_dev() returns, >> >> Removing the inconsistency is a good thing, but I think we should >> make it consistent the other way around - make a CPU-online increment >> the driver module refcount and decrement it only on CPU-offline. > > I took time to review to this mail as I was looking at the problem > yesterday. I am sorry to say, but I have completely different views as > compared to You and Rafael both :) > > First of all, Rafael's patch is incomplete as it hasn't fixed the issue > completely. When we have multiple CPUs per policy and > cpufreq_add_dev() is called for the first one, it call cpufreq_get_cpu() > for all cpus of this policy(), so count is == x (no. of CPUs in this policy) > + 1 (This is the initial value of .owner). > > And so we still have module count getting incremented for other cpus :) > Good catch! > Now few lines about My point of view to this whole thing. I believe we > should get rid of .owner field from struct cpufreq_driver completely. It > doesn't make sense to me in doing all this management at all. Surprised? > Shocked? Laughing at me? :) > > Well I may be wrong but this is what I think: > - It looks stupid to me that I can't do this from userspace in one go: > $ insmod cpufreq_driver.ko > $ rmmod cpufreq_driver.ko > > What the hell changed in between that isn't visible to user? It looked > completely stupid in that way.. > > Something like this sure makes sense: > $ insmod ondemand-governor.ko > $ change governor to ondemand for few CPUs > $ rmmod ondemand-governor.ko > > as we have deliberately add few users of governor. And so without second > step, rmmod should really work smoothly. And it does. > > Now, why shouldn't there be a problem with this approach? I will write > that inline to the problems you just described. > >> The reason is, one should not be able to unload the back-end cpufreq >> driver module when some CPUs are still being managed. Nasty things >> will result if we allow that. For example, if we unload the module, >> and then try to do a CPU offline, then the cpufreq hotplug notifier >> won't even be called (because cpufreq_unregister_driver also >> unregisters the hotplug notifier). And that might be troublesome. > > So what? Its simply equivalent to we have booted our system, haven't > inserted cpufreq module and taken out a cpu. > >> Even worse, if we unload a cpufreq driver module and load a new >> one and *then* try to offline the CPU, then the cpufreq_driver->exit() >> function that we call during CPU offline will end up calling the >> corresponding function of an entirely different driver! So the >> ->init() and ->exit() calls won't match. > > That's not true. When we unload the module, it must call > cpufreq_unregister_driver() which should call cpufreq_remove_cpu() > for all cpus and so exit() is already called for last module. > Sorry, I missed this one. > If we get something new now, it should simply work. > Yeah, I now see your point. It won't create any problems by unloading the module and loading a new one. > What do you think gentlemen? > Well, I now agree that we don't have to keep the module refcount non-zero as long as CPUs are being managed (that was just my misunderstanding, sorry for the noise). However, I think the _get() and _put() used in the existing code is for synchronization: that is, to avoid races between trying to unload the cpufreq driver module and running a hotplug notifier (and calling the driver module's ->init() or ->exit() function). With that being the case, I think we can retain the module refcounts and use them only for synchronization. And naturally the refcount should drop to zero after the critical section; no point keeping it incremented until the CPU is taken offline. Or, am I confused again? Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 1 August 2013 20:54, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > Well, I now agree that we don't have to keep the module refcount > non-zero as long as CPUs are being managed (that was just my > misunderstanding, sorry for the noise). However, I think the _get() > and _put() used in the existing code is for synchronization: that > is, to avoid races between trying to unload the cpufreq driver > module and running a hotplug notifier (and calling the driver module's > ->init() or ->exit() function). > > With that being the case, I think we can retain the module refcounts > and use them only for synchronization. And naturally the refcount > should drop to zero after the critical section; no point keeping > it incremented until the CPU is taken offline. > > Or, am I confused again? No, you aren't. But, for synchronization we need some blocking stuff, so that when user tries to rmmod the module, it should wait until other critical sections are over and then unload the module. But here, we are simply returning from rmmod, saying that we are busy :) So, for that kind of synchronization we better use locks available inside cpufreq core of if required another variable which can be used for refcount. But now this. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday, August 01, 2013 08:54:59 PM Srivatsa S. Bhat wrote: > On 08/01/2013 08:14 PM, Viresh Kumar wrote: > > On 1 August 2013 13:41, Srivatsa S. Bhat > > <srivatsa.bhat@linux.vnet.ibm.com> wrote: > >> On 08/01/2013 05:38 AM, Rafael J. Wysocki wrote: > >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >>> > >>> The cpufreq core is a little inconsistent in the way it uses the > >>> driver module refcount. > >>> > >>> Namely, if __cpufreq_add_dev() is called for a CPU without siblings > >>> or generally a CPU for which a new policy object has to be created, > >>> it grabs a reference to the driver module to start with, but drops > >>> that reference before returning. As a result, the driver module > >>> refcount is then equal to 0 after __cpufreq_add_dev() has returned. > >>> > >>> On the other hand, if the given CPU is a sibling of some other > >>> CPU already having a policy, cpufreq_add_policy_cpu() is called > >>> to link the new CPU to the existing policy. In that case, > >>> cpufreq_cpu_get() is called to obtain that policy and grabs a > >>> reference to the driver module, but that reference is not > >>> released and the module refcount will be different from 0 after > >>> __cpufreq_add_dev() returns (unless there is an error). That > >>> prevents the driver module from being unloaded until > >>> __cpufreq_remove_dev() is called for all the CPUs that > >>> cpufreq_add_policy_cpu() was called for previously. > >>> > >>> To remove that inconsistency make cpufreq_add_policy_cpu() execute > >>> cpufreq_cpu_put() for the given policy before returning, which > >>> decrements the driver module refcount so that it will be 0 after > >>> __cpufreq_add_dev() returns, > >> > >> Removing the inconsistency is a good thing, but I think we should > >> make it consistent the other way around - make a CPU-online increment > >> the driver module refcount and decrement it only on CPU-offline. > > > > I took time to review to this mail as I was looking at the problem > > yesterday. I am sorry to say, but I have completely different views as > > compared to You and Rafael both :) > > > > First of all, Rafael's patch is incomplete as it hasn't fixed the issue > > completely. When we have multiple CPUs per policy and > > cpufreq_add_dev() is called for the first one, it call cpufreq_get_cpu() > > for all cpus of this policy(), so count is == x (no. of CPUs in this policy) > > + 1 (This is the initial value of .owner). > > > > And so we still have module count getting incremented for other cpus :) > > > > Good catch! Sorry, I don't see how this happens. __cpufreq_add_dev() only directly calls cpufreq_cpu_get() at the beginning to check if the policy is there and then immediately calls cpufreq_cpu_put() in that case (for that policy). Next, cpufreq_add_policy_cpu() calls cpufreq_cpu_get(), but that's what my patch changes. I don't see where else cpufreq_cpu_get() is called by __cpufreq_add_dev() whether directly or indirectly. Moreover, if I'm completely wrong and it is called there in an invisible hush-hush way, it has to be explained why the module usage count as printed by lsmod for acpi-cpufreq is 0 (in current linux-next). > > Now few lines about My point of view to this whole thing. I believe we > > should get rid of .owner field from struct cpufreq_driver completely. It > > doesn't make sense to me in doing all this management at all. Surprised? > > Shocked? Laughing at me? :) > > > > Well I may be wrong but this is what I think: > > - It looks stupid to me that I can't do this from userspace in one go: > > $ insmod cpufreq_driver.ko > > $ rmmod cpufreq_driver.ko > > > > What the hell changed in between that isn't visible to user? It looked > > completely stupid in that way.. > > > > Something like this sure makes sense: > > $ insmod ondemand-governor.ko > > $ change governor to ondemand for few CPUs > > $ rmmod ondemand-governor.ko > > > > as we have deliberately add few users of governor. And so without second > > step, rmmod should really work smoothly. And it does. > > > > Now, why shouldn't there be a problem with this approach? I will write > > that inline to the problems you just described. > > > >> The reason is, one should not be able to unload the back-end cpufreq > >> driver module when some CPUs are still being managed. Nasty things > >> will result if we allow that. For example, if we unload the module, > >> and then try to do a CPU offline, then the cpufreq hotplug notifier > >> won't even be called (because cpufreq_unregister_driver also > >> unregisters the hotplug notifier). And that might be troublesome. > > > > So what? Its simply equivalent to we have booted our system, haven't > > inserted cpufreq module and taken out a cpu. I'd put that differently. With the current code as is it may cause problems to happen, but there are two ways to change that in general: (1) Disallow the removal of the cpufreq driver while there are any users, but for that we only need the driver to be refcounted *once* when a new policy is created (and not as many times as there are CPUs using that policy). Then, the reference can be dropped while removing the policy object. (2) Allow the removal of the cpufreq driver, but harden the code against that. [Maybe it doesn't have to be hardened any more as is, I haven't checked that.] I agree with Viresh that (1) is kind of weird from the usability perspective, because if we did that, it wouldn't be practically possible to remove cpufreq driver modules after loading them (cpuidle currently has that problem). > >> Even worse, if we unload a cpufreq driver module and load a new > >> one and *then* try to offline the CPU, then the cpufreq_driver->exit() > >> function that we call during CPU offline will end up calling the > >> corresponding function of an entirely different driver! So the > >> ->init() and ->exit() calls won't match. > > > > That's not true. When we unload the module, it must call > > cpufreq_unregister_driver() which should call cpufreq_remove_cpu() > > for all cpus and so exit() is already called for last module. > > > > Sorry, I missed this one. > > > If we get something new now, it should simply work. > > > > Yeah, I now see your point. It won't create any problems by > unloading the module and loading a new one. > > > What do you think gentlemen? > > > > Well, I now agree that we don't have to keep the module refcount > non-zero as long as CPUs are being managed (that was just my > misunderstanding, sorry for the noise). However, I think the _get() > and _put() used in the existing code is for synchronization: that > is, to avoid races between trying to unload the cpufreq driver > module and running a hotplug notifier (and calling the driver module's > ->init() or ->exit() function). > > With that being the case, I think we can retain the module refcounts > and use them only for synchronization. And naturally the refcount > should drop to zero after the critical section; no point keeping > it incremented until the CPU is taken offline. > > Or, am I confused again? No, I don't think so. In fact, the point of my patch was to make the module refcount stay 0 beyond critical sections, but it looks like I overlooked something. What is that? Rafael
On 08/01/2013 11:34 PM, Rafael J. Wysocki wrote: > On Thursday, August 01, 2013 08:54:59 PM Srivatsa S. Bhat wrote: >> On 08/01/2013 08:14 PM, Viresh Kumar wrote: >>> On 1 August 2013 13:41, Srivatsa S. Bhat >>> <srivatsa.bhat@linux.vnet.ibm.com> wrote: >>>> On 08/01/2013 05:38 AM, Rafael J. Wysocki wrote: >>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>>>> >>>>> The cpufreq core is a little inconsistent in the way it uses the >>>>> driver module refcount. >>>>> >>>>> Namely, if __cpufreq_add_dev() is called for a CPU without siblings >>>>> or generally a CPU for which a new policy object has to be created, >>>>> it grabs a reference to the driver module to start with, but drops >>>>> that reference before returning. As a result, the driver module >>>>> refcount is then equal to 0 after __cpufreq_add_dev() has returned. >>>>> >>>>> On the other hand, if the given CPU is a sibling of some other >>>>> CPU already having a policy, cpufreq_add_policy_cpu() is called >>>>> to link the new CPU to the existing policy. In that case, >>>>> cpufreq_cpu_get() is called to obtain that policy and grabs a >>>>> reference to the driver module, but that reference is not >>>>> released and the module refcount will be different from 0 after >>>>> __cpufreq_add_dev() returns (unless there is an error). That >>>>> prevents the driver module from being unloaded until >>>>> __cpufreq_remove_dev() is called for all the CPUs that >>>>> cpufreq_add_policy_cpu() was called for previously. >>>>> >>>>> To remove that inconsistency make cpufreq_add_policy_cpu() execute >>>>> cpufreq_cpu_put() for the given policy before returning, which >>>>> decrements the driver module refcount so that it will be 0 after >>>>> __cpufreq_add_dev() returns, >>>> >>>> Removing the inconsistency is a good thing, but I think we should >>>> make it consistent the other way around - make a CPU-online increment >>>> the driver module refcount and decrement it only on CPU-offline. >>> >>> I took time to review to this mail as I was looking at the problem >>> yesterday. I am sorry to say, but I have completely different views as >>> compared to You and Rafael both :) >>> >>> First of all, Rafael's patch is incomplete as it hasn't fixed the issue >>> completely. When we have multiple CPUs per policy and >>> cpufreq_add_dev() is called for the first one, it call cpufreq_get_cpu() >>> for all cpus of this policy(), so count is == x (no. of CPUs in this policy) >>> + 1 (This is the initial value of .owner). >>> >>> And so we still have module count getting incremented for other cpus :) >>> >> >> Good catch! > > Sorry, I don't see how this happens. > > __cpufreq_add_dev() only directly calls cpufreq_cpu_get() at the beginning to > check if the policy is there and then immediately calls cpufreq_cpu_put() in > that case (for that policy). > > Next, cpufreq_add_policy_cpu() calls cpufreq_cpu_get(), but that's what my > patch changes. > > I don't see where else cpufreq_cpu_get() is called by __cpufreq_add_dev() > whether directly or indirectly. > __cpufreq_add_dev()->cpufreq_add_dev_interface()->cpufreq_add_dev_symlink(). The last one does: 815 for_each_cpu(j, policy->cpus) { 816 struct cpufreq_policy *managed_policy; 817 struct device *cpu_dev; 818 819 if (j == cpu) 820 continue; 821 822 pr_debug("CPU %u already managed, adding link\n", j); 823 managed_policy = cpufreq_cpu_get(cpu); 824 cpu_dev = get_cpu_device(j); 825 ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj, 826 "cpufreq"); ... } > Moreover, if I'm completely wrong and it is called there in an invisible > hush-hush way, it has to be explained why the module usage count as printed by > lsmod for acpi-cpufreq is 0 (in current linux-next). > Perhaps because none of your policies have more than one CPU associated with it? I think related_cpus should be able to tell us that.. But yes, it is a little hidden and moreover, we don't take the refcount if there is only one CPU in the mask. Which is a little inconsistent, IMHO. >>> Now few lines about My point of view to this whole thing. I believe we >>> should get rid of .owner field from struct cpufreq_driver completely. It >>> doesn't make sense to me in doing all this management at all. Surprised? >>> Shocked? Laughing at me? :) >>> >>> Well I may be wrong but this is what I think: >>> - It looks stupid to me that I can't do this from userspace in one go: >>> $ insmod cpufreq_driver.ko >>> $ rmmod cpufreq_driver.ko >>> >>> What the hell changed in between that isn't visible to user? It looked >>> completely stupid in that way.. >>> >>> Something like this sure makes sense: >>> $ insmod ondemand-governor.ko >>> $ change governor to ondemand for few CPUs >>> $ rmmod ondemand-governor.ko >>> >>> as we have deliberately add few users of governor. And so without second >>> step, rmmod should really work smoothly. And it does. >>> >>> Now, why shouldn't there be a problem with this approach? I will write >>> that inline to the problems you just described. >>> >>>> The reason is, one should not be able to unload the back-end cpufreq >>>> driver module when some CPUs are still being managed. Nasty things >>>> will result if we allow that. For example, if we unload the module, >>>> and then try to do a CPU offline, then the cpufreq hotplug notifier >>>> won't even be called (because cpufreq_unregister_driver also >>>> unregisters the hotplug notifier). And that might be troublesome. >>> >>> So what? Its simply equivalent to we have booted our system, haven't >>> inserted cpufreq module and taken out a cpu. > > I'd put that differently. > > With the current code as is it may cause problems to happen, but there are > two ways to change that in general: > > (1) Disallow the removal of the cpufreq driver while there are any users, but > for that we only need the driver to be refcounted *once* when a new policy > is created (and not as many times as there are CPUs using that policy). > Then, the reference can be dropped while removing the policy object. > > (2) Allow the removal of the cpufreq driver, but harden the code against > that. [Maybe it doesn't have to be hardened any more as is, I haven't > checked that.] > > I agree with Viresh that (1) is kind of weird from the usability perspective, > because if we did that, it wouldn't be practically possible to remove cpufreq > driver modules after loading them (cpuidle currently has that problem). > Yes, I think we can go with Viresh's approach and use plain locking to synchronize things. Returning -EBUSY isn't really beneficial, since the critical sections are small and finite - its not like the user has to wait a long time to rmmod the module if we use locking. >>>> Even worse, if we unload a cpufreq driver module and load a new >>>> one and *then* try to offline the CPU, then the cpufreq_driver->exit() >>>> function that we call during CPU offline will end up calling the >>>> corresponding function of an entirely different driver! So the >>>> ->init() and ->exit() calls won't match. >>> >>> That's not true. When we unload the module, it must call >>> cpufreq_unregister_driver() which should call cpufreq_remove_cpu() >>> for all cpus and so exit() is already called for last module. >>> >> >> Sorry, I missed this one. >> >>> If we get something new now, it should simply work. >>> >> >> Yeah, I now see your point. It won't create any problems by >> unloading the module and loading a new one. >> >>> What do you think gentlemen? >>> >> >> Well, I now agree that we don't have to keep the module refcount >> non-zero as long as CPUs are being managed (that was just my >> misunderstanding, sorry for the noise). However, I think the _get() >> and _put() used in the existing code is for synchronization: that >> is, to avoid races between trying to unload the cpufreq driver >> module and running a hotplug notifier (and calling the driver module's >> ->init() or ->exit() function). >> >> With that being the case, I think we can retain the module refcounts >> and use them only for synchronization. And naturally the refcount >> should drop to zero after the critical section; no point keeping >> it incremented until the CPU is taken offline. >> >> Or, am I confused again? > > No, I don't think so. > > In fact, the point of my patch was to make the module refcount stay 0 > beyond critical sections, but it looks like I overlooked something. What is > that? > Its the cpufreq_cpu_get() hidden away in cpufreq_add_dev_symlink(). With that taken care of, everything should be OK. Then we can change the synchronization part to avoid using refcounts. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday, August 01, 2013 10:54:08 PM Viresh Kumar wrote: > On 1 August 2013 20:54, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: > > Well, I now agree that we don't have to keep the module refcount > > non-zero as long as CPUs are being managed (that was just my > > misunderstanding, sorry for the noise). However, I think the _get() > > and _put() used in the existing code is for synchronization: that > > is, to avoid races between trying to unload the cpufreq driver > > module and running a hotplug notifier (and calling the driver module's > > ->init() or ->exit() function). > > > > With that being the case, I think we can retain the module refcounts > > and use them only for synchronization. And naturally the refcount > > should drop to zero after the critical section; no point keeping > > it incremented until the CPU is taken offline. > > > > Or, am I confused again? > > No, you aren't. > > But, for synchronization we need some blocking stuff, so that when > user tries to rmmod the module, it should wait until other critical > sections are over and then unload the module. But here, we are > simply returning from rmmod, saying that we are busy :) > > So, for that kind of synchronization we better use locks available > inside cpufreq core of if required another variable which can be > used for refcount. But now this. I suppose you wanted to say "But not this"? I agree. That said the situation now is that for acpi-cpufreq the module usage count is 0 (as of linux-next from today), so you can actually rmmod it. I don't see why it should be different in the case when there are multiple CPUs for the same policy and hence my patch. [I don't see the problem with it as I said to my reply to Srivatsa, so care to explain it to me?.] Now it seems to me that the code *is* racy (I haven't verified that, though), so I agree that we should use some other synchronization framework to be able to rmmod and modprobe cpufreq drivers safely. Using the driver module usage counter for that is not quite correct in my view. Thanks, Rafael
On 08/02/2013 12:31 AM, Rafael J. Wysocki wrote: > On Thursday, August 01, 2013 11:36:49 PM Srivatsa S. Bhat wrote: >> Its the cpufreq_cpu_get() hidden away in cpufreq_add_dev_symlink(). With >> that taken care of, everything should be OK. Then we can change the >> synchronization part to avoid using refcounts. > > So I actually don't see why cpufreq_add_dev_symlink() needs to call > cpufreq_cpu_get() at all, since the policy refcount is already 1 at the > point it is called and the bumping up of the driver module refcount is > pointless. > Hmm, yes, it seems so. > However, if I change that I also need to change the piece of code that > calls the complementary cpufreq_cpu_put() and I kind of cannot find it. > ... I guess that's because you are looking at the code with your patch applied (and your patch removed that _put()) ;-) Its this part in __cpufreq_remove_dev(): 1303 } else { 1304 1305 if (!frozen) { 1306 pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); 1307 cpufreq_cpu_put(data); 1308 } 1309 Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday, August 01, 2013 11:36:49 PM Srivatsa S. Bhat wrote: > On 08/01/2013 11:34 PM, Rafael J. Wysocki wrote: > > On Thursday, August 01, 2013 08:54:59 PM Srivatsa S. Bhat wrote: > >> On 08/01/2013 08:14 PM, Viresh Kumar wrote: > >>> On 1 August 2013 13:41, Srivatsa S. Bhat > >>> <srivatsa.bhat@linux.vnet.ibm.com> wrote: > >>>> On 08/01/2013 05:38 AM, Rafael J. Wysocki wrote: > >>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >>>>> > >>>>> The cpufreq core is a little inconsistent in the way it uses the > >>>>> driver module refcount. > >>>>> > >>>>> Namely, if __cpufreq_add_dev() is called for a CPU without siblings > >>>>> or generally a CPU for which a new policy object has to be created, > >>>>> it grabs a reference to the driver module to start with, but drops > >>>>> that reference before returning. As a result, the driver module > >>>>> refcount is then equal to 0 after __cpufreq_add_dev() has returned. > >>>>> > >>>>> On the other hand, if the given CPU is a sibling of some other > >>>>> CPU already having a policy, cpufreq_add_policy_cpu() is called > >>>>> to link the new CPU to the existing policy. In that case, > >>>>> cpufreq_cpu_get() is called to obtain that policy and grabs a > >>>>> reference to the driver module, but that reference is not > >>>>> released and the module refcount will be different from 0 after > >>>>> __cpufreq_add_dev() returns (unless there is an error). That > >>>>> prevents the driver module from being unloaded until > >>>>> __cpufreq_remove_dev() is called for all the CPUs that > >>>>> cpufreq_add_policy_cpu() was called for previously. > >>>>> > >>>>> To remove that inconsistency make cpufreq_add_policy_cpu() execute > >>>>> cpufreq_cpu_put() for the given policy before returning, which > >>>>> decrements the driver module refcount so that it will be 0 after > >>>>> __cpufreq_add_dev() returns, > >>>> > >>>> Removing the inconsistency is a good thing, but I think we should > >>>> make it consistent the other way around - make a CPU-online increment > >>>> the driver module refcount and decrement it only on CPU-offline. > >>> > >>> I took time to review to this mail as I was looking at the problem > >>> yesterday. I am sorry to say, but I have completely different views as > >>> compared to You and Rafael both :) > >>> > >>> First of all, Rafael's patch is incomplete as it hasn't fixed the issue > >>> completely. When we have multiple CPUs per policy and > >>> cpufreq_add_dev() is called for the first one, it call cpufreq_get_cpu() > >>> for all cpus of this policy(), so count is == x (no. of CPUs in this policy) > >>> + 1 (This is the initial value of .owner). > >>> > >>> And so we still have module count getting incremented for other cpus :) > >>> > >> > >> Good catch! > > > > Sorry, I don't see how this happens. > > > > __cpufreq_add_dev() only directly calls cpufreq_cpu_get() at the beginning to > > check if the policy is there and then immediately calls cpufreq_cpu_put() in > > that case (for that policy). > > > > Next, cpufreq_add_policy_cpu() calls cpufreq_cpu_get(), but that's what my > > patch changes. > > > > I don't see where else cpufreq_cpu_get() is called by __cpufreq_add_dev() > > whether directly or indirectly. > > > > __cpufreq_add_dev()->cpufreq_add_dev_interface()->cpufreq_add_dev_symlink(). Ah, OK, thanks! > The last one does: > > 815 for_each_cpu(j, policy->cpus) { > 816 struct cpufreq_policy *managed_policy; > 817 struct device *cpu_dev; > 818 > 819 if (j == cpu) > 820 continue; > 821 > 822 pr_debug("CPU %u already managed, adding link\n", j); > 823 managed_policy = cpufreq_cpu_get(cpu); > 824 cpu_dev = get_cpu_device(j); > 825 ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj, > 826 "cpufreq"); > ... > } And when do we drop this one? > > Moreover, if I'm completely wrong and it is called there in an invisible > > hush-hush way, it has to be explained why the module usage count as printed by > > lsmod for acpi-cpufreq is 0 (in current linux-next). > > > > Perhaps because none of your policies have more than one CPU associated > with it? I think related_cpus should be able to tell us that.. Yes, that's the case. > But yes, it is a little hidden and moreover, we don't take the refcount if > there is only one CPU in the mask. Which is a little inconsistent, IMHO. Well, I obviously agree. > >>> Now few lines about My point of view to this whole thing. I believe we > >>> should get rid of .owner field from struct cpufreq_driver completely. It > >>> doesn't make sense to me in doing all this management at all. Surprised? > >>> Shocked? Laughing at me? :) > >>> > >>> Well I may be wrong but this is what I think: > >>> - It looks stupid to me that I can't do this from userspace in one go: > >>> $ insmod cpufreq_driver.ko > >>> $ rmmod cpufreq_driver.ko > >>> > >>> What the hell changed in between that isn't visible to user? It looked > >>> completely stupid in that way.. > >>> > >>> Something like this sure makes sense: > >>> $ insmod ondemand-governor.ko > >>> $ change governor to ondemand for few CPUs > >>> $ rmmod ondemand-governor.ko > >>> > >>> as we have deliberately add few users of governor. And so without second > >>> step, rmmod should really work smoothly. And it does. > >>> > >>> Now, why shouldn't there be a problem with this approach? I will write > >>> that inline to the problems you just described. > >>> > >>>> The reason is, one should not be able to unload the back-end cpufreq > >>>> driver module when some CPUs are still being managed. Nasty things > >>>> will result if we allow that. For example, if we unload the module, > >>>> and then try to do a CPU offline, then the cpufreq hotplug notifier > >>>> won't even be called (because cpufreq_unregister_driver also > >>>> unregisters the hotplug notifier). And that might be troublesome. > >>> > >>> So what? Its simply equivalent to we have booted our system, haven't > >>> inserted cpufreq module and taken out a cpu. > > > > I'd put that differently. > > > > With the current code as is it may cause problems to happen, but there are > > two ways to change that in general: > > > > (1) Disallow the removal of the cpufreq driver while there are any users, but > > for that we only need the driver to be refcounted *once* when a new policy > > is created (and not as many times as there are CPUs using that policy). > > Then, the reference can be dropped while removing the policy object. > > > > (2) Allow the removal of the cpufreq driver, but harden the code against > > that. [Maybe it doesn't have to be hardened any more as is, I haven't > > checked that.] > > > > I agree with Viresh that (1) is kind of weird from the usability perspective, > > because if we did that, it wouldn't be practically possible to remove cpufreq > > driver modules after loading them (cpuidle currently has that problem). > > > > Yes, I think we can go with Viresh's approach and use plain locking to > synchronize things. Returning -EBUSY isn't really beneficial, since the > critical sections are small and finite - its not like the user has to wait > a long time to rmmod the module if we use locking. Agreed. > >>>> Even worse, if we unload a cpufreq driver module and load a new > >>>> one and *then* try to offline the CPU, then the cpufreq_driver->exit() > >>>> function that we call during CPU offline will end up calling the > >>>> corresponding function of an entirely different driver! So the > >>>> ->init() and ->exit() calls won't match. > >>> > >>> That's not true. When we unload the module, it must call > >>> cpufreq_unregister_driver() which should call cpufreq_remove_cpu() > >>> for all cpus and so exit() is already called for last module. > >>> > >> > >> Sorry, I missed this one. > >> > >>> If we get something new now, it should simply work. > >>> > >> > >> Yeah, I now see your point. It won't create any problems by > >> unloading the module and loading a new one. > >> > >>> What do you think gentlemen? > >>> > >> > >> Well, I now agree that we don't have to keep the module refcount > >> non-zero as long as CPUs are being managed (that was just my > >> misunderstanding, sorry for the noise). However, I think the _get() > >> and _put() used in the existing code is for synchronization: that > >> is, to avoid races between trying to unload the cpufreq driver > >> module and running a hotplug notifier (and calling the driver module's > >> ->init() or ->exit() function). > >> > >> With that being the case, I think we can retain the module refcounts > >> and use them only for synchronization. And naturally the refcount > >> should drop to zero after the critical section; no point keeping > >> it incremented until the CPU is taken offline. > >> > >> Or, am I confused again? > > > > No, I don't think so. > > > > In fact, the point of my patch was to make the module refcount stay 0 > > beyond critical sections, but it looks like I overlooked something. What is > > that? > > > > Its the cpufreq_cpu_get() hidden away in cpufreq_add_dev_symlink(). With > that taken care of, everything should be OK. Then we can change the > synchronization part to avoid using refcounts. So I actually don't see why cpufreq_add_dev_symlink() needs to call cpufreq_cpu_get() at all, since the policy refcount is already 1 at the point it is called and the bumping up of the driver module refcount is pointless. However, if I change that I also need to change the piece of code that calls the complementary cpufreq_cpu_put() and I kind of cannot find it. Thanks, Rafael
On 08/02/2013 12:51 AM, Rafael J. Wysocki wrote: > On Friday, August 02, 2013 12:31:23 AM Srivatsa S. Bhat wrote: >> On 08/02/2013 12:31 AM, Rafael J. Wysocki wrote: >>> On Thursday, August 01, 2013 11:36:49 PM Srivatsa S. Bhat wrote: >>>> Its the cpufreq_cpu_get() hidden away in cpufreq_add_dev_symlink(). With >>>> that taken care of, everything should be OK. Then we can change the >>>> synchronization part to avoid using refcounts. >>> >>> So I actually don't see why cpufreq_add_dev_symlink() needs to call >>> cpufreq_cpu_get() at all, since the policy refcount is already 1 at the >>> point it is called and the bumping up of the driver module refcount is >>> pointless. >>> >> >> Hmm, yes, it seems so. >> >>> However, if I change that I also need to change the piece of code that >>> calls the complementary cpufreq_cpu_put() and I kind of cannot find it. >>> >> >> ... I guess that's because you are looking at the code with your patch >> applied (and your patch removed that _put()) ;-) > > No, it's not that one. That one was complementary to the cpufreq_cpu_get() > done by cpufreq_add_policy_cpu() before my patch. Since my patch changes > cpufreq_add_policy_cpu() to call cpufreq_cpu_put() before returning and > bump up the policy refcount with kobject_get(), the one in > __cpufreq_remove_dev() is changed into kobject_put() (correctly, IMO). > > What gives? > Actually, it _is_ the one I pointed above. This thing is tricky, here's why: cpufreq_add_policy_cpu() is called only if: a. The CPU being onlined has per_cpu(cpufreq_cpu_data, cpu) == NULL and b. Its is present in some CPU's related_cpus mask. If condition (a) doesn't hold good, you get out right in the beginning of __cpufreq_add_dev(). So, cpufreq_add_policy_cpu() is called very rarely because, inside __cpufreq_add_dev we do: 1093 write_lock_irqsave(&cpufreq_driver_lock, flags); 1094 for_each_cpu(j, policy->cpus) { 1095 per_cpu(cpufreq_cpu_data, j) = policy; 1096 per_cpu(cpufreq_policy_cpu, j) = policy->cpu; 1097 } 1098 write_unlock_irqrestore(&cpufreq_driver_lock, flags); So for all the CPUs in the above policy->cpus mask, we simply return without further ado when they are onlined. In particular, we *dont* call cpufreq_add_policy_cpu() for any of them. And their refcounts are incremented by the cpufreq_add_dev_interface()-> cpufreq_add_dev_symlink() function. So, ultimately, we increment the refcount for a given non-policy-owner CPU only once. *Either* in cpufreq_add_dev_symlink() *or* in cpufreq_add_policy_cpu(), but never both. So, in the teardown path, __cpufreq_remove_dev() needs only one place to decrement it as shown below: 1303 } else { 1304 1305 if (!frozen) { 1306 pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); 1307 cpufreq_cpu_put(data); 1308 } Pretty good maze, right? ;-( Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, August 02, 2013 12:31:23 AM Srivatsa S. Bhat wrote: > On 08/02/2013 12:31 AM, Rafael J. Wysocki wrote: > > On Thursday, August 01, 2013 11:36:49 PM Srivatsa S. Bhat wrote: > >> Its the cpufreq_cpu_get() hidden away in cpufreq_add_dev_symlink(). With > >> that taken care of, everything should be OK. Then we can change the > >> synchronization part to avoid using refcounts. > > > > So I actually don't see why cpufreq_add_dev_symlink() needs to call > > cpufreq_cpu_get() at all, since the policy refcount is already 1 at the > > point it is called and the bumping up of the driver module refcount is > > pointless. > > > > Hmm, yes, it seems so. > > > However, if I change that I also need to change the piece of code that > > calls the complementary cpufreq_cpu_put() and I kind of cannot find it. > > > > ... I guess that's because you are looking at the code with your patch > applied (and your patch removed that _put()) ;-) No, it's not that one. That one was complementary to the cpufreq_cpu_get() done by cpufreq_add_policy_cpu() before my patch. Since my patch changes cpufreq_add_policy_cpu() to call cpufreq_cpu_put() before returning and bump up the policy refcount with kobject_get(), the one in __cpufreq_remove_dev() is changed into kobject_put() (correctly, IMO). What gives? Rafael
Index: linux-pm/drivers/cpufreq/cpufreq.c =================================================================== --- linux-pm.orig/drivers/cpufreq/cpufreq.c +++ linux-pm/drivers/cpufreq/cpufreq.c @@ -908,8 +908,10 @@ static int cpufreq_add_policy_cpu(unsign unsigned long flags; policy = cpufreq_cpu_get(sibling); - WARN_ON(!policy); + if (WARN_ON_ONCE(!policy)) + return -ENODATA; + kobject_get(&policy->kobj); if (has_target) __cpufreq_governor(policy, CPUFREQ_GOV_STOP); @@ -932,14 +934,14 @@ static int cpufreq_add_policy_cpu(unsign /* Don't touch sysfs links during light-weight init */ if (frozen) { /* Drop the extra refcount that we took above */ - cpufreq_cpu_put(policy); - return 0; + kobject_put(&policy->kobj); + } else { + ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq"); + if (ret) + kobject_put(&policy->kobj); } - ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq"); - if (ret) - cpufreq_cpu_put(policy); - + cpufreq_cpu_put(policy); return ret; } #endif @@ -1298,10 +1300,14 @@ static int __cpufreq_remove_dev(struct d if (!frozen) cpufreq_policy_free(data); } else { - + /* + * There are more CPUs using the same policy, so only drop the + * reference taken by cpufreq_add_policy_cpu() (unless the + * system is suspending). + */ if (!frozen) { pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); - cpufreq_cpu_put(data); + kobject_put(&data->kobj); } if (cpufreq_driver->target) {