Message ID | 5230C89D.7010801@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
On 09/12/2013 01:37 AM, Stephen Warren wrote:
> On 09/11/2013 01:46 PM, Srivatsa S. Bhat wrote:
>> On 09/12/2013 12:33 AM, Stephen Warren wrote:
>>> On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote:
>>> ...
>>>> OK, I took a second look at the code, and I suspect that applying the
>>>> second patch might help. So can you try by applying both the patches
>>>> please[1][2]?
>>>>
>>> ...
>>>> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
>>>> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2
>>>
>>> Yes, with both of those patches applied, the problem is solved:-)
>>>
>>> I was going to test the second patch originally, but it sounded like it
>>> was more of a cleanup rather than a fix for my issue, so I didn't bother
>>> when I found the problem wasn't solved by patch 1. Sorry!
>>>
>>
>> Well, honestly, even I had intended the second patch as a cleanup and
>> hadn't asked you to test it ;-) Only when you reported that the first patch
>> failed to solve your problem, I realized that the second patch was
>> important too! :-) Thanks for testing!
>>
>>> For the record, I'm testing on a 2-CPU system, so I'm not sure whether
>>> your explanation applies; it talks about CPUs 2 and 3 whereas I only
>>> have CPUs 0 and 1, but perhaps your explanation applies equally to any
>>> pair of CPUs?
>>>
>>
>> Yes, it applies to any pair of CPUs, as long as the CPU first taken down
>> is not the policy->cpu. In your case, it applies like this:
>> IIUC, CPU0 is the boot cpu, and hence it won't be taken offline using hotplug.
>> So only CPU 1 is taken offline during suspend. And if it is not the policy->cpu,
>> then it hits the very same bug that I described with the analogy of CPUs 2
>> and 3.
>>
>>> For the record, here's the information you requested in the other email:
>>>
>>> # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus
>>> 0 1
>>> 0 1
>>
>> Thanks! It would have been more useful to somehow know which was the
>> policy->cpu. But looking at the problem, certainly CPU0 was the policy->cpu
>> in your case.
>
> Yes, I believe CPU0 since,
>
>> # ls -l /sys/devices/system/cpu/cpu1/cpufreq
>> lrwxrwxrwx 1 root root 0 Jan 1 00:01 /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq
>
> and cpu0/cpufreq/ has all the files in it.
>
> ...

Ah, nice!

>> So can you see if patch 1 + this above fix solves your problem as well?
>> Then we can retain the original patch 2 as a cleanup, after these 2 patches.
>> This organization also makes the code look better and understandable.
>
> Yes, both patch 1+3 and 1+3+2 work fine.
>

Cool! Thanks a lot for all your testing efforts Stephen! :-)

Regards,
Srivatsa S. Bhat
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On 09/11/2013 01:46 PM, Srivatsa S. Bhat wrote:
> On 09/12/2013 12:33 AM, Stephen Warren wrote:
>> On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote:
>> ...
>>> OK, I took a second look at the code, and I suspect that applying the
>>> second patch might help. So can you try by applying both the patches
>>> please[1][2]?
>>>
>> ...
>>> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
>>> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2
>>
>> Yes, with both of those patches applied, the problem is solved:-)
>>
>> I was going to test the second patch originally, but it sounded like it
>> was more of a cleanup rather than a fix for my issue, so I didn't bother
>> when I found the problem wasn't solved by patch 1. Sorry!
>>
>
> Well, honestly, even I had intended the second patch as a cleanup and
> hadn't asked you to test it ;-) Only when you reported that the first patch
> failed to solve your problem, I realized that the second patch was
> important too! :-) Thanks for testing!
>
>> For the record, I'm testing on a 2-CPU system, so I'm not sure whether
>> your explanation applies; it talks about CPUs 2 and 3 whereas I only
>> have CPUs 0 and 1, but perhaps your explanation applies equally to any
>> pair of CPUs?
>>
>
> Yes, it applies to any pair of CPUs, as long as the CPU first taken down
> is not the policy->cpu. In your case, it applies like this:
> IIUC, CPU0 is the boot cpu, and hence it won't be taken offline using hotplug.
> So only CPU 1 is taken offline during suspend. And if it is not the policy->cpu,
> then it hits the very same bug that I described with the analogy of CPUs 2
> and 3.
>
>> For the record, here's the information you requested in the other email:
>>
>> # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus
>> 0 1
>> 0 1
>
> Thanks! It would have been more useful to somehow know which was the
> policy->cpu. But looking at the problem, certainly CPU0 was the policy->cpu
> in your case.

Yes, I believe CPU0 since,

> # ls -l /sys/devices/system/cpu/cpu1/cpufreq
> lrwxrwxrwx 1 root root 0 Jan 1 00:01 /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq

and cpu0/cpufreq/ has all the files in it.

...

> So can you see if patch 1 + this above fix solves your problem as well?
> Then we can retain the original patch 2 as a cleanup, after these 2 patches.
> This organization also makes the code look better and understandable.

Yes, both patch 1+3 and 1+3+2 work fine.

On 12 September 2013 01:16, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] cpufreq: Restructure if/else block to avoid unintended behavior
>
> In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
> the sysfs link or nominate a new policy cpu, is governed by an if/else block
> with a rather complex set of conditionals. Worse, they harbor a subtlety
> which leads to certain unintended behavior.
>
> The code looks like this:
>
>	if (cpu != policy->cpu && !frozen) {
>		sysfs_remove_link(&dev->kobj, "cpufreq");
>	} else if (cpus > 1) {
>		new_cpu = cpufreq_nominate_new_policy_cpu(...);
>		...
>		update_policy_cpu(..., new_cpu);
>	}
>
> The original intention was:
> If the CPU going offline is not policy->cpu, just remove the link.
> On the other hand, if the CPU going offline is the policy->cpu itself,
> hand over the policy->cpu job to some other surviving CPU in that policy.
>
> But because the 'if' condition also includes the 'frozen' check, now there
> are *two* possibilities by which we can enter the 'else' block:
>
> 1. cpu == policy->cpu (intended)
> 2. cpu != policy->cpu && frozen (unintended)
>
> Due to the second (unintended) scenario, we end up spuriously nominating
> a CPU as the policy->cpu, even when the existing policy->cpu is alive and
> well. This can cause problems further down the line, especially when we end
> up nominating the same policy->cpu as the new one (i.e., old == new),
> because it totally confuses update_policy_cpu().
>
> To avoid this mess, restructure the if/else block to only do what was
> originally intended, and thus prevent any unwelcome surprises.
>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
>  drivers/cpufreq/cpufreq.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 62bdb95..247842b 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1193,8 +1193,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
>  	cpumask_clear_cpu(cpu, policy->cpus);
>  	unlock_policy_rwsem_write(cpu);
>
> -	if (cpu != policy->cpu && !frozen) {
> -		sysfs_remove_link(&dev->kobj, "cpufreq");
> +	if (cpu != policy->cpu) {
> +		if (!frozen)
> +			sysfs_remove_link(&dev->kobj, "cpufreq");
>  	} else if (cpus > 1) {
>
>  		new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);

Ahh, I wrote exactly the same crap.. Rafael please take Srivatsa's
patch here :)

> So can you see if patch 1 + this above fix solves your problem as well?
> Then we can retain the original patch 2 as a cleanup, after these 2 patches.

Why do we need 2 now? We should never hit that case I would say.. And if
we do, there is some other bug in our code which we have hidden :)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 62bdb95..247842b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1193,8 +1193,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
 	cpumask_clear_cpu(cpu, policy->cpus);
 	unlock_policy_rwsem_write(cpu);
 
-	if (cpu != policy->cpu && !frozen) {
-		sysfs_remove_link(&dev->kobj, "cpufreq");
+	if (cpu != policy->cpu) {
+		if (!frozen)
+			sysfs_remove_link(&dev->kobj, "cpufreq");
 	} else if (cpus > 1) {
 
 		new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);
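
For readers following the thread, the branch selection before and after this patch can be modeled in a few lines of standalone C. This is only an illustrative sketch, not kernel code: the enum and both function names are hypothetical, the kernel calls are replaced by return values, and it assumes `cpus` is the number of CPUs in the policy (2 on the 2-CPU system reported above).

```c
#include <stdbool.h>

/* Hypothetical labels for what the if/else block ends up doing. */
enum offline_action {
	ACTION_NONE,             /* neither branch body runs */
	ACTION_REMOVE_LINK,      /* stands in for sysfs_remove_link() */
	ACTION_NOMINATE_NEW_CPU  /* stands in for cpufreq_nominate_new_policy_cpu() */
};

/* Branch structure before the patch: 'frozen' is part of the outer test. */
static enum offline_action action_before_patch(unsigned int cpu,
					       unsigned int policy_cpu,
					       unsigned int cpus, bool frozen)
{
	if (cpu != policy_cpu && !frozen)
		return ACTION_REMOVE_LINK;
	else if (cpus > 1)
		return ACTION_NOMINATE_NEW_CPU;

	return ACTION_NONE;
}

/* Branch structure after the patch: 'frozen' only guards the link removal. */
static enum offline_action action_after_patch(unsigned int cpu,
					      unsigned int policy_cpu,
					      unsigned int cpus, bool frozen)
{
	if (cpu != policy_cpu) {
		if (!frozen)
			return ACTION_REMOVE_LINK;
	} else if (cpus > 1) {
		return ACTION_NOMINATE_NEW_CPU;
	}

	return ACTION_NONE;
}
```

With the values from the report (CPU1 going offline during suspend, so frozen is true, and CPU0 as policy->cpu), the pre-patch structure returns ACTION_NOMINATE_NEW_CPU even though policy->cpu is still online, while the post-patch structure returns ACTION_NONE. When frozen is false, both versions agree.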