diff mbox

3.10-rcX: cpu governor ondemand doesn't scale well after s2ram

Message ID 51D05DF4.50704@linux.vnet.ibm.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Srivatsa S. Bhat June 30, 2013, 4:33 p.m. UTC
On 06/30/2013 07:52 PM, Rafael J. Wysocki wrote:
> On Saturday, June 29, 2013 07:50:11 PM Toralf Förster wrote:
>> The latest bisect attempt gave :
>>
>> commit a66b2e503fc79fff6632d02ef5a0ee47c1d2553d
>> Author: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> Date:   Wed May 15 21:47:17 2013 +0200
>>
>>     cpufreq: Preserve sysfs files across suspend/resume
>>
>>     The file permissions of cpufreq per-cpu sysfs files are not preserved
>>     across suspend/resume because we internally go through the CPU
>>     Hotplug path which reinitializes the file permissions on CPU online.
>>
>>     But the user is not supposed to know that we are using CPU hotplug
>>     internally within suspend/resume (IOW, the kernel should not silently
>>     wreck the user-set file permissions across a suspend cycle).
>>     Therefore, we need to preserve the file permissions as they are
>>     across suspend/resume.
>>
>>     The simplest way to achieve that is to just not touch the sysfs files
>>     at all - ie., just ignore the CPU hotplug notifications in the
>>     suspend/resume path (_FROZEN) in the cpufreq hotplug callback.
>>
>>     Reported-by: Robert Jarzmik <robert.jarzmik@intel.com>
>>     Reported-by: Durgadoss R <durgadoss.r@intel.com>
>>     Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>     Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
>>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>>
>>
>> To get a more reliable bisect result I had to start BOINC before (4
>> childs each with nice -19 started)
> 
> Well, to be honest, I'm not really sure how the above commit can cause the
> problem you're seeing to happen ...
> 
> Srivatsa, Viresh, any ideas?
>

I tried to look up what problem is being reported, but apart from the hint
from the subject line, I couldn't find anything more. So, guessing that there
is something wrong with cpufreq after an s3 cycle, Toralf, can you please
try out the below patch and see if it improves anything? (Don't revert anything,
just apply the below diff on a problematic kernel and see if it solves your
issue).


---

 drivers/cpufreq/cpufreq_stats.c |    1 +
 1 file changed, 1 insertion(+)



--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Toralf Förster June 30, 2013, 5:05 p.m. UTC | #1
On 06/30/2013 06:33 PM, Srivatsa S. Bhat wrote:
> Toralf, can you please
> try out the below patch and see if it improves anything? (Don't revert anything,
> just apply the below diff on a problematic kernel and see if it solves your
> issue).

applied on top of a66b2e5 - issue went away (either fixed or hidden now)

Thx
Toralf Förster July 3, 2013, 7:46 p.m. UTC | #2
On 06/30/2013 07:05 PM, Toralf Förster wrote:
> On 06/30/2013 06:33 PM, Srivatsa S. Bhat wrote:
>> Toralf, can you please
>> try out the below patch and see if it improves anything? (Don't revert anything,
>> just apply the below diff on a problematic kernel and see if it solves your
>> issue).
> 
> applied on top of a66b2e5 - issue went away (either fixed or hidden now)
> 
> Thx
> 
But if I do apply that patch on top of kernel 3.10 then it has no effect
- 3.10 shows the same issue.
Srivatsa S. Bhat July 4, 2013, 6:55 a.m. UTC | #3
On 07/04/2013 01:16 AM, Toralf Förster wrote:
> On 06/30/2013 07:05 PM, Toralf Förster wrote:
>> On 06/30/2013 06:33 PM, Srivatsa S. Bhat wrote:
>>> Toralf, can you please
>>> try out the below patch and see if it improves anything? (Don't revert anything,
>>> just apply the below diff on a problematic kernel and see if it solves your
>>> issue).
>>
>> applied on top of a66b2e5 - issue went away (either fixed or hidden now)
>>
>> Thx
>>
> But if I do apply that patch on top of kernel 3.10 then it has no effect
> - 3.10 shows the same issue.
> 

Weird. So I think something *else* got broken by some other patch, _after_
a66b2e5. Viresh, do you have any suspect commits in mind?

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar July 4, 2013, 7:01 a.m. UTC | #4
On 4 July 2013 12:25, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 07/04/2013 01:16 AM, Toralf Förster wrote:
>> On 06/30/2013 07:05 PM, Toralf Förster wrote:
>>> On 06/30/2013 06:33 PM, Srivatsa S. Bhat wrote:
>>>> Toralf, can you please
>>>> try out the below patch and see if it improves anything? (Don't revert anything,
>>>> just apply the below diff on a problematic kernel and see if it solves your
>>>> issue).
>>>
>>> applied on top of a66b2e5 - issue went away (either fixed or hidden now)
>>>
>>> Thx
>>>
>> But if I do apply that patch on top of kernel 3.10 then it has no effect
>> - 3.10 shows the same issue.
>>
>
> Weird. So I think something *else* got broken by some other patch, _after_
> a66b2e5. Viresh, do you have any suspect commits in mind?

If I was God I could have. But as I am a poor human being I
need more details (preferably with git bisect) :)

--
Viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Srivatsa S. Bhat July 4, 2013, 7:08 a.m. UTC | #5
On 07/04/2013 12:31 PM, Viresh Kumar wrote:
> On 4 July 2013 12:25, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 07/04/2013 01:16 AM, Toralf Förster wrote:
>>> On 06/30/2013 07:05 PM, Toralf Förster wrote:
>>>> On 06/30/2013 06:33 PM, Srivatsa S. Bhat wrote:
>>>>> Toralf, can you please
>>>>> try out the below patch and see if it improves anything? (Don't revert anything,
>>>>> just apply the below diff on a problematic kernel and see if it solves your
>>>>> issue).
>>>>
>>>> applied on top of a66b2e5 - issue went away (either fixed or hidden now)
>>>>
>>>> Thx
>>>>
>>> But if I do apply that patch on top of kernel 3.10 then it has no effect
>>> - 3.10 shows the same issue.
>>>
>>
>> Weird. So I think something *else* got broken by some other patch, _after_
>> a66b2e5. Viresh, do you have any suspect commits in mind?
> 
> If I was God I could have. But as I am a poor human being I
> need more details (preferably with git bisect) :)
> 

Haha :)

Coming to git bisect, I think it needs to be done between a66b2e5 + this-patch
and 3.10 + this-patch, to ensure that we look for problematic commits in-between
a66b2e5 and 3.10 (to avoid the bisect result ending up at a66b2e5 again).

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar July 4, 2013, 7:58 a.m. UTC | #6
On 4 July 2013 12:38, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Coming to git bisect, I think it needs to be done between a66b2e5 + this-patch
> and 3.10 + this-patch, to ensure that we look for problematic commits in-between
> a66b2e5 and 3.10 (to avoid the bisect result ending up at a66b2e5 again).

Exactly!!
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar July 4, 2013, 8:04 a.m. UTC | #7
On 4 July 2013 12:38, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Coming to git bisect, I think it needs to be done between a66b2e5 + this-patch
> and 3.10 + this-patch, to ensure that we look for problematic commits in-between
> a66b2e5 and 3.10 (to avoid the bisect result ending up at a66b2e5 again).

Toralf,

Try reverting these patches in the order specified (revert both)
419e172145cf6c51d436a8bf4afcd17511f0ff79
c28375583b6471c1cb833a628ab6afb5b69079d0

They are touching ondemand governor and this may be creating some
problems. But I am not sure about it, I must admit.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar July 4, 2013, 8:23 a.m. UTC | #8
On 4 July 2013 13:34, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 4 July 2013 12:38, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Coming to git bisect, I think it needs to be done between a66b2e5 + this-patch
>> and 3.10 + this-patch, to ensure that we look for problematic commits in-between
>> a66b2e5 and 3.10 (to avoid the bisect result ending up at a66b2e5 again).
>
> Toralf,
>
> Try reverting these patches in the order specified (revert both)
> 419e172145cf6c51d436a8bf4afcd17511f0ff79

Ok, I did a mistake here (Thanks Srivatsa for telling me), this wasn't
there in 3.10 but 3.11-rc1

> c28375583b6471c1cb833a628ab6afb5b69079d0
>
> They are touching ondemand governor and this may be creating some
> problems. But I am not sure about it, I must admit.

So just try reverting about commit.

In case you are using acpi-cpufreq driver, please try reverting this before
above one:

8673b83bf2f013379453b4779047bf3c6ae387e4

It is playing with how frequencies are reported and may be the
culprit.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toralf Förster July 4, 2013, 4:42 p.m. UTC | #9
On 07/04/2013 10:23 AM, Viresh Kumar wrote:
> Ok, I did a mistake here (Thanks Srivatsa for telling me), this wasn't
> there in 3.10 but 3.11-rc1
> 
>> > c28375583b6471c1cb833a628ab6afb5b69079d0
>> >
>> > They are touching ondemand governor and this may be creating some
>> > problems. But I am not sure about it, I must admit.
> So just try reverting about commit.
> 
> In case you are using acpi-cpufreq driver, please try reverting this before
> above one:
> 
> 8673b83bf2f013379453b4779047bf3c6ae387e4
I reverted  8673b83b on top of 3.10.0  - no success, then I reverted
c2837558 too - no success.
Viresh Kumar July 5, 2013, 4:35 a.m. UTC | #10
On 4 July 2013 22:12, Toralf Förster <toralf.foerster@gmx.de> wrote:
> On 07/04/2013 10:23 AM, Viresh Kumar wrote:
>> Ok, I did a mistake here (Thanks Srivatsa for telling me), this wasn't
>> there in 3.10 but 3.11-rc1
>>
>>> > c28375583b6471c1cb833a628ab6afb5b69079d0
>>> >
>>> > They are touching ondemand governor and this may be creating some
>>> > problems. But I am not sure about it, I must admit.
>> So just try reverting about commit.
>>
>> In case you are using acpi-cpufreq driver, please try reverting this before
>> above one:
>>
>> 8673b83bf2f013379453b4779047bf3c6ae387e4
> I reverted  8673b83b on top of 3.10.0  - no success, then I reverted
> c2837558 too - no success.

I assume that you have applied the patch given by Srivatsa earlier over
these reverts?

In that case you can go for a revert I believe. There aren't many cpufreq
patches between 3.10 and the commit where you tested successfully
earlier.

But I now also suspect that this might have been caused by something
outside cpufreq.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toralf Förster July 5, 2013, 2:06 p.m. UTC | #11
On 07/05/2013 06:35 AM, Viresh Kumar wrote:
> I assume that you have applied the patch given by Srivatsa earlier over
> these reverts?
yes

> But I now also suspect that this might have been caused by something
> outside cpufreq.

+1

At least the likelihood is big and the "saving" to narrow down bisecting just to
the cpufreq patch is small compared to the "cost" of the risk of an unwanted side effect.
OTOH not sure when I find a time frame for the bisect nightmare.
That's why I tweaked my s2ram script in this way:

+       if [[ "$1" = "mem" && "$(uname -r)" = "3.10.0" ]]; then
+               echo " tweak governor ..."
+               for g in performance ondemand; do
+                       for i in 0 1 2 3; do
+                               echo $g > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor
+                       done
+               done
+               echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice
+       fi
Toralf Förster July 10, 2013, 7:31 p.m. UTC | #12
On 07/04/2013 09:58 AM, Viresh Kumar wrote:
> On 4 July 2013 12:38, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Coming to git bisect, I think it needs to be done between a66b2e5 + this-patch
>> and 3.10 + this-patch, to ensure that we look for problematic commits in-between
>> a66b2e5 and 3.10 (to avoid the bisect result ending up at a66b2e5 again).
> 
> Exactly!!
> 

Problem : yesterday I wasn't even able to 100% validate the issue at
commit a66b2e5 + f51e1eb just b/c there's another issue which
erratically happens here: sometimes cpu0 is too (not only cpu 1-3). In
that case the frequency are set to 2.6 GHz for each coret instead of 800
MHz for cpu0 and 2.601 GHz for cpu 1-3.

Furthermore I've the strong feeling, that the plugged in USB devices
might have an effect. And in the mean while I would not even exclude the
influence of the type of the 4 BOINC tasks (Asteorid@Home versus
Einstein@Home / World Community Grid which I have to run with nice -19
in the back ground) at the bisect results.

So all my ongoing attempts to bisect weren't successful till now.

I just tested the latest git tree from today (v3.10-8615-g23e3a1d) -
although f51e1eb is merged - the origin issue (cpu 1-3 set to 2.601 GHz
after wakeup) is still here.
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index fb65dec..591b6fb 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -349,6 +349,7 @@  static int __cpuinit cpufreq_stat_cpu_callback(struct notifier_block *nfb,
 
 	switch (action) {
 	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
 		cpufreq_update_policy(cpu);
 		break;
 	case CPU_DOWN_PREPARE: