[RFT,v7,5/8] cpuidle: Return nohz hint from cpuidle_select()

Message ID CAJZ5v0iThFDEjnwTbpAhwHY_vF_KDdAUyhDL1CdB4GJsG5eNRQ@mail.gmail.com (mailing list archive)
State Not Applicable, archived

Commit Message

Rafael J. Wysocki March 21, 2018, 10:15 p.m. UTC
On Wed, Mar 21, 2018 at 6:59 PM, Thomas Ilsche
<thomas.ilsche@tu-dresden.de> wrote:
> On 2018-03-21 15:36, Rafael J. Wysocki wrote:
>>
>>
>> So please disregard this one entirely and take the v7.2 replacement
>> instead of it: https://patchwork.kernel.org/patch/10299429/
>>
>> The current versions (including the above) are in the git branch at
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
>>   idle-loop-v7.2
>
>
> With v7.2 (tested on SKL-SP from git) I see similar behavior in idle
> as with v5: several cores which just keep the sched tick enabled.
> Worse yet, some go only in C1 (not even C1E!?) despite sleeping the
> full sched tick.
> The resulting power consumption is ~105 W instead of ~70 W.
>
> https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png
>
> I have briefly run v7 and I believe it was also affected.

Then it looks like menu_select() stubbornly thinks that the idle
duration will be within the tick boundary on those cores.

That may be because the bumping up of the correction factor in
menu_reflect() is too conservative or it may be necessary to do
something radical to measured_us in menu_update() in case of a tick
wakeup combined with a large next_timer_us value.

For starters, please see if the attached patch (on top of the
idle-loop-v7.2 git branch) changes this behavior in any way.

Comments

Thomas Ilsche March 22, 2018, 1:18 p.m. UTC | #1
On 2018-03-21 23:15, Rafael J. Wysocki wrote:
> On Wed, Mar 21, 2018 at 6:59 PM, Thomas Ilsche
> <thomas.ilsche@tu-dresden.de> wrote:
>> On 2018-03-21 15:36, Rafael J. Wysocki wrote:
>>>
>>>
>>> So please disregard this one entirely and take the v7.2 replacement
>>> instead of it: https://patchwork.kernel.org/patch/10299429/
>>>
>>> The current versions (including the above) are in the git branch at
>>>
>>>    git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
>>>    idle-loop-v7.2
>>
>>
>> With v7.2 (tested on SKL-SP from git) I see similar behavior in idle
>> as with v5: several cores which just keep the sched tick enabled.
>> Worse yet, some go only in C1 (not even C1E!?) despite sleeping the
>> full sched tick.
>> The resulting power consumption is ~105 W instead of ~70 W.
>>
>> https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png
>>
>> I have briefly run v7 and I believe it was also affected.
> 
> Then it looks like menu_select() stubbornly thinks that the idle
> duration will be within the tick boundary on those cores.
> 
> That may be because the bumping up of the correction factor in
> menu_reflect() is too conservative or it may be necessary to do
> something radical to measured_us in menu_update() in case of a tick
> wakeup combined with a large next_timer_us value.
> 
> For starters, please see if the attached patch (on top of the
> idle-loop-v7.2 git branch) changes this behavior in any way.
> 

The patch on top of idle-loop-v7.2 doesn't improve idle behavior on
SKL-SP. Overall it is pretty erratic; I have not seen any regular
patterns. Sometimes only a few CPUs are affected; here's a screenshot of
almost all CPUs being affected after a short burst workload.

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_reflect_skl_sp_idle.png

Rafael J. Wysocki March 22, 2018, 5:23 p.m. UTC | #2
On Thursday, March 22, 2018 2:18:59 PM CET Thomas Ilsche wrote:
> On 2018-03-21 23:15, Rafael J. Wysocki wrote:
> > On Wed, Mar 21, 2018 at 6:59 PM, Thomas Ilsche
> > <thomas.ilsche@tu-dresden.de> wrote:
> >> On 2018-03-21 15:36, Rafael J. Wysocki wrote:
> >>>
> >>>
> >>> So please disregard this one entirely and take the v7.2 replacement
> >>> instead of it: https://patchwork.kernel.org/patch/10299429/
> >>>
> >>> The current versions (including the above) are in the git branch at
> >>>
> >>>    git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> >>>    idle-loop-v7.2
> >>
> >>
> >> With v7.2 (tested on SKL-SP from git) I see similar behavior in idle
> >> as with v5: several cores which just keep the sched tick enabled.
> >> Worse yet, some go only in C1 (not even C1E!?) despite sleeping the
> >> full sched tick.
> >> The resulting power consumption is ~105 W instead of ~70 W.
> >>
> >> https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png
> >>
> >> I have briefly run v7 and I believe it was also affected.
> > 
> > Then it looks like menu_select() stubbornly thinks that the idle
> > duration will be within the tick boundary on those cores.
> > 
> > That may be because the bumping up of the correction factor in
> > menu_reflect() is too conservative or it may be necessary to do
> > something radical to measured_us in menu_update() in case of a tick
> > wakeup combined with a large next_timer_us value.
> > 
> > For starters, please see if the attached patch (on top of the
> > idle-loop-v7.2 git branch) changes this behavior in any way.
> > 
> 
> The patch on top of idle-loop-v7.2 doesn't improve idle behavior on
> SKL-SP. Overall it is pretty erratic; I have not seen any regular
> patterns. Sometimes only a few CPUs are affected; here's a screenshot of
> almost all CPUs being affected after a short burst workload.
> 
> https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_reflect_skl_sp_idle.png

Thanks for the information!

I will post a v7.3 of patch [5/8] shortly that appears to give good results
for me.  It may be selecting deep states quite aggressively, but let's see.

Patch

---
 drivers/cpuidle/governors/menu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -498,7 +498,7 @@  static void menu_reflect(struct cpuidle_
 		 * correction factor.  Use 0.75 * RESOLUTION (which is easy
 		 * enough to get) that should work fine on the average.
 		 */
-		new_factor += RESOLUTION / 2 + RESOLUTION / 4;
+		new_factor += RESOLUTION;
 		data->correction_factor[data->bucket] = new_factor;
 	} else {
 		data->needs_update = 1;
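
For readers following the thread, here is a rough user-space sketch of what the
one-line change above does to the correction factor over repeated tick wakeups.
It is not code from the idle-loop-v7.2 branch. It assumes that RESOLUTION is
1024 and DECAY is 8 (as in mainline menu.c of that period), that each update
first decays the factor by 1/DECAY before adding the increment (that step is
not visible in the hunk), and that the prediction is next_timer_us scaled by
factor / (RESOLUTION * DECAY). The function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed to match mainline drivers/cpuidle/governors/menu.c at the time. */
#define RESOLUTION 1024
#define DECAY 8

/*
 * One update step (hypothetical helper): decay the old factor by 1/DECAY,
 * then add the increment. The decay step is an assumption; only the
 * addition appears in the hunk above.
 */
unsigned int bump_factor(unsigned int factor, unsigned int increment)
{
	factor -= factor / DECAY;
	return factor + increment;
}

/* Apply the update for a number of consecutive tick wakeups. */
unsigned int settle_factor(unsigned int factor, unsigned int increment,
			   int wakeups)
{
	while (wakeups-- > 0)
		factor = bump_factor(factor, increment);
	return factor;
}

/*
 * Scale the time to the next timer by the correction factor.
 * RESOLUTION * DECAY corresponds to a factor of exactly 1.0.
 */
uint64_t predict_us(uint64_t next_timer_us, unsigned int factor)
{
	return next_timer_us * factor / (RESOLUTION * DECAY);
}
```

Under these assumptions, starting from a factor of RESOLUTION, the old increment
(RESOLUTION / 2 + RESOLUTION / 4) settles at 0.75 * RESOLUTION * DECAY, so the
prediction stays at 75% of next_timer_us. The new increment (RESOLUTION) settles
at RESOLUTION * DECAY, so the prediction approaches next_timer_us itself. Either
way it takes dozens of consecutive tick wakeups to get there from a low starting
factor, which may be part of why the change alone made no visible difference in
the test above.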